EasyTextPrint

if 0 {


Summary

HJG 2016-02-02: This is another attempt at quick, easy, ad-hoc printing of plain textfiles.

I often have some information in a text file, and when I need to print it, I want a nice-looking page,
e.g. with a few headers, some text in bold etc., but I don't want to use an 'Office' word processor for that.

The idea is to convert that text file to an HTML file, then print that with the web browser.

The basic operation of that converter is to copy the inputfile x.txt to x.html,
add some lines like "<html>" "<head>", "<body>" etc.,
then wrap the first line of text in <h1>-tags to get a big header,
and put the rest of the textlines in <pre>- or <p>-tags.
Also, replace some chars (like &, <, >) with html-entities.

Then drop the resulting file x.html into the webbrowser and do print-preview / print.
Or, with a fixed location for the output-file, use a bookmark in the browser.

Add some CSS to taste, and extend the converter's "basic operation"
to cover more markup (headers, lists, etc.), as the need arises.
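
The "basic operation" described above can be sketched in a few lines of Tcl. This is only a minimal illustration, not the script from this page: the proc names htmlEscape and txt2html are made up for the example, and it handles nothing beyond the title line, paragraphs and the three escaped characters.

```tcl
# Minimal sketch of the "basic operation": first line -> <h1>, rest -> <p>.
# htmlEscape and txt2html are hypothetical names, not part of EasyTextPrint.

proc htmlEscape {text} {
    # string map scans the input once, so the "&" written by one
    # replacement is never escaped a second time
    return [string map {& &amp; < &lt; > &gt;} $text]
}

proc txt2html {text} {
    set lines [split $text \n]
    set title [htmlEscape [lindex $lines 0]]
    set out [list "<html>" "<head><title>$title</title></head>" "<body>"]
    lappend out "<h1>$title</h1>"
    foreach line [lrange $lines 1 end] {
        if {$line eq ""} { continue }
        lappend out "<p>[htmlEscape $line]</p>"
    }
    lappend out "</body>" "</html>"
    return [join $out \n]
}

puts [txt2html "Shopping list\nBread & butter\n2 < 3"]
```

The result can be redirected to a file and opened in the browser for print-preview.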

There are some programs available that work like that, e.g. Markdown.
But Markdown uses Perl, and I want even more minimal markup.


With ideas and code from the following pages:

}


Code

 # EasyTextPrint012.tcl - HaJo Gurt - 2016-02-14
 # https://wiki.tcl-lang.org/42409

  set      progVersion "EasyTextPrint v0.12"
  puts "# $progVersion"

  set fn1 "Todo.txt"
  set fn1 "City.txt"

  set msg "$progVersion - Select inputfile" 
  set fn1 [tk_getOpenFile -title $msg \
              -filetypes {{TEXT .txt} {"All files" *}} \
              -defaultextension .txt  -initialfile $fn1]

  set fn2 "EasyTextPrint.html"

  set LineNr  0;        # count non-comment lines from inputfile
  set Skip    0
  set Title   "EasyTextPrint"
  set Cmd     "Hdr"
  set H       1
  set Prev    "H";
  set Default "p";      # Default: wrap inputline in paragraph-tags


  catch {console show}
  catch {wm withdraw .}

  proc e {} { exit }
  proc q {} { exit }

#---+----1----+----2----+----3----+----4----+----5----+----6----+----7----+---

  proc repl {T S1 S2} {
    set p1 [string first $S1 $T]
    set p2 [expr { $p1 + [string length $S1] -1 } ]
    set T2 [string replace $T $p1 $p2 $S2]
    return $T2
  }

  proc tagReplace {T0 S1 S2 S3} {
  #: change "**bold**" to "<b>bold</b>", etc.

    set p1 [string first $S1 $T0]
    set p9 [string last  $S1 $T0]

    if {$p1==$p9} {return $T0};     # only 1 tag found
    incr p9 -1
    if {$p1==$p9} {return $T0}

    set T1 [repl $T0 $S1 $S2 ]
    set T2 [repl $T1 $S1 $S3 ]

    incr ::Changes
    return $T2
  }

  proc Out {T} {
    puts $::fh2 $T
  }

  proc Head {T} {
    Out "<!DOCTYPE HTML>"
    Out "<html>"
    Out "<HEAD>"
    Out "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\" />"

    Out "<style type=\"text/css\"> "
    Out "* {"
    Out " margin:       0;"
    Out " margin-left: 10px;"
    Out " padding:      0; }"
    Out "body {"
    Out " background:  silver;"
   #Out " font-family: Verdana, Helvetica, sans-serif;"
    Out " font-family: \"DejaVu Sans\", Helvetica, sans-serif;"
    Out " font-size:   12px; }"
    Out "h1,h2,h3,h4,h5,h6,p,ul,li,hr,blockquote {"
    Out " padding:        1px;"
    Out " background:     #eeEEee; } "
    Out "h1 { background: #ffFF80; text-align: center; } "
    Out "h2 { background: #80FFFF; text-decoration: underline; } "
    Out "h3 { background: #80FF80; } "
    Out "h4 { background: #FF8080; } "
    Out "li { margin-left: 13px; }"
    Out "blockquote {"
    Out " margin-top: 2px; margin-right: 16px; margin-bottom: 2px; margin-left: 24px; }"
    Out "code,kbd {"
    Out " font-family: \"Lucida Console\", \"DejaVu Sans Mono\", monospace;"
    Out " font-size:   10px; background: orange; }"
   #Out " ...more style-css..."
    Out "</style>"

    Out "<TITLE>$T</TITLE>"
    Out "</HEAD>\n" ;
  }

  proc Footer {} {
    Out "</BODY>"
    Out "</HTML>"
  }

#---+----1----+----2----+----3----+----4----+----5----+----6----+----7----+---

  proc main {} {
  #...
    return
  }

  puts "# Read file $fn1 ..."
  if {![file exists $fn1] || [catch { set fh1 [open $fn1 r] } ] } {
      puts "# Error: open $fn1"
      return 1
  }

  puts "# Write to file $fn2 ..."
  set fh2 [open $fn2 w]        ;# w / w+

  update 

  set i 0
  while {[gets $fh1 line] >= 0} {   ;# ends at EOF, without processing an extra empty line
      incr i 1
     #puts "$i $Cmd : ($line)";  ##

      set c1 [string index $line 0]
      set c2 [string range $line 0 1]
      set c4 [string range $line 0 3]

      if {$c4 eq "##__"} { set Cmd "EOF"; break; }

      if {$c4 eq "##++"} { set Skip 1; continue; };   # Skip following text / Stop printing
      if {$c4 eq "##(("} { set Skip 1; continue; };   #
      if {$c4 eq "##--"} { set Skip 0 };              # Continue printing
      if {$c4 eq "##))"} { set Skip 0 };              #
      if {$Skip > 0}     { continue };

      if {$c4 eq "##::"} { set Default [string range $line 4 end] };

      if {$line eq ""  } { Out $line; continue };
      if {$c1   eq "\t"} { Out $line; continue };

      if {$c4 eq "##!!"} {
          Out "<DIV style=\"page-break-after:always\"></DIV>";
          set Cmd "FF"; continue
      };

      if {$c1 eq "#"   } { continue };                # comment

      if {$line eq "_" } { Out "&nbsp;"; continue };

      if {$c2 eq "^^"  } { set ::H 1; set Cmd "Hdr"; continue }
      if {$c2 eq "=="  } { set ::H 2; set Cmd "Hdr"; continue }
      if {$c2 eq "--"  } { set ::H 3; set Cmd "Hdr"; continue }

      set line [string map {< &lt;     > &gt;     & &amp;   – &ndash; } $line]
      set line [string map {Ä &Auml;   Ö &Ouml;   Ü &Uuml;  ² &sup2;  } $line]
      set line [string map {ä &auml;   ö &ouml;   ü &uuml;  ß &szlig; } $line]
      set line [string map {é &eacute; è &egrave; ê &ecirc; ç &ccedil;} $line]

      if {$LineNr == 0} {
          Head $line
          Out "<BODY>"
      }

      set Changes 1
      while {$Changes>0} {
        set Changes 0
        set line [tagReplace $line "**" "<B>" "</B>"]
        set line [tagReplace $line "//" "<I>" "</I>"]
        set line [tagReplace $line "__" "<U>" "</U>"]
        set line [tagReplace $line "%%" "<center>" "</center>"]
      }

      set line1 [string range $line 1 end ]
      if {$c1 eq " "} { Out "<PRE>$line1</PRE>";  continue };

      if {$Cmd eq ""} {
        if {$c1 eq "*"} { Out "<UL><LI>$line1</LI></UL>"; continue };
        if {$c1 eq ">"} { Out "<BLOCKQUOTE>[string range $line 4 end ]</BLOCKQUOTE>"; continue};

      } else {

        if {$Cmd eq "Hdr"} {
          if {$Prev ne "H"} { Out "<hr>\n" }

          Out "<H$H>$line</H$H>"
          incr LineNr;
          set Prev "H"
          set Cmd ""
          continue
        }
      }

    #Out "<P>$line</P>";                  # Default: wrap inputline in paragraph-tags
     Out "<$Default>$line</$Default>";

     set Prev "P"

  }; # while

  if {$LineNr > 0} {
    Footer
  }
  close $fh1
  close $fh2

  puts "# Output written to file: $fn2"
  puts "# Done."
 #exit

#---+----1----+----2----+----3----+----4----+----5----+----6----+----7----+---
#.

Code - awk

I did a first prototype of this program using awk, and this script already has the 'basics' plus a few additional features implemented:

#!/usr/bin/awk -f
# txt2html.awk - gurt.gmx@de - 2016-02-15
#
#: Read plain text, output as html, marked up for printing via webbrowser

#: Markup - String at start of line determines type of header in next line:
#  ^^ H1-header in next line (implicit just before first line of inputfile)
#  == H2-header in next non-comment, non-blank line
#  -- H3-header in next non-comment, non-blank line

# Usage:
#   gawk -f txt2html.awk  Tel.txt
#   gawk -f txt2html.awk City.txt > City.html

# See also: https://css-tricks.com/almanac/properties/p/page-break/

#
#-##+####1####+####2####+####3####+####4####+####5####+####6####+####7####+###
#
  function chr(c) \
  {
    return sprintf( "%c", c+0 );  # make c numeric by adding 0
  }

  BEGIN           { Q1  = "'"; Q2  = "\"";  # Quotes
                    A   = "\\&";
                    LineNr = 0;
                    Skip   = 0
                    Title  = "EasyTextPrint"
                    Cmd    = "Hdr";
                    H      = 1;
                    Prev   = "H";
                  }

  function Head(T) \
  {
                    print("<!DOCTYPE HTML>")
                    print("<html>")
                    print("<HEAD>")
                    print("<meta http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\" />")

                    print("<style type=\"text/css\"> ")
                    print("* {")
                    print(" margin:       0;")
                    print(" margin-left: 10px;")
                    print(" padding:      0; }")
                    print("body {")
                    print(" background:  silver;")
                    print(" font-family: verdana, helvetica, sans-serif;")
                    print(" font-size:   12px;")
                    print("}")
                    print("h1,h2,h3,h4,h5,p,ul,li,hr {")
                    print(" padding:        1px;")
                    print(" background:     #eeEEee; } ")
                    print("h1 { background: #ffFF80; text-align: center; } ")
                    print("h2 { background: #80FFFF; text-decoration: underline; } ")
                    print("h3 { background: #80FF80; } ")
                    print("h4 { background: #FF8080; } ")
                   #print(" ...more style-css...")
                    print("</style>")

                    print("<TITLE>" T "</TITLE>")
                    print("</HEAD>\n");
    return
  }

  /^##__/         { exit }
  /^##!!/         { print "<DIV style=\"page-break-after:always\"></DIV>"; Cmd="FF"; next }

  /^_$/           { print "&nbsp;"; next }

  /^##\++/        { Skip=1 }   ##++ skip
  /^##--/         { Skip=0 }   ##--Start--
   Skip>0         { next }

  /^#/            { next }

  /^\^+/          { Cmd = "Hdr"; H=1; next }
  /^==/           { Cmd = "Hdr"; H=2; next }
  /^--/           { Cmd = "Hdr"; H=3; next }

  NF<1            { print; next }

                  { gsub( "&", A"amp;"); }
                  { gsub( "<", A"lt;" ); }
                  { gsub( ">", A"gt;" ); }

                  { gsub( "Ä", A"Auml;" ); }
                  { gsub( "Ö", A"Ouml;" ); }
                  { gsub( "Ü", A"Uuml;" ); }
                  { gsub( "ä", A"auml;" ); }
                  { gsub( "ö", A"ouml;" ); }
                  { gsub( "ü", A"uuml;" ); }
                  { gsub( "ß", A"szlig;"); }

                  { gsub( "²", A"sup2;");  }        
                  { gsub( "–", A"ndash;"); }
                 #{ gsub( "-", A"ndash;"); }    #

                  { sub( "[*][*]", "<B>");  }
                  { sub( "[*][*]", "</B>"); }

                  { sub( "//", "<I>");  }
                  { sub( "//", "</I>"); }

                  { sub( "__", "<U>");  }
                  { sub( "__", "</U>"); }

                  { sub( "%%", "<center>");  }   # ^^
                  { sub( "%%", "</center>"); }

  /^ /            { print("<pre>" $0 "</pre>" ); next }


  LineNr==0       { Title = $0;
                    LineNr++;
                    Head(Title);
                    print("<BODY>");
                   #next
                  }

  Cmd=="Hdr"      { Hdr = $0; LineNr++; Cmd="";
                    if (Prev!="H") { print("<hr>\n"); }        
                    print("<h" H ">" Hdr "</h" H ">");      # H1..H3
                    Prev="H";
                    next
                  }

  /^\*/           { T = substr( $0,2 );        # drop the leading '*'
                    print("<UL><LI>" T "</LI></UL>" ); Prev="u"; next
                  }

                  { print("<p>" $0 "</p>" ); Prev="p"; next }
#                 { print }

  END             { # print "# Done."
                    print("</BODY>")
                    print("</html>")
                  }
#.

Input

This is an example of a plain textfile used as input.

It will show pretty much all features implemented for now, along with some of the more common special chars.

With the 'streamlined' CSS above, the result should be 2 printed pages
(DIN A4, with margins set at 10 mm left and right, and at 6 mm for top and bottom).

# comment - This is the file: City.txt
# 2* H1-header:
Großstädte in Deutschland
^^
Kommunalverband besonderer Art 

==

##++ skip: don't print the following lines of text, until reaching a line starting with "##--"

# Test1:
==
Test-H2
Umlaute: < ÄÖÜ & äöüß >
Textstyle: **bold** //italic// __underline__ **bold2** //italic2// __underline2__ *** ///
--
Test-H3
Text-Paragraph
Text=P
 Text-Pre
 Text=Pre
* Text-UL
* Text=UL
> Text-BQ
--
Jäger, Müller & Förster GmbH & Co. KG 
Erzhäuser Straße. 90, 88662 Überlingen
Tel. 07773 74 75 76
Internet: www.nospam.de - [email protected]
--
Lorem ipsum
# show blockquote, and wrapping of long lines
>ubique nostro singulis in vix, vis eu doctus scripserit ullamcorper. His quidam detraxit referrentur ei, affert adolescens intellegam sea in. Eros phaedrum imperdiet vim ei, ex amet voluptatum efficiendi eos, nihil sanctus intellegebat at nec. Adipisci theophrastus ei duo, eos cu conceptam percipitur, an dicta eripuit similique his. Graeci convenire in sit, eum errem laoreet ancillae ut, qui at facilisi periculis. 

##-- Start/continue printing here
==
Niedersachsen
--
Göttingen
Niedersachsen
Einwohner:         117.665 
Postleitzahlen:         37001–37099
Vorwahl:         0551
Kfz-Kennzeichen:         GÖ
37083 Göttingen
--
Hannover
Niedersachsen
Höhe:         55 m ü. NHN
Fläche:         204,14 km²
Einwohner:         523.642 
Postleitzahlen:         30159–30659
Vorwahl:         0511
Kfz-Kennzeichen:         H
30159 Hannover
--

==
Baden-Württemberg
--
Reutlingen
Baden-Württemberg
Regierungsbezirk:         Tübingen
Landkreis:         Reutlingen
Einwohner:         112.452 
Postleitzahlen:         72760–72770
Vorwahlen:         07121, 07072 und 07127
Kfz-Kennzeichen:         RT
72764 Reutlingen
--
==
Saarland
--
Saarbrücken
Saarland
Einwohner:         180.047
Postleitzahlen:         66001–66133
Vorwahlen:         0681, 06893, 06897, 06898, 06805, 06806, 06881
Kfz-Kennzeichen:         SB
66111 Saarbrücken
--

##!! page-break

==
Nordrhein-Westfalen
--
Aachen
Nordrhein-Westfalen
Einwohner:         243.336
Postleitzahlen:         52056–52080
Vorwahlen:         0241, 02403, 02405, 02407, 02408
Kfz-Kennzeichen:         AC, MON
52062 Aachen
--
Bergisch Gladbach
Nordrhein-Westfalen
Einwohner:         109.697 
Postleitzahlen:         51427–51469
Vorwahlen:         02202, 02204, 02207
Kfz-Kennzeichen:         GL
51465 Bergisch Gladbach
--
Moers
Nordrhein-Westfalen
Einwohner:         102.923 
Postleitzahlen:         47441–47447
Vorwahl:         02841
Kfz-Kennzeichen:         WES, DIN, MO
47441 Moers
--
Neuss
Nordrhein-Westfalen
Einwohner:         152.644 
Postleitzahlen:         41460–41472
Vorwahlen:         02131, 02137, 02182
Kfz-Kennzeichen:         NE, GV
41460 Neuss
--
Paderborn
Nordrhein-Westfalen
Einwohner:         145.176 
Postleitzahlen:         33098–33109
Vorwahlen:         05251, 05252, 05254, 05293
Kfz-Kennzeichen:         PB, BÜR
33098 Paderborn
--
Recklinghausen
Nordrhein-Westfalen
Einwohner:         114.147 
Postleitzahlen:         45601–45665
Vorwahl:         02361
Kfz-Kennzeichen:         RE, CAS, GLA
45657 Recklinghausen
-- 
Siegen
Nordrhein-Westfalen
Einwohner:         100.325 
Postleitzahlen:         57072–57080
Vorwahlen:         0271, 02732 (Meiswinkel), 02737 (Feuersbach)
Kfz-Kennzeichen:         SI, BLB
57072 Siegen
==
Code
##::kbd
# this cannot have whitespace at start of line (that would result in <pre>-formatted text):
awk '{sub(/[ \t]+$/,"")}; 1';  # delete trailing whitespace
##::P
# back to standard <p>-paragraphs
Done :-)
--
_
 Hi    Hi
 Hi    Hi
 Hi Hi Hi
 Hi    Hi
 Hi    Hi
_

%%End%%

##__EOF__

don't print this
bla
blah

Comments

HJG 2016-02-13: Change of plan: there is no need to use H6 as pagebreak, and I want to use all the headers H1,H2,H3 directly.
The demo-inputfile has been modified.

Markup

  • # : Comments: lines starting with a '#' don't get printed.
  • ## : Commands: some special comments are used as commands:
    • ##__ : End-of-file. Stop printing, end the program.
    • ##!! : Pagebreak. Continue printing on a new page.
    • ##++ : Start-marker: pause printing, and skip the following lines, until the endmarker '##--' is found.
    • ##-- : Endmarker: resume printing.
    • the same: ##(( ...ignore lines... ##))
    • ##::kbd : Set default-tag for wrapping lines (standard is 'p', for paragraph). Only a single tag please!
  • The first non-comment line of the textfile will be used as title and H1-header.
  • ^^ : The text in the following (non-comment, non-blank) line will be used as a H1-header.
  • == : Ditto, H2-header to follow.
  • -- : Ditto, H3-header to follow.
    The lines after that header will be formatted as 'normal' text.
    Normal text gets wrapped in <p>-tags (can be changed via '##::', e.g. to q, kbd, code, pre).
  • Textstyles: **bold** //italic// __underline__ %%centered%%
  • Lines starting with a blank: the line gets wrapped in <pre>-tags ==> preformatted text
  • Lines starting with a '*' : the line gets wrapped in <UL><LI>-tags ==> unnumbered list
  • Lines starting with a '>' : the line gets wrapped in <BLOCKQUOTE>-tags ==> text is indented
  • A line with a single '_' : it gets replaced with a &nbsp; ==> blank line
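
The header rules in the list above amount to a small state machine: a marker line only sets the level, and the next text line picks it up. A rough sketch follows, assuming a hypothetical helper classifyLines that returns tag/text pairs instead of writing HTML. It ignores the '##' commands and all other markup.

```tcl
# Sketch of the marker logic: a marker line sets the level of the NEXT text line.
# classifyLines is a hypothetical helper, not part of EasyTextPrint.

proc classifyLines {lines} {
    set level   1      ;# the first text line is an implicit H1
    set pending 1      ;# 1 = the next text line becomes a header
    set result  {}
    foreach line $lines {
        set c2 [string range $line 0 1]
        if {$line eq ""}  { continue }
        if {$c2 eq "^^"}  { set level 1; set pending 1; continue }
        if {$c2 eq "=="}  { set level 2; set pending 1; continue }
        if {$c2 eq "--"}  { set level 3; set pending 1; continue }
        if {[string index $line 0] eq "#"} { continue }
        if {$pending} {
            lappend result "h$level" $line
            set pending 0
        } else {
            lappend result "p" $line
        }
    }
    return $result
}

set demo [list "City list" "# comment" "==" "" "Nordrhein-Westfalen" "--" "Aachen" "Einwohner: 243.336"]
puts [classifyLines $demo]
```

Blank lines and comments between a marker and its header text are skipped, which is why the input file can be spaced out freely.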

Features

  • Comments, <pre>, <UL>, H1..H3, EOF, and skip-ranges are extensions to the "basic operation" of the converter.
  • Blank lines are not used for headers. The formatting of the inputfile can be as spaced-out as you want.
  • Markup for **bold**, //italic//, __underline__ is done only when a pair of '**' etc. is found on the same line.
    So, a single '***' or '///' remains unchanged.
  • The special chars I use most commonly are replaced with html-entities (ÄÖÜ, dashes, etc.) - Easy to extend.
  • Textsize, line-height, margins, padding are set to minimal values, to fit as much text on a page as possible.
    To see how much space a normal print would need, use the browser's "Inspect element", and uncheck 'margins' in the Rules-tab.
  • Light background-colors, to show the structure of the text - and to make it easy to spot errors...
  • Pagebreak is a CSS-feature that only works when printing.
    To see the position of the break, change the empty DIV, e.g. to '<DIV style="page-break-after:always">-</DIV>'.
  • Print-Preview in the browser lets you customize headers and footers, e.g. filename, pagenumbers, etc.
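
The pairing rule for text styles mentioned in the list above can be illustrated on its own. The sketch below is a variant written for this explanation, not the tagReplace proc from the script; the name pairMarkup is an assumption. It replaces markers two at a time and leaves an unpaired marker exactly as typed.

```tcl
# Sketch of the pairing rule; pairMarkup is a hypothetical name.
# Assumes the open and close tags do not themselves contain the marker.

proc pairMarkup {text marker open close} {
    set len [string length $marker]
    while 1 {
        set p1 [string first $marker $text]
        if {$p1 < 0} { break }
        # look for the partner after the end of the first marker
        set p2 [string first $marker $text [expr {$p1 + $len}]]
        if {$p2 < 0} { break }          ;# unpaired marker stays as typed
        # replace the later marker first, so index p1 stays valid
        set text [string replace $text $p2 [expr {$p2 + $len - 1}] $close]
        set text [string replace $text $p1 [expr {$p1 + $len - 1}] $open]
    }
    return $text
}

puts [pairMarkup "**bold** and ***" "**" "<B>" "</B>"]
```

The lone '***' at the end of the example line contains only one non-overlapping marker, so it is printed unchanged.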

Quirks & Todo

  • No ordered-lists: I rarely use these, so I have no plans to implement them here, and I wanted '#' as comment-char.
  • No links, no images, no forms. Well, this is for printing fairly short notes etc., not for browsing.
    • (low-priority todo)
  • Center: uses the obsolete tag '<center>'.
    Also, I wanted the markup as '^^center^^', but ^ is a very special char - This might get fixed.
  • Unnumbered-lists: only first level is supported for now - Todo.
  • No tables - Todo.
  • More ideas/todos:
    • detect and underline links and eMails.
    • 2 or 3 columns, to fit more short text-snippets on a single page - without organizing them into a table.
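
For the todo about links and eMails, one possible starting point is a pair of regsub calls. This is only a sketch under simple assumptions: underlineLinks is a hypothetical proc, the patterns are deliberately loose, and a URL directly followed by punctuation would swallow that punctuation.

```tcl
# Sketch for the "underline links and eMails" todo; underlineLinks is hypothetical.

proc underlineLinks {line} {
    # "&" in the regsub replacement stands for the matched text
    regsub -all {https?://\S+} $line {<u>&</u>} line
    regsub -all {[\w.+-]+@[\w-]+(\.[\w-]+)+} $line {<u>&</u>} line
    return $line
}

puts [underlineLinks "See https://wiki.tcl-lang.org/42409 or mail someone@example.org."]
```

In the converter this would run per line, alongside the other text-style replacements.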

See also: