EKB This is a follow-up to the discussion on Is Tcl Different!.
Here's the wonderful HTML parser in 10 lines as posted on that page:
    ############################################
    # Turn HTML into TCL commands
    #   html    A string containing an html document
    #   cmd     A command to run for each html tag found
    #   start   The name of the dummy html start/stop tags

    proc HMparse_html {html {cmd HMtest_parse} {start hmstart}} {
        regsub -all \{ $html {\&ob;} html
        regsub -all \} $html {\&cb;} html
        set w " \t\r\n" ;# white space
        proc HMcl x {return "\[$x\]"}
        set exp <(/?)([HMcl ^$w>]+)[HMcl $w]*([HMcl ^>]*)>
        set sub "\}\n$cmd {\\2} {\\1} {\\3} \{"
        regsub -all $exp $html $sub html
        eval "$cmd {$start} {} {} \{ $html \}"
        eval "$cmd {$start} / {} {}"
    }
But the page did not include HMtest_parse, the proc named as the default value for cmd, so I wrote one and applied it to a sample bit of HTML:
    proc HMtest_parse {tag state props body} {
        if {$state == ""} {
            set msg "Start $tag"
            if {$props != ""} {
                set msg "$msg with args: $props"
            }
            set msg "$msg\n$body"
        } else {
            set msg "End $tag"
        }
        puts $msg
    }

    HMparse_html {
    <html>
    <p class="bubba">
    This is my very first paragraph. How do you like it? I think it has a lot to recommend it.
    </p>
    <p class="louielouie">
    This is my second paragraph, which is OK, but not as nice as my first one.
    </p>
    </html>
    }
This gives the following output:
    Start hmstart
    Start html
    Start p with args: class="bubba"
    This is my very first paragraph. How do you like it? I think it has a lot to recommend it.
    End p
    Start p with args: class="louielouie"
    This is my second paragraph, which is OK, but not as nice as my first one.
    End p
    End html
    End hmstart
In fact, the code is not HTML-specific, and can handle simple XML (for example, XML that doesn't use the self-closing <tag/> format). It's like a mini-SAX.
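To see why the self-closing form is a problem, here is a small sketch that applies the tag pattern directly with regexp. The pattern is built from the same pieces as exp in HMparse_html; the variable names slash, tag and props (and their "2" variants) are only for illustration and are not part of the original code.

```tcl
# Build the same pattern that HMparse_html stores in exp.
# (slash, tag, props and the "2" variants are illustrative names.)
set w " \t\r\n"
set exp "<(/?)(\[^$w>\]+)\[$w\]*(\[^>\]*)>"

# An ordinary start tag: the name and the attributes are split apart.
regexp $exp {<p class="bubba">} -> slash tag props
puts "slash='$slash' tag='$tag' props='$props'"

# A self-closing tag: the "/" stays glued to the tag name, so cmd
# would be called with the tag "br/" rather than "br".
regexp $exp {<br/>} -> slash2 tag2 props2
puts "slash='$slash2' tag='$tag2' props='$props2'"
```

So a handler written for br would never see <br/>; it would have to be written as <br></br> for this parser.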
In spite of its incredible (to me) brevity, the code can actually be shortened somewhat. The proc HMcl is introduced in order to avoid trouble with [ ]'s, which Tcl would otherwise treat as command substitution in the unquoted value of exp. But HMcl can be avoided by enclosing the value of exp in { }'s. Braces also suppress the substitution of $w, but the variable w doesn't need to be defined anyway (at least in recent Tcl versions): the regular-expression escape \s can be used instead. Here's the new HMparse_html proc:
    proc HMparse_html {html {cmd HMtest_parse} {start hmstart}} {
        regsub -all \{ $html {\&ob;} html
        regsub -all \} $html {\&cb;} html
        set exp {<(/?)([^\s>]+)\s*([^>]*)>}
        set sub "\}\n$cmd {\\2} {\\1} {\\3} \{"
        regsub -all $exp $html $sub html
        eval "$cmd {$start} {} {} \{ $html \}"
        eval "$cmd {$start} / {} {}"
    }
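It may help to look at the script that the regsub builds before it is handed to eval. The sketch below uses a hypothetical helper, HMshow_script (not part of the original code), which repeats the steps of the shortened proc but returns the first script instead of evaluating it.

```tcl
# Hypothetical helper: same steps as HMparse_html, but it returns the
# script that would be passed to the first eval instead of running it.
proc HMshow_script {html {cmd HMtest_parse} {start hmstart}} {
    regsub -all \{ $html {\&ob;} html
    regsub -all \} $html {\&cb;} html
    set exp {<(/?)([^\s>]+)\s*([^>]*)>}
    set sub "\}\n$cmd {\\2} {\\1} {\\3} \{"
    regsub -all $exp $html $sub html
    return "$cmd {$start} {} {} \{ $html \}"
}

set script [HMshow_script {<p class="x">Hi</p>}]
puts $script
# Prints three commands, one per line:
#   HMtest_parse {hmstart} {} {} { }
#   HMtest_parse {p} {} {class="x"} {Hi}
#   HMtest_parse {p} {/} {} { }
```

Each tag closes the braced body of the previous command and opens a new command whose last argument is the text up to the next tag. That is why literal braces in the HTML have to be replaced with &ob; and &cb; first.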
OK, one more thing... If the cmd is an ensemble, then the different tags can be sub-procs within the ensemble. For example, just like string length is a command, where string is the ensemble, and length is the sub-proc, it should be possible to set up cmd so that cmd p would invoke the proc for parsing p tags, cmd html would invoke the command for parsing html tags, etc.
It's pretty easy to create ensembles in snit, so here's a snit version:
    package require snit

    ############################################
    # Turn HTML into TCL commands
    #   html    A string containing an html document
    #   cmd     A command to run for each html tag found
    #   start   The name of the dummy html start/stop tags

    proc HMparse_html {html {cmd HMtest_parse} {start hmstart}} {
        regsub -all \{ $html {\&ob;} html
        regsub -all \} $html {\&cb;} html
        set exp {<(/?)([^\s>]+)\s*([^>]*)>}
        set sub "\}\n$cmd {\\2} {\\1} {\\3} \{"
        regsub -all $exp $html $sub html
        eval "$cmd {$start} {} {} \{ $html \}"
        eval "$cmd {$start} / {} {}"
    }

    snit::type parser {
        proc isend {state} {
            if {$state == ""} {
                return false
            } else {
                return true
            }
        }
        method hmstart {args} {}
        method html {state args} {
            if [isend $state] {
                puts "That's all, folks!"
            } else {
                puts "Let's get going!"
            }
        }
        method p {state props body} {
            if {![isend $state]} {puts $body}
        }
    }

    parser HMtest_parse

    HMparse_html {
    <html>
    <p class="bubba">
    This is my very first paragraph. How do you like it? I think it has a lot to recommend it.
    </p>
    <p class="louielouie">
    This is my second paragraph, which is OK, but not as nice as my first one.
    </p>
    </html>
    }
This is the output:
    Let's get going!
    This is my very first paragraph. How do you like it? I think it has a lot to recommend it.
    This is my second paragraph, which is OK, but not as nice as my first one.
    That's all, folks!
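The same per-tag dispatch can probably be had without snit, assuming Tcl 8.5 or later, by using the built-in namespace ensemble command. What follows is only a sketch of that alternative, not something from the original discussion: the namespace name tagcmds and its log variable are made up for illustration, and the handlers collect their messages in a list rather than printing them directly.

```tcl
# The shortened parser, repeated so this sketch runs on its own.
proc HMparse_html {html {cmd HMtest_parse} {start hmstart}} {
    regsub -all \{ $html {\&ob;} html
    regsub -all \} $html {\&cb;} html
    set exp {<(/?)([^\s>]+)\s*([^>]*)>}
    set sub "\}\n$cmd {\\2} {\\1} {\\3} \{"
    regsub -all $exp $html $sub html
    eval "$cmd {$start} {} {} \{ $html \}"
    eval "$cmd {$start} / {} {}"
}

# Hypothetical namespace: each exported proc handles one tag.
namespace eval tagcmds {
    variable log {}
    namespace export hmstart html p
    # Expose the namespace as the ensemble command ::HMtest_parse,
    # so "HMtest_parse p ..." calls tagcmds::p, and so on.
    namespace ensemble create -command ::HMtest_parse

    proc hmstart {args} {}
    proc html {state args} {
        variable log
        if {$state eq ""} {
            lappend log "Let's get going!"
        } else {
            lappend log "That's all, folks!"
        }
    }
    proc p {state props body} {
        variable log
        # Only start tags carry the paragraph text in body.
        if {$state eq ""} {lappend log [string trim $body]}
    }
}

HMparse_html {<html><p class="a">One</p><p>Two</p></html>}
puts [join $tagcmds::log \n]
```

As with the snit version, any tag without a matching subcommand raises an error, so a real handler set would need a proc for every tag that can appear in the input.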