'''Stephen Uhler's HTML parser in 10 lines''', originally by [Stephen Uhler], previously known as '''HTML parser in 8 lines of Tcl''', and currently known as '''HTML parser in 4 lines of Tcl''', is a small toy [HTML] parser. It's not correct, in that it can get messed up by angle brackets in attribute values. The current version does map ''{'' and ''}'' to ''&ob;'' and ''&cb;'' before parsing, so unbalanced braces in the HTML content no longer break it. It's an interesting code snippet nonetheless.

** Attributes **

location (defunct): http://freegis.org/cgi-bin/viewcvs.cgi/grass51/lib/form/html_library.tcl?rev=1.1&content-type=text/vnd.viewcvs-markup

[https://groups.google.com/d/msg/comp.lang.tcl/3TtLuDPal9k/OUJEaHieGB0J%|%Didn't anyone go to last week's Tcl/Tk Workshop?], [comp.lang.tcl], 1995-07-10:

[https://groups.google.com/d/msg/comp.lang.tcl/_PFtAe-o2so/EcWfxP_3QO4J%|%Tcl/Tck and HTML], [comp.lang.tcl], 1995-07-20:

** Description **

[EKB] et al: [Stephen Uhler]'s [HTML] parser in 8 lines is now actually in 4 lines. Here is the current version:

======
############################################
# Turn HTML into TCL commands
#   html    A string containing an html document
#   cmd     A command to run for each html tag found
#   start   The name of the dummy html start/stop tags

proc HMparse_html {html {cmd HMtest_parse} {start hmstart}} {
    # \1 is "/" for a closing tag, \2 is the tag name, \3 is the attribute text
    set exp {<(/?)([^ \t\r\n>]+)[ \t\r\n]*([^>]*)>}
    # Each tag becomes: a close brace, a newline, a call to $cmd with the tag
    # name, slash and attributes, then an open brace, so the text after a tag
    # becomes the braced body argument of that call
    set sub "\}\n[list $cmd] {\\2} {\\1} {\\3} \{"
    # Braces in the document are mapped to &ob; and &cb; so they cannot unbalance the generated script
    regsub -all $exp [string map {\{ \&ob; \} \&cb;} $html] $sub html
    eval "$cmd {$start} {} {} \{ $html \}; $cmd {$start} / {} {}"
}
======

But the proc named by the default value of ''cmd'', ''HMtest_parse'', wasn't included, so I wrote one and applied it to a sample bit of HTML:

======
proc HMtest_parse {tag state props body} {
    if {$state eq {}} {
        set msg "Start $tag"
        if {$props ne {}} {
            set msg "$msg with args: $props"
        }
        set msg "$msg\n$body"
    } else {
        set msg "End $tag"
    }
    puts $msg
}

HMparse_html {<html><p class="bubba">This is my very first paragraph. How do you like it? I think it has a lot to recommend it.</p>
<p class="louielouie">This is my second paragraph, which is OK, but not as nice as my first one.</p></html>
}
======

'''Output''':

======none
Start hmstart
Start html
Start p with args: class="bubba"
This is my very first paragraph. How do you like it? I think it has a lot to recommend it.
End p
Start p with args: class="louielouie"
This is my second paragraph, which is OK, but not as nice as my first one.
End p
End html
End hmstart
======

In fact, the code is not HTML-specific, and can handle simple [XML] code (e.g., code that doesn't use self-closing tags).

This is my very first paragraph. How do you like it? I think it has a lot to recommend it.
This is my second paragraph, which is OK, but not as nice as my first one.
}

'''Output''':

======none
Let's get going!
This is my very first paragraph. How do you like it? I think it has a lot to recommend it.
This is my second paragraph, which is OK, but not as nice as my first one.
That's all, folks!
======

----

The problem with using snit (or [incr tcl]) is that you have to declare handlers for all tags, or you will end up with a runtime error (for example, "method body not found"). I myself use the following mechanism with some success:

======
proc HMtest_parse {tag state props body} {
    # Dispatch to handle_<tag> only if such a proc has been defined
    if {[info procs handle_$tag] ne {}} {
        handle_$tag $state $props $body
    }
}

proc handle_a {state props body} {
    # ... handle <a> tags here ...
}

proc handle_img {state props body} {
    # ... handle <img> tags here ...
}
======

This way, you only have to declare handlers for the tags that you care about.

Hai Vu

----

[WHD]: Actually, Snit allows you to define a method that receives all calls to unknown methods:

======
delegate method * using {%s UnknownMethod %m}

method UnknownMethod {methodName args} {
    # ...
}
======
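Putting the pieces together, here is a minimal, self-contained sketch that drives the 4-line ''HMparse_html'' with Hai Vu's ''handle_$tag'' dispatch idea. It uses plain procs rather than snit, so no catch-all method is needed. The names ''dispatch_tag'' and ''handle_p'' and the global ''paragraphs'' list are hypothetical, made up for this illustration. The sketch assumes lowercase tag names, since an uppercase P tag would look for ''handle_P''.

```tcl
# The 4-line parser from this page: every tag becomes a call
#   $cmd tag slash attributes body
proc HMparse_html {html {cmd HMtest_parse} {start hmstart}} {
    set exp {<(/?)([^ \t\r\n>]+)[ \t\r\n]*([^>]*)>}
    set sub "\}\n[list $cmd] {\\2} {\\1} {\\3} \{"
    regsub -all $exp [string map {\{ \&ob; \} \&cb;} $html] $sub html
    eval "$cmd {$start} {} {} \{ $html \}; $cmd {$start} / {} {}"
}

# Hypothetical dispatcher (after Hai Vu): forward a tag to handle_<tag>
# only if such a proc exists; all other tags are silently ignored.
proc dispatch_tag {tag state props body} {
    if {[info procs handle_$tag] ne {}} {
        handle_$tag $state $props $body
    }
}

# Hypothetical handler: on each opening p tag, record the text that
# follows it up to the next tag. Closing tags (state "/") are ignored.
set paragraphs {}
proc handle_p {state props body} {
    if {$state eq {}} {
        lappend ::paragraphs [string trim $body]
    }
}

# html, b and the dummy hmstart tag have no handler, so they are skipped.
HMparse_html {<html><b>Intro</b><p class="a">First.</p><p>Second.</p></html>} dispatch_tag
puts $paragraphs   ;# prints: First. Second.
```

Because handlers are looked up by name at parse time, supporting another tag is just a matter of defining another ''handle_...'' proc. Note that braces in the text reach the handlers as ''&ob;'' and ''&cb;'', since ''HMparse_html'' does not map them back.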