Updated 2015-11-26 00:52:11 by aspect

HTML, or HyperText Markup Language, is a markup language used on the World-Wide Web.

Parsing Tools

Tcllib html
a module for generating html
tools to parse html
an extension that parses and renders HTML, compiled for use without Tk
a wrapper to Tidy
the successor to tkHTML
tDOM's XPath-oriented parser
can be used to manipulate HTML
includes xmlgen for generating HTML or XML
An interface to the Gumbo HTML5 parsing library

Generation Tools

html form generator, by CMcC
Generate HTML forms from Tcl lists.
structure and lay out a static collection of HTML pages, arranging a wide variety of materials
includes a utility for structured HTML tag generation
Wiki format to HTML

See Also

HTML widgets
discusses widgets that render HTML into a visual representation.
Web scraping
august html editor

Description

To extract data from HTML, it is generally more robust to parse the page into a document model, perhaps using tDOM, and then use XPath to find the data, than to hack at the markup with regular expressions.

If the task is to pull some data out of an HTML page, I'm a strong believer in the 'parse the HTML page into a tree and query that tree' approach. For real-life problems, I claim that this approach is much simpler and easier to maintain than any regexp approach. And you will certainly have to maintain such a thing, because the layout of HTML pages tends to change frequently. Sure, you have to learn another query language, XPath in this case. But if you are really in the web business, chances are you will have to learn XPath anyway.
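
As a rough illustration of the parse-then-query approach, here is a minimal sketch using tDOM. The sample page, the table id "prices", and the class names "item" and "price" are made up for the example; a real page would need its own XPath expressions.

```tcl
package require tdom

# Hypothetical sample page; in practice this would be fetched or read from a file.
set html [string trim {
<html><body>
  <table id="prices">
    <tr><td class="item">apple</td><td class="price">1.20</td></tr>
    <tr><td class="item">pear</td><td class="price">0.95</td></tr>
  </table>
</body></html>
}]

# Use tDOM's forgiving HTML parser instead of the strict XML parser.
set doc  [dom parse -html $html]
set root [$doc documentElement]

# Walk the rows of the table and collect item/price pairs into a dict.
# string(...) makes selectNodes return the text value rather than a node list.
set prices [dict create]
foreach row [$root selectNodes {//table[@id="prices"]//tr}] {
    set item  [$row selectNodes {string(td[@class="item"])}]
    set price [$row selectNodes {string(td[@class="price"])}]
    dict set prices $item $price
}

# Free the DOM tree once the data has been extracted.
$doc delete

puts $prices
```

If the page layout changes, usually only the XPath expressions need adjusting, which is the maintenance argument made above.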