HTML, or HyperText Markup Language, is a markup language used on the World Wide Web.
Parsing Tools
- Tcllib html
- a module for generating HTML
- tools to parse HTML
- an extension that parses and renders HTML, compiled for use without Tk
- a wrapper to Tidy
- the successor to tkHTML
- tDOM's XPath-oriented parser
- can be used to manipulate HTML
- includes xmlgen for generating HTML or XML
- An interface to the Gumbo HTML5 parsing library
Generation Tools
- HTML form generator, by CMcC
- Generate HTML forms from Tcl lists.
- structure and lay out a static collection of HTML pages, arranging a wide variety of materials
- includes a utility for structured HTML tag generation
- Wiki format to HTML
See Also
- HTML widgets
- discusses widgets that render HTML into a visual representation.
- Web scraping
- august html editor
For extracting data from HTML, it is generally more robust to parse the page into a document model, perhaps using tDOM, and then use XPath to find the data, than to hack at it with regular expressions.
If the task is to 'pull out' some data from an HTML page, I'm indeed a strong believer in the 'parse the HTML page into a tree and query that tree' approach. For real-life problems, I claim that this approach is much simpler and easier to maintain than any regexp approach. And for sure, you have to maintain such a thingy, because the layout of HTML pages tends to change frequently. Sure, you have to learn another query language, XPath in this case. But if you are really in the web business, chances are you have to learn XPath anyway.
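As a rough illustration of the parse-then-query approach, here is a minimal sketch using tDOM's HTML parser and XPath. The sample page, the `prices` table id and the `item`/`price` class names are made up for this example; a real page would need its own XPath expressions.

```tcl
package require tdom

# Hypothetical sample page; in practice the HTML would come from a
# file or an HTTP response.
set html {
<html><body>
  <table id="prices">
    <tr><td class="item">Widget</td><td class="price">4.50</td></tr>
    <tr><td class="item">Gadget</td><td class="price">12.00</td></tr>
  </table>
  <p>See the <a href="/terms">terms</a>.</p>
</body></html>
}

# Parse the page into a DOM tree with tDOM's HTML parser.
set doc  [dom parse -html $html]
set root [$doc documentElement]

# Query the tree with XPath: one iteration per table row.
# string(...) makes selectNodes return the text value directly.
set prices {}
foreach row [$root selectNodes {//table[@id='prices']//tr}] {
    set item  [$row selectNodes {string(td[@class='item'])}]
    set price [$row selectNodes {string(td[@class='price'])}]
    dict set prices $item $price
}

puts $prices

# Free the DOM tree when done.
$doc delete
```

If the page layout changes, say the table gains extra columns or is wrapped in additional markup, only the XPath expressions need adjusting, which is usually less fragile than reworking a regular expression.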