Updated 2010-06-17 06:18:34 by aricb
 What: tDOM
 Where: http://www.tdom.org/ 
        http://groups.yahoo.com/group/tdom 
 Description: C-based XML extension for Tcl.  Based on the Expat parser;
        uses Tcl namespaces and lets you access DOM trees as Tcl
        DOM objects.  Includes an HTML reader that reads HTML
        and generates a DOM tree.
        Currently at v0.8.2.
 Updated: 08/2007
 Contact: See web site

This tightly-coded extension emphasizes speed and memory economy. In contrast to the "Pure Tcl" TclDOM, it bests leading Java DOM implementations by an order of magnitude (!) in both processing speed and memory demand. Jochen also exploits tDOM's expressivity to offer a nice HTML reader, XML validator--called tnc--and XSLT engine. tDOM is in production at several commercial installations.

The official home of Jochen Loewer's and Rolf Ade's tDOM XML engine is http://www.tdom.org/

escargo 28 Feb 2008 - The "official home" does not provide any way to submit bug reports. I did report a problem to the Yahoo group, since that seemed to be the only resort available.

Documentation

  • [1] overview
  • [2] [dom] - Create an in-memory DOM tree from XML
  • [3] [domDoc] - Manipulates an instance of a DOM document object
  • [4] [domNode] - Manipulates an instance of a DOM node object
  • [5] [expat] - Creates an instance of an expat parser object
  • [6] [expatapi] - Functions to create, install and remove expat parser object extensions.
  • [7] [tdom] - tdom is an expat parser object extension to create an in-memory DOM tree from the input while parsing.
  • [8] [tnc] - tnc is an expat parser object extension, that validates the XML stream against the document DTD while parsing.

Tutorials and articles

  • A tDOM tutorial is available here on the Wiki.
  • Take a look at the examples on this page too.
  • Jochen presented a valuable description of tDOM's origins and uses [9] to the first European Tcl Conference.
  • In the summer of 2003, Carsten Zerbst showed Linux Magazine readers how practical and easy tDOM is [10]. The same article is available in German: XML-Dokumente mit Tcl und tDOM bearbeiten [11]

Example: An XPath query

A quick way to get values from an XML document:
    set xml {
    <agents>
        <agent id="007">
            <name type="first">James</name>
            <name type="last">Bond</name>
            <age>Still attractive</age>
            <sex>Male</sex>
        </agent>
        <agent id="013">
            <name type="first">Austin</name>
            <name type="last">Powers</name>
            <age>Depends on your timeline</age>
            <sex>Yes, please</sex>
        </agent>
    </agents>
    }

    set dom [dom parse $xml]
    set doc [$dom documentElement]
    puts "Agent: [$doc selectNodes {string(/agents/agent[@id='013']/@id)}]"
    puts "First Name: [$doc selectNodes {string(/agents/agent[@id='013']/name[@type='first'])}]"
    puts "Last Name: [$doc selectNodes {string(/agents/agent[@id='013']/name[@type='last'])}]"
    puts "Age: [$doc selectNodes {string(/agents/agent[@id='013']/age)}]"

Will output:
    Agent: 013
    First Name: Austin
    Last Name: Powers
    Age: Depends on your timeline

-- 04Nov2004 PS
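
The same kind of query can also be run once per node instead of once per value. The following is a minimal sketch, assuming a shortened copy of the agents document above; selectNodes first returns the agent nodes, and relative XPath expressions are then evaluated against each of them. The variable names are only illustrative.

```tcl
package require tdom

set xml {<agents>
    <agent id="007"><name type="first">James</name><name type="last">Bond</name></agent>
    <agent id="013"><name type="first">Austin</name><name type="last">Powers</name></agent>
</agents>}

set doc  [dom parse $xml]
set root [$doc documentElement]

# Collect one {id first last} triple per <agent> element.
set agents {}
foreach agent [$root selectNodes {/agents/agent}] {
    set id    [$agent getAttribute id]
    # Relative XPath expressions are evaluated with $agent as context node.
    set first [$agent selectNodes {string(name[@type='first'])}]
    set last  [$agent selectNodes {string(name[@type='last'])}]
    lappend agents [list $id $first $last]
}

# Free the DOM tree once the values have been copied out.
$doc delete

puts $agents
```

This prints {007 James Bond} {013 Austin Powers}.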

Example: An XPath query with a namespace

Trying to retrieve elements from a namespace was confusing me until I found a post demonstrating the use of -namespaces with selectNodes [12].
 % set fh [open small.gpx]
 file13bbfa8
 % set xmldata [read $fh]
 <?xml version="1.0"?>
 <gpx
 version="1.0"
 creator="ExpertGPS 1.1 - http://www.topografix.com"
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xmlns="http://www.topografix.com/GPX/1/0"
 xsi:schemaLocation="http://www.topografix.com/GPX/1/0 http://www.topografix.com/GPX/1/0/gpx.xsd">
 <time>2002-02-27T17:18:33Z</time>
 <bounds minlat="42.401051" minlon="-71.126602" maxlat="42.468655" maxlon="-71.102973"/>
 <wpt lat="42.438878" lon="-71.119277">
 <ele>44.586548</ele>
 <time>2001-11-28T21:05:28Z</time>
 <name>5066</name>
 <desc><![CDATA[5066]]></desc>
 <sym>Crossing</sym>
 <type><![CDATA[Crossing]]></type>
 </wpt>
 <wpt lat="42.439227" lon="-71.119689">
 <ele>57.607200</ele>
 <time>2001-06-02T03:26:55Z</time>
 <name>5067</name>
 <desc><![CDATA[5067]]></desc>
 <sym>Dot</sym>
 <type><![CDATA[Intersection]]></type>
 </wpt>
 </gpx>

 % close $fh

This is edited down from http://www.topografix.com/fells_loop.gpx for brevity's sake.

Get the doc and the root.
 % set doc [dom parse $xmldata]
 domDoc013CAE80
 % set root [$doc documentElement]
 domNode013B1FDC

Try to get a list of waypoints (specification at [13])

Define a namespace and use it with selectNodes
 % set ns {gpxns http://www.topografix.com/GPX/1/0}
 gpxns http://www.topografix.com/GPX/1/0

Sample XPath queries with and without the namespace
 % $root selectNodes -namespaces $ns //wpt
 % $root selectNodes -namespaces $ns //gpxns:wpt
 domNode013B2060 domNode013B2194
 % $root selectNodes -namespaces $ns {//gpxns:wpt[1]}
 domNode013B2060

--skm 2006/12/10
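
As a self-contained variant of the session above, here is a small sketch that pulls the waypoint names out of a cut-down GPX string. The prefix g is an arbitrary choice made for this example; only the URI has to match the document's default namespace.

```tcl
package require tdom

set gpx {<gpx version="1.0" xmlns="http://www.topografix.com/GPX/1/0">
 <wpt lat="42.438878" lon="-71.119277"><name>5066</name></wpt>
 <wpt lat="42.439227" lon="-71.119689"><name>5067</name></wpt>
</gpx>}

set doc  [dom parse $gpx]
set root [$doc documentElement]

# Prefix/URI list for XPath; the prefix "g" exists only in the queries.
set ns {g http://www.topografix.com/GPX/1/0}

# An unprefixed name in XPath means "no namespace", so this finds nothing.
set plainCount [llength [$root selectNodes {//wpt}]]

# With the prefix mapped to the GPX namespace the waypoints are found.
set names {}
foreach wpt [$root selectNodes -namespaces $ns {//g:wpt}] {
    lappend names [$wpt selectNodes -namespaces $ns {string(g:name)}]
}

$doc delete
puts "without prefix: $plainCount nodes, with prefix: $names"
```

Note that the child step (g:name) needs the prefix as well, because the name elements inherit the default namespace from gpx.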

Here's an example of using tdom to parse some html:
 package require tdom
 package require http

 # html - the source of the html page
 proc pullOutTheURLs {html} {

    # Parse your HTML document into a DOM tree structure
    set doc [dom parse -html $html]

    # root will be the root element of your HTML document,
    # ie. the HTML element
    set root [$doc documentElement]

    # The following finds all anchor links <a>. It isn't clear to me
    # whether you are also interested in the URLs of <area>, <link>
    # and <base> elements.
    set nodeList [$root selectNodes {descendant::a}]

    # init the result list
    set urlList {}

    # Pull out the Values of the href attributes
    foreach node $nodeList {
        set attList [$node attributes *]
        foreach attribute $attList {
            if {[string tolower $attribute] == "href"} {
                lappend urlList [$node getAttribute $attribute]
                break
            }
        }
    }

    # Get rid of the DOM representation of your HTML document
    $doc delete

    # finished
    return $urlList
 }

 # Test it
 set urlList [pullOutTheURLs [http::data [http::geturl [lindex $argv 0]]]]
 foreach url $urlList {
    puts $url
 }
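
A shorter variant lets XPath do the filtering. This is only a sketch: it assumes the attribute is spelled href in lower case in the parsed tree, whereas the loop above also copes with other spellings. The proc name extractHrefs and the sample page are made up for the example, and no network access is needed.

```tcl
package require tdom

# Hypothetical helper: return the href values of all <a> elements.
proc extractHrefs {html} {
    set doc  [dom parse -html $html]
    set root [$doc documentElement]
    set urls {}
    # The predicate [@href] keeps only anchors that carry the attribute.
    foreach a [$root selectNodes {//a[@href]}] {
        lappend urls [$a getAttribute href]
    }
    $doc delete
    return $urls
}

set page {<html><body>
<a href="http://www.tdom.org/">tDOM</a>
<a name="top">a named anchor without href</a>
<a href="/files">files</a>
</body></html>}

set urls [extractHrefs $page]
foreach url $urls {
    puts $url
}
```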

XSLT Example

I am an XSLT newbie. I didn't find an XSLT example for tDOM on the wiki, so I thought I would provide one. It is almost too easy to warrant one, but I think it helps anyway.

I took the sample XML and simple XSL from the "What kind of language is XSLT?" [14] essay. For pasting here, I will use a shorter version of the XML file.
 % package require tdom
 0.8.0
 % set gamedata {<results group="A">
 <match>
    <date>10-Jun-1998</date>
    <team score="2">Brazil</team>
    <team score="1">Scotland</team>
 </match>
 <match>
    <date>23-Jun-1998</date>
    <team score="0">Scotland</team>
    <team score="3">Morocco</team>
 </match>
 </results>}
 % set gamedoc [dom parse $gamedata]
 domDoc012E5690

xsldata was set to the first xsl example from the What kind of language is XSLT? essay. That is:
 % set xsldata {
         <xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
            <xsl:template match="results">
            <html>
                    <head>
                    <title>
                        Results of Group <xsl:value-of select="@group"/>
                    </title>
                    </head>
                    <body>
                            <h1>
                                Results of Group <xsl:value-of select="@group"/>
                            </h1>
                            <xsl:apply-templates/>
                    </body>
            </html>
            </xsl:template>
            <xsl:template match="match">
                    <h2>
                        <xsl:value-of select="team[1]"/> versus <xsl:value-of select="team[2]"/>
                    </h2>
                    <p>Played on <xsl:value-of select="date"/></p>
                    <p>Result:
                            <xsl:value-of select="team[1] "/>
                            <xsl:value-of select="team[1]/@score"/>,
                            <xsl:value-of select="team[2] "/>
                            <xsl:value-of select="team[2]/@score"/>
                    </p>
            </xsl:template>
    </xsl:transform>}

 % set soccerstyle [dom parse $xsldata]
 domDoc012DB570
 % $gamedoc xslt $soccerstyle gamehtml
 domDoc012E6CC0
 % $gamehtml asXML
 <html>
    <head>
        <title>
        Results of Group A</title>
    </head>
    <body>
        <h1>
        Results of Group A</h1>
        <h2>Brazil versus Scotland</h2>
        <p>Played on 10-Jun-1998</p>
        <p>Result:
            Brazil2,
            Scotland1</p>
        <h2>Scotland versus Morocco</h2>
        <p>Played on 23-Jun-1998</p>
        <p>Result:
            Scotland0,
            Morocco3</p>
    </body>
 </html>
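
The same steps also work in a script rather than an interactive session. The sketch below uses a much smaller stylesheet that was written for this example (it is not taken from the essay) and reads the result back through the DOM instead of serializing it.

```tcl
package require tdom

set src [dom parse {<results group="A">
  <match><team score="2">Brazil</team><team score="1">Scotland</team></match>
  <match><team score="0">Scotland</team><team score="3">Morocco</team></match>
</results>}]

# Illustrative stylesheet: one <line> element per match.
set style [dom parse {<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:template match="results">
    <summary group="{@group}">
      <xsl:for-each select="match">
        <line><xsl:value-of select="concat(team[1], ' ', team[1]/@score, '-', team[2]/@score, ' ', team[2])"/></line>
      </xsl:for-each>
    </summary>
  </xsl:template>
</xsl:transform>}]

# Without a result variable name, xslt returns the result document.
set result  [$src xslt $style]
set summary [$result documentElement]

set lines {}
foreach line [$summary selectNodes line] {
    lappend lines [$line text]
}

# All three documents are independent and are freed separately.
$result delete
$style delete
$src delete

puts [join $lines \n]
```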

Example: Loading XML from file with the right encoding settings

tDOM provides some helpers for this in the tdom.tcl library distributed with tDOM. Cited from a comp.lang.tcl answer by Rolf Ade [15]:

tDOM::xmlOpenFile expects a filename and returns a file channel handle that is already fconfigure'd and seek'ed, ready to be fed into a dom parse -channel ... Please note that the proc opens a channel and returns it. That channel will not magically go away when you're done with it. It's your responsibility to close the channel when you don't need it anymore. So, a typical usage pattern (certainly not the only one) is
    set xmlfd [tDOM::xmlOpenFile $filename]
    set doc [dom parse -channel $xmlfd]
    close $xmlfd

tDOM::xmlReadFile is just a wrapper around tDOM::xmlOpenFile. The pattern is
    set doc [dom parse [tDOM::xmlReadFile $filename]]

and you're done. No leaking file channels, filename in, DOM tree out.
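
If the parse fails in the first pattern, close is never reached and the channel leaks. One possible way around this is sketched here; the proc name parseXmlFile and the demo file name are invented for the example, and the demo file is written into the current directory.

```tcl
package require tdom

# Hypothetical helper: parse a file via tDOM::xmlOpenFile and close the
# channel whether or not the parse succeeds.
proc parseXmlFile {filename} {
    set fd [tDOM::xmlOpenFile $filename]
    set failed [catch {dom parse -channel $fd} result]
    close $fd
    if {$failed} {
        return -code error "cannot parse $filename: $result"
    }
    return $result
}

# Demo: write a small file, parse it, inspect the root element.
set path [file join [pwd] tdom-demo.xml]
set out [open $path w]
fconfigure $out -encoding utf-8
puts $out {<?xml version="1.0" encoding="UTF-8"?>}
puts $out {<agents><agent id="007"/></agents>}
close $out

set doc [parseXmlFile $path]
set rootName [[$doc documentElement] nodeName]
$doc delete
file delete $path

puts "root element: $rootName"
```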

Obtaining and compiling tDOM

The current version is 0.8.2. Grab it from http://www.tdom.org/files.

The tDOM CVS allows read-only anonymous access. To get the latest status of the project just do (press return, when prompted for login)
   cvs -d:pserver:anonymous@cvs.tdom.org:/usr/local/pubcvs login
   cvs -z3 -d:pserver:anonymous@cvs.tdom.org:/usr/local/pubcvs co tdom

MAKR (2009-02-12) started to mirror tDOM's CVS repository as GIT repository at
   git://github.com/makr/tdom.git
   http://github.com/makr/tdom

I will occasionally check the CVS for updates and push them into GIT accordingly.

snichols: I recently compiled tdom 0.8.0 on Windows XP successfully with threads enabled, but doing a package require tdom from within Tcl 8.4.7 fails with the error "too many nested evaluations (infinite loop?)". Any ideas? Thanks in advance.

RS: Oh yes, a silly bug. In the file tdom.tcl, comment out the line
 package require tdom

A file can't require a package while providing it :)

snichols Thank you very much. That fixed the issue.

AK Aug 2, 2006. When was "recently"? According to Rolf Ade this problem was fixed Sep 29, 2004. Both the package index generated by the TEA Makefile and the one found in the 'win' directory load the shared library (i.e. DLL) first, then source tdom.tcl. As the DLL runs the C equivalent of 'package provide tdom', the 'package require tdom' executed by 'tdom.tcl' is satisfied and will not loop.

I've built a binary package of tDOM 0.7.4 for Debian GNU/Linux 3.0 (Woody) [16] -- Jochem Huhmann

Various combinations of mirrors, backups, and experimental work are often announced through the mailing list at http://groups.yahoo.com/group/tdom/ . This is particularly important, as sdf.lonestar.org seems relatively erratic in reliability.

The authors of tdom announce their new releases on the tdom Yahoo group.

Phaseit provides a mirror [17] (This link is to a VERY old version of tdom).

An article [18] on IBM's developerWorks pages in Nov 2001 profiles a company (Ideogramic) that uses tDOM for XMI processing in its product.

PS 04Nov2004: Dead link. Is it still available somewhere?

Another article [19] in February 2002 highlights tDOM's performance, while also supplying a recipe for your own start with the package. More recently,

In the midst of one chat conversation, Rolf Ade explained that, "The most valuable thing in tDOM is his [Jochen's] C-coded xpath engine."

PT - I'll second that. tDOM's XPath processing is extremely competent.

A tDOM tutorial could include these points (proposed by Rolf Ade):

  • how to use the SAX interface (setting up event handler scripts, stacking event handler scripts, error handling etc)
  • how to parse (parse a string, or read the XML data out of a channel, with notes about the encoding problems)
  • how to get a DOM tree representation of XML data
  • the XPointer, DOM 1 and DOM 2 and XPath commands to find or to navigate to some nodes of interest
  • how to use tDOM's XSLT engine (to build an XSLT processor or in server application)
  • how to serialize DOM trees (as XML, HTML or text)
  • how to validate the XML data while parsing
  • how to create DOM trees from scratch
  • how to create additional, tcl scripted DOM methods
  • how to create additional, tcl scripted XPath functions
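
As a small taste of the first point, here is a sketch of the SAX-style interface: the expat command creates a parser object, and a handler script is called for every start tag. The proc name onStart and the counts array are made up for this example; the trailing args parameter is there because the handler is documented to receive at least the element name and the attribute list.

```tcl
package require tdom

# Count how often each element name occurs, without building a DOM tree.
array set counts {}

proc onStart {name attlist args} {
    global counts
    if {![info exists counts($name)]} {
        set counts($name) 0
    }
    incr counts($name)
}

set parser [expat -elementstartcommand onStart]
$parser parse {<agents><agent id="007"/><agent id="013"/></agents>}
$parser free

foreach name [lsort [array names counts]] {
    puts "$name: $counts($name)"
}
```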

June 24th: tDOM-0.7.1 released. Enhancements to XSLT/XPath support -- now full compliant (thanks Rolf), Tcl-Thread support (thanks Zoran), a lot of bug fixes (Rolf), some HTML parser enhancements, even simple XML parser is namespace-aware.

Uhm, this "full compliant" XSLT support is saying a little bit too much. Jochen would have done better if he had used my "almost compliant".

Don't get this wrong. It is true that tDOM's XSLT support was greatly improved over the last releases. I'm pretty sure you could use every 'real life' XSLT stylesheet with it, with correct results. I won't confuse you with outlying nifty XSLT details, therefore I omit the list of things that are not quite right.

There are (prominent) tools out there that, according to my extensive testing, don't do it better than tDOM's XSLT engine, but nevertheless have claimed 100% XSLT compliance for a couple of months, which is simply not completely true, and nobody bothers.

So just use tDOM's XSLT and you will be happy with it. It's only that I know my business, and "full compliant" is not 100% true. Even the missing outlying details will be added in the next months, for sure. I wonder what Jochen will then write in the announcement ;-).

The XPath support is now indeed really very compliant and complete. de.

LV finally, with some editing, was able to get this package to compile.

MSW now understands why tDOM crashed for him ... obviously you shouldn't be generating variables more than once (like two consecutive [<domDoc> documentElement root] in different functions make tDOM trip). He's still intrigued by appended nodes not being addressable (no localName, no URI, not reachable via [selectNodes])..


2005/03/11 skm Many of the links here in Further Readings [20] are stale. Does anyone have up-to-date locations for these papers? thanks.

LV Note that even though the tdom.org web site shows no release since 2004, email on the tdom mail list encourages people to use the CVS to fetch the latest version of code, which developers assure continues to be updated. Note that 64 bit users need to get the cvs version of tdom to get it working properly, according to the developer.

Also note that http://www.tdom.org/files/ appears to host the current tdom releases.

LV 2007 Sep 11

Just a note - if you build tdom from source, do a "make test" and see a crash, try rebuilding tdom with the --disable-tdomalloc flag; on my SPARC Solaris 9 system, this resolved the crash.

Getting the Current Namespace Mapping for a Node

DKF: It came up in comp.lang.tcl recently that Gerald Lester needed access to the current namespace mapping for an arbitrary DOM node (parsing [XML Schema] and WSDL requires this sort of thing). Rolf Ade told us how:
 Given, that the variable node2 hold the node command, for which you
 want to know all prefix-URI mappings in scope, do

 $node2 selectNodes namespace::*

This returns a list of two-element lists (i.e. pairs). In each pair, the first element is either the literal "xmlns" (meaning the pair describes the default namespace, which applies to unqualified elements) or "xmlns:" followed by a namespace prefix (meaning the pair describes a named namespace, whose elements and attributes use qualified names). The second element of the pair is the URI that identifies the namespace (which need not resolve to anything).

Note (for XML neophytes) that unqualified attributes are always in no namespace at all (unlike unqualified elements).
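
A sketch of how the result might be turned into a prefix-to-URI array, assuming the pair format described above. The example.com URIs and the array name nsmap are invented for the example, and the default namespace is stored under the empty prefix.

```tcl
package require tdom

set doc [dom parse {<root xmlns="http://example.com/default"
                          xmlns:g="http://example.com/g"><g:item/></root>}]
set item [[$doc documentElement] firstChild]

# Build a prefix -> URI map from the namespace axis of $item.
array set nsmap {}
foreach pair [$item selectNodes namespace::*] {
    foreach {name uri} $pair break
    if {$name eq "xmlns"} {
        set prefix ""                            ;# default namespace
    } else {
        set prefix [string range $name 6 end]    ;# strip leading "xmlns:"
    }
    set nsmap($prefix) $uri
}
$doc delete

parray nsmap
```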

LV 2007 Oct 09 Anyone have an example of using tDOM to validate XML using a DTD?

See the page tnc for that.

LV from comp.lang.tcl, we read:
>  2. I evidently don't understand the domNode man page. For the 
>     "getAttribute" method it says: 

>        getAttribute attributeName ?defaultValue? 
>           Returns the value of the attribute attributeName. If 
>           attribute is not available defaultValue is returned. 


>      It also doesn't give any example(s). 


> Can someone point me to some sample code? 



If I have a DOM document $doc which has an element somewhere:
 <Foo Bar="baz">Stuff</Foo>

then
 set node [lindex [$doc getElementsByTagName "Foo"] 0]
 puts [$node getAttribute "Bar"]

will print:
 baz
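
For the ?defaultValue? part of the man page entry quoted above, a minimal sketch follows. The Config wrapper element and the Color attribute are made up for the example.

```tcl
package require tdom

set doc  [dom parse {<Config><Foo Bar="baz">Stuff</Foo></Config>}]
set node [lindex [$doc getElementsByTagName Foo] 0]

# The attribute exists: its value is returned.
set bar [$node getAttribute Bar]

# The attribute is missing: the default value is returned instead.
set color [$node getAttribute Color "none"]

# hasAttribute tests for presence without fetching the value.
set hasColor [$node hasAttribute Color]

# Missing attribute and no default: getAttribute raises an error.
set failed [catch {$node getAttribute Color} msg]

$doc delete
puts "Bar=$bar Color=$color hasColor=$hasColor failed=$failed"
```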


male 2010-02-24 - Sorting nodes in tdom


CMcC - 2010-04-13 21:05:27

I was having some confusion with [$node attributes] and xml namespaces. [evilotto] helped me decode what it returns, and I record the findings for posterity.

attributes may return a singleton. In that case, the attribute name is just that.

attributes may return a three-element list. In that case it may be approximated as [lassign $a name namespace uri] ... however: the uri may be empty and the name and namespace equal. In that case, the attribute appears to be a definition of the uri for the namespace given by $name, although the uri thus defined is not returned in the uri field, the uri-defining attribute is named as if it were $ns:$ns. Finally, the {xmlns {} {}} form appears to be special, and to indicate that the xmlns namespace's uri is being defined.

There. Clear as mud. No wonder XML is so popular (?)

aricb 2010-06-17

An XML declaration is a line along the lines of <?xml version="1.0" encoding="UTF-8" standalone="no" ?> at the beginning of an XML document; it looks like a processing instruction, although the XML specification treats it as a separate construct. As nearly as I can determine, the recommended way to put this line in a tDOM-generated XML file is something along the lines of [puts $xmlfile "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\" ?>"]; in other words, tDOM doesn't provide any facilities for outputting XML declarations (see [21] and [22]).

However, when parsing an XML file with tDOM, you can capture the value of the XML declaration's encoding attribute using [dom parse [tDOM::xmlReadFile $filename encStr]]; the value will be stored in $encStr ([23]).
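
Put together, a round trip might look like the following sketch. The file name, the menu element and its attribute are invented for the example. The channel encoding is set to match the encoding named in the declaration, since the two have to agree.

```tcl
package require tdom

# Build a tiny document containing a non-ASCII character.
set doc [dom createDocument menu]
[$doc documentElement] setAttribute name "Caf\u00e9"

# Write the XML declaration by hand, then the serialized tree.
set path [file join [pwd] tdom-decl-demo.xml]
set out [open $path w]
fconfigure $out -encoding utf-8
puts $out {<?xml version="1.0" encoding="UTF-8" standalone="no" ?>}
puts $out [$doc asXML]
close $out
$doc delete

# Read the file back; encStr receives the declared encoding name.
set doc2 [dom parse [tDOM::xmlReadFile $path encStr]]
set name [[$doc2 documentElement] getAttribute name]
$doc2 delete
file delete $path

puts "declared encoding: $encStr, name attribute: $name"
```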
