XML Tree Walking

Keith Vetter 2005-03-21 : Recently I had to optimize some code that was using pure-tcl to extract out each element of an XML file in the order it appears in the file. My first step in optimization was simply to use tdom to build a dom tree. Then, the following code does a restartable pre-order traversal of that tree

The first call to :::WalkXML::NextNode with XML data initializes the dom tree. Each subsequent call to ::WalkXML::NextNode returns the next dom node.

Just this optimization alone, without redoing the code to utilize the dom tree, greatly speeded up the processing.

NEM 21Mar05: If you want to walk the elements in the order they appear in the file, might a SAX interface be just what you are after?

KPV Close, but the two models differ in who drives whom. The SAX model makes repeated calls into user code via callbacks while in this model works user code makes repeated calls into the this model. For parsing XML, I find the second way better fitting my mental model of how it should work: e.g. I'm looking at a <time> tag so I just need to call NextNode to get its value. With the SAX model I guess you'd accomplish the same by setting some global state variable.

NEM 22Mar05 Ah, OK. Yes, with SAX you generally have to layer some sort of state machine over the callbacks. E.g. in my Simple XML report writer I used a global stack. Come to think of it, it'd be nice if the SAX callbacks could return some state, which would be passed to the next callback. Then you wouldn't have to manage the lifecycle of global/namespace'd state variables. But I digress.


 ##+##########################################################################
 #
 # WalkXML -- walks an tdom XML tree with each call to NextNode returning
 # the next node in a pre-order traversal of the dom tree.
 # by Keith Vetter, March 2005
 #
 
 package require tdom
 namespace eval ::WalkXML { variable stack}
 
 proc ::WalkXML::NextNode {{XML ""}} {
    variable stack                              ;# List of VISITED ancestors
 
    if {$XML ne ""} {                           ;# Parse given xml
        set doc [dom parse $XML]
        set root [$doc documentElement]
        set stack [list $root]
        return $root
    }
        
    while {1} {
        set last [lindex $stack end]            ;# Last already visited node
 
        set next [$last firstChild]             ;# Try going down the tree
        if {$next ne ""} break
        
        set stack [lrange $stack 0 end-1]
        set next [$last nextSibling]            ;# Try going across the tree
        if {$next ne ""} break
    
        while {1} {                             ;# Must go up the tree
            set next [lindex $stack end]        ;# Parent
            set stack [lrange $stack 0 end-1]
            if {$next eq ""} break
            
            set next [$next nextSibling]        ;# Try parent's sibling
            if {$next ne ""} break
        }
        break
    }
    if {$next ne ""} {
        lappend stack $next
    }
    return $next
 }
 
 
 
 ################################################################
 #
 # Demo code
 #
 set XML {<?xml version="1.0" encoding="ISO-8859-1"?>
    <loc version="1.0" src="Groundspeak">
    <waypoint>
    <name id="GCGPXK"><![CDATA[Playing Poker with the Squirrels by Rino 'n Rinette]]></name>
    <coord lat="40.1548166" lon="-82.5202833"/>
    <type>Geocache</type>
    <link text="Cache Details">http://www.geocaching.com/seek/cache_details.aspx?wp=GCGPXK</link>
    </waypoint><waypoint>
    <name id="GC19DF"><![CDATA[Great Playground Caper by Treasure Hunters Inc.]]></name>
    <coord lat="40.0667166666667" lon="-82.5358"/>
    <type>Geocache</type>
    <link text="Cache Details">http://www.geocaching.com/seek/cache_details.aspx?wp=GC19DF</link>
    </waypoint>
    </loc>
 }
 
 
 set node [::WalkXML::NextNode $XML]
 while {$node ne ""} {
    if {[$node nodeType] eq "TEXT_NODE"} {
        puts "\"[$node nodeValue]\""
    } else {
        puts "[$node nodeName]"
    }
    set node [::WalkXML::NextNode]
 }