Version 48 of Complex data structures

Updated 2003-06-10 14:56:18

Title: Complex data structures

Arjen Markus Quite often the question arises: how to model complex data structures? As Tcl allows various solutions, it seemed to me a good idea to start a new page on this subject.

escargo 17 Jan 2003 - Just for the sake of illustration, could you give some examples of these complex data structures? Are we trying to talk about AVL trees, red-black trees, tries, DAGs, petri nets, bags, or what all else?

AM (24 january 2003) When I set up this page, I was thinking of C structs, Fortran arrays and other "passive" data structures. Note that the Tcllib contains a number of modules for managing data structures - matrix, tree, ...

awalczak I deal with tcl code generated by idl2tcl. In my case IDL code is modelling struct with other structs and collections inside. I would be very thankfull for example how to build in TCL a big struct-like construct step-by-step (having smaller units defined earlier and combining them together before return form a function).

RBTree is a Tcl extension that implements Red-Black trees, having features of an ordered list and direct access as array. - RS found explanations at [L1 ], but from there I gather it's just well-balanced binary trees, and on insertion are (by possible rotation) brought back to well-balancedness.


Possible solutions:

  • Use one of the object-oriented extensions
  • Use TclX keyed lists
  • Use arrays (associative arrays allow indices that are strings, rather than mere numbers or tuples of numbers)
  • Use tagged lists
  • Use namespaces
  • Use key--value lists

Pros and cons of the object-oriented extensions:

  • Pro: Natural and intuitive to anybody who has done any OO (and probably to people who haven't either).
  • Con: At the moment garbage collection can't be applied to these objects so you need to take care of destruction yourself.
  • Con: Requires an external package.

Pros and cons of TclX keyed lists:

  • Pro: a keyed list can handle nested data structures. For example, the keyed list "{ID 106} {NAME {{FIRST Frank} {LAST Zappa}}}" represents the following hierarchical structure:
     ID = 106
     NAME:
        FIRST = Frank
        LAST = Zappa
  • Pro: keyed lists are first-class data items. They can be passed in and out of procs like any other list or string.
  • Pro: it is possible to query the existence and value of a key in a single command, without the speed drawback of catch.
  • Con: the access syntax is more verbose than for arrays.
  • Con: access to nested elements requires multiple commands.

Pros and cons of arrays:

  • Pro: by one name, you gather all the pieces of information
     set personal_data($name,birthday) "..."
     set personal_data($name,telnum) "..."
     ...
  • Pro: easy access to all the data pieces.
  • Pro: implemented internally by hash tables, hence access is fast.
  • Con: arrays are not "first-class" variables in Tcl, you will have to use upvar or array get and set commands to transfer them between procedures.
  • Con: essentially only one level of structuring. Otherwise you would have to use structured indices to store the values.

Pros and cons of tagged lists:

  • Pro: tagged lists {NAME ... {BIRTHDAY ...} {TELNUM ...} ...} are simple to use, they are first-class variables (they are simply a form of list variables). They can have any level of substructure.
  • Con: replacing values deep inside such a structure can be a nuisance. (But this may force you to use constructor and accessor procs to do this - nice modular design).
  • Con: because of the need for linear or binary searching, tagged lists will be slower than arrays for large data structures.

Pros and cons of namespaces:

  • Pro: separate variables into independent groups, all variables are first-class. No special syntax required

.

  • Pro: implemented internally with hash tables, for O(1) access.
  • Con: if you need many namespaces it can be difficult to maintain the application.
  • Con: variables will exist independent of any procedure, so they have a non-local scope. Again, maintenance problems will arise.

What do you mean by namespaces here? I can think of two uses: One simply as a way to group a bunch of variabl es, and the other with included procs to act more like an object. Perhaps both approaches need entries?


Pros and cons of key--value lists:

A key--value list is a list where every even index item is a "key" (arbitrary identifier string) and the following (odd index) item is the value associated with that key.

  • Pro: a key--value list can handle nested data structures. For example, the keyed list "ID 106 NAME {FIRST Frank LAST Zappa}" represents the following hierarchical structure:
     ID = 106
     NAME:
        FIRST = Frank
        LAST = Zappa
  • Pro: key--value lists are first-class data items. They can be passed in and out of procs like any other list or string.
  • Pro: key--value lists do not require any Tcl extensions.
  • Pro: The [array get] and [array set] subcommands use key--value lists natively.
  • Con: there are no commands for directly accessing data stored in key--value lists.

The most convenient way of accessing information in a key--value list is to unroll it into an array. (In practice, key--value lists spend a lot of time unrolled into arrays.) Typical code might look like

  proc update_KVL {KVL some more args} {
      array set A $KVL
      # Do the update
      return [array get A]
  }

It is often convenient to use a key--value list for storing "the settings". Consider

  proc do_something {settings_KVL args} {
      # Store defaults into array A
      array set A $settings_KVL; # Override with specific settings
      # Use data in A when doing something
  }

12may03 jcw - FWIW, there's a little "ihash" package in critlib coded in C (and I have a minimal pure Tcl script for doing the same somewhere), which makes key-value lists (which I used to call interleaved lists) in the same performance order as arrays, because it maintains a hash table alongside. This is transparent, but you have to be careful to avoid shimmering or things will slow down as the hash table gets rebuilt way too often. Probably also some more info, search for ihash on the wiki.


Of course it is possible to combine these solutions - to get the best (or worst) out of them.


RS 2002-07-05 - In the Tcl chatroom we discussed the potential of nested lists for data storage, given the new "multi-dimensional" lindex and lset starting from 8.4. Steve Landers wished for structures so list elements could be addressed with meaningful strings rather than positional indices only. Here's a nice minimal solution that I came up with:

 proc struct {name elements} {interp alias {} $name {} lsearch $elements}

 #... used to define a "struct" (map from names to positions):
 struct staff-> {firstname lastname birthdate phone}

 #... used to abstract such details in coding accesses:
 lset database $employeeID [staff-> phone] 123-4567890

Letting the struct name end in "->" is cute sugar. Note however that after it (and not before it) there must be whitespace.

Nother way out would be to use "staff:" -jcw

RS Yep - see Implementing enumerated types for a two-way map that generates both "staff:" and "staff@" for the inverse direction.

Brett Schwarz - humm, I just found this page. I actually put a package in tcllib that does what RS is talking about (essentially). It is called 'record', and is located under the 'struct' module in tcllib. It is only in CVS now (as of Jan 2003), but should go into next tcllib release. Here is how you would use it:

 package require struct

 namespace import struct::*

 # define records
 record phones {
    home
    work
    cell
 }

 record addr {
   name
   street
   {country USA}
   {record phones phlist}
 }

 # instantiate
 addr inst1
 # assign members
 inst1 configure -name Brett -street main
 # get a values (returns list)
 inst1 cget -name -street
 # you also use the dot notation
 set name [inst1.name]
 inst1.street = Main
 # the phlist is a nested record, and is accesses like this:
 inst1 configure -phlist.home 425-555-1212
 inst1.phlist.work = 206-555-1212
 # following would return a list of the phones (home work cell)
 set phones [inst1.phlist]

Note the 'country' member. I assigned it a default value of "USA". Anyways, you get the picture. See the docs in CVS for more info...