Tuplespace

Tuplespace is the generic term for what was called a Lindaspace at Yale. It is essentially a very simple, powerful and elegant technique for networking clients together: no client talks directly to another, and communication is accomplished via a publish/subscribe paradigm. You can build things like chat systems, load balancers and fault-tolerant services with tuplespaces.

See also http://c2.com/cgi/wiki?TupleSpace and http://c2.com/cgi/wiki?LindaTupleSpaces . See Tupleserver for an implementation using Sqlite as a backend and to which you can connect over sockets.

This is just a simple implementation in Tcl. An answer to the Street Programming Tuplespace that I have in production for my employer. -- Todd Coram [The proprietary variant apparently beefs up memory management and access control capabilities.] Yes, the memory manager flushed tuples from memory if they hadn't been accessed within N hours. The tuples would continue to exist in the metakit db, but didn't waste precious memory if they weren't being actively accessed. Since that tuplespace was used to hold hundreds of thousands of email contacts, inactive email accounts didn't take up memory... Access control allowed for readonly accounts -- some users could peruse the tuplespace (and even get callbacks) but couldn't "take" (remove) items...

 package require md5 2

 # A very simple implementation of a Linda-like Tuplespace.
 # Tuples take the form {field1 ... fieldN}.
 #
 namespace eval tuplespace {
    # This array defines the tuplespace. Each column contains a list
    # of matching tuple ids (tids). The hash is a combination of
    # column number, tuple field (e.g., tspace(0,hello) references any
    # tuple (via its id) that has 'hello' in its first column).
    #
     variable tspace

    # Tuples are referenced in tspace by tuple ids (tids). Tids are
    # md5 hashes of the actual tuple and are used as an index into the
    # array that holds the tuples.
    #
    variable tid_arr
    
    # Put a tuple out into tuplespace.
    #
    proc out {tuple} {
        variable tspace
        set tid [new_tid $tuple];   # create a new tuple id.
        # For each field in the tuple, assign it to a column in tuplespace.
        #
        set fn 0
        foreach tpf $tuple {
            lappend tspace($fn,$tpf) $tid;# this column now references tid.
            incr fn
        }
        return $tid
    }
    
    # Read (non-destructively) one (default) or more tuples from the space
    # matched by 'tpat'. Tpat is a tuple specification that may have a
    # "don't care" placeholder (?) in place of any tuple field. At least
    # one matching tuple field must be supplied. If rd_multiple is 1, then
    # return all matching tuples.
    #
    proc rd {tpat {rd_multiple 0}} {
        set res {}
        foreach tid [_rd $tpat $rd_multiple] {
            if {!$rd_multiple} {
                return [tid.tuple $tid]
            }
            lappend res [tid.tuple $tid]
        }
        return $res
    }
    
    
    # Take (by removing) a tuple from the space that matches 'tpat'.
    #
    proc in {tpat} {
        set res {}

        foreach tid [_rd $tpat] {
            set res [tid.tuple $tid]
            _remove $tid
        }
        return $res
    }

    # Return a directory listing of all tuples that match 'tpat'.
    #
    proc dir {tpat} {
        return [rd $tpat 1]
    }

    # Do all of the work of reading a tuple.
    #
    proc _rd {tpat {rd_multiple 0}} {
        variable tspace
        set match {}

        # Look at all of the supplied pattern fields and collect
        # a list of matching tuples (based on each field). If a field
        # fails to match (and the field is not ?), return an empty tuple.
        #
        set fn 0
        foreach tpf $tpat {
            if {[string equal $tpf "?"]} {
                incr fn
                continue
            }
            if {![info exists tspace($fn,$tpf)]} {
                return {}
            }
            lappend match $tspace($fn,$tpf)
            incr fn
        }
        return [_intersection $match $tpat $rd_multiple]
    }
    
    # Find the intersection between all of the matched (candidate) tuples.
    #
    proc _intersection {match tpat rd_multiple} {
        # Calculate the number of fields actually supplied in tpat.
        # This is the number of intersections in the matches we must find
        # in order to return a tuple.
        #
        set tpat_len [llength [lsearch -not -all -exact $tpat ?]]
        # Find the intersect of these collected matches. The first
        # intersect count that equals $tpat_len is our matched
        # tuple (unless all matches are requested).
        #
        set result {}
        foreach tid [join $match] {
            set tuple [tid.tuple $tid]
            if {[llength $tuple] != [llength $tpat]} {
                continue
            }
            if {![info exists intersect($tuple)]} {
                set intersect($tuple) 1
            } else {
                incr intersect($tuple)
            }
            if {$intersect($tuple) == $tpat_len} {
                lappend result $tid
                if {!$rd_multiple} {
                    break
                }
            }
        }
        return $result
    }
    
    # Remove a tuple from the space based on the tuple id.
    #
    proc _remove {tid} {
        variable tspace
        set tuple [tid.tuple $tid]
        set fn 0
        foreach tpf $tuple {
            if {[info exists tspace($fn,$tpf)]} {
                set tspace($fn,$tpf) [lsearch -all -not -inline \
                                          $tspace($fn,$tpf) $tid]
            }
            incr fn
        }
        tid.delete $tid
    }
    
    # Create a new tid to hold a tuple.
    #
    proc new_tid {tuple} {
        variable tid_arr
        set id [tid $tuple]
        set tid_arr($id) $tuple
        return $id
    }

    # Calculate the tid from a given tuple.
    #
    proc tid {tuple} {
        return "\#Tuple[::md5::md5 -hex $tuple]"
    }

    # Return the tuple from the given tid.
    #
    proc tid.tuple {tid} {
        variable  tid_arr
        return $tid_arr($tid)
    }
    
    # Destroy a tuple named by tid.
    #
    proc tid.delete {tid} {
        variable tid_arr
        unset tid_arr($tid)
    }

    # Flush a tuple from memory, but keep the tspace references around.
    # DANGEROUS!
    proc tid.flush_tuple {tid} {
        variable tid_arr
        set tid_arr($tid) {}
    }
 }

 # Examples:

 puts [tuplespace::out {hello there Todd Coram}]
 puts [tuplespace::out {hello there Maroc Ddot}]
 puts [tuplespace::out {linda space}]

 puts [tuplespace::dir {hello there ? ?}]
 puts [tuplespace::rd {hello there ? ?}]
 puts [tuplespace::in {hello there ? ?}]
 puts [tuplespace::dir {hello there ? ?}]
 puts [tuplespace::in {hello there ? ?}]
 puts [tuplespace::dir {hello there ? ?}]

jmn 2004-02-01 The above example seems to work just fine, but fails when I try:

 puts [tuplespace::out {hello there test etc}]

I don't know what's magic about that string, but the tid returned doesn't begin with #Tuple like all the others. Something fishy with the md5 implementation I'm using I guess.

Using tcllib1.5, md5 2.0.0, Tcl8.4.4 on Windows.

Further experimentation seems to suggest there's some sort of carriage return or backspace type character being returned by

 [md5 {hello there test etc}]

Seems the 1st 9 chars on the line are getting eaten!

 % puts "123456789[md5 {hello there test etc}]"
 ììÆ?¼ê[3#ü~²ºq
 % puts "123456789xxx[md5 {hello there test etc}]"
 ììÆ?¼ê[3#xxxü~²ºq

Beats me what this means or whether it's a problem in the md5 implementation itself, or just in how it's being used here.

Argh. The md5 package appears to have changed its API between versions. The prior version returned hex strings by default. Adding -hex to the tid generator proc's md5 call should fix the problem -- Todd Coram

Either that or package require version 1 of the md5 package. schlenk
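
To see what a raw digest contains, rather than letting the terminal interpret its control characters, the bytes can be rendered as hex with plain Tcl. This is a minimal sketch: `to_hex` is a hypothetical helper, and the four sample bytes are made up for illustration (they are not real md5 output).

```tcl
# Hypothetical helper: render a binary string as lowercase hex digits.
proc to_hex {bytes} {
    binary scan $bytes H* hex
    return $hex
}

# Made-up sample bytes (not a real digest): CR, LF, 0xFF, NUL.
set raw [binary format H* 0d0aff00]
puts [string length $raw]   ;# 4
puts [to_hex $raw]          ;# 0d0aff00
```

A carriage return (0x0d) printed raw would move the cursor back to the start of the line on many terminals, which would be consistent with the leading characters appearing to be eaten.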


Ruby, incidentally, has a nice, standard Tuple implementation as part of "Distributed Ruby" (drb).


09/11/02 -- Incorporated Tcl'ish improvements made by Michael Schlenker into the code above (replacing a few clumsy for loops with foreach loops).

Also, each tid is now uniquely identified by an md5 hash from the tcllib md5 package. And a few more tid procs were added. Why? Well, it makes it easier to add persistence (via metakit!) --- Todd Coram :

 # Meta-kit persistence for the tuplespace.
 #
 namespace eval tuplespace-db {
    proc open {dbpath} {
        mk::file open db $dbpath
        mk::view layout db.tspace "id tuple"
        _load
    }

    proc close {} {
        mk::file close db
    }

    proc _load {} {
        trace add variable ::tuplespace::tid_arr {read unset write} {}
        mk::loop row db.tspace {
            set tuple [mk::get $row tuple]
            set tid [tuplespace::out $tuple]
            tuplespace::tid.flush_tuple $tid
        }
        trace add variable ::tuplespace::tid_arr {read unset write} ::tuplespace-db::getset
    }

    proc getset {name tid op} {
        switch -- $op {
            write { _write $tid }
            read { _read $tid }
            unset { _unset $tid }
        }
    }

    proc _unset {tid} {
        set r [mk::select db.tspace -count 1 \
                   -exact id $tid]
        if {$r != {}} {
            mk::row delete db.tspace!$r
            mk::file commit db.tspace
        }
    }

    proc _read {tid} {
        if {[tuplespace::tid.tuple $tid] == {}} {
            # reload from database
            #
            set r [mk::select db.tspace -count 1 -exact id $tid]
            tuplespace::new_tid [mk::get db.tspace!$r tuple]
        }
    }

    proc _write {tid} {
        set tuple [tuplespace::tid.tuple $tid]

        # See if the tuple already exists in the database.
        #
        set r [mk::select db.tspace -count 1 -exact id $tid]
        if {$r != {}} {
            # Replace the tuple.
            #
            mk::set db.tspace!$r id $tid tuple $tuple
        } else {
            # Nope, add a new row.
            #
            mk::row append db.tspace id $tid tuple $tuple
        }
        mk::file commit db.tspace
    }
 }

15/Oct/03 schlenk: I noticed that the Metakit persistence for this tuplespace does not work reliably if md5 hashes are used. I base64-encoded the hashes and everything works well:

    # Calculate the tid from a given tuple.
    #
    package require base64
    proc tid {tuple} {
        return "\#Tuple[::base64::encode [::md5::md5 $tuple]]"
    }

Is the problem really with Metakit itself, or with Mk4Tcl ?

schlenk Bad phrasing on my side: It did not work with Metakit if only MD5 was used for this, but it worked when base64 encoded MD5 was used, probably due to embedded nulls or something like it...; have to take a look more closely tomorrow.

jcw - Yep, nulls are trouble. Change:

  mk::view layout db.tspace "id tuple"

to

  mk::view layout db.tspace "id:B tuple"

Are nulls trouble for Metakit, or just for Mk4Tcl?

jcw - Neither. Nulls terminate strings in C. That's why MK has type S (default in Mk4tcl) and type B properties. If you store a string with embedded null bytes in an S property, MK's C API will ignore everything past the first one, just like every other C function taking char*'s. Hmm...now you have me wondering: in a Tcl_Obj* string rep, embedded nulls are not possible, right? I wonder where the problem lies in this case.

Wrong. Tcl_Obj string reps are counted strings precisely so they can include embedded NULLs. Since 8.1 established (modified) UTF-8 as the preferred internal string encoding, embedded NULLs are no longer needed and are frowned upon, but for compatibility with extensions written for 8.0, Tcl_Obj's still accept and preserve embedded NULLS in their string reps.

jcw - Thanks, Don, for setting this straight. In that case, the conclusion is: if your strings can have null bytes, don't store them in an S property - use B instead. The frown carries over to Mk4tcl, MK, C, and C++. It would be nice for Mk4tcl to catch such cases and throw an exception - right now (MK 2.4.9.2), it doesn't - it truncates.
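
The difference between a counted Tcl string and a null-terminated C string can be seen from the Tcl side alone. The sketch below only emulates the truncation with string commands; it does not call Metakit, and the sample value is made up.

```tcl
# A five-character Tcl string with a NUL in the middle.
set s "ab\u0000cd"
puts [string length $s]          ;# 5

# Emulate what a consumer of NUL-terminated C strings would see:
# everything from the first NUL onward is dropped.
set nul [string first "\u0000" $s]
set seen [string range $s 0 [expr {$nul - 1}]]
puts $seen                       ;# ab
```

Tcl keeps all five characters, while a consumer that stops at the first NUL would see only the first two, which is the kind of truncation described above for S properties.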

Jacob Levy JCW, are the costs of storing B and S items basically the same? If yes, why not make the default property type for mk4tcl be B?


3nov2003 Todd Coram A first cut at a Tuplespace server:

 namespace eval tuplespace_server {
    variable cb
    array set cb {}
    
    proc register_cb {chan tpat} {
        variable cb
        lappend cb($chan) $tpat
        return "ok"
    }
    
    proc deregister_cb {chan} {
        variable cb
        unset -nocomplain cb($chan);   # the channel may never have registered a callback
        return "ok"
    }
    
    proc read {chan cmd tpat} {
        set res [::tuplespace::$cmd $tpat]
        return "ok $res"
    }
    
    proc write {chan tuple} {
        variable cb
        set tid [::tuplespace::out $tuple]
        foreach cbchan [array names cb] {
            foreach tpat $cb($cbchan) {
                if {[::tuplespace::rd $tpat] != {}} {
                    puts $cbchan "cb $tpat"
                }
            }
        }
        return "ok $tid"
    }
    
    proc dispatch {chan cmd tuple} {
        switch -- $cmd {
            cb {
                set res [register_cb $chan $tuple]
            }
            rd -
            dir -
            in {

                set res [read $chan $cmd $tuple]
            }
            out {
                set res [write $chan $tuple]
            }
            default {
                set res "error invalid command!"
            }
        }
        puts $chan $res
    }
 }

 proc register_client {chan addr port} {
    fconfigure $chan -blocking 0 -buffering line
    fileevent $chan readable [list handle_input $chan]
 }

 proc handle_input {chan} {
    if {![eof $chan]} {
        if {[gets $chan data] == -1} {
            return;                     # only handle complete lines
        }
    } else {
        catch {close $chan}
        ::tuplespace_server::deregister_cb $chan
        return
    }
    set l [regexp -inline -- {(\w+)\s+(.*)} $data]
    if {[llength $l] != 3} {
        puts $chan "error usage: command {tuple}"
    } else {
        foreach {dummy cmd tuple} $l break
        puts stderr "cmd=($cmd), tuple=($tuple)"
        ::tuplespace_server::dispatch $chan $cmd $tuple
    }
 }

 socket -server [list register_client] 6667
 vwait ::forever

Connect to the server via telnet and try the following commands:

  out tom baker 56
  out bob baker 54
  dir ? baker ?
  cb ? baker ?
  out ginger baker
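
Each request line is a command word followed by the tuple, which is how handle_input splits it. The sketch below isolates that parsing step so it can be tried without a socket. `parse_request` is a hypothetical helper, not part of the server above, and unlike the original regexp it anchors the pattern to the whole line.

```tcl
# Hypothetical helper mirroring the server's line format: "<command> <tuple>".
# Returns {cmd tuple} on success, or an empty list for a malformed line.
proc parse_request {line} {
    if {[regexp -- {^\s*(\w+)\s+(.*)$} $line -> cmd tuple]} {
        return [list $cmd $tuple]
    }
    return {}
}

puts [parse_request "out tom baker 56"]   ;# out {tom baker 56}
puts [parse_request "dir ? baker ?"]      ;# dir {? baker ?}
puts [llength [parse_request "bogus"]]    ;# 0
```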

Todd Coram Ugh. The above tuplespace server commits the sin of not practicing what it preaches. Here are revised procs that use the tuplespace itself to store callback information (not a Tcl array!):

    proc register_cb {chan tpat} {
        ::tuplespace::out [list cb_clients $chan $tpat]
        return "ok"
    }
    
    proc deregister_cb {chan {tpat ?}} {
        foreach cb [::tuplespace::dir [list cb_clients $chan $tpat]] {
            ::tuplespace::in [list cb_clients $chan [lindex $cb 2]]
        }
        return "ok"
    }

    proc tuple_match {tpat tid} {
        foreach tuple [::tuplespace::dir $tpat] {
            if {[::tuplespace::tid $tuple] == $tid} {
                return 1
            }
        }
        return 0
    }

    proc write {chan tuple} {
        set tid [::tuplespace::out $tuple]
        foreach client [::tuplespace::dir [list cb_clients ? ?]] {
            # Use a separate name so the proc's own chan argument is not clobbered.
            foreach {cb_clients cbchan tpat} $client break
            if {[tuple_match $tpat $tid]} {
                if {[catch {puts $cbchan "cb $tpat"}] != 0} {
                    deregister_cb $cbchan
                }
            }
        }
        return "ok $tid"
    }

1/31/2004 Yikes! A long-standing bug in the tuple callback code above has been fixed. The addition of tuple_match means that you no longer get a callback every time anything is written to the space (which used to happen whenever you had a previous match that hadn't yet been read). -- Todd Coram (Yes, this code is getting too unwieldy for the wiki.)
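
The check that tuple_match performs (does the tuple just written satisfy this pattern?) could also be done by comparing the pattern against that one tuple directly, without querying the space. This is a sketch of that alternative, not the code used above. `tuple_matches` is a hypothetical name, and unlike `_rd` it also accepts a pattern made entirely of `?` fields.

```tcl
# Hypothetical helper: does a single tuple satisfy a pattern?
# "?" in the pattern matches any field; lengths must agree.
proc tuple_matches {tpat tuple} {
    if {[llength $tpat] != [llength $tuple]} {
        return 0
    }
    foreach p $tpat f $tuple {
        if {$p ne "?" && $p ne $f} {
            return 0
        }
    }
    return 1
}

puts [tuple_matches {? baker ?} {tom baker 56}]   ;# 1
puts [tuple_matches {? baker ?} {ginger baker}]   ;# 0
puts [tuple_matches {? baker ?} {tom smith 56}]   ;# 0
```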


schlenk 01/Feb/2004. I have a variant of the above code (without the notification part at the moment) running under Tclhttpd exposed as a SOAP webservice. Ask me if you are interested, it's not really polished yet.


AK: While storing the callback information in the tuplespace itself is interesting it also opens the possibility of an application screwing with the server by modifying and/or removing the relevant tuples. That is IMHO a bad thing, from a security point of view. I.e., keeping the server management information (callbacks) and the application data (tuples) separate is IMHO the better approach. And nothing prevents us from storing the data in a second tuplespace, if we wish to :). Of course, that requires rephrasing the code above as a class, so that we can have multiple independent t-spaces. ... I would also use comm for the communication part. Its hooks allow the restricted protocol we are running here as well. Possibly even the cross-linkage of many tuple-servers into one space.

Todd Coram: Or... reserve any tuple beginning with cb_clients for system use only by disallowing any modifications via the write proc. A feature of my Street Programming Tuplespace was to reserve any tuple with a first element beginning with a '#' as a system tuple that couldn't be modified without a privileged connection. The benefit of storing tuplespace meta information as tuples themselves is that you can delegate system facilities to privileged clients outside of the tuplespace server. Maybe you would want a callback manager that could track who was getting what and display the frequency of queries in a graph. That would bloat the tuplespace server, but would make a nice external client.

Of course, you now need some sort of password facility to make this useful.
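
A reservation rule like this could be expressed as a small predicate that the server consults before any modification. The sketch below is only an illustration: `is_system_tuple`, `may_modify` and the `privileged` flag are hypothetical names, and how a connection becomes privileged (the password facility) is left out entirely.

```tcl
# Hypothetical policy check: a tuple is "system" if its first field is
# cb_clients or begins with '#'.
proc is_system_tuple {tuple} {
    set first [lindex $tuple 0]
    return [expr {$first eq "cb_clients" || [string index $first 0] eq "#"}]
}

# Hypothetical guard: only privileged connections may touch system tuples.
proc may_modify {tuple privileged} {
    return [expr {$privileged || ![is_system_tuple $tuple]}]
}

puts [may_modify {cb_clients sock5 {? baker ?}} 0]   ;# 0
puts [may_modify {#stats hits 42} 1]                 ;# 1
puts [may_modify {tom baker 56} 0]                   ;# 1
```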

AK: Right. Access control, authentication, secrecy, the works. Because these system tuples should not be seen by a regular user either; otherwise system data leaks out, which can help in attacks in other ways. Who was getting what ... That type of manager I would actually place on top of a tuplespace. Otherwise the tuplespace has a hardwired assumption that requests come from a network and that there is an id to be had identifying the requester. Note that this type of tracking can be of general interest beyond statistics, for example if you are linking several tuplespaces and each space can query the others when it cannot answer a query on its own. That goes into the realm of P2P systems, or HA through replication and redundancy. Quite a lot of fun can be had.

CMcC - I have a few questions:

  1. Take only grabs one matching tuple, right? (Yes - Todd Coram)
  2. Do tuple IDs have to be unique for all time as well as at any given time for all tuples? (The ID value is directly related to the contents of the tuple. {Hello World} will always have the same unique ID, regardless when it was put into the space. Or, at least as unique as MD5 hashing allows - Todd Coram ;-)
  3. Are duplicates allowed (duplicates for all but tuple ID, I guess)? (No. An exact duplicate tuple will produce the same ID and squash the old one. - Todd Coram) escargo 3 Apr 2005 - Wouldn't an exact duplicate not need to be written, since it's already in the database?

NEM 1 April 2005 - From what I can make out from this page, and what I've heard in the past, a tuplespace is a distributed set of tuples -- i.e., a (distributed) relation. Could someone summarise what the differences are between a tuplespace and a relational database? Can a tuplespace contain more than one relation? Can you do joins across tuplespaces? With the discussion above about access-control, authentication, scalability etc, this seems to be entering the territory of RDBMS's.

AK: The tuples in a tuplespace are completely unorganized. Consider it as a bag of tuples. Any type of relational operations would have to be done by a client. The server has no idea at all about that. It might try to put tuples of identical structure into tables, but that would be an internal optimization only; semantically there are no tables or higher structures.

NEM: Hmm... from the no duplicates note above, it would appear to be a set of tuples, rather than a bag. A set of tuples is the mathematical definition of a relation (strictly, a graph which, together with some domain sets, forms a relation), although relations are usually "typed" in some way (hence the domain sets), which doesn't seem to be the case here. But, I see what you are saying: compared to a relational db, the tuplespace imposes much less structure on the data, and does less with it. Would it be fair to say that the primary aim of tuplespaces is distribution of data, rather than data processing? Would a comparison to Tequila be more appropriate?

AK: Right, when no dups are allowed a set, not a bag. However I am usually operating under the assumption that duplicates of a tuple are allowed. IIRC this is what the other spaces allow. Re primary use, distribution is secondary I think. At least tuplespaces evolved out of parallel processing, a method to describe parallel algorithms without having to use low-level threads, and communication primitives. I.e. they are a method of synchronization. Comparison to tequila ... Possibly more appropriate.

Todd Coram: How are duplicate tuples handled in a traditional Tuplespace? If I take {a b c} from the space does that mean that there can be another {a b c} left for the taking? Interesting. I've handled duplicate tuples in the past by putting {a b c 1} and {a b c 2} and taking {a b c ?}.

I am afraid that the above implementation isn't geared toward doing true duplicates (although I am tempted to modify it to support duplicates -- actually... duplicates ARE allowed to exist in the above code, it's just that once you remove one instance of the tuple, the duplicate is deleted too).
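
One way true duplicates might be supported is to keep an instance count per distinct tuple, so that taking one copy leaves the others in place. The following is a standalone sketch of that idea, not a patch to the implementation above. The `bag` namespace is hypothetical, and it matches exact tuples only (no `?` wildcards).

```tcl
# Sketch of a counted bag: each distinct tuple maps to an instance count.
namespace eval bag {
    variable count
    array set count {}

    # Add one instance of the tuple; returns the new count.
    proc out {tuple} {
        variable count
        if {![info exists count($tuple)]} {
            set count($tuple) 0
        }
        return [incr count($tuple)]
    }

    # Remove one instance; returns the tuple, or {} if none is left.
    proc in {tuple} {
        variable count
        if {![info exists count($tuple)]} {
            return {}
        }
        if {[incr count($tuple) -1] == 0} {
            unset count($tuple)
        }
        return $tuple
    }
}

bag::out {a b c}
bag::out {a b c}
puts [bag::in {a b c}]   ;# a b c
puts [bag::in {a b c}]   ;# a b c
puts [bag::in {a b c}]   ;# prints an empty line: no instance is left
```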


One of the reasons Erlang is so cool is that it builds in tuplespace concepts, although, to my astonishment, I have yet to come across a presentation of that line of descent. Erlang is all about sending messages between processes [explain status of "cloud" and comparison to Erlang semantics]; receipt is pattern-matched.


NEM 21 Aug 2006: A tuplespace also seems to be quite related to Blackboard Systems.


MJ - YATS (Yet Another Tupleserver) moved to Tupleserver