Jim closures

SS 18Mar2005

Important: before trying these examples, be sure you have the latest version of Jim available. There were many bugs in past versions that may prevent some of these examples from working properly.

This page is a simple introduction to one of the Jim features: closures. Actually closures in Jim are the sum of many features:

Anonymous procedures

Jim has a lambda command that creates anonymous procedures. Actually they are not unnamed at all, just a random name is assigned to procedures created using the lambda command:

 . set f [lambda x {+ $x $x}]
 <reference.<functio>.00000000000000000000>
 . $f 10
 20
 . $f 5
 10

The lambda command does not need the name of the procedure as argument; the name assigned to the created procedure is instead the return value.

Unlike normal procedures, anonymous procedures in Jim are garbage collected. You don't need to delete them in some explicit way; when they are no longer referenced they are automatically destroyed. In other words just don't bother to free what lambda returns and you will do the right thing.

AvL: When exactly does the GC step in? And how does it know about strings containing a reference to some object? Does it scan through all strings before and after they are being modified?

SS: The collection is a periodic process that works like this: there are two counters, one counts the number of references created since the last collection, one the number of seconds elapsed since the last collection. Every time a new reference is created, if at least one of the counters reached a given limit, a garbage collection cycle is performed.

About strings containing references, it scans the whole space of string representations of every object that is in a given instant "live" in a Jim program. But because some type of objects can't contain references (when you create a new type of Jim Object you have to set a flag to tell Jim if it's possible or not for a given object to contain valid references), many objects are just not scanned, for example integers, floats, every object having a string representation that is less long than the length of a reference (that is fixed exactly for this reason).

Jim takes care also of the fragile references problem. It is able to reach every Jim_Obj created even if this object is stored at the moment of the GC scan inside some data structure of an extension.

RHS Along those same lines, what happens if you have circular references in closures. For example, you create closures A, B, and C, each having a local variable. A's local var points to B, B to C, and C to A. You then return from the proc you're in without returning the closures (no one external now points to any of the 3 closures, just them to each other). How do you handle GC in this instance?

SS: Jim's reference system is currently not able to deal with circular references. There is a reason for this limitation, that is, to be able to deal with fragile references. This may appear strange since it is using a mark & sweep GC, but the GC is able to deal with circular references only if you start to collect from root nodes. Instead because we don't know if there are bad extensions taking references to objects without to make it clear via some API, we are forced to scan the whole space of objects. Actually it may be possible to collect circular references testing for cycles actively, but for now I consider this a minor problem. For example Python started to have some cycles collection algorithm after ten years :)

Static variables

Static variables are a set of variables relative to a given procedure that are persistent across calls. In other words, static vars are the state of a procedure. In order to create a procedure with static vars, what is needed is to pass an additional argument to the proc command:

 proc counter {} {{x 0}} {
    incr x
 }

As you can see just after the arguments list, that is empty in the example, there is the list of static vars associated with the procedure. The syntax used to specify default arguments here is used to specify the initial value of the variable. Because the variable is persistent, the initial value must either be specified or a variable of same name must exist in the context and its current value is used. (wording modified by AvL)

This is an interactive usage of the counter procedure:

 . counter
 1
 . counter
 2
 . counter
 3

Every time I call the procedure, the static x is incremented, and the value is returned to the caller. Statics are automatically destroyed when a procedure is destroyed.

Context capturing

In Jim the initialization of static variables is mandatory, but if the initialization value is not provided directly as in the above example, it is "captured" from a local variable having the same name of the static variable. So the above counter procedure can be written also in this form:

 set x 0
 proc counter {} x {
    incr x
 }

The initialization value was not specified, so it is obtained from the value of the x variable that appears in the context where proc was called. If no local x variable exists, an error is generated by the proc command.

NEM: Just to clarify what is meant by capturing here, is the local variable just used to obtain the initial value, or is there a link. In other words, what does the following print:

 set x 0
 proc counter {} x {
     incr x
 }
 counter
 puts "x = $x"

jcw - 0, i.e. value is used as initial value for local persistent x.

SS: exactly, like jcw wrote, the local variable is only used to obtain the value, there is no reference between the context variable and the static, so there is no way for a closure to indirectly modify the local x variable. Of course closures are upvar and uplevel capable, but that's another story.

Closures

Static variables can also be used with anonymous procedures in exactly the same way. This allows one to write the following procedure that creates a counter anonymous procedure:

 proc make-counter {} {
     set count 0
     lambda {} count {
         set count [+ $count 1]
     }
 }

Usage example:

 . set a [make-counter]
 <reference.<functio>.00000000000000000001>
 . $a 
 1
 . $a
 2
 . $a
 3

Note how similar the Jim procedure is compared to the scheme version:

 (define (make-counter)
   (let ((count 0))
     (lambda ()
       (set! count (+ count 1))
       count)))

In practice, Jim supports something very similar to lexical scoping, but the semantics are much simpler than the one in the Scheme programming language because the user only needs to know about static variables and the feature of capturing a copy from the context to initialize statics.

We can make the counter procedure smarter, allowing us to specify an initialization value:

 proc make-counter initval {
     set count $initval
     lambda {} count {
         set count [+ $count 1]
     }
 }

Usage example:

 . set b [make-counter 10]
 <reference.<functio>.00000000000000000002>
 . $b
 11
 . $b
 12

LV Is there a way to define the proc to optionally supply the initialization? That is, if I supply it, use it, but if I don't, then set to a default? That is, without making use of the copy value from local variable behavior?

AvL: You mean the case where where no value is given for a var and the var does not exist in proc-definition scope? I think it's ok to throw an error then.

SS: What happens is actually that an error is generated:

 . proc foo {} myVar {}
 Runtime error, file "?", line 1:
     variable for initialization of static "myVar" not found in the local context

More examples

The following is a more complex example of closure that uses another feature of Jim: arrays are not collection of variables but dictionaries. So an array is just a list with an even number of elements, and you can still use $var($index) form and so on. A very small example before to introduce the new closure example:

 . set foo [list x 1 y 2]
 x 1 y 2
 . puts $foo(x)
 1
 . puts $foo(y)
 2
 . set foo(z) 5
 5
 . puts $foo
 x 1 y 2 z 5

It's simple Tcl arrays done The Right Way ;).

AvL: it's simple only because jim does not yet have any hook-mechanism to watch for changes in variables' values (traces in tcl, that is). Does upvar work with dict-elements in jim?

SS: Jim will probably never have traces. I found they a semantically useless feature. This does not mean traces are useless, but that there is in my opinion always an equivalent solution in Jim to handle problems where usually traces are used. upvar works against dict elements only the way it should work, i.e. if they are target of an upvar, but they can't be source of upvar (well the whole dict var can, single elements can't).

Of course because in Jim arrays are values like any other thing a Jim static can contain an array (that is, a dictionary). So we can create a make-words-counter procedure that returns a closure that is able to count the number occurrences of words:

 proc make-words-counter {} {
     lambda word {{dictionary {}}} {
         global dictionary
         if {$word eq {}} {
             return [array get dictionary]
         }
         if {![info exists dictionary($word)]} {
             set dictionary($word) 0
         }
         incr dictionary($word)
     }
 }

Now we can create a words counter, and use it to count words:

 . set d [make-words-counter]
 <reference.<functio>.00000000000000000000>
 . $d apple
 1
 . $d foo
 1
 . $d bar
 1
 . $d bar
 2
 . $d bar
 3
 . $d apple
 2
 . $d {}
 apple 2 foo 1 bar 3

Called with the empty string the words counter will return the full dictionary. Usually it returns the number of occurrences of the input word. As usually we don't need to worry about freeing it at all.

Closures as objects

Closures are, after all, code with associated data; this is why they are somewhat similar to objects. The following is the usual bank account example done with Jim closures:

 proc make-bank-account {} {
     lambda {method args} {{balance 0}} {
         switch $method {
             see {set balance}
             deposit {incr balance [lindex $args 0]}
             withdraw {set balance [- $balance [lindex $args 0]]}
             default {error "unknown method $method"}
         }
     }
 }

Usage:

 . set a [make-bank-account]
 <reference.<functio>.00000000000000000000>
 . $a deposit 100
 100
 . $a deposit 50
 150
 . $a withdraw 15
 135
 . $a see
 135

Of course it's possible to have multiple bank accounts, and every bank account will be independent, and as usual, objects done this way are automatically garbage collected.

Limits

Because Jim closures work capturing a copy of the context variable, they are not fully equivalent to Scheme's closures. For example, see the following Scheme program:

 (define count 0)

 (define (make-counter)
      (lambda ()
        (set! count (+ count 1))
          count))

We now try to use it from an interactive mzScheme session:

 > (define f (make-counter))
 > (define g (make-counter))
 > (f)
 1
 > (f)
 2
 > (g)
 3
 > (g)
 4

As you can see f and g are sharing the same 'counter'. The same code in Jim has a different effect:

 set count 0

 proc make-counter {} {
     global count
     lambda {} count {
         set count [+ $count 1]
     }
 }

Let's try to use it in Jim this time:

 . set f [make-counter]
 <reference.<functio>.00000000000000000000>
 . set g [make-counter]
 <reference.<functio>.00000000000000000001>
 . $f
 1
 . $f
 2
 . $g
 1
 . $g
 2 

As you can see the two closures are working with a private copy of count. The Scheme behavior is useful when there is the need to return multiple closures acting as methods of a single object: they share the unbound variables. In Jim the same effect is obtained using the first argument of the returned closure as a method, like in the bank account example above.

Breaking limits

But Jim has references! That is, tags associated to values, similar to dicts, but garbage collected (the whole Jim GC system is based on references). So we can have the same behavior of Scheme if needed:

 set countRef [ref 0 int]

 proc make-counter {} {
     global countRef
     lambda {} countRef {
         setref $countRef [+ [getref $countRef] 1]
     }
 }

 set f [make-counter]
 set g [make-counter]

(note: The idea to use references to render the Scheme example was proposed by RS).

For more information on references check Jim References.

AvL: looking forward to it. I don't yet grasp them (especially, why they have a type int attached to them).

SS: Right, this sounds strange... so I'll at least explain now the "int" thing. A reference is just a name on the form <reference<.....>> associated to a string, but this name is generated by Jim itself when you create a reference:

 . set r [ref "hello there!" string]
 <reference.<string_>.00000000000000000001>

The first version of Jim returned something like:

  :reference:0000....1:

Then I realized that in interactive hacking, it is not too nice to have reference names in this form; you have no clue about what they contain at all. So I added a third mandatory argument to the ref command that is used to create a reference- you have to specify a "tag" that's just a little description of what the reference contains. This has no semantical meaning at all; it is only used to build the reference name. So for example lambda will use "functio" as a reference tag:

 . lambda x {+ $x $x}
 <reference.<functio>.00000000000000000002>

and so on. It is a feature needed for interactive hacking and at debugging time. The language itself may execute every kind of program involving references without having to use tags at all.

RS suggests that the ref tag could be optional, and default to "variable" or so :)

SS well I try to force the user to give some useful info ;) also this makes the interface more simple because there is an optional further argument, that is the finalizer, i.e. a procedure that is called when that reference is collected.


SS29Mar2005, curry in Jim using closures:

 proc curry {cmd args} {
     lambda args [list cmd [list pref $args]] {
         uplevel 1 [list $cmd {expand}$pref {expand}$args]
     }
 }

Example:

 . set f [curry + 10]
 <reference.<functio>.00000000000000000000>
 . $f 3
 13
 . $f 3 4 5
 22