Garbage collection

Garbage Collection is the cleanup of resources that are no-longer needed.

See Also

Arts and Crafts of Tcl-Tk programming
Reference counted command objects
linked lists
Tcl and LISP
complex data structures
event processing & garbage collection in Tcl and Tk
RJM: I have recently had practical experiences with garbage collection in Tcl/Tk related to timing. Here a new page is referenced with the purpose to document.
Reference counting vs. tracing garbage collection

Further Reding

Everybody thinks about garbage collection the wrong way , Raymond Chen, 2010-08-09
Garbage collection is simulating a computer with an infinite amount of memory
What's New in Lua 5.4
A discussion of why garbage collection on doesn't fit into the Tcl, i.e. why it's pointless to try to clean up references in a system where an each object is referenced by name rather than by a direct binding. copy-on-write is the technique that fits Tcl.

Description

garbage collection is used in other languages to address a problem Tcl doesn't have: The dispoal of objects that are no longer reachable because all references to them have been destroyed. In the case of a system that produces no garbage, i.e., no explicit references to objects, a garbage collector becomes pointless. Tcl is such a system. Variables and routines can not be passed as as arguments to other routines, so it becomes a simple matter of deleting a variable, namespace or routine when one is finished with it.

In Tcl, data structures are not built by directly referencing other other structures. Instead, a values are used directly, and copy-on-write is used internally to store the value only once even though it may appear at the script level in multiple places.

The only place at the script level where a reference is created in is when upvar or namespace upvar is used to create a local reference in a routine to a variable outside the routine. This reference is naturally cleaned up when the routine returns.


The following "AI koan" (see The Jargon File for more) points out a fundamental difference between the Tcl and Lisp approaches to when unused memory is reclaimed and the implications this has for what can be a value.

One day a student came to Moon and said: "I understand how to make a better garbage collector. We must keep a reference count of the pointers to each cons."

Moon patiently told the student the following story:

    "One day a student came to Moon and said: `I understand how to make a better garbage collector ...

(Editorial note: Pure reference-count garbage collectors have problems with circular structures that point to themselves. On the primitive level Tcl avoids that problem by the principle that everything is a string, since strings don't point to anything.)

Reference Counting

At the C level a count of references to each Tcl_Obj is kept within the object itself. Each time the address of the object is stored somewhere the reference count is incremented, and each time the address is released the reference count is decremented.

An extension can implement and register its own Tcl_Obj internal representation in order to more efficiently handle the data it uses, and the standard reference-counting scheme will take care of the storage management.

An alternative is to provide an interface that returns a unique identifier or allows the user to provide a unique name as a handle when creating a new set of resources.

Binding Cleanup to a Variable

Arjen Markus: There has been much discussion about references to data in Tcl, in order to build complex data structures and such. Inevitably, garbage collection pops up.

This page is meant to show that at some mundane level you can let Tcl do the job for you. The script below will create a counter "object" that keeps its internal state hidden. It juggles a bit with the counter object and then throws it away.

The essence: the internal state is stored via the interp alias mechanism that allows extra arguments and the counter itself is destroyed via the trace.


namespace eval Counter {
    variable nextid 0

proc makecounter {name initial} {
    upvar $name vname
    variable nextid

    set vname [namespace current]::$nextid
    uplevel [list trace add variable $name unset [
        list [namespace current]::deletecounter $vname]]

    interp alias {} $vname {} [
        namespace current]::increasecounter $vname $initial
    incr nextid
}

proc increasecounter {cmdname current} {
    set result [expr {$current+1}]
    interp alias {} $cmdname {} [
        namespace current]::increasecounter $cmdname $result
    return $result
}

proc deletecounter {aliasname counter dummy op} {
    interp alias {} $aliasname {}
    # puts "deleted: $counter"
}

} ;# End of namespace

#
# Create a counter
#
Counter::makecounter count1 0
# puts [trace info variable count1]
puts "1? [$count1]"
puts "2? [$count1]"

#
# Copy the counter (not a reference, just a harmless alias)
#
set count2 $count1
puts "3? [$count2]"

#
# Deleting the alias has no effect
#
unset count2
puts "4? [$count1]"

#
# Deleting the true counter does!
#
set count3 $count1
unset count1
puts "5? [$count3]"

Result:

1? 1
2? 2
3? 3
4? 4
invalid command name "::Counter::0"
    while executing
"$count3"
    invoked from within
"puts "5? [$count3]"
"
    (file "counter.tcl" line 52)

Misc

I was pondering the http package; the need to call http::cleanup when done with a token and the potential for leaking memory just seems wrong. So I was thinking about a Tcl-level garbage collector, and came up with the following. I suppose it's a mark & sweep collector of sorts, although it doesn't do any marking or recursive sweep.

proc gc-find pattern {
    set vars [info vars $pattern]
    set searchspace [uplevel info vars]
    foreach var $searchspace {
        if {[uplevel array exists $var]} {
            foreach {k v} [array get $var] {
                check-item $v vars
            }
        } else {
            check-item [uplevel set $var] vars
        }
    }
    return $vars
}

proc check-item {item vars} {
    upvar $vars vlist
    catch {
        foreach el $item {
            set s [lsearch -exact $vlist $el]
            if {$s > -1} {
                set vlist [lreplace $vlist $s $s]
            }
        }
    }
}

One would periodically call it as

foreach tok [gc-find {::http::[0-9]*}] {
    ::http::cleanup $tok
}

It assumes that any tokens will be either an individual item in a list, or a variable by itself, and it doesn't search namespaces other than the root.

RLE 2011-09-22: Has anyone considered that with 8.5+'s dict that the http package could return a result dict instead of a handle to an array? This would result in garbage collection happening automatically when the dict's reference count fell to zero.


AM 2007-12-17: In response to a thread on the comp.lang.tcl group, I experimented a bit with procedure traces. The idea I had was that the usual way of creating objects is to create a new procedure/command. If you want to create a local object, i.e. an object that should only exist during the life-time of a procedure, however, there is no way for Tcl to know that that is what you intended. So there is no way to actually remove it when the procedure returns.

Unless you help it a bit.

And that is what is done in the slightly silly script below:

# gc.tcl --
#     An experiment with garbage-collecting "objects"

# localobj --
#     Create a _local_ object
#
# Arguments:
#     name      Name of the object/command
#
# Result:
#     None
#
# Side effects:
#     Creates a new command and a trace on the _calling_
#     procedure if needed
#
proc localobj name {
    global local_objects

    # Create the object
    proc $name {cmd} {
        if {$cmd eq {run}} {
            puts "[lindex [info level 0] 0]: Boo!"
        } else {
            return -code error "Command unknown: $cmd"
        }
    }

    # Administration:
    # - Store the command for later GC
    # - Add a trace to the caller, if this was not done yet
    # (Take care of global objects though!)
    if {[info level] > 1} {
        set caller [lindex [info level -1] 0]
        if {![info exists local_objects($caller)]} {
            trace add execution $caller leave [list localobj_destroy $caller]
        }
        lappend local_objects($caller) $name
    }
}

# localobj_destroy --
#     Destroy the caller's local objects
#
# Arguments:
#     caller    Name of the caller
#     command   Original command (ignored)
#     code      Return code (ignored)
#     result    Result of the command (ignored)
#     ops       Operation (ignored)
#
# Result:
#     None
#
# Side effects:
#     Destroys all objects created in the caller procedure
#
proc localobj_destroy {caller command code result ops} {
    global local_objects

    foreach obj $local_objects($caller) {
        rename $obj {}
    }
    unset local_objects($caller)
}

# main --
#     Test this
#
proc myproc {} {
    localobj cow1

    puts Myproc
    cow1 run
    cow2 run
    myproc2
}
proc myproc2 {} {
    # localobj cow1 ;# Hm, would override the other one

    puts Myproc2
    cow1 run                ;# cow1 was created by the calling procedure - it is still available. This is a slight flaw ;)
    cow2 run                ;# cow2 was created as a _global_ object, is this a flaw?
}

localobj cow2

myproc

puts Main
cow1    ;# Now object "cow1" no longer exists, so we get an error message
cow2

KD: Wouldn't it be better to call localobj with the name of a local variable, in which the name of the object will then be stored? In this way, Tcl's inherent rules for destroying local variables can be used to destroy the object itself too:

proc localobj_destroy {name args} {
    puts "Destroying $name"
    rename $name {}
}

proc localobj &name {
    global handlecounter
    if {![info exists handlecounter]} {set handlecounter 0}
    upvar 1 ${&name} name
    set name handle#[incr handlecounter]
    puts "Creating variable ${&name} = proc $name"
    proc $name {args} {puts "executing: [info level 0]"}
    trace add variable name unset [list [
        namespace which localobj_destroy] $name]
}

#Testing
proc myproc2 {} {
    localobj foo
    $foo testlocal2
}
proc myproc1 {} {
    localobj foo
    $foo testlocal1
    myproc2
}
localobj foo   ;# this one is in fact global
$foo testglobal
myproc1

Result:

Creating variable foo = proc handle#1
executing: handle#1 testglobal
Creating variable foo = proc handle#2
executing: handle#2 testlocal1
Creating variable foo = proc handle#3
executing: handle#3 testlocal2
Destroying handle#3
Destroying handle#2

AM 2007-12-18: Some discussion on this solution was lost, due to a problem with the disks. However, consider the following fragment:

proc myproc {} {
    localobj foo
    $foo testlocal1
    set bar $foo
    unset foo
    $bar testlocal2
}

If I understand the code correctly, then this won't work as expected: unsetting foo will cause the associated object to disappear, leaving bar to pick up the pieces.

KD: Yes, that's right. By declaring localobj foo, you are signing a contract that the lifetime of the object is tied to the lifetime of the $foo. Usually that's also the lifetime of the procedure call, unless $foo is unset manually.

AM: Hm, it is not a perfect solution, but it does have its attractive points - mine was inspired by a partial/incorrect understanding of incr Tcl. Your solution restricts objects to the procedure that created them (unless you pass them to a called procedure). Good :)

DKF: One solution is to give objects a method that instructs them to "return themselves to the caller", which gives them a chance to manage their reference counting/traces. Another is to allow creators of an object to specify what variable to couple the lifetime to, which allows for management via upvar; NAP does this via the as method IIRC.