Updated 2014-04-08 15:11:01 by pooryorick

A reference is a generic term for a value that refers to data that is stored somewhere. Some important examples are:

  • Pointers (as in e.g. C)
  • Names of Tcl variables (see upvar)
  • ...

A feature that is often expected from references in high-level languages is that storage should automatically disappear (be freed/deallocated) when there are no more references to it. That kind of reference does not exist in Tcl (but also see Tcl Handles). Some people coming from other languages are disturbed by this, but one doesn't need that kind of generic reference as often as these people think. [Explain garbage collection and the like.]

[It might also be reasonable to explain the distinctions between value, variable, and reference. Some languages have variables whose values are either real values if those values are primitive types or references if the values are structured types. Not all languages make the same distinctions. Java, Icon, and Smalltalk all have slightly different implementations of such things.]

References touch upon one of the ways in which Tcl is different from many other languages. What could be called the "standard model" for complex data structures is that:

  • all values are atomic (scalar), structure (complexity) comes from how they are stored.

A basic problem in this model is how to represent a thing that can contain other similar things. The solution is to introduce references: scalar values that let one access a (usually composite) storage. Some old languages (Algol, old Fortran) couldn't do this in general and as a result had severe restrictions, whereas others (Pascal) just barely provide the basic functionality. C and Lisp were probably the first languages in their respective families to have all the kinks worked out, and owe some of their popularity to that fact. In low-level languages the references are just bare memory addresses (pointers) to a block of memory and the structuring of storage is just a way of labeling offsets within this memory block. In higher-level languages there usually exist more flexible ways of structuring storage (variable-length lists, "hashes", etc.) but the role of the references is the same. Every data structure that is not predefined in the language has to be implemented by connecting more basic data structures using references.

Tcl does not follow this standard model, because in Tcl everything is a string. Since a list can contain arbitrary strings it follows that a list can be a list of other lists, or a list of lists of lists, or whatever, nested to arbitrary depth. The basic problem of a thing containing other similar things can be solved without resorting to references. (Although of course, since Tcl is implemented in C, there are references "under the hood". Keeping them hidden is however often a strength!)
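As a small illustration of the paragraph above (a sketch only; the variable names and the data are invented for the example), a nested list is an ordinary value that can be built, inspected and copied without any reference appearing at the script level:

```tcl
# A list whose elements are themselves lists, nested two levels deep.
set tree {root {left {a b}} {right {c d}}}

# lindex accepts several indices to reach into nested lists.
set leaf [lindex $tree 2 1 0]      ;# the element "c"

# Copying is plain assignment; changing the copy leaves the original alone.
set copy $tree
lset copy 2 1 0 C
```

After this, `[lindex $tree 2 1 0]` is still `c`, while the same index into `$copy` gives `C`.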

SS notes that the point is not nesting, but linking. Also note that Tcl already uses (non-garbage-collected) references: when you pass/return a variable name in place of its content you are using something similar to a reference. The same holds when you use a namespace name (one that encapsulates an object) with some kind of OO extension, or any other kind of name. Another example is when you pass/return a key of an array in place of the value stored under it. All of these are references.
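To make two of these cases concrete, here is a minimal sketch (the proc name `incrAll` and the data are hypothetical, chosen only for illustration). The caller hands over a variable name, and `upvar` turns that name back into access to the storage; an array key plays the same role for an array element:

```tcl
# Hypothetical helper: the caller passes the *name* of a variable,
# and upvar links a local alias to it.
proc incrAll {listName} {
    upvar 1 $listName lst
    set result {}
    foreach x $lst {
        lappend result [expr {$x + 1}]
    }
    set lst $result
}

set numbers {1 2 3}
incrAll numbers          ;# pass the name, not $numbers

# An array key works the same way: the key stands in for the stored value.
array set price {apple 3 pear 5}
set key pear
set cost $price($key)
```

Nothing here is collected automatically: `numbers` and `price` live until they are unset or their scope ends, no matter how many names still point at them.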

Lars H: The point is that in the standard model even nesting requires references in the language, whereas Tcl can do nesting without resorting to references. Surprisingly much of what people think they need automatically garbage-collected references for is precisely nesting.

SS: Standard model? Well-known high-level languages such as Perl and Python both have references, but neither requires explicit references to do nesting: they allow lists of lists and arrays of arrays (Tcl seems to be lacking even in this respect, thanks to the wonderful arrays we had before dict). Of course these languages *actually* use references to create these structures, so if you pass a list of lists to a function that alters it without using __deepcopy__ or something like it you are going to see the difference, but you can still write: y = [1, 2, 3, [5, 6, 7]]. Even PHP is able to nest arrays (and in that case I think they are passed by value, as in Tcl).

So you are comparing a low-level language like C with Tcl, *if* by standard model you mean C. Also note that in C, FORTH and so on, nesting requires user-handled pointers because they use non-automatic memory management. You can still write a couple of functions in C that deal with strings representing Tcl lists held in a single big string, so even without pointers C is able to nest.

We should really not get used to Tcl design peculiarities, like broken arrays, the inability to build linked data structures, and so on. We have a great language because it is simple, able to do great abstractions with strings, and has great introspection capabilities that allow us to rewrite most of the language itself, to the point of creating OO systems, new control structures, and many other things that are otherwise a privilege of Lisp, Smalltalk, and stack-based languages. All this with an ability to deal with real-world problems that seems quite superior to other (possibly more powerful) languages that bring the same freedom to the programmer.

But there are things that are without doubt a limit. References are in the nature of computation: the ability for two objects to refer to a third object without each holding a copy of it is something you cannot do without for some kinds of work.

I agree that "everything is a value" is still an advantage for Tcl: changing it to have implicit references like Lisp/Python/... would not be a win for Tcl, really, but why should we not have something like TclRef? It is designed so that references are explicit: strings that identify containers, which are automatically emptied once no one is interested in the content any more.

Lars H: I do not think the "standard model" is something which applies only to such low-level languages as C. Lisp is included in the description above. I know that Postscript follows the standard model (and the manual has to be quite explicit about it, since you're not allowed to store a reference to local VM thingie in global VM); even strings are not atomic. I don't have personal experience of Perl, but http://www.garshol.priv.no/download/text/perl.html#id3.3. (the final period is part of the URL, the Wiki gets it wrong for some reason, even when between brackets) claims that references are indeed necessary to work with lists of lists in that language.

One language which has both structured values and references is the Maple [1] command language, and at least in that case the structured values are straightforward whereas the references are a constant source of trouble. The most striking example comes from comparing lists and vectors. Maple lists are values just like Tcl lists. Maple vectors are a special case of Maple arrays, which are a special case of Maple tables, and these are referenced storages (presumably not unlike Perl hashes). Both look precisely the same when shown in the worksheet, but they behave very differently.
 L:=[1,t,t^2]; f:=unapply(L,t);

creates a function f such that f(2) returns [1,2,4], but
 v:=vector([1,t,t^2]); g:=unapply(v,t);

creates a function g which returns the vector [1,t,t^2] no matter what you give it. To get g to work in the same way as f, you need to dereference v before handing it to unapply.

As for TclRef, does it obey the idiom that everything is a string? What happens if someone goes
  eval lappend refs [linsert $otherRefs 0 [newRef xxx]]

will the things appended to $refs still be references? (Sure, it is silly to do it that way, and in Tcl 8.5 there is {*}, but for Tcl values it should work! If it doesn't then the ability to embed values in scripts has been severely limited.)
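For ordinary Tcl values the two forms mentioned above give the same result. The sketch below uses plain strings (`r0`, `r1`, `r2`) as stand-ins for references; whether real TclRef references survive the trip through eval is exactly the open question and is not answered by this example:

```tcl
set otherRefs {r1 r2}

# Tcl 8.4 style: eval re-parses the concatenated string.
set refsA {}
eval lappend refsA [linsert $otherRefs 0 r0]

# Tcl 8.5+ style: {*} expands the list into separate arguments.
set refsB {}
lappend refsB {*}[linsert $otherRefs 0 r0]
```

Both `refsA` and `refsB` end up as the three-element list `r0 r1 r2`, which is the behaviour a string-encoded reference would have to preserve.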

davidw Everything is not a string in Tcl. Files and namespaces come to mind right away. In my years on this planet, I have developed a healthy skepticism about "every" and "any". Usually (not every time;-) they are wrong.

Discussion moved here from Why adding Tcl calls to a C/C++ application is a bad idea:

davidw - And this is still an area where Python has us beat:

  • Python has one, reasonably good OO system
  • Python objects are GC'ed, making them easier to use in scripts. Tcl objects must be destroyed by the programmer.
  • Python has references, which are also useful for doing data structures.

Of course, it's not impossible to do this in Tcl... there are numerous testaments to the intelligence and cleverness of Tcl programmers on this wiki that put paid to that notion. However, what we need to keep in mind is the programmer examining both languages for the first time.

It is not obvious how to handle complex data structures in Tcl, as compared with Python, Ruby or other scripting languages.

?: Some of these arguments are not correct:

  • XOTcl is a better object system than Python's. Tcl is flexible enough to write OO systems in (see snit).
  • In XOTcl not all objects need to be explicitly destroyed (even without GC). There are subobjects and volatile objects that are destroyed automatically.

Artur Trzewik: In fact XOTcl object names are references. Objects are referenced by their names. I do not see a big difference among
    Point *point = Alloc(sizeof(Point));

    Point *point = new Point();
    delete point;

    Point point = new Point();
    point = null;

    | point |
    point := Point new.
    point := nil.

    set point [Point new]
    $point destroy

    set point [Point new -volatile] ;# if the object should live only in the current block

In all cases point is some kind of reference to an object (structure) of type Point.

Memento: By using OO in Tcl we do not really need any additional references.

About GC in OO Tcl: I have programmed in this language for several years, and after a long period of Smalltalk programming I would also like GC in Tcl. But so far I have not missed GC much in XOTcl. There are techniques like volatile objects and subobjects that make destroying many objects very simple. The C++ problems with forgotten objects (dangling objects) do not occur in XOTcl. On the other hand, releasing (destroying) an object is a very important point in its lifetime that should be controlled by the program. I have seen many problems with Smalltalk memory handling because the programmer had forgotten to do this magic ( point := nil.) in some global references. Even with GC you have to take care of releasing objects, and you do not have full control over it.

It is also interesting to see some C++ programmers' arguments against GC.

davidw - XOTcl looks ok, but it's not available in ActiveTcl or the default starkit setup and there are no books about it. Even the .deb's have been pulled (I might be convinced to work on fixing at least that...). This is the problem with not having one standard thing.

Also, the 'volatile' objects, from what I see in the docs, depend on living in a particular Tcl procedure. I don't want that; I want something that goes away when nothing else references it.

The object system may be 'better' than python's or ruby's or something else, but for the new user, the situation looks like a mess. It's something I think we'll have to face up to if we want to program in this "Tcl is the controlling application" style, and promote it to the world at large.

Lars H: IMO, references are evil. I cannot recall a single case where using references would have simplified a programming task or would have led to a better solution. Indeed, I do not know of any problem where the asymptotic complexity is smaller in a computing model with references than it is in current Tcl (i.e., without references). But feel free to enlighten me if you have an example.

davidw: The classic example of having references/OO/GC make life easier is the http package. It requires you to do a 'cleanup', because otherwise it will never be GC'ed.

jcw - Another example is when you want to create Tcl commands and nest them yet clean up automatically. The Vkit is a vector engine page goes into an example and tries to find a solution, but it really applies to a lot of problem domains, let's say matrices:
   set a [$matrix invert]
   set b [$a transpose]
   set c [$b invert]
   $c print

I'd like to write:
   [[[$matrix invert] transpose] invert] print

And have everything clean up after use, i.e. the Tcl commands. That's where Python's "matrix.invert().transpose().invert().print()" makes life a lot easier. Even C++'s constructors/destructors with temporaries support such an idiom, with automatic cleanup.

Lars H: The catch in both examples is that they are examples of idioms that require references rather than of actual problems (i.e., something which is solved with an algorithm) that need them. HTTP communication is, as I understand it, completely stateless -- hence there should be no need to create anything that needs to be GC'ed. I find the matrix example almost preposterous. Matrices are data, so why should they ever be made into commands? What is wrong with
  print [invert [transpose [invert $matrix]]]
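A runnable sketch of this value-based style, under stated assumptions: matrices are represented as lists of rows, and the procs `transpose` and `negate` are illustrative stand-ins written for this example (a real `invert` is left out for brevity, and neither proc comes from any library):

```tcl
# Matrices as plain values: a list of rows.
proc transpose {m} {
    set result {}
    set ncols [llength [lindex $m 0]]
    for {set j 0} {$j < $ncols} {incr j} {
        set col {}
        foreach row $m {
            lappend col [lindex $row $j]
        }
        lappend result $col
    }
    return $result
}

proc negate {m} {
    set result {}
    foreach row $m {
        set newRow {}
        foreach x $row {
            lappend newRow [expr {-$x}]
        }
        lappend result $newRow
    }
    return $result
}

set matrix {{1 2 3} {4 5 6}}
set out [negate [transpose $matrix]]
```

The intermediate result of `transpose` is never named and needs no cleanup, and `$matrix` itself is left unchanged.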


RS: The former form can be read from left to right to indicate the temporal order of the nested methods, while the latter requires the reader to read from right to left, as usual with nested functions (and APL), but less in Western cultures...

jcw - Lars, you say "matrices are data" - I see them as objects. Instances of a class, in OO terms (as in Python, C++, Smalltalk). An example which interests me more than matrices btw, is relational algebra [2]. If you brush aside examples where Tcl commands are used as objects, and with it the issue of object cleanup, so be it. Your example uses a namespace to identify operators and apply them to data, the OO model takes objects and makes them respond to methods/messages. Polymorphism, encapsulation - you're free to not care about that, of course.

Lars H: Well, I suppose the OO hype must have missed me, because I tend not to see things as objects, and since this furthermore seems to save me a lot of trouble I'm only glad for it. Polymorphism does not require referenced objects.

I can agree that the "object on which a method is acting" view is sometimes appropriate, and I have on some occasions suggested using it, but matrix arithmetic hardly lends itself to that view. When all operations create new things from old then it isn't objects you're working with, but values! The Ratcl you refer to looks like a very striking example of this -- is there anything (apart from your preferences with respect to syntax) that prevents you from implementing views as Tcl values? (And you can of course have any syntax you like, if you just bite the bullet and create a proper little language.)

jcw - State.

Peter da Silva: The problem with the idea that "references are evil" is that Tcl is full of references. $foo is following a reference to the value of foo. [foo] is following a reference to the procedure called foo. The only difference is that in Tcl there are no exposed "unnamed" references, and all dereferencing operations are symbol table lookups. But Tcl post 8.0 is no longer really an "everything is a string" language, it's an "everything can be made to look like a string" language. So there's no reason not to have unnamed references, they can be efficiently thrown around and shimmered into some unique token (eg ::ref::array_40AB65003) when they're used in a string context.

Lars H: The controversial thing about references is whether you should be allowed to modify referenced data. Right now, if you have some value, you have precisely that value and no other part of the program is allowed to change it when you're not looking. This is not the case in languages with explicit references to things. (RS: ..and Tcl's "value security" is a great help for coders who know it. In fact, TOOT shows a way to "pure-value OO", where methods take one value and return another, but don't mutate a thing in place - that's transparently done with variable assignment only.)
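The "value security" described above can be seen in a few lines. This is a minimal sketch; the proc name `addItem` and the data are hypothetical:

```tcl
# Hypothetical proc that modifies its argument locally.
proc addItem {items} {
    lappend items extra     ;# changes only the local copy
    return $items
}

set original {a b c}
set changed [addItem $original]
```

After the call `$original` still holds the three elements `a b c`, and only `$changed` has the fourth element. The caller sees a change only if it assigns the returned value back to its own variable.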

Also, the "everything can be made to look like a string" claim is a bit too weak. It's rather "everything can be fully encoded as a string".

Peter da Silva: Everything can be fully encoded as a string, but that doesn't mean you don't have references: there are still references encoded as a string that are distinct from the fully encoded form.
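 # Receives the *name* of an array; changes are visible to the caller.
 proc refproc {ref} {
   upvar 1 $ref ary

   puts $ary(3)
   set ary(4) four
 }

 # Receives the array's contents as a value; changes stay local.
 proc valproc {val} {
   array set ary $val

   puts $ary(3)
   set ary(4) four
 }

 array set name {1 one 2 two 3 three}
 valproc [array get name]
 refproc name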

The question shouldn't be "should Tcl have references", the question is "should Tcl have anonymous references". Tcl code is full of little symbol generators that convert references into text that can be manipulated. So you get little generated names like "file5" that should really be the "stringized" version of anonymous references. For most programs they'd never get converted to a string or looked up in a symbol table, except that a text string is the only way you have to refer to a structured or opaque object.

I was talking to Karl about this the other day and we both wish we had done a better job on arrays, but there are lots of other opaque objects that really should be moved out of the namespace. There are two different kinds of string conversion you really need to be able to apply to them, too: one that gives you a token that you can use in the current context, and one that describes them, like the difference between "arrayname" and "array get arrayname".
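The two kinds of string conversion can be shown with today's arrays. This is only a sketch, and the variable names `colors`, `token`, `description` and `restored` are made up for it:

```tcl
array set colors {red #f00 green #0f0}

# Token: the name, usable only where the array itself is reachable.
set token colors
set viaToken [set ${token}(green)]

# Description: a key/value list that fully encodes the contents.
set description [array get colors]

# The description can rebuild an independent array anywhere.
array set restored $description
```

The token is short but meaningless outside the scope that holds `colors`; the description is self-contained but is a copy, so later changes to `colors` do not show up in `restored`.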

Anyway, I agree that pointer-style references are generally to be avoided, but there are other kinds of references that Tcl uses only clumsily right now.

See Also

linked lists
Tcl references in Tcl
an implementation of references for Tcl