Updated 2015-05-18 13:19:17 by AMG

lassign, a built-in Tcl command, unpacks a list into variables.

lassign became a built-in command in Tcl 8.5.

Synopsis  edit

lassign list varName ?varName ...?

Documentation  edit

official reference
TIP 57
proposed making the TclX lassign command a built-in Tcl command

Description  edit

lassign assigns values from a list to the specified variables, and returns the remaining values. For example:
set end [lassign {1 2 3 4 5} a b c]

will set $a to 1, $b to 2, $c to 3, and $end to 4 5.

In lisp parlance:
set cdr [lassign $mylist car]

The k trick is sometimes used with lassign to improve performance by causing the Tcl_Obj hlding $mylist to be unshared so that it can be re-used to hold the return value of lassign:
set cdr [lassign $mylist[set mylist {}] car]

If there are more varNames than there are items in the list, the extra varNames are set to the empty string:
% lassign {1 2} a b c
% puts $a
1
% puts $b
2
% puts $c

%

In Tcl prior to 8.5, foreach was used to achieve the functionality of lassign:
foreach {var1 var2 var3} $list break

DKF: The foreach trick was sometimes written as:
foreach {var1 var2 var3} $list {}

This was unwise, as it would cause a second iteration (or more) to be done when $list contains more than 3 items (in this case). Putting the break in makes the behaviour predictable.

Example: Perl-ish shift  edit

DKF cleverly points out that lassign makes a Perl-ish shift this easy:
proc shift {} {
    global argv
    set argv [lassign $argv v]
    return $v
}

On the other hand, Hemang Lavana observes that TclXers already have lvarpop ::argv, an exact synonym for shift.

On the third hand, RS would use our old friend K to code like this:
proc shift {} {
    K [lindex $::argv 0] [set ::argv [lrange $::argv[set ::argv {}] 1 end]]
}

Lars H: Then I can't resist doing the above without the K:
proc shift {} {
    lindex $::argv [set ::argv [lrange $::argv[set ::argv {}] 1 end]; expr 0]
}

Default Value  edit

FM: here's a quick way to assign with default value, using apply:
proc args {spec list} {
    apply [list $spec [list foreach {e} $spec {
        uplevel 2 [list set [lindex $e 0] [set [lindex $e 0]]]
    }]] {*}$list
}
set L {}
args {{a 0} {b 0} {c 0} args} $L

AMG: Clever. Here's my version, which actually uses lassign, plus it matches lassign's value-variable ordering. It uses lcomp for brevity.
proc args {vals args} {
    set vars [lcomp {$name} for {name default} inside $args]
    set allvals "\[list [join [lcomp {"\[set [list $e]\]"} for e in $vars]]\]"
    apply [list $args "uplevel 2 \[list lassign $allvals $vars\]"] {*}$vals
}

Without lcomp:
proc args {vals args} {
    lassign "" scr vars
    foreach varspec $args {
        append scr " \[set [list [lindex $varspec 0]]\]"
        lappend vars [lindex $varspec 0]
    }
    apply [list $args "uplevel 2 \[list lassign \[list$scr\] $vars\]"] {*}$vals
}

This code reminds me of the movie "Inception" [1]. It exists, creates itself, and operates at and across multiple levels of interpretation. There's the caller, there's [args], there's [apply], then there's the [uplevel 2] that goes back to the caller. The caller is the waking world, [args] is the dream, [apply] is its dream-within-a-dream, and [uplevel] is its dream-within-a-dream that is used to implant an idea (or variable) into the waking world (the caller). And of course, the caller could itself be a child stack frame, so maybe reality is just another dream! ;^)

Or maybe this code is a Matryoshka nesting doll [2] whose innermost doll contains the outside doll. ;^)

Okay, now that I've put a cross-cap in reality [3], let me demonstrate how [args] is used:
args {1 2 3} a b c          ;# a=1 b=2 c=3
args {1 2} a b {c 3}        ;# a=1 b=2 c=3
args {} {a 1} {b 2} {c 3}   ;# a=1 b=2 c=3
args {1 2 3 4 5} a b c args ;# a=1 b=2 c=3 args={4 5}

FM: to conform to the AMG (and lassign) syntax.
proc args {values args} {
    apply [list $args [list foreach e $args {
        uplevel 2 [list set [lindex $e 0] [set [lindex $e 0]]]
    }]] {*}$values
}

both versions seem to have the same speed.

PYK, 2015-03-06, wonders why lassign decided to mess with variable values that were already set, preventing default values from being set beforehand:
#warning, hypothetical semantics
set color green
lassign 15 size color
set color ;# -> green

AMG: I'm pretty sure this behavior is imported from [foreach] which does the same thing. [foreach] is often used as a substitute for [lassign] on older Tcl.

So, what to do when there are more variable names than list elements? I can think of four approaches:

  • Set the extras to empty string. This is current [foreach] and [lassign] behavior.
  • Leave the extras unmodified. This is PYK's preference.
  • Unset the extras if they currently exist. Their existence can be tested later to see if they got a value.
  • Throw an error. This is what Brush proposes, but now I may be leaning towards PYK's idea.

What are the relative merits of each?

Setting to empty string pretty well sucks. Using empty string doesn't always avoid errors, so if that was the goal, it hasn't been met. But maybe the goal was to generate an error so the programmer can find the problem. That goal isn't met either because sometimes the empty string lingers and propagates long before something chokes on it, making it very difficult to find its genesis. Isn't always distinct from other values and thus can't be unambiguously tested to see if assignment happened or not, so again that goal (if it was a goal) hasn't been met. I think the point was to minimize the odds of a variable containing a leftover value from the previous iteration of [foreach].

Unsetting if existent is highly likely to yield errors in short order when nonexistent variables are accessed. Variable existence is always distinct from any possible variable value, so it provides an unambiguous way to check if assignment happened. That's all pretty good.

Immediately throwing an error is the only surefire way to instantly pinpoint a [foreach] or [lassign] that was short on values. But it's also very inflexible because there may be legitimate reasons to have fewer values, such as in PYK's example.

PYK's approach is interesting. It makes the other conceivable modes possible just by letting the script first [unset -nocomplain] the variables or set them to empty string. Then variables whose failure to be set is truly an error can be tested, and if they remain unset after [foreach] or [lassign], the script can throw an error in the manner of its choice, or recover using whatever mechanism is appropriate to the application. And last, this opens the possibility of setting default values other than empty string. So all that is wonderful.

The drawback is that it takes more code on the part of the script to (1) set (or unset) default values before calling [lassign], then (2) check for unset values after. In the case of [foreach] this would mean having to put [set] or [unset] calls at the end of each loop body (and before each [continue]) plus existence tests at the beginning of each loop body.

The trouble is that no one approach fulfills every script's requirements, whereas getting out of the way so the script can take care of itself will necessarily lengthen the script.

Let's think again about unsetting variables that aren't given a value. If the script cares, it can check for variable existence, then [error] or set a default value as appropriate. This avoids the setup time that PYK's approach requires prior to calling [lassign] or going to the next iteration of [foreach]. And if the script doesn't care (i.e. is careless), there will be an error the first time it tries to get the value of an unset variable. So now I tend to think this is the superior approach.

Gotcha: Ambiguity List Items that are the Empty String  edit

CMcC 2005-11-14: I may be just exceptionally grumpy this morning, but the behavior of supplying default empty values to extra variables means you can't distinguish between a trailing var with no matching value, and one with a value of the empty string. Needs an option, -greedy or something, to distinguish between the two cases. Oh, and it annoys me that lset is already taken, because lassign doesn't resonate well with set.

Kristian Scheibe: I agree with CMcC on both counts - supplying a default empty value when no matching value is provided is bad form; and lset/set would have been better than lassign/set. However, I have a few other tweaks I would suggest, then I'll tie it all together with code to do what I suggest.

First, there is a fundamental asymmetry between the set and lassign behaviors: set copies right to left, while lassign goes from left to right. In fact, most computer languages use the idiom of right to left for assignment. However, there are certain advantages to the left to right behavior of lassign (in Tcl). For example, when assigning a list of variables to the contents of args. Using the right to left idiom would require eval.

Still, the right-to-left behavior also has its benefits. It allows you to perform computations on the values before performing the assignment. Take, for example, this definition of factorial (borrowed from Tail call optimization):
proc fact0 n {
    set result 1.
    while {$n > 1} {
        set result [expr {$result * $n}]
        set n [expr {$n - 1}]
    }
    return $result
}

Now, with lassign as currently implemented, we can "improve" this as follows:
proc fact0 n {
    set result 1.
    while {$n > 1} {
        lassign [list [expr {$result * $n}] [expr {$n - 1}]] result n
    }
    return $result
}

I'm hard-pressed to believe that this is better. However, if we changed lassign to be lassign vars args, we can write this as:
proc fact0 n {
    set result 1.
    while {$n > 1} {
        lassign {result n} [expr {$result * $n}] [expr {$n - 1} ]
    }
    return $result
}

To my eye, at least, this is much more readable.

So, I suggest that we use two procedures: lassign and lassignr (where "r" stands for "reverse"). lassign would be used for the "standard" behavior: right to left. lassignr would then be used for left to right. This is backwards from the way it is defined above for TclX and Tcl 8.5. Nonetheless, this behavior aligns better with our training and intuition.

Also, this provides a couple of other benefits. First, the parallel to set is much more obvious. lassign and set both copy from right to left (of course, we are still left with the asymmetry in their names - I'll get to that later). And, we can now see why assigning an empty string to a variable which doesn't have a value supplied is bad form; this is not what set does! If you enter set a you get the value of $a, you don't assign the empty string to a. lassign should not either. If you want to assign the empty string using set you would enter:
set a {}

With lassign, you would do something similar:
lassign {a b c} 1 {}

Here, $a gets 1, $b gets the empty string, and $c is not touched. This behavior nicely parallels that of set, except that set returns the new value, and lassign returns the remaining values. So, let's take another step in that direction; we'll have lassign and lassignr return the "used" values instead.

But this destroys a nice property of lassign. Can we recover that property? Almost. We can do what proc does; we can use the "args" variable name to indicate a variable that sucks up all the remaining items. So, now we get:
lassign {a b args} 1 2 3 4 5 6

$a gets 1, $b gets 2, and args gets 3 4 5 6. Of course, we would make lassignr work similarly:
lassignr {1 2 3 4 5 6} a b args

But, now that we have one of the nice behaviors of the proc "assignment", what about that other useful feature: default values? We can do that as well. So, if a value is not provided, then the default value is used:
lassign {a {b 2}} one

$b gets the value 2. This also provides for the assignment of an empty list to a variable if the value is not provided. So, those who liked that behavior can have their wish as well:
lassign {a {b {}}} one

But simple defaults are not always adequate. This only provides for constants. If something beyond that is required, then explicit lists are needed. For example:
lassign [list a [list b $defaultb]] one

This gets to be ugly, so we make one more provision: we allow variable references within the defaults:
lassign {a {b $defaultb}} one

Now, this really begins to provide the simplicity and power that we should expect from a general purpose utility routine. And, it parallels other behaviors within Tcl (proc and set) well so that it feels natural.

But we're still left with this lassign/set dichotomy. We can't rename lassign to be lset without potentially breaking someone's code, But notice that lassign now provides features that set does not. So, instead, let's create an assign procedure that provides these same features, but only for a single value:
assign {x 3} 7

Sets x to 7. If no value is provided, x will be 3.

So, we now have three functions, assign, lassign, and lassignr, that collectively provide useful and powerful features that, used wisely, can make your code more readable and maintainable. You could argue that you only "need" one of these (pick one) - the others are easily constructed from whichever is chosen. However, having all three provides symmetry and flexibility.

I have provided the definitions of these functions below. The implementation is less interesting than the simple power these routines provide. I'm certain that many of you can improve these implementations. And, if you don't like my rationale on the naming of lassignr; then you can swap the names. It's easy to change other aspects as well; for example, if you still want lassign to return the unused values, it's relatively easy to modify these routines.
proc assign {var args} {
    if {[llength $var] > 1} {
        uplevel set $var
    }
    uplevel set [lindex $var 0] $args
}

proc lassign {vars args} {
    if { ([lindex $vars end] eq "args") && ([
        llength $args] > [llength $vars])} {

        set last [expr {[llength $vars] - 1}]
        set args [lreplace $args $last end [lrange $args $last end]]
    }

    #This is required so that we can distinguish between the value {} and no
    #value
    foreach val $args {lappend vals [list $val]}
    foreach var $vars val $vals {
        lappend res [uplevel assign [list $var] $val]
    }
    return $res
}

proc lassignr {vals args} {
    uplevel lassign [list $args] $vals
}

slebetman: KS, your proposal seems to illustrate that you don't get the idea of lassign. For several years now I have used my own homegrown proc, unlist, that has the exact same syntax and semantics of lassign. The semantics behind lassign is not like set at all but more like scan where the semantics in most programming languages (at least in C and Python) is indeed assignment from left to right. The general use of a scanning function like lassign is that given an opaque list (one that you did not create) split it into individual variables.

If you really understand the semantics lassign was trying to achieve then you wouldn't have proposed your:
lassign vars args

To achieve the semantics of lassign but with right to left assignment you should have proposed:
lassign vars list

Of course, your proposal above can work with Tcl8.5 using {*}:
lassign {var1 var2 var3} {*}$list

But that means for 90% of cases where you would use lassign you will have to also use {*}. Actually Tcl8.4 already has a command which does what lassign is supposed to do but with a syntax that assigns from right to left: foreach. Indeed, my home-grown unlist is simply a wrapper around foreach as demonstrated by sbron above. With 8.4, if you want lassign-like functionality you would do:
foreach {var1 var2 var3} $list {}

Kristian Scheibe: slebetman, you're right, I did not get that the semantics of lassign (which is mnemonic for "list assign" should match those of scan and not set (which is a synonym for assign). Most languages refer to the operation of putting a value into a variable as "assignment", and, with only specialized exception, this is done right-to-left. I'm certain that others have made this same mistake; in fact, I count myself in good company, since the authors of the lassign TIP 57 for Tcl 8.5 make the same assumption:

It would be more logical if the developer could write the following:
set {x y} [LocateFeature $featureID]

or
mset {x y} [LocateFeature $featureID]

So, you see, when TIP #57 was written, the thinking was in terms of right-to-left semantics (and, as a synonym to set). The fact that they decided to go with the left-to-right semantics was likely due to the issue that you point out about requiring an eval or {*}, not that they were thinking in terms of scan.

But, your point does suggest that a better name for lassign with left-to-right semantics is lscan. However, I was not ignorant of the concern with requiring eval or {*}. I was not proposing eliminating the left-to-right semanitcs, but adding the right-to-left semantics. What was left was more a discussion of proper naming. I think that you have helped to show that lassign should be used for right-to-left semantics, and lscan for left-to-right.

You also did not get that I had much more to say than simply whether lassign should be left-to-right or right-to-left. For example, I made a proposal to allow for default values to be supplied when no value is available in the scan/assignment, in the manner of proc.

This is not a trivial concern. Take, for example, the behaviors of scan and regexp. regexp assigns {} to a variable argument if there is no matching value in the string (but, only if the match succeeded). However, scan only assigns values to those variables for which it finds a match; an unmatched variable is not touched.

This leads to two different idioms for providing default values. The following example, based on an example in [4], uses scan and regexp to parse a simple color specification of the form #RRGGBB, where RR, GG, and BB are 2-digit hex numbers. If a value is missing (eg, #5004), then it defaults to 80.
##  Using [scan]
set r 80
set g 80
set b 80
scan $rgb #%2x%2x%2x r g b
set resultRgb [list $r $g $b]

##  Using [regexp]
regexp {$#(..)?(..)?(..)?^} $rgb r g b
if {! [llength $r]} {set r 80}
if {! [llength $g]} {set g 80}
if {! [llength $b]} {set b 80}
set resultRgb [list $r $g $b]

As you can see, the idioms required are different in each case. If, as you're developing code, you start with the scan approach, then decide you need to support something more sophisticated (eg, you want to have decimal, octal, or hex numbers), then you need to remember to change not just the parsing, but the method of assigning defaults as well.

This also demonstrates again that providing a default value (eg, {}) when no value is provided really ought to be defined by the application and not the operation. The method of using defaults with scan is more straightforward (and amenable to using lassign or [lscan]) than the method with regexp.

The solution that I proposed was to make applying defaults similar to the way that defaults are handled with proc: {var dflt}. In fact, I would go a step farther and suggest that this idiom should be available to all Tcl operations that assign values to variables (including scan and regexp). But, I think that this is unlikely to occur, and is beyond the scope of what I was discussing.

The real point of my original posting was to demonstrate the utility, flexibility, power, and readability or using this idiom. I think it's a shame to limit that idiom to proc. The most general application of it is to use it for assignment, which is what I showed.

slebetman: I agree with the defaults mechanism. Especially since we're so used to using it in proc. I wish we have it in all commands that assigns values to multiple variables:
lassign $foo {a 0} {b {}} {c none}
scan $rgb #%2x%2x%2x {r 80} {g 80} {b 80}
regexp {$#(..)?(..)?(..)?^} $rgb {r 80} {g 80} {b 80}
foreach {x {y 0} {z 100}} $argv {..}

I think such commands should check if the variable it is assigning to is a pair of words of which the second word is the default value. Assigning an empty string have always seemed to me too much like the hackish NULL value trick in C (the number of times I had to restructure apps because the customer insisted that zero is a valid value and should not signify undefined...).

The only downside I can think of is that this breaks apps with spaces in variable names. But then again, most of us are used to not writing spaces in variable names and we are used to this syntax in proc.

BTW, I also think lassign is a bad name for this operation. It makes much more sense if we instead use the name lassign to mean assign "things" to a list which fits your syntax proposal. My personal preference is still unlist (when we finally get 8.5 I'll be doing an interp alias {} unlist {} lassign). lscan doesn't sound right to me but lsplit sounds just right for splitting a list into individual variables.

DKF: The name comes from TclX. Choosing a different name or argument syntax to that very well known piece of code is not worth it; just gratuitous incompatability.

fredderic: I really don't see what all the fuss is about. lassign is fine just the way it is, lset is already taken, anything-scan sounds like it does a heck of a lot more than just assigning words to variables, and the concept of proc-like default values just makes me shudder... Even in the definition of a proc! ;)

Something I would like to see, is an lrassign that does the left-to-right thing, and maybe some variant or option to lassign that takes a second list of default values:
lassign-with-defs defaultsList valuesList ?variable ...?

where the defaults list would be empty-string-extended to the number of variables given (any extra defaults would simply be ignored), the values list wouldn't (any extra values would be returned as per usual), so you'd end up with:
lassign-with-defs {1 2 3} {a {}} w x y z

being the equivalent of:
set w a    ;# from values list
set x {}   ;# also from values list
set y 3    ;# 3rd default value carried through
set z {}   ;# empty-string expanded defaults
# with both arguments consumed, and empty string is returned

The old filling-with-empty-strings lassign behaviour would thus be achieved by simply giving it an empty default values list, and the whole thing would be absolutely fabulous. ;)

Of course, the catch is that if you simply take away the filling-with-empty-strings behaviour from lassign, then the defaults capability is created by simply doing two lassigns. A little wasteful, perhaps (possibly problematic if variable write traces are involved), but still better than most of the alternatives. (Perhaps a third argument to lrepeat would fulfill the empty-string-filling requirement by accepting an initial list, and repeatedly appending the specified item until the list contains at least count words? I can imagine several occasions where that could be handy.)

Ed Hume: I think the syntax of lassign is not as useful as having the value list and the variable name list being of similar structure:
vset {value1 value2 value3 ...} {name1 name2 name3 ...}

I have provided an lset command since Tcl 7.6 in my toolset which was renamed to vset with Tcl 8.4. Having both the names and values as vectors allows you to easily pass both to other procedures without resorting to a variable number of arguments. It is a common idiom to assign each row of table data to a list of column names and work with it:
foreach row $rows {
    vset $row $cols
    # now each column name is defined with the data of the table row
    # ...
}

A second significant advantage of this syntax is that the structure of the names and the values are not limited to vectors. The vset command is actually a simplified case of the rset command which does a recursive set of nested data structures:
rset {1 2 3} {a b c}
# $a is 1, $b is 2, ...
rset {{1.1 1.2 1.3} 2 {3.1 3.2}} {{a b c} d {e f}}
# $a is 1.1, $b is 1.2, $f is 3.2,....

The syntax of vset and rset lend themselves to providing an optional third argument to provide default values in the case where empty values are not desired. So this is a cleaner implementation of frederic's lassign-with-defaults - the defaults values can have the usual empty string default.

Now that Tcl has the expansion operator, the difference between lassign and vset is not as important as it was, but I do think vset is a lot more powerful.

DKF: Ultimately, we went for the option that we did because that was what TclX used. However, a side-benefit is that it also makes compiling the command to bytecode much easier than it would have been with vset. (Command compilers are rather tricky to write when they need to parse apart arguments.)

Script Implementation  edit

Both the built-in lassign and the TclX lassign are faster than the scripted implementations presented below.

KPV: For those who want to use [lassign] before Tcl 8.5, and without getting TclX, here's a tcl-only version of lassign:
if {[namespace which lassign] eq {}} {
    proc lassign {values args} {
        set vlen [llength $values]
        set alen [llength $args]

        # Make lists equal length
        for {set i $vlen} {$i < $alen} {incr i} {
            lappend values {}
        }
        uplevel 1 [list foreach $args $values break]
        return [lrange $values $alen end]
    }
}

jcw: Couldn't resist rewriting in a style I prefer. Chaq'un son gout - a matter of taste - of course:
if {[info procs lassign] eq {}} {
    proc lassign {values args} {
        while {[llength $values] < [llength $args]} {
            lappend values {}
        }
        uplevel 1 [list foreach $args $values break]
        lrange $values [llength $args] end
    }
}

KPV: But from an efficiency point of view, you're calling llength way too many times--every iteration through the while loop does two unnecessary calls. How about this version -- your style, but more efficient:
if {[namespace which lassign] eq {}} {
    proc lassign {values args} {
        set alen [llength $args]
        set vlen [llength $values]
        
        while {[incr vlen] <= $alen} {
            lappend values {}
        }
        uplevel 1 [list foreach $args $values break]
        lrange $values $alen end
    }
}

jcw interjects: Keith... are you sure llength is slower? (be sure to test inside a proc body)

kpv continues: It must be my assembler/C background but I see those function calls, especially the one returning a constant value and wince. But you're correct, calling llength is no slower than accessing a variable. It guess the byte compiler is optimizing out the actual call.

DKF: llength is indeed bytecoded.

sbron: I see no reason to massage the values list at all. foreach will do exactly the same thing even if the values list is shorter than the args list. I.e. this should be all that's needed:
if {[namespace which lassign] eq {}} {
   proc lassign {values args} {
       uplevel 1 [list foreach $args $values break]
       lrange $values [llength $args] end
   }
}

RS: Yup - that's the minimality I like :^)

This version does not work as described in the documentation for lassign for this case:
% lassign [list] a b c
% set a
can't read "a": no such variable

sbron: You are right, I just noticed that myself too. Improved version:
if {[namespace which lassign] eq {}} {
   proc lassign {values args} {
       uplevel 1 [list foreach $args [linsert $values end {}] break]
       lrange $values [llength $args] end
   }
}

AMG: I prefer to use catch to check for a command's existence. Not only will catch check if the command exists, but it can also check if it supports an ensemble subcommand, an option, or some syntax. Plus it works with interp alias, a clear advantage over info commands.
if {[catch {lassign {}}]} {
    proc lassign {vals args} {
        lappend vals ""
        uplevel 1 [list foreach $args $vals break]
        lrange $vals [llength $args] end-1
    }
}

Open Questions  edit

JMN: tclX doesn't seem to use a separate namespace for its commands so if we do a 'package require tclX' in Tcl 8.5+, which version of a command such as lassign will end up being used?