Version 52 of Quoting hell

Updated 2012-11-10 16:04:58 by AMG

If you think you're in "quoting hell" with Tcl, it's almost certain that you're making a mistake. Tcl's grammar (see "BNF for Tcl") is designed to be particularly simple, and we've seen over and over again that resorting to complicated quoting schemes is almost always the result of failing to understand a much easier approach.

[Examples appear in FMM.]

[Editorial comments]

TV (27 Oct '04) In good democratic tradition, see also: Quoting Heaven!.


"Is white space significant in Tcl" covers much of the material pertinent to "Quoting hell".


Times when quoting acrobatics are necessary:

  • traces [but show alternatives] ...
  • Vignette (but this can be largely avoided in 5.0 and later by using the SOURCE command)
  • other third-party APIs
  • when regular expressions are involved (regexp, regsub, ...), because these have different quoting rules than Tcl itself [give example ("{...\n...}" ...)]
  • ...
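As a small illustration of the regular expression case, here is a sketch (the sample string and variable names are made up for this example) that writes one pattern twice. In braces only the regexp escapes are needed; in double quotes every backslash and the trailing dollar sign must also be protected from Tcl itself.

```tcl
# Assumed sample input for the illustration.
set s "3.14"

# Braces: Tcl passes the pattern through untouched, so only the
# regular-expression escapes need to be written.
set m1 [regexp {^\d+\.\d+$} $s]

# Double quotes: Tcl processes backslashes and "$" first, so each
# one meant for the regexp engine must be escaped a second time.
set m2 [regexp "^\\d+\\.\\d+\$" $s]

# Both spellings produce the same pattern string.
set same [expr {{^\d+\.\d+$} eq "^\\d+\\.\\d+\$"}]
```

Bracing the pattern is usually the easier route; the quoted form is only worth it when a variable has to be substituted into the pattern.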

Brent Welch (I think) wrote a helpful essay on tcl quoting hell, back in 1993 or so. The original is at ftp://ftp.procplace.com/pub/tcl/sorted/info/doc/README.programmer.gz . Here is a copy:

This is a short note to describe a deep "gotcha" with TCL and the standard way to handle it. Up front, TCL seems pretty straight-forward and easy to use. However, trying out some complex things will expose you to the gotcha, which is referred to as "quoting hell", "unexpected evaluation", or "just what is a TCL list?". These problems, which many very smart people have had, are indications that the programmer's mental model of the TCL evaluator is incorrect. The point of this note is to sketch out the basic model, the gotcha, and the right way to think (and program) around it.

THE BASIC MODEL (courtesy of John O.)

Almost all problems can be explained with three simple rules:

  1. Exactly one level of substitution and/or evaluation occurs in each pass through the Tcl interpreter, no more and no less.
  2. Each character is scanned exactly once in each pass through the interpreter.
  3. Any well-formed list is also a well-formed command; if evaluated, each element of the list will become exactly one word of the command with no further substitutions.

For example, consider the following four one-line scripts:

set a $b
eval {set a $b}
eval "set a $b"
eval [list set a $b]

In the first script the set command passes through the interpreter once. It is chopped into three words, "set", "a", and the value of variable "b". No further substitutions are performed on the value of b: spaces inside b are not treated as word breaks in the "set" command, dollar-signs in the value of b don't cause variable substitution, etc.

In the second script the "set" command passes through the interpreter twice: once while parsing the "eval" command and again when "eval" passes its argument back to the Tcl interpreter for evaluation. However, the braces around the set command prevent the dollar-sign from inducing variable substitution: the argument to eval is "set a $b". So, when this command is evaluated it produces exactly the same effect as the first script.

In the third script double quotes are used instead of braces, so variable substitution occurs in the argument to eval, and this could cause unwanted effects when eval evaluates its argument. For example, if b contains the string "x y z" then the argument to eval will be "set a x y z"; when this is evaluated as a Tcl script it results in a "set" command with five words, which causes an error. The problem occurs because $b is first substituted and then re-evaluated. This double-evaluation can sometimes be used to produce interesting effects. For example, if the value of $b were "$c", then the script would set variable a to the value of variable c (i.e. indirection).

The fourth script is safe again. While parsing the "eval" command, command substitution occurs, which causes the result of the "list" command to be the second word of the "eval" command. The result of the list command will be a proper Tcl list with three elements: "set", "a", and the contents of variable b (all as one element). For example, if $b is "x y z" then the result of the "list" command will be "set a {x y z}". This is passed to "eval" as its argument, and when eval re-evaluates it the "set" command will be well-formed: by rule #3 above each element of the list becomes exactly one word of the command. Thus the fourth script produces the same effect as the first and second ones.
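The four scripts can be tried directly in tclsh. The following is only a sketch: the sample values and the result variables r1, r2, r4, r5, failed and msg are assumptions added for illustration, not part of the original note.

```tcl
# Assumed sample value: a string containing spaces.
set b "x y z"

# Script 1: one pass; the value of b becomes exactly one word.
set a $b
set r1 $a

# Script 2: braces defer the substitution to eval's own pass.
eval {set a $b}
set r2 $a

# Script 3: $b is substituted first and the result is then re-parsed,
# so "set" receives too many words and raises an error.
set failed [catch {eval "set a $b"} msg]

# Script 4: list quotes the value so that it stays one word.
eval [list set a $b]
set r4 $a

# Double evaluation used as indirection: b holds the text "$c".
set c 42
set b {$c}
eval "set a $b"
set r5 $a
```

Scripts 1, 2 and 4 all leave a holding the original value of b, script 3 fails, and the last eval reads variable c through b.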

THE GOTCHA (observations by Brent Welch)

The basic theme to the problem is that you have an arbitrary string and want to protect it from evaluation while passing it around through scripts and perhaps in and out of C code you write. The short answer is that you must use the list command to protect the string if it originates in a TCL script, or you must use the Tcl_Merge library procedure if the string originates in your C code. Also, avoid double quotes and use list instead so you can keep a grip on things.

Now, let's rewind and start with a simple example to give some context. We want to create a TK button that has a command associated with it. The command will just print out the label on the button, and we'll define a procedure to create this kind of button. There are two opportunities for evaluation here, one when the button is created and the command string is parsed, and again later on when the button is clicked. Here is our TCL proc:

proc mybutton1 { parent self label } {
    if {$parent == "."} {
        set myname $parent$self
    } else {
        set myname $parent.$self
    }
    button $myname -text $label -command "puts stdout $label"
    pack append $parent $myname {left fill}
}

The intent here is that the command associated with the button is

puts stdout $label

Now, label is only defined when creating the button, not later on when the button is clicked. Thus we use double-quoting to group the words in the command and to allow substitution of $label so that the button will print the right value. However, this version will only work if the value for label is a single list element. This is because the double quotes around

"puts stdout $label"

allow variable substitution before the words are grouped into a list. If label had a value like "a b c", then the command string defined for the button would be

puts stdout a b c

which passes too many arguments to the puts procedure, and puts would complain.

THE SOLUTION

The right solution is to compose the command using the list operator. list will preserve the list structure and protect the value that was in $label so it will survive correctly until the button is clicked:

proc mybutton2 { parent self label } {
    if {$parent == "."} {
        set myname $parent$self
    } else {
        set myname $parent.$self
    }
    button $myname -text $label -command [list puts stdout $label]
    pack append $parent $myname {left fill}
}

In this case, list will "do the right thing" and massage the value of $label so that it appears as a single list element with respect to the invocation of puts. The command string for the button will be:

puts stdout {a b c}
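The difference between the two command strings can be checked without Tk. In this sketch the proc report is a hypothetical stand-in for "puts stdout" (it just returns its argument), and the variable names are assumptions made for the illustration.

```tcl
# Hypothetical stand-in for "puts stdout": returns the text it is given.
proc report {text} {
    return $text
}

set label "a b c"

# Built with double quotes: the value is spliced in as three words.
set bad "report $label"

# Built with list: the value is quoted as a single element.
set good [list report $label]

set badWords  [llength $bad]
set goodWords [llength $good]

# Evaluating later, as a button click would:
set badFails  [catch {eval $bad}]
set goodValue [eval $good]
```

The quoted form has four words and fails when evaluated; the list form has two words and hands the label back intact.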

The second place you experience this problem is when composing commands to be evaluated from inside C code. If the example is at all complex, you'll want to use Tcl_Merge to build up the command string before passing it into Tcl_Eval. Tcl_Merge takes an argc, argv parameter set and converts it to a string while preserving the list structure. That is, if you pass the result to Tcl_Eval, argv[0] will be interpreted as the command name, and argv[1] up through argv[argc-1] will be passed as the parameters to the command. Note that Tcl_VarEval does not make this guarantee. Instead, it behaves more like double-quotes by concatenating all its arguments together and then reparsing to determine list structure.

ANOTHER GOTCHA

Now, let's extend this example with another feature that I've found thorny. Suppose I want the caller of mybutton2 to be able to pass in more arguments that will be passed to the button primitive. Say they want to fiddle with the colors of the button. Now I can add the special parameter "args" to the end of the parameter list. When mybutton3 is called, the variable args will be a list of all the remaining arguments. The naive, and wrong, approach is:

proc mybutton3 { parent self label args } {
    if {$parent == "."} {
        set myname $parent$self
    } else {
        set myname $parent.$self
    }
    button $myname -text $label -command [list puts stdout $label] $args
    pack append $parent $myname {left fill}
}

This is wrong because button doesn't want a sublist of more arguments; it wants many arguments. So, how am I gonna stick the value of $args onto my button command? Or, said another way, how am I going to create the proper list structure? It is tempting to do the following:

eval "button $myname -text $label -command [list puts stdout $label] $args"

However, this construct causes things to go through the evaluator twice, which will lead to unexpected results. The double quotes will allow substitution, so, again, if $label has spaces, then the button command will not like its argument list. Another (ugly) try:

eval "button \$myname -text \$label -command \[list puts stdout \$label\] $args"

Now $args is the only variable that is evaluated twice, once to remove its outermost list structure, and the second time as individual arguments to the button command. I think a better approach is the following:

eval [concat {button $myname -text $label -command [list puts stdout $label]} $args]

In this case, $args is evaluated twice, once before the call to concat, and a second time explicitly by calling eval. The stuff between the curly braces, however, is protected against substitution on the first pass (which is good), so all concat ends up doing is stripping off the outermost list structure (the curly braces) from its two arguments and putting a space between them. Another, perhaps clearer way of writing this is:

set cmd {button $myname -text $label -command [list puts stdout $label]}
eval [concat $cmd $args]

Now, with this form it is fairly clear(?) that the items in the button command and the $args list will only be evaluated one time. Finally, it turns out you can eliminate the explicit call to concat because eval will do that for us if it is given multiple arguments:

set cmd {button $myname -text $label -command [list puts stdout $label]}
eval $cmd $args

Which leads us back to:

eval {button $myname -text $label -command [list puts stdout $label]} $args

MS notes that Tcl8.5 has new syntax for this; the best solution is now

button $myname -text $label -command [list puts stdout $label] {*}$args
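To compare the variants above without Tk, here is a sketch using a hypothetical proc countwords in place of button; it simply reports how many words it received. The variable extra plays the role of $args. The last line needs Tcl 8.5 or later.

```tcl
# Hypothetical stand-in for "button": reports how many words it received.
proc countwords {args} {
    return [llength $args]
}

set label "a b c"
set extra {-fg red -bg blue}

# Naive: the whole option list arrives as one word.
set n1 [countwords .b -text $label $extra]

# eval with double quotes: the label is split into words as well.
set n2 [eval "countwords .b -text $label $extra"]

# eval with a braced command plus the option list.
set n3 [eval {countwords .b -text $label} $extra]

# Tcl 8.5+ argument expansion.
set n4 [countwords .b -text $label {*}$extra]
```

Only the last two forms deliver the seven words intended (.b, -text, the label, and the four option words).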

FW: It warrants noting that when you're trying to pass some code with multiple lines to a function using list to ensure it's well-formed, since list will eliminate newlines you must use semicolons, and you must use backslashes at the end of every textual line to make sure the list command receives every line. AND, you must escape every special character that you don't want to be evaluated immediately. For example, here's a hunk of code, part of a hypothetical program to make a button that increments a counter each time it is clicked:

set count 0
button .b -text "0..." -command [list \
    incr count;                         \
    .b config -text \"\$count...\";     \
]

While the list technique isn't necessary in this limited case, as noted at great length above it often is.

The point is, it can be unpleasant and ugly to use list for multiple-command structures. That's why whenever you have, for example, a button with a -command option, if you have anything complicated (eg, multiple lines of code) to do, you should put it in a procedure and invoke that procedure instead. This also has the benefit of compiling the code inside the procedure instead of re-evaluating it each time as is the case when you put it directly into a -command, so it goes faster. Plus, other widgets can then use that procedure in their own -command arguments, if it's generic enough.

Lars H: That code does not do what you seem to think it does, FW! The semicolons will act as command separators inside the brackets, so Tcl will evaluate [.b config -text ...] before the surrounding [button .b ...]. Mixing list and command separators requires special care, but the above is absolutely not the way to do it.

One should instead remember that every list is one command. A multi-command script must eventually be constructed as a string, but component commands can be constructed using list.

set $countvarname 0
button .b -text "0..." -command\
    "[list incr $countvarname]\n\
    .b configure -text \"\[[list set $countvarname]\]...\""

It is however quite true that many such things are much better coded using an auxiliary procedure.

proc incr_button {widget variable} {
    $widget configure -text "[uplevel #0 [list incr $variable]]..."
}
set $countvarname 0
button .b -text "0..." -command [list incr_button .b $countvarname]

Using list to embed some value in a command that is to be upleveled is a very useful technique.

FW: I typed that absentmindedly, pardon me. I meant to also put backslashes before the semicolons.

Lars H: That won't work either. That would only make a command equivalent to

incr {count;} .b config -text {"$count...";}

or possibly

incr count {;} .b config -text {"$count..."} {;}

depending on the spacing. A pure list is one command. In order to make a multiple-command script, you must treat it as a string.

FW: Well, this has conveniently served to show how precarious multiple-command bindings and such can be. Avoid the whole problem, people.
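The "a pure list is one command" point can be checked in plain tclsh. This is a sketch only; the variable name "my counter" (chosen with a space on purpose) and the names wrong, script, wrongFails and result are assumptions made for the illustration.

```tcl
# A variable name containing a space, to show why list matters.
set countvarname "my counter"
set $countvarname 0

# A pure list is always ONE command: the ";" is just another word,
# so incr is called with too many arguments.
set wrong [list incr $countvarname \; set $countvarname]
set wrongFails [catch {eval $wrong}]

# Build each command with list, then join the commands as a string.
set script "[list incr $countvarname]\n[list set $countvarname]"
set result [eval $script]
```

The list with an embedded semicolon fails, while the string built from two list-quoted commands runs both and returns the incremented value.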


Sarnold recently noticed that lindex and all list-related commands suffer from a quoting problem. (maybe not the most adequate words)

code:

set a {{a b} {c}]}

result:

 {a b} {c}]

code:

lindex $a 0

result (an error message):

list element in braces followed by "]" instead of space

If you are not careful, string-to-list conversions may cause bugs. string is list will cover that problem. Input validation should include this check, and developers should be aware of the problem.

The only "problem" is in developers thinking any string can be treated as a list. There are no bugs in Tcl related to list commands and quoting. The problems are when developers try to use list commands on things that are not lists. This is no different than getting an error with [expr ? + #]. Garbage in, ...

Sarnold: You are right: thinking that any string can be a list is a misunderstanding of Tcl, and one I have made myself. But TIP #269 (string is list) is still valuable.
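A sketch of the validation discussed here, assuming Tcl 8.5 or later for string is list; the catch form should also work on older interpreters. The variable names ok, failed, msg and first are made up for the illustration.

```tcl
# The malformed value from the example above.
set a {{a b} {c}]}

# Tcl 8.5+: ask directly whether the string parses as a list.
set ok [string is list $a]

# Alternative: any list command raises an error on a non-list string.
set failed [catch {llength $a} msg]

# Validate before use rather than assuming the input is a list.
if {[string is list $a]} {
    set first [lindex $a 0]
} else {
    set first {}
}
```

Here the check rejects the string, so lindex is never reached and first stays empty.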


'Nother slant on many of the same points: tclguy explains the unavoidability of eval's quoting hell in a message [L1 ] to TCT.

escargo 10 Dec 2007 - The message link above now points to a page with the subject Email Archive: junkfilter-misses (read-only). Whatever was the intended link isn't there any more. An updated link would be welcome.

AMG: Now the link points to something in the mail archive for a project called Fire. This calls for the Wayback Machine! Here is your link [L2 ]. Ironically, Sourceforge or Wayback screwed up the quotes, replacing all apostrophes with double quotes. Here's a (fixed) copy for the Wiki to treasure forever:

> miguel sofer <mig@ut...> writes:
> > In tcllib, every effort is done to provide code that runs 
> > conditionally on the tcl version and provides the new functionality 
> > to old interpreters.
> 
> But the TIP doesn't specify any new functionality.  It only 
> specifies a new syntax for functionality that we already 
> have.  I don't see a need to complicate code by providing two 
> implementations, when one of the implementations (the old 
> one) works on all versions of Tcl and has no functional drawbacks.

Alright, this is where I step in to note how important this change is,
and to correct everyone's completely false assumption that the existing
eval hell has no functional drawbacks.  dgp asked me to post these
points that I made at Tcl2003 earlier, but I didn't think it necessary.
It obviously is.  It goes a little something like this:

Raise your hand if you think this is correct:

        eval entry $path $args

Everyone raising their hand please sit down.  You are wrong.  The $path
arg will be split apart as well, which is bad when using this in low
level code, like megawidgets or the like, where it must be handled
correctly.  After all, widgets with spaces in the names is 100% valid.

OK, so we know the fix, right?

        eval entry [list $path] $args

Ah, that's better ... but something is not right.  Hmmm ... oh, it is
inefficient!  The mix of list and string args will walk the wrong path
for optimization (this isn't important to everybody, but good low level
code writers should be sensitive to this).  OK, so that means this is
the best, right?

        eval [list entry $path] $args

Now I feel better.  What, that's not right?  If string args is actually
a multiline string, it won't work as expected.  Try it with:

        set args {
              -opt1 val1
              -opt2 val2
        }

and unfortunately that isn't theoretical.  I've seen code that uses a
$defaultArgs set up like that before regular args to handle defaults.

Ugh ... what are we left with?  This:

        eval [linsert $args 0 entry $path]

Only the final version is 100% correct, guaranteed not to blow when you
least want it to.

So we get back to the original point ... eval itself may not be
functionally flawed, but 99% of eval uses are.  In fact I don't always
use the final solution in code because it can get so unwieldy, but now
let me focus on tcllib, which was mentioned.  First let me say that my
favored solution is so much easier to use, has a minimal ugly factor,
and doesn't have any of the flaws above:

        entry $path {*}$args

So on to tcllib.  I just grep the .tcl files for "eval" and let me
pick a few:

# I sure hope critcl isn't in a dir with a space
./sak.tcl:        eval exec $critcl -force \
        -libdir [list $target] -pkg [list $pkg] $files

# Isn't this beautifully easy to understand?
./modules/ftp/ftp.tcl:        eval [concat $ftp(Output) {$s $msg $state}]

# I sure hope they don't use namespaces with spaces, or cmds ...
./modules/irc/irc.tcl:      eval [namespace current]::cmd-$cmd $args

# hmmm, just looks dangerous ...
./modules/struct/record.tcl:    eval Create $def ${inst_}.${inst} \
                                                [lindex $args $cnt_plus]

# I'm not sure why eval was used here.
# I think because version can be empty?
./modules/stooop/mkpkgidx.tcl:  eval package require $name $version

# I think someone needs to look up "concat"
./modules/textutil/adjust.tcl:  lappend list [ eval list $i $words($i) 1 ]

OK ... so I'm tired of looking now.

  Jeff Hobbs                     The Tcl Guy
  Senior Developer               http://www.ActiveState.com/
        Tcl Support and Productivity Solutions

Good call on the multiline $args value. Sigh...
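The progression in the message can be reproduced without Tk. In this sketch fakeentry is a hypothetical stand-in for entry that returns its path and option list, and path, flat and opts are sample values made up for the illustration.

```tcl
# Hypothetical stand-in for "entry": returns its path and option list.
proc fakeentry {path args} {
    return [list $path $args]
}

set path ".top frame.e"      ;# a widget path containing a space
set flat {-opt1 val1}
set opts {
    -opt1 val1
    -opt2 val2
}

# 1. Plain eval: $path is split into two words.
set r1 [eval fakeentry $path $flat]

# 2. list protects the command and path, but the newline inside $opts
#    still acts as a command separator, so "-opt2" runs as a command.
set fails2 [catch {eval [list fakeentry $path] $opts}]

# 3. linsert builds one pure list, hence exactly one command.
set r3 [eval [linsert $opts 0 fakeentry $path]]

# 4. Tcl 8.5+ argument expansion gives the same result.
set r4 [fakeentry $path {*}$opts]
```

In the first form the stand-in sees only .top as its path, the second form raises an error, and the last two both receive the full path and all four option words.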


AMG: This article [L3 ] by Ian Lance Taylor discusses Tcl and how its EIAS philosophy is its downfall. He argues that EIAS requires very precise quoting in order to get anything nontrivial to work right: "Even then I recently wrote some Tcl code with seven consecutive backslashes, admittedly in a complex use case. That's too much for easy reasoning, and in practice requires trial and error to get right." Sounds like a case of Quoting Hell, alright. I wish I had the chance to see the code in question and suggest an alternative, since in my experience there's always been a safe, clean way to avoid Quoting Hell.