Updated 2014-04-08 20:43:55 by AMG

What is a command prefix?

A command prefix is a prefix of a command (in the sense of rule 2 of the dodekalogue, i.e., a sequence of words) that is constructed with the expectation that zero or more arguments will be appended to it and that the resulting command will be evaluated. This is very often done several times for each command prefix, with the arguments completing the command being different for each evaluation.

The classical (although not very efficient) example of this is the -command option of lsort, which takes a command prefix as argument. This command prefix is supposed to take two arguments and compare them according to some custom order, returning -1, 0, or 1 according to how the comparison came out. The equally classical example of such a command prefix is string compare, so one can say
lsort -command {string compare} {a B c D 0}

which however is just the same as lsort {a B c D 0}. A more interesting choice is package vcompare, which compares version numbers:
% lsort -command {package vcompare} {2.0.1 3 1.10 1.9 2a0}
1.9 1.10 2a0 2.0.1 3

The -dictionary option of lsort is rather close, but it wouldn't get the alpha and beta versions right:
% lsort -dictionary {2.0.1 3 1.10 1.9 2a0}
1.9 1.10 2.0.1 2a0 3

In order to see (at least some of) how the sorting gets done, we can define a procedure that calls some other command prefix to do the actual comparison, writes the result to stdout, and then returns it:
proc showsort {prefix a b} {
    # Delegate the comparison to the wrapped command prefix
    set res [{*}$prefix $a $b]
    puts "$a [expr {$res<0 ? "<" : $res>0 ? ">" : "="}] $b"
    return $res
}

For example,
% lsort -command {showsort {string compare}} {a B c D 0}
a > B
c > D
B < D
a > D
a < c
B > 0
0 B D a c
% lsort -command {showsort {package vcompare}} {2.0.1 3 1.10 1.9 2a0}
2.0.1 < 3
1.10 > 1.9
2.0.1 > 1.9
2.0.1 > 1.10
1.9 < 2a0
1.10 < 2a0
2.0.1 > 2a0
1.9 1.10 2a0 2.0.1 3

This particular technique — to let one or more of the words in a command prefix themselves be command prefixes that handle some subtask — can become very powerful if you have several command prefixes that do related things, much for the same reasons as pipelines make the Unix command-line powerful.
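
As a further sketch of the same nesting idea (the name reversed is a hypothetical helper, not something defined elsewhere on this page), a wrapper can invert whatever comparison prefix it is given:

```tcl
# Hypothetical helper: wraps another comparison prefix and negates its result,
# so that lsort produces the opposite order.
proc reversed {prefix a b} {
    expr {-[{*}$prefix $a $b]}
}

set sorted [lsort -command {reversed {string compare}} {a B c D 0}]
puts $sorted   ;# c a D B 0
```

Since reversed takes and produces the same kind of thing as showsort, the two can be stacked in either order, for example {showsort {reversed {string compare}}}.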

Command prefix flavours

Modern command prefixes are typically lists, i.e., they are supposed to be called as
{*}$prefix $arg1 $arg2 ...

but classical command prefixes have sometimes instead been called, in effect, as
eval $prefix [list $arg1 $arg2 ...]

These differ with respect to how they interpret characters that have special meaning in scripts but not in lists — the eval variant treats the prefix more as a "script prefix" than a list prefix. The most common use of this is in an idiom for trace callbacks where one isn't interested in the extra arguments, and ends the prefix with "; #" to end the command and have the arguments that are appended ignored as part of a comment:
trace add variable localGuardVariable unset "rename someCommand {} ; #"

The more modern way to achieve this effect is however to use apply:
trace add variable localGuardVariable unset [list ::apply {args {rename someCommand ""} ::}]

NEM (corrected by Lars H): Alternatively (and pre-apply) you can use a simple wrapper command that discards any other arguments:
proc discard {script args} {eval $script}
trace add variable foo unset [list discard {rename someCommand ""}]

EG prefers
eval [linsert $prefix end $arg1 $arg2 ...]

[Find and include links to discussion of non-list command prefix flavours — not all of these are the same, either.]

Command prefixes in comparison

In general, one uses list to form a command prefix and {*} to expand it, but beyond that:

  • interp alias can be used to turn any command prefix into a named command.
  • apply can be used to turn any lambda into a command prefix.
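
A minimal sketch of both conversions, assuming Tcl 8.5 or later; the names greet and addFive are made up for illustration:

```tcl
# A command prefix with one embedded argument becomes the named command "greet".
interp alias {} greet {} format "Hello, %s!"
puts [greet World]          ;# Hello, World!

# A lambda (parameter list plus body) becomes a command prefix via apply.
# The trailing 5 is embedded data, bound to the first parameter n.
set addFive [list apply {{n x} {expr {$n + $x}}} 5]
puts [{*}$addFive 10]       ;# 15
```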

(On this page, the term lambda refers to what the apply manpage calls "anonymous functions". These are consistently called "lambdas" in the Tcl source code, and also by the info frame and apply commands themselves — try "apply \{" to see the error message "can't interpret "{" as a lambda expression".)

The main difference between scripts and command prefixes is that the latter typically take input (the arguments appended), whereas scripts do not. Hence if you need to communicate any information into a script when evaluating it, you need to supply that through the context (e.g. store it in variables with predetermined names).
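
A small sketch of that contrast, assuming evaluation at global level on Tcl 8.5 or later. The variable name x and the choice of ::tcl::mathop::* as the command are illustrative only:

```tcl
# Script: takes no arguments, so the input must already sit in a variable
# whose name (x) the script and its caller have agreed on.
set script {expr {$x * 2}}
set x 21
puts [eval $script]          ;# 42

# Command prefix: the input is simply appended as an argument.
set prefix [list ::tcl::mathop::* 2]
puts [{*}$prefix 21]         ;# 42
```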

The main difference between named commands and command prefixes is that command prefixes can contain additional data. For example, in a socket -server callback one could use the same proc to handle connections for several ports, and use additional arguments to specify the exact protocol to use for incoming connections. Concretely:
proc foo::incoming {secure sock clientaddr clientport} {
    fconfigure $sock -translation binary
    if {$secure} then {tls::import $sock}
    # Do normal handshake
}
socket -server {foo::incoming 0} 3000 ; # Unsecure port
socket -server {foo::incoming 1} 3001 ; # Secure port

This difference is less pronounced on the C side, since e.g. Tcl_CreateObjCommand takes a clientData argument that it passes on to the C function actually implementing the Tcl command created. Data arguments embedded in a command prefix are thus a way to achieve on the Tcl side what has always been straightforward on the C side.

From a usage point of view, a lambda is like a script that takes arguments, so it can be thought of as overcoming the disadvantage that scripts have relative to command prefixes. Lambdas rely heavily on variable substitution, however, which places them at the opposite end of the spectrum from command prefixes (or scripts) with embedded data. Lambdas are syntactically the odd one out in this collection, as they are not partial scripts. They are also special in that they provide their own local environment, but it is more often scripts that stand out as different in that respect, since a proc used as the basis for a named command or command prefix also provides a local environment.
Feature                    Command prefix  Script  Command        Lambda
Needs Tcl version          Any             Any     Any            8.5+
Can take arguments         Yes             No      Yes            Yes
Difficulty embedding data  Low             Medium  Impossible     High
Lifetime semantics         value           value   needs cleanup  value
Encapsulation              Good            Poor    Good           Excellent

NEM Note that a command prefix can be handled much more efficiently than a general script. Under the hood a command prefix can be resolved to the Tcl_Command that implements it and this can then be invoked directly with little or no interpretive overhead. A general script however must be evaluated every time. While the parsing can be cached, there is still some overhead to resolve the command afresh each time. This is why things like lsort -command are slow: while they are only really used with command prefixes they are actually documented as treating the argument as a script, and so need to call eval. Hopefully this can be changed at some point. Any new commands should take command prefixes and invoke them as such (using {*}). Ideally, Tcl needs an invoke command that does the same as uplevel but treats its argument as a command prefix rather than a general script.

MS Please note the special case of scripts which are canonical lists, i.e., which were generated using list or any other of the list operations. The core recognizes that these are pre-parsed single commands and optimizes accordingly (if still not optimally) in many cases including eval, uplevel and namespace eval.

It would be possible to do things so that lsort -command and other such cases (notably traces) also recognize this special structure. Actually, it is not a matter of the structure being recognized, we just need not to spoil it when present. 2008-09-19: just checked that lsort does the right thing already
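
The canonical-list case can be illustrated with a small sketch. The variable names and the sample string are arbitrary, and this only shows the difference in behaviour, not the internal optimization:

```tcl
set word "a; b #c"

# Built with list: the result is a canonical list, so eval sees exactly one
# command with three words, whatever characters $word contains.
set script [list string length $word]
puts [eval $script]                              ;# 7

# Built by string interpolation: eval parses a general script, the semicolon
# ends the first command, and "b" is then run as a (nonexistent) command.
puts [catch {eval "string length $word"} msg]    ;# 1
```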

NEM Does this mean that code that calls uplevel in a loop on the same command should be roughly as efficient as C code calling Tcl_EvalObjv?

NEM (Regarding lsort doing the right thing). OK, if lsort is able to cache the command resolution then surely the following two commands should be roughly equivalent in performance?
set xs [genlist 10000] ;# random list of chars from a-z
time { lsort $xs } 100 ;# => 5246 microseconds per iteration
set cmd [list string compare]
time { lsort -command $cmd $xs } 100 ;# => 121838 microseconds per iteration

It seems to me that we must still be getting interpreter overhead in that loop, because a C coded sort calling a C coded compare should not be that slow, should it?
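
The genlist helper used in the timing above is not shown on this page. For anyone wanting to repeat the measurement, a hypothetical stand-in (an assumption, not NEM's actual code) could look like this:

```tcl
# Hypothetical stand-in for genlist: returns a list of n random
# single-character strings drawn from a-z.
proc genlist {n} {
    set letters {a b c d e f g h i j k l m n o p q r s t u v w x y z}
    set result {}
    for {set i 0} {$i < $n} {incr i} {
        lappend result [lindex $letters [expr {int(rand() * 26)}]]
    }
    return $result
}

set xs [genlist 10000]
puts [llength $xs]   ;# 10000
```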

MS of course there is command invocation overhead! The command is cached, but we are still calling Tcl_EvalObjv at each iteration, and checking that the cached command is valid, and resetting the interp's result, and (if a proc) pushing a CallFrame, and initializing local vars after checking that the number is ok, and checking the result for errors, and communicating the result with a new Tcl_Obj in the interp's result, and ... What I meant is: it is a command not a script, it is not reparsed at each iteration, the cached Command is reused.

NEM: Much of this can be avoided, I think, especially with a native comparison command. I created a stripped down version of lsort specialised just for the -command usage (all other options removed). I then rearranged things so that it looks up the command at the start and then calls the objProc directly for comparisons. This reduces the runtime by approximately 1/2, but still nowhere near lsort -integer (this is sorting a 10000 element list of random ints in range 0-99, tests repeated 100 times):
fastsort  : 22872.614190000004 microseconds per iteration
lsort/cmd : 46619.153609999994 microseconds per iteration
lsort     :  3249.8842 microseconds per iteration

(Code available on request). Profiling, it seems that >20% of the time is spent in three functions: Tcl_GetIntFromObj, Tcl_SetObjResult, Tcl_GetObjResult. It seems having to pass the comparison result through the interpreter result object is the real performance killer.

Indeed, if I cut out the trip through the interpreter result (handing int value via a C global var for just this test), then I can knock that time in half again (now only 4x slower than -integer):
fastsort  : 12132.10835 microseconds per iteration

The other bottleneck is Tcl_GetIntFromObj, but that seems unavoidable.

DGP Take a look at TclGetNumberFromObj for a possible route to improvement.

NEM This discussion seems a bit confusing. Lambdas aren't at the opposite end from command prefixes: they are command prefixes! The table also is misleading. Lambdas can be used pre-8.5 easily enough (the apply TIP gives a pure-Tcl implementation). I also don't see the difficulty in embedding data in a lambda:
proc lambda {params body args} {
    set ns [uplevel 1 { namespace current }]
    list ::apply [list $params $body $ns] {*}$args
}
foo [lambda {a b other} { ... } $x $y]

I.e. you use exactly the same technique as for any other command prefix: just append extra arguments as required. I'm also not sure that lambdas provide any more encapsulation than named commands.

Lars H: This presumes the interpretation that a "lambda" is an application of a "lambda constructor", or the result of such, which in some contexts may be appropriate. Tcl itself (C sources as well as script-level-visible behaviour) uses the term in a more specific sense however, and I see no reason to deviate from Tcl's own terminology here (especially when it provides a distinctive name for a concept that is being discussed).

AMG: Be careful if you want to pass a command prefix to [eval] or [catch] or [proc] or similar. They accept scripts, which are subtly different from command prefixes.

  • Scripts use line breaks as command separators, whereas command prefixes interpret line breaks as word separators. This is because command prefixes are lists, which interpret spaces, horizontal/vertical tabs, carriage returns, line feeds, and form feeds as word separators.
  • Semicolons are also command separators for scripts, but command prefixes see them as ordinary word characters.
  • Likewise, comments don't exist for command prefixes; the hash marks are also just word characters. (TIP 148 [1])
  • In a command prefix, a backslash before a newline-whitespace sequence is treated as a single space (ASCII 32) embedded in a word.
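
The first three points can be seen in a short sketch, assuming Tcl 8.5 or later and that no command named b exists; the string below is an arbitrary example:

```tcl
# One string, read two ways. It contains a line break, a semicolon and a hash.
set prefix "list a\nb;c #d"

# As a command prefix (a list): four words, so [list] receives a, b;c and #d.
set words [{*}$prefix]
puts [llength $words]              ;# 3
puts [lindex $words 1]             ;# b;c

# As a script: the line break ends "list a", and "b" is then run as a command.
puts [catch {eval $prefix} msg]    ;# 1
puts $msg                          ;# invalid command name "b"
```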

To force [eval], etc. to interpret a string as a command prefix rather than a script, use this magic:
eval [list {*}$command]

This seemingly no-op [list] command is guaranteed to return a list, and a list is guaranteed to be interpreted by [eval] as a single command invocation. (Are there other ways to accomplish the same thing?)

EG: Yes. For evaluating a command prefix, the following is enough (the listification is done by the parser):
{*}$command

Proof (interactive session):
% proc foo {args} {puts [info level 0]}
% set a {foo
bar
baz}
foo
bar
baz
% {*}$a
foo bar baz

AMG: For simple execution there's no need to call [eval], just do as you say. However, often the [eval] is buried in some other command, to which you have to pass a script or script prefix, for example [trace]. My concern is formatting a command prefix such that it is also a valid script prefix, both with the same interpretation. A list is both, so that's what is used. My question is whether there are other ways to listify a command prefix besides [list {*}$command].

EG: [lrange $command 0 end] is equivalent to [list {*}$command], and it works with Tcl versions earlier than 8.5. Both operations are "list canonicalization", as MS describes above.
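
A short sketch of the two canonicalization idioms side by side, assuming Tcl 8.5 or later for {*}; the sample prefix is arbitrary:

```tcl
# A command prefix whose string form is not a valid single-command script:
# it contains a line break between words.
set command "string length\n{a;b}"

# Evaluated as a script, "string length" runs alone and fails for lack of
# an argument.
puts [catch {eval $command}]            ;# 1

# Canonicalized first, the same value is one command with three words.
puts [eval [list {*}$command]]          ;# 3
puts [eval [lrange $command 0 end]]     ;# 3
```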

AMG: Thanks! [time] says [lrange] is slightly faster, so that's what I'm going to go with. For my test case, [lrange] takes 3.258165 microseconds, whereas [list] takes 3.412147 microseconds.