Updated 2014-04-20 18:19:30 by pooryorick

Dodekalogue is the common name for the twelve rules that define the syntax and semantics of Tcl.

See also

official Tcl syntax reference

Description

Prior to the introduction of the twelfth rule, Argument expansion, the rules were known as the Endekalogue.

KBK, in a silly mood, points out that the real Dodekalogue is over at http://www.faqs.org/rfcs/rfc1925.html - and observes that Tcl/Tk already has one of the best implementations of RFC 1925 available.

Script substitution is a more precise name for command substitution, since brackets can contain entire scripts, not just an individual command. Because the bracketed text is a script, semicolons and newlines can be used to separate multiple commands. The result is that of the final command. A new stack frame is not created, so using return or break or the like will cause the caller to return, etc.
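As a minimal sketch of the two points above (the procedure name demo is hypothetical and used only for illustration), a bracketed script can hold several commands, and a return inside the brackets returns from the enclosing procedure:

```tcl
# A bracketed script may hold several commands;
# its value is the result of the last one.
set total [set a 1; set b 2; expr {$a + $b}]

# No new stack frame is created for the bracketed script, so
# [return] inside it returns from the enclosing procedure.
proc demo {} {
    set x [return early]
    return late   ;# never reached
}
set result [demo]
```

Here total ends up as 3 and result as early, because the procedure is left before the assignment to x completes.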

In a Nutshell

The official dodekalogue is described in the Rules section further down this page. This section presents an alternate, shorter, and more conversational description of Tcl than the dodekalogue. It describes the same language as the dodekalogue but presents a slightly different model, and is just as complete, except that it does not describe the individual \ annotations. In this description, brackets means [ and ] pairs, braces means { and } pairs, and quotes means ".
Script
A string containing commands, one command per line, or multiple commands, separated by semicolons, per line. Every script returns a value.
Words
Whitespace separates words in a command. The first word is the name of the command. Any additional words are arguments to the command, which interprets them in arbitrary ways.
Annotation
Words can be annotated. Prior to the invocation of each command, each annotation in each word is processed, and the result is interpolated verbatim, in place of the annotation, without any further processing, into the word. The annotations are brackets, \, quotes, braces and $. Braces and quotes only qualify as annotations when they completely enclose a word. With the exception of {*}, the interpolated value does not change the boundaries of the word it appears in or introduce new words into the command.
Word Expansion
If {*} is placed in front of a word, the words within that word are interpolated into the command in its place and are treated exactly as if they had appeared literally as separate words in the command. The character after the initial {*} is considered the beginning of the word, so it can still be enclosed by braces or quotes. All annotations in the word are processed before this word expansion is performed. {*} is the only annotation capable of manipulating the number of words in a command.
Comment
If # appears where a command is expected, the rest of the line is a comment. No command execution is attempted, and no characters in the line are interpreted, except that the terminating newline may be escaped with \, signifying that the comment continues on the subsequent line.
\
shifts the meaning of the next character: An otherwise-interpreted character takes on its literal meaning, and some otherwise-literal characters take on an interpreted meaning.
Brackets
A script may be embedded at any position of any word by enclosing it in brackets. The embedded script is passed verbatim, without any processing, to the interpreter, for execution, and the result is inserted in its place in the embedding script.
Quotes
Words can be entirely enclosed in quotes to escape interpretation of any enclosed whitespace and semicolons.
Braces
do the same, but also escape $, brackets, and normal \ interpretation, except for \newline. Additional braces can appear inside a braced word, but each open brace must have a corresponding closing brace, unless it is escaped by \, which, unlike the normal \ interpretation, is not removed from the word.
$
followed by a variable name, which may be enclosed in braces to delineate it from neighboring characters, becomes the value of the corresponding variable. This can occur at any position of any word. A variable name that is enclosed in braces may contain any characters except the right brace. When not enclosed in braces, a variable name must be composed only of letters, digits, underscore, or namespace separators. Any other character marks the end of the name. A name containing parentheses denotes the name of an array and, within the parentheses, the name of a variable within that array. The name in parentheses can be any string, but a closing parenthesis in the name must be escaped with \.

Rules

The following rules define the syntax and semantics of the Tcl language:

[1] Commands

A Tcl script is a string containing one or more commands. Semicolons and newlines are command separators unless quoted as described below. Close brackets are command terminators during command substitution (see below) unless quoted.

[2] Evaluation

A command is evaluated in two steps. First, the Tcl interpreter breaks the command into words and performs substitutions as described below. These substitutions are performed in the same way for all commands. The first word is used to locate a command procedure to carry out the command, then all of the words of the command are passed to the command procedure. The command procedure is free to interpret each of its words in any way it likes, such as an integer, variable name, list, or Tcl script. Different commands interpret their words differently.
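To illustrate that each command interprets its words in its own way, the following sketch passes the same word to three different commands. The variable names are chosen only for this example, and expr is given an unbraced argument purely to show that it treats the substituted word as an expression.

```tcl
set w {1 + 2}

set asExpr   [expr $w]            ;# expr reads the word as an expression
set asList   [llength $w]         ;# llength reads the word as a list
set asString [string length $w]   ;# string length reads it as plain text
```

The same five characters yield 3 as an expression, 3 as a list length, and 5 as a string length.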

[3] Words

Words of a command are separated by white space (except for newlines, which are command separators).

[4] Double quotes

If the first character of a word is double quote (“"”) then the word is terminated by the next double quote character. If semicolons, close brackets, or white space characters (including newlines) appear between the quotes then they are treated as ordinary characters and included in the word. Command substitution, variable substitution, and backslash substitution are performed on the characters between the quotes as described below. The double quotes are not retained as part of the word.
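A small sketch of rule [4], with variable names invented for the example: whitespace and semicolons lose their separator role inside quotes, while variable and command substitution still happen.

```tcl
set name World

# The semicolon and spaces are ordinary characters inside the quotes;
# $name and the bracketed command are still substituted.
set greeting "Hello, $name; length [string length $name]"

# Each quoted word stays a single word, so list receives two arguments.
set parts [list "a b" "c;d"]
set count [llength $parts]
```

After this runs, greeting holds the text Hello, World; length 5 and count is 2.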

[5] Argument expansion

If a word starts with the string “{*}” followed by a non-whitespace character, then the leading “{*}” is removed and the rest of the word is parsed and substituted as any other word. After substitution, the word is parsed as a list (without command or variable substitutions; backslash substitutions are performed as is normal for a list and individual internal words may be surrounded by either braces or double-quote characters), and its words are added to the command being substituted. For instance,
cmd a {*}{b [c]} d {*}{$e f {g h}}

is equivalent to
cmd a b {[c]} d {$e} f {g h}
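The equivalence can be checked with a short sketch that substitutes list for the placeholder command cmd, so that the resulting words can be inspected. The variable names here are assumptions made for the example.

```tcl
# list stands in for cmd so the expanded words can be examined.
set words [list a {*}{b [c]} d {*}{$e f {g h}}]

set count [llength $words]      ;# number of words after expansion
set third [lindex $words 2]     ;# the bracketed text is not executed
set fifth [lindex $words 4]     ;# the dollar sign is not substituted
set last  [lindex $words end]   ;# the inner braced word stays one word
```

Seven words result: a, b, [c], d, $e, f, and g h. Neither c nor e needs to exist, because no command or variable substitution is performed when the expanded word is parsed as a list.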

[6] Braces

If the first character of a word is an open brace (“{”) and rule [5] does not apply, then the word is terminated by the matching close brace (“}”). Braces nest within the word: for each additional open brace there must be an additional close brace (however, if an open brace or close brace within the word is quoted with a backslash then it is not counted in locating the matching close brace). No substitutions are performed on the characters between the braces except for backslash-newline substitutions described below, nor do semi-colons, newlines, close brackets, or white space receive any special interpretation. The word will consist of exactly the characters between the outer braces, not including the braces themselves.
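The following sketch, using made-up variable names, shows the three properties of rule [6]: braces nest, a backslash-quoted brace is not counted but stays in the word, and no substitution takes place.

```tcl
# Nested braces are kept as part of the word.
set nested {x {y z} w}

# The backslash keeps the open brace from being counted,
# and the backslash itself remains in the word.
set escaped {open \{ only}

# Neither the variable nor the command reference is substituted.
set raw {$notAVariable [notACommand]}
```

Here nested has 9 characters, escaped has 12 (the backslash included), and raw contains the dollar sign and brackets literally.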

[7] Command substitution

If a word contains an open bracket (“[”) then Tcl performs command substitution. To do this it invokes the Tcl interpreter recursively to process the characters following the open bracket as a Tcl script. The script may contain any number of commands and must be terminated by a close bracket (“]”). The result of the script (i.e. the result of its last command) is substituted into the word in place of the brackets and all of the characters between them. There may be any number of command substitutions in a single word. Command substitution is not performed on words enclosed in braces.

[8] Variable substitution

If a word contains a dollar-sign (“$”) followed by one of the forms described below, then Tcl performs variable substitution: the dollar-sign and the following characters are replaced in the word by the value of a variable. Variable substitution may take any of the following forms:
$name
Name is the name of a scalar variable; the name is a sequence of one or more characters that are a letter, digit, underscore, or namespace separators (two or more colons). Letters and digits are only the standard ASCII ones (0-9, A-Z and a-z)
$name(index)
Name gives the name of an array variable and index gives the name of an element within that array. Name must contain only letters, digits, underscores, and namespace separators, and may be an empty string. Letters and digits are only the standard ASCII ones (0-9, A-Z and a-z). Command substitutions, variable substitutions, and backslash substitutions are performed on the characters of index.
${name}
Name is the name of a scalar variable or array element. It may contain any characters whatsoever except for close braces. It indicates an array element if name is in the form “arrayName(index)” where arrayName does not contain any open parenthesis characters, “(”, or close brace characters, “}”, and index can be any sequence of characters except for close brace characters. No further substitutions are performed during the parsing of name.

There may be any number of variable substitutions in a single word. Variable substitution is not performed on words enclosed in braces.

Note that variables may contain character sequences other than those listed above, but in that case other mechanisms must be used to access them (e.g., via the single-argument form of set).
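The forms of rule [8] can be compared side by side. This is only a sketch; all variable names, including the deliberately awkward ones, are invented for the example.

```tcl
set name Tcl
set arr(key) value
set idx key
set {odd name} 42
set var#3 9

set r1 $name.tk        ;# the name ends at the dot
set r2 $arr($idx)      ;# the index undergoes substitution
set r3 ${odd name}     ;# braces allow a space in the name
set r4 [set var#3]     ;# single-argument set reads any name
set r5 ${var#3}        ;# the braced form reads it as well
```

The results are Tcl.tk, value, 42, 9, and 9. Writing $var#3 instead would try to read a variable called var, because # is not a letter, digit, underscore, or namespace separator.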

[9] Backslash substitution

If a backslash (“\”) appears within a word then backslash substitution occurs. In all cases but those described below the backslash is dropped and the following character is treated as an ordinary character and included in the word. This allows characters such as double quotes, close brackets, and dollar signs to be included in words without triggering special processing. The following table lists the backslash sequences that are handled specially, along with the value that replaces each sequence.
\a
Audible alert (bell) (0x7).
\b
Backspace (0x8).
\f
Form feed (0xc).
\n
Newline (0xa).
\r
Carriage-return (0xd).
\t
Tab (0x9).
\v
Vertical tab (0xb).
\<newline>whiteSpace
A single space character replaces the backslash, newline, and all spaces and tabs after the newline. This backslash sequence is unique in that it is replaced in a separate pre-pass before the command is actually parsed. This means that it will be replaced even when it occurs between braces, and the resulting space will be treated as a word separator if it isn't in braces or quotes.
\\
Backslash (“\”).
\ooo
The digits ooo (one, two, or three of them) give an eight-bit octal value for the Unicode character that will be inserted. The upper bits of the Unicode character will be 0.
\xhh
The hexadecimal digits hh give an eight-bit hexadecimal value for the Unicode character that will be inserted. Any number of hexadecimal digits may be present; however, all but the last two are ignored (the result is always a one-byte quantity). The upper bits of the Unicode character will be 0.
\uhhhh
The hexadecimal digits hhhh (one, two, three, or four of them) give a sixteen-bit hexadecimal value for the Unicode character that will be inserted. The upper bits of the Unicode character will be 0.
\Uhhhhhhhh
The hexadecimal digits hhhhhhhh (one up to eight of them) give a twenty-one-bit hexadecimal value for the Unicode character that will be inserted, in the range U+0000..U+10FFFF. The parser will stop just before this range overflows, or when the maximum of eight digits is reached. The upper bits of the Unicode character will be 0.
The range U+010000..U+10FFFD is reserved for the future.

Backslash substitution is not performed on words enclosed in braces, except for backslash-newline as described above.
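A few of the sequences from the table can be tried out as follows. The variable names are assumptions for this sketch, and the continuation line inside the braces is indented only to show that the following white space is swallowed.

```tcl
set tab    "a\tb"      ;# backslash-t becomes one tab character
set raw    {a\tb}      ;# in braces the two characters are kept
set hex    "\x41"      ;# hexadecimal 41 is the letter A
set octal  "\101"      ;# octal 101 is also the letter A
set uni    "\u00e9"    ;# a single Unicode character

# Backslash-newline is replaced by one space even inside braces.
set joined {a\
        b}
```

The string lengths are 3 for tab and 4 for raw, and joined holds a b with exactly one space.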

[10] Comments.

If a hash character (“#”) appears at a point where Tcl is expecting the first character of the first word of a command, then the hash character and the characters that follow it, up through the next newline, are treated as a comment and ignored. The comment character only has significance when it appears at the beginning of a command.
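The following sketch, with variable names made up for the purpose, shows where a hash character does and does not start a comment:

```tcl
# A hash where a command is expected starts a comment.
set a 1 ;# a semicolon ends the command, so a comment may follow

# Inside a word the hash is an ordinary character.
set b "x # y"

# Here the hash is the second word of the list command, not a comment.
set c [list # is a word]
```

After this, b contains the hash character and c is a list of four elements whose first element is #.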

[11] Order of substitution

Each character is processed exactly once by the Tcl interpreter as part of creating the words of a command. For example, if variable substitution occurs then no further substitutions are performed on the value of the variable; the value is inserted into the word verbatim. If command substitution occurs then the nested command is processed entirely by the recursive call to the Tcl interpreter; no substitutions are performed before making the recursive call and no additional substitutions are performed on the result of the nested script.

Substitutions take place from left to right, and each substitution is evaluated completely before attempting to evaluate the next. Thus, a sequence like
set y [set x 0][incr x][incr x]

will always set the variable y to the value, 012.
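Both halves of rule [11] can be seen in a short sketch. The names template and copy are assumptions for the example.

```tcl
# Substitutions run left to right: 0, then 1, then 2.
set y [set x 0][incr x][incr x]

# A substituted value is inserted verbatim and is not scanned again,
# so the dollar sign and brackets in it have no effect.
set template {$x and [incr x]}
set copy $template
```

y becomes 012, copy is identical to template, and x is still 2 because the incr inside the value was never executed.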

[12] Substitution and word boundaries

Substitutions do not affect the word boundaries of a command, except for argument expansion as specified in rule [5]. For example, during variable substitution the entire value of the variable becomes part of a single word, even if the variable's value contains spaces.
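A minimal sketch of rule [12], using list to count the words a command receives (the variable names are illustrative only):

```tcl
set s "a b c"

# Plain substitution: the whole value is one word.
set one [llength [list $s]]

# Argument expansion: the value is split into separate words.
set three [llength [list {*}$s]]
```

The first call passes a single argument to list, the second passes three.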

Double Substitution

Given
set q [list [list a b c] x y z]

Should {*}{*}$q produce a b c x y z?

(anonymous): Good question; the wording here allows {*}{*}$q, but I am not sure if this is supposed to be allowed.

DGP: No, that should continue to be a syntax error. TIP 157 accepted argument expansion (post-substitution determination of word boundaries upon explicit request), but never proposed anything about double substitution.
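As a sketch of what this means in practice, the doubled prefix can be wrapped in catch to observe the error, and one level of flattening can be obtained by nesting a command substitution instead. The use of concat here is just one possible workaround, not something proposed in the discussion above.

```tcl
set q [list [list a b c] x y z]

# The doubled prefix is rejected; catch reports the failure.
set failed [catch {list {*}{*}$q} msg]

# One possible workaround: expand once inside concat,
# then expand the flattened result.
set flat [list {*}[concat {*}$q]]
```

With this workaround flat holds the six words a b c x y z. It flattens exactly one level, so deeper nesting would need further steps.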

Clarifying Rule 2

CMcC 2010-06-03 20:04:40:

I would like to see Rule 2. split into two numbered parts. The rule announces initially that it has two steps, starts out with 'First', and then never explicitly states 'Second', or 'Then', but instead buries the subsequent step inside the text. This has the effect of making the two steps run into one another, in a literal confusion.

Secondly, I would like to examine the wisdom of the applicable-everywhere assertion in 2. Firstly, it's provably false - unknown can and frequently does modify the application of commands. Secondly, it over-constrains the language to no real benefit. Thirdly, even if it were true, and were desirable, it is also strongly implied by Rule 7 (which 2. invokes) and the lack of other mechanisms for command substitution.

At best, it is a clarification masquerading as a general principle.

I have created this RFE to address this (putative) improvement:

CMcC 2010-06-03 20:43:54:

This is the form of change I am suggesting:

2. Evaluation

A command is evaluated in two steps:

(a) The Tcl interpreter breaks the command into words and performs substitutions as described below. (Note: These substitutions are performed in the same way for all commands.)

(b) The first word resulting from (a) is used to locate a command procedure to carry out the command, then all of the words of the command are passed to the command procedure. (Note: The command procedure of (b) is free to interpret each of its words in any way it likes, such as an integer, variable name, list, or Tcl script. Different commands interpret their words differently.)

FB 2010-06-04 04:14:59:

I like your proposal. Much clearer than the original IMHO.

What is Whitespace?

gkubu 2012-04-15:

Not a problem in practice, just wondering

\u00a0 is whitespace in Unicode but it's not in the POSIX portable character set. \u0020 and the space bar are treated differently:
% info patchlevel
8.6b2
% set a a b
wrong # args: should be "set varName ?newValue?"
% set a a\u0020b
a b
% llength [ split $a ]
2
% set a a\u00a0b
a b
% llength [ split $a ]
1
% set a a" "b
>
% set a a
a
% append a " " b
a b
% llength [ split $a ]
2
% set e " "

% set a a${e}b
a b
% llength [ split $a ]
2
% set a a${e}" "${e}b
> 
%

Backslash-newline inside brace-quoted words

AMG: Why is backslash-newline substitution performed inside brace-quoted words? This has caused trouble for me before. The only justification I could think of was to establish an equivalence between "foo" and subst {foo}, for all possible foo. However, this justification doesn't hold, since subst internally performs backslash-newline substitution.

Let me give an example of where this hurts. In my Wibble web server, the [template] command performs template substitution like in Templates and subst. One thing it's capable of doing is joining lines separated by backslash-newline, which is certainly a useful feature. However, in the case of joining long lines which are Tcl script (i.e. have a leading %), this feature works differently depending on where the input comes from. If the input is read from a file, there must be a leading % on all lines to be joined, which is consistent with the notion that the lines are all Tcl script. If the input is supplied inside a Tcl script, there must not be a leading % on any line but the first, otherwise the %'s get merged in with the script.

This happens because when the input is inside a Tcl script, Tcl joins the lines (% and all) before ever passing the argument to [template]. When the input comes from a file, the Tcl parser isn't executed until after [template] picks out all the lines that start with % and turns them into a script.

steveb: FWIW, Jim doesn't do this backslash-newline substitution in braces, and it has never caused me a problem.

AMG: So, what is it good for? I found another problem it causes: errorInfo line numbers. Since lines separated by backslash-newline are joined by the parser before they're ever passed to proc, they collectively count as a single line for the purpose of counting line numbers, which obviously differs from how line numbers are counted in text editors.

Surely the execution engine can be made to internally treat backslash-newline as a word separator rather than a command separator. This would remove any need for the parser to ever muck with the contents of braced words. Actually, I'm pretty sure it already has this feature, since backslash-newline works as expected (as a word separator or comment continuation) when the script is at the toplevel, no braces in sight.

Backslash-Newline Removal

AMG: Normally, according to rule [9], a backslash-newline sequence is replaced by a single space. Here's a way to instead remove it entirely. This is useful for breaking a long word across lines without chopping it into multiple words.

Instead of:
puts this_is_a_really_long_word\
    _which_can't_have_spaces

Do:
puts this_is_a_really_long_word[
    ]_which_can't_have_spaces

The trick is to hide the newline inside a null script substitution.

Discussion

What is a "word"?

CMcC 2005-02-03:

It seems inconsistent (although not necessarily inconsistent with the above) that while {} [] $ and "" seem to be treated similarly in the above, ${x}y $x(z)y and [x]y are accepted by the tcl parser (and are presumably words), but "x"y and {x}y are not.

It would also seem that x"y and x{y are words, and this is explicitly allowed by the requirement that " and { be the first characters of a word to have special interpretation. Nothing in the above seems to explicitly require that " and } be the last characters of a word to have special interpretation, yet this seems to be the case.

I'm interested in why this is, historically, pragmatically, operationally.

I suppose, pragmatically, {x}y is most likely to be a typo for {x} y, and similarly for "x"y. Shouldn't this be made explicit (or implicit) in the Dodekalogue? Perhaps it is already, and I've just missed it.

Lars H: It is explicit. The rules for quote- and brace-delimited words explicitly say that the matching " or } terminate the word. Since words have to be separated by whitespace, it is then an error if there (in the same command) is some non-whitespace following the terminating quote or brace.
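The cases discussed here can be tried out in a short sketch. The variable names are invented for the example, and catch is used so that the two failing commands do not abort the script.

```tcl
# Text directly after the closing brace or quote is an error.
set bad1 [catch {set v {x}y} msg1]
set bad2 [catch {set v "x"y} msg2]

# A quote that is not the first character of a word is ordinary.
set ok1 x"y

# The braced variable form is not a braced word,
# so text may follow it directly.
set a b
set ok2 ${a}x
```

Both catch calls return 1, while ok1 keeps its embedded quote and ok2 becomes bx.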

NJG: Then what about this:
% set a b
b
(bin) 2 % set c ${a}x
bx

(Scroll down for more on the subject from me. If you are interested, that is)

NJG 2005-02-03:

Or look at this from my console
% set var#3 9
9
(bin) 2 % set var#3
9
(bin) 3 % set a $var#3
can't read "var": no such variable
(bin) 4 % set a ${var#3}
9
(bin) 5 % set "var#3"
9
(bin) 6 % set a $"var#3"
$"var#3"
(bin) 7 % 

Tcl accepts any garbage as a variable name in a set statement a lot of which it cannot handle in $ dereferencing!

Peter da Silva: I don't see the problem here. $ dereferencing accepts any garbage set does, by using the ${...} form. (FB: not any garbage, see below). The ${...} form is not distinct from the $alphanumerics form. THIS WAS DELIBERATE.

DKF: Yes. $foo is really just shorthand for set foo (except it doesn't go through set itself) and the syntactic constraints on $ substitution are useful because they do what you want most of the time. (e.g. $frame.component is very common in Tk code) - RS ... and with bracing you can dollar-evaluate any weird name, too.

NJG 2005-02-04:

Gentlemen, you are missing the point I am making here. It is not about how and why. It is in the affirmative: the morphology of Tcl and the behaviour (as of now there is no such thing as rules) of the $ substitution must be revised. As they are now implemented, variable and command names you never in your right mind would think of using are accepted in definitive places just to prevent (rather make very awkward) the use of constructed names. When I need indexed variables, sometimes I would rather write just $A$i$j than $A($i,$j) or use just $$N instead of set $N. Moreover, the »forbidden« forms can be implemented more efficiently!

For rule number one we should assert that in $<string> the same quoting/substitution rules apply for <string> as in any other places. To avoid unmanageable situations and not to go head on against the present conventions the character set for a variable or command name should then be reasonably restricted.

The present practice IMNHO is just a sloppiness in the interpreter realization that may go back as far as the first implementation but that is no reason to remain unchanged forever.

Lars H: I strongly disagree. The ability to use weird names is sometimes tremendously useful, and it's not like it is hard to avoid generating weird names if one doesn't like them. Changing the interpretation of $A$i$j would definitely wreck tons of Tcl code (and your suggested interpretation is inconsistent -- it couldn't be like $A($i,$j), but would rather be like $A($i($j))).

NJG 2005-02-06: OK, I give up. Not that I agree, I just have not enough time right now to engage in a discussion of merit. However, I would really appreciate if somebody gave me a digestible reason why on earth in $ substitution # should act as a delimiter,

Lars H: Because # is not "a letter, digit, underscore, or namespace separator" (rule 8 above). Variable substitution is designed to be abstemious with respect to what it will grab.

(presumably) NJG: a pair of " characters should not keep their role (that is forcing any substitutions in the enclosed text before $ is applied)

Lars H: Because that is not their role! Read rule 4. The special role of the quotes, which they only take on at the beginning of a word, is to make whitespace, semicolons, and close brackets count as characters in the word rather than as word separators. Period. Quoted and unquoted words behave essentially the same. It is always possible to get to the exact same end with a word without quotes by escaping these characters, so the double quotes are not an essential part of the Tcl language, but they make lots of things sooo much easier to code.

Otherwise, double dereferencing should be made so awkward.

As in the algorithm types I most frequently code in Tcl/Tk constructed names and double dereferencing are frequently the way to go, let me repeat myself: now " ... variable and command names you never in your right mind would think of using are accepted in definitive places just to prevent (rather make very awkward) the use of constructed names." That says it all.

(presumably) NJG: BTW, I don't think I suggested the dropping of the present array notation above!

Lars H: I never claimed you did. I remarked that your suggested juxtaposition as array indexing would not, given your suggested equivalence of $<string> and <string>, in the case of double indexing work the way that you indicated you wanted it to work.

FB: IMHO, as much as I love Tcl, if there is a point that is inconsistent, it's the variable substitution syntax. For example, the ${name} form does not follow the same brace matching rules as for rule [6]. I.e. the following code produces surprising results:
% set {{var}} foo
% puts ${{var}}
can't read "{var": no such variable

That's because the parser stops at the first close brace, whereas rule [6] balances them properly. Worse, there isn't any way to work around this limitation other than using set directly. I understand that this is the intended behavior as per rule [8], but I fail to see the overall consistency. And this is where the "$ is a shortcut for set" mantra fails.

This is one of the issues I try to address with Cloverfield. See Cloverfield - Tridekalogue rule [8] (and please try to make abstraction of the other propositions such as rule [6] and their implications to variable substitution), as well as allowing extended syntax for easier subscripting (see also Tcl 9.0 WishList #72 and Better Arrays for Tcl9). In short, I propose that $ be a prefix to words (or parts of words) that would be normally parsed and substituted, and whose resulting value would designate the variable name. This name part could also be suffixed by an "index" part (e.g. array subscript enclosed between parentheses, like the current syntax). For example, the following syntaxes would be accepted:
$name                  # Same as with Tcl
${name}                # Ditto
${{name}}              # Same as [set "{name}"]. Uses rule [6].
$"string with spaces"  # Same as with {} but using rule [4]
$[proc]                # Same as [set [proc]]

The index part allows vector and keyed access semantics using an interface concept borrowed from Feather:
$name(foo bar)         # Same as [dict get $name "foo bar"]
$name{1 2 3}           # Same as [lindex $name 1 2 3]

There can be several index parts:
$name(foo)(bar)        # Same as [dict get $name foo bar]
$name{1}{2}{3}         # Same as [lindex $name 1 2 3], but redundant with $name{1 2 3}
$name{1}(foo)          # Same as [dict get [lindex $name 1] foo]
$name(foo){1}          # Same as [lindex [dict get $name foo] 1]

Unfortunately we need distinct syntaxes for vector (numerical index) and keyed accesses (strings) because EIAS, and dicts are also valid lists. Besides, I don't want to fall into the same trap as PHP.

These changes of course potentially break compatibility. However the case about brace matching still stands IMHO and should not cause any compatibility issue (I don't want to meet the kind of psychopathic maniac that would use braces in variable names).

NJG 2005-02-09: Lars, I really appreciate your effort but you too are not reflecting on what I am saying here, i.e. that the rules should be modified to provide a consistent result with regard to the two types of dereferencing. As it stands, the only true way to get the value of a defined variable is with set, which is an even more awkward feature of Tcl than expr. Both lay Tcl open to ridicule, which as you know kills.

MS: would be interested in seeing a concrete proposal for the modified rules that provide that consistent result.

DKF: I think you're way wrong, NJG, but just because I think that doesn't mean that I'm right. Hence if you want to continue this discussion, I urge you to produce a full proposal.

RHS: s/consistent/consistent and doesn't break up to millions of lines of existing Tcl code/

RS: set is indeed the original true way, and accepts any string as variable name. The $ parser was added as a convenience, and follows typical variable-naming rules ([A-Za-z0-9_]), if the name is not braced. That's part of its convenience, it "does what I mean" :)
puts this:$this,that:$that,etc...

I see no problem there.

Peter da Silva: RS is correct. The behavior of $ is deliberate, and should not be changed. There's only one thing in the behavior of $ that was accidental. This is the meaning of $(...)... and in practice this is interpreted as set "(...)". This was something I wanted to make use of right at the beginning when Karl and I were designing hashes. I wanted $(expression) to be similar to expr {expression} *except* that variables would not need to be preceded by '$'. Eg... $(a+1) would be expr {$a + 1}... bringing variables and expressions closer together.

SYStems: Basically I want to make sure I understand {*} right! I tried reading its explanation before, without getting it, until I reread the first chapter from Practical Programming in Tcl and Tk. Tcl does grouping first, substitution second; this means the result of a substitution cannot affect the number of words a command receives, so is {*} a trick to overcome this? So far it seems so. {*} substitutes so that the result may or may not be several words!

I always wondered why expand was not implemented as a command, but this is now more obvious: {*} affects the behavior of the Tcl parser, thus this new feature cannot be implemented as a command; it's a rule Tcl must always follow.

I wonder why not a more sensible syntax, like [:a b c:], or [| a b c |], or something of that nature, two consecutive brackets of different types; anyway ... this is not very important

Also, not very important

would
cmd a{*}{b c} d {*}{e f}

result in
cmd ab c d e f 

or what?

Lars H: The reasons not to use e.g. [: ... :] are

  1. It already means something different (command substitution with a command named :). {*}$x was prior to 8.5 a syntax error.
  2. It's unTclish. None of the Tcl syntax rules requires looking at more than one character at a time for determining what to do, but this would lend a special significance to a pair of characters.

Regarding your question, that should result in
cmd a{*}{b c} d e f

since {*} is only special at the beginning of a word; the same is by the way true for left braces and quotes too, so if you try the above you'll find that the b and the c end up in different arguments.
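This answer can be verified with a sketch that again lets list stand in for the placeholder command cmd. The variable names are assumptions for the example.

```tcl
# list stands in for cmd so the resulting words can be inspected.
set words [list a{*}{b c} d {*}{e f}]

set count  [llength $words]    ;# five words in total
set first  [lindex $words 0]   ;# braces are ordinary characters here
set second [lindex $words 1]   ;# the c ends up in its own word
```

The first word consists of the six characters from a up to and including b, the second of c and the close brace, and only the final {*}{e f} is actually expanded.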

Terminology

escargo 2005-04-25: There are some aspects of terminology in these rules that I find bothersome.

quote - In the history of punctuation, quoting usually has been a punctuation convention that uses opening and closing quotation marks. These marks vary from language to language, and even between American English and British English. They still have in common the notion of balance (under most circumstances). Similarly, in many programming languages, there is the notion of an escape character and an escape sequence. This being the case, I find the use of the backslash to be an instance of an escape character and not of something being quoted. Rule 6 uses the phrase "quoted with a backslash." I don't believe that backslashes quote anything. They are escape characters and they escape things.

I also propose changing the wording in certain places; for example, in Rule 1, "unless quoted" would become "unless quoted or escaped".

forward references - Backslash substitution is referenced before it is defined. This might be due to a more analytic style versus a synthetic style. I prefer an order where terms are defined before they are used. I would move the definition of things like backslash substitution before they are used.

word versus token - Is "" a word? Is "{}" a word? Is "." a word? All of these are character strings that can be called words in the processing of Tcl inputs. I would be more comfortable with the term token instead of word in these rules. (Maybe it's my compiler-writing training.)

end of file - An end of file is a command terminator. Perhaps this was not thought to be worth mentioning, but I was experimenting to see if a file that ends before a newline would be recognized as complete or not; it did seem to be.

That's probably enough for now.

Lars H:

Re quoting: Not all languages use distinct opening and closing quotation marks. In Swedish, the rule has rather been to always use the same character to open as to close! (For guillemet quoting there is a modern tradition to quote as »citera«, i.e., the German style, but otherwise it's ” (\u201D) both for opening and closing.)

Re tokens: I think there's a point in not using this term, because Tcl's syntax really isn't like that of languages where you tokenise code. Tcl can be understood on a character-by-character basis, which is only very rarely the case with other languages -- they instead have to be explained in terms of tokens.

Re end of file: This is really a special case of "end of script". Does it need to be pointed out that a command ends when the script it is part of ends?

escargo: The American English opening and closing quotation marks we are using (" and ") are not distinct. (Although different types of quoting could use symmetrical marks, it is clearly not required.) The point is that there are two marks, one at each end. So if there is only one mark, then it is not quoting.

Tcl does tokenize code, but it calls the pieces words even if they don't have any letters. I know it's my own preference, but I expect something called a word to be made up of letters, and when it isn't it grates on my metaphorical nerves.

If a line does not end with a newline, can it be a valid statement or command? It's a corner case that I would like to see covered somewhere (even if not in the sacred succinct description here).

MS: (not 100% sure I understood the question, still trying an answer). Of course it can be valid. The rules do not mention lines at all - only that naked newline characters are command separators and comment terminators.
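
A minimal sketch of this point, using eval on a string as a stand-in for a file whose last line lacks a newline. The variable names are made up for the illustration:

```tcl
# The script does not end with a newline or semicolon;
# the end of the string ends the last command.
set script "set a 1; set b 2"
puts [eval $script]   ;# 2, the result of the final command
```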

escargo: Things can get confusing where there are separators and terminators. For example, I think PL/I used semicolons as statement terminators; Pascal used semicolons as statement separators. If a newline were a statement terminator, then a file that ended without a newline might be erroneous for the last statement.

Duoas 2008-10-28:

(just because some of the above questions on terminology are left unanswered)

A quote has nothing to do with punctuation or balance. It has to do with how the object is treated. For example, if I were to quote Hamlet: "Alas! poor Yorick! I knew him, Horatio" (act 5, sc. 1, lines 202-3), one would expect to be able to verify that the text is unchanged. If I say "Alas poor Yorick! I knew him well!", then I have failed to quote it, since Hamlet never said that. (Another favorite: "Beam me up Scotty!")
So to 'quote' a word means that it is not modified. The mechanism for marking a quote can be anything.
I think the phraseology comes from functional languages (LISP or Scheme or some other variant?)

A word is a grammatical term to indicate a distinct unit of meaning. It is composed of more atomic units. In English, words are combinations of letters, which have no individual value (in modern language, at least). In hardware, a WORD is often composed of two BYTES, but it is considered one value, not two. In computer languages, a word (if it is so called, as it is in Tcl) represents a separable unit of meaning. Hence, in Tcl the string
puts {Hello world!}

consists of two words: puts and Hello world!. The value (or meaning) of the word depends on context. In contrast, a token is not necessarily a word.

As for terminators, end of media is always end of content. Does that really have to be spelled out in the Rules?

LV: the definition of "letter" in the section on valid characters (that can be used in a variable name without brace quoting) should be clarified, IMO. I've seen tutorials that say that the word means 7 bit ASCII alphanumeric characters. I've also seen, here on the Wiki (for instance, on What kinds of variable names can be used in Tcl), developer claims that they have tried using things like umlauts in variable names without needing the braces.

So, some clarification is in order, in my opinion.

wdb: The new expansion rule with {*} forces us to rewrite the 11 rules. Let us take this as an occasion to find a completely new description.

Here is my proposal for a replacement of the 11 rules.

  • I have optimised according to the KISS rule.
  • I have tried to avoid forward references in the description, only backwards.
  • I have written it as if I wanted to write a parser for Tcl (which is not the case).

The {*} expansion is described as a step in substitution, before the term list is introduced. This way there is no need to formulate an exception rule. I don't like exceptions at all in basic explanations.

Note that the description of \ at the end of a line is not placed with the description of \ as an escape sequence, but where I would expect it in the program flow. My intention was: Straight ahead, Sir!

aspect: some behaviour I just picked up on that I find confusing, and would like to make a note of:
% puts "\}"
}
% puts "\{"
{

.. as expected. But then:
% puts {\{}
\{
% puts {\}}
\}

.. and for completeness:
% puts {\\}
\\

This is an interaction of rules [6] and [9], but in my eyes an inconsistency with:
% puts {\
}

%

In the case of backslash-newline, the backslash "escapes" or "quotes" the newline and is removed (rule [9]). In the case of backslash-brace, the backslash escapes the brace but is not removed (rule [6]). This inconsistency hurts as there's no way to embed a lone { or } in a {}-quoted string!

It gets even more mind-melting when you try and express what's going on using expr:
% expr {{\{}=="\\{"}

.. is not a complete command. You need instead to type:
% expr {{\{}=="\\\{"}
1

.. which hurts my brain.
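
A short sketch of the contrast described above, together with two ways to obtain a lone brace without brace quoting. The variable names are made up for the illustration, and 123 is the character code of the left brace:

```tcl
# Inside braces, backslash-brace is kept verbatim: two characters.
set braced {\{}
# Inside double quotes, backslash substitution yields a single brace.
set quoted "\{"
puts [string length $braced]                  ;# 2
puts [string length $quoted]                  ;# 1
# format %c builds the same one-character string from its character code.
puts [string equal $quoted [format %c 123]]   ;# 1
```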

Script

A Tcl script is a string.

Separator

If that string has a non-escaped semicolon (;) or newline, then this character serves as separator such that another script can follow.

Substitution

Before execution, the interpreter substitutes the string from left to right as follows:

  1. If a line is terminated by the backslash character (\), then the following line is taken as a continuation of the current line, where the backslash, the newline, and any leading spaces on the following line are replaced by a single space.
  2. If the line starts with zero or more spaces and the character (#), then the line is taken as comment. It is not executed.
  3. If there is an opening brace ({) in the string, then finding the matching counterpart (}) has precedence above the separator, i. e. the group can contain newlines and semicolons. This section is protected against further substitution.
  4. If a section starts with a double quote ("), then finding the closing double quote has precedence above the separator, i. e. the group can contain newlines and semicolons.
  5. If a section of the string is enclosed in brackets ([...]), then the inner area is recursively executed as an embedded script, and the bracketed section is replaced by the result of the inner script.
  6. If there is a dollar sign followed by some alphanumerical characters such as $abc, then it is substituted as if there were [set abc] instead. If a dollar sign is followed by a properly braced sequence such as ${abc}, then it is substituted as if there were [set {abc}]. Note that set returns the value of a variable such as abc.
  7. If there is a backslash (\), then it escapes the following character as follows: a -- bell; b -- backspace; f -- form feed; n -- newline; r -- carriage return; t -- tab; v -- vertical tab; ooo -- octal number; xhh -- hex number; uhhhh -- unicode character; all other chars escape to themselves, especially braces and brackets described above.
  8. If there is a character sequence {*} immediately followed by a non-space character, then the sequence {*} and the following group is replaced by the elements of that group taken as a list (without grouping characters).

After these steps, the string is transformed to a properly-formed list, the first word of which is taken as the procedure name, and the remaining words are taken as arguments (Polish notation). Then, the procedure belonging to that name is executed.
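
Steps 6 and 8 above can be sketched as follows. This is a minimal illustration; the variable names are made up for the example:

```tcl
set abc 42
# $abc and [set abc] yield the same value.
puts [expr {$abc eq [set abc]}]    ;# 1

# ${...} reaches names that a bare $ would not parse, such as one with a space.
set {my var} hello
puts ${my var}                     ;# hello

# {*} at the start of a word splices the list elements in as separate words.
puts [llength [list {*}{a b} c]]   ;# 3
```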

List

A string is a proper list if it contains words where:

  1. A word consists either of non-space characters or of a group delimited by double quotes (") or properly balanced braces {...}.
  2. A word is separated from its neighbours by space characters.

Note that if a word enclosed by double quotes ("...") or braces ({...}) is followed by a non-space character, then the string is not a proper list. If there is a neighbour, then the delimiting space character is mandatory.
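
A minimal check of the note above, assuming Tcl 8.5 or later for string is list:

```tcl
# A braced word followed by a space is fine.
puts [string is list {{a} b}]            ;# 1
# A braced word followed directly by a non-space character is not a proper list.
puts [string is list {{a}b}]             ;# 0
# List commands reject such a string; catch returns 1 to signal the error.
puts [catch {llength {{a}b}} message]    ;# 1
```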

CMcC can never remember whether he means endekalog or dodekalog, so uses *dekalog to mean both of them, and more (please refer to context for actual meaning.)

Historical

The following information is obsolete, but retained for historical purposes.

Documentation for Argument expansion

MS 2003-10-26: The acceptance of Argument expansion with leading {*} requires a new rule, and slight edits to the older ones.

This page holds a proposed wording for the new Tcl(n) manual page. I'd appreciate comments and suggestions.

escargo 2005-04-22: What about corrections for spelling and grammar? Should those be made in line?

escargo 2005-04-25: Hearing no objection, I'm going to start fixing spelling and grammar. I have some more editorial and conceptual issues, but I will start some discussion of those issues at the bottom.

LV: how many of the following changes have been submitted for inclusion into Tcl 8.5?

Required Changes

  • add new rule [5], shift old rules [5]-[11] up by one
  • modify rule [6] Braces to acknowledge the {*} exception
  • modify rule [12] Substitution and word boundaries to acknowledge the {*} exception