Tcl Rules Redux

The Tcl rules fully specify both list and script syntax, but it isn't always clear which parts apply to which syntax. Tcl Rules Redux specifies the same language, first describing list syntax and then describing script syntax in terms of list syntax.

Lists

A list is a string containing a sequence of words separated by whitespace.

Backslash interpretation: \ and the subsequent character are interpreted simply as that subsequent character, which is useful for representing characters such as ", \, braces, and whitespace, that normally have special meaning. The following backslash sequences have special interpretation:

\a
Audible alert (bell) (Unicode U+000007).
\b
Backspace (Unicode U+000008).
\f
Form feed (Unicode U+00000C).
\n
Newline (Unicode U+00000A).
\r
Carriage-return (Unicode U+00000D).
\t
Tab (Unicode U+000009).
\v
Vertical tab (Unicode U+00000B).
\\
Backslash (“\”).
\ooo
The digits ooo (one, two, or three of them) comprise an eight-bit octal value for a Unicode character in the range 000–377 (i.e., the range U+000000–U+0000FF). The parser stops just before this range overflows, or when the maximum of three digits is reached. The upper bits of the Unicode character are 0.
\xhh
The hexadecimal digits hh (one or two of them) comprise an eight-bit hexadecimal value for a Unicode character in the range U+000000–U+0000FF. The upper bits of the Unicode character are 0.
\uhhhh
The hexadecimal digits hhhh (one, two, three, or four of them) comprise a sixteen-bit hexadecimal value for a Unicode character in the range U+000000–U+00FFFF. The upper bits of the Unicode character are 0.
\Uhhhhhhhh
The hexadecimal digits hhhhhhhh (one to eight of them) comprise a twenty-one-bit hexadecimal value for the Unicode character in the range U+000000–U+10FFFF. The parser stops just before this range overflows, or when the maximum of eight digits is reached. The upper bits of the Unicode character are 0.

The range U+010000–U+10FFFD is reserved for the future.
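A few of the sequences above can be tried out in a short sketch. The variable names are illustrative only, and \U is left out because older interpreters may not support it.

```tcl
# Backslash sequences inside quoted words of a script.
set tab "\t"            ;# one character, U+0009
set a1  "\x41"          ;# hexadecimal escape for the letter A
set a2  "\101"          ;# octal escape for the letter A
set a3  "\u0041"        ;# four-digit Unicode escape for the letter A
set q   "say \"hi\""    ;# an escaped quote is kept as a literal quote

# The same sequences are interpreted when a string is parsed as a list.
set words {\x41 \u0042 a\ b}
set first [lindex $words 0]   ;# A
set third [lindex $words 2]   ;# a b (one word containing a space)
```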

If a word is enclosed in quotes ("), it consists of the string between the quotes. The word is subject to backslash interpretation.
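As a small illustration of quoted words in a list (the variable names are made up for this sketch), the list commands see three words below, and the backslash sequence inside the last one is interpreted:

```tcl
# The outer braces only keep the script parser from touching the string;
# the quotes inside are handled by the list parser.
set l {one "two three" "tab\there"}

set count  [llength $l]     ;# 3
set second [lindex $l 1]    ;# two three
set third  [lindex $l 2]    ;# tab, a tab character, then here
```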

If a word is enclosed in braces ({}), it consists of the string between the braces. Brace pairs occurring within the word are ignored for the purpose of finding the matching enclosing brace. The word is not subject to backslash interpretation, but a brace preceded by a backslash is ignored for the purpose of finding the matching enclosing brace.
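A brief sketch of braced words in a list, with illustrative variable names. Nested braces stay part of the word, a backslash sequence is left uninterpreted, and a backslash-escaped brace does not count toward matching:

```tcl
# Three braced words: one with a nested pair, one with a backslash
# sequence, and one with an escaped closing brace.
set l {{a {b c}} {x\ty} {p \} q}}

set count  [llength $l]     ;# 3
set nested [lindex $l 0]    ;# the inner brace pair is kept verbatim
set raw    [lindex $l 1]    ;# four characters: x, backslash, t, y
set esc    [lindex $l 2]    ;# the backslash stays in the word
```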

Scripts

A script is an ordered sequence of lists separated by newlines or semicolons. Scripts start with the rules for lists and add the following rules:

Commands: Each list is a command. The first word of each command is the name of the command, which is used to locate a corresponding command routine, to which any remaining words are passed for evaluation.
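To make the relationship between lists and commands concrete, here is a minimal sketch that builds strings and then evaluates them as scripts with eval. The variable names are chosen only for the example.

```tcl
# A single list whose first word names a command.
set cmd {string length "hello world"}
set n [eval $cmd]           ;# 11

# Three commands, separated by a semicolon and a newline.
set script "set a 1; set b 2\nset c 3"
set last [eval $script]     ;# 3, the result of the final command
```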

Backslash Interpretation: There is one additional special backslash sequence:

\ <newline>?<whitespace>?
A backslash character immediately followed by a newline character is replaced by a single space, and any immediately subsequent string of space and tab characters is removed. This backslash sequence is unique in that it is replaced in a separate pre-pass before the command is actually parsed. This means that it is replaced even when it occurs between braces, and the resulting space is treated as a word separator if it is not in a braced or quoted word.
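The two cases described above, inside and outside braces, can be sketched as follows (variable names are illustrative):

```tcl
# Inside braces: the backslash, the newline, and the leading blanks of the
# next line collapse into one space, so the value is three characters long.
set s {a\
       b}

# Outside braces and quotes the resulting space separates words,
# so the list command below still receives two arguments.
set l [list a \
            b]
```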

Comment: A number sign (#) at the beginning of a command and not otherwise escaped begins a comment that ends at the first newline. A newline that is preceded by a backslash is ignored for the purpose of finding the end of the comment.
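A short sketch of where a number sign does and does not start a comment; the variable names are illustrative.

```tcl
set x 1
# set x 2     this whole line is a comment, so x is still 1

# In the middle of a command the number sign is an ordinary word.
set y [list a # b]

# A backslash before the newline extends this comment onto the next line \
set x 99

# After a semicolon a new command begins, so a comment may start there.
set z 3 ;# trailing comment
```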

Substitutions occur at the beginning, within, or at the end of a word, and do not change the boundaries of the word. A word enclosed in braces is not subject to script or variable substitution.
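As an illustration of both sentences, with made-up variable names: a substituted value containing spaces remains one word, and a braced word is passed through untouched.

```tcl
set v "a b c"

# The value of v contains spaces, yet it is still a single word here,
# so the list command receives exactly two arguments.
set words [list $v x]

# Inside braces neither the dollar sign nor the brackets are substituted.
set lit {$v [nope]}
```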

Script substitution: A string enclosed in brackets at any position in a word is interpreted as a script and is replaced by the result of the evaluation of that script, i.e. by the result of the final command in the script.
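A minimal sketch of script substitution at different positions in a word, using illustrative variable names:

```tcl
# The bracketed script is the whole word.
set n [string length abc]       ;# 3

# The bracketed script sits in the middle of a word.
set w x[string repeat y 2]z     ;# xyyz

# Only the result of the final command is substituted.
set r [set a 1; set b 2]        ;# 2
```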

Variable substitution: $ followed by a variable name is replaced by the value of the corresponding variable. The variable name is not subject to backslash interpretation or script substitution. If enclosed in braces, the variable name is composed of all characters up to the matching right brace. Otherwise, the variable name is composed only of letters, digits, underscore, the empty string, or namespace separators. In a variable name, a pair of parentheses encloses the name of a member variable within a named array. The member variable name is subject to backslash interpretation and substitutions.
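The forms of variable substitution described above can be sketched as follows. All variable names here are illustrative.

```tcl
set a 1
set s1 "$a.b"           ;# the period ends the name, giving 1.b
set s2 "${a}b"          ;# braces delimit the name, giving 1b

# A braced name may contain characters that are otherwise special.
set {my var} 5
set s3 ${my var}        ;# 5
set {[abc]} xyz
set s4 ${[abc]}         ;# xyz, with no script substitution in the name

# The member name of an array is itself subject to substitution.
set arr(key) v
set k key
set s5 $arr($k)         ;# v
```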

List Expansion: If a word of a command is prefixed by {*}, it is processed as usual, and the resulting value must be a list. The word and its {*} prefix are replaced by the elements of the list, each of which becomes an individual word in the command.
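A small sketch contrasting a word with and without the {*} prefix. The variable names are illustrative, and the prefix assumes Tcl 8.5 or later.

```tcl
set items {1 2 3}

# Without the prefix the whole value is passed as one word.
set n1 [llength [list $items]]          ;# 1

# With the prefix each element becomes its own word.
set n2 [llength [list {*}$items]]       ;# 3

# Expanded words can be mixed with ordinary ones.
set all [list 0 {*}$items 4]            ;# five elements
```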


bll 2016-8-16: (a) It is not entirely clear from this ruleset that a word can be created without the use of either quotes or braces. I would make that clearer. (A word without quotes or braces is bounded by whitespace....). (b) "The variable name is not subject to backslash interpretation or script substitution." : This doesn't seem to be quite true. I think a clearer explanation will be necessary (or obviously my interpretation of the description is entirely different). The example below certainly seems to be subject to script substitution.

% set {[abc]} xyz
xyz
% puts $[abc]
invalid command name "abc"
% puts ${[abc]}
xyz

PYK 2016-08-17: I see your point about the lack of an explicit statement regarding the nature of words. On one level, the phrase, "A list is a string... separated by whitespace" does explain this, just as the phrase, "A script is a string..." does in the official rules. For a little more clarity (or doubt), I've added "if" to the beginning of the statements about quoted and braced words.

Regarding backslash interpretation and script substitution in variable names, in $[abc], [abc] isn't part of the variable name. If abc had been a valid command, a substitution would have occurred and the resulting string would have a literal $ followed by the result of the substitution. Your comment did, however, prompt me to remove the sentence, "Any other character marks the end of the variable name", which seemed unnecessary.

bll 2016-8-17: (a) is good. For (b), I think the phrase "The variable name is not subject to backslash interpretation or script substitution." is just confusing. Is it necessary? It makes no sense to me. At the level of my example it does happen. And at the next level, since there are never double substitutions, these can't happen. (c) For list expansion: I find "...resulting value..." unclear. Result of what? Something more like "...it is processed as a list..."? Or just remove the word "resulting". And feel free to remove my comments at any time.