Version 10 of Tcl Rules Redux

Updated 2016-08-16 23:33:26 by bll

The Tcl rules fully specify both list and script syntax, but it isn't always clear which parts apply to which syntax. Tcl Rules Redux specifies the same language, first describing list syntax and then describing script syntax in terms of list syntax.

Lists

A list is a string containing a sequence of words separated by whitespace.
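The rule above can be checked directly with the standard `llength` and `lindex` commands, which parse a string as a list. This is a minimal sketch; the variable names are only illustrative.

```tcl
# A list is just a string; runs of whitespace (spaces, tabs, newlines)
# separate its words. The double quotes here are script syntax, so the
# \t and \n become a real tab and newline before the list is parsed.
set s "alpha   beta\tgamma\ndelta"

# llength parses the string as a list and counts its words.
set count [llength $s]   ;# 4

# lindex parses the string and returns one word by index.
set third [lindex $s 2]  ;# gamma
```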

Backslash interpretation: \ and the subsequent character are interpreted simply as that subsequent character. This is useful for representing characters that normally have special meaning, such as ", \, braces, and whitespace. The following backslash sequences have special interpretation:

\a
Audible alert (bell) (Unicode U+000007).
\b
Backspace (Unicode U+000008).
\f
Form feed (Unicode U+00000C).
\n
Newline (Unicode U+00000A).
\r
Carriage-return (Unicode U+00000D).
\t
Tab (Unicode U+000009).
\v
Vertical tab (Unicode U+00000B).
\\
Backslash (“\”).
\ooo
The digits ooo (one, two, or three of them) give an eight-bit octal value for the Unicode character that will be inserted, in the range 000–377 (i.e., the range U+000000–U+0000FF). The parser will stop just before this range overflows, or when the maximum of three digits is reached. The upper bits of the Unicode character will be 0.
\xhh
The hexadecimal digits hh (one or two of them) give an eight-bit hexadecimal value for the Unicode character that will be inserted. The upper bits of the Unicode character will be 0 (i.e., the character will be in the range U+000000–U+0000FF).
\uhhhh
The hexadecimal digits hhhh (one, two, three, or four of them) give a sixteen-bit hexadecimal value for the Unicode character that will be inserted. The upper bits of the Unicode character will be 0 (i.e., the character will be in the range U+000000–U+00FFFF).
\Uhhhhhhhh
The hexadecimal digits hhhhhhhh (one up to eight of them) give a twenty-one-bit hexadecimal value for the Unicode character that will be inserted, in the range U+000000–U+10FFFF. The parser will stop just before this range overflows, or when the maximum of eight digits is reached. The upper bits of the Unicode character will be 0.

The range U+010000–U+10FFFD is reserved for the future.
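As a sketch of backslash interpretation at the list level, the string below is wrapped in braces at the script level so that its backslashes reach the list parser unchanged. Each element is then decoded by `lindex`. The variable names are illustrative only.

```tcl
# Braces at the script level keep the backslashes intact, so each
# sequence below is interpreted by the list parser, not the script parser.
set s {a\ b \x41 \u00e9 \101 \t}

set w0 [lindex $s 0]   ;# "a b": the escaped space does not separate words
set w1 [lindex $s 1]   ;# "A"   (hex 41)
set w2 [lindex $s 2]   ;# "é"   (U+00E9)
set w3 [lindex $s 3]   ;# "A"   (octal 101)
set w4 [lindex $s 4]   ;# a single tab character

set n [llength $s]     ;# 5
```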

A word enclosed in quotes (") consists of the string between the quotes. The word is subject to backslash interpretation.
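A short sketch of quoted words inside a list string, assuming nothing beyond the standard `llength` and `lindex` commands. The quotes group "one two" into a single word, and the quoted "tab\there" still has its backslash sequence decoded.

```tcl
# Inside a list string, double quotes group words and the grouped text
# still undergoes backslash interpretation.
set s {"one two" "tab\there" three}

set n  [llength $s]   ;# 3
set w0 [lindex $s 0]  ;# one two
set w1 [lindex $s 1]  ;# "tab", a tab character, then "here"
```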

A word enclosed in braces ({}) consists of the string between the braces. Brace pairs occurring within the word are ignored for the purpose of finding the matching enclosing brace. The word is not subject to backslash interpretation, but a brace preceded by a backslash is ignored for the purpose of finding the matching enclosing brace.
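The sketch below illustrates the three braced-word rules: nested pairs stay inside the word, backslashes are kept literally, and a backslash-escaped brace does not close the word. Variable names are illustrative only.

```tcl
# Braces group words; nested brace pairs are kept as part of the word,
# and backslashes are left untouched.
set s {{a {b c}} {x\ty} {p\}q}}

set w0 [lindex $s 0]   ;# a {b c}
set w1 [lindex $s 1]   ;# x\ty  (backslash and "t" kept literally, 4 chars)
set w2 [lindex $s 2]   ;# p\}q  (the escaped brace did not end the word)
```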

Scripts

A script is an ordered sequence of lists separated by newlines or semicolons. Scripts start with the rules for lists and add the following rules:
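To see both command separators at work, a script can be held in a string and run with `eval`. This is a sketch; the variable `script` and the values are arbitrary.

```tcl
# Braces keep the text below from being evaluated right away; it is
# just a string holding three commands, separated by ";" and a newline.
set script {set a 1; set b 2
set c [expr {$a + $b}]}

# eval runs the string as a script; its result is that of the last command.
set result [eval $script]   ;# 3
```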

Commands: Each list is a command. The first word of each command is the name of the command, which is used to locate a corresponding command routine, to which any remaining words are passed for evaluation.
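A minimal sketch of command lookup: the first word is used to find the command routine, and the remaining words become its arguments. `countArgs` is a hypothetical helper defined only for this example.

```tcl
# The first word names the command; it may itself come from a
# substitution, since only the resulting value is used for the lookup.
set cmd llength
set n [$cmd {a b c}]        ;# 3, same as [llength {a b c}]

# Remaining words are passed to the command routine as its arguments.
# countArgs is a hypothetical helper that just counts them.
proc countArgs {args} { return [llength $args] }
set m [countArgs x y z w]   ;# 4
```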

Backslash Interpretation: There is one additional special backslash sequence:

\<newline><whitespace>
A backslash character immediately followed by a newline character is replaced by a single space, and any immediately subsequent string of space and tab characters is removed. This backslash sequence is unique in that it is replaced in a separate pre-pass before the command is actually parsed. This means that it is replaced even when it occurs between braces, and the resulting space is treated as a word separator if it is not in a braced or quoted word.
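The sketch below shows the two consequences of the pre-pass: the sequence collapses to one space even inside braces, and outside braces or quotes that space separates words. Variable names are illustrative.

```tcl
# Inside braces: the backslash, the newline, and the leading spaces of
# the next line collapse into a single space.
set braced {a\
        b}
# braced is now "a b"

# Outside braces and quotes, the resulting space separates words, so this
# is a single list command with three arguments.
set l [list x \
          y z]
# l is "x y z"
```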

Comment: A number sign (#) at the beginning of a command and not otherwise escaped begins a comment that ends at the first newline. A newline that is preceded by a backslash is ignored for the purpose of finding the end of the comment.
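A small sketch of the comment rules: a backslash before the newline carries the comment onto the next line, a semicolon starts a new command where # is again recognized, and # inside a word is ordinary text. Variable names are illustrative.

```tcl
set x 0
# This comment continues onto the next line because of the backslash: \
set x 1
# x is still 0, since "set x 1" was part of the comment above.

set y 2 ;# after a semicolon, "#" is again at the start of a command
set z a#b ;# "#" inside a word is not a comment, so z is "a#b"
```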

Substitutions may occur at the beginning, in the middle, or at the end of a word, and do not change the boundaries of the word. A word enclosed in braces is not subject to script or variable substitution.
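A sketch of substitution positions and word boundaries. `countArgs` is a hypothetical helper that reports how many words it received, used to show that a substituted value containing spaces is still one word.

```tcl
set n 5
set w1 pre${n}post       ;# pre5post: substitution in the middle of a word
set w2 {pre${n}post}     ;# pre${n}post: braces suppress substitution

# A substituted value containing spaces still yields a single word.
set v "x y z"
proc countArgs {args} { return [llength $args] }
set k [countArgs $v]     ;# 1, not 3
```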

Script substitution: A string enclosed in brackets ([]) at any position in a word is interpreted as a script and is replaced by the result of the evaluation of that script, i.e., by the result of the final command in the script.
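A minimal sketch of script substitution: a bracketed script with several commands yields the result of its last one, and the substitution can sit inside a larger word. Variable names are illustrative.

```tcl
# The bracketed script holds two commands; only the last result is used.
set r [set a 1; set b 2]              ;# 2

# Script substitution can appear inside a larger word.
set w "len=[string length hello]!"    ;# len=5!
```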

Variable substitution: $ followed by a variable name is replaced by the value of the corresponding variable. The variable name is not subject to backslash interpretation or script substitution. If enclosed in braces, the variable name is composed of all characters up to the matching right brace. Otherwise, a variable name is composed only of letters, digits, underscores, the empty string, or namespace separators (::). Any other character marks the end of the name. In a variable name, a pair of parentheses encloses the name of a member variable within a named array. The member variable name is subject to backslash interpretation and substitutions.
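The sketch below walks through the variable-name rules: a braced name, an unbraced name ending at a non-name character, an array element whose name is substituted, and a namespace-qualified name. The namespace `::demo` and all variable names are hypothetical, chosen only for illustration.

```tcl
# Braced names may contain characters such as spaces.
set {my var} 1
set a ${my var}             ;# 1

# Without braces the name stops at the first other character (here ".").
set ext txt
set fname name.$ext.bak     ;# name.txt.bak

# Array element: the text in parentheses is substituted first.
set colors(red) #f00
set key red
set c $colors($key)         ;# #f00

# Namespace separators are part of an unbraced name.
namespace eval ::demo { variable count 10 }
set d $::demo::count        ;# 10
```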

List Expansion: If a word of a command is prefixed by {*}, it is processed as usual, and the resulting value must be a list. The word and its {*} prefix are replaced by the elements of the list, each of which becomes an individual word in the command.
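A sketch comparing a word with and without the {*} prefix (which requires Tcl 8.5 or later). The variable names are illustrative.

```tcl
set extra {b c d}

set without [list a $extra]      ;# a {b c d}   (2 words)
set with    [list a {*}$extra]   ;# a b c d     (4 words)

# The word is substituted first, then expanded.
set n [llength [list {*}[lrange {w x y z} 1 end]]]   ;# 3
```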


bll 2016-8-16: (a) It is not entirely clear from this ruleset that a word can be created without the use of either quotes or braces. I would make that clearer. (A word without quotes or braces is bounded by whitespace....). (b) "The variable name is not subject to backslash interpretation or script substitution." : This doesn't seem to be quite true. I think a clearer explanation will be necessary (or obviously my interpretation of the description is entirely different). The example below certainly seems to be subject to script substitution.

% set {[abc]} xyz
xyz
% puts $[abc]
invalid command name "abc"
% puts ${[abc]}
xyz