Version 3 of Tcl Rules Redux

Updated 2015-02-27 14:30:32 by pooryorick

The Tcl rules fully specify both list and script syntax, but it isn't always clear which parts apply to which syntax. Tcl Rules Redux specifies the same language, first describing list syntax and then describing script syntax in terms of list syntax.

Lists

A list is a string containing a sequence of words separated by whitespace.
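
As a small illustration of the definition above, the following sketch hands a plain string to Tcl's built-in list commands. The variable name colors is only an example, and the braces are there simply to deliver the string unchanged.

```tcl
# A list is an ordinary string; runs of whitespace separate its words.
set colors {red   green blue}

puts [llength $colors]     ;# 3
puts [lindex $colors 1]    ;# green
```

The amount of whitespace between words does not matter; three spaces separate the first two words just as well as one.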

Backslash interpretation: a \ and the character that follows it are interpreted simply as that following character. This is useful for representing characters that normally have special meaning, such as ", \, braces, and whitespace. The following backslash sequences are exceptions and have a special interpretation:

\a
Audible alert (bell) (Unicode U+000007).
\b
Backspace (Unicode U+000008).
\f
Form feed (Unicode U+00000C).
\n
Newline (Unicode U+00000A).
\r
Carriage-return (Unicode U+00000D).
\t
Tab (Unicode U+000009).
\v
Vertical tab (Unicode U+00000B).
\\
Backslash (“\”).
\ooo
The digits ooo (one, two, or three of them) give an eight-bit octal value for the Unicode character that will be inserted, in the range 000–377 (i.e., the range U+000000–U+0000FF). The parser will stop just before this range overflows, or when the maximum of three digits is reached. The upper bits of the Unicode character will be 0.
\xhh
The hexadecimal digits hh (one or two of them) give an eight-bit hexadecimal value for the Unicode character that will be inserted. The upper bits of the Unicode character will be 0 (i.e., the character will be in the range U+000000–U+0000FF).
\uhhhh
The hexadecimal digits hhhh (one, two, three, or four of them) give a sixteen-bit hexadecimal value for the Unicode character that will be inserted. The upper bits of the Unicode character will be 0 (i.e., the character will be in the range U+000000–U+00FFFF).
\Uhhhhhhhh
The hexadecimal digits hhhhhhhh (one to eight of them) give a twenty-one-bit hexadecimal value for the Unicode character that will be inserted, in the range U+000000–U+10FFFF. The parser will stop just before this range overflows, or when the maximum of eight digits is reached. The upper bits of the Unicode character will be 0.

The range U+010000–U+10FFFD is reserved for the future.
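
A few of the backslash sequences above can be tried out with a sketch like the following. The variable name items is illustrative. The outer braces are only a way to hand the string to the list commands untouched, so that it is the list parser that interprets the backslashes.

```tcl
# Five words: an escaped space, a hex escape, a Unicode escape,
# an octal escape, and an escaped brace.
set items {a\ b \x41 \u00e9 \101 \{}

puts [llength $items]      ;# 5
puts [lindex $items 0]     ;# a b   (the escaped space does not split the word)
puts [lindex $items 1]     ;# A     (hexadecimal 41)
puts [lindex $items 3]     ;# A     (octal 101)
puts [lindex $items 4]     ;# {
```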

A word enclosed in " characters consists of the string between the quotes. The word is subject to backslash interpretation.

A word enclosed in braces consists of the string between the braces. Brace pairs occurring within the word are ignored for the purpose of finding the matching enclosing brace. The word is not subject to backslash interpretation, but a brace preceded by a backslash is ignored for the purpose of finding the matching enclosing brace.
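
The difference between quoted and braced words can be seen in a sketch such as this one, where items is a made-up variable name and the outermost braces merely deliver the string to the list commands unchanged.

```tcl
# Four words: one quoted, three braced.
set items {"a\tb" {c\td} {x {y z}} {\}}}

puts [llength $items]                   ;# 4
puts [string length [lindex $items 0]]  ;# 3  (a, tab, b: \t was interpreted)
puts [string length [lindex $items 1]]  ;# 5  (c, \, t, d and nothing interpreted)
puts [lindex $items 2]                  ;# x {y z}  (nested braces are kept)
puts [string length [lindex $items 3]]  ;# 2  (the backslash stays in the word)
```

The last word shows both halves of the brace rule: the escaped brace is skipped when looking for the closing brace, yet the backslash itself remains part of the word.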

Scripts

A script is an ordered sequence of lists separated by newlines or semicolons. Each list is a command. The first word of a command is its name, and subsequent words are its arguments. Scripts start with the rules for lists and add the following rules:
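
A minimal sketch of a script, using only the built-in set, expr, and puts commands; the variable names are illustrative.

```tcl
# Two commands on one line, separated by a semicolon.
set a 1; set b 2

# A third command on its own line. Its name is "set" and its
# arguments are "total" and the result of the bracketed script.
set total [expr {$a + $b}]

puts $total    ;# 3
```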

One additional special backslash sequence:

\<newline><whitespace>
A backslash and subsequent newline, followed by any combination of space, tab, and newline characters, is replaced by a single space character. This backslash sequence is unique in that it is replaced in a separate pre-pass before the command is actually parsed. This means that it is replaced even when it occurs between braces, and the resulting space will be treated as a word separator if it is not in braces or quotes.
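
Both consequences of this rule can be sketched as follows; words and text are hypothetical variable names.

```tcl
# Outside braces and quotes, the replacement space separates words,
# so a command can continue on the next line.
set words [list one \
               two]
puts [llength $words]    ;# 2

# Inside braces the sequence is still replaced by a single space.
set text {first\
          second}
puts $text               ;# first second
```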

Comment: A # character at the beginning of a command and not otherwise escaped begins a comment that ends at the first newline. A newline escaped by a backslash is ignored for the purpose of finding the end of the comment.
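
The comment rule has a few consequences that are easy to miss, as this sketch with made-up variable names tries to show.

```tcl
# A comment at the beginning of a command.
set x 1   ;# The semicolon ends the command, so this # begins a new one.

# The backslash at the end of this line extends the comment: \
set x 2

puts $x   ;# 1, because "set x 2" was part of the comment

# Elsewhere in a command, # is an ordinary character.
set tag #notacomment
puts $tag ;# #notacomment
```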

Script substitution: A string enclosed in brackets at any position in a word is interpreted as a script and is replaced by the result of the evaluation of that script, i.e. by the result of the final command in the script.
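
A short sketch of script substitution, using illustrative variable names:

```tcl
# The bracketed script is evaluated and its result becomes the word.
set n [string length hello]
puts $n        ;# 5

# Substitution can happen at any position in a word.
set label item-[expr {2 + 3}]-end
puts $label    ;# item-5-end

# With several commands, the result of the final one is used.
set last [set a 1; set b 2]
puts $last     ;# 2
```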

Variable substitution: $ followed by a variable name, at any position in a word, is replaced by the value of the corresponding variable. The variable name is not subject to backslash interpretation or script substitution. If enclosed in braces, the variable name is composed of all characters up to the matching right brace. Otherwise, a variable name is composed only of letters, digits, underscores, the empty string, or namespace separators. Any other character marks the end of the name. In a variable name, a pair of parentheses encloses the name of a member variable within a named array. The member variable name is subject to backslash interpretation and substitutions.
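
The cases described above can be sketched as follows. All variable names here are invented for the example, and the script is assumed to run at the global level so that ::name refers to the same variable as name.

```tcl
set name Tcl
set {odd name} spaced
set color(sky) blue
set key sky

set r1 "Hello, $name!"                  ;# ! ends the name: Hello, Tcl!
set r2 ${name}script                    ;# braces delimit the name: Tclscript
set r3 ${odd name}                      ;# any characters up to the brace: spaced
set r4 $::name                          ;# namespace separators are allowed: Tcl
set r5 $color($key)                     ;# member name is substituted: blue
set r6 $color([string tolower SKY])     ;# scripts work there too: blue

puts [list $r1 $r2 $r3 $r4 $r5 $r6]
```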

A word enclosed in quotes is subject to script and variable substitution, but a word enclosed in braces is not.
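
A brief sketch contrasting a quoted word with a braced one; who, quoted, and braced are illustrative names.

```tcl
set who world

# Quotes: variable and script substitution both take place.
set quoted "hello $who [string length $who]"
puts $quoted    ;# hello world 5

# Braces: the characters are taken literally.
set braced {hello $who [string length $who]}
puts $braced    ;# hello $who [string length $who]
```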