Updated 2014-12-28 15:45:16 by dkf

Tcl_ParseCommand - function in Tcl's public C API for parsing a Tcl command from a string.

[Mention extension(s) giving script-level access to this function.]

For a pure-Tcl counterpart here on this wiki, see parsetcl.

Official docs

int Tcl_ParseCommand(Tcl_Interp *interp, const char *string, int numBytes, int nested, Tcl_Parse *parsePtr)

There have been several bugs deep in the Tcl core dealing with the parsing of commands that suggest the existing docs for Tcl_ParseCommand don't point out a tricky case well enough. See Tcl Bug 681841.

The end of a Tcl command is determined by the presence of a command terminator character. This might be a newline ("\n") or a semi-colon (";"). When Tcl_ParseCommand is asked to parse a command in a command substitution context (by setting the nested value to true), then the close-bracket character ("]") is also a command terminator.

Tcl_ParseCommand returns its parsing results in a Tcl_Parse structure pointed to by parsePtr. The two fields commandStart and commandSize in the Tcl_Parse struct indicate the substring that was parsed as a valid Tcl command. The substring begins with the byte pointed to by parsePtr->commandStart and includes parsePtr->commandSize bytes. This substring includes the command terminator!

So, for example, if string originally points to foo;bar and numBytes is 7 (requesting that the whole string be parsed), then after Tcl_ParseCommand returns, commandStart will point to the f and commandSize will be 4, indicating that the substring foo; was successfully parsed as a Tcl command.

This interface has pros and cons. The main advantage is that it is easy to create a loop that will parse many commands from a script:
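  while (...) {
    if (TCL_OK != Tcl_ParseCommand(interp,script,numBytes,0,&parse)) {
      return TCL_ERROR;
    }
    /* Step past the parsed command, terminator included. */
    end = script + numBytes;
    script = parse.commandStart + parse.commandSize;
    numBytes = end - script;
    /* Release any token storage held by the parse. */
    Tcl_FreeParse(&parse);
  }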

The parser takes care of advancing the pointer past the terminator character for us.

A disadvantage is that the caller of Tcl_ParseCommand may really be interested in the string that is the actual command, and not interested in the terminator character. In that case, the caller is burdened with having to strip off the command terminator character.

Finally, we come to the really tricky problem. The substring marked off by the commandStart and commandSize fields of the Tcl_Parse struct only includes a command terminator character when a command terminator character exists. A Tcl command can be terminated without a command terminator character in one special case: when the command is terminated by the end of the string (when numBytes drops to 0).

So, if we pass in a string of foo;bar and a numBytes value of 3, then the substring marked off in the Tcl_Parse struct is foo. Unlike the previous example, there is no ; included in the substring.

So, if a caller is interested in stripping command terminator characters, it has a complex task of having to discover when Tcl_ParseCommand left them in the substring, and when it did not. And the sad truth is that the public Tcl_ParseCommand interface does not provide a simple way to make that discovery.

If the marked substring does not consume all numBytes bytes of the original string argument, then we do know that the last character of the marked substring is the command terminator character that terminates the parsed command. In that case, the caller can know that the actual Tcl command is one character less than the marked substring.
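The check just described needs nothing more than pointer arithmetic. Here is a minimal sketch, assuming the caller still holds the original string and a non-negative numBytes. The helper name terminator_is_certain is hypothetical and not part of Tcl; it takes the two Tcl_Parse fields as plain arguments so that it compiles without tcl.h.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of Tcl. Returns 1 when the marked substring
 * stops short of the end of the input, which means its last byte must be
 * the command terminator. Returns 0 when the substring runs all the way to
 * the end of the input, where the question is still open.
 */
static int
terminator_is_certain(const char *string, int numBytes,
                      const char *commandStart, int commandSize)
{
    const char *inputEnd = string + numBytes;
    const char *markedEnd = commandStart + commandSize;

    /* The marked substring must lie inside the bytes offered for parsing. */
    assert(commandStart >= string && markedEnd <= inputEnd);
    return markedEnd < inputEnd;
}
```

With the earlier examples, foo;bar parsed with numBytes 7 (commandSize 4) gives 1, while the same string parsed with numBytes 3 (commandSize 3) gives 0. A caller would pass parse.commandStart and parse.commandSize for the last two arguments.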

On the other hand, if the marked substring includes characters all the way up to the end of the original string argument, we cannot tell whether the last character is the terminator character for the parsed command, or whether the command terminated just because we ran out of numBytes bytes to parse. If the caller just assumes the last character is the terminator character for the parsed command, it can make errors like those noted in Tcl Bug 681841.

Now, "cannot tell" is a bit of an exaggeration. Certainly if the last byte is not a character like newline, semi-colon, or close-bracket, then we know the answer: it is not a terminator. But what about when it is one of those characters? Well, even then, one can probably step through the list of Tcl_Tokens returned in the Tcl_Parse struct and find the last character that is part of a token. If the last byte is part of the last token, then it is not acting as a command terminator. If the last byte is not part of the last token, then it is the command terminator. So a solution is possible, though painful.
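The token walk might look roughly like the sketch below. It is an illustration under stated assumptions, not tested against the Tcl core: SketchToken is a stand-in that carries only the start and size fields of Tcl_Token, the helper name last_byte_is_terminator is hypothetical, and the sketch makes no attempt to handle a backslash-newline sequence at the very end of the input, where a trailing newline belongs to no token yet is not a terminator either.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in carrying only the two Tcl_Token fields this sketch reads. */
typedef struct {
    const char *start;  /* first byte of the token */
    int size;           /* number of bytes in the token */
} SketchToken;

/*
 * Hypothetical helper, not part of Tcl. Returns 1 when the last byte of the
 * marked substring is a terminator character that lies beyond the end of
 * every token, and 0 otherwise.
 */
static int
last_byte_is_terminator(const SketchToken *tokens, int numTokens,
                        const char *commandStart, int commandSize,
                        int nested)
{
    const char *lastByte;
    const char *furthestEnd = commandStart;
    int i;

    assert(commandSize >= 0);
    if (commandSize == 0) {
        return 0;  /* empty substring: nothing to strip */
    }
    lastByte = commandStart + commandSize - 1;

    /* Only newline, semi-colon, and (when nested) ']' can terminate. */
    if (*lastByte != '\n' && *lastByte != ';'
            && !(nested && *lastByte == ']')) {
        return 0;
    }

    /* Find the furthest position covered by any token. */
    for (i = 0; i < numTokens; i++) {
        const char *tokenEnd = tokens[i].start + tokens[i].size;
        if (tokenEnd > furthestEnd) {
            furthestEnd = tokenEnd;
        }
    }
    return lastByte >= furthestEnd;
}
```

The loop takes the maximum end position over all tokens rather than looking only at the final array entry, because a word token can extend past its last component (for example, to cover a closing quote).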

Another solution is easier. The Tcl_Parse struct has another field named term. It is a (const char *) that points to the command terminator character, if one exists, or points to the character one past the numBytes bytes when the command terminates simply because we ran out of bytes to parse. This is easy to test. The only problem is that the term field of the Tcl_Parse struct is documented to be for Tcl's private internal use only.
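For comparison, the test based on term reduces to a single pointer comparison. The sketch below assumes term behaves exactly as described above; since the field is documented as private, code relying on it may break in a future Tcl release. The helper names are hypothetical, and the field value is passed in as a plain pointer so that the sketch compiles without tcl.h.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of Tcl. `term` is the value of the term
 * field as described above: the address of the terminator character, or
 * one past the last input byte when the input simply ran out.
 */
static int
has_terminator(const char *string, int numBytes, const char *term)
{
    assert(term >= string && term <= string + numBytes);
    return term < string + numBytes;
}

/*
 * Length of the command with any terminator excluded. In both cases term
 * marks the first byte that is not part of the bare command.
 */
static int
bare_command_size(const char *commandStart, const char *term)
{
    assert(term >= commandStart);
    return (int)(term - commandStart);
}
```

For foo;bar, term would point at the ; when numBytes is 7 and at the same address (one past the end of the input) when numBytes is 3, so has_terminator returns 1 and 0 respectively, while bare_command_size returns 3 both times.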