Updated 2016-07-17 10:31:02 by dkf

Higher level C API

Here I present an idea I have had to improve the C API for Tcl.

The basic idea is to allow the API to work at a higher level of abstraction, like a super getopt() in GNU C, and then allow the information about the command to be queried by others (i.e. reflected).

In most cases when writing a Tcl command in C/C++ there are four different things to be managed:

  • the registration of the command with the interpreter,
  • the arguments,
  • the return value.

LV What is the fourth thing?

The return value

The return value is either normal or an error return. Most of the error returns are related to the arguments of the command and typically only one error is related to the actual operation.

Non-error returns only have type to worry about. The types are int, double, buffer, and list. The buffer itself may be a string, a byte array, the name of a variable, the name of an array, the name of a "proc", a snippet of code, or a handle of some kind.

Error returns usually have a string explaining what happened.
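As a rough illustration of the normal/error split described above, a command result could be modelled as a tagged union. This is only a sketch in plain C: the names Result, ResultKind, result_int and result_error are hypothetical and are not part of tcl.h, and the list type is left out to keep it short.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical result type (not part of tcl.h). */
typedef enum { RES_INT, RES_DOUBLE, RES_BUFFER, RES_ERROR } ResultKind;

typedef struct {
    ResultKind kind;
    union {
        long        i;
        double      d;
        const char *buf;   /* string, byte array, name, script or handle */
    } as;
    char message[96];      /* explanation, used only for RES_ERROR */
} Result;

/* Normal return carrying an integer. */
Result result_int(long value)
{
    Result r = { .kind = RES_INT };
    r.as.i = value;
    return r;
}

/* Error return: most errors are about an argument, so the offending
 * argument is quoted in the message. */
Result result_error(const char *what, const char *arg)
{
    Result r = { .kind = RES_ERROR };
    assert(what != NULL && arg != NULL);
    snprintf(r.message, sizeof r.message, "%s \"%s\"", what, arg);
    return r;
}
```

A caller would check kind first and read either the typed value or the message, never both.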

The arguments

Arguments have meanings, operations, and values in Tcl.

The values are typically int, double, buffer, and list. But a buffer can be a string, a byte string, the name of a list, the name of an array, the name of a "proc", a snippet of code, or some other type of handle (like a file handle).

The operations on an argument may be considered open-ended, but typically are "subst", "expr", "join", "split", and "eval". Before a value is received from an argument, sometimes one or more of these may have to be applied to the argument.

The last and most complicated part of an argument is its meaning. The natural way to give meaning is just by position. The C/C++ interface in Tcl was (pre-8.0) just an array of strings and now (post-8.0) is an array of Tcl_Obj pointers, so all the arguments are accessed by index.

Tcl has always encouraged people to make commands as expressive as needed. This has made the language very expressive but also very complex.

Example:
regexp ?switches? exp string ?matchVar? ?subMatchVar subMatchVar ...?

To understand the possible ways commands are given arguments, you have to understand idioms about how arguments are given meaning, other idioms about how operations are done on arguments, and still other idioms about advanced argument types.

Idioms on how arguments are given meaning

1. Position idiom
    The nth argument has meaning x.

    Variations
    1.1. Simple positional idiom
        Position of argument determines meaning.
    1.2. Simple positional idiom with default arguments
        Position of argument determines meaning but may not appear at all.
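A minimal sketch of idiom 1.2 in plain C, assuming the pre-8.0 style array of strings rather than Tcl_Obj pointers. The helper name positional_int is hypothetical and not part of tcl.h.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: read the integer at position `index`, or use
 * `fallback` when the caller supplied fewer arguments.
 * Returns 0 on success, -1 if the argument is present but not an integer. */
int positional_int(int argc, const char *const argv[], int index,
                   long fallback, long *out)
{
    assert(argv != NULL && out != NULL);
    if (index >= argc) {            /* optional argument was omitted */
        *out = fallback;
        return 0;
    }
    char *end = NULL;
    long value = strtol(argv[index], &end, 10);
    if (end == argv[index] || *end != '\0') {
        return -1;                  /* present but malformed */
    }
    *out = value;
    return 0;
}
```

Note that the default applies only when the position is absent; a malformed value is still an error rather than a silent fallback.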

2. Subcommand idiom
   The name of the command and the first argument together make up the whole name of the command to execute.

   Variations
   2.1. Complete subcommand idiom.
      The first argument is always a subcommand.
   2.2. Incomplete subcommand idiom
      The first argument is not always a subcommand.
      Example:
      after cancel script script script ...
      after ms ?script script script ...?
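The two subcommand variations can be sketched with one lookup function, here in plain C over an array of strings. All names (Subcmd, find_subcmd, after_cancel, after_delay) are hypothetical, and the table models only the two forms of "after" shown above, not the real command.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*SubcmdProc)(int argc, const char *const argv[]);

typedef struct {
    const char *name;   /* subcommand word, e.g. "cancel" */
    SubcmdProc  proc;   /* handler for that subcommand */
} Subcmd;

/* Look up argv[1] in the table. If it names no subcommand, return
 * `fallback`: a non-NULL fallback gives the incomplete idiom (2.2),
 * a NULL fallback gives the complete idiom (2.1), where an unknown
 * first argument is an error. */
SubcmdProc find_subcmd(const Subcmd *table, size_t n,
                       int argc, const char *const argv[],
                       SubcmdProc fallback)
{
    assert(table != NULL && argv != NULL);
    if (argc < 2) {
        return NULL;                /* no first argument at all */
    }
    for (size_t i = 0; i < n; i++) {
        if (strcmp(argv[1], table[i].name) == 0) {
            return table[i].proc;
        }
    }
    return fallback;
}

/* Placeholder handlers; they only return a marker value. */
int after_cancel(int argc, const char *const argv[])
{
    (void)argc; (void)argv;
    return 1;
}

int after_delay(int argc, const char *const argv[])
{
    (void)argc; (void)argv;
    return 2;
}
```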

3. Switch idiom.
   Has one value for existence and another for non-existence.  In Tcl always of the form ?-switchName?.

   3.1. unlimited in position switch idiom
        The switch can appear anywhere in the command.
   3.2. limited to end position switch idiom
        The switch can only appear at end.
        Example: fcopy inchan outchan ?-size size? ?-command callback?
   3.3. limited to a certain place in argument list switch idiom
        The switch can only appear at a certain position; the first non-switch argument signals the end.
        Example: puts ?-nonewline? ?channelId? string
   3.4. limited to a certain place in argument list and ends with "--" switch idiom.
        The switch can only appear at a certain position, and the run of switches is ended by a "--".
        Example: switch ?options? string pattern body ?pattern body ...?

For Tk widgets, switch-style options are always done ?-switchName flagValue?. -- WHD
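Switch idioms 3.3 and 3.4 differ only in how the run of switches ends, so one scanner can cover both. The following is a sketch in plain C; Switch and scan_switches are hypothetical names, not Tcl API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *name;   /* e.g. "-nocase" */
    int         seen;   /* set to 1 when the switch is present */
} Switch;

/* Scan argv[start..] for known switches. The run ends at the first
 * argument that does not begin with '-' (idiom 3.3) or just after an
 * explicit "--" (idiom 3.4). Returns the index of the first remaining
 * argument, or -1 on an unknown switch. */
int scan_switches(Switch *sw, size_t n,
                  int argc, const char *const argv[], int start)
{
    assert(sw != NULL && argv != NULL);
    int i = start;
    for (; i < argc; i++) {
        if (argv[i][0] != '-') {
            break;                      /* first non-switch ends the run */
        }
        if (strcmp(argv[i], "--") == 0) {
            i++;                        /* skip the terminator itself */
            break;
        }
        size_t k = 0;
        while (k < n && strcmp(argv[i], sw[k].name) != 0) {
            k++;
        }
        if (k == n) {
            return -1;                  /* unknown switch */
        }
        sw[k].seen = 1;
    }
    return i;
}
```

Without the "--" terminator, a first positional value that happens to start with "-" would be taken for a switch, which is the case idiom 3.4 exists to handle.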

4. Named value idiom
   After a certain key the value of the key is the next argument.  In Tcl always of the form ?-key value?.

   4.1. unlimited in position named value idiom
        The named-value can appear anywhere in the command.
   4.2. limited to end position named value idiom
        The named-value can only appear at end.
        Example: fconfigure channelId name value ?name value ...?
   4.3. limited to a certain place in argument list named value idiom
        The named-value can only appear at a certain position; the first non-switch argument signals the end.
        Example: socket -server command ?options? port
   4.4. limited to a certain place in argument list and ends with "--" named value idiom.
        The named-value can only appear at a certain position, and the run of named values is ended by a "--".
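A sketch of idiom 4.2 (named values limited to the end of the command) in plain C. NamedValue and parse_named_values are hypothetical names; the table is pre-loaded with defaults so that absent keys keep them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *key;     /* e.g. "-size" */
    const char *value;   /* default on entry, supplied value on exit */
} NamedValue;

/* Parse "-key value" pairs from argv[start] to the end of the command.
 * Returns 0 on success, -1 for an unknown key, -2 for a key that has
 * no value after it. */
int parse_named_values(NamedValue *nv, size_t n,
                       int argc, const char *const argv[], int start)
{
    assert(nv != NULL && argv != NULL);
    for (int i = start; i < argc; i += 2) {
        size_t k = 0;
        while (k < n && strcmp(argv[i], nv[k].key) != 0) {
            k++;
        }
        if (k == n) {
            return -1;              /* unknown key */
        }
        if (i + 1 >= argc) {
            return -2;              /* key without a value */
        }
        nv[k].value = argv[i + 1];  /* a repeated key replaces the old value */
    }
    return 0;
}
```

Here a repeated key is "replacive"; an additive variant would append to a list instead of overwriting.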

5. Variable number of arguments idiom.
   The actual number of arguments is unlimited (or at least unspecified).

   5.1. Variable number of arguments of same meaning idiom
      Example: proc sum {args} { set ret 0 ; foreach n $args { incr ret $n } ; return $ret }

   5.2. Variable number of arguments each a part of an expression idiom
      Example: expr 3 + 4

These are the 14 idioms you might see. To make things worse, people will usually use several of these idioms in one command, so the number of possible combinations can be huge. This makes Tcl very expressive but complex.

Both switch and named value idioms have several other characteristics.
    1) Required or not.
    2) Radio button action (one of a group of options must be present).
    3) Only part of the name is required (unique abbreviations are accepted).
    4) Default values.
    5) Whether it can occur multiple times.
    6) Whether multiple values are additive or replace one another.
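The "only part of the name is required" characteristic amounts to unique-prefix matching. Tcl_GetIndexFromObj offers abbreviation matching of this kind; the function below is a standalone sketch of the idea in plain C, and match_prefix is a hypothetical name, not Tcl's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Find the option that `arg` abbreviates.
 * Returns its index, -1 if nothing matches, -2 if the prefix is
 * ambiguous. An exact match always wins over longer candidates. */
int match_prefix(const char *const names[], size_t n, const char *arg)
{
    assert(names != NULL && arg != NULL);
    size_t len = strlen(arg);
    int found = -1;
    int matches = 0;
    if (len == 0) {
        return -1;                      /* empty string matches nothing */
    }
    for (size_t i = 0; i < n; i++) {
        if (strncmp(names[i], arg, len) != 0) {
            continue;                   /* not a prefix of this name */
        }
        if (names[i][len] == '\0') {
            return (int)i;              /* exact match */
        }
        if (matches++ == 0) {
            found = (int)i;             /* remember the first candidate */
        }
    }
    if (matches == 0) {
        return -1;
    }
    return matches == 1 ? found : -2;
}
```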

Idioms on how operations are performed on arguments.

1) Control structure idiom.
   One or more of the arguments are snippets of code which may be conditionally evaluated.

Idioms on complex values

1. Little language idiom.
   Sometimes we make up little languages to compactly encode the values given to an argument.
   1.1. Examples: regular expressions, format statements, end-1, ...

2. Limited values idiom
   2.1. For numbers this can be ranges or groups of values.
   2.2. For buffers (strings or byte arrays) this may be some complex structure.
   2.3. For handles this may be a particular type of handle.

Other idioms I have not seen in Tcl.
   1) Use of ??? or some other symbol instead of "-" in switches and name-value pairs.
   2) Use of "+" and "-" to turn switches on or off.
   3) Parameter groups (i.e. [email protected]?).
   4) The use of long and short commands with "--" and "-".
   5) The grouping of multiple single-character switches into one switch.

Conclusion

So far Tcl does little to help implement any of these idioms. Again and again people must code them by hand, and the information then has to be recaptured in documentation.

What I suggest we do is somehow encode all this information about commands and add APIs so that it becomes very easy for people to write complex command lines in a uniformly short and easy way. This also means that the errors from wrong arguments will be much more uniform. Another feature is that people can reflect this information to help them build static syntax checkers, test routines, and give hints to the compiler on what it can optimize.

Possible uses:
   1) Make coding easier.
   2) Make coding more uniform.
   3) Lower the amount of code.
   4) Allow the compiler to optimize arguments and returns.
   5) Make apparent to the compiler which snippets can be compiled.
   6) More uniformity in error messages.
   7) Allow static syntax checkers to check code.
   8) Automatic documentation.
   9) Automatic coverage test of arguments.
   10) In interactive mode (i.e. TkCon) either intelligent name completion or maybe even automatically generated small GUIs to help enter a command's arguments.

What is needed:
   1) An API to describe commands.
   2) An API to present this information to the C implementation in an abstract way.
   3) An API that allows others to see our description and "reflect" it.
   4) Algorithms to deal with all the possibilities.
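To make the first and third items concrete, here is one possible shape for a command description and a function that reflects it back as a Tcl-style usage line. This is only a sketch in plain C under assumed names (ArgKind, ArgSpec, CmdSpec, describe_command); none of them exist in tcl.h, and a real design would need many more attributes per argument.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical description of one argument. */
typedef enum { ARG_REQUIRED, ARG_OPTIONAL, ARG_SWITCH, ARG_NAMED } ArgKind;

typedef struct {
    ArgKind     kind;
    const char *name;
} ArgSpec;

/* Hypothetical description of a whole command. */
typedef struct {
    const char    *command;
    const ArgSpec *args;
    size_t         nargs;
} CmdSpec;

/* Reflect a spec into a usage line such as "puts ?-nonewline? string".
 * Returns 0 on success, -1 if `out` is too small. */
int describe_command(const CmdSpec *spec, char *out, size_t size)
{
    assert(spec != NULL && out != NULL);
    int used = snprintf(out, size, "%s", spec->command);
    if (used < 0 || (size_t)used >= size) {
        return -1;
    }
    size_t pos = (size_t)used;
    for (size_t i = 0; i < spec->nargs; i++) {
        const ArgSpec *a = &spec->args[i];
        int n;
        switch (a->kind) {
        case ARG_REQUIRED:
            n = snprintf(out + pos, size - pos, " %s", a->name);
            break;
        case ARG_NAMED:
            n = snprintf(out + pos, size - pos, " ?%s value?", a->name);
            break;
        default:    /* ARG_OPTIONAL and ARG_SWITCH */
            n = snprintf(out + pos, size - pos, " ?%s?", a->name);
            break;
        }
        if (n < 0 || (size_t)n >= size - pos) {
            return -1;              /* output would be truncated */
        }
        pos += (size_t)n;
    }
    return 0;
}
```

The same table could drive argument parsing, error messages, and documentation, which is the point of describing the command once.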

Well, I leave it up to you: do you think this is worth pursuing?

Earl Johnson

Discussion

The Tcl C API doesn't have much support for these idioms, that's right; only some general utility functions are provided, like Tcl_WrongNumArgs and Tcl_GetIndexFromObj. In general what you propose looks interesting, especially from the viewpoint of compiler optimization and reduced workload.

Most of the things you propose could be done by a code generator, probably written in Tcl, that reads a formal description and creates the needed C interface code. Doesn't SWIG do something like that already for C code? Introspection would only work with code created by such a code generator, but that may be OK.

The introspection/reflection capabilities would be a natural extension if Tclers adopted an automatic documentation system like javadoc. It could be used for Tcl-level procs and C-level commands in the same way. But it should stay optional, as most people do not like writing lengthy metadata for one-liner procs they introduced for readability.

Michael Schlenker

You are right, there are several different ways to do this and SWIG is a good option, but I can't help but think the Tcl API is just too complicated (for most work).

Earl Johnson

DKF: The problem with higher-level APIs is that they are much more likely to conceal horrible inefficiencies such as needless copies. However, a code generator that you use once to create a starting point, that's reasonable, especially as you then don't need to try to solve every evil edge case. (You don't need anyone's permission to write one of those.)