option parsing using ensemble

Lars H, 2008-08-19: One of Tcl's strengths is that every core command can be "patched" at the script level, by providing a substitute proc — the old

 rename file __file
 proc file {subcommand args} {
     if somecondition then {
        # Handle special case
     }
     __file $subcommand {*}$args
 }

pattern. Some commands are however rather difficult to patch this way, because they have tricky syntaxes; this is for example the case with glob and regexp, which follow the general pattern

command ?-option ...? ?--? arg ?arg ...?

By contrast, it's much easier to handle commands with Tk-style -option value sequences at the end, such as fconfigure and namespace ensemble… Still, sometimes one has to bite the bullet.

The idea for the code below is to misuse the ensemble mechanisms to parse beginning-of-arguments options of a command — the example globby has the same syntax as glob. Some points to note are:

  1. Each defined option is really a proc in a particular namespace. That namespace is dedicated to parsing the options.
  2. The option-parsing procs are wrapped up as an ensemble (globby_parse::globby). This solves the problem that unique abbreviations of option names are allowed.
  3. Missing argument errors (for options that take arguments) are thrown by Tcl when calling the proc that parses this option.
  4. The ensemble must have an -unknown command to handle end-of-options without an explicit --.
  5. Information from parsed options are kept in an array Opt, which is accessed by upvar 1. The actual location for this array is in the local context of globby_parse::start procedure.

The following is an improved implementation from 2008-10-05.

 proc globby {args} {
     lassign [globby_parse::start $args] opts patterns 
     # Do actual operation... In this demo we just return parse results.
     list $opts $patterns
 }

 namespace eval globby_parse {
     namespace export -*

     proc start {arglist} {
         array set Opt {complain 1 tails 0}
         set patterns [globby {*}$arglist] ; # Has side-effects
         return [list [array get Opt] $patterns]
     }

     proc -directory {directory args} {
         upvar 1 Opt Opt
         set Opt(dir) $directory
         globby {*}$args
     }
     proc -join {args} {
         list [join $args [file separator]]
     }
     proc -nocomplain {args} {
         upvar 1 Opt Opt
         set Opt(complain) 0
         globby {*}$args
     }
     proc -path {pathPrefix args} {
         upvar 1 Opt Opt
         set Opt(path) $pathPrefix
         globby {*}$args
     }
     proc -tails {args} {
         upvar 1 Opt Opt
         set Opt(tails) 1
         globby {*}$args
     }
     proc -types {typeList args} {
         upvar 1 Opt Opt
         set Opt(types) $typeList
         globby {*}$args
     }
     proc -- {args} {return $args}
     proc unknown {cmd opt args} {
         if {[string match -* $opt]} then {
             return -code error\
               "bad option \"$opt\": must be -directory, -join,\
               -nocomplain, -path, -tails, -types, or --"
         } else {
             list [namespace which --] $opt
         }
     }

     namespace ensemble create -command [namespace current]::globby\
       -unknown [namespace which unknown]
 }

Some examples:

 % globby *
 {tails 0 complain 1} *
 % globby a* b*
 {tails 0 complain 1} {a* b*}
 % globby  -- a* b*
 {tails 0 complain 1} {a* b*}
 % globby  -join a* b*
 {tails 0 complain 1} a*/b*
 % globby  -nocomplain a* b*
 {tails 0 complain 0} {a* b*}
 % globby -dir ~ -nocomplain a* b*
 {tails 0 dir ~ complain 0} {a* b*}
 % globby -dir ~ -nocomplain -foo a* b*
 Error: bad option "-foo": must be -directory, -join, -nocomplain, -path, -tails, -types, or --
 % globby -dir ~ -nocomplain -path
 Error: wrong # args: should be "globby -path pathPrefix ..."

A tricky detail is that the ensemble and the main command have the same name (although in a different namespace). This makes the last error look like it is for the main command, even when Tcl encounter them for the ensemble.


DKF, about a previous version [L1 ] of the above: It would be interesting to see how this changes when using tcl::unsupported::tailcall instead of uplevel 1.

Experimentation leads to this (with gratuitous use of dicts...):

proc globby {args} {
    set opts {complain 1 tails 0}
    if {[catch {globby_parse::globby {*}$args} patterns] == 1} {
        return -code error $patterns
    }
    # Do actual operation... In this demo we just return parse results.
    list $opts $patterns
}

namespace eval globby_parse {
    namespace path ::tcl::unsupported
    namespace export -*

    proc -directory {directory args} {
        upvar 1 opts o
        dict set o dir $directory
        tailcall globby {*}$args
    }
    proc -join {args} {
        list [join $args [file separator]]
    }
    proc -nocomplain {args} {
        upvar 1 opts o
        dict set o complain 0
        tailcall globby {*}$args
    }
    proc -path {pathPrefix args} {
        upvar 1 opts o
        dict set o path $pathPrefix
        tailcall globby {*}$args
    }
    proc -tails {args} {
        upvar 1 opts o
        dict set o tails 1
        tailcall globby {*}$args
    }
    proc -types {typeList args} {
        upvar 1 opts o
        dict set o types $typeList
        tailcall globby {*}$args
    }
    proc -- {args} {return $args}
    proc unknown {cmd opt args} {
        if {[string match -* $opt]} then {
            return -code error\
                "bad option \"$opt\": must be -directory, -join,\
                -nocomplain, -path, -tails, -types, or --"
        }
        list [namespace which --] $opt
    }

    namespace ensemble create -command [namespace current]::globby\
        -unknown [namespace which unknown]
}

Lars H: Yes, I considered that, although I don't know if it would do much for the error messages generated. Recursion depth is not likely to be a problem.

And, for what it's worth, the whole thing really started out as an experiment to see if this could be another application of TIP#314 [L2 ]:

 interp alias {} globby {} globby_parse {complain 1 tails 0}
 namespace eval globby_parse {
     namespace export -*

     proc -directory {D directory args} {
         dict set D dir $directory
         [namespace current] $D {*}$args
     }
     proc -join {D args} {
         -- $D [join $args [file separator]]
     }
     proc -nocomplain {D args} {
         dict set D complain 0
         [namespace current] $D {*}$args
     }
     proc -path {D pathPrefix args} {
         dict set D path $pathPrefix
         [namespace current] $D {*}$args
     }
     proc -tails {D args} {
         dict set D tails 1
         [namespace current] $D {*}$args
     }
     proc -types {D typeList args} {
         dict set D types $typeList
         [namespace current] $D {*}$args
     }
     proc unknown {cmd D opt args} {
         switch -glob -- $opt "-*" {
             return -code error\
               "bad option \"$opt\": must be -directory, -join,\
               -nocomplain, -path, -tails, -types, or --"
         } default {
             list [namespace which swap] $opt
         }
     }
     proc swap {pat D args} {-- $D $pat {*}$args}
     proc -- {D args} {
         # Option parsing complete, now do the thing.
         return [list settings $D patterns $args]
     }

     namespace ensemble create -parameters {settings_dict}\
       -unknown [namespace which unknown]
 }

The code here is more compact, since there is no need to upvar or uplevel anything; the dictionary of settings parsed so far is carried along in a parameter argument. Even the namespace current calls can be done away with, as in the first implementation (I just hadn't thought of that trick yet when I wrote the above)!

The disadvantage of this approach is that the parameter dictionary (which the user should not be concerned with) shows up for missing argument errors of the second or later option, although it is appropriately hidden when the same error occurs for the first option:

 % globby -dir
 wrong # args: should be "globby -directory directory ?arg ...?"
 % globby -tails -dir
 wrong # args: should be "::globby_parse {complain 1 tails 1} -directory directory ?arg ...?"