Updated 2015-10-25 10:46:22 by arjen

Arjen Markus (25 october 2015) Operations on arrays of numbers are quite common, for instance in statistical analyses. A programming environment/language like MATLAB heavily depends on the possibility to apply numerical operations on arrays without going through a for-loop explicitly. Within Tcl we have Christian Gollwitzer's VecTcl extension and of course the BLT package has such features too.

I wanted to see if it was possible to use plain Tcl to achieve the same functionality. Perhaps not as fast and efficient as a compiled extension, but it may be useful to have a pure Tcl implementation.

The idea is this: suppose you have two lists of values and you want to calculate the element-by-element sum of these two. The ordinary way would be something like:
set x {1 2 3 4 5 6}
set y {2 3 4 5 6 1}

set sum {}
foreach x1 $x y1 $y {
    lappend sum [expr {$x1+$y1}]
}

resulting in a list {3 5 7 9 11 7}. If we could define a procedure that simply takes the expression and applies it to the elements of the two lists, we could dispense with the foreach loop:
set x {1 2 3 4 5 6}
set y {2 3 4 5 6 1}

set sum [map {$x+$y}]

The code below achieves just that. It is not as robust as one might want:

  • Internally it uses variables with the pattern "_letters_" - if you use such names in the expression, there may be a name clash.
  • There is no check (yet) that the lists involved have the same length.
  • It is assumed that a variable referred to in the given expression has a scalar value or is a valid list of numbers.
  • There is no provision for missing values - an empty element or a NaN value.

A more peculiar limitation is that the expression needs to be in braces:
   map $x+$y

and even
   map $x

will not be evaluated correctly. I know of no way to detect if a value is already a list to avoid shimmering. The command [string is list $value] may or may not be capable of that.

Anyway, the procedure below can be used as a template for a whole host of array operations. To name a few:

  • Sum a list (or an expression applied to lists) under a given condition. Like: sum all positive values
  • Count the number of values that conform to a given condition
  • Filter on a condition

As an illustration, a possible implementation for the conditional summing of a list of numbers is also given.
proc map {_expr_} {
    #
    # Extract the variables from the expression
    #
    set _nvars_  [lsort -unique [regexp -all -inline {\$[a-zA-Z_0-9]+} $_expr_]]
    set _vars_   {}
    set _values_ {}

    foreach _v_ $_nvars_ {
        set _v_ [string range $_v_ 1 end]
        upvar 1 $_v_ _${_v_}_
        lappend _values_  [set _${_v_}_]

        if { [llength [set _${_v_}_]] > 1 } {
            upvar 1 $_v_ _${_v_}_
            lappend _vars_    _${_v_}_
            append _forvars_ "$_v_ \$_${_v_}_ "
        } else {
            upvar 1 $_v_ $_v_
            lappend _vars_ $_v_
        }
    }

    apply [list $_vars_ [string map [list EXPR $_expr_ FOREACH $_forvars_] {
        set _result_ {}
        foreach FOREACH {
            lappend _result_ [expr {EXPR}]
        }
        return $_result_
    }]] {*}$_values_
}

proc sum {_expr_ {_filter_ 1}} {
    #
    # Extract the variables from the expression
    #
    set _nvars_  [lsort -unique [regexp -all -inline {\$[a-zA-Z_0-9]+} "$_expr_ $_filter_"]]
    set _vars_   {}
    set _values_ {}

    foreach _v_ $_nvars_ {
        set _v_ [string range $_v_ 1 end]
        upvar 1 $_v_ _${_v_}_
        lappend _values_  [set _${_v_}_]

        if { [llength [set _${_v_}_]] > 1 } {
            upvar 1 $_v_ _${_v_}_
            lappend _vars_    _${_v_}_
            append _forvars_ "$_v_ \$_${_v_}_ "
        } else {
            upvar 1 $_v_ $_v_
            lappend _vars_ $_v_
        }
    }

    apply [list $_vars_ [string map [list EXPR $_expr_ FOREACH $_forvars_ FILTER $_filter_] {
        set _result_ 0
        foreach FOREACH {
            if { FILTER } {
                set _result_ [expr {$_result_ + EXPR}]
            }
        }
        return $_result_
    }]] {*}$_values_
}

# Simple test
#
set a 10
set x {1 2 3 4 5 6}
set y {2 3 4 5 6 1}

puts [map {$x + $a * $y}]
puts [map {$x * $y}]
puts [map {cos($x / double($y))}]

puts [sum {$x} {$x > $y || $x > 3}]

Measurements of the performance of this procedure show that there is a small overhead in comparison to a direct implementation with a dedicated [foreach] loop:
set x [lrepeat 10000 [expr {0.1}]]
set y [lrepeat 10000 [expr {0.2}]]

puts [time { map {$x + $y} } 100]

proc sum2 {x y} {
    set r {}
    foreach x $x y $y {
       lappend r [expr {$x + $y}]
    }
    return $r
}

puts [time { sum2 $x $y } 100]

resulted in:
2848.96 microseconds per iteration
2828.11 microseconds per iteration