Vkit is a vector engine

VKIT is a a new vector engine, built from the ground up to explore the ability to use Tcl as a foundation for high-performance vector handling.

If you happen to think that vectors are for numerical, statistical, and financial propeller-heads only, think again: the general-purpose MetaKit database library is based on a highly vectorized (column-wise) design, and is being used for storage of a very large variety of data structures. There's a graph package with XML interface to illustrate that vectors needn't be about numerical data at all, nor even about homogeneous data.

Anyway, back to VKIT.

This project uses "memory-based vectors" as a central mechanism (more info can be found in "mvec.README"), which has a fast implementation in C, as well as a much slower one in pure Tcl, but which is highly portable.

The VKIT code itself is all in Tcl. More C-coded extensions may be added over time, if some significant performance benefit is expected, but the main logic will be in Tcl - relying on dual-representations for speed.

One of the things VKIT deals with, is cleaning up after itself. When manipulating vectors, a lot of intermediate results may be needed for a brief period of time. With an objectified command interface, one would have to constantly do "rename $v {}" to get rid of intermediate vectors.

VKIT approaches this from a different perspective. It is mostly about data (though scripts play a key role), and it seemed that a variable- centric design would make more sense and be more effective here. That is why VKIT manages vectors as variables (in its own private namespace), while generating unique names as usual. Underneath is a complete little ref-counting mechanism (in Tcl).

An example (assuming "namespace import vkit::*" preceeded this):

    set v [vec 1 2 3 5 8]
    puts [vlen $v]   ;# displays "5"

Ah, if only everything were that simple:

    proc ex1 {} {
      return [vec 1 2 3]
    }
    puts [vlen [ex1]]  ;# alas: boom, error (and not a nice one yet)

Cleanup is based in local scopes, usually of the calling proc. The above needs a bit more work to properly pass a reference "up" to the caller:

    proc ex1 {} {
      return [vkeep [vec 1 2 3]]
    }
    puts [vlen [ex1]]  ;# displays "3"

Instead of going into the intricate details of this, which have *not* yet all been worked out (but which do appear 100% solvable so far), let's now move on and assume references work properly and naturally.

A few quick examples to get a feel for VKIT, adapted from the code below. FYI, the "vlist" proc converts a VKIT vector into a Tcl list.

    puts [vlist [vrev [vec 1 4 9 16 25]]]
        # displays "25 16 9 4 1"

    proc square {x} { return [expr {$x * $x}] }
    puts [vlist [vapply [vec 1 2 3 4 5] square]]
        # displays "1 4 9 16 25"

    puts [vlist [vsquash [vec {1 one} {2 two} {3 three}] 2]]
        # displays "1 one 2 two 3 three"

    set v2 [vec 4 3 2 1 0 1 2 3 4]
    puts "vfilter = [vlist [vfilter $v2 {$x % 2}]]"
        # displays "3 1 1 3"

There is already quite a package of little "vector operators", all coded as "vkit::v*" procs. The best way to get an idea of what they do, is to go through the sample code at the end and examine the output it produces. Much of this was slapped together quickly, just to have a basic toolset. The exact set of primitives, and their names, is still evolving.

In the examples, you will notice that most are "synthesizing", i.e. they generate a new result from existing data. That reflects the fact that VKIT is geared towards efficient access first, and modification second. The project will continue to emphasize this for quite some time to come.

Nevertheless, there is a "vset", which performs element-wise modification. Inserts / deletes are going to be implemented later, in an unconventional way... stay tuned.

All the examples here work with basic vectors, but that is just for the sake of simplicity. Dealing with multi-column data is no different, in fact a relatively complete MetaKit datafile reader has been built on top of all this. Enough said for now.

There is much more to be said about VKIT, but I wanted to get a few ideas out quickly. The source code for everything is available on the web, at:

    http://www.equi4.com/pub/etc/vkit.tar.gz

This includes VKIT (vkit.tcl, core and sample code in one), as well as the core components "mvec" and "mmap". Both are described in more detail in separate pages, see "*.README" in the distibution.

This code is still very young. VKIT, JOLT, et al. is work in progress.

Script:


  puts "version = [package require jolt]"

  source jolt.tcl
  namespace import vkit::*

  proc vectry {} {
    set v1 [vec 1 2 333 4 5]
    puts "vec $v1 length [vlen $v1] contents [vlist $v1]"

    vset $v1 2 3
    puts "v1 = [vlist $v1]"

    set v2 [vec 4 3 2 1 0 1 2 3 4]
    puts "v2 = [vlist $v2]"

    proc square {x} { return [expr {$x * $x}] }
    set v3 [vapply $v1 square]
    puts "vapply = [vlist $v3]"

    set v4 [vident 7]
    puts "vident = [vlist $v4]"

    set v5 [vconst 10 7]
    puts "vconst = [vlist $v5]"

    puts "vunary = [vlist [vunary $v1 {$x * $x * $x}]]"
    puts "vbinary = [vlist  [vbinary $v1 $v3 {$y - $x}]]"
    puts "vpair = [vlist [vpair $v4 $v5]]"
    puts "vorder = [vlist  [vorder $v2]]"
    puts "vmap = [vlist [vmap $v3 $v2]]"
    puts "vsort = [vlist [vsort $v2]]"
    puts "vrev = [vlist [vrev $v3]]"
    puts "vslice = [vlist [vslice $v3 3 3 -1]]"
    puts "vmesh = [vlist [vmesh $v1 $v3]]"
    puts "vpile = [vlist [vpile $v1 $v3]]"
    puts "vconcat = [vlist [vconcat $v1 $v2]]"
    puts "vrepl = [vlist [vrepl $v1 3]]"
    puts "vdup = [vlist [vdup $v1 3]]"
    puts "vfold = [vlist [vfold $v2 3]]"
    puts "vsquash = [vlist [vsquash [vec {1 one} {2 two} {3 three}] 2]]"
    puts "vfilter = [vlist [vfilter $v2 {$x % 2}]]"
    puts "vsieve = [vlist [vsieve [vunary $v2 {$x % 2}]]]"

 vkit::vdumpall VECTRY
  }

  vectry
 vkit::vdumpall END

Output (SuSE 7.1 Linux, PIII/650) - PURE TCL RUN:


  vec v::1 length 5 contents 1 2 333 4 5
  v1 = 1 2 3 4 5
  v2 = 4 3 2 1 0 1 2 3 4
  vapply = 1 4 9 16 25
  vident = 0 1 2 3 4 5 6
  vconst = 10 10 10 10 10 10 10
  vunary = 1 8 27 64 125
  vbinary = 0 2 6 12 20
  vpair = {0 10} {1 10} {2 10} {3 10} {4 10} {5 10} {6 10}
  vorder = 4 3 5 2 6 1 7 0 8
  vmap = 25 16 9 4 1 4 9 16 25
  vsort = 0 1 1 2 2 3 3 4 4
  vrev = 25 16 9 4 1
  vslice = 16 9 4
  vmesh = 1 1 2 4 3 9 4 16 5 25
  vpile = 1 2 3 4 5 1 4 9 16 25
  vconcat = 1 2 3 4 5 4 3 2 1 0 1 2 3 4
  vrepl = 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5
  vdup = 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5
  vfold = {4 3 2} {1 0 1} {2 3 4}
  vsquash = 1 one 2 two 3 three
  vfilter = 3 1 1 3
  vsieve = 1 3 5 7

  vdumpall VECTRY ++++++++++++++++++++++++++++++++++++++++++++
     1   9 refs ->             lindex {1 2 3 4 5} 5                            
     2   6 refs ->             lindex {4 3 2 1 0 1 2 3 4} 9                    
     3   7 refs -> 1           'apply v::1 square 5                            
     4   2 refs ->             - 0 7                                           
     5   2 refs ->             0 10 7                                          
     6   1 refs -> 1           'unary v::1 {$x * $x * $x} 5                    
     7   1 refs -> 1 3         'binary v::1 v::3 {$y - $x} 5                   
     8   1 refs -> 4 5         'pair v::4 v::5 7                               
    11   1 refs ->             lindex {{0 4} {1 3} {1 5} {2 2} {2 6} {3 1} {3 7}
    12   1 refs -> 11          'order v::11 9                                  
    13   1 refs -> 2 3         'map v::3 v::2 9                                
    16   1 refs ->             lindex {{0 4} {1 3} {1 5} {2 2} {2 6} {3 1} {3 7}
    17   1 refs -> 16          'order v::16 9                                  
    18   1 refs -> 2 17        'map v::2 v::17 9                               
    19   1 refs -> 3           'rev v::3 5 5                                   
    20   1 refs -> 3           'slice v::3 3 -1 3                              
    21   1 refs -> 1 3         'mesh {v::1 v::3} 2 10                          
    22   1 refs -> 1 3         'pile {v::1 v::3} 5 10                          
    23   1 refs -> 1 2         'concat {v::1 v::2} {5 9} 14                    
    24   1 refs -> 1           'repl v::1 5 15                                 
    25   1 refs -> 1           'dup v::1 3 15                                  
    26   1 refs -> 2           'fold v::2 3 3                                  
    27   2 refs ->             lindex {{1 one} {2 two} {3 three}} 3            
    28   1 refs -> 27          'squash v::27 2 6                               
    29   1 refs ->             lindex {3 1 1 3} 4                              
    30   1 refs -> 2           'unary v::2 {$x % 2} 9                          
    32   1 refs ->             lindex {1 3 5 7} 4          
  ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


  vdumpall END +++++++++++++++++++++++++++++++++++++++++++++++
  ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

APN was not aware of VKit and reinvented the wheel as TArray.


(PYK: migrated this interchange to this page from Numerical Analysis in Tcl)

LV: Any chance of seeing 64/128 bit int support added?

Ehm, 64-bit was already in - as byte array for now, can be adjusted to DKF's new "wide" type once it is ready - JCW