subSQL

Richard Suchenwirth 2004-07-07 - Digging in old intranet pages, I found this plaything (but in fact it ran in one of our products for a while), written almost seven years ago - quite a newbie I was then :)

In order to collect experience with query/retrieval problems, I have written a short Tcl script ... that simulates an SQL database using an associative Tcl array with the following features:

  • data is organized in entries, all on one level, each internally marked by a unique number; an entry has a varying number of fields with string values (varying length, no limit);
  • for each field name/value combination, an index to the corresponding entry numbers is kept;
  • queries use a SELECT command emulated in sugared Tcl.
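To make that concrete, here is a sketch of the array layout after one record has been stored (names and values are illustrative; the data proc in the script below builds the same structure):

```tcl
# Sketch of subSQL's internal array layout (illustrative values)
set db(records) {1}           ;# list of entry numbers
set db(fields)  {fnm lnm}     ;# field names in historical order
set db(1,fnm)   Mary          ;# entry 1, field fnm
set db(1,lnm)   Smith         ;# entry 1, field lnm
set db(fnm,Mary)  {1}         ;# index: field,value -> entry numbers
set db(lnm,Smith) {1}
# a query for fnm=Mary just globs the index keys:
puts [array names db fnm,M*]  ;# -> fnm,Mary
```

Because both the per-entry values and the field/value index live in one array, a WHERE condition reduces to a single glob lookup over the array names.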

The CREATE, DROP, and DELETE commands were also partly emulated. In real SQL, keywords may be upper- or lowercase; in subSQL, most must be uppercase, as they appear in most SQL documentation. The syntax of subSQL's SELECT is a subset of SQL's:

 SELECT fields FROM table [WHERE cond [AND|OR cond]* ]

fields can be a list of field (column) names, separated by commas without whitespace, or "*" for all fields in historical order. table is the name of the Tcl array. cond is of the form column OPERATOR value, where OPERATOR is currently implemented for = and LIKE (both doing the same thing). For both, value may contain ANSI-SQL wildcards (_ for ".", % for ".*"), MS-Access wildcards (? for ".", * for ".*"), and "superSQL" bracketed alternative lists like [Oo0Q], which are not available in ANSI or MS-Access.
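The wildcard mapping amounts to two regsub calls turning SQL patterns into the glob style that [array names] and [string match] understand; a minimal sketch (sql2glob is a name made up for this note):

```tcl
# Map SQL wildcards to glob style: _ -> ? and % -> *.
# MS-Access ? and *, and bracket lists like [Oo0Q], pass through
# unchanged, since glob matching supports them natively.
proc sql2glob {pattern} {
    regsub -all "_" $pattern "?" pattern
    regsub -all "%" $pattern "*" pattern
    return $pattern
}
puts [sql2glob ROM%]                          ;# -> ROM*
puts [string match [sql2glob %OMA%] ROMANO]   ;# -> 1
```

Note that this also converts literal underscores and percent signs, just as the script below does; escaping them is left as an exercise.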

Even better, wildcarding also works for the column name, so subSQL allows wild queries like

 subSQL> SELECT * FROM pb WHERE *=*OMA* AND *=*40* AND *=*AR*
 {FNM=MARK LNM=ROMANO MST=1400}

which yielded a unique result for very few "unspecified recognition results", but took 2.1 sec on sux000. A query with two ANDs took between 0.7 and 0.85 sec on sux000 from a database of 3007 entries loaded in 34..47 seconds.

Of course, subSQL is no real challenger to the other alternatives mentioned above, but it allows the quick adaptability and testability that only a few pages of Tcl code can bring.


 #!/tools/bin/tclsh
 # sqltry -- simulated sql parsing in Tcl
 # R. Suchenwirth 97-07-14

 set db(fields) {}
 set db(records) {}

 proc let {_var = expression} {
    # assignment sugar: let x = $a+1
    upvar $_var var
    set var [expr $expression]
 }

 #----------------------------------------------------------------------
 proc SELECT {fields FROM _db args} {
    upvar $_db db

    regsub -all "," $fields " " fields    ;# "a,b,c" -> "a b c"
    regsub -all " LIKE " $args "=" args   ;# treat LIKE as alias for =
    set hits {}; set hitsl {}; set ress {}
    if {[lindex $args 0]=="WHERE"} {
        foreach a [lrange $args 1 end] {
            if {$a=="OR"} continue
            if {$a=="AND"} {
                lappend hitsl $hits; set hits {}
                continue
            } elseif {![regexp "=" $a]} {
                error "WHERE clause must be column=value"
            }
            # map SQL wildcards to glob style for [array names]
            regsub -all "_" $a "?" a
            regsub -all "%" $a "*" a
            # "col=val" -> "col,val", a key pattern into the index
            regsub "=" $a "," index
            foreach i [array names db $index] {
                append hits " $db($i)"
            }
        }
        lappend hitsl $hits
        set hits [eval land $hitsl]
    } else {
        set hits $db(records)
    }
    if {$fields=="*"} {set fields $db(fields)}
    foreach hit $hits {
        set res {}
        foreach field $fields {
            if {![catch {set db($hit,$field)} t]} {
                lappend res "$field=$t"
            }
        }
        if {$res != {}} {
            lappend ress $res
        }
    }
    return $ress
 }

 #------------------------------------------list unique, remove duplicates
 proc luniq {list} {
    # return a sorted copy of list with duplicates removed
    if {$list == {}} {return {}}
    set t [lsort $list]
    set previous [lindex $t 0]
    set res [list $previous]
    foreach i [lrange $t 1 end] {
        if {$i != $previous} {
            lappend res $i
            set previous $i
        }
    }
    return $res
 }

 #------------------------------------------------- list non-unique
 proc lnuniq {list} {
    # return the elements that occur more than once in list (sorted)
    set t [lsort $list]
    set res {}
    set previous "some dummy value which is not expected to occur"
    foreach i $t {
        if {$i != $previous} {
            set previous $i
        } else {
            lappend res $i
        }
    }
    return $res
 }

 #------------------------------------------------------- list AND
 proc land {arg args} {
    # AND/intersect lists: duplicates in the concatenation of
    # duplicate-free lists are exactly the common elements
    set t [luniq $arg]
    foreach i $args {
        set t [lnuniq [concat $t [luniq $i]]]
    }
    return $t
 }

 #----------------------------------------------------------------------
 proc data {_db args} {
    # add one record; each element of args is a {field value} pair
    upvar $_db db
    let n = [llength $db(records)]+1   ;# new entry number
    lappend db(records) $n
    foreach i $args {
        set type [lindex $i 0]
        set value [lindex $i 1]
        set db($n,$type) $value
        lappend db($type,$value) $n
        if {[lsearch -exact $db(fields) $type]==-1} {
            lappend db(fields) $type
        }
    }
 }

 #----------------------------------------------------------------------
 data db {fnm William} {lnm Jones} {mst 2641}
 data db {fnm Mary} {mnm Lou} {lnm Smith} {mst 2641} {fct Assistant} 
 data db {fnm William} {lnm Brown} {mst 2642} 
 data db {fnm Martin} {lnm Marietta} {mst 2642} {fct Engineer} 
 data db {lnm Suharto} {mst 2643} {fct Manager} 
 data db {fnm Udo} {lnm Miletzki} {mst PE32} {fct Teamleader} 
 data db {fnm Richard} {lnm Suchenwirth} {mst PE32} {fct Engineer}

 #----------------------------------------------------------------------
 puts "subSQL: a small but growing subset in Tcl"

 set try "SELECT fnm,mnm,lnm,mst,fct FROM db WHERE mst=2641 OR mst=PE32"
 puts "Example: $try"
 puts [eval $try]

 while 1 {
    puts -nonewline "subSQL> "
    flush stdout                     ;# make sure the prompt appears
    if {[gets stdin s] < 0} break    ;# stop on EOF
    puts [eval $s]
 }
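The land/lnuniq combination above exploits a small invariant: if every input list is duplicate-free, an element occurs twice in the concatenation of two lists exactly when it occurs in both, so the non-unique elements form the intersection. A self-contained sketch of the same idea using lsort -unique (intersect is an illustrative name):

```tcl
# Intersection via duplicates: after uniquifying both lists, any
# element seen twice in the sorted concatenation is in both.
proc intersect {a b} {
    set res {}
    set previous "some dummy value which is not expected to occur"
    foreach i [lsort [concat [lsort -unique $a] [lsort -unique $b]]] {
        if {$i eq $previous} {lappend res $i}
        set previous $i
    }
    return $res
}
puts [intersect {1 2 3 4} {2 4 5}]   ;# -> 2 4
```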

Another simple database is in the same spirit; it does not mimic SQL as closely, but it is incomparably simpler and more memory-efficient.


An example of this code tweaked to query CSV files can be found at CSV Sql.