SYLK

This page was started by HZe, comments are welcome.

An expanded version of this package tclSylk, by CMcC is available in the half-bakery: [L1 ]

To write out simple, table oriented data to interface with other systems, sometimes csv formatted files are used. But if you have to deal with different localization, soon you get screwed up by different configurations of decimal points (dot or comma) and list separators (comma, semicolon). Even worse, on Windows each user may reconfigure the locale settings.

So I tried to use a different format. SYLK is the file format of old Microsoft Multiplan and is still supported by almost all modern spreadsheet programs.

The benefits are

  • ASCII format
  • decimal point is defined as dot (.)
  • some basic formatting is possible
  • cells are addressed by x/y values, so no separator

unfortunately, there is also a drawback

  • format is not documented well (see e.g. [L2 ] which is almost everything you can find)

GWM also see [L3 ].

The package below is able to read in SYLK files and store it into an array. While doing this, some parts are parsed (like the content, the format and the formulas), some parts are just stored (like the header information). This enables the user to read a given SYLK file (e.g. exported by Excel) and access the values with $SYLK(<x>,<y>,value). Also, modifications can be made and the whole structure can be saved back into a SYLK file.

Since there is no access layer (yet), you will have to access the array directly. So, some caution is necessary. Here are some remarks when using the package:

  • if you overwrite a value with a formula, make sure the formula is deleted (unset SYLK(<x>,<y>,formula))
  • format and values are exactly stored as defined in SYLK
  • the code makes heavily use of regexp, so performance can be improved
  • SYLK(header) is used to store all header lines as a TCL list

Although the code is perhaps not complete, it already may be helpful for some others...

2007 Dec 31 - GWM added parsing for further format types, cell names and references. Used in TekSel project.


    #####################################################################
    # Access procedures for SYLK files reading/writing
    #
    # (c) Copyright 2003 LMS Deutschland GmbH
    #                    Holger Zeinert
    #####################################################################
    # feel free to use without any warrenty
    #####################################################################
    #
    # internal data structure of a SYLK file is stored in an array:
    #
    #   SYLK(id)
    #       identification of the SYLK file
    #   SYLK(<x>,<y>,value)
    #       value of cell
    #   SYLK(<x>,<y>,formula)
    #       formula to calculate value
    #       (a cell needs a value; if a formula is given, but no value, the cell
    #       will not be stored into the SYLK file)
    #   SYLK(<x>,<y>,format)
    #       formatstring of cell
    #   SYLK(xMax), SYLK(yMax)
    #       Boundary X and Y, i.e. maximum column and row index
    #   SYLK(header)
    #       part of the header, which is not handled; this
    #       is used to write it unmodified (uninterpreted)
    #       back to the file
    #
    #####################################################################

    package provide Sylk 0.2


    #--------------------------------------------------------------------
    # parse X and Y values from a SYLK record
    #
    # Both values are optional. If a value is missing, the corresponding
    # value is not modified
    #--------------------------------------------------------------------
    proc parsePosition {record xName yName} {
        upvar $xName x
        upvar $yName y

        if {[regexp {;Y([0-9]+)(;|$)} $record dummy value]} {
            set y $value
        }
        if {[regexp {;X([0-9]+)(;|$)} $record dummy value]} {
            set x $value
        }
    }

    #--------------------------------------------------------------------
    # read a SYLK file into the internal structure
    #--------------------------------------------------------------------
    proc readSYLK {filename arrayName} {
        upvar $arrayName SYLK

        if {[catch {set fp [open $filename r]}]} {
            return 0
        }

        # store table boundaries
        set xMax 0
        set yMax 0

        # hold current postition
        set xAct 0
        set yAct 0

        while {![eof $fp]} {
            gets $fp line
            switch -regexp $line {
                ^ID {
                    # e.g. ID;PWXL;N;E
                    regexp {^ID;(.*)$} $line dummy value
                    set SYLK(id) $value
                }
                #^P -
                ^O {
                    # ignore some default formatting information,
                    # which I did not find documented
                }
                ^B {
                    # e.g. B;Y107;X105;D0 0 106 104
                    regexp {^B;Y([0-9]+);X([0-9]+)} $line dummy yMax xMax
                }
                ^F {
                    parsePosition $line xAct yAct
                    if {$xAct == 0 && $yAct == 0} {
                        # still header formats
                        lappend SYLK(header) $line
                    } else {
                        # cell formats
                        # Dec 2007 extend to handle ;f format ;p excel index
                        if {[regexp {;S([^;]*)(;|$)} $line dummy style]} {
                            set SYLK($xAct,$yAct,format) $style
                        }
                        if {[regexp {;P([^;]*)} $line dummy pfmt]} {
                            set SYLK($xAct,$yAct,pformindex) $pfmt
                        }
                        if {[regexp {;F([^;]*)} $line dummy fmt]} {
                            set SYLK($xAct,$yAct,numformat) $fmt
                        }
                    }
                }
                ^C {
                    # e.g. C;Y1;X1;K"Name"
                    parsePosition $line xAct yAct
                    if {[regexp {;K([^;]*)(;E|$)} $line dummy value]} {
                        regsub {"(.*)"} $value {\1} value
                        set SYLK($xAct,$yAct,value) $value
                    }

                    # with formula? E.g. C;K-1188;E(R[+1]C-0.5)*(R3C2-R2C2)/R4C2+R2C2
                    if {[regexp {;E(.*)(;|$)} $line dummy value]} {
        # [GWM] to convert relative positions in formulae
                        set value [substrelatives $value $xAct $yAct] ;# gwm 03.12.07
                        set SYLK($xAct,$yAct,formula) $value
                    }
                }
                ^NN { ;# gwm 23.12.07 name labels
                # eg NN;Naltitud;ER3C4
                    regsub "NN;N(.*);E.*" $line {\1} name
                    regsub ".*;E(.*)" $line {\1} cell
                    set SYLK(name,$name) $cell
                }
                ^E$ {
                    # SYLK file ends here
                    break
                }
                default {
                    # add to header (uninterpreted)
                    lappend SYLK(header) $line
                }
            }
        }
        close $fp

        set SYLK(xMax) $xMax
        set SYLK(yMax) $yMax

        return 1
    }

    #--------------------------------------------------------------------
    # write a SYLK file from the internal structure
    #--------------------------------------------------------------------
    proc writeSYLK {filename arrayName} {
        upvar $arrayName SYLK

        set fp [open $filename w]

        # write
        if {[info exists SYLK(id)]} {
            puts $fp "ID;$SYLK(id)"
        } else {
            puts $fp "ID;PWXL;N;E"
        }

        # boundary?
        if {[info exists SYLK(yMax)] && [info exists SYLK(yMax)]} {
            puts $fp "B;Y$SYLK(yMax);X$SYLK(xMax)"
        }

        # first, write header if existing
        if {[info exists SYLK(header)]} {
            foreach line $SYLK(header) {
                puts $fp $line
            }
        }

        # write data
        foreach idx [lsort -dictionary [array names SYLK *,*,value]] {
            regexp {([0-9]+),([0-9]+),} $idx dummy x y

            if {[info exists SYLK($x,$y,format)]} {
                set line "F;S$SYLK($x,$y,format);Y$y;X$x"
                puts $fp $line
            }

            if {[string is double $SYLK($x,$y,value)]} {
                set value $SYLK($x,$y,value)
            } else {
                set value "\"$SYLK($x,$y,value)\""
            }
            set line "C;Y$y;X$x;K$value"
            if {[info exists SYLK($x,$y,formula)]} {
                append line ";E$SYLK($x,$y,formula)"
            }
            puts $fp $line
        }

        # end-of-file marker
        puts $fp "E"

        close $fp
    }

    # GWM dec 2007 substitute relative references in formula
    # in Excel (at least) saved slk files use references such as R[-2]C[+2]
    # to refer to cells 2 above and 2 to right of current cell.
    proc substrelatives {formula xAct yAct} {
        # [GWM] converts relative positions in formulae
        # eg R[-1]C[+1] refers to cell row above, col to right.
        # eg RC[+1] R[-1]C refers to cell same row above, or same column as previous.
        # Assume no cell depends on itself to avoid circular references.
        # step 1 replace [+ or - N with [expression xy+/-N] exact cell is found.
        set f1 $formula
        set f1 [regsub -all {C\[(.\d*)]} $f1 {C[expr $xAct\1]}]
        set f1 [regsub -all {R\[(.\d*)]} $f1 {R[expr $yAct\1]}]
        set f1 [subst $f1]
        # now convert the 'same row' formula and 'same col' with bracket after
        # care needed to avoid changing any letter C eg in a name such as sCheme to sCNNheme
        set f1 [string map {RC R${yAct}C} [subst -nocommands {$f1}]]
        set f1 [regsub -all {(R.\d*C)(\D)} $f1 {\1${xAct}\2}]
        set f1 [regsub -all {(R.\d*C)\Z} $f1 {\1${xAct}}]

        if {[catch {subst $f1} errmsg]} {puts "Erro subst $formula:> $f1 - $errmsg"}
        # substitute the expressions on return
        return [subst $f1]
    }