Manipulating data with Tcl

Arjen Markus (13 January 2009) At the moment I am involved in a project where I frequently need to edit timeseries in an input file, for instance to multiply the wind velocity by a factor of 0.8 to see what the effect is on the result of my computations. That is a tedious task if done by hand. Many people would probably use a spreadsheet for this: copy the raw data into the spreadsheet, put a formula like "=0.8*$A1" in cell B1, copy that to the cells below, and export the results to the input file.

That is all very well, but to me that is rather laborious - I am lousy with spreadsheets, I get confused easily and I find them rather tedious to use (lots of mouse clicks for instance).

One of the nice features of Tcl, in my opinion, is how few restrictions it imposes on the layout of a program. So, when I have a timeseries of this form, for instance:

2008/07/05-03:00:00 3.266219192649675
2008/07/06-03:00:00 5.680730549305496
2008/07/07-03:00:00 8.375445823799266
2008/07/08-03:00:00 6.92001047014345
2008/07/09-03:00:00 4.2700598446601665
2008/07/10-03:00:00 4.939019208757603
2008/07/11-03:00:00 5.869121431091313
2008/07/12-03:00:00 5.400178789428884
2008/07/13-03:00:00 2.49166379101902
2008/07/14-03:00:00 4.607648466722291
2008/07/15-03:00:00 5.109895484028547
...

then I can just surround it with { } to get a valid list:

set data {
2008/07/05-03:00:00 3.266219192649675
2008/07/06-03:00:00 5.680730549305496
2008/07/07-03:00:00 8.375445823799266
2008/07/08-03:00:00 6.92001047014345
2008/07/09-03:00:00 4.2700598446601665
2008/07/10-03:00:00 4.939019208757603
2008/07/11-03:00:00 5.869121431091313
2008/07/12-03:00:00 5.400178789428884
2008/07/13-03:00:00 2.49166379101902
2008/07/14-03:00:00 4.607648466722291
2008/07/15-03:00:00 5.109895484028547
...
}

Then, add a loop like this after the data:

foreach {time velocity} $data {
    puts "$time [expr {0.8*$velocity}]"
}

and run the program - the output is a timeseries with the velocity scaled to 80% of its original value. I can plug that output directly into the input file.
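The same loop can be wrapped in a procedure so that the factor becomes a parameter. This is only a minimal sketch: the name scaleSeries and the decision to skip non-numeric values (such as a trailing "..." marker) are assumptions made for illustration, not part of the original script.

```tcl
# Hypothetical helper (not part of the original script): scale the second
# column of a "time value" list and return the result as lines of text.
proc scaleSeries {data factor} {
    set lines {}
    foreach {time value} $data {
        # Skip anything that is not a number, such as a trailing "..." marker
        # (foreach then leaves the value empty)
        if {![string is double -strict $value]} {
            continue
        }
        lappend lines "$time [expr {$factor * $value}]"
    }
    return [join $lines \n]
}

# First two records of the timeseries shown above
set data {
2008/07/05-03:00:00 3.266219192649675
2008/07/06-03:00:00 5.680730549305496
}

puts [scaleSeries $data 0.8]
```

Returning the text rather than printing it inside the procedure leaves it to the caller to decide whether the result goes to the screen or to a file.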

If I need a bit more sophisticated processing and even a picture to get a better feeling for the data, well, no problem:

#
# Get rid of the first column - no need for it now
#
set velocity {}
foreach {time vel} $data {
    lappend velocity $vel
}

package require math::statistics
package require Plotchart

pack [canvas .c -width 400 -height 300]
set p [::Plotchart::createXYPlot .c {0 50 10} {0 1 0.1}]

set i 0
foreach value [::math::statistics::autocorr $velocity] {
    puts "$i $value"
    $p plot data $i $value
    incr i
}

and the result is a picture of the autocorrelation function of the data. I would not know a (fast) way of squeezing that out of a spreadsheet. But of course, I am no expert when it comes to spreadsheets.
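To make clearer what is being plotted, here is a sketch of the textbook sample autocorrelation written in plain Tcl. The procedure name autocorrSketch is hypothetical, and the normalisation used (dividing by the lag-0 sum of squares) is an assumption that may differ from what ::math::statistics::autocorr does internally. It also assumes a non-empty series that is not constant.

```tcl
# Hypothetical helper: sample autocorrelation for lags 0..maxlag, using
#   r(k) = sum_i (x_i - m)*(x_{i+k} - m) / sum_i (x_i - m)^2
# where m is the mean of the series. Assumes the series is not constant.
proc autocorrSketch {values maxlag} {
    set n [llength $values]

    # Mean of the series
    set sum 0.0
    foreach v $values {
        set sum [expr {$sum + $v}]
    }
    set mean [expr {$sum / $n}]

    # Deviations from the mean and their sum of squares (the lag-0 term)
    set dev {}
    set ss 0.0
    foreach v $values {
        set d [expr {$v - $mean}]
        lappend dev $d
        set ss [expr {$ss + $d * $d}]
    }

    # One coefficient per lag
    set result {}
    for {set k 0} {$k <= $maxlag} {incr k} {
        set acc 0.0
        for {set i 0} {$i + $k < $n} {incr i} {
            set j [expr {$i + $k}]
            set acc [expr {$acc + [lindex $dev $i] * [lindex $dev $j]}]
        }
        lappend result [expr {$acc / $ss}]
    }
    return $result
}

# First six velocities of the timeseries shown above
set velocity {
    3.266219192649675 5.680730549305496 8.375445823799266
    6.92001047014345 4.2700598446601665 4.939019208757603
}

set lag 0
foreach r [autocorrSketch $velocity 3] {
    puts "$lag $r"
    incr lag
}
```

With this definition the coefficients can be negative, so a vertical axis that runs from 0 to 1 may not show every point.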


newp (13 January 2009) The above example throws an error in the Plotchart::create command:

invalid command name "console"

LV newp, are you trying the example on Windows? If so, then I expect the issue is that you are using a console-only version of the tclsh interpreter, or at the very least you need to do a

package require Tk

before you try the example. If you are not using Windows, then one of those packages has a bug, in that it is trying to use a Tk command (console) which unfortunately doesn't exist on your platform.

On SPARC Solaris, using Tcl 8.5, the above doesn't generate an error about console. However, the labelling along the y axis is a bit peculiar. I copied the above data setting, leaving out the "...". I see numbering along the y axis of 0.0, 0.1, 0.2, 00000000004 (the left side of the number is clipped), 0.4, 0.5, 0.6, 0.7, 99999999999, 99999999999, 99999999999 (again, the left side of each number is clipped).


newp (13 January 2009) You are right: I am on Windows XP (with Tcl version 8.4.17) and I tried it with Tkcon. Even when I added the "package req Tk" part, I got the error. But when I do it from a wish console, all is OK.

The graph does look a bit peculiar. Perhaps it is because of the data range?

LV I am not certain whether the data range is the issue or just a problem with not formatting the numbers before plotting them. Also, newp, it might be worth filing a bug report on the "console command not found when invoked from within Tkcon on Windows XP" bug, at the very least against the Plotchart module (found at http://tcllib.sf.net/ I expect ) ...

AM (14 January 2009) The console command that throws the error is an unfortunate leftover from development. It should be gone with the latest sources from CVS.

As for the graph you get, well, the data I showed are only a part of the actual data series, and I created the autocorrelation plot I speak about with a different but related data series (longer and in finer detail). Consider these merely examples of how Tcl's minimal syntactic requirements make it possible to do this kind of thing "quick and dirty".


AM (14 January 2009) An alternative approach to the above was suggested by Reinhard Max:

  • Leave the data in a separate file
  • Use the [read] command to get the data in:
   set infile [open $filename]
   set data [read $infile]
   close $infile

   #
   # In a small program this would even be acceptable:
   #
   # set data [read [open $filename]]
   #
   # (The input file is left open)

The advantage is that you can easily reuse the program with different data and you still do not need to care much about the data format. However, a small drawback is that you now have two files ...
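Put together, the two-file variant might look like the sketch below. The procedure name readSeries and the command-line convention (file name first, factor second) are assumptions made for illustration. The data file is expected to hold whitespace-separated "time value" pairs, as in the timeseries at the top of the page.

```tcl
# Hypothetical helper: read the whole data file and return its contents.
# Used as a list, the contents split on whitespace into time/value elements.
proc readSeries {filename} {
    set infile [open $filename]
    set data [read $infile]
    close $infile
    return $data
}

# Assumed usage: tclsh scale.tcl <datafile> <factor>
# Both arguments are placeholders; nothing happens if they are missing.
if {[info exists argv] && [llength $argv] >= 2} {
    set filename [lindex $argv 0]
    set factor   [lindex $argv 1]

    foreach {time velocity} [readSeries $filename] {
        puts "$time [expr {$factor * $velocity}]"
    }
}
```

Because the file name comes from the command line, the same script can be run unchanged against any number of data files.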