Version 0 of Optimized compilation of tcl

Updated 2002-07-22 15:25:15

MS I ran the tclbench suite on tclsh compiled with three different compilers and several optimisation combinations. This page summarizes the results.

These tests were run on a PIII/600 MHz/192 MB laptop running Red Hat Linux 7.2.

The compilers were:

  • gcc2.96
  • gcc3.1
  • icc6.0 (Intel C++ compiler for Linux)

Notes on compiling tcl with icc

Results

  SPEED   SIZE  COMPILER
  1.00    1.00   gcc2.96 -O  -march=pentiumpro
  1.05    1.00   gcc2.96 -O  -march=pentiumpro -fomit-frame-pointer
  1.01    1.01   gcc2.96 -O2 -march=pentiumpro
  1.07    1.02   gcc2.96 -O2 -march=pentiumpro -fomit-frame-pointer
  1.01    0.97   gcc2.96 -Os -march=pentiumpro
  1.05    0.98   gcc2.96 -Os -march=pentiumpro -fomit-frame-pointer
  0.99    1.07   gcc3.1  -O  -march=pentium3
  1.03    1.08   gcc3.1  -O  -march=pentium3 -fomit-frame-pointer
  1.02    1.12   gcc3.1  -O2 -march=pentium3
  1.06    1.13   gcc3.1  -O2 -march=pentium3 -fomit-frame-pointer
  1.03    1.14   gcc3.1  -O3 -march=pentium3
  1.08    1.15   gcc3.1  -O3 -march=pentium3 -fomit-frame-pointer
  1.04    0.97   gcc3.1  -Os -march=pentium3
  1.06    0.97   gcc3.1  -Os -march=pentium3 -fomit-frame-pointer
  1.11    1.47   icc6.0  -O3 -xK -ip

Conclusions (?)

  • The "-fomit-frame-pointer" flag produces faster code with gcc in every tested configuration. Whether that is worth the loss of a traceable core file is an open question; then again, tcl shouldn't dump core.

  • The default optimisation flag for gcc, "-O", seems suboptimal; both GNU compilers produce faster and smaller code with "-Os".

  • Intel's compiler produces slightly faster code than gcc as measured by tclbench (1.11 against 1.08 for the fastest gcc build), but a much larger image (1.47 times the reference size, in the only tested configuration).
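The first conclusion can be checked directly against the results table. The following is a minimal sketch, not part of the original measurements: it copies the gcc rows from the table and pairs each build with its "-fomit-frame-pointer" counterpart. The names RESULTS, FOMIT and frame_pointer_gains are made up for this illustration.

```python
# Rows copied from the results table above: (speed, size, compiler, flags).
RESULTS = [
    (1.00, 1.00, "gcc2.96", "-O -march=pentiumpro"),
    (1.05, 1.00, "gcc2.96", "-O -march=pentiumpro -fomit-frame-pointer"),
    (1.01, 1.01, "gcc2.96", "-O2 -march=pentiumpro"),
    (1.07, 1.02, "gcc2.96", "-O2 -march=pentiumpro -fomit-frame-pointer"),
    (1.01, 0.97, "gcc2.96", "-Os -march=pentiumpro"),
    (1.05, 0.98, "gcc2.96", "-Os -march=pentiumpro -fomit-frame-pointer"),
    (0.99, 1.07, "gcc3.1", "-O -march=pentium3"),
    (1.03, 1.08, "gcc3.1", "-O -march=pentium3 -fomit-frame-pointer"),
    (1.02, 1.12, "gcc3.1", "-O2 -march=pentium3"),
    (1.06, 1.13, "gcc3.1", "-O2 -march=pentium3 -fomit-frame-pointer"),
    (1.03, 1.14, "gcc3.1", "-O3 -march=pentium3"),
    (1.08, 1.15, "gcc3.1", "-O3 -march=pentium3 -fomit-frame-pointer"),
    (1.04, 0.97, "gcc3.1", "-Os -march=pentium3"),
    (1.06, 0.97, "gcc3.1", "-Os -march=pentium3 -fomit-frame-pointer"),
]

FOMIT = "-fomit-frame-pointer"


def frame_pointer_gains(rows):
    """Map (compiler, base flags) to the speed ratio with/without FOMIT."""
    # Speeds of the builds that keep the frame pointer, keyed by their flags.
    base = {(comp, flags): speed
            for speed, _size, comp, flags in rows if FOMIT not in flags}
    gains = {}
    for speed, _size, comp, flags in rows:
        if FOMIT in flags:
            key = (comp, flags.replace(FOMIT, "").strip())
            gains[key] = speed / base[key]
    return gains


GAINS = frame_pointer_gains(RESULTS)

if __name__ == "__main__":
    for (comp, flags), ratio in sorted(GAINS.items()):
        print(f"{comp} {flags}: x{ratio:.3f}")
```

Every ratio comes out above 1, which is all the conclusion claims. The table values are rounded to two decimals, so the individual ratios should not be read too precisely.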

Notes

  • These were all static builds of tclsh from the current (01-22-02) HEAD.

  • The data presented is size/speed relative to the reference build "gcc2.96 -O" (a higher SPEED value means a faster run). The reference build produced a 702kB tclsh which ran the tclbench suite in 00:04:35.

  • All compilers were set to produce binaries exploiting the processor's features ("-march" and "-x" flags).

  • The Intel compiler was benchmarked in a single configuration, which I suppose gives the best optimisation. I have not checked for the Intel equivalent to gcc's "-Os" flag.

  • The "-fomit-frame-pointer" flag to gcc produces code that is non-debuggable - the stack trace in core files is not usable. This behaviour is also present (I think) in the optimised code produced by icc.
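Since the table only gives relative figures, a rough idea of the absolute numbers can be recovered from the reference build. The sketch below rests on two assumptions the page does not state: that 00:04:35 means 4 minutes 35 seconds, and that SPEED is the reference run time divided by the build's run time. The function name estimate_absolute is made up for this illustration.

```python
# Reference build "gcc2.96 -O", as given in the notes above.
REF_SIZE_KB = 702
REF_TIME_S = 4 * 60 + 35  # assumes 00:04:35 is hh:mm:ss, i.e. 275 seconds


def estimate_absolute(rel_speed, rel_size):
    """Estimate (run time in s, image size in kB) from the relative figures.

    Assumes SPEED = reference time / build time; the page does not give
    the exact formula, so treat the result as an approximation.
    """
    return REF_TIME_S / rel_speed, REF_SIZE_KB * rel_size


# Example: the icc6.0 row of the table (SPEED 1.11, SIZE 1.47).
ICC_TIME_S, ICC_SIZE_KB = estimate_absolute(1.11, 1.47)

if __name__ == "__main__":
    print(f"icc6.0: about {ICC_TIME_S:.0f} s, about {ICC_SIZE_KB:.0f} kB")
```

Under those assumptions the icc build would take a little over four minutes and weigh in at roughly 1 MB; these are estimates derived from rounded table values, not measurements.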