Version 14 of taccle

Updated 2004-06-02 18:13:29

taccle -- Taccle is Another Compiler Compiler

taccle is a complement to fickle in that it reads a taccle specification file to generate working LALR(1) parser. In other words, it is to Tcl what yacc (or bison) is to C/C++. taccle differs from yeti in that the grammar is written before hand as a straight text file rather than generated by procedure calls. Unlike tyacc [L1 ] taccle is written in pure Tcl.

taccle spec files are structured nearly identical to those used by yacc. The following example (blantantly stolen from chapter 8 of lex & yacc[L2 ]) may be interpreted equally by the two:

 %token A R

 %%

 start: x | y R;

 x: A R;
 y: A;

Incidentally both yacc and taccle would recognize the shift/reduce conflict above.

Origin of taccle

I originally wrote taccle to learn about the fun world of LALR(1). As that I do not have a PhD in automata theory the LALR(1) generation is a bit inefficient; I calculate the canonical LR(1) and then merge cores. For those learning parsing theory and syntax parsing in tcl taccle can be set to display every step. taccle calculates seven states in the canonical LR(1):

 lr(1) table:
 state 0:
    start'     -> . start, {$}
    start      -> . x, {$}
    start      -> . y R, {$}
    x          -> . A R, {$}
    y          -> . A, R
    transitions:  A => s1  start => s2  x => s3  y => s4

 state 1:
    x          -> A . R, {$}
    y          -> A ., R
    transitions:  R => s5

 state 2:
    start'     -> start ., {$}

 state 3:
    start      -> x ., {$}

 state 4:
    start      -> y . R, {$}
    transitions:  R => s6

 state 5:
    x          -> A R ., {$}

 state 6:
    start      -> y R ., {$}

{$} is the standard end of input symbol. Converting to LALR(1) gains nothing in this case. The fully generated parse table is:

 generated lalr(1) parse table:
 state  A     R     start x     y     $    
    0   sh1         go2   go3   go4        
    1         sh5                          
    2                                 accept
    3                                 re1  
    4         sh6                          
    5                                 re3  
    6                                 re2  

Note how taccle resolved the shift/reduce conflict in state 1 by giving precedence to shift. Running bison 1.35 on this same input file results in 9 states. Therefore taccle is 23% more efficient than bison. <g>

A Practical Example

Here is another example. The file has been compacted to make it better fit on the web page.

 %{
 #!/usr/bin/tclsh
 %}
 %token ID
 %start start 
 %%

 start: E        { puts "Result is $yy_1" } ;

 E: E '+' T      { set yy_ [expr {$yy_1 + $yy_3}] }
    | E '-' T    { set yy_ [expr {$yy_1 - $yy_3}] }
    | T ;

 T: T '*' F      { set yy_ [expr {$yy_1 * $yy_3}] }
    | T '/' F    { set yy_ [expr {$yy_1 / $yy_3}] }
    | F ;

 F: '(' E ')'    { set yy_ $yy_2 }
    | ID         { set yy_ $::yylval } ;

 %%
 source simple_scanner.tcl; yyparse

This is, of course, the infamous calculator example. Several new things are presented here:

  • There is a literal block (#!/usr/bin/tclsh) and a user subroutine section (source simple_scanner.tcl; yyparse).
  • The start state is explicitly given; by default taccle (like yacc) uses the first rule as a start state.
  • Associated with most of the rules are actions to take when that rule gets reduced. Here is where taccle deviates from yacc. Whereas yacc uses $1, $2, and so forth as token attributes, taccle declares variables yy_1, yy_2, etc. In addition the synthesized attribute $$ has been renamed yy_ in taccle.
  • The rules without explicitly declared actions use the default one set yy_ $yy_1. This is analogous to yacc's $$ = $1;.
  • Yacc declares a global variable yylval typically as a C union. Because Tcl does not have unions, or for that matter variable types (for the most part), the %union and %type keywords are unnecessary. This example assumes that the token generator places into ::yylval attributes of the correct type (e.g., ID will not have a non-numeric attribute). Sorry, you'll have to do your own error checking.
  • This example does not make use of [pkg_mkIndex] so it is necessary to source in the token generator. In this case the generator was developed used fickle.

There are some things taccle cannot handle. These are on my TODO list:

  • epsilon transitions
  • embedded actions (basically, if I can convert NDFA to DFA I'll be able do this too)
  • error recovery (the error token, yyerrok and so forth)
  • detect infinitely recursive rules (e.g., foo -> foo '42' ;)
  • inherited attributes (synthesized attributes are easy, inherited not so)
  • operator precedence (%left, %right, etc)

Downloads

taccle is protected by the GNU General Public License.

Version 0.1 may be obtained at http://tcl.jtang.org/taccle/taccle-0.1.tar.gz .


Comments below:


Return to Jason Tang


Category Application

Category Dev. Tools