NRE: the non-recursive engine in Tcl 8.6

Notes on the first of Miguel Sofer's talks about NRE. (Please add/enhance/correct ...)

State of computation in 8.5

  • C-stack (1 per thread)
  • Nesting of C-function calls
  • CallFrame stack (1 per interp)
  • Nesting of proc, apply, namespace eval; this is what upvar and uplevel navigate
  • Tcl Evaluation Stack (1 per execEnv/interp)
  • Bytecode engine stack and workspace
  • Special allocator supported (TclStackAlloc)
  • CmdFrameStack (1 per interpereter)

Walked through a simplified stack trace of:

proc foo {} {
    set x 1
    moo
    set x 2
}
proc moo {} {
    set x 1
    puts $x; # WE ARE HERE
    set x 2
}

Above will have 3 instances of TclExecuteByteCode; full stack 16 levels deep

Stack overflow and recursion limits: Some platforms (BSD and other Unixes) have small default stack; Stack-hungry platforms (i.e., Windows debug builds); Severely stack constrained platforms (Cisco, phones?). mod-8-3-branch created a few years ago by ActiveState for a customer.

The C stack in 8.6: above example only has 1 instance of TclExecuteByteCode(!)

Barring any undiscovered bugs, full compatability. Only additions to the API, no changes. Modification of struct Command, changes to Tcl_GetCommandInfo/TclSetCommandInfo designed not to break extensions that call*objectProc directly. Changed structs (extended): Interp, Command, ExecEnv. Extensions that include internal private headers might break, but that is what they deserve.

New C API (TIP #322):

  • Tcl_NRCreateCommand
  • Tcl_NREvalObj
  • Tcl_NREvalObjv
  • Tcl_NRCmdSwap
  • Tcl_NRAddCallback

An NRE-enabled command requires two implementations: the regular objProc and a new nreProc. NRE provides a utility function Tcl_NRCallObjProc to make this task essentially trivial. If he could have prohibited people from calling the objProc implementation directly a separate nreProc would not be necessary, but it is to maintain maximum compatability with existing extensions.

int MyCmdObjProc( ... )
{
    <preparation>
    result = ...
    <cleanup>
    ...
}

After adaptation looks like:

int MyCmdNreProc( ... )
{
    <preparation>
    Tcl_NRAddCallback(interp, MyPostProc, data0, data1, NULL, NULL);
    return Tcl_NREvalObj(interp, objPtr, flags);
}

int MyPostProc( ... )
{
    Tcl_Obj *fooPtr = data[0]; /* data0 retrieved */
    MyStruct *mooPtr = data[1]; /* data1 retrieved */

    <postprocessing>
    <cleanup>
    return = result;
    ...
}

NRE-enabled core commands:

DGP notes that an example of a Tcl built-in command has no reason to change (because it does not consume stack) would be format.

Non-enabled commands (yet?):

From the C side: no NR-API for Tcl_Eval(), Tcl_EvalEx() (things running at level #0)

How does this all happen? Trampolining: do not invoke a function, rather ask your caller to do it for you (and then return). TEBC learned one new trick: how to freeze a bytecode, start running another one.

The workhorse is TclNRRunCallbacks (the trampoline). New stack for callbacks maintained by the ExecEnv. Loops running the EE's top callback. Passes a callback's result to the next callback. Allows TEBC calls to "leak through" (when called from TEBC)

TEBC magic

  • TEBC has many (100?) local variables, but ...
  • Only need 4 of them to reconstruct

State of a computation in 8.6a3:

  • C-Stack (1 per thread, lightly loaded)
  • CallFrame stack (1 per interp)
  • Tcl evaluation stack (1 per execEnv), includes NEW BottomData stack
  • CmdFrame stack (1 per interp)
  • NEW CallbackStack (1 per execEnv)

There are opportuntiesfor simplication and merging. The above may not be true by the time 8.6.0 is released. Emphasis so far is to run properly, not (necessarily) fast, yet.

New possibilities:

  • Once TEBC knows how to freeze an execution we can put it aside and save it for later (aka coroutines)
  • Edit the callback stack, rearrange the order of computations (aka proper tailcalls)
  • Schedule new evals from callbacks (aka CPS?)
  • Others??

Miscelanea:

  • Not optimized--should get better during beta
  • Some of the internal things may yet change
  • C API not frozen... TIP not yet submitted to a vote
  • TEBC currently looks like a bomb exploded in it ("don't look at it after lunch")
  • Debugging NRE is no fun; will have to invent something

Question: For a typical stack stressing application, how much less demand on the stack is there?
Answer: Showed demo of a sample .tcl script; depth limit is bounded by heap, not by the C stack (heap is typically a lot bigger).