Updated 2013-12-16 17:28:34 by pooryorick

KBK writes in comp.lang.tcl (12 February 2001, slightly edited by the author) [but was KBK responsible for the title?]:

Virtually any update is ill-considered!

The issue is that update has virtually unconstrained side effects. Code that calls update doesn't know what any random event handler might have done to its data while the update was in progress. This fact is the source of insidious bugs.

The problem is pretty pervasive once an application reaches a certain level of complexity. Here is one example I ran into:

  • Messages arrive on sockets from various places. Certain messages describe urgent conditions that cause a dialog to be posted. The code that positions the dialog on the screen uses update.
  • Messages arriving on the sockets also allow for urgent conditions to be dismissed (say, because a piece of equipment starts responding again).

Now consider the following sequence of events.

  1. A sender detects an urgent condition, and then immediately detects its resolution. The two messages (creating and dismissing the dialog) wind up back-to-back on a socket.
  2. The socket's 'readable' handler is entered and reads the first message. As part of processing the message, a dialog is posted, and the code that posts it enters update.
  3. The event loop is now free to process events, and re-enters the 'readable' handler for the socket. Now the message that dismisses the dialog gets processed and destroys the window.
  4. The update eventually returns, and the code tries winfo width and winfo height to begin its geometry calculation. But the window no longer exists (step 3 destroyed it), so the geometry calculation throws an error.

Mind you, this is an easy example. Imagine trying to track down the problem if you have custom C code making callbacks with Tcl_Eval and its friends. One of the callback scripts does an update, and the event handler winds up deleting or making radical modifications to data structures being maintained in the C code. On return from the update, you get a pointer smash with absolutely nothing to go on.

Yes, you can avoid these problems. You can play games with the event loop (unbinding events that could cause trouble). You can re-check the world after an update. You can twiddle bits in the event mask. You can do smart things with Tcl_Preserve. You can have your file-events trigger idle callbacks so that you know that all data have been collected from the socket before the callbacks fire. (And then you get the shaft from an unexpected update idletasks!) I have better things to do with my time!

It's usually much cleaner and easier to debug if you structure the code with chains of event bindings. For instance, you can use a <Configure> binding to trigger positioning a window (as in Centering a window). If you do, there's no update confusing things; the process simply returns to the event loop. If a subsequent event deletes the window before it is configured, there's no problem: the binding goes away and the <Configure> handler never fires. You can think of the window as a state machine, and each event as a state transition.

The use of update to Keep a GUI alive during a long calculation can also be avoided (a simpler development of the principle appears in Countdown program). Moreover, except for the very simplest applications, the resulting code is cleaner and easier to integrate and maintain (although slightly more verbose).
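A minimal sketch of the chunking idea, assuming a calculation that can be split into steps (here a hypothetical sumChunk that sums 1..n). Rather than calling update inside a loop, each chunk does a bounded amount of work and then schedules the next chunk as an event handler, so the event loop runs between chunks and no handler is ever re-entered. The after idle [list after 0 ...] form is one common idiom for letting both idle callbacks (such as redraws) and timers run between chunks; the chunk size is a tuning assumption.

```tcl
# Hypothetical long calculation: sum the integers 1..n, one chunk at a time.
# Each call does at most $chunk additions, then returns to the event loop.
proc sumChunk {i n chunk acc} {
    set stop [expr {min($i + $chunk, $n + 1)}]
    for {} {$i < $stop} {incr i} {
        incr acc $i
    }
    if {$i <= $n} {
        # Not done: schedule the continuation instead of calling update.
        # All state travels in the arguments of the scheduled command.
        after idle [list after 0 [list sumChunk $i $n $chunk $acc]]
    } else {
        # Done: publish the result; the top-level vwait wakes up.
        set ::result $acc
    }
}

after 0 [list sumChunk 1 10000 1000 0]
vwait result          ;# the one vwait that launches the event loop
puts "sum = $result"  ;# prints "sum = 50005000"
```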

I'd go so far as to say that I've never seen code where update, a single-argument after, vwait or tkwait is really needed, except for a vwait that initially launches the event loop. And I've seen lots of timing issues like the one I described above. Personally, were it not for the code it would break [AMG adds: the code is broken to begin with], I wouldn't cry if update were removed from the language altogether.

NEM 2010-07-28: I would add some notes to this. Firstly, it is not always obvious that you are calling one of these difficult commands. For example, tk_messageBox calls vwait as part of its workings. I have been caught by this before, in a similar manner to KBK (popping up dialog boxes in response to socket messages), and it can be hellish to debug. Secondly, be aware that coroutines can also lead to these kinds of problems if not controlled: a yield to the event loop is equivalent to a non-nesting vwait. Concurrency is a cruel mistress.
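To make the coroutine point concrete, here is a minimal sketch (the proc bump and variable counter are hypothetical) of a read-modify-write that spans a yield. Both coroutines read the counter before either writes it back, so one increment is lost even though no update or vwait appears anywhere.

```tcl
set counter 0

proc bump {} {
    global counter
    set local $counter                ;# read shared state
    after 0 [list [info coroutine]]   ;# arrange to be resumed later
    yield                             ;# back to the event loop: others run now
    set counter [expr {$local + 1}]   ;# write back a possibly stale value
}

coroutine c1 bump   ;# runs until its yield, having read 0
coroutine c2 bump   ;# also reads 0 before c1 resumes
after 50 {set done 1}
vwait done
puts "counter = $counter"   ;# prints "counter = 1": one increment was lost
```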

EG: I have learned not to call any code that shows a modal dialog inside an event handler, because the handler will not return until you dismiss the dialog. When I need to do some user interaction, for example choosing a filename to save partial results, I usually use after 0 mySaveProc. This allows the current event handler to finish, and then the event loop will call the save procedure.
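A minimal sketch of that deferral pattern. Here mySaveProc is only a stand-in that records it ran (a real one might call tk_getSaveFile), and onMessage is a hypothetical event handler. The log shows the handler completes before the deferred interaction starts.

```tcl
set log {}

# Stand-in for a proc that would post a modal dialog (hypothetical body).
proc mySaveProc {} {
    lappend ::log "save dialog"
    set ::done 1
}

# An event handler that needs user interaction but must not block on it.
proc onMessage {} {
    lappend ::log "handler start"
    after 0 mySaveProc   ;# defer the interaction to a later event
    lappend ::log "handler end"
}

after 0 onMessage
vwait done
puts $log   ;# prints: {handler start} {handler end} {save dialog}
```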

AMG: The problem documented on this page is real, but it's not specific to update, nor to the Tcl event loop. It's more fundamental than that. Rather than unfairly tar update and cause the true problem to go unexplored, I will attempt to get to the bottom of it.

A critical section is a code segment that assumes it has exclusive access to a shared resource. The trouble happens when this assumption is violated.

The best-known culprit is multithreading, in which case the fix is to correctly advertise to the scheduler which segments of code cannot overlap. From when a critical section starts until it finishes, the scheduler must not start another critical section that stomps on any of the same resources as the first. Of course, the scheduler can only do this properly when the code obsessively complies with the resource locking regime. This is a problem on both single- and multi-core systems; it doesn't matter if the contentious critical sections take turns or run in parallel.

Threads aren't necessary for this problem, since all that's needed is for two critical sections to overlap, so that they stomp on each other. Without threads, it's still quite possible for one critical section to (accidentally) invoke another. This can happen by calling update or vwait or tkwait in the middle of a critical section, if there's another event handler that also contains a critical section that collides with the first. But this can also happen by calling yield to return to the event loop, in the same circumstance.
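A minimal single-threaded sketch of two colliding critical sections (increment and counter are hypothetical names). Both handlers are due at once; the second one runs inside the first one's update, and the first then writes back the value it read before update, overwriting the second's work.

```tcl
set counter 0

# Hypothetical critical section: read, then update, then write back.
proc increment {} {
    global counter
    set local $counter                ;# read shared state
    update                            ;# the other increment runs in here...
    set counter [expr {$local + 1}]   ;# ...and this overwrites its result
}

after 0 increment
after 0 increment
after 50 {set done 1}
vwait done
puts "counter = $counter"   ;# prints "counter = 1", not 2
```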

Many Tk commands use vwait and similar commands. Have a look at the implementation of tk_dialog [1]; it uses vwait, tkwait, and update idletasks. In particular, notice the catch surrounding the bind command at the end; this safeguards against the possibility that vwait called an event handler that deleted the window.

The trick is to break up critical sections such that they never span an update or similar. This serves the same purpose as locking in the multithread case; while the code is running, everything's effectively locked, then when it enters the event loop, everything's effectively unlocked.
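Applied to a hypothetical counter, a minimal sketch: the read-modify-write completes before the event loop is re-entered, so whatever runs inside update sees a consistent value and nothing after update assumes exclusive access.

```tcl
set counter 0

proc increment {} {
    global counter
    incr counter   ;# critical section: complete before entering the event loop
    update         ;# safe: nothing after this relies on exclusive access
}

after 0 increment
after 0 increment
after 50 {set done 1}
vwait done
puts "counter = $counter"   ;# prints "counter = 2"
```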

One way to do this is to revalidate shared resources after returning from update. Performing this validation means that the code no longer assumes it has exclusive access, therefore the critical section has ended. (The defining characteristic of a critical section is that it assumes exclusive access.) This is the approach taken by tk_dialog.
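A minimal sketch of revalidation, modeled on the dialog story above but using a plain dict so it runs in tclsh. The alerts dict, dismiss and showAlert are hypothetical; with Tk, the analogous check after update would be winfo exists on the window.

```tcl
# Hypothetical shared resource: open alerts keyed by id.
set alerts [dict create 42 {text "pump offline"}]

# Another event handler that may run during update and remove an alert.
proc dismiss {id} {
    dict unset ::alerts $id
}

proc showAlert {id} {
    update   ;# anything may happen to ::alerts in here
    # Revalidate instead of assuming the alert still exists.
    if {![dict exists $::alerts $id]} {
        return "alert $id was dismissed"
    }
    return "showing: [dict get $::alerts $id text]"
}

after 0 [list dismiss 42]   ;# the "dismiss" message arrives back-to-back
set msg [showAlert 42]
puts $msg   ;# prints "alert 42 was dismissed"
```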

Another way is to not call update in the middle of straight-line code but to instead schedule the continuation as an event handler. Most types of event handlers are automatically canceled when their resource goes away, e.g. when a channel is closed, all its chan event handlers go away too. Obviously, after idle and after $time handlers don't get automatically deleted, since they don't have an associated resource, so you will need to explicitly delete them in any code that invalidates the critical resource.
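A minimal sketch of the explicit cleanup for timer-based continuations. The names conn, pending, poll and closeConn are hypothetical. The code that invalidates the resource also cancels the scheduled continuation by its after id, so poll never runs against a closed connection.

```tcl
set log {}
set conn open
set pending ""   ;# id of a continuation that assumes conn is open

proc poll {} {
    set ::pending ""
    lappend ::log "poll while conn=$::conn"
}

proc closeConn {} {
    # Invalidate the resource AND cancel continuations that depend on it;
    # timers have no associated resource, so nothing does this for us.
    if {$::pending ne ""} {
        after cancel $::pending
        set ::pending ""
    }
    set ::conn closed
}

set pending [after 100 poll]   ;# continuation scheduled as an event handler
after 0 closeConn              ;# some other event invalidates the resource
after 200 {set done 1}
vwait done
puts "poll ran [llength $log] time(s)"   ;# prints "poll ran 0 time(s)"
```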

AMG: There is another problem with update unrelated to the unlimited side effect issue described above. Nested invocations of update, vwait, etc. can potentially block the parent code from continuing. Nested invocations can easily happen by accident, simply by calling update, etc. within an event handler. Here's a simple example:
proc a {} {
    puts "a: waiting half a second"
    after 500 {set a 1}
    puts "a: [time {vwait a}]"
}
proc b {} {
    puts "b: waiting five seconds"
    after 5000 {set b 1}
    puts "b: [time {vwait b}]"
}
proc test {} {
    after 0 a
    after 0 b
    update
}
test

On my computer, this prints:
a: waiting half a second
b: waiting five seconds
b: 5000277 microseconds per iteration
a: 5001788 microseconds per iteration

vwait b happens "within" vwait a, so even though a's timer expires long before b's, vwait a is blocked until vwait b completes.

This problem is fixed by not entering the event loop recursively. There are two alternatives: continuation passing and coroutines.

With the continuation passing technique, a proc enters the event loop by returning all the way out to the top level, since the event loop is the outermost caller on the stack. For example, return -level [info level]. Before returning to the event loop, the proc must schedule itself to be resumed. This means storing all its state (both variables and execution position, a.k.a. the continuation) somewhere it can get at them later. Global variables work, as do arguments embedded in the scheduled event handler. Also consider TclOO object member variables. This has to be done not only with the proc but with all procs that call it; obviously, you'll want to avoid having this happen deep in the stack.

With coroutines, the shiny new NRE (non-recursive engine, new in Tcl 8.6) does all this work for you. All you have to do is run your proc in an alternate stack created by the coroutine command, then call yield to return to the event loop. Of course, you still need to schedule your code to be resumed, but all the continuation information is saved without any effort on your part. Simply schedule the command name returned by info coroutine to be called. One restriction to keep in mind is that not all commands are NRE-enabled. You can use these commands if you wish, but you can't call yield inside a proc that is invoked by a non-NRE command. Non-NRE commands are mostly found in extensions.
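The schedule-then-yield step can be wrapped in a small helper. This is a minimal sketch with hypothetical names (pause, worker); it relies on info coroutine returning the enclosing coroutine's name even when called from a nested proc, and on proc being NRE-enabled so yield works there.

```tcl
# Suspend the current coroutine for $ms milliseconds without nesting the
# event loop: schedule our own resumption, then yield. (Hypothetical helper;
# it must be called from inside a coroutine.)
proc pause {ms} {
    after $ms [list [info coroutine]]
    yield
}

proc worker {name ms} {
    pause $ms
    lappend ::finished $name
    if {[llength $::finished] == 2} { set ::done 1 }
}

set finished {}
coroutine w1 worker slow 300
coroutine w2 worker fast 100
vwait done
puts $finished   ;# prints "fast slow": the shorter wait finishes first
```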

It would be nice if someone created and maintained a list of non-NRE core commands...

Both these techniques are discussed in Keep a GUI alive during a long calculation. Also see Firework Display.

Here's an example of continuation passing. It's simple in this case, but it can get quite hairy. The only tricky part is measuring time. The continuation is formatted as a step number followed by a key-value list of extra state variables, and the continuation is stored in the event queue.
proc a {{step 0} args} {
    dict with args {}
    switch $step {
    0 {
        puts "a: waiting half a second"
        after 500 [list a 1 start [clock microseconds]]
    } 1 {
        puts "a: [expr {[clock microseconds] - $start}] microseconds"
    }}
}
proc b {{step 0} args} {
    dict with args {}
    switch $step {
    0 {
        puts "b: waiting five seconds"
        after 5000 [list b 1 start [clock microseconds]]
    } 1 {
        puts "b: [expr {[clock microseconds] - $start}] microseconds"
    }}
}
proc test {} {
    after 0 a
    after 0 b
    vwait forever
}
test

Result:
a: waiting half a second
b: waiting five seconds
a: 500765 microseconds
b: 5000548 microseconds

a and b are now interleaved properly.

Here's an example using yield. It would be nearly identical to the vwait example if not for the fact that time is (currently) not NRE-enabled. I know this because I got the error "cannot yield: C stack busy" when I tried yielding inside time.
proc a {} {
    puts "a: waiting half a second"
    after 500 [list [info coroutine]]
    set start [clock microseconds]
    yield
    puts "a: [expr {[clock microseconds] - $start}] microseconds"
}
proc b {} {
    puts "b: waiting five seconds"
    after 5000 [list [info coroutine]]
    set start [clock microseconds]
    yield
    puts "b: [expr {[clock microseconds] - $start}] microseconds"
}
proc test {} {
    after 0 {coroutine coro1 a}
    after 0 {coroutine coro2 b}
    vwait forever
}
test

Result:
a: waiting half a second
b: waiting five seconds
a: 497929 microseconds
b: 5000187 microseconds

Again, a and b are correctly interleaved.