Updated 2011-11-03 17:35:44 by AMG

Eric Boudaillier: As a bytecode experiment, I decided to generate bytecode for the K operator, or, more precisely, for the idiom [K $x [set x ""]]. Here is the result, written with the help of the anatomy of a bytecoded command and the compile procedure for set, TclCompileSetCmd().
 int
 TclCompileClearCmd(interp, parsePtr, envPtr)
    Tcl_Interp *interp;                /* Used for error reporting. */
    Tcl_Parse *parsePtr;        /* Points to a parse structure for the
                                 * command created by Tcl_ParseCommand. */
    CompileEnv *envPtr;                /* Holds resulting instructions. */
 {
    Tcl_Token *varTokenPtr;
    int isScalar, simpleVarName, localIndex, numWords;
    int code = TCL_OK;

    numWords = parsePtr->numWords;
    if (numWords != 2) {
        Tcl_ResetResult(interp);
        Tcl_AppendToObj(Tcl_GetObjResult(interp),
                "wrong # args: should be \"clear varName\"", -1);
        return TCL_ERROR;
    }

    /*
     * Get the variable name token and push the name.
     */
    varTokenPtr = parsePtr->tokenPtr
            + (parsePtr->tokenPtr->numComponents + 1);

    code = TclPushVarName(interp, varTokenPtr, envPtr, TCL_CREATE_VAR,
            &localIndex, &simpleVarName, &isScalar);
    if (code != TCL_OK) {
        goto done;
    }

    /*
     * Emit instructions to get the variable.
     */
    if (simpleVarName) {
        if (isScalar) {
            if (localIndex >= 0) {
                if (localIndex <= 255) {
                    TclEmitInstInt1(
                            INST_LOAD_SCALAR1,
                            localIndex, envPtr);
                } else {
                    TclEmitInstInt4(
                            INST_LOAD_SCALAR4,
                            localIndex, envPtr);
                }
            } else {
                TclEmitOpcode(INST_LOAD_SCALAR_STK, envPtr);
            }
        } else {
            if (localIndex >= 0) {
                if (localIndex <= 255) {
                    TclEmitInstInt1(
                            INST_LOAD_ARRAY1,
                            localIndex, envPtr);
                } else {
                    TclEmitInstInt4(
                            INST_LOAD_ARRAY4,
                            localIndex, envPtr);
                }
            } else {
                TclEmitOpcode(INST_LOAD_ARRAY_STK, envPtr);
            }
        }
    } else {
        TclEmitOpcode(INST_LOAD_STK, envPtr);
    }

    /*
     * Emit instructions to set the variable to empty string.
     */
    code = TclPushVarName(interp, varTokenPtr, envPtr, TCL_CREATE_VAR,
            &localIndex, &simpleVarName, &isScalar);
    if (code != TCL_OK) {
        goto done;
    }

    TclEmitPush(TclRegisterNewLiteral(envPtr, "", 0), envPtr);

    if (simpleVarName) {
        if (isScalar) {
            if (localIndex >= 0) {
                if (localIndex <= 255) {
                    TclEmitInstInt1(
                            INST_STORE_SCALAR1,
                            localIndex, envPtr);
                } else {
                    TclEmitInstInt4(
                            INST_STORE_SCALAR4,
                            localIndex, envPtr);
                }
            } else {
                TclEmitOpcode(INST_STORE_SCALAR_STK, envPtr);
            }
        } else {
            if (localIndex >= 0) {
                if (localIndex <= 255) {
                    TclEmitInstInt1(
                            INST_STORE_ARRAY1,
                            localIndex, envPtr);
                } else {
                    TclEmitInstInt4(
                            INST_STORE_ARRAY4,
                            localIndex, envPtr);
                }
            } else {
                TclEmitOpcode(INST_STORE_ARRAY_STK, envPtr);
            }
        }
    } else {
        TclEmitOpcode(INST_STORE_STK, envPtr);
    }

    /*
     * Pop the empty string, leaving the initial variable value.
     */
    TclEmitOpcode(INST_POP, envPtr);

 done:
    return code;
 }

And a little test to show the benefits of the bytecode, with four versions of lreverse: the classic one, one with the K operator, one with a non-bytecoded clear, and the last with the bytecoded clear:
 classic: 26629 microseconds per iteration
 K:       11130 microseconds per iteration
 clear:    9933 microseconds per iteration
 clearc:   6841 microseconds per iteration
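
The four procedures themselves are not reproduced above. As a rough sketch only, assuming a linsert-based reversal (the names K, lreverse_classic and lreverse_K are chosen here for illustration and may differ from what was actually timed), the first two variants could look like this:

```tcl
# K combinator: returns its first argument and ignores the second.
proc K {x y} {set x}

# Classic version: the list is still held by the variable res while
# linsert runs, so linsert has to work on a copy in every iteration.
proc lreverse_classic {l} {
    set res {}
    foreach e $l {
        set res [linsert $res 0 $e]
    }
    return $res
}

# K version: res is emptied before linsert runs, so the list value is
# unshared and linsert is free to modify it in place.
proc lreverse_K {l} {
    set res {}
    foreach e $l {
        set res [linsert [K $res [set res {}]] 0 $e]
    }
    return $res
}
```

Both procedures return the same list; only the amount of copying differs. The clear and clearc variants would presumably replace [K $res [set res {}]] with [clear res].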

MS: In modern Tcl, the fastest way to obtain the effect of [K $x [set x ""]] is the (ugly) idiom $x[set x {}]. This works because the bytecode engine optimises appending empty strings. It should be about as fast as clearc: instead of INST_POP it issues an INST_CONCAT1.
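
A minimal sketch of that idiom in use; the procedure name prepend is made up for this example:

```tcl
# Prepend an element to the list stored in the caller's variable.
# $l[set l {}] yields the old value of l while leaving l empty, so the
# list handed to linsert is no longer held by the variable.
proc prepend {listVar elem} {
    upvar 1 $listVar l
    set l [linsert $l[set l {}] 0 $elem]
}

set x {b c}
prepend x a
# x is now: a b c
```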

AJD: The functionality of [K $x [set x ""]] could be added as a new option on an existing Tcl command. To my mind unset best fits the bill. A new flag could be added without breaking backwards compatibility, say "-K" or "-value", or something better :-). AIUI, unset is not currently byte-compiled, so that would need to be added to get the speedup of "clearc" detailed above.

MS notes that, at least for this usage, unset is a bad partner. [K $x [unset x]] does indeed return x's (unshared) value, but it removes the variable entirely: besides dropping the value, it also frees memory structures and cleans up hash tables. That is costly if you need to recreate the variable immediately afterwards, as is the case AFAIR in lreverse.

AJD: My suggestion was that the new flag's behaviour would mimic "clearc", i.e. it wouldn't actually unset the variable in the usual sense - which is an excellent argument against adding it to unset :-) My reasoning is that the functionality is useful enough to be in the core but perhaps not enough to warrant a new command.

Cmcc: maybe I'm not getting it, but at the bytecode level, isn't K the rough equivalent of swapping the TOS and TOS-1 and dropping TOS? Perhaps what's needed is a new opcode, INST_SWAP, to give the required semantics?

Alternatively, why not just write a C command to manipulate the value stack? Would it be slower?

PWQ: How about we just have commands like linsert that take `varname` rather than `var`. Then all this obfuscation would not be necessary.

RS disagrees - when you have to give a variable name, you have to have a variable. If read access to a pure value is the only thing needed, specifying the value (which could be the result of another function, or a constant) is more lightweight and flexible - and with $varname you can always use a variable too (but one rule in functional programming is that variables are best not used).

AMG: I suggest renaming your [clear] command to be [take]. Qt has many functions called take [1] that remove an item from a collection and return its value.

An observation: [clear] works like [set] with the second argument hard-coded to be "", except that it returns the variable's old value instead of its new value. This reminds me of the C postincrement++ and postdecrement-- operators.
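
Those semantics can be written down as a script-level procedure. This is only an illustration of the behaviour, using the suggested name take; it is not the bytecoded implementation shown above and will not be as fast:

```tcl
proc take {varName} {
    upvar 1 $varName v
    set old $v      ;# remember the current value
    set v {}        ;# reset the variable to the empty string
    return $old     ;# hand back the old value, much as x++ yields the old x
}

set x hello
set y [take x]
# y now holds the old value of x, and x is the empty string
```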

DKF: It might be instructive to compare this with:
 set lst [lreverse $lst[set lst {}]]

That compiles to something efficient; there's a special case in INST_CONCAT to make it so.

AMG: MS mentions this idiom above.