Tcl_Obj proposals

Purpose of page: To keep proposals for how some aspect of Tcl_Obj should be changed separate from descriptions of how they currently work, to avoid unnecessarily confusing newcomers.

The idea is that this should be done for Tcl 9.0.


Joe Mistachkin -- 13/Oct/2003 -- The following is a pre-TIP preview of some enhancements to the Tcl_Obj system I [and others] would like.

The following changes:

<some excerpts are from Tcl'ers chat>

Tcl_Obj always has SOME object type.

The whole idea is that you rely on the clientData to be YOURS even when the object type is not.

I guess a convention (again only a convention) is to point to the clientData's associated object type in the first sizeof(void *) word.
That way you can safely check.
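A minimal sketch of that convention, written against stand-in types rather than the real Tcl headers. The names DemoType, HandleData, OtherData and GetHandleData are made up for illustration; the only thing taken from the text above is the idea that the first pointer-sized word of the clientData identifies its owning type.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for Tcl_ObjType; only its address matters here. */
typedef struct DemoType {
    const char *name;
} DemoType;

static const DemoType handleType = { "handle" };
static const DemoType otherType  = { "other" };

/* Convention: the first pointer-sized word of every clientData block
 * points at the type that owns the block. */
typedef struct HandleData {
    const DemoType *ownerType;   /* must stay the first member */
    int fd;                      /* extension-specific payload */
} HandleData;

typedef struct OtherData {
    const DemoType *ownerType;   /* same convention, different owner */
    double value;
} OtherData;

/* Returns the block as HandleData if it carries our tag, else NULL. */
static HandleData *GetHandleData(void *clientData)
{
    const DemoType *owner;

    assert(offsetof(HandleData, ownerType) == 0);
    if (clientData == NULL) {
        return NULL;
    }
    /* Read only the leading tag word before trusting the rest of the layout. */
    owner = *(const DemoType *const *)clientData;
    return (owner == &handleType) ? (HandleData *)clientData : NULL;
}
```

As the text says, this is only a convention: nothing stops a block that ignores it from being passed in, so the check is only as safe as the discipline of everyone who fills in clientData.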

Instead, we can make it more "safe" and generalized by using this:

Joe Mistachkin - 06/27/2006 - Modified Tcl_ObjData / Tcl_Obj to move file and line number.

      struct Tcl_ObjData { 
        int size;
        Tcl_ObjType *typePtr; 
        ClientData clientData;   
        int coreFlags;
        int extFlags;
        char *file;              /* The file where this object was allocated/initialized. */
        int line;                /* The source line where this object was allocated/initialized. */
      };

Now, we need a new callback for Tcl_Obj's so they can be notified when the object is being DESTROYED.

      typedef void (Tcl_FreeObjProc) _ANSI_ARGS_((struct Tcl_Obj *objPtr));

Next, we modify the Tcl_ObjType struct like so:

      typedef struct Tcl_ObjType {
          char *name;                 /* Name of the type, e.g. "int". */
          Tcl_FreeInternalRepProc *freeIntRepProc;
                                      /* Called to free any storage for the type's
                                       * internal rep. NULL if the internal rep
                                       * does not need freeing. */
          Tcl_DupInternalRepProc *dupIntRepProc;
                                      /* Called to create a new object as a copy
                                       * of an existing object. */
          Tcl_UpdateStringProc *updateStringProc;
                                      /* Called to update the string rep from the
                                       * type's internal representation. */
          Tcl_SetFromAnyProc *setFromAnyProc;
                                      /* Called to convert the object's internal
                                       * rep to this type. Frees the internal rep
                                       * of the old type. Returns TCL_ERROR on
                                       * failure. */
          Tcl_FreeObjProc *freeObjProc;
                                      /* Called when the object refcount reaches
                                       * zero just prior to the object being freed. */
      } Tcl_ObjType;

TODO: Come up with a way that we can know if a given Tcl_ObjType has the extra function pointer or not.

Finally, we modify the Tcl_Obj struct like so:

      typedef struct Tcl_Obj {
        int refCount;               /* When 0 the Tcl_FreeObjProc will be called
                                     * and the object will be freed.
                                     * We may also need to call the Tcl_ObjData's
                                     * Tcl_ObjType freeProc. */
        char *bytes;                /* This points to the first byte of the
                                     * object's string representation. The array
                                     * must be followed by a null byte (i.e., at
                                     * offset length) but may also contain
                                     * embedded null characters. The array's
                                     * storage is allocated by ckalloc. NULL
                                     * means the string rep is invalid and must
                                     * be regenerated from the internal rep.
                                     * Clients should use Tcl_GetStringFromObj
                                     * or Tcl_GetString to get a pointer to the
                                     * byte array as a readonly value. */
        int length;                 /* The number of bytes at *bytes, not
                                     * including the terminating null. */
        Tcl_ObjType *typePtr;       /* Denotes the object's type. Always
                                     * corresponds to the type of the object's
                                     * internal rep. NULL indicates the object
                                     * has no internal rep (has no type). */
        union {                     /* The internal representation: */
                long longValue;        /*   - a long integer value */
                double doubleValue;    /*   - a double-precision floating value */
                VOID *otherValuePtr;   /*   - another, type-specific value */
                Tcl_WideInt wideValue; /*   - a long long value */
                struct {               /*   - internal rep as two pointers */
                  VOID *ptr1;
                  VOID *ptr2;
                } twoPtrValue;
        } internalRep;
        Tcl_ObjData *dataPtr;       /* The extra information for use by the
                                     * "owner" of this object. */
      } Tcl_Obj;

scenario #1: internal rep changes (gets freed)

step #1. if obj->dataPtr is non-NULL, and obj->dataPtr->typePtr->freeIntRepProc isn't null, call the freeIntRepProc (in this step the called freeIntRepProc CANNOT modify the "outer" Tcl_Obj data UNLESS the objTypes match exactly).

step #2. check the result, if it's an error, stop processing and return the error.

step #3. next, call the obj->typePtr->freeIntRepProc, if it's non-NULL (it CAN touch any of the "inner" or "outer" data).

step #4. check the result, if it's an error, stop processing and return the error.

step #5. done, actually free the int rep.
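The order of calls in scenario #1 can be sketched as follows. This is an illustration against stand-in types (DemoObj, DemoType, DemoObjData and the helper names are all hypothetical), not the real Tcl structures. The callbacks return void, as Tcl_FreeInternalRepProc does, so the "check the result" steps have nothing to check and are left out. Note that in this literal reading the same proc runs twice when both type pointers are identical.

```c
#include <assert.h>
#include <stddef.h>

struct DemoObj;
typedef void (DemoFreeIntRepProc)(struct DemoObj *objPtr);

typedef struct DemoType {
    const char *name;
    DemoFreeIntRepProc *freeIntRepProc;   /* may be NULL */
} DemoType;

typedef struct DemoObjData {
    const DemoType *typePtr;    /* "owner" type of the extra data */
    void *clientData;
} DemoObjData;

typedef struct DemoObj {
    const DemoType *typePtr;    /* type of the current internal rep */
    void *internalRep;
    DemoObjData *dataPtr;       /* meant to survive changes of internal rep */
} DemoObj;

static int callLog[4];
static int callCount;

static void OwnerFree(DemoObj *objPtr) { (void)objPtr; callLog[callCount++] = 1; }
static void RepFree(DemoObj *objPtr)   { (void)objPtr; callLog[callCount++] = 2; }

static void DemoFreeInternalRep(DemoObj *objPtr)
{
    /* Step 1: notify the owner of the extra data first. */
    if (objPtr->dataPtr != NULL && objPtr->dataPtr->typePtr != NULL
            && objPtr->dataPtr->typePtr->freeIntRepProc != NULL) {
        objPtr->dataPtr->typePtr->freeIntRepProc(objPtr);
    }
    /* Step 3: then the type of the current internal rep. */
    if (objPtr->typePtr != NULL && objPtr->typePtr->freeIntRepProc != NULL) {
        objPtr->typePtr->freeIntRepProc(objPtr);
    }
    /* Step 5: the internal rep is gone; dataPtr is left untouched. */
    objPtr->typePtr = NULL;
    objPtr->internalRep = NULL;
}

/* Returns 1 if both procs ran in order and the extra data survived. */
static int DemoShimmer(void)
{
    static const DemoType ownerType = { "handle", OwnerFree };
    static const DemoType listType  = { "list", RepFree };
    DemoObjData data = { &ownerType, NULL };
    int payload = 0;
    DemoObj obj = { &listType, &payload, &data };

    callCount = 0;
    DemoFreeInternalRep(&obj);
    assert(obj.dataPtr == &data);
    return callCount == 2 && callLog[0] == 1 && callLog[1] == 2
            && obj.typePtr == NULL;
}
```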


scenario #2: refcount == 0, object is about to be DESTROYED

step #1. if obj->dataPtr is non-NULL and obj->dataPtr->typePtr->freeObjProc != NULL, then call it (in this step the called freeObjProc CANNOT modify the Tcl_Obj UNLESS the objTypes match exactly).

step #2. check the result, if it's an error, stop processing and return the error.

step #3. next, call the obj->typePtr->freeObjProc, if it's non-NULL (it CAN touch anything in the "inner" or "outer" data).

step #4. check the result, if it's an error, stop processing and return the error.

step #5. actually FREE the object if both calls succeeded.
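A rough sketch of scenario #2, again with hypothetical stand-in types (SketchObj, SketchType, SketchObjData). It assumes the freeObjProc callbacks return void, matching the Tcl_FreeObjProc typedef above, so the error-checking steps are omitted. It also assumes that the core releases the extra data block with free(), which the proposal does not spell out.

```c
#include <assert.h>
#include <stdlib.h>

struct SketchObj;
typedef void (SketchFreeObjProc)(struct SketchObj *objPtr);

typedef struct SketchType {
    const char *name;
    SketchFreeObjProc *freeObjProc;   /* proposed new callback, may be NULL */
} SketchType;

typedef struct SketchObjData {
    const SketchType *typePtr;
    void *clientData;
} SketchObjData;

typedef struct SketchObj {
    int refCount;
    const SketchType *typePtr;
    SketchObjData *dataPtr;
} SketchObj;

static int notifications;   /* bumped by the demo callback */

static void CountingFreeObjProc(SketchObj *objPtr)
{
    (void)objPtr;
    notifications++;
}

/* Drops one reference; returns 1 if the object was destroyed. */
static int SketchDecrRefCount(SketchObj *objPtr)
{
    assert(objPtr != NULL && objPtr->refCount > 0);
    if (--objPtr->refCount > 0) {
        return 0;
    }
    /* Step 1: the owner of the extra data hears about it first. */
    if (objPtr->dataPtr != NULL && objPtr->dataPtr->typePtr != NULL
            && objPtr->dataPtr->typePtr->freeObjProc != NULL) {
        objPtr->dataPtr->typePtr->freeObjProc(objPtr);
    }
    /* Step 3: then the type of the current internal rep, if any. */
    if (objPtr->typePtr != NULL && objPtr->typePtr->freeObjProc != NULL) {
        objPtr->typePtr->freeObjProc(objPtr);
    }
    /* Step 5: release the storage (assumption: plain free() for both). */
    free(objPtr->dataPtr);
    free(objPtr);
    return 1;
}

/* Returns 1 if the callback fired exactly once, on the last release. */
static int SketchDestroyDemo(void)
{
    static const SketchType handleType = { "handle", CountingFreeObjProc };
    SketchObj *objPtr = malloc(sizeof *objPtr);
    SketchObjData *dataPtr = malloc(sizeof *dataPtr);
    int first, seenAfterFirst, second;

    if (objPtr == NULL || dataPtr == NULL) {
        free(objPtr);
        free(dataPtr);
        return -1;
    }
    dataPtr->typePtr = &handleType;
    dataPtr->clientData = NULL;
    objPtr->refCount = 2;
    objPtr->typePtr = NULL;          /* internal rep already shimmered away */
    objPtr->dataPtr = dataPtr;

    notifications = 0;
    first = SketchDecrRefCount(objPtr);    /* 2 -> 1, nothing happens */
    seenAfterFirst = notifications;
    second = SketchDecrRefCount(objPtr);   /* 1 -> 0, object destroyed */
    return first == 0 && seenAfterFirst == 0 && second == 1
            && notifications == 1;
}
```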


NOTES:

The "outer" data is the data directly inside the Tcl_Obj.

The "inner" data is the data inside the contained Tcl_ObjData.

The called procs need to verify that the pointers they need are valid prior to trying to free/use them.

And to be fully robust... if null pointers are considered an error by the procs, the procs should return TCL_ERROR.


More discussion...

Tcl_ObjData probably ought to have a refcount field (I assume you'd share them between duplicated objects, yes?).

Ok, now for refcounting Tcl_ObjData.

We could do that... It would complicate things a bit.

We would need the same sharing semantics that Tcl_Obj's have

I think we may have reasons NOT to share Tcl_ObjData's

Because, presumably two "identical" Tcl_Obj's may need entirely different internal clientData "handles".

How does that differ from the case where you have a [puts $x,[string length $x]] in between?

The internal rep gets shimmered away.

No problem though

Who retains the knowledge of how to duplicate the objdata?

The ObjType DuplicateObjProc, in theory.

Which is now serving two purposes?

No.

It's serving one "purpose", to "duplicate" a Tcl_Obj, which includes any subordinate data.


Marco Maggi (Oct 14, 2003) I'm not getting it. Can you add the explanation of a real world example?

As it is now, Tcl_Obj is a data proxy for its internal and external representation:

  -------------
 | user module |-------
  -------------        |    -------    --------------
                        -->| data  |->| internal rep |
                        -->| proxy |   --------------
  -------------        |    -------
 | user module |-------        |       --------------
  -------------                 ----->| external rep |
                                       --------------

the data proxy allows copy-on-write for the representations.

What you are proposing is to add another data reference. Do you want the internal and external representations to (1) serve as a data proxy for the real data, or do you want them to be (2) a shimmerable representation of the real data?

  -------------
 | user module |-------
  -------------        |    -------    --------------
                        -->| data  |->| internal rep |
                        -->| proxy |   --------------
  -------------        |    -------
 | user module |-------      |   |     --------------
  -------------              |    --->| external rep |
                             v         --------------
                         -----------   
                        | real data |
                         -----------

Example of option (1): a module, implemented at the C level, has the responsibility of a big vector of elements; you register a pointer to the vector as real data in a Tcl_Obj, and let the representations represent an index in the vector; in this case the object type is something like "index-in-a-vector".

Example of option (2): a module, implemented at the C level, instantiates a tree structure; you register a pointer to the root node as real data in a Tcl_Obj, and let the representations offer a view over the tree's nodes; that way the representations may be shimmered at will: the tree is still there.

In your scenarios there's the possibility that the object destruction returns an error: is this correct?


Joe Mistachkin -- 14/Oct/2003 -- First, these changes would facilitate the ability for extension-specific data to survive the internal rep being shimmered away. Second, it would allow extensions to know when objects of their type get destroyed. As for the possibility of the object destruction returning an error, I was under the impression that was the case now. However, it appears to NOT be the case. I do not propose changing the Tcl_FreeInternalRepProc to be capable of returning an error.


DKF 140803: Let's see if I've got this all straight in my mind:

The proposal is to add a new representation slot to Tcl_Objs with different management semantics to the current internalRep?

The semantics are that the new slot is a pointer (or NULL) to some other structure that is self-describing (i.e. contains a pointer to some type structure) to some degree. There are two standard operations on the overall object that affect the slot: duplication and deletion.

duplication
When an object is duplicated and the slot is non-NULL, the slot's duplication operation (if not NULL) is called to perform whatever duplication operation is required. The target object's slot state is NULL prior to the operation, and it is entirely up to the duplication operation to manage the slot.
deletion
When the object is deleted overall, or if something decides to force the clearing of the slot, and the slot is not NULL, the slot's deletion operation is called. If the slot's deletion operation is NULL, the slot's pointer will be passed to ckfree()/Tcl_Free().
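The duplication and deletion rules just described might look roughly like this. All names (SlotObj, SlotHeader, SlotType, SlotDuplicate, SlotClear) are invented for the sketch, and free() stands in for ckfree()/Tcl_Free().

```c
#include <assert.h>
#include <stdlib.h>

struct SlotObj;

typedef struct SlotType {
    void (*dupProc)(struct SlotObj *src, struct SlotObj *dst);  /* may be NULL */
    void (*deleteProc)(void *slot);                              /* may be NULL */
} SlotType;

/* The slot is self-describing: it starts with a pointer to its type. */
typedef struct SlotHeader {
    const SlotType *typePtr;
} SlotHeader;

typedef struct SlotObj {
    SlotHeader *slot;   /* NULL when the slot is unused */
} SlotObj;

static void SlotDuplicate(SlotObj *src, SlotObj *dst)
{
    dst->slot = NULL;   /* the target slot is NULL before the dup operation */
    if (src->slot != NULL && src->slot->typePtr->dupProc != NULL) {
        /* Entirely up to the dup operation to fill in dst->slot. */
        src->slot->typePtr->dupProc(src, dst);
    }
}

static void SlotClear(SlotObj *obj)
{
    if (obj->slot == NULL) {
        return;
    }
    if (obj->slot->typePtr->deleteProc != NULL) {
        obj->slot->typePtr->deleteProc(obj->slot);
    } else {
        free(obj->slot);   /* default when no deletion operation is given */
    }
    obj->slot = NULL;
}

/* Demo dup operation: gives the copy its own private slot. */
static void CopyingDup(SlotObj *src, SlotObj *dst)
{
    SlotHeader *copy = malloc(sizeof *copy);

    if (copy != NULL) {
        *copy = *src->slot;
    }
    dst->slot = copy;
}

/* Returns 1 if the copy got a distinct slot and both were cleared. */
static int SlotDemo(void)
{
    static const SlotType copyType = { CopyingDup, NULL };
    SlotObj a, b;
    int distinct;

    a.slot = malloc(sizeof *a.slot);
    if (a.slot == NULL) {
        return -1;
    }
    a.slot->typePtr = &copyType;
    SlotDuplicate(&a, &b);
    distinct = (b.slot != NULL && b.slot != a.slot);
    SlotClear(&a);
    SlotClear(&b);
    assert(a.slot == NULL);
    return distinct && b.slot == NULL;
}
```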

(Warning: No use cases for the flags field!)

Slot creation is not defined here, but it is assumed to be up to the code that manufactured the object. No core object will use the slot (what about object-shimmer-to-list-and-lappend?) but the core is allowed to clear the slot. It is strongly recommended that other transparent collection types do not use the slot for their primary information store either, though perhaps they can hold metadata there?

Because both dup and free operations are controllable, user code can implement slot sharing between duplicated objects if it wishes.

Will it be possible to pack a Tcl_ObjData into the front of another structure so you can cut the number of calls to ckalloc()? If that's the case, user code can easily add a new field (like a refcount) if desired.
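If such packing were allowed, user code could share one block between duplicated objects along these lines. This is a sketch under that assumption only: ExtObjData stands in for Tcl_ObjData, and SharedData, its refCount field and the helper procs are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct ExtObj;

typedef struct ExtDataType {
    void (*dupProc)(struct ExtObj *src, struct ExtObj *dst);
    void (*freeProc)(struct ExtObj *obj);
} ExtDataType;

typedef struct ExtObjData {      /* stand-in for Tcl_ObjData */
    const ExtDataType *typePtr;
} ExtObjData;

typedef struct ExtObj {
    ExtObjData *dataPtr;
} ExtObj;

/* The generic header is packed into the front of the extension's own
 * structure, so a single allocation covers both. */
typedef struct SharedData {
    ExtObjData base;   /* first member: SharedData* and ExtObjData* interconvert */
    int refCount;      /* field added by user code */
    int handle;
} SharedData;

static int liveBlocks;  /* number of SharedData blocks currently allocated */

static void SharedDup(ExtObj *src, ExtObj *dst)
{
    SharedData *s = (SharedData *)src->dataPtr;

    s->refCount++;               /* share instead of copying */
    dst->dataPtr = &s->base;
}

static void SharedFree(ExtObj *obj)
{
    SharedData *s = (SharedData *)obj->dataPtr;

    obj->dataPtr = NULL;
    if (--s->refCount == 0) {
        free(s);
        liveBlocks--;
    }
}

static const ExtDataType sharedType = { SharedDup, SharedFree };

static ExtObjData *SharedNew(int handle)
{
    SharedData *s = malloc(sizeof *s);   /* one allocation for header + extras */

    if (s == NULL) {
        return NULL;
    }
    s->base.typePtr = &sharedType;
    s->refCount = 1;
    s->handle = handle;
    liveBlocks++;
    return &s->base;
}

/* Returns 1 if the block was shared and freed only on the last release. */
static int SharedDemo(void)
{
    ExtObj a = { SharedNew(42) }, b = { NULL };
    int shared, stillAlive;

    if (a.dataPtr == NULL) {
        return -1;
    }
    assert(offsetof(SharedData, base) == 0);
    a.dataPtr->typePtr->dupProc(&a, &b);
    shared = (a.dataPtr == b.dataPtr);
    a.dataPtr->typePtr->freeProc(&a);
    stillAlive = (liveBlocks == 1 && ((SharedData *)b.dataPtr)->handle == 42);
    b.dataPtr->typePtr->freeProc(&b);
    return shared && stillAlive && liveBlocks == 0;
}
```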

Should the slot type structure have a versioning/size field? No need to duplicate the mistake that was made with Tcl_ObjType...
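One way a size field could answer the "does this type structure have the extra function pointer?" question is sketched below. VersionedType and OldVersionedType are hypothetical layouts, not the real Tcl_ObjType; the sketch relies on both layouts sharing the same leading members, which is the usual assumption behind this kind of size check.

```c
#include <assert.h>
#include <stddef.h>

typedef void (VFreeObjProc)(void *objPtr);

/* Layout the (hypothetical) newer core knows about. */
typedef struct VersionedType {
    size_t size;                 /* sizeof the struct the extension compiled against */
    const char *name;
    void (*freeIntRepProc)(void *objPtr);
    VFreeObjProc *freeObjProc;   /* newer field, absent in older layouts */
} VersionedType;

/* Layout an older extension would have been compiled against. */
typedef struct OldVersionedType {
    size_t size;
    const char *name;
    void (*freeIntRepProc)(void *objPtr);
} OldVersionedType;

/* Returns the freeObjProc only if the structure is big enough to hold it. */
static VFreeObjProc *GetFreeObjProc(const VersionedType *typePtr)
{
    assert(typePtr != NULL);
    if (typePtr->size
            < offsetof(VersionedType, freeObjProc) + sizeof(typePtr->freeObjProc)) {
        return NULL;             /* old layout: the field is not there */
    }
    return typePtr->freeObjProc;
}

static void DemoFreeObj(void *objPtr) { (void)objPtr; }

/* Returns 1 if the old layout is rejected and the new one is accepted. */
static int VersionDemo(void)
{
    static const OldVersionedType oldType =
            { sizeof(OldVersionedType), "old", NULL };
    static const VersionedType newType =
            { sizeof(VersionedType), "new", NULL, DemoFreeObj };

    return GetFreeObjProc((const VersionedType *)&oldType) == NULL
            && GetFreeObjProc(&newType) == DemoFreeObj;
}
```

The check only works if every type structure, old or new, fills in the size field honestly, which is why it has to be there from the first version of the structure.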


Joe Mistachkin -- 15/Oct/2003 -- Ok.

  1. There are quite a few things that extensions [potentially] need to know about to be robust. First, the extension needs to know when objects of the custom type are "created". This functionality is already present in Tcl today since the extension has to "cook up" its objects from the generic ones. Second, the extension needs to know when somebody wants to duplicate objects of the custom type. Third, the extension needs to know when the internal rep for objects of the custom type are being freed. Finally, it needs to know when objects of the custom type are being totally destroyed. I believe that these modifications would address all these needs and still leave room for future expansion.
  2. I have modified the above Tcl_ObjData struct to have an additional flags field. The "coreFlags" field is for exclusive use by things inside of the core. Extensions may query it but NOT modify it. The "extFlags" field is for exclusive use by "extensions" (whatever is implementing the objType in question, which may be the core), the core may query it but NOT modify it.
  3. Yes, we probably need a size field. Added above.

DKF: Things are looking good; I'm just trying to understand everything. :^)

DKF 11/11/03: One possibility might be to allow for separate control (via configure's --enable-debug option which is already heavily overloaded) of whether objects have the debugging info in them. Some kinds of debugging (e.g. of general memory use) don't want the overhead of an extra 8 bytes per object, and other kinds of debugging (e.g. who's allocating that bad object?) need it...

NEM 22Mar2004: Need some clarification here. If Tcl_ObjType is still responsible for dealing with the Tcl_ObjData stuff, then wouldn't the following scenario be quite likely (assuming use of Tcl_ObjData becomes wide-spread):

  1. Tcl_Obj is given some Tcl_ObjType (typeA), which installs some Tcl_ObjData
  2. Tcl_Obj is shimmered to some other type (typeB) which installs some new Tcl_ObjData, causing the deletion of typeA's

This would seem to solve nothing in this case. Or have I misunderstood? Perhaps Tcl_ObjData can only be set once, and it is an error to try to overwrite it? This second option would seem to imply that converting a Tcl_Obj to some type may cause an error, even if the string rep is entirely compatible. This would further imply that we would have added a form of static typing to Tcl, wouldn't it? Perhaps someone can clear this up for me.

NEM 21July2006: Replying to myself a couple of years later. Firstly, I should have said strong typing rather than static typing above. Secondly, I seem to remember having this point clarified on the chat -- that the assumption was that the scenario I outline above would hopefully be much less likely to happen, IIRC.


FM 3 December 2009 : Have a distinction between internal representation and user representation.

A programmer doesn't need to know about the internal representation; that's a good thing that comes with Tcl. But when working on a program, we commonly have a mental representation of the data we're dealing with. For example, a list of 4 integers can be seen as a rectangle or an ellipse on a canvas, etc.

The view inherited from the C language is that data should be laid out internally exactly as we think of it. With Tcl, this is no longer necessary.

So, why not distinguish radically between this internal representation (to be used by the computer) and the program representation (to be used by the programmer, in the script)?

typedef struct Tcl_Obj {
        int refCount;
        char *bytes;
        int length;
        Tcl_ObjType *typePtr;
        Tcl_UserType *userTypePtr;
        union {
                long longValue;
                double doubleValue;
                void *otherValuePtr;
                Tcl_WideInt wideValue;
                struct {
                        void *ptr1;
                        void *ptr2;
                } twoPtrValue;
                struct {
                        void *ptr;
                        unsigned long value;
                } ptrAndLongRep;
        } internalRep;
} Tcl_Obj;

typedef struct Tcl_UserType {
    char *name;
    int length;
    Tcl_FreeUserRepProc *freeUserRepProc;
    Tcl_DupUserRepProc *dupUserRepProc;
} Tcl_UserType;

Now, let's imagine we have this and an ensemble command to set / get the type of an object.

set A [list 100 100 200 200]
type set A rectangle
set B $A
type set B oval
set C [lreplace $A end-1 end -text "Rectangle and oval"]
type set C text

set w [canvas .c]
type set ::w "canvas"

proc windowOptions {args} {
    type set args "windowOptions"
    return $args
}
proc packOptions {args} {
    type set args "packOptions"
    return $args
}

proc gridOptions {args} {
    type set args "gridOptions"
    return $args
}

proc itemOptions {args} {
    type set args "itemOptions"
    return $args
}

proc mw {w args} {
    foreach a $args {
        set [type get a] $a
    }
    if {[info exists packOptions]} {
        $w configure {*}$windowOptions
        pack $w {*}$packOptions
    } elseif {[info exists gridOptions]} {
        $w configure {*}$windowOptions
        grid $w {*}$gridOptions
    }
}
# the order of the arguments does not matter
mw $w [packOptions -expand 1 -fill both] [windowOptions -bg white]

proc draw {w args} {
     if {[type get w] ne "canvas"} {
          return -code error "canvas type expected but received [type get w]"
     }
     foreach a $args {
          if {[type get a] ne "itemOptions"} {
              set Item [$w create [type get a] {*}$a]
          } else {
              $w itemconfigure $Item {*}$a
          }
     }
}
# the arguments need not follow a regular pattern
draw $w $A $C [itemOptions -underline 0] $B [itemOptions -fill blue]

I've tried to implement it, but I'm not comfortable enough with Tcl internals yet. Any opinions about it? Comments welcome on feasibility, interest, etc. Thanks

DKF: What strategies do you propose to deal with the fact that Tcl_Obj and Tcl_ObjType are public structure definitions, and so encoded in the ABI as seen by a lot of extensions? (To be fair, Tcl_Obj instances are never allocated in user code, but offsets of fields are effectively fixed even so.) Inducing that sort of scale of breakage goes against the stubs guarantee, and so is unlikely to be at all popular in the Tcl 8.* series.

FM: Well, I'm really not comfortable enough with Tcl internals to propose a trick around what you've just taught me. It's better to avoid a "tclgate". So it seems that changes to the Tcl_Obj structure are impossible except at a major version change. So forget these proposed changes. Just remember the principle: there is already a duality between the string representation and the internal representation of a Tcl_Obj, for the computer. But since the user doesn't have to care about the internal representation, why not radically distinguish between what the data means to the user and how the computer represents it? So, why not introduce a duality between the user representation (what the programmer thinks about the data) and the computer representation (how the computer sees the data)? For a variable, this can only be emulated.

array set UserType {}
namespace eval type {
    proc set {var type} {
        upvar $var v
        if {[info exist v]} {
            ::set ::UserType($var) $type
        }
    }
    proc get {var} {
        upvar $var v
        if {[info exist v]} {
            ::set ::UserType($var)
        }
    }
    namespace export *
    namespace ensemble create
}
set A [list 100 100 200 200]
type set A rectangle
set B [list 100 100 200 200]
type set B oval

type get A;# rectangle
type get B;# oval
set C $A
type get C; # error can't read "::UserType(C)": no such element in array

It's like a tag, in fact. This looks like Joe Mistachkin's proposal.

This should have the same properties as the typedlist package, but without creating a new command each time.

But maybe such a proposal (a user type) could be added to the Var struct more easily? See struct Var proposals