Category Data Serialization Format

Examples: Strings (of course!), XML, S-expressions, Python pickle files, etc.


What is a data serialization format?

When computers store data in memory, it can be hard to tell what is what:

  • A complex piece of data may be spread out over many nonadjacent locations in memory.
  • One location in memory may be used for parts of several different values (see shared Tcl_Objs).
  • The information may be encoded in some non-obvious fashion (for greater efficiency in time or space; relational databases excel at this).

However, every once in a while it is necessary to transmit some piece of data somewhere else, and then the in-memory format is usually no good. Instead, one has to serialize the data, that is, turn it into a single linear sequence of characters or bytes, so that it can be transmitted (written to a file, written to a socket, etc.).
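As a rough illustration of the points above, here is a minimal Python sketch (Python only because pickle is one of the examples listed on this page). The record and its field names are made up for the example. It serializes a value whose parts are shared in memory, once as JSON text and once with pickle, and checks what survives the round trip.

```python
import json
import pickle

# A small record whose parts live in separate objects in memory.
# The same 'address' object is referenced twice (shared data).
address = {"city": "Uppsala", "zip": "75105"}
record = {"name": "Ada", "home": address, "work": address}

# Serialize to a flat string that could be written to a file or socket.
text = json.dumps(record, sort_keys=True)

# Parsing the string gives back an equal value...
restored = json.loads(text)
same_value = restored == record

# ...but JSON does not record that 'home' and 'work' were one object.
json_keeps_sharing = restored["home"] is restored["work"]

# pickle produces bytes and does remember the shared reference.
blob = pickle.dumps(record)
unpickled = pickle.loads(blob)
pickle_keeps_sharing = unpickled["home"] is unpickled["work"]

print(same_value, json_keeps_sharing, pickle_keeps_sharing)
```

Which of the two behaviours is preferable depends on whether the receiver cares about the value alone or also about how it was laid out in the sender's memory.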

Serialization is also important for discussions of computational complexity, since e.g. runtime should be measured relative to input size, which in general has to be measured as the length of some given serialization of the input. (For example, linear programming is known to be solvable in polynomial time only if one takes the digits needed to encode the problem into account; it need not be polynomial if all coefficients are allowed to be real numbers handled at unit cost.)
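A much simpler case than linear programming shows the same effect. The following Python sketch uses two hypothetical helper functions, written only for this illustration: a loop that takes n steps looks linear when measured against the value of n, but each extra digit in the serialized input multiplies the step count by ten, so it is exponential in the length of the input.

```python
def encoded_length(n: int) -> int:
    """Number of decimal digits needed to write the non-negative integer n."""
    return len(str(n))


def count_up_to(n: int) -> int:
    """Perform n trivial steps and return how many were performed."""
    steps = 0
    for _ in range(n):
        steps += 1
    return steps


# Input size (digits) grows by 2 each time; the work grows by a factor of 100.
for n in (10, 1000, 100000):
    print(encoded_length(n), count_up_to(n))
```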

Serialization is one of the strengths of the Everything Is A String axiom: it implies that anything that can be stored in a Tcl_Obj has a string representation, which Tcl generates for you and automatically parses again whenever you need the value back. This may, however, be regarded as a low-level serialization format, as it is just a dump of the program-internal format in which the data are stored.

