Purpose of this page: To collate our knowledge about the facilities provided by Tcl to work with binary data, for example to talk to other applications using a binary protocol for exchanging information and commands.
The main facility is the binary command. Its subcommands dissect binary data into standard Tcl values (binary scan) and assemble binary data from such values (binary format). Standard Tcl values here means strings, integers, lists, et cetera.
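As a minimal sketch of the two subcommands working together, the following round trip packs three values into a byte string and unpacks them again. The message layout (a one-byte type, a 16-bit big-endian length and a four-character tag) is invented purely for illustration and does not belong to any real protocol.

```tcl
# Build a 7-byte message: 1-byte type, 16-bit big-endian length, 4-char tag.
# The field layout is hypothetical, chosen only for this example.
set packet [binary format cSa4 1 258 ping]

# Take the message apart again with the same format string.
binary scan $packet cSa4 type len tag

# Hex view of the raw bytes, handy when debugging a protocol.
binary scan $packet H* hex

puts "type=$type len=$len tag=$tag bytes=[string length $packet] hex=$hex"
```

Using the same format string for both directions keeps the packing and unpacking code in step with each other.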
To exchange the binary information with other applications all of the facilities of the I/O system are at our fingertips and ready to be used. But note:
On news:comp.lang.tcl, Mac Cody and Jeff David write:
Mac Cody wrote:
> Here is a simple example that
> first writes binary data to a file and then reads back the
> binary data:
>
> set outBinData [binary format s2Sa6B8 {100 -2} 100 foobar 01000001]
> puts "Format done: $outBinData"
> set fp [open binfile w]
Important safety tip. When dealing with binary files you should always do:
fconfigure $fp -translation binary
I got bit hard on this one once when my \x0a and \x0d bytes got translated.
> puts -nonewline $fp $outBinData
> close $fp
> set fp [open binfile r]

fconfigure $fp -translation binary

> set inBinData [read $fp]
> close $fp
> binary scan $inBinData s2Sa6B8 val1 val2 val3 val4
> puts "Scan done: $val1 $val2 $val3 $val4"

Jeff David
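To make the safety tip concrete, here is a small sketch of what goes wrong without it. It writes the two bytes \x0d \x0a to a scratch file and reads them back twice, once with the channel's default settings and once in binary mode. It assumes Tcl 8.6 or later for file tempfile; the variable names are only illustrative.

```tcl
# Assumes Tcl 8.6 or later for [file tempfile].
set fp [file tempfile path]
fconfigure $fp -translation binary
puts -nonewline $fp "\x0d\x0a"
close $fp

# Read back with the default (text) settings: CR LF collapses to one newline.
set fp [open $path r]
set asText [read $fp]
close $fp

# Read back in binary mode: both bytes survive.
set fp [open $path r]
fconfigure $fp -translation binary
set asBinary [read $fp]
close $fp
file delete $path

puts "text mode: [string length $asText] char, binary mode: [string length $asBinary] bytes"
```

The same care is needed on the writing side, where the default translation depends on the platform.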
A post to comp.lang.tcl asks how best to embed binary data into a Tcl script. kennykb has this summary of the answer:
In particular, you should avoid typing binary data directly into strings. While Tcl is able to handle binary data, there are places where you can run into problems. For example, if you happen to have a Tcl script containing the literal character for a control-Z, you will find, as of Tcl 8.4, that you get a syntax error from Tcl. This is because, beginning with 8.4, source treats \u001a as an end-of-file character in scripts, so the script is cut off at that point. See source (in particular the reference page) for more details.
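One way to follow this advice, sketched here, is to spell the bytes out as hex digits and let binary format H* build the value at run time. The example uses the eight-byte PNG file signature because it happens to contain CR, LF and control-Z bytes; the variable names are only illustrative.

```tcl
# The eight-byte PNG file signature, written as hex digits so that the
# script itself contains only printable ASCII (note the 0d, 0a and 1a bytes).
set signature [binary format H* 89504e470d0a1a0a]

# Convert back to hex to check that nothing was lost.
binary scan $signature H* hex
puts "[string length $signature] bytes: $hex"
```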
Please note that the issue with control-Z is just a special case of a more general bit of advice for writing portable Tcl scripts. Whenever Tcl_EvalFile() (or the source command) reads in and evaluates the contents of a file, the reading in is done according to the system encoding. System encodings may be different on different systems. If your file of Tcl code is going to move from system to system, you should be sure that all characters in it are valid in all system encodings. This essentially means you should limit yourself to 7-bit ASCII. You can represent characters outside 7-bit ASCII using the \u quoting supported by the Tcl parser.
Hmmmm.... after a bit more reflection, it dawns on me that control-Z is part of 7-bit ASCII, so it's not a special case after all. Never mind.
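The \u quoting mentioned above can be sketched as follows. The script file contains nothing but 7-bit ASCII, yet the resulting strings hold non-ASCII characters and a control-Z. The sample word and variable names are only illustrative.

```tcl
# U+00FC (u with diaeresis) and U+00DF (sharp s) written as \u escapes,
# so the file reads the same under any system encoding.
set greeting "Gr\u00fc\u00dfe"

# Control-Z can be written the same way instead of as a raw byte.
set ctrlZ "\u001a"

puts "[string length $greeting] characters, control-Z is code [scan $ctrlZ %c]"
```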
Another tip (it's also mentioned on the string page, but I think it's worth repeating):
string bytelength should not be used with binary data. That command measures how long the UTF-8 representation of a string is in bytes. For binary data you don't want conversion to UTF-8, so you don't want string bytelength either. Use string length instead. It's confusing but probably logical.
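The difference can be illustrated without calling string bytelength at all. In this sketch, encoding convertto utf-8 stands in for the UTF-8 conversion that string bytelength measures. That substitution is an assumption made for illustration, and the two are not guaranteed to agree for every input.

```tcl
# Three bytes of binary data: 0xff 0xfe 0x41.
set data [binary format H* fffe41]

# Number of bytes in the binary data.
set nBytes [string length $data]

# Size of the same value once converted to UTF-8; each byte above 0x7f
# becomes two bytes.
set nUtf8 [string length [encoding convertto utf-8 $data]]

puts "string length: $nBytes, UTF-8 size: $nUtf8"
```

Only the first number describes the binary data itself, which is why string length is the one to use.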
See Binary representation of numbers and Dump a file in hex and ASCII for examples of usage.