string bytelength

string bytelength string

See Also

string
string length

Description

Returns a decimal string giving the number of bytes used to represent string in memory. Because UTF-8 uses one to three bytes to represent Unicode characters, the byte length will not be the same as the character length in general. The cases where a script cares about the byte length are rare. Refer to the Tcl_NumUtfChars manual entry for more details on the UTF-8 representation.
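A short comparison, assuming Tcl 8.x where [string bytelength] is available. It shows the character count and the internal byte count diverging as soon as a string contains a non-ASCII character:

```tcl
# Compare the character count with the byte count of Tcl's internal representation.
foreach s [list "abc" "caf\u00e9" "\u20ac"] {
    puts [format "%-6s chars=%d bytes=%d" $s [string length $s] [string bytelength $s]]
}
```

The accented "é" (U+00E9) takes two bytes and the euro sign (U+20AC) takes three, so only the pure-ASCII string reports equal counts.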

In almost all cases you should use [string length] instead, including when determining the length of a Tcl ByteArray object. An example on tcom claims to need [string bytelength] when generating a binary blob, so that the blob's length can be obtained without [string length] forcing generation of a string representation. However, [string length] does not generate a string representation when the object is a pure bytearray; it simply returns the number of bytes.
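A sketch for checking this yourself, assuming Tcl 8.6 or later. It relies on ::tcl::unsupported::representation, an unsupported introspection command whose output format may change between releases, so read its output rather than depending on it:

```tcl
# binary format returns a pure bytearray with no string representation yet.
set blob [binary format c4 {1 2 3 4}]
puts [::tcl::unsupported::representation $blob]
# string length on a pure bytearray counts bytes without building a string rep.
puts [string length $blob]
# Inspect again: the value should still report no string representation.
puts [::tcl::unsupported::representation $blob]
```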

[string bytelength] should not be used with binary data. This command measures how long the UTF-8 representation of a string is in bytes. For binary data you don't want conversion to UTF-8, so you don't want [string bytelength] either. Use [string length] instead.
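If what you actually need is the byte count of a string's standard UTF-8 encoding (for example, to fill in a length field for a network protocol), a sketch is to encode explicitly and measure the resulting bytearray with [string length]. The helper name utf8ByteCount is hypothetical:

```tcl
# Hypothetical helper: number of bytes in the standard UTF-8 encoding of $s.
proc utf8ByteCount {s} {
    # encoding convertto yields a bytearray; [string length] on it counts bytes.
    return [string length [encoding convertto utf-8 $s]]
}

puts [utf8ByteCount "abc"]        ;# 3
puts [utf8ByteCount "caf\u00e9"]  ;# 5
puts [utf8ByteCount "\u20ac"]     ;# 3
```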

US: Proof for the sceptical:

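# Build the list 0..255 and pack it into a 256-byte binary string.
set cl {}
for {set n 0} {$n < 256} {incr n} {
  lappend cl $n
}
set str [binary format c* $cl]
# string length counts the bytes of the bytearray; string bytelength measures
# the generated string rep, which is longer because bytes 0x00 and 0x80-0xFF
# each take two bytes in Tcl's internal encoding.
puts "len : [string length $str]"
puts "blen: [string bytelength $str]"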

DKF: It's not even real UTF-8. It's the length of Tcl's internal encoding, which is almost UTF-8 (i.e., it is consistently denormalized in certain ways; for example, NUL is stored as the two bytes C0 80 rather than a single zero byte). The only possible use of string bytelength is answering the question “How much memory is allocated to hold this value's bytes field?”
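The NUL case makes the difference from real UTF-8 easy to see. A minimal sketch, assuming Tcl 8.x, where NUL is stored internally as two bytes but encoded as a single zero byte by the utf-8 encoding:

```tcl
set s "a\0b"
# Internal encoding: 'a' (1 byte) + NUL as C0 80 (2 bytes) + 'b' (1 byte).
puts "bytelength : [string bytelength $s]"   ;# 4
# Standard UTF-8: NUL is the single byte 00.
puts "utf-8 bytes: [string length [encoding convertto utf-8 $s]]"   ;# 3
```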

Basic Example

string bytelength abc
Output: 3

Questions

AMG: "UTF-8 uses one to three bytes to represent Unicode characters." This is true only for the BMP. Characters above U+FFFF need four bytes in UTF-8 (the original, pre-RFC 3629 definition allowed sequences of up to six bytes). Does Tcl support them yet?

DKF: No. This is one of the things we plan to fix in Tcl 9.0.