Bit vectors

Here is a routine for querying or setting single bits in vectors, where bits are addressed by non-negative integers. Implementation is as a "little-endian" list of integers, where bits 0..31 are in the first list element, 32..63 in the second, etc.

Usage: bit varName position ?bitval?

If bitval is given, sets the bit at numeric position position to 1 if bitval != 0, else to 0; in any case returns the bit value at specified position. If variable varName does not exist in caller's scope, it will be created; if it is not long enough, it will be extended to hold at least $position+1 bits, e.g. bit foo 32 will turn foo into a list of two integers, if it was only one before. All bits are initialized to 0.

proc bit {varName pos {bitval {}}} {
upvar 1 $varName var
if {![info exist var]} {set var 0}
set element [expr {$pos/32}]
while {$element >= [llength $var]} {lappend var 0}
set bitpos [expr {1 << $pos%32}]
set word [lindex $var $element]
if {$bitval != ""} {
if {$bitval} {
set word [expr {$word | $bitpos}]
} else {
set word [expr {$word & ~$bitpos}]
}
lset var $element $word
}
expr {($word & $bitpos) != 0}
}
#---------------------- now testing...
if {[file tail [info script]] == [file tail $argv0]} {
foreach {test expected} {
{bit foo 5 1} 1
{set foo} 32
{bit foo 32 1} {32 1}
} {
catch {eval $test} res
puts $test:$res/$expected
}
}

This may be used for Boolean properties of numerically indexed sets of items. Example: An existence map of ZIP codes between 00000 and 99999 can be kept in a list of 3125 integers (where each element requires about 15 bytes overall), while implementing the map as an array would take 100000 * 42 bytes in worst case, but still more than a bit vector if the population isn't extremely sparse — in that case, a list of 1-bit positions, retrieved with lsearch, might be more efficient in memory usage. Runtime of bit vector accesses is constant, except when a vector has to be extended to much larger length.

Bit vectors can also be used to indicate set membership (set operations would run faster if processing 32 bits on one go with bitwise operators (&, |, ~, ^)) — or pixels in a binary imary image, where each row could be implemented by a bitvector.

Here's a routine that returns the numeric indices of all set bits in a bit vector:

proc bits bitvec {
set res {}
set pos 0
foreach word $bitvec {
for {set i 0} {$i<32} {incr i} {
if {$word & 1<<$i} {lappend res $pos}
incr pos
}
}
set res
}
% bit foo 47 1
1
% bit foo 11 1
1
% set foo
2048 32768
% bits $foo
11 47