Bit vectors

Richard Suchenwirth 2002-05-05 - Here is a routine for querying or setting single bits in vectors, where bits are addressed by non-negative integers. Implementation is as a "little-endian" list of integers, where bits 0..31 are in the first list element, 32..63 in the second, etc.

USAGE

 bit varName position ?bitval?

If bitval is given, sets the bit at numeric position position to 1 if bitval != 0, else to 0; in any case returns the bit value at specified position. If variable varName does not exist in caller's scope, it will be created; if it is not long enough, it will be extended to hold at least $position+1 bits, e.g. bit foo 32 will turn foo into a list of two integers, if it was only one before. All bits are initialized to 0.

 proc bit {varName pos {bitval {}}} {
    upvar 1 $varName var
    if {![info exist var]} {set var 0}
    set element [expr {$pos/32}]
    while {$element >= [llength $var]} {lappend var 0}
    set bitpos [expr {1 << $pos%32}]
    set word [lindex $var $element]
    if {$bitval != ""} {
        if {$bitval} {
            set word [expr {$word | $bitpos}]
        } else {
            set word [expr {$word & ~$bitpos}]
        }
        lset var $element $word
    }
    expr {($word & $bitpos) != 0}
 }

#---------------------- now testing...

 if {[file tail [info script]] == [file tail $argv0]} {
    foreach {test      expected} {
        {bit foo 5 1}  1
        {set foo}      32
        {bit foo 32 1} {32 1}
    } {
        catch {eval $test} res
        puts $test:$res/$expected
    }
 }

Discussion: This may be used for Boolean properties of numerically indexed sets of items. Example: An existence map of ZIP codes between 00000 and 99999 can be kept in a list of 3125 integers (where each element requires about 15 bytes overall), while implementing the map as an array would take 100000 * 42 bytes in worst case, but still more than a bit vector if the population isn't extremely sparse - in that case, a list of 1-bit positions, retrieved with lsearch, might be more efficient in memory usage. Runtime of bit vector accesses is constant, except when a vector has to be extended to much larger length.

Bit vectors can also be used to indicate set membership (set operations would run faster if processing 32 bits on one go with bitwise operators (&, |, ~, ^)) - or pixels in a binary imary image, where each row could be implemented by a bitvector.

Here's a routine that returns the numeric indices of all set bits in a bit vector:

 proc bits bitvec {
    set res {}
    set pos 0
    foreach word $bitvec {
        for {set i 0} {$i<32} {incr i} {
            if {$word & 1<<$i} {lappend res $pos}
            incr pos
        }
    }
    set res
 }
 % bit foo 47 1
 1
 % bit foo 11 1
 1
 % set foo
 2048 32768
 % bits $foo
 11 47

Sieve of Erastothenes: The following procedure exercises the bit vector functions by letting bits represent integers, and unsetting all that are divisible. The numbers of the bits finally still set are supposed to be primes, and returned:

 proc sieve max {
    set maxroot [expr {sqrt($max)}]
    set primes [string repeat " 0xFFFFFFFF" [expr {($max+31)/32}]]
    bit primes 0 0; bit primes 1 0
    for {set i [expr $max+1]} {$i<=(($max+31)/32)*32} {incr i} {
        bit primes $i 0 ;# mask out excess bits
    }
    for {set i 2} {$i<=$maxroot} {incr i} {
       if {[bit primes $i]} {
           for {set j [expr $i<<1]} {$j<=$max} {incr j $i} {
               bit primes $j 0
           }
       }
    }
    bits $primes
 }
 % time {set res [sieve 10000]}
 797000 microseconds per iteration

See Eratosthenes Sieve (sic) for faster implementations of this algorithm.


Phone number sorting problem: sort up to 10 million 7-digit numbers (see Jon Bentley's[L1 ] for the story):

 # produce a data file with a million 7-digit numbers >= 2000000
 set fp [open t.txt w]
 for {set  i 0} {$i<1000000} {incr i} {
        puts $fp [expr {int(rand()*8000000)+2000000}]
 }
 close $fp
 #----------------------------------- Process all numbers in one go
 proc readall-foreach {fn} {
    set fp [open $fn r]
    foreach i [read $fp] {
            bit B $i 1
    }
    close $fp
    set fp [open $fn.sorted w]
    puts $fp [join [bits $B] \n]
    close $fp
 }

See also Unicoded integer sets for another way of saving sparse sets of nonnegative integers up to 65535.


RS 2003-09-26: Here's code to count the number of 1-bits in a bit vector, represented as an integer list:

 proc bitcount intlist {
    array set bits {
       0 0  1 1  2 1  3 2  4 1  5 2  6 2  7 3
       8 1  9 2  a 2  b 3  c 2  d 3  e 3  f 4
    }
    set sum 0
    foreach int $intlist {
       foreach nybble [split [format %x $int] ""] {
          incr sum $bits($nybble)
       }
    }
    set sum
 }

See also: