zlib - inflate in tcl

Philip Quaife 30 Oct 05, I have toyed with a tcl only installer some time ago, here is the code for inflating compressed files from a zip archive.

I lifted the code from AMSN, not sure of the copyright. Check with developer before distributing.

I changed the code to take a string and return decompressed string. I also have bracketed the expr and it runs in acceptable time. Enough speed that you could use it to uncompress the zlib dll from a SXE file and then use the dll to uncompress the rest of the archive.

NEM AMSN is distributed under the GPL. See also zlib.

PT 30-Oct-2005: I contacted the author (Youness Alaoui) of the AMSN zlib code some time back with a view to importing this into tcllib. I havn't got around to doing that but this was his reponse: "anyways, about the code I made for the tclzlib, I really don't mind you use it in tcllib... tcllib is great and I'd be happy to help... in fact, I was going to send it to tcllib devs (since it's pure-tcl) if only I had time to finish it... about the license, I don't really understand these things, so I don't mind, as long as it can be useful to as much people as possible... so, of course, you have the right to use my code, make it into the license you want, etc... the problem is, it works perfectly BUT it's really too slow.. (and it's not modular, everything inside one proc only... I did everything in a single day, so... I just wanted it to work).."

PWQ 31 Oct 05, While the use of bit strings is very Tclish, it is not very efficient. Currently you get 1-2kb/sec decode rate.

If you applied specialisation you could probably get up to 25kb/s. However since 8.5 is going to get the whole zlib added, as well as other limitations in distributing binary extensions, make this more of an academic study rather than a useful module.

Lars H, 31 Oct 2005: I think it could be useful, even if only to improve backwards compatibility. You could have a tcllib package that goes:

  1. If [info tclversion]>=8.5, then use core inflate.
  2. Otherwise check if Trf (or whatever binary extension) is available, and if so use that.
  3. If all else fails, fall back on some variant of the code below.

As I understand it, there already is quite a number of packages in tcllib which work precisely like that.

PWQ 1 Nov 05, I have removed the remaining inefficient constructs and the current decode speed is now 7-8kb/s. Unfortunately there are errors in the decoding process as tested by unziping gzip created files. May be fixed now. Unset all arrays at start of each block.

In responce to LarsH:, It would be of use as part of a TCL only installer (once it is debugged) but what else is it good for?

Lars H: My primary use case would be a PDF file reader -- I don't mean something like Acrobat, but rather something for programmatically reading data stored in a PDF file. Streams (e.g. page contents) in PDF files are routinely flate-compressed to reduce overall size, even though each individual stream usually is no more than a couple of Kb in size.

SEH 20051101 -- Nice work. I've been suggesting incorporation of the AMSN zlib code into tcllib for a while. It's nice to see that all it took was asking the author to get permission. I think this should go into tcllib, if only for demo and performance benchmarking purposes. Plus, even if there is a new zlib library in the core, Version 1.0 problems may result in bugs or incompatibilities on some obscure platform out there. Pure Tcl good.


 #
 # Lifted from AMSN (http://amsn.sf.net)
 #
 #
 #  pre optimisation code cleanup
 #

    proc zip/inflate { stream } {
        variable zip/tell
        #status_log "reading from file $stream\n"
        set time [clock clicks]

        zip/reset

        set def {}
        binary scan [zip/read $stream 20] b* zlib

        set bfinal 0
        set idx 0
        set len 160

        while { $bfinal != "1" } {

            set bfinal [string range $zlib $idx $idx]
            incr idx
            binary scan [binary format b* [string range $zlib $idx [expr {$idx + 1}]]] c btype
            set idx [expr $idx + 2]

            #status_log "Reading compressed block, with compression type $btype and final bloc = $bfinal\n"
            if { $btype == 0 } {

                if { [expr $idx % 8] != 0 } {
                    set idx [expr {$idx + 8 - ( $idx % 8)}]
                }

                binary scan [string range $zlib $idx [expr {$idx + 31}]] SS len nlen
                set idx [expr {$idx + 32}]
                if { [string map { "0" "1" } $nlen] != $len } {
                    error "Len and NLen does not match : [string range $zlib [expr {$idx -32}] [expr {$idx - 17}]] --- [string range $zlib [expr {$idx -16}] [expr {$idx - 1}]]\nValues are $len and $nlen\n"
                    return -1
                } else {
                    binary scan [string range $zlib $idx [expr {$idx + 1}]] S len
                }

                #status_log "Reading uncompressed block with length $len from index $idx to [expr $idx + 3 + $len]\n"
                #set def "${def}[string range $zlib [expr {$idx + 4}] [expr {$idx + 3 + $len}]]"
                append def [string range $zlib [expr {$idx + 4}] [expr {$idx + 3 + $len}]]
                set idx [expr {$idx + 3 + $len}]

            } elseif { $btype == 3 } {
                error "Got reserved word 11 for compression type : error\n"
                return -1
            } else {
                if { $btype == 2 } {
                    puts "@${zip/tell}: Got Huffman's dynamic compression block, processing\n"

                    binary scan [binary format b* [string range $zlib $idx [expr {$idx + 4}]]] c hlit
                    set idx [expr {$idx + 5}]
                    set hlit [expr {$hlit + 257}]
                    binary scan [binary format b* [string range $zlib $idx [expr {$idx + 4}]]] c hdist
                    set idx [expr {$idx + 5}]
                    incr hdist
                    binary scan [binary format b* [string range $zlib $idx [expr {$idx + 3}]]] c hclen
                    set idx [expr {$idx + 4}]
                    set hclen [expr {$hclen + 4}]
                    catch {unset clen}
                    set codelengths [list 16 17 18 0 8 7 9 6 10 5 11 4 12 3 13 2 14 1 15]
                    for { set i 0 } { $i < [expr {$hclen * 3}] } { set i [expr {$i + 3}]} {
                        if { $idx > $len - 3 } {
                            binary scan [zip/read $stream 30] b* tmp
                            set zlib "[string range $zlib $idx end]$tmp"
                            set idx 0
                            set len [string length $zlib]
                        }
                        binary scan [binary format b* [string range $zlib $idx [expr {$idx + 2}]]] c clen([lindex $codelengths [expr {$i / 3}]])
                        set idx [expr {$idx + 3}]
                    }
                        catch {unset huffcodes}
                    array set huffcodes [zip/createcodes [array get clen] 7 18]

                        puts " dyn code lens $hlit , # dist $hdist, # lens $hclen"
                    set inc 0
                    set index 0
                    while { $index < $hlit } {
                        if { $idx >  $len - 7 } {
                            binary scan [zip/read $stream 30] b* tmp
                            set zlib "[string range $zlib $idx end]$tmp"
                            set idx 0
                            set len [string length $zlib]
                        }
                        set bin [string range $zlib $idx [expr {$idx + $inc}]]
                        if { [info exists huffcodes($bin)] } {
                            #                        #status_log "Found a length, for litteral value $index = $huffcodes($bin)\n"
                            set idx [expr {$idx + $inc + 1}]
                            if { $huffcodes($bin) < 16 } {
                                set litclen($index) $huffcodes($bin)
                                incr index
                            } elseif { $huffcodes($bin) == 16 } {
                                set tocopy $litclen([expr {$index - 1}])
                                binary scan [binary format b* [string range $zlib $idx [expr {$idx + 1}]]] c length
                                set length [expr {$length + 3}]
                                incr idx 2

                                for { set t 0 } { $t < $length } { incr t } {
                                    #                                #status_log "Literal length $index, copied value : $tocopy\n"
                                    set litclen($index) $tocopy
                                    incr index
                                }

                            } elseif { $huffcodes($bin) == 17 } {
                                binary scan [binary format b* [string range $zlib $idx [expr {$idx + 2}]]] c length
                                set length [expr {$length + 3}]
                                set idx [expr {$idx + 3}]
                                #                            #status_log "Copying value 0 into the next $length codes starting from $index\n"
                                for { set t 0 } { $t < $length } { incr t } {
                                    #                                    #status_log "Literal length $index, copied value : 0\n"
                                    set litclen($index) 0
                                    incr index
                                }
                            } else {
                                binary scan [binary format b* [string range $zlib $idx [expr {$idx + 6}]]] c length
                                set length [expr {$length + 11}]
                                set idx [expr {$idx + 7}]
                                #                            #status_log "Copying value 0 into the next $length codes starting from $index\n"
                                for { set t 0 } { $t < $length } { incr t } {
                                    #                                #status_log "Literal length $index, copied value : 0\n"
                                    set litclen($index) 0
                                    incr index
                                }

                            }
                            set inc 0
                        } else {
                            incr inc
                            if { $inc > 7 } {
                                error "Erreur.. l'increment a depasse 7.. \ndump :\nindex = $idx - increment = $inc, index = $index, $hlit\nmemoire = [string range $zlib [expr $idx ] [expr $idx + $inc ]]\n"

                                return -1
                            }

                        }

                    }

                        catch {unset litval}
                    array set litval [zip/createcodes [array get litclen] 18 $hlit]

                    set inc 0
                    set index 0
                    while { $index < $hdist } {
                        if { $idx > $len - 7 } {
                            binary scan [zip/read $stream 30] b* tmp
                            set zlib "[string range $zlib $idx end]$tmp"
                            set idx 0
                            set len [string length $zlib]
                        }
                        set bin [string range $zlib $idx [expr {$idx + $inc}]]
                        if { [info exists huffcodes($bin)] } {
                            #                        #status_log "Found a length, for distance value $index = $huffcodes($bin)\n"
                            set idx [expr {$idx + $inc + 1}]
                            if { $huffcodes($bin) < 16 } {
                                set distclen($index) $huffcodes($bin)
                                incr index
                            } elseif { $huffcodes($bin) == 16 } {
                                set tocopy $distclen([expr {$index - 1}])
                                binary scan [binary format b* [string range $zlib $idx [expr {$idx + 1}]]] c length
                                set length [expr {$length + 3}]
                                incr idx 2

                                #                            #status_log "Copying value $tocopy into the next $length codes starting from $index\n"

                                for { set t 0 } { $t < $length } { incr t } {
                                    #                                #status_log "distance length $index, copied value : $tocopy\n"
                                    set distclen($index) $tocopy
                                    incr index
                                }

                            } elseif { $huffcodes($bin) == 17 } {
                                binary scan [binary format b* [string range $zlib $idx [expr {$idx + 2}]]] c length
                                set length [expr {$length + 3}]
                                set idx [expr {$idx + 3}]
                                #                            #status_log "Copying value 0 into the next $length codes starting from $index\n"
                                for { set t 0 } { $t < $length } { incr t } {
                                    #                                 #status_log "distance length $index, copied value : 0\n"
                                    set distclen($index) 0
                                    incr index
                                }
                            } else {
                                binary scan [binary format b* [string range $zlib $idx [expr {$idx + 6}]]] c length
                                set length [expr {$length + 11}]
                                set idx [expr {$idx + 7}]
                                #                            #status_log "Copying value 0 into the next $length codes starting from $index\n"
                                for { set t 0 } { $t < $length } { incr t } {
                                    #                                #status_log "distance length $index, copied value : 0\n"
                                    set distclen($index) 0
                                    incr index
                                }

                            }
                            set inc 0
                        } else {
                            incr inc
                            if { $inc > 7 } {
                                error "Erreur.. l'increment a depasse 7.. \ndump :\nindex = $idx - increment = $inc, index = $index, $hlit\nmemoire = [string range $zlib [expr $idx ] [expr $idx + $inc ]]\n"

                                return -1
                            }

                        }
                    }

                        catch {unset distval}
                    array set distval [zip/createcodes [array get distclen] 18 $hdist]

                } else {
                    puts "Got Huffman's compressed block, processing\n"
                        catch {unset litval}
                    array set litval [zip/createcodes [zip/fill_length lit] 18 287]
                        catch {unset distval}
                    array set distval [zip/createcodes [zip/fill_length dist] 18 32]

                }

 #                status_log "Time for processing header: [expr [clock clicks] - $time]\n"

 ############################################################################################################

                set inc 0
                set index [string length $def]
                #set time [clock clicks]
                for { } { 1 } { } {
                    if { $idx > $len - 15 } {
                        binary scan [zip/read $stream 30] b* tmp
                        set zlib "[string range $zlib $idx end]$tmp"
                        set idx 0
                        set len [string length $zlib]
                    }
                    set bin [string range $zlib $idx [expr {$idx + $inc}]]
                    #status_log "time for string range : [time "string range $zlib $idx [expr {$idx + $inc}]"]\n"
                    #status_log "Time for infoexits : [time "info exists litval($bin)"] --- bin = $bin\n"
                    if { [info exists litval($bin)] } {
                        set out $litval($bin)
                                            #status_log "Found a length in index $index, for output = $out\n"
                        set idx [expr {$idx + $inc + 1}]
                        if { $out < 256 } {
                            #set def "${def}[binary format c $out]"
                            append def [binary format c $out]
                            incr index
                           #status_log "Time for literal value : [expr [clock clicks] - $time]\n"
                        } elseif { $out == 256 } {
                            #status_log "FOUND END OF BLOCK\n" red
                            break
                        } else {
                            #status_log "Need to move backward distance $out -- processing\n"
                            #set time [clock clicks]

                            if { $idx > $len - 5 } {
                                binary scan [zip/read $stream 30] b* tmp
                                set zlib "[string range $zlib $idx end]$tmp"
                                set idx 0
                                set len [string length $zlib]
                            }

                            if { $out < 265 } {
                                set plus 0
                                set length [expr {$out - 254}]
                            } elseif { $out == 285 } {
                                set plus 0
                                set length 258
                            } elseif { $out > 264 && $out < 269 } {
                                binary scan [binary format b* [string range $zlib $idx  $idx]] c plus
                                incr idx
                                set length [expr {(($out - 265) * 2) + $plus + 11}]
                            } elseif { $out > 268 && $out < 273} {
                                binary scan [binary format b* [string range $zlib $idx  [expr {$idx + 1}]]] c plus
                                incr idx 2
                                set length [expr {(($out - 269) * 4) + $plus + 19}]
                            } elseif { $out > 272 && $out < 277 } {
                                binary scan [binary format b* [string range $zlib $idx  [expr {$idx + 2}]]] c plus
                                set idx [expr {$idx + 3}]
                                set length [expr {(($out - 273) * 8) + $plus + 35}]
                            } elseif { $out > 276 && $out < 281 } {
                                binary scan [binary format b* [string range $zlib $idx  [expr {$idx + 3}]]] c plus
                                set idx [expr {$idx + 4}]
                                set length [expr {(($out - 277) * 16) + $plus + 67}]
                            } else {
                                binary scan [binary format b* [string range $zlib $idx  [expr {$idx + 4}]]] c plus
                                set idx [expr {$idx + 5}]
                                set length [expr {(($out - 281) * 32) + $plus + 131}]
                            }

                            #status_log "time for ifelses : [expr [clock clicks] - $time]"
                            #status_log "Found length $length with added $plus\n"

                            set out2 -1
                            set inc2 0
                            while { $out2 == -1 } {
                                if { $idx > $len - 15 } {
                                    binary scan [zip/read $stream 30] b* tmp
                                    set zlib "[string range $zlib $idx end]$tmp"
                                    set idx 0
                                    set len [string length $zlib]
                                }
                                set bin [string range $zlib $idx [expr {$idx + $inc2}]]
                                if { [info exists distval($bin)] } {
                                    set out2 $distval($bin)
                                    #status_log "Found a distance code  $out2\n"
                                    set idx [expr {$idx + $inc2 + 1}]
                                } else {
                                    incr inc2
                                    if { $inc2 > 15 } {
                                        error "Erreur.. l'increment a depasse 15.. \ndump :\nindex = $idx - increment = $inc2, index = $index, $hlit\nmemoire = [string range $zlib [expr $idx ] [expr $idx + $inc2 ]]\n"
                                        return -1
                                    }
                                }
                            }

                            if { $idx > $len - 13 } {
                                binary scan [zip/read $stream 30] b* tmp
                                set zlib "[string range $zlib $idx end]$tmp"
                                set idx 0
                                set len [string length $zlib]
                            }

                            if { $out2 < 4 } {
                                set plus 0
                                set distance [expr {$out2 + 1}]
                            } elseif { $out2 == 4 || $out2 == 5} {
                                binary scan [binary format b* [string range $zlib $idx  $idx]] c plus
                                set plus [expr $plus % 256]
                                incr idx
                                set distance [expr {(($out2 - 4) * 2) + $plus + 5}]
                            } elseif { $out2 == 6 || $out2 == 7} {
                                binary scan [binary format b* [string range $zlib $idx  [expr {$idx + 1}]]] c plus
                                set plus [expr {$plus % 256}]
                                incr idx 2
                                set distance [expr {(($out2 - 6) * 4) + $plus + 9}]
                            } elseif { $out2 == 8 || $out2 == 9 } {
                                binary scan [binary format b* [string range $zlib $idx  [expr {$idx + 2}]]] c plus
                                set plus [expr {$plus % 256}]
                                set idx [expr {$idx + 3}]
                                set distance [expr {(($out2 - 8) * 8) + $plus + 17}]
                            } elseif { $out2 == 10 || $out2 == 11} {
                                binary scan [binary format b* [string range $zlib $idx  [expr {$idx + 3}]]] c plus
                                set plus [expr {$plus % 256}]
                                set idx [expr {$idx + 4}]
                                set distance [expr {(($out2 - 10) * 16) + $plus + 33}]
                            } elseif {$out2 == 12 || $out2 == 13 } {
                                binary scan [binary format b* [string range $zlib $idx  [expr {$idx + 4}]]] c plus
                                set plus [expr {$plus % 256}]
                                set idx [expr {$idx + 5}]
                                set distance [expr {(($out2 - 12) * 32) + $plus + 65}]
                            } elseif {$out2 == 14 || $out2 == 15 } {
                                binary scan [binary format b* [string range $zlib $idx  [expr {$idx + 5}]]] c plus
                                set plus [expr {$plus % 256}]
                                set idx [expr {$idx + 6}]
                                set distance [expr {(($out2 - 14) * 64) + $plus + 129}]
                            } elseif {$out2 == 16 || $out2 == 17 } {
                                binary scan [binary format b* [string range $zlib $idx  [expr {$idx + 6}]]] c plus
                                set plus [expr {$plus % 256}]
                                set idx [expr {$idx + 7}]
                                set distance [expr {(($out2 - 16) * 128) + $plus + 257}]
                            } elseif {$out2 == 18 || $out2 == 19 } {
                                binary scan [binary format b* [string range $zlib $idx  [expr {$idx + 7}]]] c plus
                                set plus [expr {$plus % 256}]
                                set idx [expr {$idx + 8}]
                                set distance [expr {(($out2 - 18) * 256) + $plus + 513}]
                            } elseif {$out2 == 20 || $out2 == 21 } {
                                binary scan [binary format b* [string range $zlib $idx  [expr {$idx + 8}]]] s plus
                                set plus [expr {$plus % 65536}]
                                set idx [expr {$idx + 9}]
                                set distance [expr {(($out2 - 20) * 512) + $plus + 1025}]
                            } elseif {$out2 == 22 || $out2 == 23 } {
                                binary scan [binary format b* [string range $zlib $idx  [expr {$idx + 9}]]] s plus
                                set plus [expr {$plus % 65536}]
                                set idx [expr {$idx + 10}]
                                set distance [expr {(($out2 - 22) * 1024) + $plus + 2049}]
                            } elseif {$out2 == 24 || $out2 == 25 } {
                                binary scan [binary format b* [string range $zlib $idx  [expr {$idx + 10}]]] s plus
                                set plus [expr {$plus % 65536}]
                                set idx [expr {$idx + 11}]
                                set distance [expr {(($out2 - 24) * 2048) + $plus + 4097}]
                            } elseif {$out2 == 26 || $out2 == 27 } {
                                binary scan [binary format b* [string range $zlib $idx  [expr {$idx + 11}]]] s plus
                                set plus [expr {$plus % 65536}]
                                set idx [expr {$idx + 12}]
                                set distance [expr {(($out2 - 26) * 4096) + $plus + 8193}]
                            } elseif {$out2 == 28 || $out2 == 29 } {
                                binary scan [binary format b* [string range $zlib $idx  [expr {$idx + 12}]]] s plus
                                set plus [expr {$plus % 65536}]
                                set idx [expr {$idx + 13}]
                                set distance [expr {(($out2 - 28) * 8192) + $plus + 16385}]
                            }

                            #                        #status_log "Found distance $distance with added $plus\n"

                            set tocopy [string range $def [expr {$index - $distance}] $index]
                            while { [string length $tocopy] <= $length } {
                                   #set tocopy "${tocopy}${tocopy}"
                                   append tocopy $tocopy
                            }

                            set tocopy [string range $tocopy 0 [expr {$length -1}]]
                            #set def "${def}$tocopy"
                            append def $tocopy

                            set index [expr {$index + $length}]
                           #status_log "Time for distance : [expr [clock clicks] - $time]\n"
                        }
                        set inc 0
 #                        set time [clock clicks]
                    } else {
                        incr inc

                    }
                }

            }

        }

        return $def

    }

    proc zip/createcodes { oclen maxbits maxcode } {

 puts " Called ccodes with max $maxbits maxcode $maxcode #syms [expr {[llength $oclen]/2}]"

        array set clen $oclen

        #    set clen [list 3 3 3 3 3 2 4 4]

        foreach c [array names clen] {
            if {[info exists bl_count([set _c $clen($c)])] } {
                incr bl_count($_c)
            } else {
                set bl_count($_c) 1
            }
        }

        set code 0

        set bl_count(0) 0;
        #status_log "bl_cout = [array get bl_count]\n"
        for { set bits 1 } { $bits <= $maxbits } {incr bits} {
            if { ![info exists bl_count([expr {$bits - 1}])] } {
                set bl_count([expr {$bits - 1}]) 0
            }
            set code [expr {($code + $bl_count([expr {$bits - 1}])) << 1}];
            set next_code($bits) $code;
        }

        #status_log "code = $code\nnext_code = [array get next_code]\n"

        for {set n  0} { $n <= $maxcode} {incr n} {
            if { [info exists clen($n)]} {
                set len $clen($n)
            } else {
                set len 0
            }
            if { $len != 0} {
                binary scan [binary format s $next_code($len)] b$len bin
                #            #status_log "$len = $next_code($len) = $bin = [zip/invert $bin]\n"
                set bin [zip/invert $bin]
                set codes($bin) $n
                incr next_code($len)
            }
        }

        return [array get codes]

    }

    proc zip/invert { bin } {

        set out ""

        for { set i [expr {[string length $bin] - 1}] } { $i >= 0 } { incr i -1} {
            append out [string index $bin $i]
        }

        set out
    }

    proc zip/fill_length { type } {
                puts "Call fill_length $type"
        variable _lit
        variable _dist
        set out ""
        switch $type {
            "lit" {
                if {[info exists _lit]} {return $_lit}
                for { set i 0 } { $i <= 287 } { incr i } {
                    if { $i <= 143 } {
                        append out "$i 8 "
                    } elseif { $i <= 255 } {
                        append out "$i 9 "
                    } elseif { $i <= 279 } {
                        append out " $i 7 "
                    } else {
                        append out "$i 8 "
                    }
                }
                set _lit $out
            }
            "dist" {
                if {[info exists _dist]} {return $_dist}
                for { set i 0 } { $i <= 31} { incr i } {
                    append out "$i 5 "
                }
                set _dist $out
            }
        }

        return $out
    }

 variable zip/tell 0

 proc zip/reset {} {variable zip/tell ; set zip/tell 0}

 proc zip/read {stream cnt} {
        variable zip/tell
        set ret [string range $stream ${zip/tell} [expr {${zip/tell} + $cnt -1}]]

        if {${zip/tell} == [string length $stream]} {return {}}

        if {${zip/tell} > [string length $stream]} {
                #error "EOF"
        } else {
                incr zip/tell $cnt
                if {${zip/tell} > [string length $stream]} {
                        puts stderr "zip/read past end of file at ${zip/tell} ([string length $stream])"
                        set zip/tell [string length $stream]
                }
        }
        set ret
 }

 package provide tclzlib 0.12