zlib push

zlib push mode channel ?options ...?

This command, part of zlib, adds a compressing or decompressing transformation to channel. The type of transformation is given by the mode, which must be one of compress, decompress, deflate, inflate, gzip, or gunzip.

The following options may be used:

-header dict
Define gzip header fields, using the same rules as zlib gzip. Only used for the gzip transformation.
-level level
Define the compression level, using the same rules as zlib compress, zlib deflate and zlib gzip. Only used for compressing transformations.
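
As a minimal sketch of the two push-time options, the script below stacks a gzip transformation on a file opened for writing and then reads the file back through a gunzip transformation. The temporary file and the header values (demo.txt and the comment text) are made up for illustration.

```tcl
# Reserve a temporary file name; the channel that [file tempfile]
# returns is not needed here, so it is closed straight away.
close [file tempfile path]

# Write side: stack a gzip compressor on a binary channel.
set out [open $path wb]
zlib push gzip $out -level 9 \
    -header {filename demo.txt comment {zlib push demo}}
puts $out "hello, compressed world"
close $out    ;# closing the channel finishes the gzip stream

# Read side: stack a gunzip decompressor on the same file.
set in [open $path rb]
zlib push gunzip $in
set text [read $in]
close $in
file delete $path

puts -nonewline $text
```

Both channels are opened in binary mode so that the compressed bytes on disk are not subject to end-of-line or encoding translation.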

The following fconfigure/chan configure options are added:

-flush type
This write-only option causes an immediate operation: it flushes the internal buffers of the transformation. Two types of flush are currently defined: sync flushes empty the buffers but do not make the state restartable, and full flushes empty the buffers and also make the state restartable (at a greater performance penalty). For decompressing transformations, there is no difference between the two.
-checksum number
This read-only option returns the checksum of the uncompressed data seen so far. The algorithm used is format-dependent.
-header dict
This read-only option returns the header dictionary (according to zlib gunzip rules) from a gunzip transformation.
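
The following sketch exercises the three added options on a small gzip file. It assumes a made-up temporary file and header; the explicit chan flush before -flush is a precaution so that data still held in Tcl's own channel buffer reaches the transformation first. The -header and -checksum values are queried after the data has been read, since they describe what the transformation has seen so far.

```tcl
close [file tempfile path]

# Write side: force a sync flush part-way through the stream.
set out [open $path wb]
zlib push gzip $out -header {filename demo.txt}
puts $out "first line"
chan flush $out                   ;# hand Tcl's buffered data to the transformation
chan configure $out -flush sync   ;# then flush the transformation's own buffers
set sizeAfterFlush [file size $path]
puts $out "second line"
close $out

# Read side: query the read-only options once the data has been read.
set in [open $path rb]
zlib push gunzip $in
set text [read $in]
set header [chan configure $in -header]
set checksum [chan configure $in -checksum]
close $in
file delete $path

puts "bytes on disk after sync flush: $sizeAfterFlush"
puts "header: $header"
puts "checksum: $checksum"
```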

To reverse the zlib push, use chan pop.
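
A hedged sketch of how chan pop might be used to compress only part of what is written to a channel: plain text, then a gzip stream, then plain text again. The file layout is invented for the example, and the explicit flushes are precautions rather than documented requirements.

```tcl
close [file tempfile path]

set out [open $path wb]
puts $out "plain header"
chan flush $out       ;# write the plain text before stacking the compressor
zlib push gzip $out
puts $out "compressed body"
chan flush $out
chan pop $out         ;# finishes the gzip stream and removes the transformation
puts $out "plain trailer"
close $out

# Read the raw bytes back and take the three parts apart again.
set in [open $path rb]
set raw [read $in]
close $in
file delete $path

set head [string range $raw 0 12]        ;# "plain header\n" is 13 bytes
set tail [string range $raw end-13 end]  ;# "plain trailer\n" is 14 bytes
set body [zlib gunzip [string range $raw 13 end-14]]

puts -nonewline $head$body$tail
```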


DKF: Be aware that the current implementation of the compressing and (especially) decompressing filters doesn't support seeking. If you can help improve this, please contact the Tcl Maintainers, on mailto:[email protected]

MAKR 2009-05-05: I cannot think of a sensible implementation for seeking in deflated data. You'd either have to buffer the inflated data, or restart inflating the block to seek to every time. The first one either blows the memory or stores the data in some cache file. The second possibility would need a buffer, too, but suffers more from performance penalty. Storing the data in some file bothers me also wrt security. I have a zlib channel implementation for Tcl 8.4 using the zlib bindings provided in Tclkit wrapped up with rechan. I was thinking a lot about how to implement seeking, but finally realized that what I want is seeking in the inflated data. Thus no seeking in the channel implementation, and I haven't missed it yet :-) ...

DKF: I don't need it personally, but others have asked for it and I know it must be possible from how other routines in zlib work (which aren't compatible with Tcl's channel system). I make no claims for efficiency; seekability conflicts with efficient compression AIUI...
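
The "restart inflating every time" approach can be approximated at script level without any seek support in the transformation. The sketch below is only an illustration: gunzipOpenAt is a hypothetical helper, not part of Tcl, and it pays the full cost of decompressing everything before the requested offset on every call.

```tcl
# Hypothetical helper: open a gzip file and return a channel positioned
# at the given offset in the decompressed data, by decompressing from
# the start and discarding everything before that offset.
proc gunzipOpenAt {fileName offset} {
    set chan [open $fileName rb]
    zlib push gunzip $chan
    set remaining $offset
    while {$remaining > 0 && ![chan eof $chan]} {
        set chunk [chan read $chan [expr {min($remaining, 65536)}]]
        set remaining [expr {$remaining - [string length $chunk]}]
    }
    return $chan
}

# Usage: build a small gzip file, then read from decompressed offset 10.
close [file tempfile path]
set out [open $path wb]
zlib push gzip $out
puts -nonewline $out "0123456789abcdef"
close $out

set chan [gunzipOpenAt $path 10]
set rest [chan read $chan]
chan close $chan
file delete $path

puts $rest
```

Seeking backwards with this approach means closing the channel and calling the helper again, which is the performance penalty described above.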


AMG: In Tcl 8.6.1 (045e8076eb2b97872c70c71e00848c3d52af29bb), [zlib push gunzip] is extremely slow when run inside a Slackware64 14.1 virtual machine inside VirtualBox 4.3.10 r93012, when the file is located in a directory shared with the host via the vboxsf filesystem driver.

proc a {fileName} {
    set chan [open $fileName]
    zlib push gunzip $chan
    while {[chan gets $chan line] >= 0} {}
    chan close $chan
}
proc b {fileName} {
    set chan [open |[list zcat $fileName]]
    while {[chan gets $chan line] >= 0} {}
    chan close $chan
}
proc c {fileName} {
    set chan [open $fileName]
    while {[chan gets $chan line] >= 0} {}
    chan close $chan
}
puts [time {a bigfile.gz}]
puts [time {b bigfile.gz}]
puts [time {c bigfile.gz}]

For me, this takes 123446129, 105255, and 99065 microseconds, respectively. That's two minutes versus 0.1 seconds. [zlib push gunzip] takes 1173 times as long as letting zcat do the work! zcat takes only 6% longer than reading the file directly.

Moving bigfile.gz inside the virtual machine filesystem, thereby bypassing vboxsf, improves performance mightily, but [zlib push gunzip] is still much slower than zcat. The times become 708523, 91370, and 58327 microseconds, so [zlib push gunzip] takes 7.75 times as long as zcat.

bigfile.gz is 1,875,794 bytes in size, or 9,460,528 decompressed.