Using a zip file as a Tcl Module

It is quite simple to prefix a zip file with a 'self-extractor' stub; we can use this to use zip files as Tcl modules.

The sample implementation will work for Tcl setups with tclvfs+Trf(zip,memchan) and can be used as follows:

  make_zip_tm test.zip test.zip.tm
  source test.zip.tm
  puts [glob test.zip.tm/*/*]

To turn a zip file into a self-extracting one, you must adjust the file offset stored at the tail of the zip file; this offset points to the first element in the zip file central directory. Each item in the central directory contains the location of the compressed file within the zip file, and these must be adjusted as well.

This code does not take into account the possibility of disk spanning, signed zip files, and zip64 (>4gb) files. I don't feel spanning disks for a Tcl module makes sense, and I simply don't have signed zipfiles, nor big zipfiles.

The zip file specification can be found at [L1 ].

22Mar2005 PS


LV I take it this is an alternative to a starkit, right? What are the pros and cons?

PS Yes, it can be an alternative to Metakit - Pat Thoyts just tried that, succesfully, with tkchat. One of the pros is that you can use your favorite zip utility to look in the file, and (provided your zipper is sfx friendly) you can add and remove files from it. From Windows XP, you can even browse in them with the explorer. Another reason is that I was looking for a way to do what trofs is doing (namely, a simple, readonly filesystem), but then with compression in conjunction with my new zlib extension. I am probably going to write a zipvfs in C. The zip file format has several compressed formats, the most important ones (to me) being deflate and no compression.

The cons... a zip filesystem is read-only for existing files, or at least quite inefficient at write commit. Appending new files is no problem (only one at a time) but for zip files with many files, updating the TOC is expensive (needs to be moved every time). Metakit is much more efficient at read/write, but that is not how it is used for most starkits. Both are equally inefficient at seeks in compressed content (you have to redo from start to seek backwards, or buffer everything). Another con is that vfs::zip not available everywhere (but neither is Metakit). Committing writes to an existing zip file is either dangerous (you are rewriting the file, if power fails, you've corrupted it) or you copy everything to a new file, and move that back over the old one....


    # [make_zip_tm /zipfile/ /outfile/]
    # Prefixes the specified zipfile with the tclmodule mounter stub and writes out 'outfile'
  
    # [make_sfx_zip /zipfile/ /outfile/ /sfxstub/]
    # Adds an arbitrary 'sfx' to a zip file, and adjusts the central directory
    # and file items to compensate for this extra data.

    proc make_zip_tm { zipfile outfile } {
        set sfx_stub {
        
        package require vfs::zip
        
        vfs::zip::Mount [info script] [info script]
        }
        append sfx_stub \x1A
        make_sfx_zip $zipfile $outfile $sfx_stub
    }
    
    proc make_sfx_zip { zipfile outfile sfx_stub } {
        
        set in [open $zipfile r]
        fconfigure $in -translation binary -encoding binary
        
        set out [open $outfile w+]
        fconfigure $out -translation binary -encoding binary
        
        puts -nonewline $out $sfx_stub
        
        set offset [tell $out]
        
        lappend report "sfx stub size: $offset"
        
        fcopy $in $out
        
        set size [tell $out]
        
        # Now seek in $out to find the end of directory signature:
        # The structure itself is 24 bytes long, followed by a maximum of 64Kbytes text
        
        if { $size < 65559 } {
            set seek 0
        } else {
            set seek [expr { $size - 65559 } ]
        }
        #flush $out
        seek $out $seek 
        #puts "$seek [tell $out]"
        
        set data [read $out]
        set start_of_end [string last "\x50\x4b\x05\x06" $data]
        
        set start_of_end [expr {$start_of_end + $seek}]
        lappend report "SEO: $start_of_end ([expr {$start_of_end-$size}]) [string length $data]"
        
        seek $out $start_of_end
        set end_of_ctrl_dir [read $out]
        
        binary scan $end_of_ctrl_dir issssiis eocd(signature) eocd(disknbr) eocd(ctrldirdisk) \
            eocd(numondisk) eocd(totalnum) eocd(dirsize) eocd(diroffset) eocd(comment_len)
        
        lappend report "End of central directory: [array get eocd]"
        
        seek $out [expr {$start_of_end+16}]
        
        #adjust offset of start of central directory by the length of our sfx stub
        puts -nonewline $out [binary format i [expr {$eocd(diroffset)+$offset}]]
        flush $out
        
        seek $out $start_of_end
        set end_of_ctrl_dir [read $out]
        binary scan $end_of_ctrl_dir issssiis eocd(signature) eocd(disknbr) eocd(ctrldirdisk) \
            eocd(numondisk) eocd(totalnum) eocd(dirsize) eocd(diroffset) eocd(comment_len)
        
        lappend report "New dir offset: $eocd(diroffset)"
        lappend report "Adjusting $eocd(totalnum) zip file items."
        
        seek $out $eocd(diroffset)
        for {set i 0} {$i <$eocd(totalnum)} {incr i} {
            set current_file [tell $out]
            set fileheader [read $out 46]
            binary scan $fileheader is2sss2ii2s3ssii x(sig) x(version) x(flags) x(method) \
                x(date) x(crc32) x(sizes) x(lengths) x(diskno) x(iattr) x(eattr) x(offset)
            
            if { $x(sig) != 33639248 } {
                error "Bad file header signature at item $i: $x(sig)" 
            }
            
            foreach size $x(lengths) var {filename extrafield comment} {
                if { $size > 0 } {
                    set x($var) [read $out $size] 
                } else {
                    set x($var) ""
                }
            }
            set next_file [tell $out]
            lappend report "file $i: $x(offset) $x(sizes) $x(filename)"
            
            seek $out [expr {$current_file+42}]
            puts -nonewline $out [binary format i [expr {$x(offset)+$offset}]]
        
            #verify:
            flush $out
            seek $out $current_file
            set fileheader [read $out 46]
            lappend report "old $x(offset) + $offset"    
            binary scan $fileheader is2sss2ii2s3ssii x(sig) x(version) x(flags) x(method) \
                x(date) x(crc32) x(sizes) x(lengths) x(diskno) x(iattr) x(eattr) x(offset)
            lappend report "new $x(offset)"    
        
            seek $out $next_file        
        }
        #puts [join $report \n]
    }