Version 9 of Example of reading and writing to a piped command

Updated 2007-01-28 15:37:30

Recently on comp.lang.tcl someone was trying to get the following code to work:

 proc gzip {buf} {
      set fd [open "|gzip -c" r+]
      fconfigure $fd -translation binary -encoding binary
      puts $fd $buf
      flush $fd
      set buf [read $fd]
      close $fd
      return $buf
 }

Here's an altered version in an attempt to get it to work.

 #! /usr/tcl84/bin/tclsh

 proc gzip {buf} {
      set fd [open "|gzip -c" "r+"]
      fconfigure $fd -translation binary -encoding binary
      puts -nonewline $fd $buf
        puts "output finished to gzip"
      flush $fd
        puts "flush finished to gzip"
      set buf [read $fd]
        puts "read finished from gzip"
      close $fd
      return $buf
 }

 proc gunzip {buf} {
      set fd [open "|gzip -d" "r+"]
      fconfigure $fd -translation binary -encoding binary
      puts -nonewline $fd $buf
      flush $fd
      set buf [read $fd]
      close $fd
      return $buf
 }

 set a [gzip "This is a test"]
 puts "finish compression"
 set b [gunzip $a]
 puts "finish uncompression"

 puts $b

Alas, it still doesn't work. The output and flush debug messages appear, but the message after the read never does: the script blocks in [read].

Now, an alternative version of the command was proposed:

 proc gzip {buf} {
      return [exec gzip -c << $buf]
 }

However, that version doesn't demonstrate how to write to and read from a piped command. So I'm hoping that someone comes along with a fix for the initial code.

Lars H: Is the problem that gzip won't finish until its input has been read to end? The only way to be sure there won't be more data is to close the input end of the gzip pipe, and that can't be done without closing the output end as well. Tricky. Mind you, I've always felt the idea that one uses the same channel both for reading and writing (but of two distinct data streams) rather odd.
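One way around the EOF problem described here, offered as a sketch and not as something proposed in this discussion, is to open the pipeline write-only and redirect gzip's output to a file. Closing the channel then delivers EOF to gzip and waits for it to exit, after which the file can be read back. The proc name gzipViaFile and the temporary file name are made up for illustration, and the sketch assumes gzip is on the PATH and the current directory is writable.

```tcl
# Sketch: compress a string by piping it to gzip, with gzip's output
# redirected to a temporary file (hypothetical name and location).
proc gzipViaFile {buf} {
    set tmp [file join [pwd] gzip-[pid]-[clock clicks].tmp]
    # Write-only pipeline; "> $tmp" redirects gzip's stdout to the file.
    set fd [open |[list gzip -c > $tmp] w]
    fconfigure $fd -translation binary
    puts -nonewline $fd $buf
    # Closing the only channel sends EOF to gzip and waits for it to exit.
    close $fd
    # Read the compressed bytes back, then clean up.
    set fd [open $tmp r]
    fconfigure $fd -translation binary
    set out [read $fd]
    close $fd
    file delete $tmp
    return $out
}
```

This sidesteps the read-write channel entirely, at the cost of a trip through the filesystem.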

LV: The end of file on input might be an issue. Frankly, I'm uncertain that the notation _should_ work. That is to say, I don't know that both stdin and stdout are being associated with that one returned file handle. I have seen, and probably written, code that used a normal open of a file and did both read and write type operations. However, I don't recall whether I've seen a pipe example of that.

RM: The underlying system buffers the output. You need to use the "unbuffer" command, like:

 set fd [open "|unbuffer gzip -c" "r+"]

lexfiend: Note that unbuffer is part of Expect, and may thus require additional work on "Some/All Batteries Not Included" Tcl setups.

AMG: I don't see how unbuffer would help, since I suspect the problem is with buffering inside gzip, and to get it to emit the last block of output you need to close its input. Can anyone confirm whether it does or not? I just apt-get installed expect yet for some reason it didn't install unbuffer.

AMG: I have several comments. Let's discuss.

  • I prefer to construct the first argument to [open] thusly: |[list progname arg1 arg2 arg3 ...]. This keeps whitespace embedded in the arguments from futzing up the works. Even though there's no problem with your command line, one day you might change it, perhaps to use a parameter to your proc as an argument. So I just do it "right", right from the start, so that I can't forget to make the change in the future. It's like the problem with optional { braces } in C: they're not needed for one-line if/for/while/etc. bodies, but when you add another line you might forget to add the braces.
  • Quoting r+ isn't necessary. Neither character is special in Tcl; very very few characters mean anything to the Tcl interpreter itself, and even then they don't always keep their meaning. For instance you only need to quote # if for some weird reason you're trying to call a proc named #, but on the other hand you can't begin a comment without ending the previous command with a ; or a newline.
  • -translation binary automatically does -encoding binary (no need to be redundant) and -eofchar {} (something you forgot, but will only cause trouble on MS-Windows as far as I know).
  • fconfigure $fd -buffering none precludes the need for flush $fd. Since $fd is blocking by default, this doesn't interfere with [read], which will still read until end of file (caused by gzip closing its stdout).
  • The [exec] code is blocking, which is alright in this case, but it should be possible to use gzip and gunzip in a non-blocking fashion for long streams of data that arrive over time, possibly over the network. Pipes should allow this, except they don't, not in Tcl.
  • gzip doesn't output anything until it has seen EOF on its stdin. (Well, this may be true for short strings of data, but it'll also output when an internal buffer overflows.) I strongly agree with Lars H that the read-write channel is a problem, and I have recommended to Andreas Kupries that we add the ability to "unbundle" read-write channels into separate read and write channels. This would allow separate directions to be closed individually, like BSD sockets' shutdown() call or close() on a single fd in a C-style fd pair. Also this would allow [fcopy] to work "bidirectionally" on sockets, as is commonly needed for network proxies/bridges. For symmetry and to allow code expecting read-write channels to work on stdin/stdout, I also suggest the ability to "bundle" a read and a write channel.
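For what it's worth, Tcl 8.6 and later let [close] take a direction argument, so the write side of a command pipeline can be closed while the read side stays open. Assuming such a Tcl and a gzip on the PATH, the original procs might be sketched as below, also using the list-form command line, the unquoted r+, and the single -translation binary from the comments above. This is a sketch for short strings: with large inputs the blocking puts may still stall if gzip fills its output pipe before all of its input has been written.

```tcl
# Requires Tcl 8.6 or later for the half-close form of [close].
proc gzip {buf} {
    set fd [open |[list gzip -c] r+]
    fconfigure $fd -translation binary
    puts -nonewline $fd $buf
    # Flush and close only the write side, so gzip sees EOF on its stdin.
    close $fd write
    # The read side is still open; read until gzip closes its stdout.
    set out [read $fd]
    close $fd
    return $out
}

proc gunzip {buf} {
    set fd [open |[list gzip -dc] r+]
    fconfigure $fd -translation binary
    puts -nonewline $fd $buf
    close $fd write
    set out [read $fd]
    close $fd
    return $out
}

set a [gzip "This is a test"]
set b [gunzip $a]
puts $b
```

Closing the write half plays the role that shutdown() plays for a socket, without needing separate read and write channels.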

See also open, pipe, gzip.

[ Category Example | Category Channel | Category Interprocess Communication ]