pipeline

A pipeline is a series of processes where each process consumes the output of the prior process and produces output for the next.

Description

In shells, the connected processes are individual programs connected by their standard streams. Shell pipelines are a mainstay of traditional Unix programming.

Here is an example of a shell pipeline:

grep MyName logfile | sort -u -f | wc -l

In it, grep reads logfile and writes to its standard output only those lines containing the string MyName. This output is passed to the standard input of sort, which sorts the lines, removes duplicates without regard to case, and writes the result to its standard output. wc reads that output and prints the number of lines.

In Tcl, shell pipelines can be constructed using exec or BLT's bgexec.
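As a minimal sketch, the shell pipeline above might be run from Tcl with exec as follows. It assumes Tcl 8.6 (for file tempfile) and Unix grep, sort, and wc on the PATH; the sample log contents are invented for illustration.

```tcl
# Write a small sample log so the example is self-contained.
set fd [file tempfile logfile]
puts $fd "MyName logged in"
puts $fd "MyName LOGGED IN"
puts $fd "someone else logged in"
puts $fd "MyName logged out"
close $fd

# exec treats each | as a stage separator, much like the shell does.
# wc may pad its output with spaces on some systems, hence the trim.
set count [string trim [exec grep MyName $logfile | sort -u -f | wc -l]]
puts $count

file delete $logfile
```

Note that exec raises an error if any stage exits with a nonzero status, so a grep that matches nothing would need to be wrapped in catch or try.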

Pipelines can also be constructed using mechanisms other than standard channels to pass data between the components. Callbacks, coroutines, traces, and bind are among the tools used to create pipelines.
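A minimal sketch of a pipeline built from callbacks rather than channels, assuming Tcl 8.5 or later for {*}. The name run_pipeline is hypothetical, and each stage is simply a command prefix that receives the previous stage's result as its last argument.

```tcl
# Feed a value through a list of command prefixes, one after another.
proc run_pipeline {value stages} {
    foreach stage $stages {
        # Append the current value to the stage's command prefix and run it.
        set value [{*}$stage $value]
    }
    return $value
}

# Strip leading spaces, then convert to uppercase.
set result [run_pipeline "   this text" {{string trimleft} {string toupper}}]
puts $result
```

Because stages are ordinary command prefixes, any proc, lambda applied with apply, or ensemble subcommand can serve as a component.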

A Tcl Connection To a Single Process

In Tcl, open can be used to execute another program and connect to both the standard input and the standard output of that program. This is problematic in some other scripting languages, e.g., Python, because they do not internally manage buffers for the streams, delegating that rather complex task to the script programmer. Tcl, however, makes this easy:

set channel [open |external_command r+]

Tcl starts external_command as a separate process and connects the channel to its stdin and stdout. Reading and writing are done with the usual Tcl I/O commands: gets, puts, and read. (It may be necessary to call flush on the Tcl side, or fflush in the external program, to prevent deadlocks caused by buffering.)

This makes for exceptionally handy glue for many situations.


This transcript of an interactive session illustrates a simple pipeline with Unix's bc executable:

% set channel [open |bc r+]
file3
% puts $channel "1234567890 * 987654321"
% flush $channel
% puts [gets $channel]
1219326311126352690

The same technique with cmd on Windows NT:

% set channel [open |cmd r+]
file3
% gets $channel
% gets $channel
% puts $channel "hostname"
% flush $channel
% gets $channel
% gets $channel
% puts [gets $channel]
% close $channel

Can someone look at this example and explain where the writer went wrong, if anywhere?

# Goal - to eventually return the filetype of a list of files;

set aid [open [list |file --brief --files-from -] r+]
fconfigure $aid -buffering line
fileevent $aid readable {puts [gets $aid]}
puts $aid {/win/d/movies/mp3/en/Giganten CD1/track01.cdda.ogg}
puts $aid /etc/motd
vwait forever

CL's guess: put

flush $aid

after the puts-s.
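For comparison, here is a sketch of the same event-driven pattern with explicit end-of-file handling. It is not a diagnosis of the example above: it substitutes cat for file so that it is self-contained, and it assumes Tcl 8.6 for the half-close. The proc name on_readable and the variables received and done are hypothetical.

```tcl
set ::received {}

# Called by the event loop whenever the pipe has data or reaches EOF.
proc on_readable {chan} {
    if {[gets $chan line] >= 0} {
        lappend ::received $line
    } elseif {[eof $chan]} {
        close $chan
        set ::done 1
    }
}

set chan [open |[list cat] r+]
fconfigure $chan -buffering line
# Pass the channel name explicitly instead of relying on a global variable.
fileevent $chan readable [list on_readable $chan]

puts $chan "first line"
puts $chan "second line"
close $chan write                      ;# tell cat there is no more input

after 5000 [list set ::done timeout]   ;# safety net so vwait cannot hang forever
vwait ::done
puts $::received
```

Longer-running programs would usually also configure the channel with -blocking 0 so that a partial line cannot stall the event loop.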

A Pipeline Problem

AMG: Tcl is a great language for producing filters, but a limitation in its I/O library severely hampers its ability to use external filters.

set chan [open |[list cat -nE] a+]
puts $chan hello
flush $chan
puts [gets $chan]
# Output: "1 hello$"

This works, but I shouldn't have to do that flush. Instead I should be able to close the output portion of $chan, forcing a flush and causing cat to detect EOF on its input, and then continue to read from the input portion of $chan until cat closes its output.

This problem makes it impossible to use external filters such as tac, which don't output anything until receiving EOF. Also, one usually must know in advance the number of lines or characters the filter will produce, because relatively few filters write EOF before reading EOF.

NEM: See TIP 332, which added exactly this ability:

close $chan write
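A minimal sketch of how the half-close might be used, assuming Tcl 8.6 or later and a Unix sort on the PATH. Like tac, sort emits nothing until it has seen EOF on its input, so it stands in here for that class of filter.

```tcl
set chan [open |[list sort] r+]

puts $chan pear
puts $chan apple
puts $chan mango

# Half-close: flushes pending output and closes only the write side,
# so sort sees EOF while its output remains readable.
close $chan write

# No need to know the number of lines in advance: read until EOF.
set sorted [read -nonewline $chan]
close $chan

puts $sorted
```

Without the half-close, the read would block forever, because sort would still be waiting for more input.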

AMG: Some filters operate on more than just stdin and stdout, for example multitee. Is there any ability to write to or read from child process file descriptors other than 0, 1, and 2 (stdin, stdout, and stderr, respectively)? Demultiplexing stdout and stderr can be done with [chan pipe], but what about nonstandard file descriptors?
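The stdout/stderr demultiplexing mentioned above can be sketched as follows, assuming Tcl 8.6 (for chan pipe) and a POSIX sh. The variable names are illustrative, and the sketch does not address nonstandard file descriptors.

```tcl
# Create an anonymous pipe; the child's stderr will be attached to its write end.
lassign [chan pipe] errRead errWrite

# 2>@ redirects the child's stderr to an already open Tcl channel.
set out [open |[list sh -c {echo to-stdout; echo to-stderr >&2} 2>@ $errWrite] r]

# Close our copy of the write end so that errRead reaches EOF when the child exits.
close $errWrite

set stdoutText [read -nonewline $out]
set stderrText [read -nonewline $errRead]
close $out
close $errRead

puts "stdout: $stdoutText"
puts "stderr: $stderrText"
```

Reading one stream to completion before the other is only safe for small outputs; a child that writes a lot to stderr would need both channels serviced through fileevent to avoid a deadlock.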

pipeline package

AMG: I have written a pipeline package to facilitate pipeline-oriented programming. argparse is required.

Documentation forthcoming. This code is now seeing a lot of heavy use and is working well for me, and I want to put it in Tcllib. However, before I can do that, I need to put argparse in Tcllib as well, so first I want to firm up the argparse interface.

Examples

This sample pipeline strips leading spaces, converts everything to uppercase, then prints to stdout.

package require pipeline
set pipeline [pipeline::new {regsub {^ *}} {loop -command string toupper} echo]
$pipeline flow "  this\n    text\n"
$pipeline flow "       has\n   indents\n"
$pipeline destroy

The output is:

THIS
TEXT
HAS
INDENTS

Code

pipeline.tcl
pkgIndex.tcl

See Also

filter
a pipeline whose components massage the data
glue
often implies pipelines
How Tcl is special
Concepts of Architectural Design for Tcl Applications
Scripted Wrappers for Legacy Applications, Cameron Laird and Kathryn Soraiz, 2001-03-16
client/server with fileevent
Pipe servers in C from Tcl
VFS, exec and command pipelines
Inventory of IPC methods
While (classic) MacOS supports no Tcl pipelines, there are generalizations that apply there and elsewhere.
named pipe
Pipeline programming
SS implements a value pipeline, while Brian Theado implements a "command pipeline"
Commands pipe
more implementations of the "value pipeline" from pipeline programming