minimal tclhttpd

Consensus on the list [L1 ] seems to be that httpdthread.tcl can be used to modify the available modules without core changes which might impact performance. Certainly, you can specify which mainline to source instead of httpdthread.tcl using the -main command-line option. I've added the necessary NOOP procs to httpdthread.tcl to facilitate a customised tclhttpd with minimal functionality, but I note that redir isn't handled by the original patchset, so I haven't added it to CVS.

Castle and other small web servers which provide native Tcl support are likely to interest a few readers.

Tclhttpd Core

Someone on the tkchat suggested that it might be a better idea to build up from a minimal tclhttpd than to cut down the full tclhttpd. To that end, good documentation for tclhttpd's low-level functions would help. The main powerhouse of tclhttpd is in lib/{httpd,url}.tcl, and these would be a good basis for a cut-down tclhttpd derivative.

US 6. Oct. 2003

I need a fairly simple HTTP server just to serve plain files, so I reduced the tclhttpd server to load only the following packages:

 auth.tcl
 config.tcl
 counter.tcl
 direct.tcl
 doc.tcl
 httpd.tcl
 log.tcl
 logstd.tcl
 mtype.tcl
 url.tcl
 utils.tcl
 version.tcl

It turned out that tclhttpd doesn't run correctly without cgi.tcl, dirlist.tcl and redirect.tcl. Two simple patches to doc.tcl and url.tcl solve this problem:

CMcC I like this idea - a minimal tclhttpd. I think, though, that it might be better to define default NOOP procs for some of the procs you're commenting out, so that there's no impost on full tclhttpd installations.

US I don't comment them out, I just check their existence. Full installations should work as before.

CMcC Yes, but the addition of a test imposes an overhead on full installations. Defining a NOOP proc for minimal installations avoids this overhead, and works for the minimal installation. Comments and alternative suggestions are interspersed through the patches below.

Probably not a realistic suggestion, but I'll make it anyway: the following looks to me like "manual OO polymorphism" - would it be an idea to modularize tclhttpd with something like Snit? -jcw

CMcC It is an interesting idea, but architecturally a long way from what we've got. tclhttpd's core is a net interface httpd::httpd.tcl which drives what is essentially a command dispatcher httpd::url.tcl. From there, URLs (minus any query part) are interpreted as commands within what are called domains - handlers for URL prefixes.
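To make the dispatcher idea concrete, here is a toy model of prefix-based dispatch. It is only a sketch and not tclhttpd's code: toy_domain, toy_dispatch, toy_bylength, the ToyDomain array and the two handlers are all hypothetical names, and the assumption that a handler receives the socket and the URL suffix is made for illustration.

```tcl
# Toy model of prefix-based URL dispatch; NOT tclhttpd's implementation.
# All names below are hypothetical.

# Register a handler command for a URL prefix ("/" is stored as "").
proc toy_domain {prefix cmd} {
    global ToyDomain
    set ToyDomain([string trimright $prefix /]) $cmd
}

# Sort helper: longer prefixes first.
proc toy_bylength {a b} {
    expr {[string length $b] - [string length $a]}
}

# Find the longest registered prefix and call its handler.
proc toy_dispatch {sock url} {
    global ToyDomain
    # Drop any query part, then collapse repeated slashes.
    regexp {^[^?]*} $url path
    regsub -all {/+} $path / path
    foreach prefix [lsort -command toy_bylength [array names ToyDomain]] {
        set n [string length $prefix]
        set head [string range $path 0 [expr {$n - 1}]]
        set suffix [string range $path $n end]
        # The suffix must be empty or begin a new path component.
        if {[string compare $head $prefix] == 0 &&
                ([string length $suffix] == 0 || [string match /* $suffix])} {
            return [uplevel #0 $ToyDomain($prefix) [list $sock $suffix]]
        }
    }
    error "no domain registered for $path"
}

# Stand-in handlers, for demonstration only.
proc toy_doc {sock suffix} { return "doc:$suffix" }
proc toy_cgi {sock suffix} { return "cgi:$suffix" }

toy_domain / toy_doc
toy_domain /cgi-bin toy_cgi
```

With these registrations, a request for /cgi-bin/test.cgi?x=1 goes to toy_cgi with suffix /test.cgi, while /cgi-binary/x falls through to the root handler because the remainder after /cgi-bin does not start a new path component.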

What would be good about a Snit or more frankly OO approach is that you could nest domains, but what's difficult is:

  • domains have a choice of several ways to return data: Return_File, Return_Data, Return_CacheableData, throw an error. Unifying this into a single functional value is a major architectural change.
  • lots of domain types have been written, and lots of websites depend on the current implementation.
  • a couple of editions of a book describe the current implementation, which predates most of the OO suites Tcl now has.
  • the Doc domain, which is by far the most used domain, has a second-level of interpretation/command-dispatch - MIME type conversion (if the URL can't be satisfied as is, a series of conversions are attempted to construct something which does - it's related to content-type negotiation.)
  • before a page can be served, authentication may be required - authentication requires further net interaction, which requires URL processing to be suspended and resumed.
  • URL processing can be suspended and later resumed by the domain handler.
  • URL processing can occur either in dedicated threads or in a purely event-driven manner.

What is most interesting/useful about US's work is that he's discovered and fleshed-out some surprising dependencies between some of tclhttpd's domain modules. Complete analysis of these dependencies is a prerequisite to any further modularisation. I've tried to graph the interrelationships between the major modules in tclhttpd [L2 ]

AKu has analysed the modular dependencies thoroughly here [L3 ]


 *** /usr/local/src/tclhttpd3.4.2/lib/doc.tcl   2002-09-15 22:59:35.000000000 +0200
 --- doc.tcl    2003-10-06 14:23:57.000000000 +0200
 ***************
 *** 624,630 ****
       if {![DocFallback $prefix $path $suffix $sock]} {
         # Couldn't find anything.
         # check for cgi script in the middle of the path
 !      Cgi_Domain $prefix $directory $sock $suffix
       }
   }

 --- 624,634 ----
       if {![DocFallback $prefix $path $suffix $sock]} {
         # Couldn't find anything.
         # check for cgi script in the middle of the path
 !         if {[string compare [info command Cgi_Domain] "Cgi_Domain"] == 0} {
 !          Cgi_Domain $prefix $directory $sock $suffix
 !         } else {
 !          Doc_NotFound $sock
 !         }
       }
   }

The same result could be achieved without patching doc.tcl. In bin/httpdthread.tcl, instead of

 package require httpd::cgi                ;# Standard CGI
 Cgi_Directory                        /cgi-bin

define a stub:

 proc Cgi_Domain {virtual directory sock suffix} {
        Doc_NotFound $sock
        return
 }

This has the advantage of requiring neither mainline code mods nor runtime tests.

 ***************
 *** 910,916 ****
         }
         return [DocHandle $prefix $newest $suffix $sock]
       }
 !     if {[Dir_ListingIsHidden]} {
           # Direcotry listings are hidden, so give the not-found page.
           return [Doc_NotFound $sock]
       }
 --- 914,920 ----
         }
         return [DocHandle $prefix $newest $suffix $sock]
       }
 !     if {[string compare [info commands Dir_ListingIsHidden] "Dir_ListingIsHidden"] || [Dir_ListingIsHidden]} {
           # Direcotry listings are hidden, so give the not-found page.
           return [Doc_NotFound $sock]
       }

Similarly, in bin/httpdthread.tcl, replace

 package require httpd::dirlist                ;# Directory listings

with

 proc Dir_ListingIsHidden {} {
    return 1
 }

so no directory listing functionality will be provided.
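If one mainline has to serve both minimal and full configurations, the two stubs could be installed through a guard that runs once at startup, so the per-request test is still avoided. This is only a sketch: stub_if_missing is a hypothetical helper, not part of tclhttpd, and it assumes that it runs after any package require lines, so that a real Cgi_Domain or Dir_ListingIsHidden is already defined when the module was loaded.

```tcl
# Sketch: install a fallback only when the real module was not loaded.
# stub_if_missing is a hypothetical helper, not part of tclhttpd.
proc stub_if_missing {name arglist body} {
    # info commands returns an empty list when the command does not exist.
    if {[llength [info commands ::$name]] == 0} {
        proc ::$name $arglist $body
    }
}

# Fallbacks with the same bodies as the stubs shown above.
stub_if_missing Cgi_Domain {virtual directory sock suffix} {
    Doc_NotFound $sock
}
stub_if_missing Dir_ListingIsHidden {} {
    return 1
}
```

The check happens once, when httpdthread.tcl is sourced; after that, doc.tcl calls either the real command or the stub without any extra test.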

 *** /usr/local/src/tclhttpd3.4.2/lib/url.tcl   2002-08-31 02:06:43.000000000 +0200
 --- url.tcl    2003-10-06 13:28:02.000000000 +0200
 ***************
 *** 57,63 ****
         # to match the /cgi-bin prefix
         regsub -all /+ $url / url

 !      if {![regexp ^($Url(prefixset))(.*) $url x prefix suffix] ||
                 ([string length $suffix] && ![string match /* $suffix])} {

             # Fall back and assume it is under the root
 --- 57,64 ----
         # to match the /cgi-bin prefix
         regsub -all /+ $url / url

 !      if {![info exist Url(prefixset)] ||
 !                 ![regexp ^($Url(prefixset))(.*) $url x prefix suffix] ||
                 ([string length $suffix] && ![string match /* $suffix])} {

             # Fall back and assume it is under the root

The above mod therefore shouldn't be necessary.

SDW Need some outside advice on a design problem.

I'm currently working on integrating Tclhttpd into Gentoo's portage tree. I install the tclhttpd libraries to /usr/lib. I'm not sure if this is canon, but I stripped out most of the auto-detect and path searching. Everything is detected by package require, or relative to a hardcoded path /usr/tclhttpd. I have a file /etc/conf.d/tclhttpd that feeds the command line options to /usr/bin/httpd.tcl from the rc-script. (Including where to find /usr/tclhttpd.)

/usr/tclhttpd contains a subdirectory sites that is automatically scanned for virtual hosts. The test has 2 parts. First it checks for a file called "hosts.rc" that contains a list (1 per line) of virtual hosts that site answers to. The second test is the presence of a tclhttpd.rc file with all the site-specific settings.
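A scan of that kind might look roughly like the following. It is a sketch under stated assumptions, not the code from the ebuild: scan_sites is a hypothetical name, a site is assumed to count only when both hosts.rc and tclhttpd.rc are present, and the flat "host rcfile" list it returns is just one convenient shape.

```tcl
# Sketch of the two-part virtual host test; not the ebuild's actual code.
# scan_sites is a hypothetical name. Returns a flat list:
#   host1 rcfile1 host2 rcfile2 ...
proc scan_sites {sitesDir} {
    set result {}
    foreach dir [lsort [glob -nocomplain -directory $sitesDir -type d *]] {
        set hostsFile [file join $dir hosts.rc]
        set rcFile [file join $dir tclhttpd.rc]
        # Assumption: both files must exist for the site to be served.
        if {![file exists $hostsFile] || ![file exists $rcFile]} {
            continue
        }
        set fh [open $hostsFile r]
        set data [read $fh]
        close $fh
        # One virtual host name per line; blank lines are skipped.
        foreach line [split $data \n] {
            set host [string trim $line]
            if {[string length $host]} {
                lappend result $host $rcFile
            }
        }
    }
    return $result
}
```

A caller could then walk the result with foreach {host rc} [scan_sites /usr/tclhttpd/sites] and set up each host from its tclhttpd.rc.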

In the process of developing the ebuild, I replaced all hardcoded references to tclsh8.3 with tclsh. Many Gentoo users are automatically bumped to 8.4 by external dependencies.

My final addition was a step in httpdthread.tcl that runs a procedure `thread_init' if it exists. This allows me to have a single generic httpdthread.tcl file for all virtual hosts. If a host needs something special, they just add a thread_init procedure. (Handy for those pesky incr tcl objects.) I have 2 servers that run 7 or 8 websites each. I find it easier to keep track of site initialization code in the libtml folder. The enforced-template approach helps me cut down debugging time. YMMV.
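The optional hook can be as small as the following. This is a sketch of the idea rather than the actual change: run_thread_init is a hypothetical wrapper, and only the name thread_init comes from the description above.

```tcl
# Sketch of an optional per-site hook; not the actual httpdthread.tcl change.
# run_thread_init is a hypothetical wrapper name.
proc run_thread_init {} {
    # Nothing to do when the site has not defined the hook.
    if {[llength [info commands ::thread_init]] == 0} {
        return 0
    }
    # Run the hook at global level, as if it were called from the mainline.
    uplevel #0 thread_init
    return 1
}

# Called once at the end of the generic mainline.
run_thread_init
```

A site that needs extra setup defines thread_init before this point, for example in its libtml code; sites without one are unaffected.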

My diff file for the entire tclhttpd package is about 500 lines, a little big to post here. You can snag the ebuild in tarball form from my website: http://www.etoyoc.com/gentoo/ The size is a little misleading, because it also includes an additional manpage. There were also quite a few tclsh8.3->tclsh substitutions that diff loves to blow out to 7 lines.

I'm also in the process of publishing my incr tcl based sql drivers, site navigation tools, and wifi billing gateway.

Alas, I find myself heading down a dark path: I am imposing a program structure and API across all of my applications. Each new tool builds atop another and expects a certain procedure to be present from a library loaded from a package I wrote or severely hacked. Is anyone else experiencing this, and should we as a community start considering a set of conventions to make our tools portable?

2004-2-29 Brent Welch Good stuff folks. I'll see if I can use some of these ideas to make it easier to come up with a clean core. The main thing is that I hope it is fairly malleable so you can make it do what you need. It seems like the biggest hurdle is good documentation about the internals.