Version 18 of The DNS blocking problem

Updated 2005-04-27 11:59:53 by CLN

The problem is that the standard resolver library blocks while looking up names. This can cause a Tcl application - and more obviously a Tk application - to hang while the name is resolved. Typically system resolvers use DNS to convert a name to an address, but the system may also use files, NIS, LDAP and other systems.

One solution is to force the use of DNS. The tcllib library has a dns module that can perform DNS queries in pure Tcl over TCP, and over UDP if tcludp is available.
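A minimal sketch of the tcllib approach, assuming tcllib is installed. The procs lookupDone and lookupAsync and the variable ::lookupFinished are made up for illustration; dns::resolve, dns::status, dns::name, dns::address, dns::error and dns::cleanup are the module's own commands. The nameserver is passed explicitly, and should probably be given as an IP address so that finding the nameserver does not itself need a blocking lookup.

```tcl
package require dns   ;# part of tcllib

# Hypothetical callback. dns::resolve appends the query token
# to the -command script when the request completes.
proc lookupDone {token} {
    if {[dns::status $token] eq "ok"} {
        puts "[dns::name $token] -> [dns::address $token]"
    } else {
        puts "lookup [dns::status $token]: [dns::error $token]"
    }
    # Free the state associated with this query.
    dns::cleanup $token
    set ::lookupFinished 1
}

# Hypothetical wrapper: start a query and return at once.
# The answer arrives later through the event loop.
proc lookupAsync {name nameserver} {
    return [dns::resolve $name -nameserver $nameserver \
                -protocol tcp -command lookupDone]
}

# Example use (192.0.2.53 is a placeholder nameserver address):
#   lookupAsync www.tcl.tk 192.0.2.53
#   vwait ::lookupFinished
```

In a Tk application the vwait is not needed, since the event loop is already running. Note that this only consults DNS, so names that exist only in hosts files, NIS or LDAP will not be found.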

Another solution is to fire up a slave process to perform name resolution and use non-blocking communications with the slave. This is the approach used in Netscape and BrowseX. The advantage of this method is that the resolver process will use normal resolver library calls and block as normal, while the parent can continue processing events while waiting for the answer to arrive.
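The helper-process idea can be sketched in plain Tcl. This is not the BrowseX or Netscape code, just an illustration that assumes a Unix-like system with the getent utility on the PATH; getent goes through the normal system resolver, so hosts files, NIS and so on are honoured. The names resolveAsync, resolveRead and resolveBuf are made up for the example.

```tcl
# Assumption: getent(1) is available (Linux, Solaris, the BSDs).

# Start a helper process for one lookup and return immediately.
# callback is a command prefix; the list of addresses found
# (possibly empty) is appended to it when the helper finishes.
proc resolveAsync {host callback} {
    set ch [open |[list getent hosts $host] r]
    fconfigure $ch -blocking 0
    fileevent $ch readable [list resolveRead $ch $callback]
    return $ch
}

# Collect the helper's output as it arrives; act on it at EOF.
proc resolveRead {ch callback} {
    global resolveBuf
    append resolveBuf($ch) [read $ch]
    if {![eof $ch]} {
        return
    }
    # getent exits non-zero when the name is unknown, and close
    # reports that as an error, hence the catch.
    catch {close $ch}
    set addrs {}
    foreach line [split $resolveBuf($ch) \n] {
        # Each output line starts with the address.
        if {[regexp {^\S+} $line addr]} {
            lappend addrs $addr
        }
    }
    unset resolveBuf($ch)
    uplevel #0 [linsert $callback end $addrs]
}

# Example use:
#   proc showAddrs {host addrs} {
#       puts "$host -> $addrs"
#       set ::resolveDone 1
#   }
#   resolveAsync localhost [list showAddrs localhost]
#   vwait ::resolveDone
```

The sketch starts one process per lookup, which is simple but costly; a long-lived helper that reads names on stdin would avoid the startup cost.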

I (PT) have a loadable package that uses this approach to perform non-blocking name resolution for Windows (though this can obviously be extended to other platforms). At the moment this package just creates a resolve command and then talks to the slave. In testing, this gives me a Tk app that continues to process events and update while waiting for slow responses (i.e. DNS queries for nonexistent hostnames). See [L1 ] for the files.

There is only one function in the Tcl core that actually uses gethostbyname on Windows: the CreateSocketAddress function in tclWinSock.c. Unix Tcl has a similar function, but also uses gethostbyname to obtain the nodename. To make non-blocking name resolution a seamless option it will be necessary to provide a way to register an alternative implementation of this function via a loaded package.

Comments.....


RT 11 Nov 2004, Here is a little combo that works for me.

1. Put this code in a file: sockcheck.tcl

   proc sockcheck {sock port} {
       # don't exit non-zero or could get error
       # on caller process close
       if {[catch {set sock [socket $sock $port]}]} {
           puts 0
           exit 0
       }
       close $sock
       puts 1
       exit 0
   }
   sockcheck [lindex $argv 0] [lindex $argv 1]

2. Use the following proc in the code that should not be blocked. If you want to be fancier you could use readable callbacks (fileevent) to avoid any blocking at all. This scheme is practical enough for my needs.

   proc runSockCheck {ip port seconds} {
       # Use another process to check for connection - we don't
       # get blocked here for any longer than we choose to
       set ch [open "|tclsh84.exe sockcheck.tcl $ip $port" r]
       fconfigure $ch -blocking 0
       set ms [expr {$seconds*1000}]
       for {set ms2 0} {$ms2 < $ms} {incr ms2 200} {
           set try [gets $ch]
           if {$try eq ""} {
               update
               after 200
               continue
           }
           close $ch
           if {$try == 1} {
               # Success
               return 1
           }
           # Failure: the child printed 0 (or something unexpected).
           # Return here so the loop never reads the closed channel.
           return 0
       }
       # Timed out
       # Can't use plain close here because it blocks until child
       # exits which is exactly what we're trying to avoid.
       # Close it 5 minutes later
       after [expr {1000*300}] "catch {close $ch}"
       return 0
   }

3. Call runSockCheck on any ip/port combination and only call socket if the check is successful.
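The readable-callback variant mentioned in step 2 might look something like the following. It is only a sketch: the proc names and the sockCheckTimer array are made up, it assumes sockcheck.tcl from step 1 is in the current directory, and it uses [info nameofexecutable] in place of the hard-coded tclsh84.exe. The result (1 or 0) is appended to the callback, so the caller never waits at all.

```tcl
# Start the check and return immediately.
proc runSockCheckAsync {host port seconds callback} {
    global sockCheckTimer
    set ch [open |[list [info nameofexecutable] sockcheck.tcl \
                       $host $port] r]
    fconfigure $ch -blocking 0
    fileevent $ch readable [list sockCheckReadable $ch $callback]
    # Give up if the child has not answered in time.
    set sockCheckTimer($ch) [after [expr {$seconds * 1000}] \
        [list sockCheckTimeout $ch $callback]]
    return $ch
}

# Called by the event loop when the child has written something
# or has closed its output.
proc sockCheckReadable {ch callback} {
    global sockCheckTimer
    if {[gets $ch line] < 0} {
        # No complete line yet; carry on waiting unless the
        # child went away without printing anything.
        if {![eof $ch]} {
            return
        }
        set line 0
    }
    after cancel $sockCheckTimer($ch)
    unset sockCheckTimer($ch)
    catch {close $ch}
    uplevel #0 [linsert $callback end [expr {$line eq "1"}]]
}

# Timed out: report failure now and, as in runSockCheck above,
# put off the close so a slow child cannot hold us up.
proc sockCheckTimeout {ch callback} {
    global sockCheckTimer
    unset sockCheckTimer($ch)
    fileevent $ch readable {}
    after 300000 [list catch [list close $ch]]
    uplevel #0 [linsert $callback end 0]
}

# Example use:
#   proc report {ok} { puts "reachable: $ok" }
#   runSockCheckAsync www.tcl.tk 80 5 report
```

Outside Tk the event loop has to be running (vwait) for the callbacks to fire.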


[DJB implementation: http://cr.yp.to/ ]


DKF - The problem is that DNS is not the only way of resolving host names to IP addresses. I've also seen NIS+ and LDAP used for this purpose, and there's also hosts files to think about. Plus it is a really good idea to follow the local policy on this matter, as that is sometimes set for technical reasons. To cut a long story short, there's a great deal more complexity here than you might naively expect.

So what? Well, the problem is that the library call that interfaces to all this (gethostbyname()) is synchronous, and way too complex for us to safely make assumptions about it (I know that the configuration file for this on Solaris is not actually a config file, but a Shared Library to load for the purpose.) This sucks. This sucks a lot. The only ways to do asynchronous name resolution on UNIX are to use a separate thread or a separate process.

NEM - I believe BrowseX takes this route. IIRC, it has a separate executable to do the DNS stuff which it execs in the background.
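For the separate-thread option, here is a rough sketch assuming a thread-enabled Tcl with the Thread extension available. It does not resolve a name as such; it moves the blocking socket call (and with it the gethostbyname() lookup) into a worker thread while the main thread stays in the event loop. The names threadedSockCheck, canConnect and ::threadedSockResult are made up for the example.

```tcl
package require Thread

# Try to connect in a worker thread. The caller sits in vwait,
# so events (including Tk redraws) are still processed while
# the worker is stuck in name resolution.
proc threadedSockCheck {host port} {
    set tid [thread::create {
        proc canConnect {host port} {
            if {[catch {socket $host $port} sock]} {
                return 0
            }
            close $sock
            return 1
        }
        # Enter the worker's event loop and wait for requests.
        thread::wait
    }]
    # -async returns at once; the result of the script is written
    # to the named variable when the worker has finished.
    thread::send -async $tid [list canConnect $host $port] \
        ::threadedSockResult
    vwait ::threadedSockResult
    thread::release $tid
    return $::threadedSockResult
}

# Example use:
#   if {[threadedSockCheck www.tcl.tk 80]} {
#       set s [socket www.tcl.tk 80]
#   }
```

The single global result variable means only one check should be outstanding at a time; a real implementation would want a variable per request.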


Stu - Calls to gethostbyname() can be skipped in Ceptcl with the -noresolve option.


DKF 26-Apr-2005: Joe English suggested in the Tcl Chatroom that using getaddrinfo() instead of gethostbyname() would make farming out DNS lookups across threads much easier (since it is reentrant).


[ Category Internet ]