TcLeo

http://pda.leo.org/sd.gif

by Reinhard Max

This little script sends its command line arguments as a query to the online dictionary at http://dict.leo.org and writes the parsed result to stdout. It uses Tcl's http package and the htmlparse package from Tcllib.

The scraper part (everything inside the ::dict.leo.org namespace) can also be reused by other front ends. Its [query] proc takes a list of words to search for and returns a flat list of English/German pairs that matched the query.
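
A minimal sketch of how another front end might consume that result, assuming the flat layout the script builds with lappend (English entry, then German entry, and so on). The proc name formatPairs and the sample data are made up for illustration; in real use the list would come from ::dict.leo.org::query.

```tcl
# Hypothetical helper: turn the flat {english german english german ...}
# list into one "english = german" line per pair.
proc formatPairs {pairs} {
    set lines [list]
    foreach {en de} $pairs {
        lappend lines "$en = $de"
    }
    return [join $lines "\n"]
}

# Sample data standing in for the result of ::dict.leo.org::query
set pairs [list "house" "das Haus" "home" "das Heim"]
puts [formatPairs $pairs]
```

Walking the list two elements at a time with foreach {en de} is the same idiom the main proc uses when it prints the two columns.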


rmax - 2004-11-10: Updated it to recognize the new HTML format of the site, and changed it to use http://pda.leo.org , because that page has less fluff and fewer ads around the real data, all of which would be cut away anyway. Thanks to Synox for pointing out that the old version wasn't working anymore.


 package require http
 package require htmlparse
 namespace eval ::dict.leo.org {
    variable td
    variable table ""
    variable tdcounter 0
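    # Callback for htmlparse::parse: collects the text of each table cell
    # in td(n) and stores cells 2 and 3 of every row as a result pair.
    proc parse {tag close options body} {
        variable td
        variable table
        variable tdcounter
        switch -- $close$tag {
            /TR - /tr {
                # End of a row: keep the English/German cells, if present
                if {[info exists td(2)] && [info exists td(3)]} {
                    lappend table [string trim $td(2)] [string trim $td(3)]
                }
                set tdcounter 0
                array unset td
            }
            TD - td {
                # Start of a new cell
                incr tdcounter
            }
            default {
                # Any other tag: append the text behind it to the current cell
                set item [htmlparse::mapEscapes $body]
                if {[string length $item]} {
                    append td($tdcounter) $item
                }
            }
        }
    }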
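    # Sends the query to the site and returns a flat list of
    # English/German pairs (empty if no result table was found).
    proc query {query} {
        variable table
        set url http://pda.leo.org
        set query [::http::formatQuery search $query]
        set tok [::http::geturl $url -query $query]
        # The result table sits on the one line that carries both column titles
        set match ""
        foreach line [split [::http::data $tok] "\n"] {
            if {[string match "*ENGLISCH*DEUTSCH*" $line]} {
                set match $line
                break
            }
        }
        ::http::cleanup $tok
        set table [list]
        if {[string length $match]} {
            ::htmlparse::parse -cmd ::dict.leo.org::parse $match
        }
        return $table
    }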
 }
 proc max {a b} {expr {$a > $b ? $a : $b}}
 proc main {argv} {
    set table [dict.leo.org::query [join $argv]]
    set max 0
    foreach c $table {set max [max $max [string length $c]]}
    set sep [string repeat - $max]
    set table [linsert $table 0 " English" " Deutsch" $sep $sep]
    foreach {c1 c2} $table {
        puts [format "%-*s  %-*s" $max $c1 $max $c2]
    }
    puts ""
 }
 main $argv
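
The parse proc above depends on how htmlparse::parse reports tags to its callback. The following sketch, which needs Tcllib installed, shows the four arguments the callback receives. The proc name showTag, the global variable seen, and the sample table row are made up for illustration; the row is not actual output of the site.

```tcl
package require htmlparse

# Hypothetical callback. htmlparse::parse calls it once per tag with the
# tag name, "/" for a closing tag (empty otherwise), the attribute string,
# and the text between this tag and the next one.
proc showTag {tag slash param text} {
    lappend ::seen [list $slash$tag [string trim $text]]
}

set ::seen [list]
# Made-up table row standing in for one line of the result page
htmlparse::parse -cmd showTag {<tr><td><b>house</b></td><td><b>Haus</b></td></tr>}
foreach item $::seen {
    puts $item
}
```

Note that in the script the td branch only increments the counter, so text that directly follows a td tag is not collected; only text behind other tags (such as the b tags in this sample) ends up in td($tdcounter).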

RS: Proud owners of a firewall might have to add a line like

    http::config -proxyhost proxy -proxyport 80

at the very top of proc query, with the host and port of their own proxy. That helped in my case to actually get out.


Category Internet | Web scraping | Using Tcl to write WWW client side applications