Version 0 of ucnetgrab

Updated 2009-08-06 14:52:09 by rmax

Aug. 2009 by rmax

mikrocontroller.net is a popular German forum for people working with microcontrollers such as AVR or PIC.

You can subscribe to discussion threads to get a notification email whenever something new is posted. Unfortunately, these emails contain only a link to the new posting, not the posted text.

This script can be used as a filter in a procmail rule to replace the notification body with the actual text of the new posting. It uses the Tcl core's http package to fetch the discussion page and the tdom package to parse the HTML.
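
A minimal sketch of such a procmail rule is given here. The sender pattern and the script path are assumptions for illustration, not taken from the forum or from the author's setup; adjust both to match your own notifications and installation.

 # Hypothetical recipe: match notification mails and pipe them through the script.
 # The From pattern and the path to ucnetgrab.tcl are assumptions.
 :0 fw
 * ^From:.*mikrocontroller\.net
 | tclsh $HOME/bin/ucnetgrab.tcl

The f flag tells procmail to treat the pipe as a filter, so the script's output replaces the message. The w flag makes procmail wait for the exit code, so the unfiltered message is kept if the script fails. Header and body are both fed to the script, which is what it expects. The script itself follows.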


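 package require http
 package require tdom

 fconfigure stdout -encoding utf-8

 # Pass the mail header through unchanged, up to the first empty line.
 # gets returns -1 at EOF, so test for > 0 rather than != 0; otherwise
 # a message without a body would make the loop run forever.
 while {[gets stdin l] > 0} {
    puts $l
 }
 # Find the link in the mail body: U is the whole URL, f the part between
 # the scheme and the "#", and a the numeric anchor of the new posting.
 regexp {https?(://[^\#]*)\#([0-9]+)} [read stdin] U f a
 # Fetch the discussion page (always via plain http).
 set t [http::geturl http$f]
 set d [http::data $t]
 http::cleanup $t
 # Select the post box that contains the anchor named in the link.
 set b "//div\[@class=\"post box gainlayout \" and .//a\[@name=\"$a\"\]\]"
 set p [[[dom parse -html $d doc] documentElement] selectNodes $b]
 # Print author and date with runs of whitespace collapsed.
 set A [[$p selectNodes {.//div[@class="author"]}] asText]
 puts \n[regsub -all {\s+} [string trim $A] { }]
 set D [[$p selectNodes {.//div[@class="date"]}] asText]
 puts [regsub -all {\s+} [string trim $D] { }]
 # Print one line per attachment, if there are any.
 foreach F [$p selectNodes {.//div[@class="attachment"]}] {
    puts [regsub -all {\s+} [string trim [$F asText]] { }]
 }
 # Finally print the text of the posting, followed by the original link.
 puts "\n[[$p selectNodes {.//div[contains(@class,"text")]}] asText]\n\n$U"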
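
The link-matching step can be tried on its own, without fetching anything. The following is a small sketch under the assumption that the notification body looks roughly like the sample string; the URL in it is made up for illustration and the real notification text may differ.

 # Hypothetical notification body; the URL is invented for this example.
 set body "New posting in a subscribed thread:\nhttps://www.mikrocontroller.net/topic/12345#67890\n"

 # Same pattern as in the script: U = whole link, f = part between the
 # scheme and the "#", a = numeric anchor of the new posting.
 if {[regexp {https?(://[^\#]*)\#([0-9]+)} $body U f a]} {
    puts "link:    $U"
    puts "anchor:  $a"
    puts "fetched: http$f"
 } else {
    puts stderr "no posting link found"
 }

Checking the return value of regexp, as done here, gives a clearer failure than letting the script stop at the first use of an unset variable when a mail contains no matching link.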