Freshmeat Newsletter Filter

Marty Backe - 30 Dec 2003

To keep current with the many open source projects out there, I subscribe to the Freshmeat e-mail newsletter, a daily digest of open source announcements.

Here's an example entry from the newsletter:

 [[066]] - TclCurl 0.10.8 (Development)
   by Andres Garcia (http://freshmeat.net/users/andresgarci/)
   Monday, December 29th 2003 09:55

 Internet :: File Transfer Protocol (FTP)
 Internet :: WWW/HTTP
 Software Development :: Libraries

 About: TclCurl provides a binding for libcurl. It makes it possible to
 download and upload files using protocols like FTP, HTTP, HTTPS, LDAP,
 telnet, dict, gopher, and file. 

 Changes: The binding was updated for libcurl 7.0.18. 

 License: BSD License (revised)

 URL: http://freshmeat.net/projects/tclcurl/

Now say I'm interested in finding out a bit more about this project. I click on the URL link, which takes me to a Freshmeat page, from which I can then click another link to get to the project's homepage. That's a lot of clicking!

I wrote a filter application (see listing below) that visits the Freshmeat page for each project in the newsletter, extracts the homepage URL, and adds it to the newsletter on a new line directly below the existing URL line.
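
The line-by-line pass can be tried out without touching the network. The following is only a minimal sketch of that step, not the listing itself: `annotateLines` and `stubLookup` are hypothetical names, and the example.org homepage is made-up sample data standing in for what the real HTTP lookups would return.

```tcl
# Hypothetical helper: copy each line through and add a "Homepage:" line
# after every Freshmeat URL line for which the lookup returns a result.
# 'lookupCmd' is a command prefix that takes a Freshmeat URL and returns
# the homepage URL, or an empty string if none could be found.
proc annotateLines {lines lookupCmd} {
    set result {}
    foreach line $lines {
        lappend result $line
        # Cheap prefix test first; only matching lines reach regexp.
        if {[string first "URL: http://freshmeat.net/" $line] != 0} {
            continue
        }
        if {![regexp {^URL: (http://\S+)} $line -> projectUrl]} {
            continue
        }
        # Append the URL to the command prefix and run it (Tcl 8.4 safe).
        set homepage [eval [linsert $lookupCmd end $projectUrl]]
        if {$homepage ne ""} {
            lappend result "Homepage: $homepage"
        }
    }
    return $result
}

# Stub lookup standing in for the real HTTP requests (sample data only).
proc stubLookup {url} {
    if {$url eq "http://freshmeat.net/projects/tclcurl/"} {
        return "http://example.org/tclcurl/"
    }
    return ""
}

set sample [list \
    "License: BSD License (revised)" \
    "URL: http://freshmeat.net/projects/tclcurl/" \
    "URL: http://freshmeat.net/projects/unknown/"]
puts [join [annotateLines $sample stubLookup] \n]
```

Passing the lookup in as a command prefix keeps the text handling separate from the HTTP code, so the two can be checked independently.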

Since I run my own mail server, I am able to insert this filter application between my mail delivery agent (Procmail) and my mailbox. Now, as I read the newsletter, I'm just one click away from any given homepage.

I used Snit primarily because I wanted to gain a little exposure to its use. It's certainly a very simple Snit application.

To see it in action you can grab a newsletter from the archive (see http://freshmeat.net/newsletter ) and pass it through the filter:

  cat newsletter.txt | FreshmeatMailFilter.tcl > converted.txt

Here is the listing:

 #!/bin/sh
 #\
 exec tclsh8.4 "$0" "$@"
 
 ################################################################################
 ################################################################################
 #
 # Written by Marty Backe
 #
 # Freshmeat newsletter filter.
 #
 # Rev      Date      Changed By  Comments
 # ----- -----------  ----------  -----------------------------------------------
 # 1.0   30 Dec 2003  M Backe     Initial release.
 #
 ################################################################################
 ################################################################################
 
 #
 # Load required packages
 #   Snit is only used because I wanted to get acquainted with it.
 #
 set packageList {
     {http}
     {snit}
 }
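 foreach package $packageList {
     if {[catch {package require $package} errorMsg]} {
         # Report on stderr so the message cannot end up in the mailbox,
         # and exit with a non-zero status to signal the failure.
         puts stderr "FreshmeatMailFilter requires the '$package' package. The"
         puts stderr "following error occurred: \"$errorMsg\""
         exit 1
     }
 }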
 
 # ------------------------------------------------------------------------------
 #
 #   Type:       FreshmeatMailFilter
 #
 #   Summary:    This program is designed as a filter for the daily Freshmeat
 #               e-mail newsletters. The URLs for each project currently
 #               specify a Freshmeat webpage. This requires the reader of the
 #               Freshmeat newsletter to first go to the Freshmeat page, find
 #               the project homepage link, and then click on that to get to the
 #               project homepage.
 #               This program extracts the actual project homepage and adds it
 #               as an additional link in the newsletter, below the existing
 #               URL link.
 #
 #               This program reads stdin, looks for lines that contain
 #               the URL, retrieves the necessary Freshmeat webpages to
 #               extract the project homepage url, and inserts the url in
 #               a line below the URL line.
 #
 #   Usage:      From a Procmail recipe, pipe the e-mail through this program.
 #               Example:
 #                   :0:
 #                   | /home/johndoe/MailFilters/FreshmeatMailFilter.tcl |
 #                   /usr/local/bin/dmail +"Mail/Freshmeat"
 #               Example:
 #                   cat freshmeat_message.txt | FreshmeatMailFilter.tcl >
 #                       freshmeat_message2.txt
 #
 # ------------------------------------------------------------------------------
 ::snit::type FreshmeatMailFilter {  
 
     variable mailMessage "" 
 
     constructor {} {
         set mailMessage [$self readInput]
         
         foreach line $mailMessage {
             #
             # Look for the URL line that is provided for each project.
             #
             if {[string first "URL: http://freshmeat.net/" $line 0] == 0} {
                 #
                 # regexp is used only here, on the lines that matched. The
                 # test above uses 'string first' because it is much faster
                 # and runs on every line of the message.
                 #
                 set urlString ""
                 regexp {^URL: (http://.*)$} $line matchString urlString
                 puts $line
                 if {$urlString != ""} {
                     set homepageUrl [$self getHomepageUrl $urlString]
                     if {$homepageUrl != ""} {
                         puts "Homepage: $homepageUrl"
                     }
                 } 
             } else {
                 puts $line
             }
         }
         
     } 
 
     # --------------------------------------------------------------------------
     #
     #   Method:     readInput
     #
     #   Summary:    Reads stdin and builds a list in which each item is
     #               one line of the input.
     #
     #   Input:      
     #   Output:     A list
     #
     #   Uses:       
     #
     # --------------------------------------------------------------------------
     method readInput {} { 
 
         set tmpFileBuffer ""
         while {-1 != [gets stdin inputline]} {
             lappend tmpFileBuffer $inputline
         }
         close stdin
         
         return $tmpFileBuffer 
 
     } 
 
     # --------------------------------------------------------------------------
     #
     #   Method:     getHomepageUrl
     #
     #   Summary:    The provided URL is that which corresponds to the Project
     #               URL provided in the newsletter.
     #               The Freshmeat URL is redirected (http return code 302) to
     #               a webpage that contains another redirected URL. Following
     #               that URL gets us to the actual project homepage website.
     #
     #               Therefore, acquiring the actual project homepage requires
     #               three downloads from Freshmeat.
     #
     #               If any errors occur along the way (invalid url, timeouts,
     #               etc.) a null string is returned.
     #
     #   Input:      <Freshmeat project URL>
     #   Output:     <Project homepage URL>
     #               null string if the project homepage URL could not be found
     #
     #   Uses:       
     #
     # --------------------------------------------------------------------------
     method getHomepageUrl {url} {
         #
         # Get the webpage specified in the Newsletter URL link. This is
         # expected to be a redirect (http return code 302).
         #
         if {![catch {set urlToken [http::geturl $url -timeout 10000]} \
                 errorMsg]} {
             if {[http::status $urlToken] != "ok"} {
                 #
                 # We get here if a timeout occurred.
                 #
                 http::cleanup $urlToken
                 return ""
             }
             if {[http::ncode $urlToken] == 302} {
                 #
                 # A redirection occurred (which is expected for these URLs).
                 # Grab the new URL and retrieve the webpage contents. If this
                 # times out or some other error occurs, give up.
                 #
                 upvar #0 $urlToken state ;# See docs for ::http
                 array set meta $state(meta) ;# See docs for ::http
                 http::cleanup $urlToken
                 if {[catch {set urlToken [http::geturl $meta(Location) \
                         -timeout 10000]} errorMsg]} {
                     return ""
                 } else {
                     if {[http::status $urlToken] != "ok"} {
                         http::cleanup $urlToken
                         return ""
                     }
                 }
             }
             set webpage [http::data $urlToken]
             http::cleanup $urlToken
             set url ""
             #
             # Search the webpage for the homepage URL. Note that Freshmeat
             # again provides a URL that causes redirection.
             #
             if {![regexp {(?:<b>Homepage:</b><br>[[:space:]]*<a href=\"([^[:space:]]*)\">http://.*</a><br>)+?} $webpage match url]} {
                 # No homepage link was found on this page, so give up.
                 return ""
             }
             set url "http://freshmeat.net$url"
 
             if {[catch {set urlToken [http::geturl $url -timeout 10000]} \
                     errorMsg]} {
                 return ""
             }
             if {[http::ncode $urlToken] != 302} {
                 http::cleanup $urlToken
                 return ""
             }
             #
             # The redirected URL is the one we finally want.
             #
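             upvar #0 $urlToken state
             # Clear any headers left over from an earlier request.
             array unset meta
             array set meta $state(meta)
             http::cleanup $urlToken
             if {![info exists meta(Location)]} {
                 return ""
             }
             return $meta(Location)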
 
         } else {
             #
             # There was an error (catch thrown) in retrieving the Freshmeat
             # webpage.
             #
             return ""
         }
     } 
 
 } 
 
 FreshmeatMailFilter freshmeat
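
The listing reads the redirect target as `$meta(Location)`, which assumes the server spells the header name exactly that way. HTTP header names are case-insensitive, so a slightly more defensive lookup could look like the sketch below. `headerValue` is a hypothetical helper, not part of the ::http package, and the meta list shown is made-up sample data.

```tcl
# Hypothetical helper: find a header in a key/value list such as the one
# ::http stores in state(meta), ignoring the capitalisation of the name.
# Returns an empty string when the header is absent.
proc headerValue {metaList name} {
    foreach {key value} $metaList {
        if {[string equal -nocase $key $name]} {
            return $value
        }
    }
    return ""
}

# Sample data shaped like state(meta): alternating header names and values.
set sampleMeta {Content-Type text/html location http://example.org/tclcurl/}
puts [headerValue $sampleMeta Location]
```

In getHomepageUrl this would be called as `headerValue $state(meta) Location` before `http::cleanup`, with an empty result treated the same way as any other failure.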