How to generate a Recent Changes RSS Feed

The script included on this page, written by Shane McDonald, will generate an RSS feed of the Recent Changes to a Wikit-based Wiki.

On our local Wiki, I run this script as a cron job every 15 minutes. I execute it with tclkit, so that Metakit is available to the script. Three variables at the top of the file need to be configured for your particular setup; they are explained in the script's comments.

The script works like this: first it opens the Wikit database, then it grabs a list of pages sorted by date of last change (most recently changed first), and finally it writes the RSS feed to standard output. My cron job redirects standard output to a file accessible to our webserver.

Yuck, I don't think I've explained this very well. That's the beauty of a Wiki -- feel free to update this description so that it makes sense.

Now, if we wanted to set this up on the Tcler's Wiki, the server would need to execute the script periodically to generate the XML file, and we'd need to tell people where to access that XML file.

OK, here's the script. Feel free to comment on it...


   ###############################################################
   #
   # Configure this script using three simple variables!
   # - WikiFile contains the name of the Wikit database,
   #   typically called "wikit.tkd".
   # - WikiBaseUrl contains the URL of the Wiki.  This shouldn't
   #   have a page number (or a trailing /), as it's used to
   #   generate the links for changed pages.
   # - MaxItems specifies the maximum number of changed pages to include
   #   in the RSS feed.
   #
   ###############################################################

   set WikiFile /home/wiki/wikit/public_html/wiki/wikit.tkd
   set WikiBaseUrl http://www.dummydomain.com/wiki
   set MaxItems 10


   ###############################################################
   #
   # The following constants are used to hide magic numbers
   #
   ###############################################################

   set SearchPageNum 2
   set RecentChangesPageNum 4


   ###############################################################
   #
   # The following procedures are used to generate the RSS feed.
   #
   ###############################################################

   # genHeader generates the header of the XML file.  Basically, it tells
   # what version of RSS is being used.
   #
   # No parameters are expected.

   proc genHeader {} {

     puts "<?xml version=\"1.0\"?>"
     puts "<!DOCTYPE rss PUBLIC \"-//Netscape Communications//DTD RSS 0.91//EN\" \"http://my.netscape.com/publish/formats/rss-0.91.dtd\">"
     puts ""
     puts "<rss version=\"0.91\">"
     puts ""
   }

   # genChannel generates the channel information for the feed.  It says
   # what the name of the feed is (the same as the Wiki name), and where
   # the feed comes from (the Wiki URL).
   #
   # Two parameters are required:
   # - WikiName: the name of the Wiki
   # - Url: the URL of the Wiki

   proc genChannel {WikiName Url} {

     puts "  <channel>"
     puts "    <title>$WikiName - Recent Changes</title>"
     puts "    <link>$Url</link>"
     puts "    <description>Recent changes to $WikiName</description>"
     puts "  </channel>"
     puts ""
   }

   # genItem generates a single news item for the feed.
   # Each item is a different Wiki page.
   #
   # Four parameters are required:
   # - Title: the title of the Wiki page
   # - Time: the time (in "[clock seconds]" format) that the Wiki page
   #   was last modified
   # - Author: who last changed the Wiki page
   # - Url: the URL of the Wiki page
   #
   # You may not like the format that I've chosen to display the
   # news item name in.  It's pretty easy to change.
   # See the lines I've commented out for alternate formats.

   proc genItem {Title Time Author Url} {

     set Title [htmlQuote $Title]
     puts "  <item>"

     # This line causes the news item to look like:
     #   Wiki Page (June 19 11:40)
     puts "    <title>$Title ([clock format $Time -format "%b %d %H:%M"])</title>"

     # Alternatively, you could use one of these:
     #   puts "    <title>$Title ($Author at [clock format $Time])</title>"
     # for format:
     #   Wiki Page (127.0.0.1 at Thu Jun 19 11:41:00 CST 2003)
     #
     # or:
     #   puts "    <title>$Title</title>"
     # for format:
     #   Wiki Page

     puts "    <link>$Url</link>"
     puts "  </item>"
     puts ""
   }

   # genTail generates the tail of the XML file.
   # It just ends off the RSS element.
   #
   # No parameters are expected.

   proc genTail {} {

     puts "</rss>"
   }

   # htmlQuote protects the tricky characters in a string

   proc htmlQuote {s} {
     string map { & &amp; < &lt; > &gt; } $s
   }

   ###############################################################
   #
   # The remainder of this file generates the RSS file.
   #
   ###############################################################

   package require Mk4tcl

   mk::file open WikiDB $WikiFile -nocommit -readonly
   mk::view layout WikiDB.pages {name page date:I who}

   # The Wikit implementation has the Wiki name as the name of page 0.
   set WikiName [mk::get WikiDB.pages!0 name]

   # Get a list of all page numbers, ordered from most recently changed
   # to least recently changed.
   set PageList [mk::select WikiDB.pages -rsort date]

   # Delete the "Search" and "Recent Changes" pages from the page list
   set index [lsearch -exact $PageList $SearchPageNum]
   if { $index != -1 } {
     set PageList [lreplace $PageList $index $index]
   }
   set index [lsearch -exact $PageList $RecentChangesPageNum]
   if { $index != -1 } {
     set PageList [lreplace $PageList $index $index]
   }

   # Determine the number of pages to include in the RSS file.
   # At most, it can be $MaxItems, but if there aren't that many
   # pages, it's all the pages.
   set NumPages [llength $PageList]
   set NumItems [expr {$NumPages < $MaxItems ? $NumPages : $MaxItems}]

   # Generate the XML file

   genHeader

   genChannel $WikiName $WikiBaseUrl

   for {set i 0} {$i < $NumItems} {incr i} {

     set PageNum [lindex $PageList $i]

     lassign [mk::get WikiDB.pages!$PageNum name date who] name date who

     genItem $name $date $who $WikiBaseUrl/$PageNum

   }

   genTail

   mk::file close WikiDB

And the applicable line of my crontab looks like this:

   0,15,30,45 * * * * /home/mcdonald/public_html/tclkit /home/mcdonald/WikiScripts/makeChangesRss.tcl > /home/mcdonald/public_html/wiki-changes.xml

Wow, this is terrific: ready-made to be placed on the server. There was one problem with the code: the <channel>...</channel> element should really enclose all the items, so I've moved "</channel>" to the end. I also tweaked the layout; the result is at:

https://wiki.tcl-lang.org/rss.xml

It's updated every 15 minutes as you suggested. So... we have an RSS feed. Thank you! -jcw
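Here is a minimal sketch of the corrected nesting. Every <item> sits inside <channel>, and </channel> comes just before </rss>. It builds the feed as a string instead of calling puts. The proc name buildFeed and its list of {title url} pairs are assumptions for illustration; they are not part of the script above.

```tcl
# Hypothetical helper (not part of the script above): builds an RSS 0.91
# feed as a string, with every <item> nested inside <channel>.
# items is a list of {title url} pairs.
proc buildFeed {wikiName url items} {
    set xml "<?xml version=\"1.0\"?>\n<rss version=\"0.91\">\n"
    append xml "  <channel>\n"
    append xml "    <title>$wikiName - Recent Changes</title>\n"
    append xml "    <link>$url</link>\n"
    append xml "    <description>Recent changes to $wikiName</description>\n"
    foreach item $items {
        lassign $item title link
        # Escape XML special characters in the page title.
        set title [string map {& &amp; < &lt; > &gt;} $title]
        append xml "    <item>\n"
        append xml "      <title>$title</title>\n"
        append xml "      <link>$link</link>\n"
        append xml "    </item>\n"
    }
    # Close the channel only after the last item, then close the feed.
    append xml "  </channel>\n</rss>\n"
    return $xml
}

puts -nonewline [buildFeed "Demo Wiki" http://www.dummydomain.com/wiki \
    {{"People & Community" http://www.dummydomain.com/wiki/10}}]
```

In the script itself, the same effect comes from moving the "</channel>" line out of genChannel and into genTail, just before "</rss>".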


NEM - Thank you! For the last month or so I've been using a home-spun CGI script that grabbed the recent changes of this wiki and converted them to an RSS feed. My script only updated once an hour, though, and involved web-scraping the HTML page. This is much better.

I think there is some version of RSS in which <channel> surrounds only the general channel information, not the items. For instance, I believe slashdot's RSS feed is like this. It's a real PITA, as most sites seem to use version 0.91 RSS (<channel> surrounds everything) but a few others use the other style (I think this is 1.0, but I haven't really studied RSS beyond getting the basics working).
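For comparison, here is a rough sketch of the RSS 1.0 (RDF) layout, assumed here to be the "other style" described above. <channel> holds only the channel metadata plus an rdf:Seq list of item URLs, and the <item> elements follow it as siblings under rdf:RDF. The proc name buildRss10 is hypothetical. Treat the output as a skeleton to check against the RSS 1.0 specification, not a complete feed.

```tcl
# Hypothetical sketch of the RSS 1.0 (RDF) layout: <channel> lists the
# items by URL in an rdf:Seq, and the <item> elements follow it as
# siblings under <rdf:RDF>. items is a list of {title url} pairs.
proc buildRss10 {wikiName url items} {
    set xml "<?xml version=\"1.0\"?>\n"
    append xml "<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\"\n"
    append xml "         xmlns=\"http://purl.org/rss/1.0/\">\n"
    append xml "  <channel rdf:about=\"$url\">\n"
    append xml "    <title>$wikiName - Recent Changes</title>\n"
    append xml "    <link>$url</link>\n"
    append xml "    <description>Recent changes to $wikiName</description>\n"
    append xml "    <items>\n      <rdf:Seq>\n"
    foreach item $items {
        lassign $item title link
        append xml "        <rdf:li rdf:resource=\"$link\"/>\n"
    }
    append xml "      </rdf:Seq>\n    </items>\n"
    # The channel closes here, before any <item> element.
    append xml "  </channel>\n"
    foreach item $items {
        lassign $item title link
        set title [string map {& &amp; < &lt; > &gt;} $title]
        append xml "  <item rdf:about=\"$link\">\n"
        append xml "    <title>$title</title>\n"
        append xml "    <link>$link</link>\n"
        append xml "  </item>\n"
    }
    append xml "</rdf:RDF>\n"
    return $xml
}
```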


NEM 12Jan04 This RSS feed is broken. It needs to escape XML special characters in page titles. For instance, there is currently a page titled People & Community in the recent changes; this should be escaped as People &amp; Community in the RSS feed, otherwise newsreaders and pages using the feed (like mine) will break. A simple string map is all that is required.

Indeed! Thanks Neil, fixed -jcw
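The script's htmlQuote proc already does this for &, < and >. Below is a minimal sketch of the same string map approach, extended to the two quote characters so the result is also safe inside attribute values. The name xmlQuote is just for illustration.

```tcl
# Escape the characters that are special in XML text. string map makes a
# single left-to-right pass, so the "&" in the replacement strings is not
# escaped a second time. Quotes are mapped too, so the result is also
# safe inside attribute values.
proc xmlQuote {s} {
    string map {& &amp; < &lt; > &gt; \" &quot; ' &apos;} $s
}

puts [xmlQuote "People & Community"]   ;# prints: People &amp; Community
```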


JE 12 Jan 2004 It would be very helpful to include a timestamp as well; RSS aggregators can use this to sort entries from multiple feeds in chronological order. For RSS 2.0, the timestamp for each entry goes in a <pubDate> element, in RFC 822 format ("%a, %d %b %Y %H:%M:%S GMT").

Done! -jcw
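One way to produce that timestamp is from the [clock seconds] value that genItem already receives. This sketch assumes Tcl 8.5 or later (the script already needs 8.5 for lassign). In those versions clock format defaults to the C locale, which gives English day and month names. The proc name rfc822 is hypothetical. The -gmt 1 option matters: without it the hours come out in local time while the string still says GMT.

```tcl
# Format a [clock seconds] value as an RFC 822 date for <pubDate>.
# -gmt 1 makes the hours agree with the literal "GMT" suffix.
proc rfc822 {seconds} {
    clock format $seconds -format "%a, %d %b %Y %H:%M:%S GMT" -gmt 1
}

# Inside genItem, after the <link> line, one could add:
#   puts "    <pubDate>[rfc822 $Time]</pubDate>"
puts [rfc822 0]   ;# prints: Thu, 01 Jan 1970 00:00:00 GMT
```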


DG 26 Aug 04 Needs a logo :)

 set WikiLogoInfo [list https://wiki.tcl-lang.org/wiki-img.gif 102 31]

 ...

 proc genChannel {WikiName Url {imgList {}}} {

   puts "  <channel>"
   puts "    <title>$WikiName - Recent Changes</title>"
   puts "    <link>$Url</link>"
   puts "    <description>Recent changes to $WikiName</description>"
   # imgList is {url ?width height?}; pass nothing for no logo.
   if {[llength $imgList] > 0} {
     puts "    <image>"
     puts "      <title>$WikiName</title>"
     puts "      <url>[lindex $imgList 0]</url>"
     puts "      <link>$Url</link>"
     if {[llength $imgList] == 3} {
       puts "      <width>[lindex $imgList 1]</width>"
       puts "      <height>[lindex $imgList 2]</height>"
     }
     puts "    </image>"
   }
   puts "  </channel>"
   puts ""
 }

 ...

 genChannel $WikiName $WikiBaseUrl $WikiLogoInfo

Kroc 31/08/05 - It seems some RSS readers require the </channel> tag to be at the end, just before </rss>, instead of the place given in this script. Thunderbird doesn't care where this tag is. I don't know where the proper place for </channel> is, but https://wiki.tcl-lang.org/rss.xml puts it at the end, so I've done the same for the wfr.tcl.tk RSS feed.

13dec05 jcw - I've adjusted the RSS links to use wiki.tcl.tk like the rest of this site, instead of mini.net/tcl:

    set WikiBaseUrl https://wiki.tcl-lang.org

LV - 2009-10-28 12:30:03

With recent changes to the wiki, do these instructions still apply? For instance, isn't the wikit.tkd now in a different format?