Version 3 of Searching and bookmarking URLs on the Tcl'ers Wiki

Updated 2001-11-30 00:00:15

Searching and bookmarking is quite flexible in WiKit.

To search for the word "cgi" in all page titles, you can use the URL:

        http://purl.org/tcl/wiki?cgi

To search for this word in all titles and in the full texts, use:

        http://purl.org/tcl/wiki?cgi*
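As a small illustration, the two query forms above can be assembled programmatically. This is a sketch in Python, not part of WiKit: the function names are made up for this example, and percent-encoding the word with `urllib.parse.quote` is an assumption about what the wiki accepts for words containing special characters.

```python
from urllib.parse import quote

BASE = "http://purl.org/tcl/wiki"  # base URL used throughout this page


def title_search_url(word):
    """URL that searches page titles only (hypothetical helper)."""
    return f"{BASE}?{quote(word)}"


def fulltext_search_url(word):
    """URL that searches titles and full texts: same form plus a trailing '*'."""
    return f"{BASE}?{quote(word)}*"


print(title_search_url("cgi"))     # http://purl.org/tcl/wiki?cgi
print(fulltext_search_url("cgi"))  # http://purl.org/tcl/wiki?cgi*
```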

Or, if you prefer, you can enter the search word on the search page, at:

        http://purl.org/tcl/wiki/search

But there's a little more to it. That last URL is actually a form of fuzzy bookmarking. There is no web page called "search". WiKit presents its contents as if it were a directory with pages, but it's all smoke and mirrors...

First of all, note that all WiKit pages have a unique identifying number. The "About" page is at http://purl.org/tcl/wiki/1.html , for example. But although these unique IDs are effective for internal links, they are quite awkward as bookmarks, since they convey no information whatsoever about the title or contents of a page.

To offer a more useful way of bookmarking, pages which are not of the form <number>.html are treated as search instructions to locate a page. The following URL is an instruction to look for a page titled "hawaii":

        http://purl.org/tcl/wiki/hawaii

Assuming there is a page titled "hawaii" (case is ignored), the above URL will lead directly to that page.

But wikis change. So do page titles, occasionally. Some page titles are long and may contain embedded spaces or other inconvenient characters. This all makes the above search mechanism a bit too brittle for long-lasting URLs.

The solution adopted here is to refine the search process as follows (everything after the slash is called the search term):

  1. If the search term is a reference to a page (<number>.html), then simply go to that page.
  2. If the search term matches a page title (while ignoring case), then jump to the page with that title.
  3. If the search term includes one or more upper-case letters, modify the search to be approximate (see below). If the approximate match finds exactly one page, jump to that page.
  4. Otherwise, treat the search term as a regular search, and present the search results.

Approximate matching: if the search term has upper-case letters, for example "OneTwoThree", it is turned into a match pattern (using the glob / string match syntax). In the example given, the search is performed on page titles matching the pattern "*[Oo]ne*[Tt]wo*[Tt]hree*".
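The four steps and the approximate pattern can be sketched as follows. This is an illustration in Python, not WiKit's actual code: `fnmatch.fnmatchcase` stands in for Tcl's case-sensitive `string match`, the function names and the sample titles are invented, and the rule "every upper-case letter X becomes `*[Xx]`, plus a trailing `*`" is an assumption inferred from the OneTwoThree example.

```python
import re
from fnmatch import fnmatchcase


def approximate_pattern(term):
    """Build a glob pattern from a search term (assumed rule, see lead-in)."""
    parts = []
    for ch in term:
        if ch.isupper():
            parts.append(f"*[{ch}{ch.lower()}]")  # match the letter in either case
        elif ch in "*?[":
            parts.append(f"[{ch}]")  # keep glob metacharacters literal
        else:
            parts.append(ch)
    return "".join(parts) + "*"


def resolve(term, titles):
    """Return ("page", id) or ("search", term) for a search term.

    `titles` maps page numbers to page titles (hypothetical data model).
    """
    # Step 1: "<number>.html" refers directly to a page.
    m = re.fullmatch(r"(\d+)\.html", term)
    if m:
        return ("page", int(m.group(1)))
    # Step 2: exact title match, ignoring case.
    for page_id, title in titles.items():
        if title.lower() == term.lower():
            return ("page", page_id)
    # Step 3: approximate match, only when the term has upper-case letters
    # and only when exactly one title matches.
    if any(ch.isupper() for ch in term):
        pattern = approximate_pattern(term)
        hits = [pid for pid, t in titles.items() if fnmatchcase(t, pattern)]
        if len(hits) == 1:
            return ("page", hits[0])
    # Step 4: fall back to a regular search.
    return ("search", term)


# Invented sample titles, for illustration only.
titles = {1: "About", 2: "Expect", 3: "CGI Web Server", 4: "Web Tools"}

print(approximate_pattern("OneTwoThree"))  # *[Oo]ne*[Tt]wo*[Tt]hree*
print(resolve("1.html", titles))
print(resolve("expect", titles))
print(resolve("CGIWeb", titles))
print(resolve("Web", titles))
```

With these sample titles, "CGIWeb" resolves to a single page, while "Web" matches two titles and therefore falls through to a regular search.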

What's the point of all this? Well... this mechanism allows you to specify URLs pointing into the Tcl'ers Wiki with some quite attractive properties:

  • If the search keyword is accurate enough, it's equivalent to a real URL
  • If the search is general enough, it'll survive minor title changes (e.g. typos)
  • The URL has a meaningful word in it, so people can remember what it was about
  • If more pages are added to the wiki, the search may start to turn up more than one match. This is an extremely useful feature, because the original match will be one of the search results listed, and so will the new (probably related) pages.

For an example, here's a link to Don Libes' book on Expect:

        http://purl.org/tcl/wiki/Expect

Note the subtle difference with a link which is intended to act like a search:

        http://purl.org/tcl/wiki?Expect

And here's a search which lists all pages where the word "expect" is used:

        http://purl.org/tcl/wiki?expect*

Some searches give more hits than you would like:

        http://purl.org/tcl/wiki/Web

And some work out nicely (right now there is a single match):

        http://purl.org/tcl/wiki/CGIWeb

But it's not all peaches - the following won't match:

        http://purl.org/tcl/wiki/CgiWeb

(reason: the string match is case-sensitive, so the lower-case "gi" in the search term does not match upper-case letters in a title; maybe this can be improved).
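To see the difference, the two patterns can be written out by hand and tried against a sample title. This sketch uses Python's `fnmatch.fnmatchcase` as a stand-in for Tcl's `string match`. The patterns assume each upper-case letter becomes `*[Xx]`, as in the OneTwoThree example, and the title is invented. The last check shows one possible improvement (lower-casing both sides), which is only a suggestion and not something WiKit is known to do.

```python
from fnmatch import fnmatchcase

title = "CGI Web Server"  # invented title, for illustration only

pattern_cgiweb = "*[Cc]*[Gg]*[Ii]*[Ww]eb*"  # assumed pattern for "CGIWeb"
pattern_cgi_web = "*[Cc]gi*[Ww]eb*"         # assumed pattern for "CgiWeb"

# "CGIWeb": every letter of "CGI" is upper-case, so each may match either case.
print(fnmatchcase(title, pattern_cgiweb))   # True

# "CgiWeb": the literal lower-case "gi" cannot match "GI" in the title.
print(fnmatchcase(title, pattern_cgi_web))  # False

# Possible improvement: compare lower-cased title and pattern instead.
print(fnmatchcase(title.lower(), pattern_cgi_web.lower()))  # True
```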

Conclusion: the Tcl'ers Wiki has several ways to help you define bookmarks which do not break quickly when the wiki changes (which it will, constantly!).

-- JC


LV: 2000/March 24

How do I express a URL using the fuzzy matching and multiple words?


RWT: 2000/March 26

I believe that you capitalize the first letter of each word. For instance, this page is http://purl.org/tcl/wiki/SearchingAndBookmarkingURLsOnTheTcl'ersWiki
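A minimal sketch of that capitalization rule, in Python. The helper name is made up, and the rule itself is only the suggestion given above, not a documented WiKit feature.

```python
BASE = "http://purl.org/tcl/wiki"


def bookmark_term(title):
    """Capitalize the first letter of each word and drop the spaces.

    The rest of each word is left untouched, so "URLs" stays "URLs".
    """
    return "".join(word[:1].upper() + word[1:] for word in title.split())


term = bookmark_term("Searching and bookmarking URLs on the Tcl'ers Wiki")
print(f"{BASE}/{term}")
# http://purl.org/tcl/wiki/SearchingAndBookmarkingURLsOnTheTcl'ersWiki
```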

Wouldn't it be great if the Wiki generated a bookmark URL at the bottom of each page? (Complete with the purl address!) Something like adding a "Bookmark" link to the end of the footer at the end of the page. Then you could easily copy the link into news, email, or other web pages.


LV 2001/June/19: Do the words become a phrase or are they independently ANDed together? And if I stick an * after a series of words like that, is it a search for any (or all?) of these words in a page?


glennj: 2001-06-19

What if I want to search for any wiki pages that contain the words "windows" and "start", but not necessarily the term "windows start"??


Category Wikit - Category Tcler's Wiki