Updated 2012-09-28 10:49:01 by RLE

The purpose of this page is to create a library that provides a markup language, specifically a wiki-like one, which could be plugged into a text widget. This idea is open for discussion.

If you are looking for an existing product or library to use, Lightweight markup languages lists libraries that already support some lightweight markup language.

Larry Smith I can't help but feel that any kind of markup is just so...20th century. A wiki really should use a WYSIWYG editor that understands enough HTML markup for the wiki to function. CKEditor (formerly FCKEditor) is one such tool, TinyMCE is another. Such tools obviate the need for dealing with the ugly, slimy innards of HTML markup while also eliminating the need to learn the markup - which inevitably differs from that of otherwise similar wikis. The differences in markup between (for example) tiddlywiki and this wiki are enough to drive me to the man pages nearly every time I post.

Point of discussion 1: How bad would it be to deviate from the format used by the tcler's wiki?

A couple years ago I implemented my own wiki (alas, lost to the corporate giants that laid me off a while ago...). In it I changed a few things from the way the tcler's wiki works. For example, bullets in my wiki were done without leading spaces.

Here's an example:

  • this is a level 1 bullet (no leading spaces)
    • this is level two; this item is spread across
       three physical lines in the source, and will
       automagically be combined in the rendered output

  • another level 1 bullet

I also used sequences of "#" for numbered lists:
    # this will show up with a leading "1."
    ## this will show up as "1." (or a.)
    ## this will show up as "2." (or b.)
    # this will be "2."
    # this will be "3.", and so on...

I stole that idea from the meatball wiki [1]. Also stolen from the meatball wiki are definition lists: ";term:definition". Unlike that wiki, my indented text has leading ">"'s instead of colons.
    ;concept1:definition1
    ;concept2:definition2

    > indented one level
    >> indented two levels
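As a rough sketch of how those leading markers might be recognised (in Python for brevity; a Tcl version would have the same shape), the function below classifies one source line. The "*" bullet marker is an assumption, since only rendered bullets are shown above, and the names classify_line, _LINE_PATTERNS and _DEFINITION are made up for illustration.

```python
import re

# Assumed source markers: "*" for bullets (only rendered bullets appear on
# this page), "#" for numbered items, ">" for indented text.
_LINE_PATTERNS = [
    ("bullet", re.compile(r"^(\*+)\s*(.*)$")),
    ("number", re.compile(r"^(#+)\s*(.*)$")),
    ("indent", re.compile(r"^(>+)\s*(.*)$")),
]
# Definition lists use ";term:definition".
_DEFINITION = re.compile(r"^;([^:]*):(.*)$")


def classify_line(line):
    """Return (kind, level, payload) for one source line."""
    m = _DEFINITION.match(line)
    if m:
        return ("definition", 1, (m.group(1).strip(), m.group(2).strip()))
    for kind, pattern in _LINE_PATTERNS:
        m = pattern.match(line)
        if m:
            # The number of repeated marker characters gives the nesting level.
            return (kind, len(m.group(1)), m.group(2))
    return ("text", 0, line)
```

Because the markers must start in column one, a line with leading spaces falls through to "text", which fits the no-leading-spaces rule described above.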

Inline emphasis I'm still waffling on. For example, do we stick with sequences of single quotes, or do we go with common usage on usenet, such as *bold* and _underline_? I see advantages to both.
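To make the usenet-style option concrete, here is a minimal sketch (Python, not a settled design) that splits a line into (chunk, tag) pairs. The tag names "bold" and "underline" and the function name split_emphasis are hypothetical.

```python
import re

# *bold* and _underline_ in the usenet style. The delimiters must hug
# non-space characters, so "2 * 3 * 4" is left alone.
_EMPHASIS = re.compile(r"\*(\S(?:.*?\S)?)\*|_(\S(?:.*?\S)?)_")


def split_emphasis(text):
    """Split text into (chunk, tag) pairs; tag is None for plain text."""
    segments = []
    pos = 0
    for m in _EMPHASIS.finditer(text):
        if m.start() > pos:
            segments.append((text[pos:m.start()], None))
        if m.group(1) is not None:
            segments.append((m.group(1), "bold"))
        else:
            segments.append((m.group(2), "underline"))
        pos = m.end()
    if pos < len(text):
        segments.append((text[pos:], None))
    return segments
```

Pairs like these map fairly directly onto a text widget's insert operation, which accepts a tag list alongside the characters. One caveat of this style: an identifier such as snake_case_name would have its middle word treated as underlined, which is a point in favour of the single-quote sequences.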

Point of discussion 2: "paragraph" versus "line" parsing

In most wikis today, lines are tackled one at a time, and the format of each line is guessed at based on its leading characters. When I wrote my wiki code a while back, I found this difficult to handle. Well-behaved text was easy enough, but it left some annoying edge cases.

I propose a paragraph-oriented solution. The source text would be split into paragraphs (separated by \n\n). The first N characters of the first line define the format for the whole paragraph. The advantage is that, within a paragraph, having one logical line broken into several physical lines is not a problem. Look at the bullet example above. Because we are in a "bulleted paragraph", lines that don't have bullets can be concatenated to the previous lines. This makes it easy to continue lines without having to resort to backslashes at the end of the lines.
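A minimal sketch of that idea, in Python and under the assumption that "*" is the bullet marker in the source text (the names split_paragraphs and parse_bullets are invented here):

```python
import re

# Assumption: "*" in column one opens a bullet item; repeats give the level.
_BULLET = re.compile(r"^(\*+)\s*(.*)$")


def split_paragraphs(source):
    """Split source text into paragraphs separated by blank lines."""
    return [p for p in re.split(r"\n\s*\n", source) if p.strip()]


def parse_bullets(paragraph):
    """Return [(level, text)] for a bulleted paragraph.

    Lines that do not start with a bullet marker are joined onto the
    previous item, so no trailing backslashes are needed.
    """
    items = []
    for line in paragraph.splitlines():
        m = _BULLET.match(line)
        if m:
            items.append([len(m.group(1)), m.group(2)])
        elif items and line.strip():
            items[-1][1] += " " + line.strip()
    return [(level, text) for level, text in items]
```

Fed the bullet example from earlier on this page, the first paragraph would yield a level 1 item and a single level 2 item whose three physical lines have been joined, and the second paragraph would yield the other level 1 item.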

In addition, this can give added control to the user. For example, if someone wants bulleted items to not have blank lines between them, they can all be put into a single paragraph. If you want spaces between them, make each bullet a separate paragraph. For numbered items, the numbers can start over for each paragraph. So you can have blocks of sequential numbers, a space, and then start all over again. (I'm probably not describing that very well)
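The numbering rule may be easier to see in code. This is only a sketch (Python; number_items and number_page are hypothetical names): counters live for the duration of one paragraph, so each paragraph starts again at 1, and leaving a nested level resets that level's counter.

```python
def number_items(paragraph):
    """Label each "#"-prefixed line of one paragraph.

    Returns [(level, number, text)]; counters start fresh on every call.
    """
    counters = []  # counters[i] is the running count at level i + 1
    labelled = []
    for line in paragraph.splitlines():
        level = len(line) - len(line.lstrip("#"))
        if level == 0:
            continue  # not a numbered line; ignored in this sketch
        del counters[level:]  # dropping back out of a deeper level resets it
        while len(counters) < level:
            counters.append(0)
        counters[level - 1] += 1
        labelled.append((level, counters[level - 1], line[level:].strip()))
    return labelled


def number_page(source):
    """Number every paragraph independently (paragraphs split on \\n\\n)."""
    return [number_items(p) for p in source.split("\n\n")]
```

Whether a level 2 number is then shown as "1." or "a." is left to the renderer.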

I also find this conceptually easier to understand. Just say each paragraph is treated as a unit, and the first few characters describe how that block of data is handled.

Finally, by parsing along paragraph lines, it limits "bad" markup to a single paragraph. For example, if you have a dangling ] or something like that, it only affects one paragraph.

Point of discussion 3: "pluggable formats"

I've wondered about the ability to mimic Unix's shebang method to describe how to process the contents of a wiki page. For example, say we wanted a wiki page to display tabular data. It might be marked up like this (using #? instead of #! to avoid confusion):
    #? tabular
    row1 column1 | column2 | column3
    row2 column1 | column2 | column3

By default, wiki pages would be, well, wiki pages. But if you have a unique page you want to display in a unique format, that could be supported. We could have formats for address books, conference schedules, code listings, etc.
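One way this dispatch might look, sketched in Python. The registry FORMATS and the handler names are hypothetical, and the default wiki renderer is only a placeholder that returns its input.

```python
def render_tabular(body):
    """Split each row on "|" and trim the cells."""
    return [[cell.strip() for cell in row.split("|")]
            for row in body.splitlines() if row.strip()]


def render_wiki(body):
    """Placeholder for the default wiki renderer."""
    return body


# Hypothetical registry mapping a "#?" format name to its handler.
FORMATS = {"tabular": render_tabular, "wiki": render_wiki}


def render_page(source):
    """Dispatch on a leading "#? name" line; default to the wiki format."""
    first, _, rest = source.partition("\n")
    if first.startswith("#?"):
        name = first[2:].strip()
        if name not in FORMATS:
            raise ValueError("unknown page format: " + name)
        return FORMATS[name](rest)
    return FORMATS["wiki"](source)
```

Adding a format for address books or conference schedules would then just mean registering another handler under a new name.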

Another feature of this would be for forward-compatibility. Assume for the sake of argument we build a tklib module that groks the new format. We could use this feature within a wikit to allow pages to be migrated. Existing pages could be preprocessed to include "#?old-style-wiki" (or something to that effect), and the renderer could know how to render that. As pages are hand-converted (or automatically converted) to the new format, those leading lines could be removed.

Hmmm. That certainly looks easier to understand than the existing wiki rendering engine. My specific goal is to get something that renders in a tk text widget with a minimum of fuss.

There have been several markup conventions for rich text. At least one made it to an RFC (Are you thinking of [2]?), one has been popular in the Mac community for a number of years [3], and then there are the Formatting Rules for the wiki.

This thing you're talking about is called setext, and you should read this if you're interested in it: http://docutils.sourceforge.net/mirror/setext/setext_concepts_Aug92.etx.txt --ro

There is also the markup in Almost Free Text [4] --escargo (10/25/2002)

See htext and Structured teXt for similar ideas.

See also Notebook App's markup.

GRIDPLUS has a text widget markup facility [5].

LV From a purely markup point of view, AK's work in doctools allows one to mark up in doctools, then generate wiki, html, and other markups.