Updated 2007-11-01 13:04:27 by escargo

There seems to be a small variety of formatting rules across the various wikis on the web. What should we use?

Bryan likes the fact that references in the Tcl'ers Wiki use square brackets (e.g. [TclWiki Formatting Rules]).

Should we consider allowing raw HTML? This would require a special escape character. I stole this idea from a wiki used by AT&T -- they said they use a vertical bar at the start of a line to mark it as raw HTML. Is there really a need?
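A minimal Tcl sketch of the vertical-bar convention, assuming that a leading "|" marks a raw HTML line and that every other line should be HTML-escaped so it cannot inject markup. The proc name `renderBarLines` is hypothetical; nothing here is taken from the AT&T wiki or from Wikit.

```tcl
# Hypothetical sketch: lines starting with "|" are raw HTML (the bar is
# dropped); all other lines are escaped so they render as plain text.
proc renderBarLines {page} {
    set out {}
    foreach line [split $page \n] {
        if {[string index $line 0] eq "|"} {
            # Raw HTML line: pass through everything after the bar.
            lappend out [string range $line 1 end]
        } else {
            # Ordinary line: neutralise characters that HTML treats specially.
            lappend out [string map {& &amp; < &lt; > &gt;} $line]
        }
    }
    return [join $out \n]
}
```

For example, `renderBarLines "a < b\n|<b>bold</b>"` escapes the first line to `a &lt; b` and passes the second through as `<b>bold</b>`.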

Bryan thinks the clumping of single quotes for emphasis is annoying, but since other wikis use it, we should probably stick with it. Still, it makes me wonder why the original wiki didn't use the more common *emphasis* or _emphasis_.

Larry Virden says that Bryan's idea reminds him of setext - a format very common in the Mac world. It seems some ideas could be stolen from there to make even better formatting options.

Bryan would also like to be able to have nested bulleted lists. The markup could be a bastardization of the current bullet scheme and the current emphasis scheme (i.e. clumping of characters). For example (using . instead of * so it won't be rendered as a bullet in the example):
    . a level one bullet
    .. a level two bullet
    ... a level three bullet

That seems fairly trivial to implement.
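As a rough check of the "fairly trivial" claim, here is a minimal Tcl sketch that turns one block of `*`, `**`, `***` bullet lines into nested HTML lists, using the number of asterisks as the nesting level. The proc name `bulletsToHtml` is hypothetical, and the sketch assumes each marker run is followed by whitespace; a level that jumps by more than one step is clamped so every inner `<ul>` stays inside an `<li>`.

```tcl
# Hypothetical helper: convert a block of "*", "**", "***" ... bullet
# lines into nested HTML lists. The number of leading asterisks is the
# nesting level; deeper jumps are clamped to one level at a time.
proc bulletsToHtml {block} {
    set html ""
    set depth 0
    foreach line [split $block \n] {
        # Skip anything that is not a bullet line.
        if {![regexp {^(\*+)\s+(.*)$} $line -> stars text]} {
            continue
        }
        set level [string length $stars]
        if {$level > $depth + 1} {
            set level [expr {$depth + 1}]
        }
        if {$level > $depth} {
            # Open a new list inside the still-open parent item.
            append html "<ul>"
            set depth $level
        } else {
            # Close the previous item, then any lists we are leaving.
            append html "</li>"
            while {$depth > $level} {
                append html "</ul></li>"
                incr depth -1
            }
        }
        append html "<li>" [string map {& &amp; < &lt; > &gt;} $text]
    }
    # Close whatever is still open at the end of the block.
    while {$depth > 0} {
        append html "</li></ul>"
        incr depth -1
    }
    return $html
}
```

For instance, `bulletsToHtml "* a\n** b\n** c\n* d"` returns `<ul><li>a<ul><li>b</li><li>c</li></ul></li><li>d</li></ul>`.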

JC: well, I gotta admit that I too like the square brackets, but that's probably obvious... :o)

As for raw HTML, I would vote 'no' for the mix-in approach. We have XML coming up for content (with XSL taking care of style, graphically speaking), and XHTML trying to consolidate it all. With mixed style, it's gonna be a helluva tricky business to make this work long-term. And documents (unlike styles) are often about long-term information.
   Bryan adds... now that I think about it, raw HTML is probably bad
   for another reason -- it makes it difficult to render in Tk without
   having a full-blown HTML widget.

In fact, I might even vote 'no' for HTML - for the simple reason that we already have a mechanism for that: good ol' static pages, served as "*.html" files. There is some inconvenience in mixing the two, but it's in fact what I'm considering doing with my own site: part of it as HTML pages, and part of it in a Wiki, with hyperlinks between them. Maybe even a hyperlink to a dedicated wiki page on each HTML page? That would make an instant annotation system...

Larry Virden wonders aloud ... perhaps a titling notation, where if a title ends in .htm, the file is treated as a static page (much as titles ending in .tcl are treated). Of course, there should be a way in the package to turn that on and off.
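The titling idea comes down to a suffix test plus a switch. A minimal Tcl sketch, assuming both .htm and .html count and that the switch lives in a package-level variable; the namespace `wikiopts` and the proc `pageKind` are hypothetical names, not part of Wikit.

```tcl
# Hypothetical on/off switch for treating *.htm titles as static pages.
namespace eval wikiopts {
    variable staticHtmlTitles 1
}

# Decide whether a page title should be served as a static HTML file
# or rendered as an ordinary wiki page.
proc pageKind {title} {
    if {$::wikiopts::staticHtmlTitles && [regexp -nocase {\.html?$} $title]} {
        return static
    }
    return wiki
}
```

Setting `::wikiopts::staticHtmlTitles` to 0 makes every title, including `About.htm`, an ordinary wiki page again.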

JCW Bulleted lists: yes, please. They are not in Wikit / the Tcl'ers Wiki merely because of my laziness and lack of time. The approach I have seen is that the level of indentation defines the nesting level:
 .   * this is 1
 .      * this is 1.1
 .      * this is 1.2
 .   * this is 2
 .   * this is 3

(I've prepended a dot just to make this wiki ignore the formatting)
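One way to read nesting from indentation is to keep a stack of the indentation widths seen so far: a deeper indent opens a new level, and a shallower one pops back to the enclosing level. This is a minimal Tcl sketch of that idea, assuming indentation uses spaces only (no tabs); `bulletLevels` is a hypothetical name and returns `{level text}` pairs rather than HTML.

```tcl
# Hypothetical helper: assign a nesting level to each "*" bullet line
# from its indentation, using a stack of indentation widths.
proc bulletLevels {block} {
    set result {}
    set stack {}
    foreach line [split $block \n] {
        if {![regexp {^( *)\*\s+(.*)$} $line -> indent text]} {
            continue
        }
        set width [string length $indent]
        # Pop levels that are indented more than this line.
        while {[llength $stack] > 0 && [lindex $stack end] > $width} {
            set stack [lrange $stack 0 end-1]
        }
        # Push a new level if this line is indented more than the top.
        if {[llength $stack] == 0 || [lindex $stack end] < $width} {
            lappend stack $width
        }
        lappend result [list [llength $stack] $text]
    }
    return $result
}
```

For JCW's example above (without the leading dots), this yields levels 1, 2, 2, 1, 1. A line indented between two existing levels is treated as opening a new level, which is one possible choice among several.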

I've also been very divided on the subjects of linking and allowing HTML. We are now using a WikiWikiWeb to document projects and internal systems, and to keep contact information up to date. But recently a sysadmin asked for the ability to add HTML, because he wanted to use tables. The idea of starting all lines of HTML with a bar is an interesting one; I thought about creating a logical block instead, like:
 <!-- four or more hashmarks starts a logical block of HTML -->
 <table><tr><td> wiki </td></tr></table>
 <!-- and ends it -->

I wonder how often users will break the HTML of a page.
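The logical-block idea can be sketched as a splitter that toggles between wiki text and raw HTML whenever it meets a line of four or more `#` characters. This Tcl sketch assumes that delimiter and returns `{kind text}` pairs; `splitHtmlBlocks` is a hypothetical name. It also shows where broken pages surface: an unterminated block simply runs to the end of the page.

```tcl
# Hypothetical sketch: a line of four or more '#' characters toggles
# between ordinary wiki text and raw HTML. Returns {kind text} pairs,
# where kind is "wiki" or "html".
proc splitHtmlBlocks {page} {
    set segments {}
    set kind wiki
    set buffer {}
    foreach line [split $page \n] {
        if {[regexp {^#{4,}\s*$} $line]} {
            # Flush what has been collected, then switch modes.
            if {[llength $buffer] > 0} {
                lappend segments [list $kind [join $buffer \n]]
            }
            set buffer {}
            set kind [expr {$kind eq "wiki" ? "html" : "wiki"}]
            continue
        }
        lappend buffer $line
    }
    # An unterminated HTML block is kept as html here; a stricter
    # renderer might escape it instead.
    if {[llength $buffer] > 0} {
        lappend segments [list $kind [join $buffer \n]]
    }
    return $segments
}
```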

The thing about linking in ClassicWiki style, InHungarianNotation or StudlyCaps, is that the namespace is somewhat limited. You might wind up with some duplicate pages, but that is less likely than when arbitrary mixed case and white space are allowed. A good argument for white space is that we have a woman at work who finds ThisKindOfLinkingHardToRead. I see this as an accessibility issue. --Steve Wainstead
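To make the trade-off concrete, here is a Tcl sketch that finds ClassicWiki-style link names and also spells them out with spaces for display, which addresses the readability complaint without giving up the restricted namespace. The pattern (two or more capitalized words run together) is an assumption, not taken from any particular wiki engine, and `findWikiWords` / `spellOut` are hypothetical names.

```tcl
# Find link names made of two or more capitalized words run together.
# (?:...) is non-capturing, so -inline returns whole matches only;
# \m and \M anchor the match at word boundaries.
proc findWikiWords {text} {
    return [regexp -all -inline {\m(?:[A-Z][a-z0-9]+){2,}\M} $text]
}

# Display form: insert a space before each capital that follows a
# lower-case letter or digit.
proc spellOut {wikiWord} {
    regsub -all {([a-z0-9])([A-Z])} $wikiWord {\1 \2} spaced
    return $spaced
}
```

For example, `spellOut ThisKindOfLinkingHardToRead` returns `This Kind Of Linking Hard To Read`, while all-caps words such as `HTML` are not treated as links.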

DKF: While there is probably a legitimate use for embedded HTML on private Wikis (like what Steve mentions above) I think that it is a really bad idea on a public Wiki like this one. The lack of fancy formatting forces people to concentrate on their content IMHO, and that is nothing less than a great thing...

BG: I agree that the dangers of embedded HTML outweigh the benefits on a public Wiki. Both Netscape and IE have had repeated JavaScript exploits reported, and once raw HTML also enables plugins/ActiveX and more, we lose the simplicity of information.

But let's consider another thing we're missing to some extent by not including raw HTML: support for other encodings and for glyphs beyond the default ISO-Latin-1. With Tcl's built-in capacity for:
  encoding convertto utf-8 $text

we could take the entire Wiki over to UTF-8 pages. The bulk of the wiki is ASCII, which is unmodified in UTF-8. Browsers are getting better with it, release by release.
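A small Tcl check of the claim that ASCII is unmodified in UTF-8: pure ASCII text keeps one byte per character, while a non-ASCII character such as U+00E9 (e with acute accent) becomes two bytes. The sample strings are arbitrary.

```tcl
# UTF-8 leaves ASCII bytes untouched, so mostly-ASCII wiki pages grow
# only where non-ASCII characters occur.
set ascii "Tcl'ers Wiki"
set accented "caf\u00e9"   ;# "cafe" with an acute accent, via a \u escape

set asciiBytes [encoding convertto utf-8 $ascii]
set accentedBytes [encoding convertto utf-8 $accented]

# prints: 12 chars -> 12 bytes
puts "[string length $ascii] chars -> [string length $asciiBytes] bytes"
# prints: 4 chars -> 5 bytes  (U+00E9 is encoded as 0xC3 0xA9)
puts "[string length $accented] chars -> [string length $accentedBytes] bytes"

binary scan $accentedBytes H* hex
# prints: 636166c3a9
puts $hex
```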

Larry Virden - one of the things that Tcl 8.3 is missing right now, in my opinion, is a better representation for special characters (à la HTML's &eacute;, etc.). Right now Tcl forces the programmer to look up the characters and encode them as \uXXXX escapes (for example \u00e9 for é) - that's NASTY.
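Until something like that exists, a script can approximate it with `string map`. This Tcl sketch decodes a handful of HTML-style named entities; `decodeEntities` is a hypothetical helper, not part of Tcl, and the entity table is only an illustrative subset.

```tcl
# Hypothetical helper: decode a few HTML-style named entities so text
# can be written as "caf&eacute;" instead of "caf\u00e9".
proc decodeEntities {text} {
    # The arguments to [list] are not braced, so each \uXXXX escape is
    # turned into its character. Entity names are quoted because a bare
    # ";" would otherwise end the command.
    set map [list \
        "&eacute;" \u00e9 \
        "&egrave;" \u00e8 \
        "&uuml;"   \u00fc \
        "&ouml;"   \u00f6 \
        "&szlig;"  \u00df \
        "&copy;"   \u00a9 \
        "&lt;"     < \
        "&gt;"     > \
        "&amp;"    & ]
    # string map makes a single left-to-right pass without rescanning,
    # so "&amp;eacute;" correctly becomes the literal text "&eacute;".
    return [string map $map $text]
}
```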

DKF: 11-Oct-2000 - I've been experimenting with some Wiki-like rules for formatting TIPs (Tcl Improvement Proposals) [1] which support things like indented text, nested lists, multi-line list items, etc. The source code is available at http://sourceforge.net/projects/tiprender/ and I would very much appreciate hearing what people think of it. The main restriction I had to introduce was to force every paragraph to be separated from its neighbours by a (visually) blank line...
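The paragraph rule is easy to state in code: a "visually" blank line is one that is empty or contains only whitespace. This Tcl sketch splits text on such lines; `splitParagraphs` is a hypothetical name and is not taken from the tiprender sources.

```tcl
# Split text into paragraphs separated by visually blank lines, i.e.
# lines that are empty or contain only whitespace.
proc splitParagraphs {text} {
    set paragraphs {}
    set current {}
    foreach line [split $text \n] {
        if {[string trim $line] eq ""} {
            # A visually blank line ends the current paragraph, if any.
            if {[llength $current] > 0} {
                lappend paragraphs [join $current \n]
                set current {}
            }
        } else {
            lappend current $line
        }
    }
    if {[llength $current] > 0} {
        lappend paragraphs [join $current \n]
    }
    return $paragraphs
}
```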

joheid: 15.6.2004 - I have just started using wikit internally in my department. I like the idea of "minimal markup", but I really miss the ability to write and render tables. What about defining a table block and entering the rows simply as CSV lines?

Such a block would then be expanded into the relevant HTML code.

For the text-widget-based wikis, the rendering code for simple HTML tables is in the augmented version of htmllib, IIRC.
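A naive Tcl sketch of the CSV idea, assuming the first line is the header row and that fields never contain quoted commas; for real CSV, tcllib's `csv` package (`csv::split`) would be the more robust choice. The proc name `csvToTable` is hypothetical, and how the block is delimited on the page is left open here.

```tcl
# Naive sketch: turn a block of CSV lines into an HTML table, treating
# the first line as the header row. Fields are split on plain commas,
# so quoted fields containing commas are NOT supported.
proc csvToTable {block} {
    set rows [split [string trim $block] \n]
    set html "<table>\n"
    set cellTag th
    foreach row $rows {
        append html "<tr>"
        foreach cell [split $row ,] {
            set cell [string map {& &amp; < &lt; > &gt;} [string trim $cell]]
            append html "<$cellTag>$cell</$cellTag>"
        }
        append html "</tr>\n"
        # Only the first row is a header row.
        set cellTag td
    }
    append html "</table>"
    return $html
}
```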

LV I think that it would be better to use some sort of trigger character. How about something like this?

   }row2 has{different lengths
   |and different number of columns

where the ! separates headers (which are perhaps underlined?), { means right-justify the text, } means left-justify the text, and | means center the text.

It should be possible to generate HTML table code from these minimal markups...
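A Tcl sketch of how one such row might be converted, assuming each cell starts with exactly one marker character: ! for a header cell, { for right-justified, } for left-justified and | for centered. Text before the first marker is ignored, the header-underlining question is left open, and CSS `text-align` is used for alignment; `triggerRowToHtml` is a hypothetical name.

```tcl
# Hypothetical sketch of LV's trigger-character rows: each cell is one
# marker character followed by its text, up to the next marker.
proc triggerRowToHtml {line} {
    # Map each alignment marker to a CSS text-align value.
    set align [dict create "\{" right "\}" left "|" center]
    set html "<tr>"
    # Each match is one marker plus the text up to the next marker;
    # the braces inside the bracket expressions are literal characters.
    foreach {whole mark text} [regexp -all -inline {([!{}|])([^!{}|]*)} $line] {
        set text [string trim $text]
        if {$mark eq "!"} {
            append html "<th>$text</th>"
        } else {
            append html "<td style=\"text-align: [dict get $align $mark]\">$text</td>"
        }
    }
    append html "</tr>"
    return $html
}
```

Applied to LV's second example line, the text `row2 has` becomes a left-aligned cell and `different lengths` a right-aligned one.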

NEM As people propose more and more extensions to wiki markup, I begin to wonder whether it wouldn't be better to simply use some existing format, like say HTML...? I mean, as more features are suggested and added, the complexity of the markup begins to match the complexity of the target format, and it becomes harder to see what the gain is. Or maybe move to a more consistent markup, which is still less verbose than html, like, say a Tcl markup:
 Text blah [em emphasised] blah [link "Some page" http://blah.com]
 [table "Caption" [tr [th blah] [th foo]] [tr [td data] [td data]]]

This would also be blindingly simple (and fast) to transform, with a simple [subst]. After all, simple, consistent syntax is one reason why I use Tcl over, say, Perl. With regard to complex formatting options distracting people from writing content, I think the key is to provide only semantic markup (like XHTML is supposed to be). That will probably result in better-structured content. You don't want to go as far as DocBook though... ;)
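A minimal Tcl sketch of the Tcl-markup idea: each element is a small proc that returns HTML, and the page text is run through `subst`. The command names follow NEM's example; their HTML output is an assumption. Because page text is untrusted, the substitution runs in a safe interpreter (`interp create -safe`, so commands such as `exec` and `open` are unavailable), and `-novariables` keeps a `$` in ordinary prose from being treated as a variable. Resource limits (e.g. an endless loop in page text) are not handled here.

```tcl
# Sketch of Tcl markup rendered with [subst] inside a safe interpreter.
proc renderTclMarkup {text} {
    set interp [interp create -safe]
    interp eval $interp {
        proc em {text} { return "<em>$text</em>" }
        proc link {label url} { return "<a href=\"$url\">$label</a>" }
        proc th {text} { return "<th>$text</th>" }
        proc td {text} { return "<td>$text</td>" }
        proc tr {args} { return "<tr>[join $args {}]</tr>" }
        proc table {caption args} {
            return "<table><caption>$caption</caption>[join $args {}]</table>"
        }
    }
    try {
        # Only command and backslash substitution are performed.
        return [interp eval $interp [list subst -novariables $text]]
    } finally {
        interp delete $interp
    }
}
```

With NEM's first example line, this returns `Text blah <em>emphasised</em> blah <a href="http://blah.com">Some page</a>`.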

joheid: Agreed, you're describing the historical development of HTML - a simple logical markup language bloated with visual markup over the years. But tables are such a common way to present consecutive data that I think they would boost acceptance, mainly from Office users in my case. In that "ideal world" you would just cut and paste from Excel -- that was the reason I was thinking of CSV as an intermediate format.

LV The problem with using HTML itself directly has been discussed over the years - the issues at hand include:

  • html is too complex to be easily used by many
  • html has too many potential security holes ??? Security holes ???
  • html is a pretty bad hack after you get past the basics

It might be useful to take a look at other simple markup languages - POD, etc. - and see what they have chosen to do about things like tables.

joheid I looked into the simple cases of LaTeX tabulars. A tabular consists of a grouping environment (which is lacking in LV's proposal) with an integrated column type definition (centered, left, right, |),

then the actual rows, with cells separated by the special character & and each row ended with \\:
 foo&bar&baz \\

and finally the line that ends the environment.

I think that could be the minimum markup necessary. In LaTeX, cells spanning several columns are created with a special command placed in the first cell, but do we need that? As Wikit uses a special internal format for handling data, I'm not sure whether such an environment concept could be implemented there.
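A Tcl sketch of the row part of the LaTeX-style idea: a column spec such as `l|c|r` gives each column's alignment (the `|` rules are ignored), cells are separated by `&`, and a row ends with `\\`. How a wiki would mark the start and end of the environment is left open; the spec is simply passed in, column-spanning cells are not supported, and `tabularRowToHtml` is a hypothetical name.

```tcl
# Hypothetical sketch: render one LaTeX-style row as an HTML table row,
# using a column spec of l/c/r letters for alignment.
proc tabularRowToHtml {spec line} {
    set aligns [dict create l left c center r right]
    # Keep only the column letters; "|" rules are ignored in this sketch.
    set cols [split [string map {| ""} $spec] ""]
    # Strip the row terminator: two backslashes plus trailing spaces.
    set line [regsub {\\\\\s*$} $line ""]
    set html "<tr>"
    set i 0
    foreach cell [split $line &] {
        set col [lindex $cols $i]
        # Columns beyond the spec default to left alignment.
        if {[dict exists $aligns $col]} {
            set align [dict get $aligns $col]
        } else {
            set align left
        }
        append html "<td style=\"text-align: $align\">[string trim $cell]</td>"
        incr i
    }
    append html "</tr>"
    return $html
}
```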

LV The hallmark of this wiki is its simple formatting. The above may be too complicated; perhaps there is some way to simplify things. The & and \\ are kind of ugly, but they might be okay, since commas are likely to appear in normal text. I just think that expecting the average user to remember the begin and end wording is too much. It's hard enough for them to remember some of the current formatting markup...

joheid Simplicity is absolutely crucial. I think most tables on the wiki will come from word processors or spreadsheets, and at least Word and OpenOffice can convert a table to text. So lines of cells with a separating character can easily be produced. My problem is more how to enter and leave the tabular mode.

LV Well, if we are expecting the tabular data to come from other sources, then we should not use things like & or \\ as terminators - that just means more chances for someone to mess stuff up. About the only format from other sources that would qualify for tabular display is CSV. Now, as for entering and leaving tabular mode, let's see how this wiki handles other structured text:

  1. Formatted text mode - text begins at column 0, ends at a new line
  2. Unformatted text mode - text begins at column 1 or greater, ends when another structure begins
  3. Formatted bullet mode - text begins at column 4 with an asterisk, ends at a new line
  4. Formatted numbered list mode - text begins at column 4 with a number followed by a period, ends at a new line. Numbering resets after some other structured text is encountered
  5. Formatted definition mode - text begins at column 3, followed by a colon and 3 spaces, and ends at a new line

So it seems it would be most consistent for a structured table to begin with some sort of text appearing after 3 spaces. Table mode could continue until some other structured text is encountered.
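The proposal can be sketched as a per-line classifier plus a grouping step. This Tcl sketch follows the modes listed above, using `formatted` for column-0 text and `unformatted` for other indented text, and adds the proposed table mode. The `|` marker after three spaces is an assumption chosen for illustration, as are the names `lineMode` and `tableBlocks`; consecutive table lines form one block, and any other kind of line ends it.

```tcl
# Hypothetical line classifier following the list above, plus the
# proposed table mode (three spaces followed by an assumed "|" marker).
proc lineMode {line} {
    if {[string trim $line] eq ""} { return blank }
    if {[regexp {^   \|} $line]} { return table }
    if {[regexp {^   \* } $line]} { return bullet }
    if {[regexp {^   [0-9]+\. } $line]} { return numbered }
    if {[regexp {^   [^ :][^:]*:   } $line]} { return definition }
    if {[regexp {^\s} $line]} { return unformatted }
    return formatted
}

# Group consecutive table lines into blocks; any other mode ends a block.
proc tableBlocks {page} {
    set blocks {}
    set current {}
    foreach line [split $page \n] {
        if {[lineMode $line] eq "table"} {
            lappend current $line
        } elseif {[llength $current] > 0} {
            lappend blocks $current
            set current {}
        }
    }
    if {[llength $current] > 0} {
        lappend blocks $current
    }
    return $blocks
}
```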

LV 2007 Oct 28 Well, since the last change to this page, table handling has been added. See formatting rules for the details.

escargo 29 Oct 2007 - I'm glad that there is progress, but I would have liked to see it move toward a more standard type of markup, such as Creole 1.0 [2].