Java libs for processing wiki markup

I did some research on existing Java implementations for converting wiki syntax into HTML.

The social intranet software that my team and I are developing naturally contains a wiki component (which is, btw, one of the components our customers use most, along with the microblog). But ever since I joined the company I haven’t quite liked the look and feel of our wiki pages. What bugs me most is that we only have a rich text editor, which stores the pages’ text as HTML. I’m a developer with a very technical background, and I really like simple but expressive markup languages such as wikitext (I wish I had written my Bachelor’s thesis in LaTeX instead of OpenOffice).

As you probably know if you’re a frequent reader here, our software is written completely in Java (yeah, the client-side code, too), so I sacrificed half of my weekend to implement a prototype wikitext integration for Just Connect (which is the name of our software, btw). But right after getting this little project started I had to hit the brakes and answer two questions first:

Which wiki markup syntax is best and which Java libraries implement it?

Answering the first question wasn’t trivial, but it was manageable. After browsing through the various related questions over at Stack Overflow I stumbled upon a nice little Java implementation of the Markdown syntax called MarkdownJ. Also entering the race were Textile, which is quite similar to Markdown, and of course the famous MediaWiki syntax. I started playing around with Markdown and really came to like its simplicity; the same goes for Textile. But then I discussed the topic a bit with one of my colleagues and he was like “Do Markdown or Textile support tables? And actually I’d prefer to use the same syntax as Wikipedia and most of the other wiki engines.” Okay, to be fair, Markdown itself doesn’t support tables, but there are implementations which add this nice little feature to the syntax.

But my colleague convinced me that MediaWiki syntax is probably the best-known markup syntax (even among not so technically inclined users), so I opted for it. This brought me back to the second question above: Is there a good Java library for it? There are several, and I found three of them that looked useful: java-wikipedia-parser, gwtwiki and Mylyn WikiText.

I started with number one, java-wikipedia-parser. The API is really easy: you get your nicely formatted HTML with one line of Java code. But after I tried it out a bit on a reference wikitext I had created, the library rapidly reached its limits. The parser only supports a small subset of elements and is all in all very buggy (it swallows whole lines of text, for example, not nice!). The next candidate was gwtwiki, which I dropped after reading through the documentation and deciding it was too difficult to use (it’s hard to beat an API which does what you need in just one line of code).
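The “swallowed lines” kind of bug is easy to miss by eye, so here is a rough sketch of how one could check for it automatically. It is not tied to any of the libraries above: the class `SwallowCheck`, the method `missingWords` and the deliberately broken stand-in converter are all made up for illustration. The idea is just to feed a reference wikitext through a converter and list every visible word that doesn’t make it into the HTML.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class SwallowCheck {

    /**
     * Returns every word of the wikitext that does not appear in the HTML
     * produced by the given converter. The converter is passed in as a plain
     * function, so any library can be wrapped and plugged in here.
     */
    public static List<String> missingWords(String wikitext, Function<String, String> converter) {
        String html = converter.apply(wikitext);
        List<String> missing = new ArrayList<>();
        // Splitting on everything that is not a letter or digit drops markup such as '' or ==.
        for (String word : wikitext.split("[^\\p{L}\\p{N}]+")) {
            if (!word.isEmpty() && !html.contains(word)) {
                missing.add(word);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        String reference = "== Title ==\nFirst line with ''italic'' text.\nSecond line.";

        // Hypothetical stand-in for a buggy parser: it silently drops everything after line two.
        Function<String, String> buggy = text -> {
            String[] lines = text.split("\n");
            return "<p>" + lines[0] + " " + lines[1] + "</p>";
        };

        System.out.println(missingWords(reference, buggy));
    }
}
```

This is a crude substring check (a word that also occurs in a tag name or in another word would slip through), but it is enough to flag a parser that drops whole lines.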

Enter the Mylyn WikiText engine, which is part of the Eclipse project. That is probably the reason the website confused me so much (I wonder if every Eclipse sub-project has to reach a certain level of chaos before it is approved for inclusion). But what I read made me want to try it out anyway. WikiText offers not only a parser engine but also a graphical editor and stuff. The library without all the bling-bling can be downloaded as the so-called “Standalone” version. It contains a core Jar file plus an additional Jar file for every supported wiki markup, which at the time I’m writing this are Confluence, MediaWiki, Textile, TracWiki and TWiki.

To integrate the WikiText library into an application you just have to put the core API and the Jar files for one or more of the languages you want to support on your classpath, and you’re ready to go. After the experience with java-wikipedia-parser I wasn’t expecting much, but the results really impressed me. Feeding my reference document into WikiText resulted in a very nice and almost flawless HTML document. And the code to accomplish this didn’t exceed three or four lines. I also managed to change the supported syntax a bit by extending some classes and overriding the HTML-generating code (we have some specialties we need to support in our software).
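To give an idea of what “extending some classes and overriding the HTML-generating code” means in practice, here is a toy sketch in plain Java. To be clear, this is not the WikiText API: `TinyWikiConverter`, `IntranetWikiConverter` and the `wiki-heading` CSS class are invented for this example, and the converter only knows level-2 headings, bold and italic, and doesn’t even escape HTML. It only shows the pattern: the base class does the parsing and calls small overridable methods to emit HTML, and a subclass replaces just the piece it wants to change.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TinyWikiDemo {

    /** Toy converter for a tiny MediaWiki subset: == headings ==, '''bold''' and ''italic''. */
    public static class TinyWikiConverter {
        // Only level-2 headings are recognized in this sketch.
        private static final Pattern HEADING = Pattern.compile("^==\\s*(.+?)\\s*==$");

        public String toHtml(String wikitext) {
            StringBuilder html = new StringBuilder();
            for (String rawLine : wikitext.split("\n")) {
                String line = rawLine.trim();
                if (line.isEmpty()) {
                    continue; // blank lines produce no output
                }
                Matcher m = HEADING.matcher(line);
                if (m.matches()) {
                    html.append(heading(inline(m.group(1))));
                } else {
                    html.append(paragraph(inline(line)));
                }
                html.append('\n');
            }
            return html.toString();
        }

        // Bold has to be replaced before italic because ''' also contains ''.
        protected String inline(String text) {
            return text.replaceAll("'''(.+?)'''", "<b>$1</b>")
                       .replaceAll("''(.+?)''", "<i>$1</i>");
        }

        // The HTML-generating hooks that subclasses may override.
        protected String heading(String text) {
            return "<h2>" + text + "</h2>";
        }

        protected String paragraph(String text) {
            return "<p>" + text + "</p>";
        }
    }

    /** Hypothetical customization: headings get a CSS class, everything else is inherited. */
    public static class IntranetWikiConverter extends TinyWikiConverter {
        @Override
        protected String heading(String text) {
            return "<h2 class=\"wiki-heading\">" + text + "</h2>";
        }
    }

    public static void main(String[] args) {
        String wikitext = "== Title ==\nSome '''bold''' and ''italic'' text.";
        System.out.print(new IntranetWikiConverter().toHtml(wikitext));
    }
}
```

The nice thing about this kind of design is that a customization stays small: the subclass above is a handful of lines and doesn’t touch the parsing at all.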

So there is a clear winner in this little race: It’s Mylyn WikiText! If you’re looking for a decent, powerful, easily extensible wiki markup parser, go for this little fella.

To wrap up my experience from the last 24 hours, I put together a table of the libs I looked at:

| Name | Supported elements (as per docs) | Test result |
| --- | --- | --- |
| MarkdownJ | italic, bold, italic+bold, source code, lists (unordered + ordered), headings (4 levels), quotes, links, image links | Uncommon syntax, doesn’t support tables |
| java-wikipedia-parser | italic, bold, italic+bold, lists (unordered + ordered), headings, tables, links | Only supports a little subset of MediaWiki syntax; very buggy (swallows text!) |
| gwtwiki | most of the MediaWiki syntax | not tested |
| Mylyn WikiText | most of the MediaWiki syntax (incl. __TOC__ e.g.) | clear winner! Most of the MediaWiki syntax mentioned here works, even the generation of a table of contents. Very easy to use. |