Migrating the JRuby wiki - the history
When I chose JRuby as my open source contribution target, I really meant to contribute.
If you really want to contribute, the best way to do it is to look for what the project actually needs. (This sounds obvious, but most of the time we lean toward picking the more enjoyable tasks.)
I started to look at migration strategies, following advice from Nick and Charlie. Initially we wanted to convert the (ugly) MediaWiki source to Markdown and put it on the GitHub wiki. I started trying to improve Charlie's mediakiller, and later realized that the wiki's source was totally messed up: MediaWiki lets you mix in raw HTML, and we had done that on almost every page, making a clean migration nearly impossible.
I went ahead and tried converting the rendered HTML to Markdown instead, but got stuck on the lack of Markdown Extra support, which made the horde of anchor links unusable. With perfect timing, Charlie introduced me to Consiliens, who was also trying to migrate away from MediaWiki and had hit the same walls:
- pandoc’s html2markdown would break badly.
- Kenai hadn’t released its internal changes to the mediacloth gem (which it used to render the pages).
- The wikicloth gem was breaking on multiple sections, a feature used extensively throughout the wiki.
Lacking better alternatives, I started to investigate wikicloth further, and found that the master version was actually rendering fine! I then asked David (wikicloth’s owner) to release all the fixes he had made, and he was kindly responsive! All we needed now was an HTML parser to convert the output to Markdown, and we’d finally be done!
As I began to create yet another html2markdown tool, after a good glass of Chilean Carmenère, it clicked:
Why not use the wikicloth gem to render the wiki as-is?
Thrilled with this idea, I started reading the source code of gollum (GitHub’s wiki engine) and found that it used the github-markup gem. I sent a three-line pull request enabling MediaWiki rendering via wikicloth, and with a little help from the interwebs, technoweenie merged it and deployed it on GitHub’s infrastructure! He found a bug in link handling (those ugly links) that was a no-brainer to fix, and finally we got it working!
After the deploy I started to prepare the final migration run, taking hot data from Kenai straight into gollum’s repository. I asked Charlie about keeping the history linear (Kenai stored revisions only per page), and he suggested sorting everything by timestamp. Redis’ sorted sets came to mind, and I quickly stored the pages using the timestamps as scores. Less than one hour later the dump process had finished, and I started to work on the gollum import process.
After fiddling with the repository, making all the most bizarre git history mistakes, I got a clean history, ready to be imported into GitHub’s final version.
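The ordering trick is the whole point: score each revision by its epoch timestamp, then read everything back in score order to interleave all pages into one linear history. The real run used Redis (ZADD with the timestamp as score, then ZRANGE), but since that needs a server, this pure-Ruby stand-in with invented sample data shows the same idea:

```ruby
# The real migration used Redis sorted sets: ZADD with the revision
# timestamp as score, then ZRANGE to read back oldest-first.
# This stand-in mimics that without needing a Redis server.

# Hypothetical per-page revisions, as Kenai exposed them.
revisions = [
  { page: 'Home',           time: Time.utc(2011, 1, 5) },
  { page: 'GettingStarted', time: Time.utc(2011, 1, 3) },
  { page: 'Home',           time: Time.utc(2011, 1, 1) },
]

# Sorting by epoch seconds (the "score") yields one linear,
# chronological history across every page.
linear = revisions.sort_by { |rev| rev[:time].to_i }
linear.each { |rev| puts "#{rev[:time].to_i} #{rev[:page]}" }
```

Replaying `linear` oldest-first is what lets each revision become one git commit in order, instead of per-page clumps.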
Charlie pushed the changes to the official repository, and we got an initial version. With murphy’s help, Charlie discovered that MediaWiki uses
[[Target|Description]] where gollum uses
[[Description|Target]], so essentially all links were broken. To make things worse, whitespace was handled differently by Kenai, Markdown, and github-markup, making a fully automated migration nearly impossible.
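The link problem at least is mechanical: the two halves of each piped link just need swapping. A sketch with a made-up helper (`swap_wiki_links`); a single regex pass like this covers the simple cases, though nested or multi-pipe links would need more care:

```ruby
# Swap the two halves of every [[a|b]] link, converting one wiki's
# link order into the other's. Pipe-less links like [[Page]] mean
# the same thing in both syntaxes and are left untouched.
def swap_wiki_links(markup)
  markup.gsub(/\[\[([^\[\]|]+)\|([^\[\]|]+)\]\]/) { "[[#{$2}|#{$1}]]" }
end

puts swap_wiki_links("See [[Home|the home page]] and [[Download]].")
# => See [[the home page|Home]] and [[Download]].
```

The whitespace differences had no such one-liner, which is why the migration never became fully automated.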
After that, I began to look at how to make the old wiki point to the new infrastructure. After fiddling with a sandboxless system, I finished marking all pages with a DO NOT EDIT link. Nick then changed the site’s redirect function to send all wiki links to GitHub’s address.
In the end, this was an awesome experience in true open source fashion: we can do more with less by helping each other. I learned tons of things, helped my favorite open source project, and as a side effect got help from awesome people and helped polish many things across several projects.
Many thanks to everyone involved; you all made this possible.