Documenting Open Source Software

written by Andrew Gent on December 3, 2010

Someone asked what we use to produce the documentation for VoltDB. The simple answer is DocBook.

DocBook is an XML schema for structured documentation. Originally designed for UNIX-based hardware and software documentation, DocBook is general enough for most technical documentation purposes. Its advantages are that it is a standard, well-documented format (DocBook is an OASIS standard) and that processors exist for both hardcopy and online output, as well as translators to other formats.
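To make "structured" a little more concrete, here is a minimal sketch in Python. The DocBook fragment is invented for illustration (only the element names `article`, `section`, `title`, `para`, and `command` are real DocBook 5 tags), and the `outline()` helper is hypothetical. Because the markup names what each piece of content is rather than how it should look, a program can recover the document outline straight from the tags.

```python
import xml.etree.ElementTree as ET

# A minimal, hypothetical DocBook 5 article. The element names are standard
# DocBook; the content is invented for illustration.
DOCBOOK_SAMPLE = """\
<article xmlns="http://docbook.org/ns/docbook" version="5.0">
  <title>Getting Started</title>
  <section>
    <title>Installing</title>
    <para>Unpack the kit and run <command>install.sh</command>.</para>
  </section>
  <section>
    <title>Running</title>
    <para>Start the server before connecting any clients.</para>
  </section>
</article>
"""

# DocBook 5 elements live in this XML namespace.
DB = "{http://docbook.org/ns/docbook}"


def outline(xml_text: str) -> list[str]:
    """Return the article title followed by each top-level section title.

    The structure comes directly from semantic tags (article, section,
    title), not from fonts or layout.
    """
    root = ET.fromstring(xml_text)
    titles = [root.findtext(f"{DB}title")]
    for section in root.findall(f"{DB}section"):
        titles.append(section.findtext(f"{DB}title"))
    return titles


if __name__ == "__main__":
    for title in outline(DOCBOOK_SAMPLE):
        print(title)
```

The same tags that make the outline recoverable are what let the stylesheets render one source as both a printed book and a set of HTML pages.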

But DocBook is only a format; it isn’t a set of tools. So the more complicated answer to the original question is that we use DocBook as our data format, svn as our source repository, XMLmind as our editor, a modified version of Norm Walsh’s DocBook XSL stylesheets for formatting, and a set of open source tools, including xsltproc, xmllint, and FOP, for producing output.
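A toolchain like this one can be sketched as four commands: validate the source, transform it to HTML, transform it to XSL-FO, and render the FO file to PDF. The Python below only assembles the command lines. It assumes a DocBook 4.x source with a DOCTYPE declaration (so that xmllint can validate against the DTD), uses the stock `html/docbook.xsl` and `fo/docbook.xsl` entry points rather than a customized stylesheet layer, and treats the install path `XSL_ROOT` and the file name `guide.xml` as hypothetical.

```python
import shlex
from pathlib import Path

# Hypothetical install location of the DocBook XSL stylesheets. The
# html/docbook.xsl and fo/docbook.xsl entry points ship with the standard
# distribution; a customized layer would normally import these.
XSL_ROOT = Path("/usr/share/xml/docbook/stylesheet/docbook-xsl")


def build_commands(source: str, out_dir: str = "build") -> list[list[str]]:
    """Return argv lists for validate -> HTML -> FO -> PDF.

    The commands are returned rather than executed, so the sequence can be
    inspected without the tools installed. out_dir is assumed to exist.
    """
    stem = Path(source).stem
    out = Path(out_dir)
    fo_file = out / f"{stem}.fo"
    return [
        # 1. Resolve XIncludes, then validate against the DTD; print nothing on success.
        ["xmllint", "--noout", "--xinclude", "--postvalid", source],
        # 2. XSLT transform to HTML for online output.
        ["xsltproc", "--xinclude", "--output", str(out / f"{stem}.html"),
         str(XSL_ROOT / "html" / "docbook.xsl"), source],
        # 3. XSLT transform to XSL-FO, the intermediate format for print.
        ["xsltproc", "--xinclude", "--output", str(fo_file),
         str(XSL_ROOT / "fo" / "docbook.xsl"), source],
        # 4. Render the FO file to PDF with FOP.
        ["fop", "-fo", str(fo_file), "-pdf", str(out / f"{stem}.pdf")],
    ]


if __name__ == "__main__":
    for argv in build_commands("guide.xml"):
        print(shlex.join(argv))
```

Each list could be passed to `subprocess.run(argv, check=True)` to execute that step; stopping at the first failure means an invalid document never reaches the formatting stage.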

The advantages of DocBook are that it is a standard, documented format; it is well-suited to the technical information we need to produce; and it enforces well-structured, semantically correct content. DocBook produces consistent output for print and online (HTML). Because the format is open, we also have the ability, if necessary, to convert to a different format in the future if our requirements change.

The disadvantage of DocBook is that, while stable, it is also mature, and it lacks an active community of users and developers to answer questions and provide assistance. (Mind you, a wealth of information does exist online in websites and email distribution list archives, so answers to most questions are available through Google et al.)

Much of the energy in semantic tagging has shifted to a newer format, DITA. DITA is modelled after “topic maps” rather than books. This makes DITA better suited for hypertext information, particularly HTML-based online help. If you are looking for an online-only solution, I would recommend DITA over DocBook. However, the topic map model makes DITA less effective for printed or sequential content. Because VoltDB is both new and relatively complex, sequential, printable documentation is a requirement, which is why we went with DocBook.

One other disadvantage of semantic tagging systems, including both DocBook and DITA, is that they are not particularly well suited to graphically intensive documentation, since they do not allow detailed control over the placement of text and graphics. Fortunately, that level of control is not needed for many technical products, VoltDB included.

As powerful as DocBook is for authoring structured documentation, it has a steep learning curve. So we don’t expect engineers or other non-professional authors to use it. Engineering documents are written in whatever tool is most productive for the engineer: Microsoft Word, OpenOffice, wiki, or Markdown. We have developed methods for producing PDF from the online-only formats (wiki and Markdown); these processes are sketchy at times, but functional. And where information needs to be maintained and supported for users, we convert the preliminary documents to DocBook.
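How the preliminary documents get converted isn't specified here. As one possible approach, not necessarily the one used for VoltDB, pandoc can read Markdown and write DocBook 5. The sketch below assumes pandoc is installed and on the PATH; the function names `markdown_to_docbook_cmd` and `convert` are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Optional


def markdown_to_docbook_cmd(md_file: str) -> list[str]:
    """Build a pandoc command that turns a Markdown file into standalone DocBook 5."""
    target = Path(md_file).with_suffix(".xml")  # e.g. notes.md -> notes.xml
    return ["pandoc", "--from", "markdown", "--to", "docbook5",
            "--standalone", "--output", str(target), md_file]


def convert(md_file: str) -> Optional[Path]:
    """Run the conversion if pandoc is on the PATH; return the output file, else None."""
    if shutil.which("pandoc") is None:
        return None  # pandoc missing: skip rather than fail
    argv = markdown_to_docbook_cmd(md_file)
    subprocess.run(argv, check=True)  # raises CalledProcessError if pandoc fails
    return Path(argv[-2])  # the path passed to --output
```

For example, `convert("design-notes.md")` would write `design-notes.xml` next to the source. The output is best treated as a starting point, since a mechanical conversion can only produce generic tags where hand-authored DocBook would use more specific semantic markup.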

Ultimately, 90% of our official documentation is produced in DocBook, including short documents such as white papers and FAQs.
Andrew Gent