Document Formats for Learning Materials

A few weeks ago we started looking seriously at which formats would work best for the Ubuntu Learning project’s learning materials. Until now I’ve been writing each class in ODF (Open Document Format), but it became apparent that while documents in this format are very easy to edit, they are very hard to integrate with translation, diff generation, style guidelines and so on.

So I asked BiosElement, a very good contributor to the Ubuntu Learning project, to research various formats, and he has reported back with his findings. I want to share these findings with the wider community because I know how useful they will be to other documentation groups. This is a very basic summary:

[Image: doc-format-research]

And now for the meat of the report:

Open Document Format

ADVANTAGES: Pre-Installed on Ubuntu, Open Format, Ease of Editing

DISADVANTAGES: Currently very hard to use with bzr or other version control systems, because the default .odt file is a compressed archive that cannot be diffed line by line. Difficult to keep styling consistent. Any change to the styles means a large amount of labor to update previous courses.

SUMMARY: .odt would be very difficult to keep updated and consistent, but it is very easy for course creators to work with.
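
To make the version control problem concrete, here is a minimal Python sketch. It relies on one fact, that an .odt file is a ZIP container whose text lives in a member called content.xml. Everything else (the function names, the sample XML) is made up for illustration, and the package it builds is only a stand-in, not a valid OpenDocument file. It shows that the text only becomes diffable once it is unpacked.

```python
import io
import zipfile


def make_package(paragraph: str) -> bytes:
    """Build a tiny stand-in for an .odt file: a ZIP holding content.xml.

    Hypothetical helper. This is NOT a valid OpenDocument file (there is no
    manifest or mimetype entry); it only mimics the container layout.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("content.xml", f"<text>{paragraph}</text>")
    return buf.getvalue()


def extract_content(package: bytes) -> str:
    """Unpack content.xml so its text can be compared or diffed."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        return zf.read("content.xml").decode("utf-8")


old = make_package("Welcome to the class.")
new = make_package("Welcome to the first class.")

# The packed form is compressed binary data starting with the ZIP signature,
# so a line-based diff tool has nothing readable to compare.
print(old[:2])

# After unpacking, the one-word change is plain to see.
print(extract_content(old))
print(extract_content(new))
```

A version control system sees only the packed bytes, which is why a small edit to a course shows up as an opaque binary change.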

Plain Text

ADVANTAGES: Universal format: everything from a cell phone to an expensive toaster can read text files. bzr and other version control systems can highlight per-line changes. Text-to-speech works well with it, which makes it more accessible for those with disabilities.

DISADVANTAGES: Dull, sometimes hard to read, doesn’t support any kind of styling.

SUMMARY: Easier to maintain than .odt, but the lack of styling makes it a poor choice.
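
The per-line change highlighting mentioned above can be sketched with Python’s standard difflib module, which produces the same kind of unified diff that bzr and other version control tools show. The lesson text and file names here are invented for illustration.

```python
import difflib

# Two versions of a made-up lesson, one line per list item
old = [
    "Open a terminal.",
    "Type ls to list files.",
    "Close the terminal.",
]
new = [
    "Open a terminal.",
    "Type ls -l to list files with details.",
    "Close the terminal.",
]

# unified_diff yields header lines, then unchanged lines prefixed with a
# space, removed lines prefixed with "-", and added lines prefixed with "+".
diff = list(
    difflib.unified_diff(
        old, new, fromfile="lesson-v1.txt", tofile="lesson-v2.txt", lineterm=""
    )
)

for line in diff:
    print(line)
```

Only the changed line is marked, so a reviewer or translator can see exactly what moved between versions.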

Sphinx

ADVANTAGES: Same as those of Plain Text, with the addition of styling using reStructuredText.

DISADVANTAGES: Limited translation support, Must be compiled into .html.

SUMMARY: Not a bad choice, but it has limited use outside Python projects. The limited translation support would become a major problem if we used it.

DocBook

ADVANTAGES: Universal format used by many book publishers. Well supported for conversion into other formats.

DISADVANTAGES: XML is very difficult to write, very complex, hard to read and simply not user-friendly.

SUMMARY: Good choice, but the difficult syntax and lack of WYSIWYG editors create a massive barrier to entry.

AsciiDoc

ADVANTAGES: Same advantages as DocBook, with the addition of plain-text editing and an easier-to-read format. Can be converted into DocBook.

DISADVANTAGES: Some may find editing .txt files hard, but I’m not sure there’s any way around this.

SUMMARY: IMO the best choice, as it gives all the advantages of DocBook without the difficult syntax or learning curve.
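
As a rough illustration of the AsciiDoc to DocBook path, here is a Python sketch. It assumes the asciidoc command-line tool is installed and uses its -b (backend) and -o (output file) options. The lesson text, the file names and the two helper functions are hypothetical, and the conversion is skipped when the tool is not found.

```python
import shutil
import subprocess
from pathlib import Path

# A small lesson in AsciiDoc markup (content made up for illustration).
# Titles are underlined, *...* marks bold text, and "* " starts a list item.
LESSON = """\
Using the Terminal
==================

Opening a terminal
------------------
Press *Ctrl+Alt+T* to open a terminal window.

* `ls` lists files
* `cd` changes directory
"""


def docbook_command(source: str, output: str) -> list[str]:
    """Build the asciidoc invocation that converts `source` to DocBook XML."""
    return ["asciidoc", "-b", "docbook", "-o", output, source]


def convert(source: Path, output: Path) -> bool:
    """Run the conversion if asciidoc is available; return whether it ran."""
    if shutil.which("asciidoc") is None:
        return False  # tool not installed, nothing to do
    subprocess.run(docbook_command(str(source), str(output)), check=True)
    return True
```

To try it, save LESSON to a file such as lesson.txt and call convert(Path("lesson.txt"), Path("lesson.xml")). The source stays readable as plain text, and the DocBook output can feed the usual translation and publishing tools.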

There you have it. Please get in touch with us on our mailing list or IRC channel if you have additional ideas or formats to try out.

30 Responses to “Document Formats for Learning Materials”

  1. Arnaud Soyez says:

    A note about the Open Document:
    If you wish to use it with a VCS you need to save, for example, as an OpenDocument Text (Flat XML) (.fodt), which basically means that it’s not compressed, and your file will be like a text file (containing xml).

    I’m not quite sure if there are packages that need to be installed to enable this feature.

  2. Consider Restructured Text. http://docutils.sourceforge.net/rst.html

    It offers the advantage of being human readable, with obvious headings, emphasized text, and so on, and it’s also convertible to PDF, HTML, and many other formats.

  3. TGM says:

    Couldn’t you use HTML as your default format? In theory anyone can read it, it can handle i18n, and if you don’t want to use a text editor, there’s plenty of html wysiwyg editors…

  4. Hello,

    I did actually consider Restructured Text. That’s what powers Sphinx. Problem was it couldn’t be converted into DocBook easily and translating it would be more difficult without that.

    William

  5. HTML isn’t a ‘bad’ option but there are easier ways to do it. DocBook + AsciiDoc are a major advantage for translation and organization. Also, we will already be transforming it into HTML for the end user.

    William

  6. Tudza says:

    The look of plain text depends on what you open it with. If it is hard to read, find a better reader.

    It certainly divorces content from presentation, which I thought was supposed to be a virtue.

  7. Jendrik says:

    You might want to consider txt2tags (http://txt2tags.sf.net). It features converting plain text files with easy markup into most formats you could wish for (e.g. HTML, Latex, most wiki markups, …) For pdf generation you would have to use pdflatex though.

  8. Robbert says:

    What about Latex? It’s not very hard to write, diff generation is excellent, it separates content from presentation, it outputs pdf (and even html) and there are editors available for it. The only problem might be how easy it is to translate documents (but I have no experience with that).

  9. Thanks for that, I’m going to have to run tests with this and see how it works.

  10. Problem with Latex was the translation. And apparently the fonts for the .pdf’s are hard to get right. I may be wrong on that but the translations were a big issue.

  11. Robert Mecklenburg says:

    I think texinfo is a great choice. It has existing software to translate it into other formats including ascii, html, and latex. There is a lot of software support for indexing and cross references. Because it is ascii-based it is easy to version control and handle collaboratively.

  12. franzrogar says:

    Well, you can write text inside brackets for each language (so you can have all languages you want in a single plain text file).

    About the fonts… well… you can give XeTeX a try (which, for now, is working like a charm for my work) with OTF support (I can use my fonts at ~/.fonts without any problem).

  13. Gavin says:

    I’m really confused as to why HTML isn’t on the list. It has a very good “base” story. XHTML/HTML is very editable in a number of editors in Ubuntu. It can be easily edited as plain text or with WYSIWYG tools. i18n support isn’t perfect but it’s not bad. Structured paragraphs can be translated, etc. By using HTML directly you get much better options for linking and interactivity. How do you embed a video in DocBook? Or a Flash demonstration? Translating into HTML seems like a poor choice in this day and age. You’re likely only looking at PDF for print as a secondary target, so solutions like PrinceCSS etc would work fine.

    HTML is an excellent authoring and distribution method and you can be sure that in 20 years someone will be able to read an HTML file… other than Plain Text I wouldn’t bet on the others.

  14. Martin Owens says:

    It might be because it supports video and flash that it’s a bad format. Authors could add _anything_ into the html and expect the workflow to support it. The limiting nature of the other formats is an advantage in that regard. Plus most of the formats above are plain text, very readable in 100 years time.

  15. Daniel says:

    Another option can be markdown[1] together with pandoc[2]. It supports a big variety of outputs and markdown syntax is used in a lot of places these days (stackoverflow.com for example). Pandoc can also export to ODT.

    [1]: http://daringfireball.net/projects/markdown/
    [2]: http://johnmacfarlane.net/pandoc/

  16. Daniel says:

    Concerning txt2tags, the DocBook target has been added to the current development version; see http://code.google.com/p/txt2tags/source/detail?r=64

  17. Hey Mo,

    the French guy is speaking ;-)

    AsciiDoc is a very good choice!
    I’ve previously considered various RST formats, and even OO, but the main thing about docs (and websites) is to be able to version and edit easily, while being able to output various formats.
    AsciiDoc is doing a great job here. Some examples:
    http://buildbot.ghz.cc/~buildbot/docs/r1998-141/asciidoc.html

    It may only miss a WYSIWYG editor, like the ones on the wiki.

    cheers

  18. Andy says:

    Minor point: the word is “summary” not “summery” :)

  19. markdown was one of my first thoughts but I decided against it because it’s not exactly a “standard” format for writing documentation and there’s no other major advantage to use it.

  20. I’ll have to toy around with that then. txt2tags doesn’t seem to be a bad choice, although its somewhat limited number of tags may prove to be a problem in the future.

  21. Robert Prince says:

    This is a topic I’ve long considered (both as a programmer who writes lots of documentation and as a documenter who writes lots of code). I would do the following either as a commercial/hosted offering or as a F/OSS system:

    – Docbook, along with a wysiwyg editor (http://www.xmlmind.com/xmleditor/ would be my choice, but see http://wiki.docbook.org/topic/DocBookAuthoringTools); perhaps as an Eclipse-based application
    – Integration with version control built in (which is why I like XMLMind’s tool, it has an SDK for that kind of thing)
    – Publishing toolchain made much more user friendly; the biggest barrier I see to this really is decent styling support for publishing to PDF

    This would be an outstanding solution for a techpubs group, and would be WAAAAAAYYYY cheaper than the ridiculously overpriced commercial tools.

  22. I personally love LaTeX, but learning it was challenging (going beyond basic layouts is still a challenge) and I probably wouldn’t have bothered learning it if I didn’t need it for work. One of our concerns here is barrier to entry; I’d much prefer asking contributors to learn something simple.

  23. unhappy says:

    You should consult this Wikipedia page:

    http://en.wikipedia.org/wiki/Lightweight_markup_language

  24. C-quel says:

    Eventually I’ll let you know my thoughts on DocBook, although I can’t guarantee they will be favorable.. ^^’

    For that moment, I’m personally siding towards AsciiDoc in your case.

  25. The major problem with DocBook’s WYSIWYG editor is that it’s proprietary. I’m not one who refuses to use proprietary software if it’s all there is, but many people are, and it’d be best not to. I did try Eclipse and it’s an improvement, but not quite enough, I don’t think.

    As for PDFs, at this point they’re very low priority. People can print the html file if they want.

  26. Just because writing DocBook sucks doesn’t mean it’s a bad format. XML wasn’t designed to be read by humans, that’s where AsciiDoc comes in. ^_^

  27. C-quel says:

    Hmmm…. coincidence that the Ubuntu motto is “Linux for Human Beings”? ^_______^

    Oh poor Docbook,…. don’t worry, I’ll eventually have to work with it myself (since I will be involved with KDE4 documentation).

  28. Excuse my shameless plug:

    Another choice for editing DocBook in a Wiki: Confluence plus Scroll Wiki Exporter (http://www.k15t.com/scroll), which exports trees of pages into DocBook or PDF (directly).

    -Stefan
    Founder/CTO
    K15t Software

  29. Martin Owens says:

    Is it open source Stefan?

  30. No, but both Confluence and Scroll have community licenses, which are free for non-profit and open-source projects.

    -Stefan