My DITA journey begins

I have decided to try out DITA in a more extensive way. I think it will work for my situation, and it’s always fun to experiment with a new method for authoring help material.

A few people have asked that I keep them updated on my progress. While the details of my foray into DITA will expose my naiveté, it might also lead toward guidance and advice from DITA gurus, who will likely look at my strategies and reasoning and shake their heads, laughing at my attempts and deciding to give me a few pointers.

My basic setup

Just to describe my basic setup, I have about 8 guidebooks, with approximately 100,000 words total. Currently everything is online on a Drupal site.

My main goals in DITA will be the following:

  • Single source content into both printable training workbooks and online help.
  • Make it easier to shape and manage documentation as a whole (managing content in Drupal sucks).
  • Provide a better way of handling links and inter-relating information.
  • See if DITA lives up to all the hype!

I don’t have a ton of documentation, which is usually what leads many people to DITA. For example, I don’t have 600 guides in 20 different languages with 15 different versions authored by a team of 40 internationally-located writers. I am a lone writer (a term my wife mocks by saying “is that like the Lone Ranger?”), with a lot of creative autonomy.

Fears and strategies

My underlying fear with DITA is that I’ll end up with an output resembling a shattered window, with a thousand little pieces and no clear way to order them or understand how they fit together. I attribute this information fragmentation syndrome to a general misunderstanding of the ditamap and the chunk attribute. Many DITA authors leave each task and concept in its own standalone file, disconnected from anything else, without ever designing the information in ditamaps.

My strategy is to author task content in the task topic type, but to freely combine it with conceptual information if warranted. This will leave me with more files but hopefully the ditamaps will make sense of them as standalone units of information. I may also use the general task topic, which allows sections and one task.

My XML editor

I’m using OxygenXML as my editor. It seems to be one of the best out there, and least expensive for its capabilities. It does a great job at real-time validation of topics, auto-completion, transforming outputs, switching between text and author views, and more.

I’m not saying Oxygen is perfect – I think its own help could be a bit better. But this editor also has widespread adoption among the DITA community, so hopefully it has the momentum to keep getting better and better.

Converting HTML files to DITA

Now I’ll start to get more into the nuts and bolts of my strategy. I’ve already converted my first book from HTML to basic DITA files. (I’ll call the book the “Bicycle Maintenance Guide” to keep it confidential but also have something realistic to refer to.)

To convert the files to DITA, I copied the HTML body source from each Drupal article and pasted it into a new HTML document in Oxygen. Then I clicked the Apply Transformation Scenarios button and transformed the content into a DITA Concept topic. Then I saved the file into the appropriate folder. I did this for pretty much all of the HTML pages in the Bicycle Maintenance Guide.

Why convert to the concept topic type? Concept topics allow sections, and they accept ordered and unordered lists written with familiar HTML-style markup (ol, ul, li). The task topic requires more complex tags for lists (steps, step, cmd, info, etc.). I figure I can work up to the task topic syntax as I go along. For now, I just wanted to get my content into valid DITA markup so that I can start managing it in Oxygen and break away from my reliance on Drupal.
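To make the difference concrete, here is a rough sketch of the same three steps marked up both ways. The ids and the step wording are made up for illustration and are not taken from my actual guide; each fragment would live in its own file.

```xml
<!-- Fragment 1: a concept topic. The steps are an ordinary ordered list. -->
<concept id="removing_the_tire">
  <title>Removing the tire</title>
  <conbody>
    <section>
      <title>Steps</title>
      <ol>
        <li>Release the brake.</li>
        <li>Open the quick-release lever.</li>
        <li>Lift the wheel out of the frame.</li>
      </ol>
    </section>
  </conbody>
</concept>

<!-- Fragment 2: the same content as a task topic. Each step needs a cmd;
     info is optional and holds any extra explanation. -->
<task id="removing_the_tire">
  <title>Removing the tire</title>
  <taskbody>
    <steps>
      <step>
        <cmd>Release the brake.</cmd>
      </step>
      <step>
        <cmd>Open the quick-release lever.</cmd>
        <info>The lever sits on the wheel axle.</info>
      </step>
      <step>
        <cmd>Lift the wheel out of the frame.</cmd>
      </step>
    </steps>
  </taskbody>
</task>
```

The concept version is closer to what the HTML conversion produces, which is why it made an easier first landing spot.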

File management

In my “Bicycle Maintenance Guide” (again, fake title), I have about 10 different sections. Each section has 7 or more articles. I created a main folder called bike_maintenance_guide and subfolders for each section, like this:

Bike maintenance guide:

  • brakes
  • derailers
  • cranks
  • handlebars
  • wheels
  • tires
  • seat
  • pedals
  • frame
  • forks

I added the appropriate articles into each of these subfolders. Then I created a ditamap for each folder – this means I have about 10 different ditamaps.

I created a master ditamap that includes each of the submaps. I decided it would be easier to work with the maps for each section rather than have all the files listed on one massive map. By breaking the map apart into smaller bits, it becomes easier to manage.
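As a minimal sketch, the master map and one section map might look something like this. The file names (brakes.ditamap, adjusting_brake_pads.dita, and so on) are placeholders I made up for the example, and I've left out the XML declaration and DOCTYPE lines to keep it short.

```xml
<!-- bike_maintenance_guide/bike_maintenance_guide.ditamap (hypothetical name) -->
<map>
  <title>Bicycle Maintenance Guide</title>
  <!-- One mapref per section folder; each points at that section's own map -->
  <mapref href="brakes/brakes.ditamap"/>
  <mapref href="derailers/derailers.ditamap"/>
  <mapref href="cranks/cranks.ditamap"/>
  <!-- ...the remaining sections follow the same pattern -->
</map>

<!-- bike_maintenance_guide/brakes/brakes.ditamap (hypothetical name) -->
<map>
  <title>Brakes</title>
  <topicref href="adjusting_brake_pads.dita" format="dita"/>
  <topicref href="replacing_brake_cables.dita" format="dita"/>
</map>
```

Each section map can be opened and transformed on its own, while the master map pulls them all together for a full build.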

Links are a big challenge for me. Remember that I said Drupal is my publishing environment. I must now reveal that I don’t have a clear and easy way to push everything from DITA into Drupal. Building such an import mechanism would cost more than a one-man documentation shop can justify.

Fortunately, I’m rather clever. For each Drupal article (they’re called “nodes” in Drupal lingo), there’s a permalink, like node/123. Thanks to the keyrefs in DITA 1.2, I can create a keyref map that correlates these Drupal links with familiar names I choose. Then I can use these familiar names as the link references in my DITA documentation. If I ever change the Drupal node, changing it in the keyref map will update every instance throughout the documentation. I’m planning to put an extensive list of links as keyrefs in a root map, so they’ll be available for all the other submaps.
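Here is roughly what I have in mind, as a sketch rather than a finished design. The map name, the key name changing_a_tire, and the example.com URL are all assumptions for illustration; only the node/123 style of permalink comes from my actual setup.

```xml
<!-- links.ditamap (hypothetical): key definitions pulled into the root map -->
<map>
  <title>Drupal link keys</title>
  <!-- The key name is the familiar name I choose; href is the Drupal permalink -->
  <keydef keys="changing_a_tire"
          href="http://example.com/node/123"
          format="html"
          scope="external">
    <topicmeta>
      <linktext>Changing a tire</linktext>
    </topicmeta>
  </keydef>
</map>

<!-- In any topic: refer to the key, never to the node number -->
<p>If you get a flat, see <xref keyref="changing_a_tire"/>.</p>
```

If node 123 ever becomes node 456, only the href on the keydef changes. The empty xref should pick up its link text from the linktext element in the key definition.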

This means that for each DITA topic, I’ll need to manually create a Drupal node, selecting its taxonomy and book hierarchy in Drupal. This isn’t ideal but will have to suffice until I can find or build a more streamlined process.

Terminology sidebar

Here I must pause to expand on what I mean by “topic.” By using the term “topic,” we run into a major terminology problem with DITA. In the ditamap, there’s no distinction between a topic that is made up of several topics and a topic that is a building block for another topic. But here’s what I mean on a technical level. Changing a tire is perhaps an “article” in my ditamap, and it consists of several topics:

<topicref href="changing_a_tire.dita" format="dita" chunk="to-content">
  <topicref href="removing_the_tire.dita" format="dita"/>
  <topicref href="checking_for_sharp_objects.dita" format="dita"/>
  <topicref href="reinserting_the_tire.dita" format="dita"/>
</topicref>

Notice that the first topicref isn’t self-closing (it has no trailing slash), and its chunk attribute is to-content. That means it combines all of the topics nested under it into a single unit of output.

Changing a tire requires 3 main tasks: Removing the tire, Checking for sharp objects, and Reinserting the tire. Each of these tasks (subtasks in the original article) is about 3-4 steps – not substantial enough to stand on its own. The problem is that the task topic type allows just one list of steps per topic. Hence the need to split this information up into separate files.

Here is where I may differ from other DITA authors. I think that, according to DITA Best Practices by Laura Bellamy and others, the way to organize the info would be to group the files into a subfolder, and have each of the tasks be a child topic to the parent, like this:

<topichead navtitle="Changing a tire">
  <topicref href="changing_a_tire.dita" format="dita">
    <topicref href="removing_the_tire.dita" format="dita"/>
    <topicref href="checking_for_sharp_objects.dita" format="dita"/>
    <topicref href="reinserting_the_tire.dita" format="dita"/>
  </topicref>
</topichead>

This would result in 4 files grouped in a collapsed folder in the table of contents.

While all right for tasks with substantial subtasks, it seems like this method would fragment the information into the shattered glass I mentioned earlier.

What do you call a topic that consists of other topics? I usually refer to it as an “article,” but that term isn’t used in DITA. Maybe it’s a “master topic”? A “combination topic”? A “chunked topic”?

By the way, Mark Baker dedicates an entire post to the terminology problem of “topic” in Topics, Pages, Articles, and the Nature of Hypertext.

I am also hoping to minimize the number of inline links in my documentation. I realize this is a rather controversial point, one that has a great comment thread. I plan to incorporate relationship tables based on keyrefs.
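I haven't built any of these yet, so treat the following as a sketch of the idea rather than something I've tested. The key names (tire_anatomy, changing_a_tire, patching_a_tube) are hypothetical and would need matching key definitions elsewhere in the map.

```xml
<map>
  <title>Tires: related links</title>
  <reltable>
    <!-- Topics in one cell of a row get related links to topics in the other cell -->
    <relrow>
      <relcell>
        <topicref keyref="tire_anatomy"/>
      </relcell>
      <relcell>
        <topicref keyref="changing_a_tire"/>
        <topicref keyref="patching_a_tube"/>
      </relcell>
    </relrow>
  </reltable>
</map>
```

The appeal is that the links live in the map instead of in the topic text, so the topics themselves stay free of inline cross-references.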


Images are also somewhat of a pain precisely because I don’t have an automated way to get them into Drupal. I currently upload images individually (or in bulk) into Drupal using a module called IMCE, and then I grab the absolute URL and use this as the href in the image tag. It works.

I haven’t decided whether I’ll create a separate TIF or SVG image for print outputs. Most likely not yet, though maybe in the future.

Getting content into Drupal

Now comes the really challenging part. How do I get content into Drupal? Right now, it’s a manual process of copying each HTML file from the DITA output and pasting it into the appropriate Drupal node.

At first you might say, you’re crazy. This will be way more work than you want. But not really. If there are 500 pages in the help, each week I may have several new pages and updates to 5-10 pages. If I’m only changing 15 files per week max, that copy and paste job won’t take more than 10 minutes.

I’m not re-generating all 500 pages each week. I think many help tools force a destroy-and-recreate model. Instead, my changes are between 1-5% of the whole, which is manageable with a manual copy-and-paste process (once the initial bulk is published). And I’m only copying and pasting to the online source.

By the way, if you know of a good import module to push DITA content into Drupal, let me know. The one company that had built a connector had showstopper problems with it: each publish job deleted all previous Drupal nodes and recreated new ones with new aliases, and chunking didn’t work, among other issues. I am hoping to leverage our existing API import module to handle the imports.

Backing up files

I’m planning to use Git to back up files to GitHub. This is mostly just for backup and versioning purposes. As a lone writer, I don’t have to worry about collaboration, but obviously I could collaborate using Git.

Content that will live outside of DITA

I have some API documentation that will live outside of DITA. One of our programmers built a nifty script that pulls a lot of the reference information directly from the API files and formats them into attractive API documentation. The process works quite well and I don’t want to change it. The only thing I may do is copy the short descriptions of each resource and endpoint into a conceptual book for possible re-use. I don’t anticipate that people would want a printout of all the reference information for each resource in the API. It’s so much easier to scan for it online.

Other factors

Finally, one major challenge will be to adopt a new and somewhat unfamiliar authoring process while also keeping pace with all the documentation needs. Sometimes I need to publish documentation quickly, without spending an afternoon trying to figure out some DITA element I’m unfamiliar with, like relationship tables. I’m hoping, however, that once I get a system down, this method will allow me to focus on the content more than the authoring process.

Feel free to give me pointers and advice in the comments.


By Tom Johnson

I'm a technical writer working for the 41st Parameter in San Jose, California. I'm interested in topics related to technical writing, such as visual communication, API documentation, information architecture, web publishing, JavaScript, front-end design, content strategy, Jekyll, and more. Feel free to contact me with any questions.

  • Chris Ninkovich

    Tom, I am really looking forward to this series! I have tried to implement DITA in the past, and failed HARD, so I am really interested to see what I can learn from you!


    • Tom Johnson

      Chris, if you don’t mind sharing, why did your DITA implementation fail? Maybe I can learn from it.

      • Chris Ninkovich

        Honestly, at the time, I just didn’t have the technical “know-how” (I still don’t, really). Also, another big hurdle was trying to get “buy-in” from my superiors and fellow team members.

  • Tracy L. Taylor

    Tom, where are you storing your XML? On a file server? My technical ceiling is pretty low, but I wonder if you’d be able to store files in a MySQL database that could sync with Drupal. Or, I access my CMS directly from Oxygen, and maybe the Oxygen plug-in can be extended to Drupal.

    Also, I’m not sure about your topic organization. A topic, whether concept or task, is just content of any length that stands alone as is. So topics don’t have to be small chunks, they can be large chunks with multiple chunks of the same type.

    Also, regarding tasks, I think you can use them more simply than you think. You can ignore a lot of the elements in a task topic; really all that is required is and . A lot of the elements just contain other required elements, like contains , and contains .

    If each individual task isn’t required to stand alone, you can have more than one task in a task topic (insert Task after Taskbody). Multiple tasks in task topics can be similar to sections in concept topics. If you need conceptual information before a task, you can use the context element. You can also group a concept and a task together in a submap, if that will be reused elsewhere.

    Also, I think you may end up doing double work in the future because concepts can only be manually converted to tasks. Might as well do it once.

    Good luck, and please keep us posted!

  • Sami Moran

    If you haven’t already done so, you might consider taking a look at EasyDITA. I found the Oxygen user experience on the horrific end of the scale. No tool is perfect, but EasyDITA delivers a pretty darn slick, wysiwyg, structured authoring environment. Even out of the box.

  • Kristen James Eberlein

    Tom, I agree with Tracy. I think you might be setting yourself up for unpleasant work in the future as a result of your decision to migrate all your content to concept information types.

    In regard to your “Changing a tire” topic, did you explore authoring it as a single task topic, using substep elements?

  • Caroline Mathieson

    Any reason you are not using Flare as your DITA editor?

    I am curious because that is what I am planning to do very shortly with a large telecommunications-based content library, to be written mostly from scratch or by reusing PowerPoint-based sources.

    I will be following your journey with interest.

    • Matt Sullivan


      I’m under the impression that Flare doesn’t support DITA editing.

      Per the Flare website (snipped), at

      When DITA file content is imported, the DITA elements are converted to their equivalents in the Flare XHTML environment.

      • Tom Johnson

        Hi Matt, you are correct that Flare’s editor does not enforce the DITA to be valid. Note: I am not using Flare. I know the last time we talked at a conference (5 yrs ago?) I was using Flare. I have since switched to other tools and processes.

  • John Tait

    (Might be re-post, not sure if last one is under moderation or just lost.)

    Good luck. Happy Easter!

    In my DITA project, I converted about 200 pages of fairly good content. 250 pages counting front and back matter. The materials were what we called “modules”, which are broadly standalone documents – 18 of them at around 10 pages each with some bigger ones. Here’s some of what I learned – personal view only. It took about 10 days.

    1. Really understand your material and plan what you’re going to do first. I had already modularized this stuff some months before, which was a much bigger job than the DITA conversion, and was a required first step. Doing a content analysis is an essential step.
    2a. Although I used the general task/concept/reference topics, at the end I wondered what the advantage was. I wished I’d just used the “topic” topic. The reader isn’t going to care.
    2b. The general task topic is much more forgiving than the task topic.
    3. Keep the structure flat or you’ll get a Mark Baker (C) Frankenbook. Aside from some topicheads and a main topic for an introductory scope page for each module, I found that a flat structure made the output far better. I don’t think I can paste XML here, but if anyone wants to see my topic map, please email me.
    4. Pasting the text into a basic text editor to clean it, then pasting it into Oxygen paragraph by paragraph was surprisingly quick and easy. I needed a hard copy of the original text.
    5. I didn’t at the time know how to give links generated by relationship tables text other than the default “Related …” text. The content would have been much better if I had. I can expand on this later.
    6. Giving topics_long_titles_like_this.dita made my own life much easier.
    7a. I didn’t like the DITA-OT HTML output as much as default DocBook stylesheets no matter what I did.
    7b. DocBook’s tables are better than the DITA-OT’s in PDF.
    8. Working off the file system was very easy, especially with Oxygen’s find tools.
    9. Topic maps are fun and easy.
    10. I took a purist approach to cross-referencing, by using only relationship tables. At the end of it I wondered why, because it was a standalone pilot project, and manual linking would have served the reader better.
    11. Next topic/Previous topic links look daft in EPUB.
    12. I had Dark Night of the Soul experiences all the way through.

  • Casey Jordan

    Hey Tom,

    We are going to be working on a DITA to WordPress/Drupal plugin sometime this year to support some of our own internal needs. Perhaps we could collaborate?

    Sami, thanks for the good feedback!

    • Casey Jordan


      Also, for images exported to Drupal, we have found a unique method for dealing with this by encoding the image content directly into HTML via a transform. Using this method you can embed everything and it should render nicely in Drupal. Please let me know if you would like more details.



      • Tom Johnson

        Hi Casey, yes, I’m interested in the details. Please share them with me. I have been pre-uploading images into Drupal and then just hard-coding the image URLs in the DITA image tags.

      • Tom Johnson

        Re encoding the image directly into HTML, wouldn’t that result in a tremendously long string (hundreds of lines) per image, which would increase the file size and really make the output code unnavigable? Just curious.

