DITA: Author in Markdown, publish with DITA
If you want to draft in Markdown but still use conref and other DITA-specific tags (which aren't available in Markdown), you can implement this super simple authoring process. This Markdown-to-DITA process is intended to support a workflow where you begin drafting documentation in Markdown as you gather content. When you're mostly finished authoring content, you can transform it all to DITA and then continue working with it there.
Prerequisites
- Download and install MultiMarkdown from Fletcher Penney.
- Download and install LightPaper or some other authoring tool that allows you to create Markdown content.
- Download and install OxygenXML.
These instructions are intended for Mac users, but it shouldn't be too hard to do the same on a PC.
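Before going further, it can help to confirm that the command-line tools are actually reachable from Terminal. The snippet below is a minimal sketch, not part of the original setup: it assumes the conversion will call multimarkdown and ant by name (both appear later in the script), and have_tool is a helper name I made up for illustration.

```shell
#!/bin/sh
# Succeeds when the named command can be found on the PATH.
# have_tool is a hypothetical helper, not something MultiMarkdown or Oxygen ships.
have_tool() {
  command -v "$1" >/dev/null 2>&1
}

# multimarkdown and ant are the two commands the conversion script calls.
for tool in multimarkdown ant; do
  if have_tool "$tool"; then
    echo "found:   $tool"
  else
    echo "missing: $tool"
  fi
done
```

If either tool shows up as missing, fix the installation or your PATH before running the conversion.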
Process Markdown to HTML
To automate the conversion from Markdown to HTML:
- Create a folder to store your project files. I'll call mine ditaqrg.
- Create a Markdown file and add it to this folder. Make sure it uses the .md extension.
- In this same folder, create a file named ditaqrg.sh (you can choose whatever name you want) and add the conversion script described under Code explanation. Change “ditaqrg” to whatever project name you chose to use. You will need to customize the Oxygen h2d path and your input and output directories to match your specific locations. I explain this code in more detail below.
- Open Terminal and cd to your project folder.
- Use chmod to give read/write/execute permission on the file, for example chmod ugo+rwx ditaqrg.sh. Here chmod changes the file permissions, ugo stands for “user, group, others” (these are the three groups that can access the file), and rwx means “read, write, execute”. The last argument is the specific file we want to apply these permissions to (ditaqrg.sh).
- Type ls -l ditaqrg.sh. Verify that the file now has rwxrwxrwx permissions. (The -l adds more detail in the response.)
- Type ./ditaqrg.sh (or ./projectname.sh, using whatever name you chose) to run the file.
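If you want to see the permission steps in isolation first, here is a small sketch that runs them against a throwaway file instead of your real script. The temporary folder and the stand-in ditaqrg.sh are created just for the demo; the chmod ugo+rwx form is my assumption about how the permissions get set, based on the description above.

```shell
#!/bin/sh
# Create a scratch folder with a stand-in for ditaqrg.sh.
demo_dir=$(mktemp -d)
demo_script="$demo_dir/ditaqrg.sh"
printf '#!/bin/sh\necho "script ran"\n' > "$demo_script"

# user, group, others: add read, write, and execute.
chmod ugo+rwx "$demo_script"

# The long listing should begin with -rwxrwxrwx.
ls -l "$demo_script"

# Keep only the ten permission characters for a quick check.
perms=$(ls -l "$demo_script" | cut -c1-10)
echo "permissions: $perms"

# Clean up the scratch folder.
rm -rf "$demo_dir"
```

Giving everyone write access is more than the script strictly needs in order to run, so treat this as a mirror of the steps above rather than a recommendation.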
If you look in the folder, you will see your .md files now have corresponding .html and .dita files.
Note that each time you run the script, the Markdown files will overwrite the DITA files. If you don't want this automated process to keep overwriting the DITA files, remove the Markdown file.
Code explanation
The heart of this transformation lies with this script:
What is this code doing? I'll go through it line by line.
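As a reference point for the walkthrough, here is a sketch of what a script like this could look like, pieced together from the explanation that follows. It is not the original listing. Both paths are placeholders you would replace, and the DRY_RUN switch and the run helper are my own additions so the sketch only prints the commands unless you ask it to execute them.

```shell
#!/bin/sh
# Sketch of a ditaqrg.sh-style script. Both paths are placeholders.
PROJECT_DIR="${PROJECT_DIR:-$HOME/ditaqrg}"   # hypothetical project folder
H2D_DIR="${H2D_DIR:-/path/to/oxygen/h2d}"     # hypothetical h2d location

# Print each command by default; set DRY_RUN=0 to execute for real.
# (run and DRY_RUN are illustration helpers, not part of the original script.)
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

convert_project() {
  start_dir=$PWD
  # 1. Markdown to HTML: -f full document, -b batch, every .md file here.
  run multimarkdown -f -b *.md
  # 2. Move to the folder that holds Oxygen's h2d build.xml.
  run cd "$H2D_DIR"
  # 3. HTML to DITA, reading from and writing to the project folder.
  run ant -Dargs.input="$PROJECT_DIR" -Dargs.output="$PROJECT_DIR"
  # 4. Return to the folder we started in.
  run cd "$start_dir"
}

convert_project
```

Run it from inside your project folder so that *.md picks up your Markdown files. With the default dry run you can check the printed commands against your own paths before switching to DRY_RUN=0.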
First, the multimarkdown line will process your Markdown files into HTML by running Fletcher Penney's MultiMarkdown script. The -f parameter says to create a full header in the HTML document, so each HTML file is a complete document rather than a fragment. This is necessary for the ant scripts to run. The -b runs a batch conversion, so each file passed in is processed separately and written to its own .html file. The *.md restricts the conversion to files with an .md extension, which is the traditional file extension for Markdown files.
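Since the ant step depends on that full header, a quick check on the generated HTML can save a failed build. This is a rough sketch under my own assumption that "full header" means the file contains html and head tags; is_full_html is a made-up helper, and the two demo files are created only for illustration.

```shell
#!/bin/sh
# Succeeds when the file contains both "<html" and "<head" (case-insensitive).
# This is a rough stand-in for "has a full header", not a real HTML validator.
is_full_html() {
  grep -qi '<html' "$1" && grep -qi '<head' "$1"
}

# Demo on two throwaway files: a fragment and a complete document.
tmp=$(mktemp -d)
printf '<p>fragment only</p>\n' > "$tmp/fragment.html"
printf '<html><head><title>t</title></head><body><p>full</p></body></html>\n' > "$tmp/full.html"

for f in "$tmp/fragment.html" "$tmp/full.html"; do
  if is_full_html "$f"; then
    echo "$(basename "$f"): full document"
  else
    echo "$(basename "$f"): fragment, rerun multimarkdown with -f"
  fi
done

rm -rf "$tmp"
```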
Now the script changes to the h2d directory. In this directory, you have a build.xml file supplied by OxygenXML that will convert your content from HTML to DITA. Customize this path to match your own OxygenXML installation directory.
On this line, ant specifies input arguments for the HTML to DITA transformation.
Note that you must have an HTML file in the input directory for the script to work. If you don't, you'll see a message that says “build failed - could not create temp directory.” You don't need to manually create the temp directory; just put an HTML file in the input directory.
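One way to avoid that error is to check for HTML files before calling ant. The sketch below is an assumption about how you might do that, not something the Oxygen build provides: has_html is a made-up helper, and INPUT_DIR is a hypothetical variable that defaults to the current folder.

```shell
#!/bin/sh
# Succeeds when the directory holds at least one .html file.
has_html() {
  for f in "$1"/*.html; do
    # If nothing matched, the pattern stays literal and this test fails.
    [ -e "$f" ] && return 0
  done
  return 1
}

input_dir="${INPUT_DIR:-.}"   # hypothetical variable; defaults to this folder
if has_html "$input_dir"; then
  echo "HTML found in $input_dir, OK to run ant"
else
  echo "no .html files in $input_dir, run multimarkdown first" >&2
fi
```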
If your HTML file were in this h2d folder, and your Terminal location was at the h2d folder path, you would now only need to type ant in the command line, and the build.xml file also located in this same folder would run the h2d transform to convert the HTML file to a DITA topic type. That's how ant works: when you type ant on the command line, it looks for a build.xml file in that same folder and executes the build file using the default arguments of the build.xml file.
However, it's unlikely that you want to store all your Markdown files and HTML files in this deeply nested Oxygen installation directory. In this example, we add some arguments to the ant command to specify input and output directories that differ from the default settings.
The -Dargs.input specifies the directory containing the files you want to transform. If you point to the directory, all HTML files in that directory (the ones generated from your Markdown) will be processed. If you point to a specific file, only that file will be processed.
The -Dargs.output specifies the output directory for the transformed files. In this case, I want them to be output to the same directory as the Markdown files.
There's another argument you might want to add: -Dargs.infotype. This argument specifies the topic type to convert the HTML into. Options here include topic, task, concept, or reference. The default (if you don't specify the option) is topic. So you actually don't need this argument if you're just converting to topics, which is what I recommend.
There is one more argument to be aware of: args.include.subdirs="yes". This argument says to look in subdirectories for files. I don't use subdirectories to organize my DITA files (except for images), so this argument isn't that relevant. If you don't specify this argument, the default value is "no".
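To see how these arguments fit together on one command line, here is a small sketch that assembles the ant call and rejects an unknown topic type. The h2d_command helper and the /path/to/ditaqrg paths are hypothetical, and I'm assuming args.include.subdirs is passed with the same -D prefix as the other arguments. The function only prints the command; it doesn't run ant.

```shell
#!/bin/sh
# Print the ant command for the h2d transform.
# Usage: h2d_command INPUT OUTPUT [INFOTYPE] [SUBDIRS]
h2d_command() {
  input=$1
  output=$2
  infotype=${3:-topic}   # default topic type when none is given
  subdirs=${4:-no}       # default: do not look in subdirectories

  # Only the four topic types named above are accepted.
  case $infotype in
    topic|task|concept|reference) ;;
    *) echo "unknown infotype: $infotype" >&2; return 1 ;;
  esac

  echo "ant -Dargs.input=$input -Dargs.output=$output" \
       "-Dargs.infotype=$infotype -Dargs.include.subdirs=$subdirs"
}

# Example with placeholder paths: convert to task topics, including subfolders.
h2d_command /path/to/ditaqrg /path/to/ditaqrg task yes
```

Leaving off the last two values gives you the defaults described above (topic, and no subdirectories).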
Two other arguments cover the default language and the XSL file. The arguments are defined in Migrating HTML to DITA with Ant script. Note that in that article, it shows the following:
The "direcotry" misspellings are not mine. Also, the article never explains that "{file|directory}" means you can enter either a file or a directory, and that you omit the {} braces.
After the script runs, we return to the same directory we were in at the beginning. You can type ls to list the contents of the directory. You will then see a .dita file corresponding to each HTML file in the directory. If you group your files by kind, you won't see the files in triplicate as you browse them.
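If you'd rather not eyeball the ls output, a short loop can report any HTML file that didn't get a DITA counterpart. This is a sketch with a made-up helper name (missing_dita) and a hypothetical PROJECT_DIR variable that defaults to the current folder.

```shell
#!/bin/sh
# Print the name of every .html file in a folder that has no matching .dita file.
missing_dita() {
  for html in "$1"/*.html; do
    [ -e "$html" ] || continue                      # no .html files at all
    [ -e "${html%.html}.dita" ] || basename "$html"
  done
  return 0
}

# Check the project folder (hypothetical variable; defaults to this folder).
missing_dita "${PROJECT_DIR:-.}"
```

No output means every HTML file has a .dita file next to it.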
Limitations
You can't use DITA markup within Markdown and have it survive the transforms. For example, if you put a note element in your Markdown file, it will not survive the conversion to DITA.
About Tom Johnson
I'm an API technical writer based in the Seattle area. On this blog, I write about topics related to technical writing and communication — such as software documentation, API documentation, AI, information architecture, content strategy, writing processes, plain language, tech comm careers, and more. Check out my API documentation course if you're looking for more info about documenting APIs. Or see my posts on AI and AI course section for more on the latest in AI and tech comm.