DITA: Import DITA's XHTML Output into WordPress
This tool is a modification of the original tool created by Mike Little here. I have changed the way links and images are handled, and I have written my own installation instructions here. (Note that this is not a plugin -- it's a file you add to your WordPress instance that allows you to import XHTML files from DITA. My modified file is here.)
Some notes about the DITA Import tool
What it imports
The DITA XHTML import tool does not import raw DITA. Instead, it takes the XHTML output from DITA and imports it into WordPress. It works similarly to a regular HTML import tool, such as HTML Import 2, except that this tool preserves the hierarchy (but not the order) of parent-child pages as expressed in your ditamap table of contents file, among other things.
The lack of sibling order turns out not to be a problem at all, because rather than displaying pages according to their page order in WordPress, we display them using the index file that is included in the DITA XHTML navigation (more details are below).
What's cool about this tool, and what distinguishes it from other import tools, is that if you reimport your DITA XHTML content, the previous pages aren't duplicated or recreated. Instead, only a new version of the page is created, and any comments or other histories about the page are preserved.
Also, you don't have to reimport your entire help set each time you want to update your content. You can just update the pages that have changed. However, even importing one page requires the importer to look at all the other pages in WordPress. The importer checks whether any of the existing pages have permalinks that match the file names of the content you're importing.
The permalink serves as the globally unique identifier, or guid, that links incoming files to existing WordPress pages. If an imported file's name matches an existing page's permalink, the import creates a new version of that existing page. If the file name doesn't match any existing permalink, a new page is created.
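The matching rule can be sketched in a few lines of Python. This is only an illustration of the logic, not the importer's actual PHP code. The function name `plan_import` is hypothetical, and the sketch assumes the permalink is simply the file name without its extension (the real importer may normalize names differently).

```python
from pathlib import PurePosixPath


def plan_import(incoming_files, existing_permalinks):
    """Split incoming XHTML files into page updates and new pages.

    Assumption: a page's permalink equals the file's base name without
    its extension. A match means "new version of an existing page";
    no match means "create a new page".
    """
    existing = set(existing_permalinks)
    updates, creates = [], []
    for name in incoming_files:
        slug = PurePosixPath(name).stem  # "install.html" -> "install"
        if slug in existing:
            updates.append(slug)
        else:
            creates.append(slug)
    return updates, creates


updates, creates = plan_import(
    ["toc.html", "install.html", "whats_new.html"],
    ["toc", "install", "about"],
)
print(updates)  # ['toc', 'install']
print(creates)  # ['whats_new']
```

Note that the existing "about" page is left alone: nothing in the incoming set matches it, so the import neither updates nor removes it.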
If you delete a page in your DITA source that you had previously imported into WordPress, the importer won't know that the file should be deleted -- you'll need to manually delete the legacy page from WordPress. There isn't any kind of import manifest file that looks to see if WordPress has any pages it shouldn't. This is because you can have both DITA-imported pages and native pages created directly from WordPress living in the same directory.
If you change your DITA file names and reimport your content, the newly imported files won't match any WordPress pages with permalinks corresponding to the previous DITA file name, hence new pages will be created/duplicated. As a best practice, avoid changing file names or permalinks.
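Because the importer keeps no manifest, one way to spot leftovers from deleted or renamed topics is to compare the current output file names with the permalinks in WordPress yourself. The following is a rough sketch under the same assumption as before (permalink equals file name without extension). The function name `pages_to_review` is hypothetical, and how you obtain the list of permalinks is up to you.

```python
from pathlib import PurePosixPath


def pages_to_review(current_files, wordpress_permalinks):
    """Return permalinks that no current DITA output file accounts for.

    These are only candidates for manual review: a page may be a
    leftover from a deleted or renamed topic, or it may be a page
    authored natively in WordPress.
    """
    current = {PurePosixPath(name).stem for name in current_files}
    return sorted(slug for slug in wordpress_permalinks if slug not in current)


# "install" was renamed to "setup" in the source; "news" is a native page.
print(pages_to_review(["toc.html", "setup.html"], ["toc", "install", "news"]))
# ['install', 'news']
```

The output cannot tell a stale imported page from a native WordPress page, which is why deletion stays a manual decision.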
Even though the parent-child hierarchy is preserved, the order of siblings is not preserved. This makes it difficult to automate menus using WordPress' built-in menu system. However, don't fear! I've overcome the menu issue with some clever scripting and CSS.
Basically, by following the steps in the sections below, you'll identify the table of contents file, pull it into a widget, and add a class at the top. Some jQuery scripts will trigger based on the class, and when you view the site in your browser, you'll have a sweet-looking navigation bar.
In order for external links to be absolute instead of relative, you must add outputclass="external" on the link in the DITA source file.
If you look in the ditaimporter.php file, you'll see my hack for making the links work. It looks like this:
This code says that if the link carries the class the DITA OT gives to internal links, process the link as a relative link. But if it has a class of external, process it as an absolute link. If you don't put outputclass="external" as an attribute on an external link, the importer will render it as a relative link, and the path won't be correct.
By the way, as far as I can tell, these classes are how DITA's OT renders link classes. But if it turns out that the DITA OT adds other classes to internal links, let me know and I'll fix the logic.
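To make the effect of the class check concrete, here is a small Python sketch. It is not the code from the importer. The function name `render_href`, the `/help` base path, and the class name `some-internal-class` are placeholders, and the way a relative link is resolved here (joined onto a base path) is an assumption for illustration.

```python
import posixpath


def render_href(href, css_classes, page_base="/help"):
    """Decide how a link's href is written out.

    Links whose class list contains "external" are left untouched
    (absolute). Every other link is treated as relative and resolved
    against page_base.
    """
    if "external" in css_classes.split():
        return href
    return posixpath.join(page_base, href)


# Internal link: resolved relative to the site path.
print(render_href("install.html", "some-internal-class"))
# /help/install.html

# External link correctly marked with outputclass="external".
print(render_href("https://wordpress.org/", "some-internal-class external"))
# https://wordpress.org/

# External link without the class: treated as relative, so the path breaks.
print(render_href("https://wordpress.org/", "some-internal-class"))
# /help/https://wordpress.org/
```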
When the DITA content imports into WordPress, there will be a lot of blank spaces. You must disable WordPress' autoparagraphing function (wpautop) in order to avoid odd line breaks in your document, since WordPress's editor treats line breaks as new paragraphs. Once you disable wpautop, you will need to add paragraph tags to signal new paragraphs in your content.
This shouldn't be a problem if you are importing all of your content into WordPress. However, if you have a lot of pages that you're authoring natively in WordPress, which coexist alongside your imported DITA-authored pages, then you might want to implement a plugin such as wpautop control, which lets you turn on wpautop on a page-by-page basis.
WP Multisite versus standalone WP
You can use this DITA import tool with WordPress multisite, not just with standalone WordPress sites. In many cases, it probably makes a lot more sense to use WordPress multisite. If you have just one output, then you're probably not using DITA in the first place. On the other hand, if you have numerous WordPress sites, because you're single sourcing your content into various channels, it makes more sense to publish each output as a separate site in a WordPress network that you manage centrally.
WordPress multisite allows you to maintain the theme and plugins across your network in one place rather than making updates to each site individually. That is, all sites in the network can refer to the same theme, plugins, and WP core files. When you need to make updates, you only have to do it once even if you have hundreds of sites in your network. Even if you have just two sites, you definitely want to use WordPress multisite. What's cool is that both standalone WordPress and WordPress multisite run off the same core code base.
For instructions on using WP multisite, see Create a network on the WordPress Codex. I also highly recommend getting a subscription to wpmudev, which is a site specializing in plugins and assistance with WP multisite.
(By the way, my instructions here don't include info on installing either standalone WordPress or WordPress multisite, but you can easily find tutorials online. Also, if you're a total WordPress newbie, note that you download the self-hosted WordPress from WordPress.org, not WordPress.com. You won't be able to use this tool with WordPress.com. You'll need to have your own space on a web host.)
The tool doesn't have any pre-built CSS styles for the DITA classes that will be in your content. However, if you use the styles I've included here for your theme, it will look decent.
You don't need to do anything special in order for your images to import. They are pulled into your media library when you import content.
Install the DITA Import tool
Download and install the DITA import tool
- Download the ditaimporter.zip file here and unzip it.
- Upload the ditahelp.php file to wp-content/ditaimporter in your WordPress file structure. (Create the ditaimporter directory.)
You can actually store this ditahelp.php file in other directories, but keeping it in wp-content is pretty safe because you never overwrite your wp-content folder when you update WordPress.
- In a text editor, create a new file and insert the following:
- Save the file as dita-import-helper.php.
- In your wp-content directory in WordPress, create a new directory called mu-plugins.
- Upload the dita-import-helper.php file into that directory.
mu-plugins stands for "must-use" plugins. mu-plugins work like regular plugins, except that they remain active even if you switch your theme or deactivate all your regular plugins. You will also never be prompted to update a must-use plugin.
Set up your ditamap
In your DITA Map, structure your sections with the following hierarchy:
The title for each section (in this case, toc_content_reuse.dita) should go to an actual file, but you don't need to include any content in that file. By default, the DITA OT will populate that file with a summary of all the pages under it (their titles and short descriptions). When you import the content into WordPress, clicking this folder overview page won't show the page (unless you specifically configure that setting). However, the page will still appear in search results.
What you want to avoid is a structure that uses topichead elements that don't point to any real files. The navigation menu in WordPress needs an actual file to create a hierarchy. (It's a best practice anyway to avoid topichead elements.)
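If you want to check a map for this problem before transforming it, a short Python script can list the section headings that have no file behind them. This is a minimal sketch, not part of the import tool. The sample map, its navtitle, and every file name other than toc_content_reuse.dita are made up for illustration, and the function name `find_fileless_sections` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical map: one section backed by a real file, one topichead without.
SAMPLE_MAP = """<map>
  <title>Sample help</title>
  <topicref href="toc_content_reuse.dita">
    <topicref href="reuse_overview.dita"/>
  </topicref>
  <topichead navtitle="Getting started">
    <topicref href="install.dita"/>
  </topichead>
</map>"""


def find_fileless_sections(ditamap_xml):
    """Return the titles of section nodes that do not point to a file.

    Flags every topichead, plus any topicref that has children but
    no href attribute.
    """
    root = ET.fromstring(ditamap_xml)
    problems = []
    for elem in root.iter():
        if elem.tag == "topichead":
            problems.append(elem.get("navtitle", "(untitled topichead)"))
        elif elem.tag == "topicref" and len(elem) > 0 and not elem.get("href"):
            problems.append(elem.get("navtitle", "(topicref without href)"))
    return problems


print(find_fileless_sections(SAMPLE_MAP))  # ['Getting started']
```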
Configure XHTML transforms
You need to create a DITA Map XHTML transform. I'm assuming that you're using OxygenXML, but if you're not, you'll have to figure out how to make these changes in your authoring tool.
Duplicate and edit the DITA Map XHTML transform. For the args.html.toc parameter, change the value from index to toc. (For some reason, the index file doesn't get imported if it has the name "index." I think this is because the permalink index interferes with the default file loaded in a root directory.)
Once you make this adjustment, generate an XHTML output.
Import DITA Content
- Create a new directory on your server called something like dita-staging.
- In OxygenXML, duplicate the DITA Map XHTML transform.
- Edit the DITA Map XHTML transform. Click the Parameters tab. For the args.html.toc parameter, type toc.
If you leave the args.html.toc parameter at the default, the table of contents gets pushed into an index.html file, and the index.html file won't be imported. By changing the output file to toc, this file will be rendered as toc.html and will be included in the import into WordPress. The file's title will be your map file name (but its permalink will be "toc").
- Transform your DITA content into XHTML and upload it to the dita-staging directory on your server using a tool such as FTP.
- Find the absolute path to your dita-staging directory.
One way to find the absolute path is by creating a file with the following contents:
Save the file as "phpinfo.php" and upload it to your dita-staging directory.
View the phpinfo.php page in your browser. In the DOCUMENT_ROOT section, look at the absolute path. It will look something like this:
Copy this path -- you will need it in an upcoming step.
- In WordPress, go to DITA help. and click
If you have a WP multisite, you have the option of selecting which site you want to import content into. Select the site from the Application Area selector. If you don't have WP multisite, this drop-down doesn't appear.
Note that with WP multisite imports, you always import content from your base site's admin directory. That is, you go to the your_base_wp_url/wp-admin directory to run the import tool, rather than the your_base_wp_url/subsite/wp-admin directory. Even for a multisite network of hundreds of sites, you always import your content from the same base site.
- Leave the two check boxes ("Do you want to clear...?") unchecked. You use these to troubleshoot failed imports.
- In the DITA help directory field, insert the absolute path to your dita-staging folder that you copied earlier.
- Click Import Files. You will be prompted to move through about 5 import screens, and with each screen you click a button at the bottom to continue moving through the import.
The import process is broken up into a series of steps to avoid memory issues (which probably become more apparent with imports involving thousands of files).
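The general idea of splitting work into smaller steps can be sketched as follows. This is not how the importer is actually implemented (its steps may well be organized by phase rather than by file count). It only illustrates why processing a fixed-size batch per request keeps memory use bounded. The function name `batches` and the batch size are arbitrary.

```python
def batches(items, size):
    """Yield consecutive slices of items, each at most `size` long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


files = [f"topic_{n}.html" for n in range(1, 8)]  # 7 hypothetical files

for step, chunk in enumerate(batches(files, 3), start=1):
    # Each step only ever holds one chunk in memory.
    print(f"step {step}: {len(chunk)} files")
# step 1: 3 files
# step 2: 3 files
# step 3: 1 files
```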
Remember that external links only come through as absolute links if they have outputclass="external" added as an attribute.
Switch to 2012 theme
- In WordPress, go to the Themes screen.
- Activate the 2012 theme.
If the 2012 theme isn't there, you may need to search for it and install it first. You can use other themes, of course. However, you will most likely have to do a lot of CSS work to get the theme in shape.
- In WordPress, go to the theme editor.
- Click the functions.php file and add the following code at the absolute bottom of the file:
- Save the page.
Once wpautop is disabled, you must use p tags to signal paragraph breaks.
If your WordPress theme is suddenly blank, you probably added incorrect syntax in your functions.php file. You need to FTP into your wp-content/themes/twentytwelve directory, retrieve the functions.php file, and undo what you did. (It's probably better to edit files outside of the WordPress interface, but for small edits, it's more convenient to work within WordPress.)
Create a table of contents
There are probably a lot of ways to incorporate a menu in WordPress with this DITA Import, but the method described here is what I think is the best way. Basically you'll incorporate a jQuery accordion plugin that will read the file structure of the toc.html file and apply a navigation menu. You'll pull this page into a sidebar widget. Only the first import will require some configuration -- you'll need to get the page ID and insert it into the sidebar widget. With each subsequent import, you won't need to make any adjustments.
- In the WordPress Dashboard, click the Pages section and open your toc file.
Note: If you want to see more than 20 pages at a time, click Screen Options in the upper-right and change 20 to something like 100. Then click Apply.
The toc file is your table of contents, and its name is the same as your map file name. In other words, it won't say "toc." It will say something like "DITA QRG." If you can't find it, go to /toc in your browser and edit the page that way.
- While the page is in edit mode, look in the URL to find the page's ID. It will be a number such as 5 or 46 or something. Remember it.
- Install and activate the PHP Code Widget plugin.
If you don't know how to install and activate a plugin, it's pretty easy. On the plugin install screen, type PHP Code Widget in the Search Plugins field. Click Install Now next to the right result. After it installs, click Activate.
- Go to the Widgets screen and add a PHP Code widget to your sidebar.
Add the following query to pull the toc page into a PHP Code widget:
In this example, 104 is the page ID of the toc page. Update this ID to the actual ID of your page. The accordion clean classes are essential for triggering the jQuery scripts.
- Remove the other widgets you don't want by dragging them away.
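If you'd rather not read the page ID out of the address bar by eye, the steps above can be backed by a tiny Python helper that parses it from the edit URL. It assumes the standard WordPress edit URL form, where the ID is carried in the post query parameter. The domain below is a placeholder, and the function name `page_id_from_edit_url` is hypothetical.

```python
from urllib.parse import urlparse, parse_qs


def page_id_from_edit_url(url):
    """Return the page ID from a WordPress edit URL, or None if absent."""
    query = parse_qs(urlparse(url).query)
    values = query.get("post")
    return int(values[0]) if values else None


print(page_id_from_edit_url(
    "https://example.com/wp-admin/post.php?post=46&action=edit"
))  # 46
```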
Set your homepage as your intro page
- Go to the reading settings.
- In the Front Page Displays section, select A static page. Then select the static page you want, such as your help overview page.
- If you want your blog posts to appear on another page, create a new page on your blog called "News" or something. Then return to this section and select the News page for your Posts page.
Turn off revisions
You want to turn off revisions so that each time you import content, WordPress doesn't keep adding revisions. I'm not entirely sure, but I think if you keep adding revisions, you soon begin to see a ton of guid updates for each revision in the system. At any rate, you don't want to unnecessarily increase the size of your database anyway.
It's much better if you simply turn off the revision feature in WordPress. You aren't storing your content here anyway. You're uploading it and overwriting it with each upload. Your source DITA content should be stored in a version control system such as GitHub, Subversion, or Mercurial.
Get the latest styles
When you transform DITA into HTML, the OT inserts a lot of classes into the various elements. This allows you to style your content through a CSS file that addresses those classes. I've added a number of classes to both the Twenty Twelve theme and the jQuery navigation plugin that correspond with these classes.
Because I'm still tweaking the styles of this site, just grab the latest styles from my wpdita site and copy them into your site's stylesheets.
To get the latest styles:
- Go to https://idratherbewriting.com/wpdita/, right-click, and choose View Page Source (this is the option in Chrome; it may vary for other browsers). Search for style.css and then click it.
- Search for tom's styles. Copy everything below this point.
These are all the styles I've customized. Unfortunately, I haven't organized them very well. I mostly just add them as needed as I go along...
- In your WordPress site, go to the theme editor. By default, the style.css file should load in your theme editor.
- Scroll to the bottom, insert the "tom's styles" content that you copied, and then save the file.
You could also create a child theme, but since you probably won't be updating the WP 2012 theme regularly apart from your own customizations, there's little reason to do this.
- Grab the styles for the navigation skin ("clean.css") by following the same process you followed to get to style.css. Copy the styles in clean.css.
Unfortunately you can't edit clean.css by browsing to it within your WordPress theme editor.
- Using FTP, go to wp-content/themes/twentytwelve/css/skins and download the clean.css file.
- Replace the contents with the styles you copied and reupload, overwriting the previous file.
Add an auto-toc on pages
- Add the Table of Contents Plus plugin.
- Go to the plugin's settings page.
- Configure any settings you want, and then click Update Options.
You may be tempted to choose Top for the Position. However, note that if you choose this, the Table of Contents will jump past your introduction if the user clicks the first heading in the content menu. That's why it's probably better to choose Before first heading (default). This is the same style that Wikipedia uses.
About Tom Johnson
I'm a technical writer / API doc specialist based in the Seattle area. In this blog, I write about topics related to technical writing and communication, such as software documentation, API documentation, visual communication, information architecture, writing techniques, plain language, tech comm careers, and more. Finally, note that the opinions I express on my blog are my own points of view, not that of my employer.