DITA: Import DITA's XHTML Output into WordPress
This tool is a modification of the original tool created by Mike Little here. I have made a few changes to the way links and images are handled. Additionally, I have unique installation instructions here. (Note that this is not a plugin -- it's a file you add to your WordPress instance that allows you to import XHTML files from DITA. My modified file is here.)
Some notes about the DITA Import tool
What it imports
The DITA XHTML import tool does not import raw DITA. Instead, it takes the XHTML output from DITA and imports it into WordPress. It works similarly to a regular HTML import tool, such as HTML Import 2, except that this tool preserves the hierarchy (but not the order) of parent-child pages as expressed in your ditamap table of contents file, among other things.
Despite the lack of sibling order, this turns out not to be a problem at all, because rather than trying to display pages by virtue of their page order in WordPress, we instead display them using the index file that is included in the DITA XHTML navigation (more details are below).
What's cool about this tool, and what distinguishes it from other import tools, is that if you reimport your DITA XHTML content, the previous pages aren't duplicated or recreated. Instead, only a new version of the page is created, and any comments or other histories about the page are preserved.
Also, you don't have to reimport your entire help set each time you want to update your content. You can just update the pages that have changed. However, even importing one page will require the importer to look at all the other pages in WordPress. The importer looks to see if any of the existing files have permalinks that match the file names of the content you're importing.
The globally unique identifier, or guid, between incoming files and existing WordPress pages is the permalink. If the imported content's file name matches an existing page's permalink, the imported file creates a new version of that existing page. If the file name doesn't match the existing permalink, a new page is created.
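The matching rule can be sketched in a few lines of Python. This is only an illustration of the logic described above, not the importer's actual code (the importer itself is a PHP file). The function name plan_import is hypothetical, and the sketch assumes that a page's permalink slug is simply the file name without its extension, lowercased.

```python
from pathlib import PurePosixPath


def plan_import(incoming_files, existing_slugs):
    """Decide, per incoming file, whether it updates a page or creates one.

    Assumption: a page's permalink slug is the file name without its
    extension, lowercased (for example, "install.html" -> "install").
    """
    existing = set(existing_slugs)
    plan = {}
    for name in incoming_files:
        slug = PurePosixPath(name).stem.lower()
        # A match means a new version of the existing page; no match
        # means a brand-new page.
        plan[name] = "update" if slug in existing else "create"
    return plan
```

For example, importing install.html and new_topic.html into a site that already has pages with the slugs install and about would plan an update for the first file and a new page for the second.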
Deleting pages
If you delete a page in your DITA source that you had previously imported into WordPress, the importer won't know that the file should be deleted -- you'll need to manually delete the legacy page from WordPress. There isn't any kind of import manifest file that looks to see if WordPress has any pages it shouldn't. This is because you can have both DITA-imported pages and native pages created directly from WordPress living in the same directory.
If you change your DITA file names and reimport your content, the newly imported files won't match any WordPress pages with permalinks corresponding to the previous DITA file name, hence new pages will be created/duplicated. As a best practice, avoid changing file names or permalinks.
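Because the importer has no manifest, finding leftover pages is a manual job. If you keep your own lists of slugs, a small set comparison can at least suggest candidates to review. The sketch below is not part of the import tool; find_stale_candidates and all three input lists are hypothetical, and you would have to collect the lists yourself.

```python
def find_stale_candidates(wordpress_slugs, dita_slugs, native_slugs=()):
    """Return slugs that exist in WordPress but not in the current DITA output.

    native_slugs lists pages authored directly in WordPress, which must
    never be flagged. All three inputs are hypothetical lists that you
    collect yourself; the importer does not produce them.
    """
    keep = set(dita_slugs) | set(native_slugs)
    return sorted(slug for slug in wordpress_slugs if slug not in keep)
```

Treat the result as a review list rather than a delete list, since a renamed DITA file and a deleted one look the same from the WordPress side.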
Navigation
Even though the parent-child hierarchy is preserved, the order of siblings is not preserved. This makes it difficult to automate menus using WordPress' built-in menu system. However, don't fear! I've overcome the menu issue with some clever scripting and CSS.
Basically, by following the steps in the sections below, you'll identify the table of contents file, pull it into a widget, and add a class at the top. Some jQuery scripts will trigger based on the class, and when you view the site in your browser, you'll have a sweet-looking navigation bar.
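To see why the table of contents file solves the ordering problem, here is a minimal Python sketch that reads the nested lists in a TOC page and returns the entries in document order with their depth. It assumes the TOC is made of nested ul lists whose items contain a links, which is my reading of the XHTML output; the TocParser class and read_toc function are hypothetical names, not part of the tool or of the jQuery scripts.

```python
from html.parser import HTMLParser


class TocParser(HTMLParser):
    """Collect (depth, title, href) for every link inside nested ul lists."""

    def __init__(self):
        super().__init__()
        self.depth = 0       # current ul nesting level
        self.entries = []    # results, in document (sibling) order
        self._href = None    # href of the link being read, if any
        self._text = []      # text fragments of the link being read

    def handle_starttag(self, tag, attrs):
        if tag == "ul":
            self.depth += 1
        elif tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_endtag(self, tag):
        if tag == "ul":
            self.depth -= 1
        elif tag == "a" and self._href is not None:
            title = "".join(self._text).strip()
            self.entries.append((self.depth, title, self._href))
            self._href = None

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)


def read_toc(html_text):
    """Return the TOC entries of html_text in their original order."""
    parser = TocParser()
    parser.feed(html_text)
    parser.close()
    return parser.entries
```

The sibling order comes straight from the order of the list items in the file, so it does not matter what page order WordPress assigns to the imported pages.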
Links
In order for external links to be absolute instead of relative, you must add outputclass="external" on the link in the DITA source file.
If you look in the ditaimporter.php file, you'll see my hack for making the links work. It looks like this:
This code says if the class is xref and link, or xref, or link, or null, then process the link as a relative link. But if it has a class of external, then process as an absolute link. If you don't put outputclass="external" as an attribute on an external link, then the importer will render it as a relative link, and the path won't be right.
By the way, as far as I can tell, these classes are the ones the DITA OT puts on links. But if it turns out that the DITA OT adds other classes to internal links, let me know and I'll fix the logic.
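The decision described in this section boils down to one check on the link's class attribute. The Python sketch below restates that rule for illustration only; it is not the code from ditaimporter.php, the function name link_mode is hypothetical, and treating every class other than external as internal is my simplification.

```python
def link_mode(class_attr):
    """Return "absolute" or "relative" for a link's class attribute.

    class_attr is the raw value of the class attribute, or None when the
    link has no class at all.
    """
    classes = set((class_attr or "").split())
    if "external" in classes:
        # The author set outputclass="external" in the DITA source.
        return "absolute"
    # xref, link, both together, or no class: handle as an internal link.
    # Assumption: any other class is also treated as internal.
    return "relative"
```

With this rule, a link whose class is "xref" or missing entirely is processed as relative, while a link whose class includes external is processed as absolute.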
Paragraph breaks
When the DITA content imports into WordPress, there will be a lot of blank spaces. You must disable WordPress' autoparagraphing function (wpautop) in order to avoid odd line breaks in your document, since WordPress's editor treats line breaks as new paragraphs. Once you disable wpautop, you will need to add paragraph tags to signal new paragraphs in your content.
This shouldn't be a problem if you are importing your content into WordPress. However, if you have a lot of pages that you're authoring natively in WordPress, which coexist side by side with your imported DITA-authored pages, then you might want to implement a plugin such as wpautop control, which lets you turn on wpautop on a page-by-page basis.
WP Multisite versus standalone WP
You can use this DITA import tool with WordPress multisite, not just with standalone WordPress sites. In many cases, it probably makes a lot more sense to use WordPress multisite. If you have just one output, then you're probably not using DITA in the first place. On the other hand, if you have numerous WordPress sites, because you're single sourcing your content into various channels, it makes more sense to publish each output as a separate site in a WordPress network that you manage centrally.
WordPress multisite allows you to maintain the theme and plugins across your network in one place rather than making updates to each site individually. That is, all sites in the network can refer to the same theme, plugins, and WP core files. When you need to make updates, you only have to do it once even if you have hundreds of sites in your network. Even if you have just two sites, you definitely want to use WordPress multisite. What's cool is that both standalone WordPress and WordPress multisite run off the same core code base.
For instructions on using WP multisite, see Create a network on the WordPress Codex. I also highly recommend getting a subscription to wpmudev, which is a site specializing in plugins and assistance for WordPress multisite.
(By the way, my instructions here don't include info on installing either standalone WordPress or WordPress multisite, but you can easily find tutorials online. Also, if you're a total WordPress newbie, note that you download the self-hosted WordPress from WordPress.org, not WordPress.com. You won't be able to use this tool with WordPress.com. You'll need to have your own space on a web host.)
Styles
The tool doesn't have any pre-built CSS styles for the DITA classes that will be in your content. However, if you add the styles I've included here to your theme, your content will look decent.
Images
You don't need to do anything special in order for your images to import. They are pulled into your media library when you import content.
Install the DITA Import tool
Download and install the DITA import tool
Set up your ditamap
In your DITA Map, structure your sections with the following hierarchy:
The title for each section (in this case, toc_content_reuse.dita) should go to an actual file, but you don't need to include any content in that file. By default, the DITA OT will populate that file with a summary of all the pages under it (their titles and short descriptions). When you import the content into WordPress, clicking this folder overview page won't show the page (unless you specifically configure that setting). However, the page will still appear in search results.
What you want to avoid is this structure:
Don't use topichead elements that don't point to any real files. The navigation menu in WordPress needs an actual file to create a hierarchy. (It's a best practice anyway to avoid topichead elements.)
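If your map is large, you can check it for topichead elements with a short script before you build. The following Python sketch uses only the standard library; the function name find_topicheads is hypothetical, and the fallback to a navtitle element inside topicmeta is an assumption about how your map may express titles.

```python
import xml.etree.ElementTree as ET


def find_topicheads(ditamap_xml):
    """Return the navigation titles of all topichead elements in a map."""
    root = ET.fromstring(ditamap_xml)
    titles = []
    for head in root.iter("topichead"):
        title = head.get("navtitle")
        if title is None:
            # Assumption: the title may live in topicmeta/navtitle instead.
            title = head.findtext("topicmeta/navtitle", default="(untitled)")
        titles.append(title)
    return titles
```

An empty list means every section heading in the map is a topicref that points to a real file, which is what the WordPress navigation needs.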
Configure XHTML transforms
You need to create a DITA Map XHTML transform. I'm assuming that you're using OxygenXML, but if you're not, you'll have to figure out how to make these changes in your authoring tool.
Duplicate and edit the DITA Map XHTML transform. Change the args.index.toc parameter from index to toc. (For some reason, the index file doesn't get imported if it has the name "index". I think this is because the permalink index interferes with the default file loaded in a root directory.)
Once you make this adjustment, generate an XHTML output.
Import DITA Content
External links in your DITA source must have outputclass="external" added as an attribute.
Switch to 2012 theme
Activate permalinks
- Go to
- Click Post name.
- Click Save changes.
Test your permalinks to make sure they work. Go to your default Hello World page and look at the URL. If the URL doesn't say hello-world or something similarly readable, permalinks aren't working.
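If you want to check the URL format in a script rather than by eye, the test can be sketched as follows. This only inspects the shape of the URL (WordPress's plain permalinks put the ID in a p or page_id query parameter); it does not confirm that the server actually serves the page. The function name and the example.com URLs are hypothetical.

```python
from urllib.parse import urlparse, parse_qs


def looks_like_pretty_permalink(url):
    """Return True when the URL uses a readable path instead of ?p=ID."""
    parts = urlparse(url)
    query = parse_qs(parts.query)
    if "p" in query or "page_id" in query:
        # Plain permalink such as http://example.com/?p=1
        return False
    # A pretty permalink carries the post name in the path itself.
    return parts.path.strip("/") != ""
```

A URL such as http://example.com/hello-world/ passes the check, while http://example.com/?p=1 does not. If WordPress lives in a subdirectory, the path is never empty, so adjust the last line accordingly.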
If permalinks aren't working, and you're on a Linux server, go to /etc/httpd/conf and edit your httpd.conf file. Edit AllowOverride None to AllowOverride All in the following two sections:
And here as well:
If you have to SSH in from Terminal or a command line to get at this file, you might prefer the Pico editor to vi, which requires learning its own commands to make even simple edits.
Disable wpautop
After you disable wpautop, pages that you author natively in WordPress need p tags to signal paragraph breaks.
If your WordPress theme is suddenly blank, you probably added incorrect syntax in your functions.php file. You need to FTP into your wp-content/themes/twentytwelve directory, retrieve the functions.php file, and undo what you did. (It's probably better to edit files outside of the WordPress interface, but for small edits, it's more convenient to work within WordPress.)
Create a table of contents
There are probably a lot of ways to incorporate a menu in WordPress with this DITA Import, but the method described here is what I think is the best way. Basically you'll incorporate a jQuery accordion plugin that will read the file structure of the toc.html file and apply a navigation menu. You'll pull this page into a sidebar widget. Only the first import will require some configuration -- you'll need to get the page ID and insert it into the sidebar widget. With each subsequent import, you won't need to make any adjustments.
Set your homepage as your intro page
- Go to .
- In the Front Page Displays section, select A static page. Then select the static page you want, such as your help overview page.
- If you want your blog posts to appear on another page, create a new page on your blog called "News" or something. Then return to this section and select the News page for your Posts page.
Turn off revisions
You want to turn off revisions so that each time you import content, WordPress doesn't keep adding revisions. I'm not entirely sure, but I think if you keep adding revisions, you soon begin to see a ton of guid updates for each revision in the system. At any rate, you don't want to unnecessarily increase the size of your database anyway.
It's much better if you simply turn off the revision feature in WordPress. You aren't storing your content here anyway. You're uploading it and overwriting it with each upload. Your source DITA content should be stored in a version control repository such as GitHub, Subversion, or Mercurial.
Get the latest styles
When you transform DITA into HTML, the OT inserts a lot of classes into the various elements. This allows you to style your content through a CSS file that addresses those classes. I've added a number of classes to both the Twenty Twelve theme and the jQuery navigation plugin that correspond with these classes.
Because I'm still tweaking the styles of this site, just grab the latest styles from my wpdita site and copy them into your site's stylesheets.
To get the latest styles:
Add an auto-toc on pages
About Tom Johnson
I'm an API technical writer based in the Seattle area. On this blog, I write about topics related to technical writing and communication — such as software documentation, API documentation, AI, information architecture, content strategy, writing processes, plain language, tech comm careers, and more. Check out my API documentation course if you're looking for more info about documenting APIs. Or see my posts on AI and AI course section for more on the latest in AI and tech comm.