DITA: Import DITA's XHTML Output into WordPress

The DITA XHTML Import tool allows you to import DITA-produced XHTML content into a WordPress site or multisite.

This tool is a modification of the original tool created by Mike Little here. I have made a few modifications to the way links and images are handled. Additionally, I have unique installation instructions here. (Note that this is not a plugin -- it's a file you add to your WordPress instance that allows you to import XHTML files from DITA. My modified file is here.)

Some notes about the DITA Import tool

What it imports

The DITA XHTML import tool does not import raw DITA. Instead, it takes the XHTML output from DITA and imports it into WordPress. It works similarly to a regular HTML import tool, such as HTML Import 2, except that this tool preserves the hierarchy (but not the order) of parent-child pages as expressed in your ditamap table of contents file, among other things.

Despite the lack of sibling order, this turns out not to be a problem at all, because rather than trying to display pages by virtue of their page order in WordPress, we instead display them using the index file that is included in the DITA XHTML navigation (more details are below).

What's cool about this tool, and what distinguishes it from other import tools, is that if you reimport your DITA XHTML content, the previous pages aren't duplicated or recreated. Instead, only a new version of the page is created, and any comments or other histories about the page are preserved.

Also, you don't have to reimport your entire help set each time you want to update your content. You can just update the pages that have changed. However, even importing one page will require the importer to look at all the other pages in WordPress. The importer looks to see if any of the existing files have permalinks that match the file names of the content you're importing.

The globally unique identifier, or guid, between incoming files and existing WordPress pages is the permalink. If the imported content's file name matches an existing page's permalink, the imported file creates a new version of that existing page. If the file name doesn't match the existing permalink, a new page is created.
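
The matching rule can be sketched in a few lines of JavaScript. This is only a model of the behavior described above, not the importer's PHP code; slugFromFileName() and planImport() are hypothetical names, and the real tool may derive permalinks differently.

```javascript
// Sketch only: models the file-name-to-permalink matching rule.
// slugFromFileName() is a hypothetical helper, not part of the importer.
function slugFromFileName(fileName) {
  // "conref.html" becomes "conref"
  return fileName.replace(/\.x?html?$/i, "");
}

// existingSlugs: permalinks of pages already in WordPress.
// Returns, for each incoming file, whether it would update an
// existing page (new version) or create a brand-new page.
function planImport(fileNames, existingSlugs) {
  const known = new Set(existingSlugs);
  return fileNames.map((fileName) => {
    const slug = slugFromFileName(fileName);
    return { fileName, slug, action: known.has(slug) ? "update" : "create" };
  });
}

const plan = planImport(
  ["conref.html", "ditaval_new.html"],
  ["conref", "ditaval"]
);
console.log(plan);
```

In this sketch, conref.html lines up with an existing permalink and would become a new version of that page, while the renamed ditaval_new.html matches nothing and would be created as a new page.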

Deleting pages

If you delete a page in your DITA source that you had previously imported into WordPress, the importer won't know that the file should be deleted -- you'll need to manually delete the legacy page from WordPress. There isn't any kind of import manifest file that looks to see if WordPress has any pages it shouldn't. This is because you can have both DITA-imported pages and native pages created directly from WordPress living in the same directory.

If you change your DITA file names and reimport your content, the newly imported files won't match any WordPress pages with permalinks corresponding to the previous DITA file name, hence new pages will be created/duplicated. As a best practice, avoid changing file names or permalinks.

Navigation

Even though the parent-child hierarchy is preserved, the order of siblings is not preserved. This makes it difficult to automate menus using WordPress' built-in menu system. However, don't fear! I've overcome the menu issue with some clever scripting and CSS.

Basically, by following the steps in the sections below, you'll identify the table of contents file, pull it into a widget, and add a class at the top. Some jQuery scripts will trigger based on the class, and when you view the site in your browser, you'll have a sweet-looking navigation bar.

Links

In order for external links to be absolute instead of relative, you must add outputclass="external" on the link in the DITA source file. If you look in the ditaimporter.php file, you'll see my hack for making the links work. It looks like this:

    $cls = $a->getAttribute('class');
    $href = $a->getAttribute('href');
    if (!empty($href) && ( ($cls == 'xref' && $cls == 'link') || ($cls == 'xref') || ($cls == 'link') || ($cls == null) ) ) {
        $href = $this->HELP_ROOT_PATH . '/' . $this->processFilenameToId($href); // hard coded -- BAD
        $a->setAttribute('href', $href);
    }
    elseif (!empty($href) && ($cls == 'external' && $cls == 'xref') ) {
        $href = $this->processFilenameToId($href); // hard coded -- BAD
        $a->setAttribute('href', $href);
    }

This code says if the class is xref and link, or xref, or link, or null, then process the link as a relative link. But if it has a class of external, then process as an absolute link. If you don't put outputclass="external" as an attribute on an external link, then the importer will render it as a relative link, and the path won't be right.
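
To make the rule easier to reason about, here is a small JavaScript model of the intent. It is a sketch under assumptions, not the importer's code: HELP_ROOT_PATH is given a made-up value, fileNameToId() is a hypothetical stand-in for processFilenameToId(), and I'm assuming external links should simply be left untouched.

```javascript
// Sketch only: a JavaScript model of the link rule, not the importer's PHP.
const HELP_ROOT_PATH = "/help"; // hypothetical value for illustration

// Hypothetical stand-in for processFilenameToId().
// Assumption: the ID is the file name without its .html extension.
function fileNameToId(href) {
  return href.replace(/\.html?$/i, "");
}

function rewriteHref(href, cls) {
  if (!href) return href; // nothing to rewrite
  // Split the class attribute into individual class names.
  const classes = (cls || "").split(/\s+/).filter(Boolean);
  if (classes.includes("external")) {
    return href; // assumption: external links keep their absolute URL
  }
  // No class, xref, or link: treat as an internal help link.
  const internal =
    classes.length === 0 ||
    classes.includes("xref") ||
    classes.includes("link");
  return internal ? HELP_ROOT_PATH + "/" + fileNameToId(href) : href;
}

console.log(rewriteHref("conref.html", "xref"));
console.log(rewriteHref("https://example.com/page.html", "xref external"));
```

The sketch checks individual class names rather than comparing the whole attribute string, so a link carrying more than one class is still recognized.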

By the way, as far as I can tell, these classes are how DITA's OT renders link classes. But if it turns out that the DITA OT adds other classes to internal links, let me know and I'll fix the logic.

Paragraph breaks

When the DITA content imports into WordPress, there will be a lot of blank spaces. You must disable WordPress' autoparagraphing function (wpautop) in order to avoid odd line breaks in your document, since WordPress's editor treats line breaks as new paragraphs. Once you disable wpautop, you will need to add paragraph tags to signal new paragraphs in your content.

This shouldn't be a problem if you are importing all of your content into WordPress. However, if you have a lot of pages that you're authoring natively in WordPress, and they coexist side by side with your imported DITA-authored pages, then you might want to implement a plugin such as wpautop control, which lets you turn on wpautop on a page-by-page basis.

WP Multisite versus standalone WP

You can use this DITA import tool with WordPress multisite, not just with standalone WordPress sites. In many cases, it probably makes a lot more sense to use WordPress multisite. If you have just one output, then you're probably not using DITA in the first place. On the other hand, if you have numerous WordPress sites, because you're single sourcing your content into various channels, it makes more sense to publish each output as a separate site in a WordPress network that you manage centrally.

WordPress multisite allows you to maintain the theme and plugins across your network in one place rather than making updates to each site individually. That is, all sites in the network can refer to the same theme, plugins, and WP core files. When you need to make updates, you only have to do it once even if you have hundreds of sites in your network. Even if you have just two sites, you definitely want to use WordPress multisite. What's cool is that both standalone WordPress and WordPress multisite run off the same core code base.

For instructions on using WP multisite, see Create a network on the WordPress Codex. I also highly recommend getting a subscription to wpmudev, which is a site specializing in plugins and assistance with WordPress multisite.

(By the way, my instructions here don't include info on installing either standalone WordPress or WordPress multisite, but you can easily find tutorials online. Also, if you're a total WordPress newbie, note that you download the self-hosted WordPress from WordPress.org, not WordPress.com. You won't be able to use this tool with WordPress.com. You'll need to have your own space on a web host.)

Styles

The tool doesn't have any pre-built CSS styles for the DITA classes that will be in your content. However, if you use the styles I've included here for your theme, it will look decent.

Images

You don't need to do anything special in order for your images to import. They are pulled in to your media library when you import content.

Install the DITA Import tool

This section covers the technical steps needed to get your site up and running with the DITA import tool. Granted, there do seem to be a lot of steps, but remember that you're configuring the tool to work with an entire website. The import process itself requires just a few clicks.

Download and install the DITA import tool

  1. Download the ditaimporter.zip file here and unzip it.
  2. Upload the ditahelp.php file to wp-content/ditaimporter in your WordPress file structure. (Create the ditaimporter folder.)

    You can actually store this ditahelp.php file in other directories, but keeping it in wp-content is pretty safe because you never overwrite your wp-content folder when you update WordPress.
  3. In a text editor, create a new file and insert the following:

    <?php
    include_once(ABSPATH . 'wp-admin/includes/import.php');
    require_once(ABSPATH . 'wp-content/ditaimporter/ditahelp.php');

  4. Save the file as dita-import-helper.php.
  5. In your wp-content directory in WordPress, create a new directory called mu-plugins.
  6. Upload the dita-import-helper.php file into that mu-plugins directory.

    mu-plugins stands for "must-use" plugins. mu-plugins work like regular plugins, except that they remain active even if you switch your theme or deactivate all your regular plugins. (You will also never be prompted to update a must-use plugin.)

If the installation worked correctly, you should see a DITA help option when you go to Tools > Import in your WordPress administrative interface. (If you don't, check your references and folder structures.)

Set up your ditamap

In your DITA Map, structure your sections with the following hierarchy:

  <topicref href="toc_content_reuse.dita">
   <topicref href="conditional_profiling.dita"/>
   <topicref href="conref.dita"/>
   <topicref href="ditaval.dita"/>
  </topicref>

The title for each section (in this case, toc_content_reuse.dita) should go to an actual file, but you don't need to include any content in that file. By default, the DITA OT will populate that file with a summary of all the pages under it (their titles and short descriptions). When you import the content into WordPress, clicking this folder overview page won't show the page (unless you specifically configure that setting). However, the page will still appear in search results.

What you want to avoid is this structure:

  <topichead navtitle="Content Re-use">
   <topicref href="conditional_profiling.dita"/>
   <topicref href="conref.dita"/>
   <topicref href="ditaval.dita"/>
  </topichead>

Don't use topichead elements that don't point to any real files. The navigation menu in WordPress needs an actual file to create a hierarchy. (It's a best practice anyway to avoid topichead elements.)
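
If your map is large, a quick scan can flag any topichead elements before you transform. The following JavaScript is only a sketch: findTopicheads() is a hypothetical helper, and it does a plain text search rather than a real XML parse, so treat its results as a hint.

```javascript
// Sketch only: a quick text scan for topichead opening tags,
// not a real XML parse. findTopicheads() is a hypothetical helper.
function findTopicheads(ditamapText) {
  // Matches opening tags such as <topichead navtitle="...">;
  // closing tags start with "</" and are not matched.
  const matches = ditamapText.match(/<topichead\b[^>]*>/g);
  return matches || [];
}

const sampleMap = `
<map>
  <topichead navtitle="Content Re-use">
    <topicref href="conref.dita"/>
  </topichead>
</map>`;

// Lists each topichead opening tag found in the map text.
console.log(findTopicheads(sampleMap));
```

Any tag this reports is a candidate for replacing with a topicref that points to a real file.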

Configure XHTML transforms

You need to create a DITA Map XHTML transform. I'm assuming that you're using OxygenXML, but if you're not, you'll have to figure out how to make these changes in your authoring tool.

Duplicate and edit the DITA Map XHTML transform. For the args.xhtml.toc parameter, change the value from index to toc. (For some reason, the index file doesn't get imported if it has the name "index." I think this is because the permalink index interferes with the default file loaded in a root directory.)

Once you make this adjustment, generate an XHTML output.

Import DITA Content

To import your DITA-produced XHTML content into WordPress, you upload the HTML files into a directory on your server and then pull the files into WordPress from there.
  1. Create a new directory on your server called something like dita-staging.
  2. In OxygenXML, duplicate the DITA Map XHTML transform.
  3. Edit the DITA Map XHTML transform. Click the Parameters tab. For the args.xhtml.toc parameter, type toc.

    If you leave the args.xhtml.toc parameter at the default, the table of contents gets pushed into an index.html file, and the index.html file won't be imported. By changing the output file to toc, this file will be rendered as toc.html and will be included in the import into WordPress. The file's title will be your map file name (but its permalink will be "toc").
  4. Transform your DITA content into XHTML and upload it to the dita-staging directory on your server using a tool such as FTP.
  5. Find the absolute path to your dita-staging directory.

    One way to find the absolute path is by creating a file with the following contents:

    <?php phpinfo() ?>   

    Save the file as "phpinfo.php" and upload it to your dita-staging directory.

    View the phpinfo.php page in your browser. In the DOCUMENT_ROOT section, look at the absolute path. It will look something like this:

    /home2/idrathe1/public_html/

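    Copy this path -- you will need it in an upcoming step. If dita-staging sits directly under this web root, its absolute path is this value followed by dita-staging.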

  6. In WordPress, go to Tools > Import and click DITA help.
  7. If you have a WP multisite, you have the option of selecting which site you want to import content into. Select the site from the Application Area selector. If you don't have WP Multisite, this drop-down doesn't appear.

    Note that with WP multisite imports, you always import content from your base site's admin directory. That is, you go to your_base_wp_url/wp-admin directory to run the import tool, rather than your_base_wp_url/subsite/wp-admin directory. Even for a multisite network of hundreds of sites, you always import your content from the same base site.
  8. Leave the two check boxes ("Do you want to clear...?") unchecked. You use these to troubleshoot failed imports.
  9. In the DITA help directory field, insert the absolute path to your dita-staging folder that you copied earlier.
  10. Click Import Files. You will be prompted to move through about 5 import screens, and with each screen you click a button at the bottom to continue moving through the import.

    The import process is broken up into a series of steps to avoid memory issues (which probably become more apparent with imports involving thousands of files).
Look in the pages in your WordPress site, and you'll see that the content has been imported. Check your links and images to make sure they look right. If the links pointing to external pages aren't rendered correctly, look in your DITA source and make sure the links have outputclass="external" added as an attribute.

Switch to 2012 theme

The 2014 theme is really awful (unless your favorite color is black). Documentation looks a lot better with the more minimalist 2012 theme. I've customized the 2012 theme a bit to make documentation look decent.
  1. In WordPress, go to Appearance > Themes.
  2. Activate the 2012 theme.

    If the 2012 theme isn't there, you may need to search for it and install it first. You can use other themes, of course. However, you will most likely have to do a lot of CSS work to get the theme in shape.

Disable wpautop

You'll need to disable wpautop (WordPress auto-paragraphing) to avoid odd line breaks in your content. (If you want to disable it for some pages and enable it for others, you can install a plugin called "wpautop control," but to simplify things, I recommend just turning off wpautop entirely for the site.)
  1. In WordPress, go to Appearance > Editor.
  2. Click the functions.php file and add the following code at the absolute bottom of the file:

    remove_filter( 'the_content', 'wpautop' );
    remove_filter( 'the_excerpt', 'wpautop' );                     

  3. Save the page.
Now when you write a page or post, you must include p tags to signal paragraph breaks.

If your WordPress theme is suddenly blank, you probably added incorrect syntax in your functions.php file. You need to FTP into your wp-content/themes/twentytwelve directory, retrieve the functions.php file, and undo what you did. (It's probably better to edit files outside of the WordPress interface, but for small edits, it's more convenient to work within WordPress.)

Create a basic menu

By default, all pages will appear in the WP menus. You don't want this, so you have to create a basic menu to replace the default behavior.
  1. Go to Appearance > Menus.
  2. Type a name for the menu, such as Home.

  3. Click Create Menu.

    You can add pages to this menu if you want, such as a home or contact page.
  4. Select the Primary Menu location if you're using the 2012 theme.

    Different themes have different menu locations. If you're using a different theme, choose the location that corresponds to your primary menu.
  5. Click Save Menu.

Create a table of contents

Remember that even though the DITA XHTML Import tool preserves the parent-child hierarchy, it doesn't preserve the order among sibling pages. The best strategy for creating navigation is to use the generated index file (which is the default table of contents created for the DITA XHTML output) as your TOC (toc.html).

There are probably a lot of ways to incorporate a menu in WordPress with this DITA Import, but the method described here is what I think is the best way. Basically you'll incorporate a jQuery accordion plugin that will read the file structure of the toc.html file and apply a navigation menu. You'll pull this page into a sidebar widget. Only the first import will require some configuration -- you'll need to get the page ID and insert it into the sidebar widget. With each subsequent import, you won't need to make any adjustments.

  1. In the WordPress Dashboard, click the Pages section and open your toc file.

    The toc file is your table of contents, and its name is the same as your map file name. In other words, it won't say "toc." It will say something like "DITA QRG." If you can't find it, go to /toc in your browser and edit the page that way.

  2. While the page is in edit mode, look in the URL to find the page's ID. It will be a number such as 5 or 46 or something. Remember it.
  3. Install and activate the PHP Code Widget plugin.

    If you don't know how to install and activate a plugin, it's pretty easy. Go to Plugins > Add New. In the Search Plugins field, type PHP Code Widget. Click Install Now next to the right result. After it installs, click Activate.
  4. Go to Appearance > Widgets and add a PHP Code widget to your sidebar.
  5. Add the following query to pull the toc page into a PHP Code widget:

    <div id="tocPage">
    <div class="accordion clean">
    <?php
    $args = array(
      'page_id' => 104,
    );

    $the_query = new WP_Query( $args );

    // The Loop
    if ( $the_query->have_posts() ) :
      while ( $the_query->have_posts() ) : $the_query->the_post();
        the_content();
      endwhile;
    endif;

    // Reset Post Data
    wp_reset_postdata();
    ?>
    </div>
    </div>

    In this example, 104 is the page ID of the toc page. Update this ID to the actual ID of your page. The tocPage ID and the accordion and clean classes are essential for triggering the jQuery scripts.

  6. Remove the other widgets you don't want by dragging them away.
Look at your site. The menu should be pulled in to the sidebar. No doubt you'll want to make the navigation collapsible. Fortunately, jQuery makes it pretty easy to do this. See the following section for details.

Make the navigation collapsible

In this section, you'll incorporate a jQuery plugin called jQuery Vertical Accordion Menu plugin in order to get an accordion menu with a cookie (remembered state) for your navigation.
  1. Go to JQUERY VERTICAL ACCORDION MENU PLUGIN on Design Chemical Premium Plugins. In the upper-right (almost within the header itself), click Download.

  2. FTP into your wp-content/themes/twentytwelve folder and upload the JavaScript and CSS files that you downloaded. You should add the scripts to a folder called "js" and the styles to a folder called "css".
  3. Go to Appearance > Editor and edit your header.php file. Before the closing </head> tag, add the following references to the stylesheets and scripts:

    <!-- accordion start -->
    <link href="<?php echo get_template_directory_uri(); ?>/css/dcaccordion.css" rel="stylesheet" type="text/css" />
    <script type='text/javascript' src='<?php echo get_template_directory_uri(); ?>/js/jquery.cookie.js'></script>
    <script type='text/javascript' src='<?php echo get_template_directory_uri(); ?>/js/jquery.hoverIntent.minified.js'></script>
    <script type='text/javascript' src='<?php echo get_template_directory_uri(); ?>/js/jquery.dcjqaccordion.2.7.min.js'></script>
    <script>
    jQuery(document).ready(function($) {
      // Give the first list inside #tocPage the accordion class.
      jQuery('#tocPage ul:nth-child(1)').attr({ 'class': 'accordion' });

      // Turn that list into a collapsible accordion menu.
      jQuery('#tocPage ul:nth-child(1)').dcAccordion({
        eventType: 'click',
        autoClose: true,
        saveState: true,
        disableLink: true,
        speed: 'fast'
      });
    });
    </script>
    <link href="<?php echo get_template_directory_uri(); ?>/css/skins/clean.css" rel="stylesheet" type="text/css" />
    <!-- accordion end-->
                            

    If you read about the jQuery plugin, you'll see that I'm not using #accordion to trigger the plugin. Why not? Well, when the DITA OT converts the ditamap to an index file, it strips away all classes. In other tutorials (such as the DITA: Add an expanding side pane (Sidr)), I explained how you can edit certain source files in OxygenXML to allow classes to pass through. Unfortunately, these same edits don't affect the classes in the ditamap. No worries, though. We select the right element with a descendant selector plus the :nth-child pseudo-class (#tocPage ul:nth-child(1)). This matches a list inside the #tocPage element that is the first child of its parent, which in this setup is the top-level TOC list.

    Also note that if you read about the options available in the jQuery plugin, you'll see that I've specified some here:

    eventType: 'click',
    autoClose: true,
    saveState: true,
    disableLink: true,
    speed: 'fast'                      

    Of note here are the saveState and autoClose options. The saveState option means when a user opens a menu and reloads the page, or actually goes to the page (meaning the site reloads), that menu stays open. You want this saveState option set to true so users know where they are in the nav menu. (The jquery.cookie.js file allows for the saveState option to work.)

    The autoClose option determines whether the other sections close when you open another section (the accordion feature). Traditional accordions close other sections when you navigate them. I think leaving each section open makes the TOC menu look too big. You can read the Design Chemical documentation for the plugin to see what the other options control.

Set your homepage as your intro page

By default, your homepage in WordPress will be your latest posts. You can change this to be a specific page.
  1. Go to Settings > Reading.
  2. In the Front Page Displays section, select A static page. Then select the static page you want, such as your help overview page.
  3. If you want your blog posts to appear on another page, create a new page on your blog called "News" or something. Then return to this section and select the News page for your Posts page.

Turn off revisions

You want to turn off revisions so that each time you import content, WordPress doesn't keep adding revisions. I'm not entirely sure, but I think if you keep adding revisions, you soon begin to see a ton of guid updates for each revision in the system. At any rate, you don't want to unnecessarily increase the size of your database anyway.

It's much better if you simply turn off the revision feature in WordPress. You aren't storing your content here anyway. You're uploading it and overwriting it with each upload. Your source DITA content should be stored in a version control system such as Git (for example, on GitHub), Subversion, or Mercurial.

To turn off revisions in WordPress, add this in your wp-config.php file (which is in your WordPress root directory).
define( 'WP_POST_REVISIONS', 0 );                

Get the latest styles

When you transform DITA into HTML, the OT inserts a lot of classes into the various elements. This allows you to style your content through a CSS file that addresses those classes. I've added a number of styles to both the Twenty Twelve theme and the jQuery navigation plugin that correspond with these classes.

Because I'm still tweaking the styles of this site, just grab the latest styles from my wpdita site and copy them into your site's stylesheets.

To get the latest styles:

  1. Go to https://idratherbewriting.com/wpdita/, right-click, and choose View Page Source (this is the option in Chrome; it may vary for other browsers). Search for style.css and then click it.

  2. Search for tom's styles. Copy everything below this point.

    These are all the styles I've customized. Unfortunately, I haven't organized them very well. I mostly just add them as needed as I go along...
  3. In your WordPress site, go to Appearance > Editor.

    By default, the style.css file should load in your theme editor.
  4. Scroll to the bottom, insert the "tom's styles" content that you copied, and then save the file.

    You could also create a child theme, but since you probably won't be updating the WP 2012 theme regularly, there's little reason to do this.
  5. Grab the styles for the navigation skin ("clean.css") by following the same process you followed to get the theme stylesheet. Copy the styles in clean.css.

    Unfortunately you can't edit clean.css by browsing to it within your WordPress theme editor.
  6. Using FTP, go to wp-content/themes/twentytwelve/css/skins and download the clean.css file.
  7. Replace the contents with the styles you copied and reupload, overwriting the previous file.

Add an auto-toc on pages

If you start to nest tasks or concepts in each other, your pages will get long. It's a standard practice to put a Table of Contents menu at the top of a page to allow users to preview what the page contains and to quickly jump to any heading. You can add this functionality easily through the Table of Contents Plus plugin.
  1. Add the Table of Contents Plus plugin.
  2. Go to Settings > TOC+.
  3. Configure any settings you want, and then click Update Options.

    You may be tempted to choose Top for the Position. However, note that if you choose this, the Table of Contents will jump past your introduction if the user clicks the first heading in the content menu. That's why it's probably better to choose Before first heading (default). This is the same style that Wikipedia uses.

About Tom Johnson

I'm an API technical writer based in the Seattle area. On this blog, I write about topics related to technical writing and communication — such as software documentation, API documentation, AI, information architecture, content strategy, writing processes, plain language, tech comm careers, and more. Check out my API documentation course if you're looking for more info about documenting APIs. Or see my posts on AI and AI course section for more on the latest in AI and tech comm.

If you're a technical writer and want to keep on top of the latest trends in the tech comm, be sure to subscribe to email updates below. You can also learn more about me or contact me. Finally, note that the opinions I express on my blog are my own points of view, not that of my employer.