Flat file systems versus database models for help
In Goodbye WordPress: 2014 Will Be the Year of the Flat-File CMS, Jeremiah Shoaf argues for the upcoming dominance of flat-file systems over database-driven sites such as WordPress.
I found the post extremely interesting because I've been moving back and forth between flat-file systems and database systems with my DITA publishing strategies. There are compelling arguments for each side.
On one hand, databases provide more capability to deliver dynamic content to different users who come to the same site. With flat-file systems, it's much harder to serve content dynamically, so you may end up with numerous outputs if your content warrants different information for different audiences. On the other hand, flat-file systems are much easier to deploy. They fit into the developer's workflow, can easily be added to any server, and almost never pose security or other restrictions on access.
I have so much to say on this topic. I'll try to maintain some order to my thoughts.
What are flat file systems created with static site generators?
First, by "flat-file systems," we're talking about regular websites with individual pages, as in the days before WordPress, Drupal, Joomla, and other database-driven web CMSs sprang up. But now there are scores of static file generators popping up: Ghost, HTMLy, Jekyll, Wintersmith, Nano, Pico, Kirby, PhileCMS, Get Simple CMS, Octopress, Docpad, Statamic, Flatpress, and many, many more.
The static file generator works somewhat like a help authoring tool: it compiles all the source files into a website output.
Static site generators (such as Jekyll) are extremely popular in API doc spaces because these flat files can travel with code and live in version control repositories, such as GitHub. Developers accustomed to writing code in flat files can open their same text editor, jot down some information using Markdown syntax, and initiate site builds with a few words on the command line. Static site generators fit into their workflow and world.
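To make the "compile" step concrete, here is a minimal sketch of what a static generator does at its core, written in Python. It is not how Jekyll or any other named tool works internally. The file layout (one .txt file per page, first line as the title, blank lines between paragraphs) and the names render_page and build_site are assumptions made up for this illustration.

```python
import html
import re
from pathlib import Path

# Hypothetical page template; real generators use full theme/layout systems.
PAGE_TEMPLATE = """<!DOCTYPE html>
<html>
<head><meta charset="utf-8"><title>{title}</title></head>
<body>
<h1>{title}</h1>
{body}
</body>
</html>
"""


def render_page(source_text: str) -> str:
    """Turn one plain-text source into an HTML page.

    Assumed convention: the first line is the title, and the remaining
    text is split into paragraphs at blank lines.
    """
    lines = source_text.strip().splitlines()
    title = html.escape(lines[0]) if lines else "Untitled"
    rest = "\n".join(lines[1:]).strip()
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", rest) if p.strip()]
    body = "\n".join(f"<p>{html.escape(p)}</p>" for p in paragraphs)
    return PAGE_TEMPLATE.format(title=title, body=body)


def build_site(source_dir: Path, output_dir: Path) -> list[Path]:
    """Compile every .txt file in source_dir into an .html file in output_dir."""
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for source in sorted(source_dir.glob("*.txt")):
        target = output_dir / (source.stem + ".html")
        page = render_page(source.read_text(encoding="utf-8"))
        target.write_text(page, encoding="utf-8")
        written.append(target)
    return written
```

Calling build_site(Path("content"), Path("_site")) would write one HTML file per source file. The output folder is plain files, which is why it can be copied to any web server.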
Proponents of static site generators revel in simplicity. They say WordPress has become too generalized, with too many features, heavy database calls, infrastructure, unnecessary components, and more. In contrast, static flat files are quick and easy, secure, lightweight, and efficient.
On the other hand, database-model CMSs like WordPress can do things that static site generators can't. In a few minutes you can integrate plugins that do advanced things (for example, restrict access, assign roles to users, or roll up site-wide comments). There are more than 30,000 plugins already available. You can find a gorgeous theme, install it, and configure it in about half an hour. Over the course of a weekend, you can have a fully developed, modern-looking site (or even a social network, forum, or multimedia site) without being a web developer.
While static site generators are simple, you won't find the same number of plugins and themes at your ready disposal to crank out that awesome website over the weekend. Instead, it will be simple to implement the default site and maybe several different themes, adding a few extensions. But then anything more sophisticated will probably require you to "roll your own," as they say.
Further, with each update to your site, you'll need to regenerate and redeploy the site. You can trigger the rebuilding through some scripts to make the process more automated, but this may require some advanced skills.
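As a rough idea of what such a script might look like, here is a Python sketch that reruns a build only when the source files have changed since the last run. The function names, the state file, and the idea of passing the build step in as a callable are all assumptions for illustration. The build step itself is left to you (for example, a function that runs your generator's build command and then uploads the output).

```python
import hashlib
from pathlib import Path
from typing import Callable


def fingerprint(source_dir: Path) -> str:
    """Hash the relative path and contents of every file under source_dir."""
    digest = hashlib.sha256()
    for path in sorted(p for p in source_dir.rglob("*") if p.is_file()):
        digest.update(path.relative_to(source_dir).as_posix().encode("utf-8"))
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()


def rebuild_if_changed(
    source_dir: Path, state_file: Path, build: Callable[[], None]
) -> bool:
    """Run build() only when the sources differ from the last recorded state.

    state_file must live outside source_dir, otherwise writing it would
    change the fingerprint. Returns True if a build was triggered.
    """
    current = fingerprint(source_dir)
    previous = state_file.read_text(encoding="utf-8") if state_file.exists() else ""
    if current == previous:
        return False
    build()
    # Record the new state only after the build finishes without raising.
    state_file.write_text(current, encoding="utf-8")
    return True
```

Scheduling this script, or calling it from a repository hook, gets you closer to "save and it's live," though you still own the script and the server it runs on.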
Which is better?
Which model is a better fit for tech comm -- flat files compiled by a static site generator, or a database-driven site with dynamic content? Honestly, there are compelling arguments for each case.
Let's treat the output of a help authoring tool as equivalent to the output of a static site generator, because that is pretty much what it is. The webhelp output from OxygenXML may not promote itself as a cool new static file generator similar to Ghost or Docpad, but these little "websites in a box" that are rebuilt with each compile or transform are pretty much the same thing.
Reasons to use a static site generator
Here are some reasons to use a static site generator:
Authentication. You don't have to worry about configuring authentication for your users (if this is something you need). You just upload your static files onto the same server as your application and authenticate users via the same authentication as the site you documented. This is a huge feature, since a lot of documentation written by professional technical writers isn't published freely on the web, but rather is restricted to a specific group of users.
No need for robust architecture. You don't need to have an Apache, MySQL, PHP architecture in place and all the settings just right (e.g., the httpd.conf's rewrite permissions for permalinks) to make it work. You just upload your static files and you're done. This is another HUGE benefit.
In most places I've worked, setting up the Apache, MySQL, and PHP infrastructure to support WordPress has been nearly impossible. If you're just publishing to the web without secure data, sure, you can get the infrastructure set up in minutes from most web hosts. But this isn't how most companies work.
Most companies want to put their proprietary, confidential help material onto their own servers. This means someone has to provision the server and manage it. Not too many companies are set up with a LAMP architecture optimized for WordPress out of the box, so you may have to request access, get the stack set up, run security scans, open firewall ports, and more. You may have to use SSH to interact with the server and edit configuration files using a command-line editor like Vim.
So yeah, the setup is kind of a big deal. If your documentation can live freely and openly on the web, and you can pay for a third-party web host to manage the hosting, then you can bypass all of this do-it-yourself server management and be up and running in an afternoon.
Reasons to use a database model like WordPress
Here are some compelling reasons to use a database model like WordPress:
Single site instead of discrete outputs. With OxygenXML and most HATs (RoboHelp, Flare, etc.), the default model is to filter your content based on attributes or conditional tags. You may have 10 different outputs based on various audiences, programming languages, products, or more. You then end up serving these files from different directories.
Managing all of these outputs can be a bear. If you have 10 outputs instead of 1, it takes 10 times as long to generate the files and 10 times as long to upload them, and it is more difficult to get reviewers to look through 10 different sets of material.
The cool thing about databases is that you can dynamically serve the content based on a specific login or other condition. For example, suppose someone with X permissions logs in. You can set that only X content should appear to the user, while Y content should not. You can therefore keep all of this content on a single site, rather than spread across 10 different sites that you have to maintain.
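Here is a small Python sketch of that idea: every section lives in one place, and the filter runs when the user requests the page rather than at build time. The Section type, the audience labels, and visible_sections are hypothetical names for this illustration. This is not how WordPress stores or filters content.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Section:
    """One chunk of help content plus the audiences allowed to see it."""
    text: str
    # An empty set means the section is visible to everyone (assumed convention).
    audiences: frozenset[str] = frozenset()


def visible_sections(sections: Iterable[Section], user_roles: Iterable[str]) -> list[str]:
    """Return the text of every section the given user is allowed to see."""
    roles = set(user_roles)
    return [s.text for s in sections if not s.audiences or s.audiences & roles]


# A single "site" worth of content, filtered per request instead of per build.
PAGE = [
    Section("Overview of the product."),
    Section("How to reset another user's password.", frozenset({"admin"})),
    Section("How to call the REST endpoint.", frozenset({"developer", "admin"})),
]
```

With this content, visible_sections(PAGE, ["developer"]) returns the overview and the REST section but not the admin-only one. The same source serves every audience, so there is one site to maintain instead of ten.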
If your users aren't logging in, you can dynamically serve content through other triggers. Although I'm not entirely sure how to do this technically, you could probably enable users to select an option that would trigger a filter for the content in the same way.
Granted, with a static site, you can implement some jQuery plugins that switch styles and hide or show content based on "display:none" settings in stylesheets, but it's more of a trick than a solid way of dynamically filtering content.
Publishing speed. When you want to make an update to a page generated in the static file model, you have to regenerate the output and upload it all again. It's a slow model. As Mark Baker puts it, DITA XML has a "wedding-day" style publishing process, where your publishing is intended as a grandiose event. In contrast, with WordPress, MediaWiki, or another online CMS, you just edit a page and hit Save. Voila, the update appears. It's a 10-second process versus a 10-minute process.
If you follow the DocOps model, there's a need for continuous publishing. You need to address lots of little updates based on constant support incidents and other feedback. You need the ability to quickly and easily edit your content based on a barrage of support incidents and calls.
With a database model site, you don't need to republish the entire site just to make one edit. This is largely why Movable Type, an early competitor to WordPress, failed. When you wanted to make a change, it republished your entire site using a Perl script.
Out of the box everything. Another great point in WordPress's favor is the out-of-the-box everything model. Need a plugin? You can probably get the functionality you need immediately (more or less) as well as the theme. For technical writers without front-end dev skills, this is a huge benefit.
Given that users simply expect sites to be interactive, professional, Ajax-enabled, and so on, using a platform like WordPress can get you there quickly and easily.
Combine KB articles with documentation. Another advantage of WordPress is the ability to easily combine knowledge base articles written by support engineers with documentation written by technical writers. You can give support engineers access and the ability to write without expecting them to craft valid DITA docs and integrate them into your repository. You can even create different post types for the different types of content (KB articles, release notes, how-to, white papers, blogs, etc.).
There simply aren't inexpensive CMS platforms that integrate with support centers. You can use MindTouch and Confluence to integrate with Salesforce, but both of those platforms will cost thousands of dollars.
Granted, if you're using a static file system, you could have your support engineers write files in Markdown and add them to your source control repository using Git or Mercurial, so this model works. But I'm betting people would prefer to log into a CMS and add an article, tagging or categorizing it appropriately, and filling in forms and selecting fields in a more structured way.
Hiring custom development. Another major benefit of WordPress is the high number of developers available to do custom development inexpensively. You can hire out tasks to Romania or India easily, choosing among hundreds of web development agencies.
In contrast, try hiring out a task for XML development (perhaps for a DITA customization). I bet you can't even find a site that lists XML developers. If you do find a company that does XML development, most likely they won't do small projects.
Import workflows versus native authoring
I developed extensive instructions for importing DITA into WordPress, but I'm not sure that importing DITA is the way to go. It might be simply faster and easier to author natively in WordPress.
If you author natively, the only way to replicate the conditional processing is probably through some proprietary WordPress tags (shortcodes), which are little brackets with words in them, such as
[admin-user] stuff here [/admin-user].
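To show the mechanics of bracket-style conditional tags, here is a Python sketch that keeps or drops the text between a matching pair of tags depending on the reader's roles. WordPress implements shortcodes in PHP through its own Shortcode API, so this is only an illustration of the idea, not WordPress code. The function name apply_shortcodes and the rule that a tag name must match a role name are assumptions.

```python
import re

# Matches [tag-name]...[/tag-name] where the closing tag repeats the opening
# tag. Nested tags with the same name are not handled in this sketch.
SHORTCODE = re.compile(r"\[([a-z][a-z0-9-]*)\](.*?)\[/\1\]", re.DOTALL)


def apply_shortcodes(text: str, user_roles: set[str]) -> str:
    """Keep tagged content if the tag names one of the user's roles; else drop it."""

    def replace(match: re.Match) -> str:
        tag, inner = match.group(1), match.group(2)
        return inner.strip() if tag in user_roles else ""

    return SHORTCODE.sub(replace, text)
```

For example, apply_shortcodes("Intro. [admin-user]Reset the cache.[/admin-user]", {"admin-user"}) keeps the sentence about the cache, while passing an empty set of roles removes it. Note that the tags only mean something to the system that interprets them.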
This means that your content is going to be somewhat locked into WordPress, and WordPress will be the only endpoint (except a page you can print, perhaps). Of course, it will be HTML, which is an industry standard, but there's a lot of emphasis in DITA about maintaining content in a language agnostic to the platform, so that you can easily switch from one platform to another, as well as integrate with other systems.
To be perfectly honest, as Mark Baker has pointed out, your content will likely have a longer shelf-life in WordPress than DITA. About 25% of the websites online use WordPress, so WordPress is going to be around for a while.
If you use WordPress, I think it's best to author natively in it (as WordPress was designed), rather than importing content in via a continuous workflow. This would revamp a lot of authoring and publishing models, though. I've written previously on what a pain it is to write documentation in a web-based CMS. Find and replace, printed PDF, quickly creating and editing and deleting pages, maintaining online connectivity, dealing with page loading speed, etc., all become issues with a native authoring model.
Most difficult is dealing with scenarios where you are doing extreme single sourcing. Suppose you push your product into different channels, rebranding the content extensively. How do you push into two different outputs from a single source while using a WordPress model, authoring natively in WordPress? I mentioned a plugin called ThreeWP Broadcast to do it and a process for online single sourcing, but this may be too nonstandard for people to adopt with confidence.
Overall, although I really like WordPress, if I can't easily install it and authenticate users, I can't really use it. On the other hand, once the platform is set up, it might make it much easier to publish multiple types of content and keep it all updated. Right now I'm still using OxygenXML, but the integration of KB articles with documentation might push me to evaluate the WordPress model more seriously.
What do you think is the better way to go? Flat files published with a static file generator (such as OxygenXML's webhelp) or a dynamic database model like WordPress?
About Tom Johnson
I'm a technical writer based in the Seattle area. In this blog, I write about topics related to technical writing and communication, such as software documentation, API documentation, visual communication, information architecture, writing techniques, plain language, tech comm careers, and more.