Producing PDFs in DITA versus Jekyll
In this near-final post in my series comparing DITA with Jekyll, I want to explore contrasting ways to produce PDFs.
In other blog posts, I have stated how much I dislike PDFs for technical documentation. The main problem is that even though PDFs go out of date quickly, users hang onto them and expect the PDFs to be current, relevant, and accurate. PDFs don't really fit into an agile software development model.
Nevertheless, PDFs do tend to have use cases now and then (such as product overviews for sales people), and I know that in order to have a solid documentation system, you have to fill this PDF requirement to some degree. As a result, I set about exploring ways to produce high-quality PDFs from Jekyll.
How to produce PDFs from DITA
Without question, DITA's strength is in producing PDFs. And yet, customizing these PDFs is not easy. You have to know XSL-FO and other transform processes. I admit I never took the time to really try to learn them. I probably should have read through DITA for Print by Leigh White and taken careful notes.
Producing a default PDF output from a DITA source using OxygenXML is pretty trivial. You just select PDF as one of your transformation scenarios. As I recall, OxygenXML does a good job in creating warnings for links not included or found in the build.
What I like about producing PDFs from OxygenXML is that you can set up all of these transformations in your system and then just click a button to initiate multiple builds. It really is a powerful publishing tool, particularly when it comes to PDF.
How to produce PDFs from Jekyll
Jekyll was designed as a web publishing tool, and it is primarily used by bloggers, web designers, developers, or by people who have other website needs. Very few people are using Jekyll to publish any kind of PDF. Therefore, there isn't much information on how to get PDF out of Jekyll.
Some Jekyll plugins let you create a PDF from an existing page, but these single-page PDFs are not the end goal for most technical writers. Most technical writers need to create a lengthy PDF that contains all the pages or a subset of pages in the help system, with features such as a table of contents, cross-references, running headers and footers, and other details.
If you really need a lengthy PDF from Jekyll, here's a way to do it. You buy a tool called Prince (~$500) that will transform HTML into PDF using CSS stylesheets. I played around with Prince this week, and I really like it. In fact, even if I were producing PDFs from DITA in OxygenXML, I would still use Prince because I can more easily use the existing stylesheet language (CSS) that I already know. (I believe the next version of OxygenXML will have Prince as an integration option.)
There's a strong argument for using CSS to style your PDFs: you can single source your styles.
If you're able to use the same stylesheet for both your website and your PDF, then you save yourself a lot of time, and your content is more consistent stylistically. (For all the enthusiasm that DITA proponents put forward with single sourcing, you never quite hear whether the styles used for both print and web outputs are single sourced.)
I will soon publish a post that covers all the technical details of how to generate good-looking PDFs from Jekyll, but for now I will just mention the basic process.
Process for getting PDFs out of a Jekyll site
Here's the basic process for getting PDFs out of a Jekyll site.
First, you define a print layout for your web content. In the print layout, you strip away the sidebar, header, and any kind of other matter that you do not want included in the output. You could also hide these elements through a print stylesheet, but it's easier just to create a new layout and remove the elements from the layout altogether.
You then create a new web output that uses this print-specific layout. In the frontmatter defaults section in the config file, you indicate that pages will use your new print page layout.
Before you build the site (based on your new config file), you create a text file containing a for loop that looks through your TOC data file and grabs all the pages for which the TOC entry has a print == true property. You put this logic into a txt file (such as toc-list.txt) and make sure each TOC link appears on its own line.
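To make the filtering logic concrete, here is a minimal sketch in Python. In the Jekyll setup described above, this loop lives in a template file that Jekyll processes during the build, so the Python version is only a stand-in for the same idea. The TOC structure, the field names (title, url, print), and the _site output folder are assumptions for illustration, not the actual data file.

```python
from pathlib import Path

# Hypothetical TOC data, shaped like a parsed Jekyll data file.
# The field names ("title", "url", "print") are assumptions for illustration.
toc = [
    {"title": "Overview", "url": "/overview.html", "print": True},
    {"title": "Release notes", "url": "/release-notes.html", "print": False},
    {"title": "Installation", "url": "/install.html", "print": True},
    {"title": "Web-only widget", "url": "/widget.html"},  # no print flag at all
]


def printable_paths(entries, site_dir="_site"):
    """Return built-page paths for entries flagged print == true, in TOC order."""
    return [
        f"{site_dir}{entry['url']}"
        for entry in entries
        if entry.get("print") is True
    ]


def write_toc_list(entries, out_file="toc-list.txt", site_dir="_site"):
    """Write one page path per line so each TOC link sits on its own line."""
    paths = printable_paths(entries, site_dir)
    Path(out_file).write_text("\n".join(paths) + "\n", encoding="utf-8")
    return paths
```

Calling write_toc_list(toc) with the sample data writes two lines to toc-list.txt: the overview page and the installation page. Entries that set print to false, or that omit the property, are left out.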
You then take this list of pages (toc-list.txt) and feed it into Prince's command line interface input parameter. Now Prince uses the toc-list.txt file to get all the HTML pages and package them up into a single PDF.
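If you trigger the build from a script, the Prince step might look something like the following Python sketch. It assumes Prince is installed and on your PATH, and it uses Prince's --input-list, -s, and -o options. The file names (toc-list.txt, docs.pdf, print.css) are placeholders, so check prince --help for the options your version supports.

```python
import shutil
import subprocess


def build_prince_command(list_file="toc-list.txt", output_pdf="docs.pdf", stylesheet=None):
    """Assemble the Prince command line as an argument list (no shell quoting needed)."""
    cmd = ["prince", f"--input-list={list_file}"]
    if stylesheet:
        # Optional extra print stylesheet; the file name is a placeholder.
        cmd += ["-s", stylesheet]
    cmd += ["-o", output_pdf]
    return cmd


def build_pdf(list_file="toc-list.txt", output_pdf="docs.pdf", stylesheet=None):
    """Run Prince on the page list; fail early if the executable is missing."""
    if shutil.which("prince") is None:
        raise FileNotFoundError("prince executable not found on PATH")
    cmd = build_prince_command(list_file, output_pdf, stylesheet)
    subprocess.run(cmd, check=True)  # raises CalledProcessError on a non-zero exit
    return cmd
```

Keeping the command assembly separate from the call that runs it makes it easy to print the command and inspect it before kicking off a full build.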
You also create a specific print stylesheet with some special CSS properties that Prince uses to create counters (for cross-references), headers and footers, and other print styling. Prince uses the stylesheets and layout associated with these web pages to create a PDF.
How does the result look? The PDFs look just as good as a PDF generated from DITA. In fact, the PDFs from Prince may look better than other PDFs because they will match your website's style.
With Prince, there is no checker to see if you've included references to pages that aren't included in the output. However, if Prince cannot find the page, it will put page 0 as the cross-reference. Consequently, to find any misplaced cross-references, you could just press Ctrl+F and search for “page 0” in the output, and then go back to your content and conditionalize those sections to hide them from the print output.
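If you would rather catch these links before building the PDF, one option is a small pre-check of your own. The sketch below is not a Prince feature and not part of the process above; it is a hypothetical helper that scans the HTML of the print pages and reports internal links whose target page is not in the print set. It assumes flat file names relative to the site root, so a site with nested folders would need real path resolution.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def missing_targets(pages):
    """pages maps a page file name to its HTML.

    Returns sorted (source page, target page) pairs for internal links
    that point at a page outside the print set.
    """
    missing = []
    for name, html in pages.items():
        collector = LinkCollector()
        collector.feed(html)
        for href in collector.hrefs:
            parsed = urlparse(href)
            if parsed.scheme or parsed.netloc:
                continue  # external link, mailto:, and so on
            target = parsed.path.lstrip("/")
            if not target:
                continue  # same-page anchor such as "#top"
            if target not in pages:
                missing.append((name, target))
    return sorted(missing)


# Hypothetical print set: widget.html is linked but not included.
pages = {
    "overview.html": '<p>See <a href="install.html#steps">Installation</a> '
                     'and <a href="widget.html">the widget</a>.</p>',
    "install.html": '<p>Back to <a href="/overview.html">Overview</a> '
                    'or <a href="https://example.com">the site</a>.</p>',
}
print(missing_targets(pages))  # [('overview.html', 'widget.html')]
```

Each pair it reports is a link that would likely end up as a “page 0” reference in the PDF, so those are the sections to conditionalize.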
Again, I don't want to get into the technical details, but that is the basic approach I'm taking. After you set it all up, it's somewhat of a push-button operation (you can trigger it all from a build script), but it's not as push-button as with a DITA platform. (As with everything Jekyll, though, it's easier to take control of the process.)
If PDF is a huge requirement, and you're producing many different types of PDFs for many different targets and audiences, then Jekyll with Prince may not be your best option. On the other hand, since I don't think PDF is the best output for documentation, this isn't a huge concern of mine. I mostly wanted to be able to provide PDFs to people in case they need them.
About Tom Johnson
I'm an API technical writer based in the Seattle area. On this blog, I write about topics related to technical writing and communication — such as software documentation, API documentation, AI, information architecture, content strategy, writing processes, plain language, tech comm careers, and more. Check out my API documentation course if you're looking for more info about documenting APIs. Or see my posts on AI and AI course section for more on the latest in AI and tech comm.