
Pushing content into any format with Jekyll

Series: Innovation in tech comm

by Tom Johnson on Mar 6, 2015
categories: technical-writing

In the previous post, I talked about help APIs as a way to deliver help inside applications. In this post, I'll explain how to push your help content into any format.

Pushing content into other formats using Jekyll and for loops

Let's say that you have three different channels where you want to push your help content. Channel one is an S3 bucket in Amazon Web Services (requiring HTML), channel two is a Salesforce Knowledge center (requiring CSV), and channel three is your help API (requiring JSON).

Jekyll (and probably many other static site generators) provides an amazing capability here. With most help authoring tools, you see a fixed list of outputs (e.g., PDF, HTML, Eclipse Help), and you're pretty much limited to those. Jekyll, however, lets you define the templates and formats that your content is pushed into.

To recap, here are the formats each channel requires:
- For channel 1 (S3), the content needs to be in HTML.
- For channel 2 (Salesforce), the content needs to be in CSV (for batch import into Knowledge).
- For channel 3 (API), the content needs to be in JSON.

Pushing content into HTML

Since HTML is the main publishing use case, most of Jekyll is structured around producing it. You author content in a Markdown or HTML page, and at the top of the page you specify the layout you want in the frontmatter, like this:

---
title: My Page
permalink: /mypage/
layout: page
---

Jekyll takes all the content in this page and inserts it into the {{content}} tag of the layout you specified (page.html). The page.html layout usually declares a layout of its own (default.html), so Jekyll then takes the combined result and inserts it into the {{content}} tag of default.html.

(You could specify default as your layout from the start, but you might have several HTML layouts, for example one for pages, one for posts, and others for specific content types such as API docs, all of which plug into default.html.)
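To make the nesting concrete, here's a minimal Python sketch of the idea (a toy model, not Jekyll's actual code): each layout wraps whatever it receives in its own markup, then hands the result to its parent layout, until it reaches a layout with no parent. The LAYOUTS table and the render_page function are hypothetical stand-ins for _layouts/page.html and _layouts/default.html.

```python
from typing import Optional

# A toy model of Jekyll-style layout nesting (hypothetical; not Jekyll's code).
# Each layout names its parent layout and contains a {{content}} slot.
LAYOUTS = {
    "default": {
        "parent": None,
        "body": "<html><body>{{content}}</body></html>",
    },
    "page": {
        "parent": "default",  # like `layout: default` in page.html's frontmatter
        "body": "<article><h1>{{title}}</h1>{{content}}</article>",
    },
}


def render_page(content: str, layout_name: Optional[str], title: str) -> str:
    """Insert content into a layout, then into that layout's parent, and so on."""
    while layout_name is not None:
        layout = LAYOUTS[layout_name]
        # Fill {{title}} in the wrapper first so the page content is never touched.
        wrapper = layout["body"].replace("{{title}}", title)
        content = wrapper.replace("{{content}}", content)
        layout_name = layout["parent"]
    return content


print(render_page("<p>Hello</p>", "page", "My Page"))
```

Calling render_page with "page" walks page to default, so the paragraph ends up inside the article, which ends up inside the body tag.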

Pushing content into JSON

Now let's move to the JSON use case. Rather than stuffing content into {{content}} tags, you create a file that looks like this:

---
layout: none
search: exclude
---
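{
  "entries": [
    {% for page in site.tooltips %}
    {
      "id": "{{ page.id }}",
      "body": {{ page.content | jsonify }}
    }{% unless forloop.last %},{% endunless %}
    {% endfor %}
  ]
}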

"tooltips" is the name of a collection I created inside Jekyll. This code has the basic structure of the JSON that I want, but you'll notice some placeholders. A for loop iterates through all the pages inside the tooltips collection and, with each page, the page's id gets inserted into the /2015/03/06/pushing-content-into-any-format-with-jekyll placeholder, and the page content gets inserted into the {{page.content}} placeholder.

Assuming our pages were short descriptions of sports, here's what the result might look like:

{
  "entries": [
    {
      "id": "baseball",
      "body": "Baseball is considered America's pastime, though that may be more of a historical term than a current one. There's a lot more excitement about football than baseball. A baseball game is somewhat of a snooze to watch, for the most part."
    },
    {
      "id": "basketball",
      "body": "Basketball is a sport involving two teams of five players each competing to put a ball through a small circular rim 10 feet above the ground. Basketball requires players to be in top physical condition, since they spend most of the game running back and forth along a 94-foot-long floor."
    },
    {
      "id": "football",
      "body": "No doubt the most fun sport to watch, football also manages to accrue the most injuries with the players. From concussions to blown knees, football players have short sport lives."
    },
    {
      "id": "soccer",
      "body": "If there's one sport that dominates the world landscape, it's soccer. However, US soccer fans are few and far between. Apart from the popularity of soccer during the World Cup, most people don't even know the name of the professional soccer organization in their area."
    }
  ]
}

Pushing content into CSV

CSV requires a different format from either HTML or JSON. And admittedly, here's where things get a little theoretical because I haven't actually tested this.

Here's a typical CSV format that I just pulled off the web:

policyID,statecode,county,eq_site_limit,hu_site_limit,fl_site_limit,fr_site_limit,tiv_2011,tiv_2012,eq_site_deductible,hu_site_deductible,fl_site_deductible,fr_site_deductible,point_latitude,point_longitude,line,construction,point_granularity
119736,FL,CLAY COUNTY,498960,498960,498960,498960,498960,792148.9,0,9979.2,0,0,30.102261,-81.711777,Residential,Masonry,1
448094,FL,CLAY COUNTY,1322376.3,1322376.3,1322376.3,1322376.3,1322376.3,1438163.57,0,0,0,0,30.063936,-81.707664,Residential,Masonry,3

The top row holds the column headers, and each row below it holds one record's values in the same order, with commas separating the values.
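To see how the header row lines up with each data row, here's a short Python sketch that reads the first record of the sample with the standard csv module. The header has 18 columns (fl_site_limit is a single column name), and each data row has 18 values in the same order.

```python
import csv
import io

# The header row plus the first data row from the sample above.
SAMPLE = (
    "policyID,statecode,county,eq_site_limit,hu_site_limit,fl_site_limit,"
    "fr_site_limit,tiv_2011,tiv_2012,eq_site_deductible,hu_site_deductible,"
    "fl_site_deductible,fr_site_deductible,point_latitude,point_longitude,"
    "line,construction,point_granularity\n"
    "119736,FL,CLAY COUNTY,498960,498960,498960,498960,498960,792148.9,"
    "0,9979.2,0,0,30.102261,-81.711777,Residential,Masonry,1\n"
)

# DictReader uses the first row as keys for every row that follows.
reader = csv.DictReader(io.StringIO(SAMPLE))
row = next(reader)
print(len(reader.fieldnames))  # 18
print(row["county"], row["tiv_2012"], row["point_granularity"])  # CLAY COUNTY 792148.9 1
```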

In Jekyll, you would first make sure your pages had frontmatter keys corresponding to each of these CSV headers. Here's what the page for the first row might look like:

---
policyID: 119736
statecode: FL
county: CLAY COUNTY
eq_site_limit: 498960
hu_site_limit: 498960
fl_site_limit: 498960
fr_site_limit: 498960
tiv_2011: 498960
tiv_2012: 792148.9
eq_site_deductible: 0
hu_site_deductible: 9979.2
fl_site_deductible: 0
fr_site_deductible: 0
point_latitude: 30.102261
point_longitude: -81.711777
line: Residential
construction: Masonry
point_granularity: 1
---

You then create a file with a .csv extension, such as data.csv. In this file, you add some basic frontmatter at the top so that Jekyll processes the file as a page. And then you iterate through each of the pages using a for loop and stuff the data into your CSV template.

I'll pretend that I've created a collection here called "policies," and that each of my pages exists inside _policies.

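---
layout: none
search: exclude
---
policyID,statecode,county,eq_site_limit,hu_site_limit,fl_site_limit,fr_site_limit,tiv_2011,tiv_2012,eq_site_deductible,hu_site_deductible,fl_site_deductible,fr_site_deductible,point_latitude,point_longitude,line,construction,point_granularity
{% for page in site.policies %}{{page.policyID}},{{page.statecode}},{{page.county}},{{page.eq_site_limit}},{{page.hu_site_limit}},{{page.fl_site_limit}},{{page.fr_site_limit}},{{page.tiv_2011}},{{page.tiv_2012}},{{page.eq_site_deductible}},{{page.hu_site_deductible}},{{page.fl_site_deductible}},{{page.fr_site_deductible}},{{page.point_latitude}},{{page.point_longitude}},{{page.line}},{{page.construction}},{{page.point_granularity}}
{% endfor %}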

See how you access content in the frontmatter using the page namespace? page.policyID gets the value for the policyID in the frontmatter, and so on. The for loop would go through each of the pages and construct a new row of comma-separated values until it reached the end of the pages in the policies collection.
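A template like this writes each frontmatter value into the row verbatim, which works as long as no value contains a comma, a double quote, or a line break; otherwise every column after that value shifts. Here's a minimal Python sketch of the same loop using csv.writer, which quotes such values automatically. The policies list is a hypothetical stand-in for the pages in _policies, trimmed to four columns, and the second record (including its comma-containing county) is invented purely to show the quoting.

```python
import csv
import io

# Hypothetical stand-ins for pages in the _policies collection,
# trimmed to four columns to keep the example short.
COLUMNS = ["policyID", "statecode", "county", "line"]
policies = [
    {"policyID": "119736", "statecode": "FL", "county": "CLAY COUNTY", "line": "Residential"},
    # Invented record whose county contains a comma, to show quoting.
    {"policyID": "999999", "statecode": "FL", "county": "SAMPLE, COUNTY", "line": "Commercial"},
]


def build_csv(pages):
    """Write the header row, then one row per page in COLUMNS order."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(COLUMNS)
    for page in pages:
        # Missing frontmatter keys become empty cells instead of raising KeyError.
        writer.writerow([page.get(col, "") for col in COLUMNS])
    return buffer.getvalue()


print(build_csv(policies))
```

csv.writer uses minimal quoting by default, so only the field that needs it ("SAMPLE, COUNTY") gets wrapped in quotes.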

When you build your Jekyll site, Jekyll recognizes data.csv as a file to process because it has frontmatter, and you'll find a fully populated data.csv file in your build folder (_site by default).

Not bound by format

Because of this flexibility in constructing templates to stuff content into, you're not bound by a specific format, for the most part. You create your content, decide on the template, and then Jekyll shoves the content inside the template. The template could be HTML, JSON, CSV, or something else. This way you can author content in a way that is separate from format. (You could equally create a template that stuffs this same arcane policy information content into a JSON file, for example.)
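The underlying principle, content authored once with the format chosen at output time, can be sketched in a few lines of Python: the record never changes; only the function that serializes it does. The RENDERERS mapping and the sample record are illustrative only, not anything Jekyll provides.

```python
import csv
import html
import io
import json


def to_html(r):
    # Escape values so characters like < and & can't break the markup.
    return f'<section id="{html.escape(r["id"])}"><p>{html.escape(r["body"])}</p></section>'


def to_json(r):
    return json.dumps(r)


def to_csv(r):
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerow([r["id"], r["body"]])
    return buffer.getvalue()


# One "template" per output channel; the content itself stays the same.
RENDERERS = {"html": to_html, "json": to_json, "csv": to_csv}

record = {"id": "baseball", "body": "Baseball is a bat-and-ball sport."}
for fmt, render in RENDERERS.items():
    print(fmt, "->", render(record))
```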

No doubt many tools do a similar kind of thing on the backend. I just never really understood what was happening when I selected a certain output. Jekyll exposes this processing in a clear and simple way. Now your content can travel into any number of systems in a seamless way.

There is at least one limitation with this approach, though. You can't really create a DITA template and push content into DITA format, except perhaps in the most general way, with a body element that contains plain HTML. DITA has very specific structures, and this simple template method won't wrap lists inside task elements, convert links between pages into xrefs, enforce element order, and so forth.

Other more sophisticated formats might have similar restrictions. However, my point is that Jekyll allows you to separate out your content from the template (presentation), and this is a huge deal when it comes to processing and displaying information.

About Tom Johnson


I'm an API technical writer based in the Seattle area. On this blog, I write about topics related to technical writing and communication — such as software documentation, API documentation, AI, information architecture, content strategy, writing processes, plain language, tech comm careers, and more. Check out my API documentation course if you're looking for more info about documenting APIs. Or see my posts on AI and AI course section for more on the latest in AI and tech comm.

Note that the opinions I express on my blog are my own points of view, not that of my employer.