Maintaining existing documentation
As soon as new docs are published, they begin a trajectory of decay. The natural progression of technology makes documentation outdated within a matter of months or years. New versions of web browsers, operating systems, supporting utilities and tools, etc., are released, and the whole technology landscape keeps moving forward, evolving, improving, and adjusting, all while documentation remains static. The more your documentation relies on third-party components, the faster it goes out of date.
Most documentation efforts focus on creating new documentation, but what happens to all the existing documentation that is decaying? In this section, I cover ways to maintain existing documentation and prevent it from rotting.
- Working on new docs versus maintaining existing docs
- 1. Define priority docs
- 2. Identify broken links
- 3. Identify broken or outdated images
- 4. Delete unused branches
- 5. Identify unpublished files
- 6. Identify files or doc sets ready for deletion
- 7. Identify content owners for each doc set
- 8. Identify last-reviewed dates
- 9. Require content owners to periodically review docs
- 10. Identify docs that are missing metadata
- 11. Identify top articles in knowledge bases that aren’t in the documentation
- 12. Identify docs with the highest bounce rate and shortest visit time
- 13. Review doc commit logs to check commits from people you don’t recognize
- 14. Check engineering commit dates against release notes
- 15. Look for broken documentation formatting
- Conclusion
Working on new docs versus maintaining existing docs
If documentation isn’t actively maintained, it rots (becomes Redundant, Outdated, Trivial), loses ownership, and turns into “legacy documentation.” The tech writers who created the initial content transition elsewhere, and the product team members move on to other areas of the org or leave. Soon no one knows the history of the content; it was just there when the current writers arrived. Tech writers new to the team, unfamiliar with these docs, tend to avoid them, and the neglected docs begin a downward spiral of decay because no one actively looks after them anymore. The docs are like orphaned children, without a caregiving parent.
As more time passes, legacy documentation accumulates. The ratio of documentation that tech writers create to documentation they retire is lopsided: you create far more than you retire. Most documentation requests that come in from product teams relate to new documentation, as everyone is focused on the shiny and new: the possibilities of a new feature, the excitement of a release, and so forth. Few want to take responsibility for documentation related to products or features that fizzled but are still used.
It’s easy to get caught up in constantly creating new content. After all, this is what requests to the documentation team usually consist of: product managers say, “We’re releasing a new product/feature, and we need docs!” Product teams rarely reach out to you to review two-year-old docs, or to compare docs against support cases to see whether better docs could have prevented those cases. Once you publish your docs, major content updates or overhauls are rare except with major product updates.
Updating and maintaining existing content isn’t sexy or exciting, but it’s essential if you want to stay on top of your docs. It’s unsettling to look at your documentation repository and see hundreds of unfamiliar docs that you can’t vouch for: you don’t know who owns them, when they were last reviewed, who worked on them, or how accurate they are.
Rather than staying heads down creating new content, doc teams should regularly cycle a few maintenance tickets into each sprint, even though no product teams request the maintenance. Regular maintenance should be part of any content strategy for keeping a healthy set of documentation. But with hundreds of pages in your repository, where do you begin? How do you approach maintenance in any sane way, especially when you can barely keep ahead of the incoming requests for new content? The following sections list possible maintenance activities you can do with docs. (Although I numbered the sections, they are in no particular order.)
1. Define priority docs
First, identify which docs are a priority. If some products have been sunset or abandoned internally, don’t waste too much time improving those docs. Just establishing which docs are a priority can help you with planning. You can identify which docs are a priority by looking at page views in your analytics. Additionally, you should have a sense of your organization’s priorities that are independent of site traffic. When you know which docs are a priority, you can weight them higher in the maintenance tasks. Try to identify your top 5 documentation sets. This way, if other efforts have to be triaged, you can at least focus your attention where it matters.
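If your analytics tool can export page views, even a small script can roll individual pages up into doc sets and rank them by traffic. Here’s a minimal sketch in Python, assuming a hypothetical CSV export named pageviews-last-90-days.csv with page and pageviews columns; adjust the file name and column names to whatever your tool actually produces.

```python
import csv
from collections import defaultdict

# Hypothetical analytics export with one row per page, containing "page"
# and "pageviews" columns. Adjust names to match your tool's export.
ANALYTICS_CSV = "pageviews-last-90-days.csv"

views = defaultdict(int)
with open(ANALYTICS_CSV, newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        # Roll individual pages up to their doc set (first path segment).
        doc_set = row["page"].strip("/").split("/")[0] or "(home)"
        views[doc_set] += int(row["pageviews"])

# Print the highest-traffic doc sets so you can pick your top five.
for doc_set, total in sorted(views.items(), key=lambda kv: kv[1], reverse=True)[:10]:
    print(f"{total:>8}  {doc_set}")
```

Combine a ranking like this with your knowledge of organizational priorities before locking in your top five.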
2. Identify broken links
Identifying broken links is a time-honored tradition in the doc community, and it’s an easy place to start. Fixing broken links is a straightforward, concrete task that doesn’t require outreach to other teams. Put on your favorite Netflix show in the background, and slip into a mind-numbing afternoon of fixing links. There are different tools and scripts for finding broken links: link-checking tools (like Dr. Link Checker), custom scripts, or even Chrome extensions (like Broken Link Checker).
We often look within our own documentation repositories for errors, but what about content outside of the documentation? How about all the marketing pages, blog posts, support threads, and other content that link to your documentation? If your documentation portal has a lot of other pages, consider running link checkers across those pages to identify broken links. Fortunately, you might not have to fix broken links on these pages. You could simply print the report and hand it off to the content owners in these areas.
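If your docs are Markdown files in a Git repo, a rough link report is only a few lines of Python. The sketch below is an illustration rather than a replacement for a dedicated checker: it assumes a docs/ folder of .md files and only tests absolute http(s) links.

```python
import pathlib
import re
import requests

DOCS_DIR = pathlib.Path("docs")  # adjust to your repo layout
# Matches Markdown links with absolute http(s) URLs.
LINK_PATTERN = re.compile(r"\[[^\]]*\]\((https?://[^)\s]+)\)")

broken = []
for md_file in DOCS_DIR.rglob("*.md"):
    for url in LINK_PATTERN.findall(md_file.read_text(encoding="utf-8")):
        try:
            # Some servers reject HEAD; switch to requests.get if you see
            # too many false positives.
            resp = requests.head(url, allow_redirects=True, timeout=10)
            if resp.status_code >= 400:
                broken.append((md_file, url, resp.status_code))
        except requests.RequestException as exc:
            broken.append((md_file, url, str(exc)))

for md_file, url, status in broken:
    print(f"{md_file}: {url} -> {status}")
```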
3. Identify broken or outdated images
As with broken links, identifying broken images is another task that you can tackle by running scripts or other validators. Broken images are much less common than broken links. More often, the issue is that existing images become outdated. If you’re familiar with all the interfaces, you can glance through screens and identify older designs. Alternatively, you can revisit workflows, retracing your steps through tasks to see if the screens have changed. This would give you a chance to check the accuracy of the steps as well. Nothing signals “this documentation is old” like outdated screenshots.
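You can apply the same script-based approach to image references. This sketch, again assuming a docs/ folder of Markdown files, flags images whose files no longer exist; it won’t catch outdated screenshots, which still need a human pass.

```python
import pathlib
import re

DOCS_DIR = pathlib.Path("docs")  # adjust to your repo layout
# Matches standard Markdown image syntax: ![alt](path/to/image.png)
IMAGE_PATTERN = re.compile(r"!\[[^\]]*\]\(([^)\s]+)\)")

for md_file in DOCS_DIR.rglob("*.md"):
    for target in IMAGE_PATTERN.findall(md_file.read_text(encoding="utf-8")):
        if target.startswith("http"):
            continue  # remote images need an HTTP check instead
        # Assumes image paths are relative to the page; site-root-relative
        # paths would need a different base directory.
        if not (md_file.parent / target).exists():
            print(f"{md_file}: missing image {target}")
```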
4. Delete unused branches
How often have you looked at your list of branches in origin and found half a dozen or more old branches you no longer recognize? Long-lived branches outside of your production environment are rarely a good thing, as they usually diverge further and further from production. If you’ve already merged a branch, get rid of it. If the content is unpublished, consider merging it anyway (assuming it’s a new page and doesn’t update existing pages), and then use other techniques to enforce access control. You could even set the page as unpublished.
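If you want to script the cleanup, here’s a sketch that shells out to Git and lists remote branches already merged into origin/main as deletion candidates. It assumes main is your default branch; substitute master or whatever yours is called, and review the list before deleting anything.

```python
import subprocess

# List remote branches that are already merged into origin/main; these are
# usually safe deletion candidates. Review the list before deleting anything.
merged = subprocess.run(
    ["git", "branch", "-r", "--merged", "origin/main"],
    capture_output=True, text=True, check=True,
).stdout.splitlines()

for line in merged:
    branch = line.strip()
    if branch.endswith("/main") or "HEAD" in branch:
        continue  # never touch the default branch
    print(f"candidate for deletion: {branch}")
    # To actually delete the remote branch, uncomment the next line:
    # subprocess.run(["git", "push", "origin", "--delete", branch.split("/", 1)[1]], check=True)
```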
There might be other maintenance tasks to perform on your Git repository as well, such as pruning large files to reduce the repo’s size, verifying permissions, or running other maintenance tasks.
5. Identify unpublished files
In our documentation repo, there are quite a few files that I’ve started and then abandoned. Sometimes I started working on a file only to have the product team ghost me before launch. Other times I started on an idea and eventually forgot about it. In many cases, I set the page as unpublished and move on. However, when I search the repository for phrases or instances of something, those unpublished files get included in the results and slow me down. Look through your repo and identify all the unpublished content. Assess whether it still needs to remain. If you don’t anticipate actually working on the content, delete it, or copy it into a ticket and archive it. Someone can always find and retrieve it later.
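A quick way to surface unpublished content is to scan front matter for an unpublished flag. This sketch assumes Jekyll-style front matter with a published: false line; adjust the marker to match your generator.

```python
import pathlib

DOCS_DIR = pathlib.Path("docs")  # adjust to your repo layout

# Flag any page whose front matter marks it as unpublished. Assumes
# Jekyll-style front matter delimited by "---" with a "published: false"
# line; adjust the marker to whatever your generator uses.
for md_file in DOCS_DIR.rglob("*.md"):
    text = md_file.read_text(encoding="utf-8")
    if not text.startswith("---"):
        continue
    front_matter = text.split("---", 2)[1]
    if "published: false" in front_matter:
        print(md_file)
```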
6. Identify files or doc sets ready for deletion
You might have some files in your repo that you can just delete. For example, suppose you deprecated a product two years ago, and all the remaining files just have a banner indicating the deprecation. At some point, you can just delete the files. Or maybe you have files and doc sets related to products that never launched. Maybe docs were written for a feature that was canned at the last moment and the product team has long since disbanded. Or maybe you have leftover files from a previous migration, or old scripts that are no longer used, and so on. At a regular cadence, look through your docs and ask what you can remove.
Note that I’m not recommending that you delete actively used content. Almost always, when you deprecate documentation, you add a banner announcing the deprecation and include a migration path. I’m instead encouraging you to think about the “deletion” part of the content life cycle. In Kristina Halvorson’s definition of content strategy, Halvorson says, “Content strategy guides decisions about content throughout its lifecycle, from discovery to deletion.” Think about that last stage in the content life cycle: deletion.
Deleting content can be scary — what if people complain about broken links? What if it turns out we need that content? Before removing it, shouldn’t we get approval from all the product team members (who are no longer there)?
You know what? The great thing about managing content in Git is that it’s nearly always retrievable. You can usually resurrect content that you’ve deleted. If you delete something and no one complains about its absence, you probably made a good choice.
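For example, here’s a sketch (shelling out to Git from Python) of what resurrecting a deleted page can look like: find the commit that deleted the file, then restore the version from just before that commit. The script name resurrect.py is only illustrative.

```python
import subprocess
import sys

# Usage (illustrative): python resurrect.py docs/old-page.md
path = sys.argv[1]

# Find the commits that deleted this path, most recent first.
deletions = subprocess.run(
    ["git", "log", "--diff-filter=D", "--format=%H", "--", path],
    capture_output=True, text=True, check=True,
).stdout.split()

if not deletions:
    print(f"No deletion found in history for {path}")
else:
    deleting_commit = deletions[0]
    # Restore the file as it existed just before the deleting commit.
    subprocess.run(["git", "checkout", f"{deleting_commit}^", "--", path], check=True)
    print(f"Restored {path} from {deleting_commit}^")
```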
7. Identify content owners for each doc set
Understanding who owns all the documentation content, especially legacy documentation, is a huge task. In the moment that you’re creating new documentation, this kind of content ownership metadata hardly seems necessary. But as soon as the docs are published, teams evolve, people leave, and before too long, support might reach out to you with questions about a specific doc, and you won’t have any idea who to contact about it. Content ownership usually doesn’t need to be granular at the page level, but the general doc sets should have some metadata about which team owns the product. Team ownership of products doesn’t change as often as the team members do. (The average tenure of an employee at many companies is less than two years.) The engineers and PMs who seem so present and thriving one month might vanish the next.
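One low-effort way to track this is an owner field in the page front matter, which you can then report on. The sketch below assumes a docs/ folder where each doc set is a top-level subfolder and ownership is recorded as an owner: line in Jekyll-style front matter; both details are assumptions to adapt to your own setup.

```python
import pathlib
from collections import Counter

DOCS_DIR = pathlib.Path("docs")  # adjust to your repo layout

# Count pages per doc set (top-level folder) that have no "owner:" line in
# their Jekyll-style front matter. Both conventions are assumptions.
missing = Counter()
for md_file in DOCS_DIR.rglob("*.md"):
    text = md_file.read_text(encoding="utf-8")
    front_matter = text.split("---", 2)[1] if text.startswith("---") else ""
    if "owner:" not in front_matter:
        doc_set = md_file.relative_to(DOCS_DIR).parts[0]
        missing[doc_set] += 1

for doc_set, count in missing.most_common():
    print(f"{doc_set}: {count} pages with no owner recorded")
```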
8. Identify last-reviewed dates
Probably the most significant metadata for your docs is the last-reviewed date. Although Git timestamps your edits, Git won’t distinguish between a single-character update and a major doc overhaul that was carefully reviewed by SMEs. In your page metadata, consider maintaining two fields: “last modified” and “last reviewed.” Last modified is helpful for quickly seeing who recently worked on the file. You can implement a Git hook that writes the Git user email into a value on the page, along with the last modified date. The “last reviewed” metadata can’t easily be automated, but even a general date like Q1 2020 or Q2 2020 is probably fine.
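As a sketch of the hook idea, the following Python script could be saved as .git/hooks/pre-commit (and made executable) to stamp staged Markdown files with the committer’s email and the current date. The last_modified and last_modified_by field names are just examples, and the script assumes those fields already exist in the front matter.

```python
#!/usr/bin/env python3
# Sketch of a pre-commit hook: stamps staged Markdown files with the
# committer's email and today's date. The last_modified / last_modified_by
# field names are only examples, and the fields are assumed to already
# exist in each page's front matter.
import datetime
import pathlib
import re
import subprocess

email = subprocess.run(
    ["git", "config", "user.email"], capture_output=True, text=True, check=True
).stdout.strip()
today = datetime.date.today().isoformat()

staged = subprocess.run(
    ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
    capture_output=True, text=True, check=True,
).stdout.splitlines()

for name in staged:
    path = pathlib.Path(name)
    if path.suffix != ".md" or not path.exists():
        continue
    text = path.read_text(encoding="utf-8")
    text = re.sub(r"last_modified: .*", f"last_modified: {today}", text, count=1)
    text = re.sub(r"last_modified_by: .*", f"last_modified_by: {email}", text, count=1)
    path.write_text(text, encoding="utf-8")
    subprocess.run(["git", "add", name], check=True)  # restage the updated file
```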
After you add this last-reviewed date to your docs, consider taking the bold step of printing the timestamp on each page (perhaps in the footer). If users see that a page hasn’t been reviewed since 2016, that’s a red flag that the content might be old. As long as you have these dates in your files, you can create scripts and reports and automate other processes around them.
9. Require content owners to periodically review docs
Probably the most powerhouse technique you can do to keep docs updated is identify docs that haven’t been reviewed for a long time and submit a ticket to the product team to review the content. As long as your “last reviewed” dates are accurate, you could create scripts to identify these pages at a regular cadence (e.g., yearly) and then automate notifications or tickets to content owners to complete reviews. This regular “baselining” is common with security teams and permission settings.