Can Help Content Have Recognizable Facets?

In my previous post, I wrote about faceted search and faceted classification, and how facets can help users narrow information to a specific topic.

Even if you’ve never heard the term “faceted search,” you’ve no doubt used it on various websites, like Google, Amazon, LinkedIn, and more. When you perform a search, you get a list of filters to further narrow the information. Each of the filters (or facets) narrows the results. Here’s faceted search on LinkedIn (after searching for “technical writer”):

Faceted search on LinkedIn

You can narrow the results by company, relationship, location, date, salary, industry, experience level, and more. The facets that appear depend on the type of information you’re looking for — companies, jobs, people, and so on.

Faceted search works great because you can start out with broad search terms and progressively narrow the information. If you narrow too quickly, you can easily de-select some of your filters and expand the results — without having to create a new search.

On the web, we see many examples of faceted classification on regular websites, which often have products with clearly distinguishable features such as size, weight, color, make, model, brand, mileage, and so forth. Help information is more nuanced. Help topics are mostly just information. Additionally, we have almost no examples of faceted search in help systems to examine as a foundation. Even LinkedIn’s help center doesn’t have facets.

Where Are the Examples of Faceted Search?

The lack of examples of faceted search in help systems doesn’t mean faceted search wouldn’t be a great feature to add to a help system. One reason we may not see faceted search in help sites is that many tech writers use help authoring tools that do not provide this capability out of the box. And companies don’t usually dedicate programming resources to create custom solutions for the tech pubs group.

A few years ago I attended an STC Summit and asked as many knowledgeable people as I could about how to implement faceted search. Not one person could tell me how to do it. The only answer was that I’d probably have to build it myself through a team of programmers.

Well, tech has advanced since then, and now with Apache Solr search (made easy through Acquia Search) on Drupal, you can get faceted search up and running in a short amount of time without programming anything yourself. (In another post I’ll explain how to set this up on Drupal.)

But now that the technical question is partially resolved, we come to a difficult strategic question: What facets do you use with help, since help doesn’t have clear physical attributes to call out?

Do Users Navigate Through Classifications?

Before turning to possible answers about help facets, let’s question our assumptions. Do facets, which are really classifications (or groupings), serve as useful guides to help people find information?

I’m going to dig deep into the challenges to classification for a while, but I promise I’ll swing back around to facets. Stick with me because the challenge to classification is fascinating as well as fundamental to accepting facets.

In Readers Don’t Classify Their Experience, Mark Baker argues that a hierarchical navigation (such as with TOCs) doesn’t work with large-scale content because large-scale TOCs create meaningless high-level classifications. The classification schemes authors invent to make sense of their content mean little to readers. Mark writes,

The reason it does not work is that people do not classify their experience. They do not say, “I have an ache in my second upper bicuspid,” they say, “I have a toothache.”

In other words, users don’t start out by navigating through some arcane dental anatomy (Alveolar Processes > Mandibula > Biting and Chewing > Bicuspids > Upper Bicuspids > Toothaches). They search directly for their problem: toothache.

If you look at one large doc set from a company like Palantir, you can see the attempt to group large-scale content into high-level folders. Do “Applications,” “Administration,” and “Customization” mean a whole lot to users? Probably not. The user doesn’t think, I need to set up my widget, so I’ll look in Administration and then filter through the folders (e.g., Maintenance > Tracking > Configuring … ) from there.

Of course, since I’m not a user of Palantir’s products, I really can’t evaluate their help. So I’ll turn to something more tangible: Wikipedia.

Wikipedia’s groupings provide an even stronger case about the absurdity of adding hierarchical navigation for large-scale content. If you were to organize Wikipedia into a TOC, here’s how you would navigate to find “Technical Communication.”

  Main topic classifications
                  Written Communication
                    Technical Communication

I’m not making this up. Start at the Technical Communication article and scroll to the bottom, looking for the category. Try to trace your way up. Every category must be contained inside a parent category; those are the rules of MediaWiki (the wiki platform for Wikipedia).

There’s a bit of suspense in climbing up the navigation hierarchy. As I’m going up I think I might end up at the core fundamental Truth and origin of the universe, but it just turns out to be “articles.”

Why is there no TOC in Wikipedia? Because if they ever converted their category hierarchies into a table of contents, it would be such a joke that people would be checking their calendars to see if it was April 1.

This kind of meaningless hierarchy is exactly what we technical writers do when we create a massive TOC. Even limiting the hierarchy to four levels leads us to a meaningless group on Wikipedia: Language. Who can anticipate that the starting point in trying to drill into “technical communication” would be language? I can think of a dozen other paths that would make just as much sense: Communication, Technology, Computers, Careers, User Experience, Instructional Writing, and so on.

The massive TOC

Authors sometimes resort to building a single massive TOC for all content. The problem with large-scale TOCs is that they force authors to nest content into so many hierarchical levels that the high-level groupings become abstract and meaningless to users. Both hierarchies here might provide equally logical paths to the same end.

(By the way, I previously wrote about Wikipedia’s category structure in From Help Authoring Tools to Web Tools, Especially Wikis.)

Does The Failure of Large-Scale TOCs Mean Classification Is Useless?

There’s probably not a more intuitive way to logically classify large-scale content into different groups. Yahoo’s Internet Directory proved the failure of navigation hierarchies long ago.

However, once you drill into the right folder, the small list of topics in a specific folder makes more sense and is meaningful to users. At the micro-level, a list of topics sort of works, and one or two hierarchies can establish meaning and context.

Mark concedes the utility of small navigation lists:

… a small scale TOC acts as a list that the reader can simply browse, but a large scale one becomes a classification scheme that users have to navigate. If you can’t take the TOC in at a glance, or at least read it through comfortably, then you have to trace down particular paths in the hierarchy of the TOC, and that means you have to figure out the classification scheme behind the TOC. And that does not work. (Readers Don’t Classify Their Experience)

In other words, reinterpreting Mark’s implicit point, a small-scale TOC that the user can simply browse, taking it in at a glance and reading through comfortably, works.

The problem is, if all your navigation paths are just two levels deep, you’ll end up with dozens of navigation options — each really shallow. Users aren’t accustomed to browsing 45 different navigation folders as they try to determine which folder contains the information they want. So classification of large-scale content fails even when you flatten it out.

Flattening out the TOC

If you try to avoid deeply nested hierarchies in your TOC, you end up with a lot of shallow books to sort through, which makes it hard to see the TOC at a glance and get meaning from it.

Before you throw classification out the window and force all users to search for content, remember that time and space in digital mediums do not constrain us with the same limitations as paper mediums. You can have both a large-scale and small-scale TOC on the same platform with the same content.

Facets to the Rescue

Search facets can help balance large-scale navigation with small-scale results. This is because facets only appear if the tag exists within the search results.

For example, you might have 15 different vocabularies in your taxonomy, each with their own terms. Maybe you have 150 different total terms on your site. It would be too overwhelming to list all 15 folders with their 10 terms as a navigation. It would be the massive TOC that fails to communicate meaning to the user because you have so many nested hierarchies trapped in layers of abstract, high-level classification.

But that’s not what happens with search facets. Facets hide the large-scale TOC from the user. When a user searches for “widgets,” the user doesn’t see all 15 navigation options, each with a full array of terms. Instead, the user sees only the relevant navigation options. A search for widgets might pull up 5 vocabulary sets, and only show 8 terms from each vocabulary.

In this way, faceted search does a brilliant job in balancing out search with navigation. It makes it possible for users to browse because it limits the TOC to a small scale: with 5 vocabularies showing 8 terms each, you get 40 navigation options instead of 150. And it also makes each of these browsing options extremely relevant. Every folder already shows possible results that have some keyword relevance.

Faceted search narrows the massive TOC to a small scale

Faceted search helps narrow down the massive TOC. You only see facets relevant to your search, so you get a zoomed-in and partial view of the TOC as it relates to your search query. In this way, you get the best of both worlds: a relevant and small-scale TOC for a massive amount of possible content.
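To make the "facets only appear if the tag exists within the search results" idea concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the topics, the tags, and the `search` and `facet_counts` functions are hypothetical, and a real setup would get its facet counts from the search engine rather than from a loop like this.

```python
from collections import Counter, defaultdict

# Hypothetical help topics. Each topic carries tags grouped by vocabulary.
TOPICS = [
    {
        "title": "Set up a leaderboard widget",
        "tags": {"Game components": ["leaderboards"], "User type": ["marketer"]},
    },
    {
        "title": "Embed widgets with the JavaScript SDK",
        "tags": {"Development tools": ["SDKs"], "User type": ["developer"]},
    },
    {
        "title": "Read the activity report",
        "tags": {"Reports": ["analytics"], "User type": ["marketer"]},
    },
]


def search(topics, query):
    """Naive keyword search: keep topics whose title contains the query."""
    needle = query.lower()
    return [topic for topic in topics if needle in topic["title"].lower()]


def facet_counts(results):
    """Count tags per vocabulary, looking only at the current results."""
    counts = defaultdict(Counter)
    for topic in results:
        for vocabulary, terms in topic["tags"].items():
            counts[vocabulary].update(terms)
    return {vocabulary: dict(counter) for vocabulary, counter in counts.items()}


results = search(TOPICS, "widget")
facets = facet_counts(results)

for vocabulary, terms in facets.items():
    print(vocabulary, terms)
```

Because the counts come from the result set instead of the whole taxonomy, the "Reports" vocabulary never shows up for this query: no matching topic carries a Reports tag.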

In short, faceted search makes browsing meaningful and practical. Browsing the entire set of information doesn’t work, but browsing the micro-set does. You start with search, and then you get a meaningful result set to browse. Users can select filters to execute more specific searches on the fly, or clear those filters to expand results on the fly. Here we have browse and search working together like a fully unified couple, each contributing to the success of the relationship.

Moving on to More Practical Matters

Now let’s move on to more practical matters. Which facets work for help?

In another post by Mark Baker, Sometimes Readers Do Classify Their Experience, Mark argues that faceted search only works when the facets are familiar to users as common ways to group the items.

If you look at car sites like (which use facets like mileage, year, make) or medical sites like WebMD symptoms checker (which use facets like body part, type of pain, duration, etc.) you’ll feel the facets are a natural fit because you’re accustomed to dealing with those topics through that facet language.

But since help doesn’t tend to have familiar facets, coming up with meaningful facets is more challenging.

Our tendency from ecommerce sites is to look for physical attributes like size, color, and shape. But maybe that sells facets short. I’ve been reading The Discipline of Organizing, edited by Robert Glushko (published May 2013). Robert writes,

Unlike those for physical resources, the most useful organizing properties for information resources are those based on their content and meaning, and these are not directly apparent when you look at a book or document. (p.15)

I’m only 50 pages into the book, so I have a lot more to read, but already the book has catalyzed some thoughts about facets. If the “most useful organizing properties for information” are “content and meaning,” not physical attributes like size, color, and shape, then perhaps some worthwhile facets can be leveraged for information-based content.

In my help information, I have the following groups of information that revolve around content and meaning:

  • Game components (such as points, leaderboards, behaviors)
  • Development tools (such as API platforms and SDKs)
  • 3rd party integrations (such as with WordPress, Ensighten, or Jive)
  • Reports (analytics about program activity)
  • Program Frameworks (competitive frameworks, social loyalty frameworks, and so on)
  • Timeline (introductory phase, pre-implementation)
  • User type (marketer, developer)
  • Content type (forum, blog, documentation)
  • Article format (PDF, video, quick start, API reference)

Facets that involve content and meaning are specific to the information and domain. Facets that are more general, and could apply to almost any domain, might be the following:

  • Date published
  • Popular content
  • Author (this is questionable)

It’s probably not useful to have too many facets. Five or six is more practical. However, as I mentioned previously, you could come up with 12 or more facets without overwhelming the reader, because the facets only appear if search results include those tags. It’s unlikely that the search results would include all facets every time, so your navigation options shrink to the micro-level for most searches.

I’m still experimenting with faceted search, so I can’t share more experiences here. It’s true we have a dearth of help examples that incorporate faceted search, but I’m sure that as faceted search tools become more common, tech writers will leverage them for the benefit of users.

In upcoming posts, I’ll explore faceted search in more depth. If you’ve implemented faceted search in your help, I’d love to hear about your experiences.


By Tom Johnson

I'm a technical writer working for the 41st Parameter in San Jose, California. I'm interested in topics related to technical writing, such as visual communication, API documentation, information architecture, web publishing, JavaScript, front-end design, content strategy, Jekyll, and more. Feel free to contact me with any questions.

  • Shane Taylor

    For some help systems, the most useful facet might be user role. If I can narrow down the content to exclude stuff I’m never going to do because that’s not part of my job, that’s very helpful.

    Facets are also a useful way to provide an alternative vocabulary for users. As tech writers, we try to balance using our users’ vocabulary with using the vocabulary dictated by the product we are documenting, which means sometimes the latter wins out in the TOC. However, faceted browsing can (and I think, should) use a more generalized vocabulary based on your users and not your product.

    • Tom Johnson

      Shane, you bring up an excellent point with vocabulary. One of the tasks on my to-do list is to figure out how to create synonyms for terms in Drupal taxonomies. It would also be great to include an instant search corrector when someone starts typing a query. For example, when a user starts typing “find trophies”, it would be great to have instant search show results like “find rewards”. I’m not sure how to do that technically, but it would be super helpful.

      It’s good to hear your feedback about the importance of user roles. We mainly have a technical and a non-technical audience, but a lot of content overlaps. Technical dev people need to understand the concepts too, and the non-technical people sometimes want more detail. So I like the idea of tagging topics by user role to allow people to filter with that facet if they want, but not restricting the entire output to a user role that is often fuzzy in the topics the role sees.
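      On the synonym idea, here is a rough sketch of what I have in mind, in Python. The `SYNONYMS` mapping and the `normalize_query` function are hypothetical; this is not how Drupal or Solr handle synonyms, just the bare logic of swapping a user's word for the preferred term before the search runs.

```python
# Hypothetical mapping from words users type to the product's preferred terms.
SYNONYMS = {
    "trophies": "rewards",
    "badges": "rewards",
}


def normalize_query(query, synonyms=SYNONYMS):
    """Replace each known synonym in the query with its preferred term."""
    words = query.split()
    return " ".join(synonyms.get(word.lower(), word) for word in words)


print(normalize_query("find trophies"))
```

      Words without an entry in the mapping pass through unchanged, so the mapping can start small and grow as search logs reveal what users actually type.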

  • Don Day

    You asked, “Why is there no TOC in Wikipedia?” Your answer is correct, from the point of view of Wikipedia as an über-resource. But once a seeker has defined the set of information that is meaningful to them, they start to see patterns in the data, and knowledge emerges, manifested as organization that is now appropriate to put into branches of relationships. In this regard, a “Table of Contents” is more of a “Tree of Understanding” for that user.

    And in fact, Wikipedia provides a tool to let you do just that: the built-in Help:Books application lets you recraft your assembled findings into an outline of relationships that you can reuse to your heart’s content. Because now it is YOUR web of understanding that perhaps no one else sees that way, but that’s fine: the main goal of knowledge management systems is to enable the progressive organization of information into knowledge and of knowledge into wisdom. Your Wikipedia outline is your badge of enlightenment; wear it however you like.

    • Tom Johnson

      How cool. I didn’t know Wikipedia had such a slick interface to create your own books. I would be interested to see what kinds of books people put together.

      I do think that classification is necessary and useful. We have to organize things to make sense of everything around us. Even driving down the street the other day in SF, I realized that cars parked on the sides of the roads are organized differently than cars driving on the roads. This organizational pattern cues us into different car states: moving or parked. In other words, location translates into meaning. Grouping like this is how we process and make sense of our world. If all cars were mixed together, it would be hard to discern parked versus moving cars.

      The same concept applies to classification. Things in one group have a different meaning from things in another group, and nested groups mean something specific to users as well. Grouping facilitates meaning.

      I’m not throwing out the whole idea of classification here. Just agreeing that classification on a macro-scale isn’t that useful. On a micro-scale it works, and that’s why I think faceted search is so key: it creates a micro-scale view of a classification on a large system.

  • Don Day

    You also said, “This kind of meaningless hierarchy is exactly what we technical writers do when we create a massive TOC.”

    Tom, I know you didn’t intend this statement to seem condescending to the knowledge that technical writers bring to the necessarily complex and often thankless task of documenting a product. Of course an overall taxonomy is not the same thing as a ToC. But product documentation necessarily encodes many kinds of information relationships as hierarchies: organization by features, by task progression, by value to assist evaluators and buyers, by patterns or concepts for training programmers or end users, and so on. The initial organization of formal documentation for delivery via multiple channels (of which the Web is but one) requires a manifest to drive the build process; content written specifically for the Web does not necessarily have that process (other than a site map, which does need to be hierarchical for SEO, if nothing else). That the manifest has a side effect of representing an orderly view of at least one set of relationships between items is not a bad thing; it is just not the only thing, as I suspect you were getting at.

    • Tom Johnson

      I’m not entirely sure I understand what you’re saying. I think we agree here. There are many hierarchies to organize product documentation — “organization by features, by task progression, by value to assist evaluators and buyers, by patterns for training programmers or end users.” Putting the content into a single fixed order based on the tech writer’s preferred pattern is somewhat myopic. A tech writer who chooses pattern A will disappoint the user who seeks pattern B. The user who seeks pattern C will also be disappointed by both patterns A and B.

      But rather than deliver separate A, B, and C web outputs, faceted search allows dynamic re-organizing via the same web output. The web doesn’t restrict you to one org pattern. It’s a channel that has many different delivery mechanisms in it for rendering infinite outputs. You don’t have to create a map with web output A, another map with web output B, another map with web output C. With faceted browsing, A, B, and C org patterns co-exist on the same channel to satisfy different audiences. The content is reorganized on the fly based on the filters users select. Or with static filters, based on the views users select.

    • Mark Baker

      Don, re “The initial organization of formal documentation for delivery via multiple channels (of which the Web is but one) requires a manifest to drive the build process”

      No, it doesn’t. It can be driven by metadata attached to individual objects. This is, in fact, the way most large data sets are assembled and managed. We have to get past the idea that we need a manifest, and rid ourselves of the horrendous content management overhead that goes with it. Until then, we are just doing variations on a theme by FrameMaker.

  • Mark Baker

    Tom, I believe that faceted navigation works best not simply when the reader can recognize the facets, but when the reader has already organized their thoughts into facets before they even get to the site. That would be the case with autocatch.com, for instance, where a typical car shopper will already have a checklist of features in mind before they even reach the site. The site succeeds by allowing the user to enter the facets they already have in mind.

    The list of facets you use for your content may be recognizable (though I bet you overestimate how recognizable they are — the curse of knowledge bites us all). But even if I, as a visitor, am capable of recognizing them, I still have to ask, which of these facets apply to my question. That’s work I have to do. Meanwhile, the search box is right there, inviting me to skip the work and just type in what is already in my mind. Nielsen’s research seems to indicate that that is the choice most people make.

    And there is another problem with faceted search — the user may not trust it. In the case of, the user inherently trusts that there are real cars at the end of their search, and that they have the characteristics described by the facets. No person with any experience of documentation would have an innate confidence that the information they need is going to be equally well classified. We have all had the experience of finding information placed in places that make no sense to us.

    I don’t use faceted information searches because I have no confidence that the underlying information has been correctly grouped and categorized. All my experience tells me it probably won’t be. Search, on the other hand, should turn it up if it is in there, and it is a lot less work than pogoing through the categories in search of some misplaced gem of information.

    There is not much you can do about this general prejudice, but if you actually want to make sure your content is correctly classified, and that each piece of content does the whole job it is supposed to do, you are going to have to implement a pretty strict structured writing system — something you have in the past referred to as a straitjacket for the writer. But there is no way to get that level of consistency of classification and completeness of content across a broad information set without those kinds of constraints. (My work helping people move to structured writing has shown that writers grossly overestimate how consistent and complete their work is until they see structure actually applied to it.)

    On the other hand, if you do impose that kind of structure, you will go a very long way towards solving both bottom up findability and top down findability. Autocatch, for instance, does not have someone inserting car listings into categories. The whole faceted search mechanism is driven off the metadata in individual car listing records. It is a top-down mechanism, but it is built entirely from the bottom up, and is entirely plug and play.

    In other words, fix the bottom up issues, and you get top down navigation thrown in for free.

    • Tom Johnson

      Mark, thanks for commenting. I think we agree more than it appears here. When you use faceted search (say on, you don’t start out by selecting facets first. You start out with a freeform search. (Advanced search, on the other hand, would require users to pre-select filters to determine the results.) So you begin at the same starting point: search.

      If the results contain your answer, great. You’re all set without selecting any facets. But if you look at the search results and don’t see what you want, you could reformulate your search query by typing a new keyword combination, or you could reformulate your search query by selecting facets to automate the query you already made in a more advanced way. All a facet does is insert some advanced queries into the search. Since users are poor at restrategizing their queries in advanced ways, the facets are like little search input helpers.

      For example, users looking for “rewards and API but not widgets” might be able to construct something like “rewards AND API” NOT widgets but nobody does this because some search engines accept operators and some don’t. You never really know. If the search ignores “not,” you end up with widgets in your results. Other search engines use + and -.

      However, if users can apply this advanced searching logic via the facets, then all the better in creating more intelligent searches. And much unlike advanced searches, users can adjust the query on the fly rather than retyping the query each time. In sum, the facets are like search query helpers.
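      Here is a small sketch of what I mean by facets acting as query helpers, in Python. The topic titles, tags, and the `matches` and `apply_facets` functions are all hypothetical; the point is only that ticking and unticking facets amounts to building the “rewards AND API NOT widgets” query without the user ever typing an operator.

```python
# Hypothetical topics mapped to the tags assigned to them.
TOPICS = {
    "Rewards API reference": {"rewards", "API"},
    "Rewards widget setup": {"rewards", "API", "widgets"},
    "Leaderboard widget setup": {"leaderboards", "widgets"},
}


def matches(topic_tags, include=(), exclude=()):
    """True if the topic has every included tag and none of the excluded ones."""
    tags = set(topic_tags)
    return set(include) <= tags and tags.isdisjoint(exclude)


def apply_facets(topics, include=(), exclude=()):
    """Return the titles that survive the selected facets, sorted by title."""
    return sorted(
        title for title, tags in topics.items() if matches(tags, include, exclude)
    )


# Equivalent to: rewards AND API NOT widgets
narrowed = apply_facets(TOPICS, include=["rewards", "API"], exclude=["widgets"])

# Clearing the "widgets" exclusion expands the results again.
expanded = apply_facets(TOPICS, include=["rewards", "API"])

print(narrowed)
print(expanded)
```

      Adjusting the query is just a matter of calling the filter again with a different selection, which is the "on the fly" part that advanced search forms lack.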

      Re Nielsen studies, I’m not sure his study included the use of facets, so I’m hesitant to interpret his conclusion as embracing non-faceted search over faceted search.

      Re structure, yes, I’m a big fan of structured taxonomies. I’ve seen the result of freeform tagging — it fails miserably. In fact, I don’t know why I freeform tag my blog posts — the long list of tags is pretty much useless. Structured categories are much more useful.

      But the whole idea of tagging is kind of interesting. In Drupal, you create vocabularies and terms within each vocabulary. You then select which content types can use those vocabularies. So via the exposed vocab lists, I restrict what tags are available. This helps keep the tags useful.

      Is this structured authoring? If so, I’m a total advocate.

      But this tagging approach leads to another question. Because it’s hard to get content into Drupal by authoring in the wysiwyg editor, I’m probably going to adopt a DITA format as an authoring process, and then use a script to pull the DITA content into Drupal. The DITA files can have the vocabulary encoded within them, but how do I enforce what terminology is allowed in the name/value pairs? I’m assuming that enforcing this vocabulary would require me to specialize DITA, which isn’t in my future. Instead, I think I’ll just refer authors to a list of the allowed terms. In this way, the wysiwyg editor enforces more semantic structure with terminology than a DITA model, at least I think. I really don’t know enough about metadata enforcement in DITA to speak more here. Either way, coming up with a structured set of tags is the subject of another blog post. It requires a lot of strategic thought.
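      One stopgap I can imagine, short of specializing DITA, is having the import script check each topic's name/value pairs against the list of allowed terms before anything reaches Drupal. The sketch below is in Python and is purely illustrative: the `ALLOWED` vocabularies, the sample metadata, and the `invalid_tags` function are assumptions, and it doesn't parse DITA or call any Drupal API.

```python
# Hypothetical controlled vocabularies: vocabulary name -> allowed terms.
ALLOWED = {
    "User type": {"marketer", "developer"},
    "Content type": {"forum", "blog", "documentation"},
}


def invalid_tags(metadata, allowed=ALLOWED):
    """Return (vocabulary, term) pairs that are not in the controlled list."""
    problems = []
    for vocabulary, terms in metadata.items():
        permitted = allowed.get(vocabulary)
        for term in terms:
            # An unknown vocabulary makes every one of its terms invalid.
            if permitted is None or term not in permitted:
                problems.append((vocabulary, term))
    return problems


# Name/value pairs as they might be pulled from one topic's metadata.
topic_metadata = {
    "User type": ["developer"],
    "Content type": ["tutorial"],
}

problems = invalid_tags(topic_metadata)
print(problems)
```

      A script like this could refuse to import a topic whenever the list of problems is non-empty, which gives authors the same restriction the exposed vocabulary lists give them in the Drupal editor.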

      • Mark Baker

        Tom, when I talk about structured writing in this context, I mean that the content that is written must conform to the promise made by the taxonomy.

        If you are going to do reliable faceted search of a doc set, that means that for whatever combination of facets the user selects, the content returned will be complete and relevant to that set of facets. Nothing relevant will be omitted, and nothing irrelevant will be included.

        To accomplish that, every piece of content must be written to fully satisfy the metadata attached to it in every facet. Your facets and their metadata are essentially laying out a multi-dimensional grid. You are then placing content at points on that grid, and you need to do so in such a way that the content fully merits its placement at that point on all of the axes for the grid.

        If you don’t do that, the results of the faceted search are going to be inconsistent and unreliable and people will very quickly give up on it.

        And you are certainly not going to achieve this by writing the content first and then figuring out where it fits in the matrix afterwards. Most content just isn’t going to fit the matrix properly unless it was designed to fit the matrix. (I’m pretty sure this is what Jonatan Lundin is on about when he talks about the metadata predicting user needs. The metadata defines a matrix, and the writer’s job is then to populate the matrix.)

        Achieving that is going to require an extraordinarily strict structured writing regime.

        My general objection to this approach is that it assumes that the interesting relationships between content can all be mapped on a grid, and this is very clearly not the case. Often, topics describe particular and unique relationships between real world objects in a way that is not going to fit into any matrix. In other words, there is no one-to-one correspondence between topic type and relationship type.

        A web, on the other hand, does not require that relationships follow a grid. It allows you to relate anything to anything else as long as a subject affinity exists in the particular case. It thus allows a far more realistic mapping of real world relationships. As I have noted here ( you can’t draw a top-down map of those relationships that makes any sense, which is why writers and information architects get spooked by it. But it is still a better mapping of real world relationships between subjects and topics.

        What’s more, it is also faceted navigation. Facets don’t have to fit into neat tables and rows to be legitimate facets. Every subject affinity is also the expression of a facet. We don’t have to flatten everything down to a set of fixed dimensions. We have, like never before, the means for content to follow the real contours of reality. We finally have the means to be structured and systematic without being square. We need to start using it. (Re this:

      • Don Day

        More than you wanted to know, probably, but you do not have to specialize content for those behaviors. Subject Schemes are a way to manage the structured classifications separately from your source content. There is this excellent slideshare deck, Introduction to DITA 1.2 Classification and Subject Schemes: Building a Knowledge Model for Your Content by Joe Gelb, and also Kris Eberlein spoke on the subject at DITA North America last month. That awareness is at least starting to get out where it is needed.

      • Mark Baker

        By the way, I would seriously recommend against trying to do this with DITA. Can open. Worms everywhere.

        I’d suggest a more fruitful line of research would be to look at defining custom content types in Drupal itself. Sara Wachter-Boettcher’s, Content Everywhere: Strategy and Structure for Future-Ready Content would be a good place to start investigating this approach.

        • Tom Johnson

          I actually already have numerous content types defined. Each content type has about 8 different metadata fields associated with the content type. I’ll check out the book you referenced, though.

          The problem with the wysiwyg editor is that it’s not a great authoring environment. Either you see the wysiwyg display and then have to clean up all the inline code it produces, or you turn it off and look at code in the raw, without syntax highlighting. In the raw, if you make invalid html, like adding a paragraph element between list items, the editor moves that paragraph element to the top of the document in order to make the list valid.

          If you author in markdown, great. Drupal has a markdown filter that will easily process the syntax and convert it to html. There’s a markdown extension for Mediawiki, which would probably simplify things a great deal. But it’s still copy-and-paste page by page. (But at least it wouldn’t involve reformatting.)

          I’m still in flux here trying to make a decision. In replying here, I’ve almost convinced myself toward the mediawiki > drupal solution. I guess I’m really wishy-washy about this point right now.

          • Mark Baker

            I’m with you 100% on WYSIWYG authoring. (You might be interested in my webinar The Business Case for Forms-based Authoring with oXygen.)

            I am also conscious that I am talking in generalities while you are attempting to solve an immediate problem with the tools you have available to you. Sometimes the ideal can be the enemy of the better.

            Markdown is a great irony for me, because I was around when the great XML vs SGML debate was raging. SGML was a system for creating markup languages for humans to use. Angle brackets were the default markup, but you could define almost any forms of markup you liked in SGML. You could certainly create an SGML DTD to describe markdown, and therefore process markdown as an SGML document.

            But the XML folk argued that markup was dead, and that XML would only ever be used behind a WYSIWYG interface, so all those SGML features were dropped. And here we are today, with markdown thriving and people turning against WYSIWYG interfaces in droves.

            Not that we are likely to revive SGML’s fortunes at this point, so here we are in the second age of markup with no generalized markup language. Hi Ho. Which brings us back to forms.

            By the way, while I am pointing out the limitations of top-down faceted navigation, I am by no means saying that we should abandon top down navigation, or even top-down faceted navigation. The user always starts from the top. (Often the top of the Web — Google — rather than the top of your site, but the top all the same.) So we do need top down navigation. What I am arguing is that it is bottom up navigation where finding is finished, and we need to spend far more time and effort on it than we do presently.

          • Tom Johnson

            Mark, thanks for pointing me to your upcoming webinar. It’s also interesting to hear about your SGML background. You’ve been in structured authoring a long time.

            Re markdown, last night I experimented with a Mediawiki extension that changes the syntax to markdown (Alternate Syntax Parser) in tandem with a markdown syntax filter for Drupal. Transferring content that way actually works quite well. And I like the open source nature of everything.

            But I’d have to install Mediawiki at my work, which would be odd given the prevalence of Confluence wiki as an internal architecture. Two wikis? At least there aren’t 2 wiki syntaxes to learn. (Confluence took a step in the wrong direction, in my opinion, when they did away with wiki syntax.)

            Then I had another thought. We have a lot of SMEs writing documents in Google Docs — more so than in Confluence. Why not just have people write markdown syntax in Google docs? I tried it a bit, and you know, it just might work.

            The collaborative features in Google Docs are really unparalleled. Coupled with the ability for controlling views and access, Google docs provides the best collaborative platform I’ve seen. And if SMEs are already using Google docs as their preferred tool, why not make it work?

            The idea of markdown is to be a readable syntax. The problem with markup languages is that when you start adding too many angle brackets into content, it starts to be cumbersome to read. You almost have to render it to really read it — unless you’re working in a syntax highlighter of some kind that color codes everything really well, or unless you’re sufficiently familiar with the angle brackets that they don’t even register when you’re reading. But this isn’t the case even for engineer SMEs.

            I may experiment with this extremely lightweight workflow. It’s like an Occam’s Razor solution. Our publishing needs and requirements are so small that it might actually work.

          • Mark Baker

            Tom, yes I have been doing this a while. :-/

            Back in the day, we used to build systems that used a relational database with metadata in regular database fields and a chunk of SGML in a BLOB field. We called it “microdocument architecture” and it was used to build some pretty cool stuff.

            The thing is, just about every Web CMS out there today is based on microdocument architecture. The only real difference between what we were doing then and what is commonplace today is that the markup in the BLOB field is HTML or markdown rather than a semantically rich SGML DTD.

            Make no mistake about it: a Web CMS based on a microdocument architecture is structured writing, even if it uses HTML or markdown in the BLOB. DITA may have a lot of currency in Tech Comm right now, but compared to the prevalence of microdocument architectures across the Web, it is a small sideshow.

            I too love Google Docs for ad hoc collaboration. Both GD and wikis are essentially ad hoc collaboration tools. (Structured collaboration is a whole different thing.) One of the ongoing questions is going to be how to reconcile the short-term flexibility of unstructured ad hoc collaboration tools with the long-term flexibility of structured content.

            In specific cases, at least, a microdocument architecture in which you have a structured content type defined in a database and then an unstructured BLOB in the middle may well provide a workable solution.
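
            As a rough illustration of that shape, here is a minimal sketch using Python's built-in sqlite3 module. The table name, the column names, and the find_topics helper are all hypothetical; the only point is that the facets live in ordinary database fields while the body is an opaque chunk of markdown.

```python
import sqlite3

# Hypothetical schema: metadata in regular columns, markup in a BLOB.
# Table and column names are illustrative assumptions, not from any real CMS.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE topic (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        product TEXT NOT NULL,
        task TEXT NOT NULL,
        body BLOB NOT NULL
    )
    """
)

topics = [
    ("Install the agent", "Agent", "install", b"# Install\n\n1. Download the package."),
    ("Configure the agent", "Agent", "configure", b"# Configure\n\nEdit the settings file."),
    ("Install the console", "Console", "install", b"# Install\n\n1. Run the installer."),
]
conn.executemany(
    "INSERT INTO topic (title, product, task, body) VALUES (?, ?, ?, ?)", topics
)


def find_topics(conn, **facets):
    """Return titles of topics whose metadata columns match every given facet."""
    allowed = {"product", "task"}  # only these columns may be used as facets
    unknown = set(facets) - allowed
    if unknown:
        raise ValueError(f"unknown facet(s): {sorted(unknown)}")
    # Column names come from the allow-list above; values are bound parameters.
    where = " AND ".join(f"{name} = ?" for name in facets) or "1 = 1"
    rows = conn.execute(
        f"SELECT title FROM topic WHERE {where} ORDER BY id",
        tuple(facets.values()),
    )
    return [title for (title,) in rows]


install_titles = find_topics(conn, task="install")
agent_install_titles = find_topics(conn, product="Agent", task="install")
```

            The queries never look inside the body, which is why the markup in the BLOB can be HTML, markdown, or anything else without affecting how topics are found.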

            Its main limitation, though, is that it does not provide an obvious means to automate linking, which is what you need to build a robust bottom-up organization of content.

        • John Tait


          I’ve been trying to use DITA and subject scheme maps (for facets) to design dynamic role-based manuals, but many of your posts and comments here and elsewhere suggest that I’m slightly off-track using DITA. I’m interested in implementing what this blog post is about, i.e. structured topics plus metadata delivering findable content.

          If not DITA, can you please tell me what I could use instead? Is Drupal the go-to solution now?

          • Tom Johnson

            I think one has to separate authoring processes from delivery mechanisms. You could probably select a variety of mechanisms to deliver faceted search in help. Drupal is just one of them. And as I just mentioned in a previous comment, you could simply use a DITA to Drupal connector from VR Communications to accomplish this via DITA. Maybe Eclipse help has something similar? Not sure, but there was a screenshot of it in a DITA book I was reading called DITA Best Practices: A Roadmap for Writing, Editing, and Architecting in DITA, Video Enhanced Edition (on Safari Books online).

          • Mark Baker

            John, DITA has reached a point now where its inherent suitability to task is not the only issue to consider. It now has a significant incumbency advantage and tool base that you can exploit to put together solutions even when the architecture was not designed for them. I suspect we will see a lot of people doing that with DITA now.

            Some people would answer your question based on incumbency advantage alone. I’m more inclined to answer it based on architecture alone. So you will have to balance those two points of view for yourself.

            DITA was designed to build old-style TOC-based help systems and books, not to build Every Page is Page One topics delivered through a faceted navigation. It is also designed to maximize ad-hoc reuse of content, which is why it is so dependent on a CMS and a manifest-driven build. It comes with a lot of overhead, which costs time and money that has to be recouped in other parts of the process, reuse and translation being the places that savings are usually found.

            If you want faceted navigation, you are presumably developing independent topics, not a consecutive or hierarchical manual. (What I call Every Page is Page One topics.) A microdocument architecture, as I described it to Tom, is a much more natural architecture for that, and Drupal provides you all the tools to create that.

            Drupal’s incumbency advantage is as big as DITA’s, and much bigger in the Web delivery space. What it won’t give you out of the box (AFAIK) is book-style PDFs or captive help systems. If you need both, a DITA/Drupal mashup might be workable, especially since someone has apparently (per Tom) already built a connector; that’s incumbency advantage on both sides. But if that is not a requirement, I can’t see what advantage DITA gives you.

  • Pingback: Enjoy Technical Writing » WordPress for Knowledgebase Development

  • Vinish Garg

    Tom, this is quite an insightful post, as were your last few posts on findability, and I feel really challenged about the way we plan our documentation projects. However, these holistic strategies and recommended practices work better for enterprise products or midsize companies.
    A small business where the client is seeking a contractor for a small online manual (say, for an LMS for an online certification program) may not always be able to afford it, either in house or through a contractor. It gets tricky for a contractor who is evolving, such as by participating in this blog or other community events, not to implement these learnings when the service buyer cannot afford the time and/or cost. I understand that we can present business cases to highlight benefits and ROI, but my experience suggests that this does not always go well with small, contractual tech comm assignments.

    Coming back to this post topic, your post suggests that these are interesting times for technical communicators. We have moved on from merely writers (planning a tri-pane help) to being UX engineers and information architects. This is an interesting mix.

    • Tom Johnson

      Vinish, I enjoyed your most recent post about WordPress. I esp. liked your instant search implementation. I think in many ways we are on the same page in looking for web-based solutions that work for small clients.

      I don’t work for a large company. I’m actually 1 of 2 tech writers in a startup, but we have several SMEs writing and contributing content as well. The Drupal solution I’ve been writing about lately is an open source solution. An expensive, robust CMS probably does faceted search in a more powerful way, but I haven’t worked in an environment that uses such a tool. I like to believe that the best solutions are also open source.

      In comparing WordPress with Drupal, I definitely think Drupal has more potential for tech comm. And I say this having been a WordPress consultant (as a side job) for years. Drupal has a concept of taxonomy that is lacking in WordPress. You can easily create views of content based on your taxonomy terms, and you can roll these taxonomy terms as facets into search. You can also create different content types and configure the fields available for each content type. WordPress just barely started offering content types, and it’s quite difficult to create a new content type in WordPress (for me anyway).

      • Vinish Garg

        Tom, thank you for your comments on my post. For Drupal vs WordPress, there is no comparison as far as scalability and customization options are concerned; Drupal is miles ahead.

        My comments on your findability approach, research and experience were primarily about implementation feasibility when we take on smaller contractual assignments. Sometimes we cannot afford such an extensive and comprehensive approach because of A, B or C reasons. Of course I do not deny the usefulness and purpose of this approach when a business can afford it. And in any case, this is a step in our striving for excellence, helping us evolve as information experience specialists. Thanks again for this excellent post.

  • Don Day

    Tom, this is good work on your open research and sharing of ideas. Any choice you make on a long-term content strategy, whether Drupal or DITA or anything along that spectrum, will have benefits that scale in proportion to investment. For DITA, I’ll point out that Michael Priestley and Amber Swope outlined an adoption path in the DITA Maturity Model whitepaper, well worth a read.

    FWIW, I’ve seen faceted search in IBM’s DITA-based internal KnowledgePlace project, and I saw an excellent demo of faceted search at DITA North America last month by the French company Antidot (pronounced with a long o); see the slideshare deck here. Both projects validate the current high end of what is possible in the DITA Maturity Model. Is your vision of a managed help model feasible and sustainable within any system’s particular architecture? Are the investment costs worthwhile for your corporate information strategy? Keep doing the necessary fact-driven analyses against your business requirements (as Sarah O’Keefe diligently reminds us, often and rightly), and keep us informed.

  • Jonatan Lundin

    Tom, you ask “What facets do you use with help, since help doesn’t have clear physical attributes to call out?”. As you know, my work with SeSAM is exactly about answering the “what facets do you use…”. First of all, let us conclude on what it is we classify using facets. It is answers to user questions. Thus, our help system is a pool of (predicted) standalone, self contained answers and the relations between them, like a giant FAQ list.

    The question is then “what type of questions do users ask?”. A user is asking a question in a search situation. The user has an information-seeking goal in a search situation. So what we are really classifying is information-seeking goals, not *some* already written type of information. If we know the information-seeking goals and their classification, we can write the answer and, voilà, classify that answer to the facet values that the corresponding information-seeking goal is classified to.

    This is the crash course on how to predict user questions, thus the information-seeking goals. Consider the product you are documenting. Imagine all user questions that all users will ask in future. It could be 50, 100 or 10 000. You don’t know. Just imagine a great number.

    Now we can start to categorize these imagined questions. First we can categorize them according to the situations in which they arise where the user is using the product to solve a specific need. A “need” is the reason why the user bought the product (=the product purpose). If your product can solve three different user needs (called the primary goal in SeSAM – this example is a little simplified) all the imagined questions end up in one of the three groups. This grouping is your first search situation taxonomy. In a search situation, the user is in one (1) of these three groups, having the corresponding need.

    Now you can categorize the questions according to another set of groups: the secondary goal tasks. The user must do various tasks – to reach a secondary goal, such as install, configure etc – to make the product solve a need. The second taxonomy reveals the type of tasks the user can/must do. To know which tasks the user can/must do, we need to know the needs the product can solve and how it is solving them – so there is a relation to the first need taxonomy.

    Now, let’s focus on all imagined questions classified to one (1) type of task and one (1) need category (called a task situation in SeSAM). Let’s say there were x questions in this task situation. The next step is to further categorize the questions according to which step in the task each question relates to. Let’s say that the task included 10 steps; then each step would have n related questions. The steps are an extension to the secondary goal task taxonomy.

    The questions for each step are then categorized according to what type of question they are (the third goal taxonomy), which we talked about in the previous blog post. A third goal in a step can for example be to:

    * Know how to do the step
    * Understand what a product response means (error message etc)
    * Find an icon, button, menu in user interface
    * Find the reason and solution to a perceived problem
    * Understand what data to select/supply in a dialog/field
    * Understand and learn about the product since it seems that it cannot do what I want it to
    * etc etc

    Now, we can categorize all questions according to other types of search situation taxonomies, such as which product interface the task is carried out in (if the product has several types of interfaces), the operating and usage environment where the product can be used (desert/north pole etc – since the answer and task steps are different depending on environment), the software/hardware modules that are invoked when the product is solving the need etc. Note that “user role” is not seen as a search situation facet taxonomy.

    At last, to predict how many questions will be asked we need to first populate the need taxonomy, then the task and step taxonomy and then the third goal taxonomy (type of questions) etc. As we talked about in the previous post, the facet values are specific to a given product and its design, thus there is no such thing as a universal set of taxonomies. So you get a matrix for a given product which helps you to predict user questions, to then know which answers to write (as Mark points out). Each answer is then classified to a facet value in each search situation facet taxonomy.
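
    To make the matrix idea concrete, here is a small sketch in Python. The facet values are invented for an imaginary backup product, and the function name is hypothetical. It also simplifies by assuming that every task applies to every need, whereas the description above says the tasks depend on the needs.

```python
from itertools import product

# Invented facet values for an imaginary product (assumptions for illustration).
needs = ["back up files", "restore files"]
tasks = {  # each task maps to its ordered steps
    "install": ["download", "run installer"],
    "configure": ["open settings"],
}
question_types = ["how to do the step", "understand a product response"]


def predict_questions(needs, tasks, question_types):
    """Enumerate one matrix cell per need, task, step and question type."""
    cells = []
    for need, (task, steps) in product(needs, tasks.items()):
        for step, question_type in product(steps, question_types):
            cells.append(
                {
                    "need": need,
                    "task": task,
                    "step": step,
                    "question_type": question_type,
                }
            )
    return cells


matrix = predict_questions(needs, tasks, question_types)
```

    Each cell is a predicted question waiting for an answer, and the answer inherits the cell's facet values. In practice many cells would be dropped, for example where the user is assumed to know the answer already.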

    But, there are several issues we must consider when predicting user questions, for example what level of knowledge we assume the user to have. We may skip writing the answer for many predicted questions since we assumed the user to know the answer or since the user must be certified to know the answer to be allowed to use the product. Furthermore, due to a “wrong” mental model, users will ask questions that cannot be predicted, thus the search user interface must help the user to “correct” the mental model, by something I call “learning-by-searching”. Even users having a “correct” mental model will ask questions we cannot predict, so support people need to react to them. The “reacted” answers can then be uploaded to the portal to keep the content “alive” post-release.

    Finally, a faceted navigation interface built on the search situation facet taxonomies should first present the type of user question (third goal) taxonomy. Once the user has selected the type of information-seeking goal, the next step is to select the task, need etc. This approach is bottom-up. The user may very well find the answer upon the first facet selection. Thus, we can build a guided faceted navigation portal that mimics user-expert conversation (which my company is developing using DITA subject scheme maps). But as we have concluded elsewhere, most users would first do a keyword search and then filter the list of answers according to the search situation facet taxonomies, which is called faceted search. Well, sorry for a long comment again but there is a lot to say so wait for my next blog post…
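
    The keyword-then-filter flow can be sketched in a few lines of Python. The sample answers, facet names and function names below are all made up for illustration; a real portal would sit on a proper search engine rather than substring matching.

```python
from collections import Counter

# Made-up answers, each classified to one value per facet taxonomy.
answers = [
    {
        "text": "To install the agent, run the installer as administrator.",
        "facets": {"question_type": "how-to", "task": "install", "need": "backup"},
    },
    {
        "text": "Error 42 during install means the disk is full.",
        "facets": {"question_type": "product response", "task": "install", "need": "backup"},
    },
    {
        "text": "To restore a file, open the console and pick a snapshot.",
        "facets": {"question_type": "how-to", "task": "restore", "need": "restore"},
    },
]


def keyword_search(answers, query):
    """Keep answers whose text contains every query term (naive substring match)."""
    terms = query.lower().split()
    return [a for a in answers if all(t in a["text"].lower() for t in terms)]


def facet_counts(results, facet):
    """Count how many results carry each value of one facet, for display as filters."""
    return Counter(a["facets"][facet] for a in results)


def filter_by_facets(results, selected):
    """Keep results that match every selected facet value."""
    return [
        a for a in results
        if all(a["facets"].get(name) == value for name, value in selected.items())
    ]


hits = keyword_search(answers, "install")
counts = facet_counts(hits, "question_type")
narrowed = filter_by_facets(hits, {"question_type": "how-to"})
```

    Deselecting a filter is just calling filter_by_facets again with fewer entries, so the user can widen the results without starting a new search.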
