Taxonomy, Metadata, and Search: Notes from Seth Earley’s Confab Workshop

On the first day of Confab, I attended an all-day workshop on Taxonomy, Metadata, and Search by Seth Earley. The workshop packed in a lot of information, much of it streaming out like a firehose across 200+ slides. In this post, I attempt to make sense of some of these concepts.

Seth outlined a three-prong approach to information management:

  1. Develop a taxonomy.
  2. Apply the taxonomy to your content.
  3. Leverage the taxonomy to view your content in different ways.

Taxonomy

Taxonomy is one of those vague words whose meaning seems a bit slippery, but Seth was adamant that taxonomy is not navigation, though it affects navigation. By taxonomy, we’re referring to “a system for organizing concepts and categorizing content.” A taxonomy is your metadata, “arranged in a tree-like structure, with top level categories that branch out to reveal subcategories and terms in varying levels of depth.” The taxonomy “expresses hierarchical relationships (parent/child).”

In other words, a taxonomy is how you organize and classify all of your information objects. It represents how one term relates to another in a hierarchy, through parent, child, grandchild, and cousin relationships.
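
To make the tree idea concrete, here is a minimal Python sketch of my own (it is not from Seth’s slides). It assumes a taxonomy can be modeled as terms that each hold a list of narrower child terms. The Term class, the paths function, and the sample terms are all hypothetical names chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Term:
    """One node in the taxonomy: a term plus its narrower (child) terms."""
    name: str
    children: list[Term] = field(default_factory=list)

    def add(self, name: str) -> Term:
        """Create a child term under this term and return it."""
        child = Term(name)
        self.children.append(child)
        return child


def paths(term: Term, prefix: tuple[str, ...] = ()) -> list[tuple[str, ...]]:
    """Return the path from the root to every term, parents before children."""
    here = prefix + (term.name,)
    result = [here]
    for child in term.children:
        result.extend(paths(child, here))
    return result


# Hypothetical sample terms, made up for this sketch.
root = Term("Products")
tools = root.add("Tools")
tools.add("Hand tools")
tools.add("Power tools")
root.add("Plumbing")

for path in paths(root):
    print(" > ".join(path))
```

Each printed line is one term with its full ancestry, which is roughly what I mean by a parent/child/grandchild relationship.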

Metadata

Coming up with metadata to describe your subject is a key first step. The granularity of the metadata depends on your needs. For example, a gas company might have a lot of sub-categories for the word gas, whereas a non-gas company might not.

Some standard approaches to metadata are available in frameworks such as the Dublin Core. The Dublin Core identifies common metadata attributes, such as title, description, date created, and so forth. You can leverage standards like this to a certain extent, but no doubt you will need to move beyond them as you develop a metadata model that fits your needs.
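
As a rough illustration of mixing standard and custom metadata, the sketch below separates a record’s fields into those that belong to the 15-element simple Dublin Core set and those that are your own extensions. The split_metadata function and the sample record (including the gas_type and product_line fields) are hypothetical; note that the simple element set uses a general “date” element rather than a specific “date created.”

```python
# The 15 elements of the simple Dublin Core element set.
DUBLIN_CORE_ELEMENTS = frozenset({
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
})


def split_metadata(record: dict) -> tuple[dict, dict]:
    """Split a record into (standard Dublin Core fields, custom fields)."""
    standard = {k: v for k, v in record.items() if k in DUBLIN_CORE_ELEMENTS}
    custom = {k: v for k, v in record.items() if k not in DUBLIN_CORE_ELEMENTS}
    return standard, custom


# Hypothetical record for a gas company's help topic.
record = {
    "title": "Installing a gas water heater",
    "description": "Step-by-step installation instructions.",
    "date": "2012-01-01",
    "gas_type": "natural gas",      # custom field, not part of Dublin Core
    "product_line": "residential",  # custom field, not part of Dublin Core
}

standard, custom = split_metadata(record)
print("Standard:", sorted(standard))
print("Custom:", sorted(custom))
```

The custom fields are where the granularity specific to your business shows up.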

Information architecture and semantic architecture

In establishing metadata, keep in mind a difference between “information architecture” and “semantic architecture.” With information architecture, “a single concept can have different expressions.” With semantic architecture, “a single expression can have different concepts.” For example, Statement of Work and SOW are different expressions of the same concept. But the word “set” can have different concepts, such as to place, a group of things, a tennis match, and so on.
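
The two directions can be sketched with a simple lookup in Python. This is my own illustration, assuming you keep a list of (expression, concept) pairs; the index_terms function and the concept labels are hypothetical.

```python
from collections import defaultdict

# (expression, concept) pairs, using the examples from the workshop.
PAIRS = [
    ("Statement of Work", "statement of work"),
    ("SOW", "statement of work"),
    ("set", "to place"),
    ("set", "a group of things"),
    ("set", "a tennis match"),
]


def index_terms(pairs):
    """Return (synonyms, ambiguous).

    synonyms:  concept -> expressions, for concepts with several expressions
    ambiguous: expression -> concepts, for expressions with several concepts
    """
    expressions_by_concept = defaultdict(list)
    concepts_by_expression = defaultdict(list)
    for expression, concept in pairs:
        expressions_by_concept[concept].append(expression)
        concepts_by_expression[expression].append(concept)
    synonyms = {c: e for c, e in expressions_by_concept.items() if len(e) > 1}
    ambiguous = {e: c for e, c in concepts_by_expression.items() if len(c) > 1}
    return synonyms, ambiguous


synonyms, ambiguous = index_terms(PAIRS)
print("One concept, many expressions:", synonyms)
print("One expression, many concepts:", ambiguous)
```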

Language is slippery, and that slipperiness is a big reason why people struggle to find things. When people enter keywords in search fields, their searches are usually brief and general; they do not express the level of detail and accuracy needed to pinpoint what the person is really looking for.

While metadata (information about information) is the list of keywords that describe your content, the taxonomy is the hierarchical organization of that metadata. You can critique a taxonomy by evaluating whether the parent-child relationships express a clear logic. Do the children fit logically under the parent? Are there redundancies or polyhierarchies (the same children under different parents)? Are terms parallel when they should be parallel, or subordinate, or orthogonal?
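
One of these checks, spotting polyhierarchies, is easy to automate. The sketch below is a minimal example of my own, assuming the taxonomy is stored as (parent, child) pairs; the find_polyhierarchies function and the sample terms are hypothetical.

```python
from collections import defaultdict

# Hypothetical (parent, child) relationships.
EDGES = [
    ("Flooring", "Carpet"),
    ("Home decor", "Carpet"),   # Carpet appears under a second parent
    ("Tools", "Hand tools"),
    ("Plumbing", "Pipe fittings"),
]


def find_polyhierarchies(edges):
    """Return each child that sits under more than one parent."""
    parents = defaultdict(set)
    for parent, child in edges:
        parents[child].add(parent)
    return {child: sorted(ps) for child, ps in parents.items() if len(ps) > 1}


print(find_polyhierarchies(EDGES))
```

Whether a polyhierarchy is a flaw or a deliberate choice is a judgment call; the code only surfaces the candidates to review.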

My favorite exercise was organizing a list of Home Depot products (such as carpet, hand tools, and plumbing) into different groups. We had about 50 seemingly random Home Depot products. One person’s grouping certainly varied from another’s. There was no clear-cut method for grouping, and despite everyone’s best attempts, each arrangement was unique.

Although there’s no clear way to group a large body of items, you can leverage the metadata to find related items. You can establish equivalent terms in your hierarchy, and then configure your search results to display equivalent items at that same level. This approach shifts more of the effort into the taxonomy. A well-structured taxonomy will lead to a better search, because you can better approximate what the user is looking for. Based on your taxonomy, you can configure associated results.
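
Here is one way the idea of associated results could look in Python. It is a sketch under my own assumptions: “equivalent” terms are treated as siblings under the same parent, and each item carries a single taxonomy term. The PARENT and ITEMS data and the siblings and search functions are hypothetical.

```python
# Hypothetical taxonomy: term -> parent term.
PARENT = {
    "Hammers": "Hand tools",
    "Screwdrivers": "Hand tools",
    "Wrenches": "Hand tools",
    "Carpet": "Flooring",
}

# Hypothetical content: item title -> the taxonomy term applied to it.
ITEMS = {
    "Claw hammer guide": "Hammers",
    "Phillips screwdriver set": "Screwdrivers",
    "Berber carpet": "Carpet",
}


def siblings(term):
    """Return the other terms that share this term's parent."""
    parent = PARENT.get(term)
    if parent is None:
        return []
    return sorted(t for t, p in PARENT.items() if p == parent and t != term)


def search(term):
    """Return (direct matches, associated matches from sibling terms)."""
    direct = sorted(title for title, tag in ITEMS.items() if tag == term)
    related = set(siblings(term))
    associated = sorted(title for title, tag in ITEMS.items() if tag in related)
    return direct, associated


direct, associated = search("Hammers")
print("Results:", direct)
print("You might also want:", associated)
```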

Metadata is really the foundation for doing anything. Without this metadata applied to your information, you can’t leverage it in different ways. Seth stressed that information workers should be specialists with metadata and taxonomy. If information is truly a corporate asset, leveraging it in different ways through metadata should be a key strategy. It leads to a major competitive advantage.

Pace layering

In developing a taxonomy, keep in mind a concept called “pace layering.” Pace layering is the idea that change takes place at different rates for different groups within a company. A business person may change at a much faster pace than other groups, such as IT. The “clock speed” of the infrastructure team may be slower than the clock speed of the social media team. And because these clock speeds move at different rates, one group gets frustrated at another for being so slow.

Applied to taxonomy, this means you may have to continually evolve. Your taxonomy needs to be flexible and adaptable to changing business environments. Some companies revisit their taxonomy several times a year, and then push out updates.

Information metabolism

Seth used the term “information metabolism” to refer to the rate at which information flows through an organization. It accompanies the idea of pace layering. If your organization has a high rate of information metabolism, the taxonomy you develop may quickly become stale and need to adapt in a more flexible, dynamic way.

Folksonomy

A folksonomy is a tagging system that can allow you to keep up with a dynamic, fast-paced environment where terms constantly change. A folksonomy is a metadata system that allows users to add their own tags to content, rather than pulling from a set of structured tags established in a fixed taxonomy. One problem with folksonomies is that you end up with a variety of terms all expressing the same concept but spelled differently.

Seth highlighted SharePoint 2010 as a particularly good tool for capturing and managing metadata that is added in a folksonomy-based way. With SharePoint 2010, users can add terms of their own, but as they add them, they’re prompted to choose similar terms already in the database. This provides the best of both worlds – allowing users to expand the existing metadata with new tags while also encouraging users to select from tags already established.
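
I don’t know how SharePoint implements this prompting, so the following is only a sketch of the general idea using Python’s standard difflib module: before accepting a new tag, look for close matches among the existing tags. The suggest_existing function, the 0.8 similarity cutoff, and the sample vocabulary are my own assumptions.

```python
import difflib

# Hypothetical set of tags already in use.
VOCABULARY = {"taxonomy", "metadata", "search", "folksonomy"}


def suggest_existing(new_tag, vocabulary, cutoff=0.8):
    """Return existing tags that closely resemble the tag a user typed.

    An empty list means nothing similar exists, so the tag is genuinely new.
    """
    normalized = new_tag.strip().lower()
    if normalized in vocabulary:
        return [normalized]
    # get_close_matches ranks candidates by similarity ratio (0.0 to 1.0).
    return difflib.get_close_matches(
        normalized, sorted(vocabulary), n=3, cutoff=cutoff
    )


for typed in ["Taxonomie", "pace layering"]:
    matches = suggest_existing(typed, VOCABULARY)
    if matches:
        print(f"'{typed}': did you mean {matches}?")
    else:
        print(f"'{typed}': no similar tag found, adding as new")
```

A stricter cutoff produces fewer prompts but lets more spelling variants slip through, so the threshold is something you would tune.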

Emergence

Despite the randomness in folksonomies, sometimes a trending organizing principle emerges. The concept of “emergence” is the idea of order from chaos. Without any kind of coordination among parties, the system suddenly develops into an order of its own, based on a collective trending of independently acting minds.

Centralizing metadata

You can organize all of your metadata in various tools. (Seth mentioned several, though I can’t remember them here.) One problem is that your organization may have many different systems (a corporate intranet, a public website, a product database, and such), all using terminology stored in separate tools. Ideally you want to pull directly from your metadata tool rather than having separate siloed instances of the metadata.

Seth compared the problem to a set of remote controls on a coffee table. Usually people have multiple remote controls – one to change channels, another to change volume, another to work the VCR, etc. The dream is to have one universal remote that controls them all, and while many remotes claim to have universal capability, they really don’t. The universal remote would be the single tool to manage taxonomies across your organization.

Ontology

Seth said the word ontology usually intimidates people and seems to be very philosophical, but in reality, an ontology is really just a collection of taxonomies. To make an astronomy comparison of my own, a solar system is to a galaxy as a taxonomy is to an ontology. A galaxy comprises multiple solar systems, just as an ontology comprises multiple taxonomies.

Content models

Once you’ve established your metadata, you then begin to develop content models. A content model is just “metadata smushed together.” For example, one content model may simply be a title and description. The content model might express various rules about how the metadata gets used. An established content model leads you to create various content types.

All of these content types come together in a content management system (CMS) that can dynamically render outputs based on various content. The CMS pulls information based on the metadata, following the rules or pattern of the content model type.

Dynamic display

Seth showed an example of a website with various mobile phones, and noted how a CMS can dynamically render the page simply by displaying data that meets specific metadata attributes. The content type establishes where and how the data should be displayed, but the whole system is a dynamic rendering. You don’t have someone hand-coding the HTML behind the scenes to define the order and display. Instead, the CMS does it through the metadata. Seth referred to this dynamic organization and display as “content choreography.”
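
To show what rendering through metadata might look like, here is a small Python sketch of my own. It assumes a content model is simply a list of required metadata fields and that the “CMS” selects items by attribute and renders them with one shared template. The PHONE_MODEL fields, the sample phones, and the conforms, select, and render functions are all hypothetical.

```python
# Hypothetical content model: the metadata fields every phone item must carry.
PHONE_MODEL = ("title", "description", "brand", "screen_inches")

# Hypothetical content items.
PHONES = [
    {"title": "Alpha One", "description": "Entry-level phone.",
     "brand": "Acme", "screen_inches": 3.5},
    {"title": "Alpha Max", "description": "Large-screen phone.",
     "brand": "Acme", "screen_inches": 4.8},
    {"title": "Zed Mini", "description": "Compact phone.",
     "brand": "Zed", "screen_inches": 3.2},
    {"title": "Untitled draft", "brand": "Acme"},  # incomplete: fails the model
]


def conforms(item, model):
    """True if the item carries every field the content model requires."""
    return all(name in item for name in model)


def select(items, model, **criteria):
    """Return the items that fit the model and match every metadata criterion."""
    return [
        item for item in items
        if conforms(item, model)
        and all(item.get(key) == value for key, value in criteria.items())
    ]


def render(items):
    """Render items with one shared template, ordered by title."""
    ordered = sorted(items, key=lambda item: item["title"])
    return "\n".join(f"{i['title']}: {i['description']}" for i in ordered)


print(render(select(PHONES, PHONE_MODEL, brand="Acme")))
```

Nobody hand-codes the list here; changing the criteria or the metadata changes the page.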

If you can gather information about your users, you can program the CMS to render the information in different displays based on the user profile. This is one way that you can dynamically change the information to match the user’s needs, thereby increasing the findability of the content. Your content is no longer static but rather dynamic and changing.

Pre-classified and post-classified

Seth also introduced some more terms to reflect this dynamic content. A pre-classified system sets up fixed portals that the user can enter to see information displayed in different ways. A post-classified system creates the portals on the fly in a list of facets that the user can navigate into once he or she enters a search term.
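
A post-classified system can be sketched as facets computed from whatever the search returns. The Python below is a minimal illustration under my own assumptions (a naive title search and two facet fields); the CATALOG data and the search, build_facets, and refine functions are hypothetical.

```python
from collections import Counter

# Hypothetical catalog with metadata applied to each item.
CATALOG = [
    {"title": "Cordless drill", "category": "Power tools", "brand": "Acme"},
    {"title": "Hammer drill", "category": "Power tools", "brand": "Zed"},
    {"title": "Drill bit set", "category": "Accessories", "brand": "Acme"},
    {"title": "Claw hammer", "category": "Hand tools", "brand": "Zed"},
]


def search(items, query):
    """Naive search: keep items whose title contains the query text."""
    needle = query.lower()
    return [item for item in items if needle in item["title"].lower()]


def build_facets(results, fields):
    """Count the values of each facet field across the current results."""
    return {f: Counter(r[f] for r in results if f in r) for f in fields}


def refine(results, field, value):
    """Narrow the results to one facet value, as if the user clicked it."""
    return [r for r in results if r.get(field) == value]


results = search(CATALOG, "drill")
for field, counts in build_facets(results, ["category", "brand"]).items():
    print(field, dict(counts))
print([r["title"] for r in refine(results, "brand", "Acme")])
```

The facets are not fixed in advance; they are derived from the result set each time, which is what distinguishes this from a set of pre-built portals.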

The whole experience of metadata mixed with a CMS can lead to a much more flexible organization of information, where there is no single fixed organization but rather many different organizations, each based on needs users may have for different information.

My thoughts

I’m glad I attended this eight-hour workshop. For an alternative writeup of Seth’s workshop, see this summary by Clive Gibbons. I also attended a session by Leigh White at the STC Summit called Taxonomy: Do I need one? Leigh’s session was excellent, so if you have access to Summit @ a Click, I recommend listening to it. (In some ways, I got more out of Leigh’s session than Seth’s, but maybe that’s due to my short attention span.)

I will be honest here and admit that although I think a taxonomy would be awesome, I’ve never created one, and it seems I hardly have the time to do it. At one time I was convinced I needed to start doing taxonomy, so I began reading Heather Hedden’s The Accidental Taxonomist. About 80 pages in, I realized taxonomy had more of a library science angle than I wanted to explore.

Although I’ve never created an official taxonomy, I suppose I wrangle out terms and their relationships enough to make sense of them in consistent ways, even if I don’t make it all explicit in an official parent-child structure. Nevertheless, it would probably be a good practice to create an official taxonomy, clearly deciding on each term and its definition and relationship to other terms. This should no doubt help as I write technical content. I’ve just never seen any technical writer create a taxonomy in his or her approach to writing help for a software application.

TOCs and Taxonomies

I said that I’ve never created a taxonomy, but really when we organize the table of contents navigation in a help file, isn’t this an exercise in taxonomy? When we decide what the top-level folders are, and each subfolder, and sub-sub-folder, and what tasks belong grouped together on the same topic, and what the related topics are, isn’t this an expression of the hierarchy that would exist in a separate taxonomy? Probably to some extent, yes. Even though Seth emphasized that taxonomy is not navigation, structuring your navigation is an exercise that makes you think about taxonomy.

Although many users may not use the table of contents to find information, your table of contents communicates valuable information to users. It communicates a taxonomy, which helps users understand the hierarchy and relationship of one concept and term to another. This structural overview of your content can be important. In Designing Web Navigation, James Kalbach points out:

People prefer information that involves a sequence. They like to browse. Navigation provides a narrative for people to follow on the Web. It tells a story — the story of your site. In this respect, there is something both familiar and comforting about web navigation. The widespread, seemingly natural use of navigation to access content on the Web reflects its strength as a narrative device.

In other words, your table of contents, which expresses the hierarchy, order, and relationships within your information, helps the reader understand at a glance the whole of the information. Even if the user doesn’t navigate his or her way through this sometimes maze-like TOC structure, not having the table of contents at all makes users uneasy. If you replace that table of contents with another sort of organization, something that doesn’t express the semantic relationships of the information components, your users may feel lost.

That said, although organizing a TOC and building out a taxonomy seem similar, wouldn’t it be much easier to build out the TOC if you already had a taxonomy to refer to? The taxonomy should provide the heavy critical thought already, making the TOC somewhat of a derivative exercise.

I’d be interested to hear your approach to creating and using a taxonomy when you write help material.


This entry was posted in findability.

By Tom Johnson

I'm a technical writer working for The 41st Parameter in San Jose, California. I'm primarily interested in topics related to technical writing, such as visual communication (video tutorials, illustrations), findability (organization, information architecture), API documentation (code examples, programming), and web publishing (web platforms, interactivity) -- pretty much everything related to technical writing. If you're trying to keep up to date about the field of technical communication, subscribe to my blog either by RSS or by email. To learn more about me, see my About page. You can also contact me if you have questions.

8 thoughts on “Taxonomy, Metadata, and Search: Notes from Seth Earley’s Confab Workshop”

  1. Vinish Garg

    Tom

    Thanks for another insightful post! I recall that many weeks back, I stumbled upon an excellent post on taxonomy scheme, by Dick McCarrick, at: http://www.ibm.com/developerworks/lotus/library/ls-Kmap_tax/.

    I think that the TOC is an outline of all logical and functional sections and subsections (topics, subtopics) and each section has some instructions, notes, process flow diagrams, and images. So, the taxonomy is a subset of TOC.

    I will not hesitate to say the taxonomy guides the technical writers to develop and complete the manual whereas the TOC is for the users’ reference.

  2. Jonatan Lundin

    Taxonomies are definitely an important tool for technical communicators. Taxonomies play an important role when designing end user assistance. My approach is the SeSAM approach to information architecture.

    I argue that you should develop the taxonomies before starting to write something. The taxonomies are used to identify what to write. And, you should not develop only one (1) taxonomy, but several. How come and why, you might ask? And what type of taxonomies do you need to develop?

    First of all; we have concluded for a long time that users are searching for answers when using technical products. More precisely, users ask questions like “How do I install X?”, “Can my product do Y?” etc. So technical communicators are answering questions (the challenge is: how do we know the questions, but that is another story since you need to predict them). Let’s assume that you know the questions users ask. Each question can be classified from several facets such as what functionality (possibility – like the alarm on your phone) the user is trying to get working, the user goal (configure, use, customize etc), the product interface the user is using (the product may have several user interfaces) etc.

    Each facet makes up a taxonomy of its own. The taxonomies can be used to build a matrix, from which the needed answers to create can be identified (=help topics). Thus an answer is classified to one value in each taxonomy. The taxonomies (I call them search situation facet taxonomies) are also used to build search guides, like faceted search environments. The search guide is the single point of entry for all types of users.

  3. blaine

    Very thought-provoking post Tom. Reading it, I realized that my first approach to writing when tackling a new product/knowledge area is to work out a taxonomy, though I never called it that–rather I referred to it as a glossary.
    Creating a functional glossary seems crucial to me–not so much for the end user (though that will go a long way too), as for me the writer. How can I describe a system to another person if I don’t know what to call the things I want to identify for the user?
    This leads to another realization I’ve had over the years–the biggest challenge to communication around a product can be the different folksonomies/taxonomies that grow up around a product. Engineers typically create their own folksonomy when creating a product. Especially in start ups, this can be done rather on the fly. By the time marketing comes along, these structures and name sets are so ingrained for developers that there is precious little that can be done to change the term in the mind of the developer, though it may be crucial for marketing to make a change for the sake of the customer. As a technical writer, one has to be aware of and harmonize (at least in one’s own mind) these competing systems in order to keep communication flowing and the correct information available to help the end user.

  4. Marko Hurst

    Hi Tom,

    Nice recap / post. My comment is to your TOC & Taxonomy section.

    As you describe it above you’ve actually described an information architecture or a navigation scheme, in which case I agree with the premise of what you’ve said. But you’ve made the (common) assumption that navigation / browsing is the equivalent to or the same as taxonomy. They aren’t. Navigation is not equal to taxonomy.

    While they are both types of classification systems, they actually serve very different purposes, and the rules for building them make them closer to being opposites than anything. The oversimplified, but accurate, version of the differences is:
    – NAVIGATION: is broad high level similarities that get more granular the further down you go, i.e. “aboutness”, things that are about the same.
    – TAXONOMY: Much stricter rules are applied that find not the similarities, but the differences between content objects (anything). When a difference is found it is “kicked out” into another node (parent, child, sibling or channel), i.e. precision, which gives greater control over the relationships between content (think related content) giving it greater relevancy because its level of specificity for relatedness is more precise and granular.

    The biggest benefits I’ve seen with a proper taxonomy are in relating content, relevancy of content (onsite / enterprise search results) and with ad sales / placement. This is why so many personalization, recommendation and enterprise search engines simply suck. They over rely on technology and don’t understand content structures.

    Sorry for the long rant, but it’s a pet peeve of mine and I think you just have the wrong use of the word. That said, this is one area where great damage or good can be done by knowing what a proper taxonomy actually is and how to build one, and that it’s not the same as navigation or what a user wants to browse.

    1. Tom Johnson

      Marko, thanks for commenting. I appreciate your clarification, and for correcting my direction to associate taxonomy with navigation. I do say in my post that taxonomy isn’t navigation, but rather than consider the division and classification of content an activity related to taxonomy, I should probably just call that navigation instead. I am finding less and less relevance in taxonomies for help authoring.

      I’d be interested to hear your response to Mark’s comment here. Do you think that doing any kind of taxonomy practice would be helpful in creating help material, such as an online help file?

      Also, it seems that various people define taxonomy differently. I still think that grouping and sorting and dividing content is a classification activity, so associating that classification with taxonomy doesn’t seem too much of a stretch.

  5. Pingback: Taxonomy, Metadata, & Search: A Different View - semanticweb.com

  6. Thomas Kohn

    I’d appreciate a follow-up to “You can organize all of your metadata in various tools. (Seth mentioned several, though I can’t remember them here.)”

    I wonder if a CMS is needed to store the taxonomy as you develop it, or whether the taxonomy is quickly applied to the XML coding.

    I’m at a great loss, probably, in my inexperience so far with DITA-XML and writing within that technology.

  7. Pingback: Link Roundup – July 29, 2012 | Enterprise Information Management in the 21st Century
