Taxonomy, Metadata, and Search: Notes from Seth Earley's Confab Workshop
The first day of Confab, I attended an all-day workshop on Taxonomy, Metadata, and Search by Seth Earley. The workshop had a lot of information, much of it streamed out like a firehose through 200+ slides. In this post, I attempt to make sense of some of these concepts.
Seth outlined a three-prong approach to information management:
- Develop a taxonomy.
- Apply the taxonomy to your content.
- Leverage the taxonomy to view your content in different ways.
Taxonomy is one of those vague words whose meaning seems a bit slippery, but Seth was adamant that taxonomy is not navigation, though it affects navigation. By taxonomy, we're referring to “a system for organizing concepts and categorizing content.” A taxonomy is your metadata, “arranged in a tree-like structure, with top level categories that branch out to reveal subcategories and terms in varying levels of depth.” The taxonomy “expresses hierarchical relationships (parent/child).”
In other words, a taxonomy is how you organize and classify all of your information objects. It's the hierarchical representation of one term with another, expressed in parent, child, grandchild, cousin, etc. relationships.
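To make the tree idea concrete, here is a minimal Python sketch of my own (not something shown in the workshop). The terms and the `find_path` helper are hypothetical; the point is only that each term sits under a parent, and you can recover its full lineage.

```python
# Hypothetical taxonomy for illustration; the terms are not from the workshop.
# Each key is a term, and its value holds that term's children.
taxonomy = {
    "Flooring": {
        "Carpet": {},
        "Hardwood": {},
    },
    "Tools": {
        "Hand tools": {"Hammers": {}, "Screwdrivers": {}},
        "Power tools": {},
    },
}


def find_path(tree, term, path=()):
    """Return the chain of ancestors ending at `term`, or None if it is absent."""
    for name, children in tree.items():
        current = path + (name,)
        if name == term:
            return current
        found = find_path(children, term, current)
        if found is not None:
            return found
    return None


print(find_path(taxonomy, "Hammers"))  # ('Tools', 'Hand tools', 'Hammers')
```

A nested dictionary is the simplest structure that shows parent/child relationships; a real taxonomy tool would also store definitions, synonyms, and identifiers for each term.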
Coming up with metadata to describe your subject is a key first step. The granularity of the metadata depends on your needs. For example, a gas company might have a lot of sub-categories for the word gas, whereas a non-gas company might not.
Some standard approaches to metadata are available in frameworks such as the Dublin Core. The Dublin Core will identify common metadata attributes, such as title, description, date created, and so forth. You can leverage standards like this to a certain extent, but no doubt you will need to move beyond standards as you develop a metadata model that fits your needs.
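As a rough illustration, the sketch below describes one piece of content with a few Dublin Core elements plus two local fields. The record values, the local field names (`product_line`, `review_status`), and the `split_fields` helper are all made up for the example; only the fifteen element names come from the Dublin Core element set.

```python
# The fifteen elements of the Dublin Core Metadata Element Set.
DUBLIN_CORE_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
}

# Hypothetical record: standard elements plus local, organization-specific fields.
record = {
    "title": "Example installation guide",
    "description": "Placeholder description of the guide.",
    "date": "2011-01-01",
    "creator": "Example Author",
    "product_line": "Example product",  # local field, not Dublin Core
    "review_status": "draft",           # local field, not Dublin Core
}


def split_fields(rec):
    """Separate the fields a standard covers from the ones you had to add."""
    standard = {k: v for k, v in rec.items() if k in DUBLIN_CORE_ELEMENTS}
    local = {k: v for k, v in rec.items() if k not in DUBLIN_CORE_ELEMENTS}
    return standard, local


standard, local = split_fields(record)
print(sorted(standard))  # ['creator', 'date', 'description', 'title']
print(sorted(local))     # ['product_line', 'review_status']
```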
Information architecture and semantic architecture
In establishing metadata, keep in mind a difference between “information architecture” and “semantic architecture.” With information architecture, “a single concept can have different expressions.” With semantic architecture, “a single expression can have different concepts.” For example, Statement of Work and SOW are different expressions of the same concept. But the word “set” can have different concepts, such as to place, a group of things, a tennis match, and so on.
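The two cases can be sketched as two lookup tables. This is only an illustration under my own assumptions: the table names and the `normalize` and `is_ambiguous` helpers are hypothetical, and the entries simply reuse the examples above.

```python
# One concept, many expressions: map every variant to a preferred term.
SYNONYMS = {
    "sow": "statement of work",
    "statement of work": "statement of work",
}

# One expression, many concepts: list the possible senses of a term.
SENSES = {
    "set": ["to place", "a group of things", "a tennis match"],
}


def normalize(term):
    """Collapse different expressions of a concept into one preferred term."""
    key = term.strip().lower()
    return SYNONYMS.get(key, key)


def is_ambiguous(term):
    """Report whether a single expression maps to more than one concept."""
    return len(SENSES.get(term.strip().lower(), [])) > 1


print(normalize("SOW"))     # statement of work
print(is_ambiguous("set"))  # True
```

The first table is something a search engine can apply automatically. The second usually needs context (or a question back to the user) to resolve.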
Language is slippery, and that slipperiness is a big reason why people struggle to find things. When people enter keywords in search fields, their searches are usually brief and general; they do not express the level of detail and accuracy needed to retrieve what they're really looking for.
While metadata (information about information) is the list of keywords that describe your content, the taxonomy is the hierarchical organization of that metadata. You can critique a taxonomy by evaluating whether the parent-child relationships express a clear logic. Do the children fit logically under the parent? Are there redundancies, polyhierarchies (the same children under different parents)? Are terms parallel when they should be parallel, or subordinate, or orthogonal?
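Some of these checks can be automated. The sketch below, which assumes the taxonomy is stored as hypothetical (parent, child) pairs, flags polyhierarchies by finding any child that appears under more than one parent. The data and the `find_polyhierarchies` name are my own.

```python
from collections import defaultdict

# Hypothetical (parent, child) pairs; "Paint brushes" is filed under two parents.
edges = [
    ("Paint", "Paint brushes"),
    ("Hand tools", "Paint brushes"),
    ("Hand tools", "Hammers"),
    ("Paint", "Primer"),
]


def find_polyhierarchies(pairs):
    """Return each child that has more than one parent, with its parents sorted."""
    parents = defaultdict(set)
    for parent, child in pairs:
        parents[child].add(parent)
    return {child: sorted(ps) for child, ps in parents.items() if len(ps) > 1}


print(find_polyhierarchies(edges))  # {'Paint brushes': ['Hand tools', 'Paint']}
```

A polyhierarchy is not automatically a mistake, so a report like this is a prompt for review rather than a list of errors.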
My favorite exercise was organizing a list of Home Depot products (such as carpet, hand tools, and plumbing) into different groups. We had about 50 seemingly random Home Depot products. One person's grouping certainly varied from another's. There was no clear-cut method for grouping, and despite anyone's best attempts, each arrangement was unique.
Although there's no clear way to group a large body of items, you can leverage the metadata to find related items. You can establish equivalent terms in your hierarchy, and then configure your search results to display equivalent items at that same level. This puts more effort into the taxonomy. A well-structured taxonomy will lead to a better search, because you can better approximate what the user is looking for. Based on your taxonomy, you can configure associated results.
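One simple way to picture associated results is to treat terms that share a parent as related. The following is a minimal sketch under that assumption; the `PARENT` map and the `related_terms` helper are hypothetical, and a real search engine would have its own configuration for this.

```python
# Hypothetical child -> parent map for a small slice of a taxonomy.
PARENT = {
    "Hammers": "Hand tools",
    "Screwdrivers": "Hand tools",
    "Wrenches": "Hand tools",
    "Carpet": "Flooring",
}


def related_terms(term):
    """Return the other terms at the same level, under the same parent."""
    parent = PARENT.get(term)
    if parent is None:
        return []
    return sorted(t for t, p in PARENT.items() if p == parent and t != term)


print(related_terms("Hammers"))  # ['Screwdrivers', 'Wrenches']
```

A search for "Hammers" could then surface screwdrivers and wrenches as associated results, which only works if the taxonomy placed those terms sensibly in the first place.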
Metadata is really the foundation for doing anything. Without this metadata applied to your information, you can't leverage it in different ways. Seth stressed that information workers should be specialists with metadata and taxonomy. If information is truly a corporate asset, leveraging it in different ways through metadata should be a key strategy. It leads to a major competitive advantage.
In developing a taxonomy, keep in mind a concept called “pace layering.” Pace layering is the idea that change takes place at different rates for different groups within a company. A business person may change at a much faster pace than other groups, such as IT. The “clock speed” of the infrastructure team may be slower than the clock speed of the social media team. And because these clock speeds move at different rates, one group gets frustrated at another for being so slow.
Applied to taxonomy, this means you may have to continually evolve. Your taxonomy needs to be flexible and adaptable to changing business environments. Some companies revisit their taxonomy several times a year, and then push out updates.
Seth used the term “information metabolism” to refer to the rate at which information flows through an organization. It accompanies the idea of pace layering. If your organization has a high rate of information metabolism, the taxonomy you develop may quickly become stale and need to adapt in a more flexible, dynamic way.
A folksonomy is a tagging system that can allow you to keep up with a dynamic, fast-paced environment where terms constantly change. A folksonomy is a metadata system that allows users to add their own tags to content, rather than pulling from a set of structured tags established in a fixed taxonomy. One problem with folksonomies is that you end up with a variety of terms all expressing the same concept but spelled differently.
Seth highlighted SharePoint 2010 as a particularly good tool for capturing and managing metadata that is added in a folksonomy-based way. With SharePoint 2010, users can add terms of their own, but as they add them, they're prompted to choose similar terms already in the database. This provides the best of both worlds – allowing users to expand the existing metadata with new tags while also encouraging users to select from tags already established.
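The general idea of prompting users with similar existing tags can be sketched with Python's standard `difflib` module. This is not how SharePoint works internally (I don't know its implementation); the tag list and the `suggest_existing` helper are hypothetical.

```python
import difflib

# Hypothetical tags already stored in the system.
EXISTING_TAGS = ["taxonomy", "metadata", "search"]


def suggest_existing(new_tag, existing=EXISTING_TAGS):
    """Return existing tags that closely resemble what the user typed."""
    cleaned = new_tag.strip().lower()
    return difflib.get_close_matches(cleaned, existing, n=3, cutoff=0.6)


print(suggest_existing("Taxonmy"))    # ['taxonomy']
print(suggest_existing("emergence"))  # []
```

When the list is empty, the tag is accepted as a genuinely new term; otherwise the user gets a nudge toward the established spelling.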
Despite the randomness in folksonomies, sometimes a trending organizing principle emerges. The concept of “emergence” is the idea of order from chaos. Without any kind of coordination among parties, the system suddenly develops into an order of its own, based on a collective trending of independently acting minds.
You can organize all of your metadata in various tools. (Seth mentioned several, though I can't remember them here.) One problem is that your organization may have many different systems (a corporate intranet, a public website, a product database, and such), all using terminology stored in separate tools. Ideally you want to pull directly from your metadata tool rather than having separate siloed instances of the metadata.
Seth compared the problem to a set of remote controls on a coffee table. Usually people have multiple remote controls – one to change channels, another to change volume, another to work the VCR, etc. The dream is to have one universal remote that controls them all, and while many remotes claim to have universal capability, they really don't. The universal remote would be the single tool to manage taxonomies across your organization.
Seth said the word ontology usually intimidates people and seems to be very philosophical, but in reality, an ontology is really just a collection of taxonomies. To make an astronomy comparison of my own, a solar system is to a galaxy as a taxonomy is to an ontology. A galaxy comprises multiple solar systems, just as an ontology comprises multiple taxonomies.
Once you've established your metadata, you then begin to develop content models. A content model is just “metadata smushed together.” For example, one content model may simply be a title and description. The content model might express various rules about how the metadata gets used. An established content model leads you to create various content types.
All of these content types come together in a content management system (CMS) that can dynamically render outputs based on various content. The CMS pulls information based on the metadata, following the rules or pattern of the content model type.
Seth showed an example of a website with various mobile phones, and noted how a CMS can dynamically render the page simply by displaying data that meets specific metadata attributes. The content type establishes where and how the data should be displayed, but the whole system is a dynamic rendering. You don't have someone hand-coding the HTML behind the scenes to define the order and display. Instead, the CMS does it through the metadata. Seth referred to this dynamic organization and display as “content choreography.”
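Here is a minimal sketch of that idea, assuming a hypothetical `Phone` content type and a made-up catalog. The fields of the dataclass play the role of the content model, and `render_listing` selects and orders items purely from their metadata rather than from hand-written markup.

```python
from dataclasses import dataclass


@dataclass
class Phone:
    # Hypothetical content type: its fields are the content model.
    title: str
    description: str
    brand: str
    price: float


# Made-up catalog entries for illustration.
CATALOG = [
    Phone("Model B", "Flagship phone", "Acme", 799.0),
    Phone("Model A", "Entry-level phone", "Acme", 199.0),
    Phone("Model C", "Mid-range phone", "Globex", 399.0),
]


def render_listing(items, **criteria):
    """Pick items whose metadata matches every criterion; render cheapest first."""
    matches = [
        item for item in items
        if all(getattr(item, field) == value for field, value in criteria.items())
    ]
    matches.sort(key=lambda item: item.price)
    return [f"{p.title}: {p.description} (${p.price:.2f})" for p in matches]


for line in render_listing(CATALOG, brand="Acme"):
    print(line)
```

Changing the criteria (or the sort key) changes the page without anyone editing the page itself, which is the point of content choreography as I understood it.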
If you can gather information about your users, you can program the CMS to render the information in different displays based on the user profile. This is one way that you can dynamically change the information to match the user's needs, thereby increasing the findability of the content. Your content is no longer static but rather dynamic and changing.
Pre-classified and post-classified
Seth also introduced some more terms to reflect this dynamic content. A pre-classified system sets up fixed portals that the user can enter to see information displayed in different ways. A post-classified system creates the portals on the fly in a list of facets that the user can navigate into once he or she enters a search term.
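A post-classified system can be sketched as a search followed by a count of metadata values across the results. The documents, field names, and the `search` and `facets` helpers below are hypothetical, and the search is a plain substring match rather than anything a real engine would use.

```python
from collections import Counter

# Hypothetical documents, each tagged with two metadata fields.
DOCS = [
    {"title": "Install guide", "type": "how-to", "product": "Widget"},
    {"title": "Widget API reference", "type": "reference", "product": "Widget"},
    {"title": "Gadget install notes", "type": "how-to", "product": "Gadget"},
]


def search(docs, query):
    """Naive search: keep documents whose title contains the query."""
    q = query.lower()
    return [d for d in docs if q in d["title"].lower()]


def facets(results, fields=("type", "product")):
    """Build facets on the fly by counting metadata values in the results."""
    return {f: dict(Counter(d[f] for d in results)) for f in fields}


results = search(DOCS, "install")
print(facets(results))
# {'type': {'how-to': 2}, 'product': {'Widget': 1, 'Gadget': 1}}
```

The facets exist only because of the search that produced them, whereas a pre-classified system would define its portals ahead of time.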
The whole experience of metadata mixed with a CMS can lead to a much more flexible organization of information, where there is no single fixed organization but rather many different organizations, each based on needs users may have for different information.
I'm glad I attended this eight-hour workshop. For an alternative writeup of Seth's workshop, see this summary by Clive Gibbons. I also attended a session by Leigh White at the STC Summit called Taxonomy: Do I need one? Leigh's session was excellent, so if you have access to Summit @ a Click, I recommend listening to it. (In some ways, I got more out of Leigh's session than Seth's, but maybe that's due to my short attention span.)
I will be honest here and admit that although I think a taxonomy would be awesome, I've never created one, and it seems I hardly have the time to do it. At one time I was convinced I needed to start doing taxonomy, so I began reading Heather Hedden's The Accidental Taxonomist. About 80 pages in, I realized taxonomy had more of a library science angle than I wanted to explore.
Although I've never created an official taxonomy, I suppose I wrangle out terms and their relationships enough to make sense of them in consistent ways, even if I don't make it all explicit in an official parent-child structure. Nevertheless, it would probably be a good practice to create an official taxonomy, clearly deciding on each term and its definition and relationship to other terms. This should no doubt help as I write technical content. I've just never seen any technical writer create a taxonomy in his or her approach to writing help for a software application.
TOCs and Taxonomies
I said that I've never created a taxonomy, but really when we organize the table of contents navigation in a help file, isn't this an exercise in taxonomy? When we decide what the top-level folders are, and each subfolder, and sub-sub-folder, and what tasks belong grouped together on the same topic, and what the related topics are, isn't this an expression of the hierarchy that would exist in a separate taxonomy? Probably to some extent, yes. Even though Seth emphasized that taxonomy is not navigation, structuring your navigation is an exercise that makes you think about taxonomy.
Although many users may not use the table of contents to find information, your table of contents communicates valuable information to users. It communicates a taxonomy, which helps users understand the hierarchy and relationship of one concept and term to another. This structural overview of your content can be important. In Designing Web Navigation, James Kalbach points out:
People prefer information that involves a sequence. They like to browse. Navigation provides a narrative for people to follow on the Web. It tells a story -- the story of your site. In this respect, there is something both familiar and comforting about web navigation. The widespread, seemingly natural use of navigation to access content on the Web reflects its strength as a narrative device.
In other words, your table of contents, which expresses the hierarchy, order, and relationships within your information, helps the reader understand at a glance the whole of the information. Even if the user doesn't navigate his or her way through this sometimes maze-like TOC structure, not having the table of contents at all makes users uneasy. If you replace that table of contents with another sort of organization, something that doesn't express the semantic relationships of the information components, your users may feel lost.
That said, although organizing a TOC and building out a taxonomy seem similar, wouldn't it be much easier to build out the TOC if you already had a taxonomy to refer to? The taxonomy should provide the heavy critical thought already, making the TOC somewhat of a derivative exercise.
I'd be interested to hear your approach to creating and using a taxonomy when you write help material.