The Importance of Chunking for Sorting

If you want to be able to sort information by various classification schemes, such as by most popular, or by role, or by problem, your content has to be chunked in a granular enough way to facilitate the various means of sorting.

Consider a work that is one large book, with no chunks at all. In that case, it would be impossible to sort anything, because you have just one object. With one object, the only pattern you can configure is itself. But if you have a handful of objects, you can arrange those objects into as many patterns as you want.

To use an analogy, let’s say you have a pile of rocks. If you have 1,000 small rocks, the number of patterns you can configure with them is vastly greater than the number you can configure with just a few rocks.

I noticed this on a recent trip to Arches in Moab. While walking along trails, we saw a lot of rock piles called cairns that act as guide points. The cairns can be stacked and arranged in myriad ways, because they consist of little rocks:

But the big rocks are much more pattern-limited. They mostly just sit there, alone:

Thus if your goal is to enable a variety of patterns or classification schemes, so your users can choose from myriad classifications, according to their individual needs, you must chunk your content in a granular enough way to facilitate the classifications.

Granular chunking poses some difficulties for help content, because if you chunk things too small, the help system becomes arduous to navigate. If each page contains just one topic, you end up with so many pages that navigating them will give users a headache.

To avoid this, on my calendar help wiki, the Viewing Calendars page has the following topics on the same page:

Calendar Contents

All of these topics appear on the Viewing Calendars page.

Now, suppose I want to manipulate this content on a more granular level. Suppose the “View Calendars of Other Wards” topic is a popular topic; the “FAQ” issues would be appropriate in a problems-based classification. The “About Subscribed Calendars and Subscribed Locations” belongs to a conceptual table of contents. The “View Churchwide Calendars” belongs to a “Coming Features” type of organization, and so on.

In short, let’s say I want to add metadata to each of these sub-topics so that they can be sorted, rearranged, recompiled, or otherwise organized in different classification schemes. If they are compiled in one giant topic, they can’t be manipulated at all except on a more macro-level. This is why chunking is such a fundamental principle of technical writing: without small chunks of content, you don’t have many options for manipulating it.
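To make the idea concrete, here is a minimal Python sketch of what per-topic metadata buys you. The topic titles come from the Viewing Calendars page, but the `views` counts, the tag names, and the `Topic`, `by_popularity`, and `with_tag` names are all invented for illustration; a real help system would store this metadata in whatever form the wiki or CMS provides.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """One chunk of help content plus the metadata used to classify it."""
    title: str
    views: int = 0                            # hypothetical popularity count
    tags: set = field(default_factory=set)    # hypothetical classification tags


# Titles are from the Viewing Calendars page; views and tags are made up.
topics = [
    Topic("View Calendars of Other Wards", views=940, tags={"task"}),
    Topic("FAQ", views=310, tags={"problem"}),
    Topic("About Subscribed Calendars and Subscribed Locations",
          views=120, tags={"concept"}),
    Topic("View Churchwide Calendars", views=45,
          tags={"task", "coming-feature"}),
]


def by_popularity(items):
    """Return topics ordered from most to least viewed."""
    return sorted(items, key=lambda t: t.views, reverse=True)


def with_tag(items, tag):
    """Return only the topics that carry the given classification tag."""
    return [t for t in items if tag in t.tags]


if __name__ == "__main__":
    print([t.title for t in by_popularity(topics)])
    print([t.title for t in with_tag(topics, "problem")])
```

Each arrangement is just a different function applied to the same list of chunks. If the four topics were stored as one object, there would be only one set of metadata and nothing to sort or filter.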

Whether you use a wiki or not, deciding how granular to chunk your content is a challenge. For example, on Microsoft Word’s Help, this is the topic for Changing or Setting Page Margins.

This topic on working with margins really contains five separate topics.

Combining these five topics into one makes it more difficult to manipulate the individual sub-topics as their own topics. The metadata you add to this topic must account for all the sub-topics within it.

Now consider the opposite strategy. Let’s make each subtopic its own topic. You can see the effects of this approach in the following Office help search:

Granular chunking

When you chunk things in a granular way, it becomes harder to find the chunks, and you lose some context.

Here the topics on formatting are all chunked into their own topics, so you end up with Clear all text formatting, Show or hide formatting marks, Apply strikethrough formatting, and so on. When a user clicks on a topic, the topic is short, such as the following:

This is a short topic. This is all that’s there.

This short topic either answers the user’s question or it doesn’t. There’s not much room for error, since no similar topics are grouped together. If it’s not the right topic, the user must return to the list of results and click another, and another, and another until he or she locates the right topic.

In contrast, if you combine a larger number of topics together on the same page, you give more context to the user. He or she can read conceptual introductions followed by a handful of sub-topics that all deal with the general topic. The user can easily scan down the subheadings to find the right sort of task for this topic. But it’s harder to manipulate each individual sub-topic separate from the larger topic. And your metadata can’t describe each of the individual sub-topics but must cover the larger topic generally.

Chunks that Consist of Chunks

I’ve been contrasting big chunks versus little chunks without acknowledging that big chunks can consist of combinations of little chunks. So in each of the examples above, the topics can exist separately but be grouped together into the larger topics that you see.

With Mediawiki, this method of reuse is called transclusion. Last week, convinced that I needed to chunk each topic more, I separated all the topics that you see in that first calendar screenshot onto their own individual pages. I then “transcluded” these chunks to form a longer page.

Currently, from the user’s point of view, it looks exactly the same. But really, I can now arrange and manipulate these chunks however I want because I can apply unique metadata to each one of the topics.
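For readers who haven’t seen transclusion, the following Python sketch imitates the idea. In MediaWiki, a page in the main namespace is transcluded with the `{{:Page title}}` form; everything else here (the `pages` dictionary, its placeholder text, and the `render` function) is a hypothetical stand-in, not how MediaWiki itself is implemented. The sketch also ignores the includeonly, noinclude, and onlyinclude tags.

```python
import re

# Hypothetical chunk store: page title -> page text. The body text is
# placeholder content, not the wording of the real wiki pages.
pages = {
    "View Calendars of Other Wards": "Steps for viewing another ward's calendar.",
    "FAQ": "Answers to common calendar questions.",
    "Viewing Calendars": (
        "Intro to viewing calendars.\n"
        "{{:View Calendars of Other Wards}}\n"
        "{{:FAQ}}"
    ),
}

# Matches {{:Page title}}, the MediaWiki form for transcluding a
# main-namespace page.
TRANSCLUDE = re.compile(r"\{\{:([^{}|]+)\}\}")


def render(title, store, seen=()):
    """Expand transclusion markers in a page, recursing into included pages."""
    if title in seen:
        raise ValueError(f"transclusion loop at {title!r}")
    text = store[title]

    def expand(match):
        # Replace the marker with the fully rendered text of the named page.
        return render(match.group(1).strip(), store, seen + (title,))

    return TRANSCLUDE.sub(expand, text)


if __name__ == "__main__":
    print(render("Viewing Calendars", pages))
```

The long page stores no content of its own beyond the intro. Each chunk lives on its own page, where it can carry its own metadata, and the combined page is assembled when it is rendered.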

However, this poses a new problem: searches will find the individual chunks and the larger pages that combine these chunks, which means content will be in multiple places rather than one place.

The Collage and the Painting

Don Day’s post on The Collage and the Painting describes how search becomes problematic with little chunks. Day is writing in the context of DITA, but the challenge of working with small chunks is the same. Day writes,

A common talking point about DITA is how the topic-referencing architecture makes it easy to reuse topics in new maps of information. By extension, searching on a facet of interest should bring up a collection of topics that you can read as a focused subset of a larger whole. Print it as a PDF, or output it in eBook format, and you’ve got some good reading for the commute or for the weekend. But how practical is this vision?

The flaw in the theory comes from loss of context when you pull a set of topics by query. Imagine doing a web search on a subject of interest and then printing the whole list of hits, as is, into a single PDF for later reading. Obviously you will have the problem of duplicated content, possibly some older and less reliable content, a good deal of discussion by people who are not experts on the subject, organizing the hits in a reasonable manner (by timeline, by author, in a hierarchy) and so forth. Metadata might help in preserving bits of a former organization or rationale, but the new use might be totally different from how any of that content was originated. Bringing order out of disarray is the whole drive behind the growing trend of Content Curation.

Don Day uses the metaphor of the collage and painting to distinguish between small topics pulled together without order and a larger chapter that provides context and sequence for each of the topics.

In other words, if you pull together all topics that have specific metadata, such as all topics related to scheduling events, you may get an unordered collage of topics. The order of the topics may not reflect any kind of sequenced or arranged reading. The list of topics no longer forms a larger, well-written chapter that contextualizes each topic, but rather may seem like little scattered objects here and there.
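A small Python sketch can show the difference between the two. The topic titles, the tags, and the `scheduling_map` outline are all invented for illustration. `query` stands in for a metadata search, and `sequence` for the hand-made arrangement that a chapter or map supplies.

```python
# Hypothetical topics and subject tags, invented for illustration.
topics = {
    "Edit an event": {"scheduling"},
    "Schedule an event": {"scheduling"},
    "About scheduling": {"scheduling", "concept"},
    "Delete an event": {"scheduling"},
    "View calendars": {"viewing"},
}

# A hand-curated reading order, playing the role of a chapter outline.
scheduling_map = [
    "About scheduling",
    "Schedule an event",
    "Edit an event",
    "Delete an event",
]


def query(tag):
    """The collage: every topic with the tag, in no meaningful order."""
    return {title for title, tags in topics.items() if tag in tags}


def sequence(hits, reading_order):
    """The painting: the same hits, arranged by a curated outline.

    Hits that the outline does not mention are appended alphabetically
    so that nothing found by the query is dropped.
    """
    ordered = [title for title in reading_order if title in hits]
    leftovers = sorted(hits - set(ordered))
    return ordered + leftovers


if __name__ == "__main__":
    print(query("scheduling"))                            # a set: no order
    print(sequence(query("scheduling"), scheduling_map))  # outline order
```

The metadata alone finds the right topics but says nothing about the order in which to read them. The sequence has to come from somewhere else, which in this sketch is a list that someone wrote by hand.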

The effect might be compared to taking an entire book, ripping out all the pages, throwing them on the ground, mixing them up, and then reading them in whatever order they landed. That reading experience is dizzying and un-fun.

In sum, when you run searches on all the topics together that have similar metadata, you end up with assortments of small chunks that lack the continuity and context of a larger chapter or book. This simply seems to be the tradeoff of chunking your content. Your search results become more like a collage, but you have more flexibility in how you arrange your topics.

31 thoughts on “The Importance of Chunking for Sorting”

  1. Helen Abbott

    Another great post. Tom, have you tried MediaWiki’s Hierarchy extension? It allows you to order articles in a category, rather than relying on alphabetical order in the generic MediaWiki categories. So far we haven’t worked the kinks out of the extension, but if we do, I think it will help introduce some order into the chaos. However, there will be a lot of initial setup and maintenance.

    1. Tom Johnson Post author

      Thanks for the tip on the Mediawiki extension. I hadn’t heard of the Hierarchy extension. It looks robust. I was exploring the Semantic Wiki extension and one of the Dynamic Page List extensions last week. The content on my wiki will be migrated into another format, eventually. I need to format it into XML and migrate it onto our main site. On that site I’ll actually be able to take advantage of faceted browsing, recommended results, instant search results, and more. I like Mediawiki, though, and I can do a lot there as well.

    2. Marcia Johnston

      Helen, I’m encouraged to learn about better ways to organize topics in a wiki than alphabetically. During my first and so far only experience contributing to a wiki (using Confluence), I was told that alphabetical was the only order available for the topic list / nav bar at the left. Not very usable.

      Maybe alpha order is simply the default, and no one on my team knew how to change it. Anyone got tips for creating alternate organization/navigation schemes in Confluence?

  2. Patty Blount

    First, let me say what a brilliant analogy this is – rocks : content chunks. The visuals just nailed it for me.

    Designing Help is a challenge for me. Despite working as a tech writer for over a decade now, I’ve only designed two such systems and am frequently overwhelmed by the behind-the-scenes work involved so users find the Help truly helpful.

    I find the linking and cross-referencing and outlining to be more time consuming than the actual writing and find I’m often never done. What advice do you have for tackling this more effectively?

    1. Tom Johnson Post author

      I agree that linking and cross-referencing can be extremely time consuming. In fact, trying to do anything more than a simple topic-based classification will take a lot of time. It would be hypocritical of me to say I’ve done everything I’m recommending here. However, I think if you get a good strategy for the types of metadata you want to add to each topic, it will make the process faster.

      One thing about internal linking — relationship tables can make life easier here. If you aren’t familiar with relationship tables, check out my post here.

    2. Mark Baker

      I gave a paper on this very subject at JoAnn Hackos CMS/DITA conference earlier this month. When you have a system that depends on maps to organize and link content, you are indeed likely to spend as much time linking and cross referencing as you do writing, particularly if you chunk your content finely.

      At the conference, I described a technique I called “soft linking” by which authors mark up references to subjects as they write and links are formed by the build scripts by querying the indexes of all the topics in the documentation set. This means, among other things, that links are formed correctly no matter how the topic is used.

      But there is a precondition for this to work. Tom has described the problem perfectly in his comment on Don Day’s article. If you chop your chunks too fine, you cannot query them successfully, at least, not if you hope to get a coherent result.

      The smaller you chunk your content, the harder it is to attach meaningful metadata to the pieces, an issue I recently discussed here: http://everypageispageone.blogspot.com/2011/04/topics-should-merit-their-metadata.html.

      This is not to dispute the upside of fine chunking that Tom outlines here. But fine chunking comes with considerable costs as well. The trick is to find the level of chunking that is optimal for your particular set of business problems.

      1. Tom Johnson Post author

        Hey Mark, sorry for the slow reply. I read your comment a while ago and have been mulling over your clock analogy. I think it’s brilliant and you’re dead on with your analysis. There’s definitely a danger in chunking things too small. Small chunks may allow someone to manipulate content in an extremely flexible way, but it falls apart on so many other levels. Thanks for commenting and writing such excellent blog posts.

      2. Tom Johnson Post author

        By the way, you mentioned a method for linking without specifying exact links. Was that the equivalent of a related-posts plugin on a blog?

  3. Larry Kunz

    LOL. Tom, you might just be the only person who hikes around Arches taking pictures of the cairns rather than the scenery. But I know why you did: the cairns are the indispensable signposts that guide people to the scenery.

    After a search engine presents the user with a collage of topics, the next logical step is a process of curation that bubbles some topics to the top (relevance factor). The whole science of curation and semantics is still evolving, but the potential is exciting.

    BTW, if you took any pictures of the scenery–and not just the cairns–I hope you’ll find a way to include them in your blog posts. Arches is one of my favorite national parks.

    1. Tom Johnson Post author

      Larry, thanks for your comment. I agree that this gets us into the topic of curation, the “science” of curation even, as you say. I’m moving in that direction, eventually. I think curation will probably need to be hand-selected/arranged to be useful, though, rather than automated.

      Okay, since you asked, here are two more Moab pictures: Park Avenue and Sand Dune Arch.

  4. Warren Jason Street

    I think this is a great way to illustrate the fact that taste plays an important part in designing the presentation of text. Yes, it can be well organized, but I think there’s an element of taste that enters into this–when I look at how you’ve set up “granular chunking,” I can see the thought process behind beginning with “clear, show, apply,” and so on. It’s logical and orderly but it also appeals to a sense of clear purpose.

    1. Tom Johnson Post author

      Warren, thanks for commenting. Yes, I agree that taste, as you put it, is critical. One can’t merely automate a bunch of different intelligent arrangements. I think the alternative arrangements/classifications may take as much cleverness as the original writing.

  5. Marcia Johnston

    “Transclusion.” I’m glad to know this word. The rock photos add a lot too. Chunking is one of the most challenging (says the writer in me) — and most important (says the reader in me) — aspects of making Help helpful. Looking forward to hearing more on this subject.

    1. Tom Johnson Post author

      Transclusion seems to be a made-up word by Mediawiki because they couldn’t think of a simpler term, like re-use. When you transclude a topic, you can choose to define some parts with includeonly, noinclude, or onlyinclude tags. The semantics are like an amusement park.

  6. Jonatan Lundin

    One of the great advantages and also drawbacks with DITA is that it doesn’t restrict the topic granularity. You can in fact have a topic with just a title. But this is also a problem for many adopters, since DITA as a standard doesn’t give you any advice on how to define the granularity (and DITA shouldn’t). It’s up to you. I have understood that this is a problem in global content creation teams, lacking a common strategy, where each writer has their own idea of the size of a topic.

    One idea to control the topic granularity is to first develop a metadata taxonomy, using for example the DITA subject scheme or even topicmaps, and use it as a “granularity grid”. See http://www.stc-india.org/wp-content/uploads/2010/11/SubjectScheme_STCIndia_Jonatan.pdf

    1. Tom Johnson Post author

      I can see that this topic is sliding into taxonomy land, an area that I know little about. My understanding of taxonomy is that it’s the agreed-upon metadata and language to describe content. As Mark points out, the metadata you define has some bearing on the size of the topic, because smaller topics have more restricted metadata. So I see agreeing on metadata as naturally defining the topic size. I will need to learn more about taxonomy before I can comment intelligently there. One thing is certain, though. There is definitely confusion about the ideal granularity of a topic.

  7. Richard Rabil, Jr.

    Tom, this is great. I’ve been thinking a lot about information architecture lately, and your explanation of transclusion, which I didn’t really understand when I first started checking out MediaWiki, has been really helpful. Keep it up, man.

    1. Tom Johnson Post author

      Thanks Richard. By the way, what are you up to these days? You were at Texas Tech, right? Did you graduate and go to work for a specific company? You shared with me that cool Prezi last year. Just wondering what career direction you pursued.

  8. Julio Vazquez

    Some interesting thoughts here and it has stirred me to comment (as if I could resist).

    Yes, there is a danger if you are too granular with your chunks. However, if you have a robust enough strategy for applying your metadata at a level that unites the chunks, then you can provide a unifying structure for the content within a specific context. If you apply the metadata at the topic level, then you wind up with the little rocks instead of the cairns.

    While DITA does allow you to be extremely granular, should you so choose, it does not force you to be granular or to apply the metadata to the topic level. In fact, it gives you many ways to specify the metadata such that you can effectively group topics together in a unique grouping should you desire. This becomes a simple exercise because you can define that at the publication level rather than during the authoring of the content.

    Will you still get a bunch of topics returned when you search? Yes, but those will be grouped within the context you defined. Is it perfect? No, but I have yet to find a perfect solution to this sort of issue.

    The bottom line is that you have to implement metadata within the boundaries of the overall content strategy so that you achieve the right level of granularity and that’s dictated by the organization and not any single architecture or tool.

    1. Tom Johnson Post author

      Julio, thanks for adding your insights. I agree that the metadata defined by the content strategy will inform the level of granularity. I need to read more about DITA, since I feel I’m entering territory with this thread that has already been hashed out by DITA gurus like you. One day I will get around to reading your book. :)

  9. Mary Ann

    Did you know that stacking rocks in Buddhist temples is also an act of prayer and reflection?

    Interesting post!

    1. Tom Johnson Post author

      I did not know that. I know that raking sand or a rock garden is supposed to be mentally cleansing. I’ve seen the little mini zen gardens on office desks. Maybe I’ll get one.

    1. Tom Johnson Post author

      Meggan, thanks for the comment. I must warn you, though, because you’re linking to a Complete Bargains site in the UK, and because your comment lacks substance, I’ll eventually flag it as spam. However, you’ve given me a great example for my upcoming WordPress workshop, so thanks. BTW, how does one pronounce the second G in your name?

  10. Pingback: Topic Chunking and The Broken Clock | I'd Rather Be Writing

  11. Pingback: Why fine chunking and rich metadata don’t mix

Comments are closed.