Why Rubrics Fail as a Means of Measuring Documentation Quality

Alice Jane Emanuel has an interesting post that details her methods for measuring the quality of documentation. The post consists of notes from a webinar she gave on the subject. Alice writes,

… I have never seen anything like what I envisage in my head, which closes the argument by creating a weight or optimal rating for each necessary element in the technical communication being reviewed. When you start to consider necessary elements you can look for concrete things, gauge how much they are needed, and look at how well that need is met.

Some of the categories she assesses include:

  • Document structure
  • Reference and navigation
  • Graphics and other visual elements
  • Accuracy and grammar
  • Terminology
  • Consistency
  • Clarity
  • Task orientation
  • Completeness

She explains,

Depending on what your document requires, you set a weight for each element in each category, with 1 as low and 5 as high…

For the full talk, see Measuring quality — The talk — Comma Theory.
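To make the weighting concrete, here’s a rough sketch (in Python) of how such a weighted score might be tallied. The categories, weights, and ratings below are my own invented examples, not values from Alice’s talk:

    # Illustrative only: weights say how much each element matters for this document,
    # ratings say how well the document meets that need -- both on a 1-5 scale.
    weights = {"task orientation": 5, "clarity": 4, "completeness": 3, "graphics": 2}
    ratings = {"task orientation": 3, "clarity": 4, "completeness": 5, "graphics": 2}

    weighted_total = sum(weights[c] * ratings[c] for c in weights)
    max_possible = sum(w * 5 for w in weights.values())
    print(f"Weighted quality score: {100 * weighted_total / max_possible:.0f}%")  # 71%

The point of the weighting is that a weak rating on a heavily weighted element (say, task orientation) drags the total down more than the same rating on a lightly weighted one.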

I think her method is a good example of a functioning rubric. And I’m not singling out her approach at all. It’s just that her post made me think more about rubrics in general. I have mixed feelings about them.

Before I became a technical writer, I worked for four years as a composition instructor (two years as a grad instructor and two years as regular faculty). In both positions, the rubric always reared its head because you had to have some set of criteria for evaluating student essays. Students wanted to know how they would be graded, and teachers wanted to avoid accusations of subjectivity.

For composition, the rubric usually assessed writing based on argument, organization, source integration, language, and a few other elements. You can see here the rubric I put together when I was a composition teacher in Egypt.

Rubrics are popular in almost any event where judgments are made. For example, we’re seeing similar criteria in the deathmatch going on at MindTouch:

Here our judges will compare two competitors’ product help and support communities against each other in the following criteria; User Experience, Social, Engagement and Findability. (See Death Match Round 1: Mozilla Versus IE.)

One of my issues with using rubrics to assess anything — documentation, essays, support sites — is that, at least for me, judgment is not so mechanical. Almost nothing can be broken down into a list of parts that, when properly assembled and in the right balance, create a perfect whole. I guess I lean more toward the “holistic rubric” camp, which has more general categories, each of which has a number of subpoints (which are not evaluated and scored individually).

Rubrics may provide a good reminder for writers as they’re creating documentation. For example, including visuals and other illustrations in help would probably be a good idea, as would adding a glossary, an index, and a table of contents. It’s also important to write simple sentences, to be free of jargon, and to clearly articulate concepts.

But if we want to measure the effectiveness of something, we should measure it against its goal. The goal of documentation is not to score perfectly on a rubric. The goal of documentation (and any writing) is to meet the needs of its audience. The questions we should be asking of documentation are as follows:

  • Did the documentation meet the information needs of the audience?
  • Were users able to find answers to their questions quickly?
  • Did the documentation reduce the number of support calls coming in?
  • Did documentation increase usage of the product?
  • Were users pleased with the documentation?
  • How would users rate the documentation?

When weighed against this goal, the other criteria — completeness, accuracy, grammar, terminology, clarity, and so on — lose importance. The documentation might be full of spelling errors. It might be poorly organized. It might ultimately be incomplete and even lack visuals. But if it gets the job done and satisfies users’ needs, then it has achieved its goal and should be rated higher than documentation that merely checks off a list of rubric categories.

Granted, it’s much harder to evaluate documentation based on how well it meets its end goal, because it requires you to somehow contact actual users. But I think that’s the direction any rubric should go: toward evaluation from the user’s perspective rather than from an internal one.

This discussion leads to the larger problem of tech writing teams not having regular or close communication with users. If we did have more contact with users, our rubrics would naturally reflect more of a user’s perspective. Since most of us lack this contact, the rubrics we create have a noticeable absence of user input.



By Tom Johnson

I'm a technical writer working for The 41st Parameter in San Jose, California. I'm primarily interested in topics related to technical writing, such as visual communication (video tutorials, illustrations), findability (organization, information architecture), API documentation (code examples, programming), and web publishing (web platforms, interactivity) -- pretty much everything related to technical writing. If you're trying to keep up to date about the field of technical communication, subscribe to my blog either by RSS or by email. To learn more about me, see my About page. You can also contact me if you have questions.

19 thoughts on “Why Rubrics Fail as a Means of Measuring Documentation Quality”

  1. Gordon

    I was lucky enough to be at a workshop at TCUK11 where Alice Jane (a formidable woman!) presented this rubric.

    Whilst I partly agree with your sentiment, I put a different slant on things. Wearing my ‘manager’ hat, I’d say that using this scoring mechanism means that, fundamentally, the documentation being scored will improve.

    I agree that as long as information meets the needs of the end user it is a ‘success’, but most technical writers will be asked for a way to show they are improving the quality of their work. Asking the user will reveal some of that, but it’s hard to quantify.

    So, for me, this type of scoring mechanism (and this is something Alice Jane covered in her workshop) is most useful on two fronts:
    1. as a reporting mechanism to senior management
    2. as a training tool for writers (these are the areas we care about).

    Running this scoring mechanism internally is definitely a good thing. Where it will really start to shine is when you also run an external questionnaire (covering the very questions you proffer, Tom); that may then highlight that a well-received document scores highest in a particular area of the rubric, allowing you to pick out the bright spots in what you produce and giving you focus areas to concentrate on in the future.

    1. Tom Johnson

      I like your point here: “Where it will really start to shine is when you also run an external questionnaire (covering the very questions you proffer, Tom); that may then highlight that a well-received document scores highest in a particular area of the rubric, allowing you to pick out the bright spots in what you produce and giving you focus areas to concentrate on in the future.”

      You’re right that as we test our rubric against what really matters to the user (that is, through actual user testing and analysis of their feedback), we can begin to place confidence in our rubric rather than feel that the categories are writer inventions based on writer preferences and priorities.

      For example, based on user testing I’ve done, I would say that videos are something users find helpful. So in my rubric, I could add an “includes videos” check box. I wouldn’t need to do user testing on all help content for every project now. I would just generalize that if videos were helpful for X app, then videos will probably be helpful for Y app. The only fault in the logic is if X is vastly different from Y. Does this describe your point well?

      1. Noz Urbina

        Gordon, I should say, as long as I’m clarifying and apologising, that, to be clear, it’s not a criticism of your comment. I think your suggestion is very good. I wanted only to point out that I find it a measure that’s taken when we can’t do the ideal, which is to really integrate metrics into our process directly and continuously. You can only survey people so much before they really don’t like it. Twice a year is probably lots, especially in techcomm.

        Really we want to be getting drip-fed metrics on an ongoing basis, like a website.

  2. Rengaraman

    //Did the documentation reduce the number of support calls coming in?//

    Valid question, I agree. But how many users actually search the help file/manual also matters. Somehow users prefer support calls rather than going to the documentation first. That may just be my personal experience and need not be true in general.

  3. Noz Urbina

    I’m vastly with you, Tom. I completely get where Gordon’s coming from and agree that it is a valuable and valid application of the concept. However, I see it like the argument “We use copy and paste to manage reuse because structured content is hard.” OK, maybe not that bad, but the point is that it is possible to achieve both goals in ways much more effective than customer surveys.

    However, like structured content, it’s something that requires a level of company buy-in and mind-share that is harder to wrangle. You have to really overhaul how you deliver content and your infrastructure for managing delivery (not creation) of content. Once you’ve got the right forward-facing tools in place, you, like marketing, can start to measure more abstract things like success rates and satisfaction.

    For me this is another great example of how the marketing communications and technical communications world can learn from each other. You talked about this a bit in your last post, Tom, which I unfortunately didn’t have time to respond to. Marketing do measurement really really well. Modern techcomms need to too.

    If your content is accessible online, or your application has one of those ‘Help us make our app better by allowing us to report metrics’ features, then you theoretically have access to massive amounts of data. To leverage it, you usually need to make pretty big changes to your content strategy.

    Think like a marketer. Imagine it’s no longer optional and you MUST know: Who is accessing our content? When? Why? What mood are they in when they do so? What happens after they access it? What mood were they in afterward? And so on. If you think of all the ways that content is or could be accessed on a computer, with the right thinking cap on, there are things like:
    - tracking interactions with help online
    - adding simple ranking feedback and commenting
    - allowing users to annotate content for themselves or the community
    - integrating knowledge bases (and wikis) with help, with a lifecycle into official content, and tracking movements between them
    - integrating support DBs with documentation and tracking movements between them

    It can be done, and it should focus completely on Tom’s “Did the user get the job done?” type goals, but like Gordon says, these days, with the tools in most tech comms departments, it’s hard, and the rubric is therefore a good fallback position.
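    As one minimal sketch of the “simple ranking feedback” idea above (the topic name and in-memory storage here are purely illustrative; a real version would sit behind a web endpoint and persist its data):

        from collections import defaultdict

        # Tally of "Was this topic helpful?" votes, keyed by help topic ID.
        votes = defaultdict(lambda: {"helpful": 0, "not helpful": 0})

        def record_feedback(topic_id, helpful):
            """Record one reader's thumbs-up or thumbs-down on a help topic."""
            votes[topic_id]["helpful" if helpful else "not helpful"] += 1

        def helpfulness(topic_id):
            """Fraction of readers who said the topic answered their question."""
            v = votes[topic_id]
            total = v["helpful"] + v["not helpful"]
            return v["helpful"] / total if total else None

        record_feedback("installing-the-widget", True)
        record_feedback("installing-the-widget", False)
        print(helpfulness("installing-the-widget"))  # 0.5

    Even that crude ratio, collected continuously, gives you the drip-fed, per-topic signal rather than a twice-a-year survey snapshot.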

    Noz – http://lessworkmoreflow.blogspot.com // @nozurbina

    PS – Doing a webinar kind of on this if you’re interested: http://bit.ly/q4qZqI

    1. Tom Johnson

      Noz, thanks for your comment.

      “this is another great example of how the marketing communications and technical communications world can learn from each other. … Marketing do measurement really really well. Modern techcomms need to too.”

      I agree that marketers are much better at measuring results. I think they have to look more closely at metrics. It would be good to learn from them, and I’m not sure why technical writers don’t have this same push towards metrics. The only reason I can think of is that not enough people read the help. Thanks for the insight.

      1. Noz Urbina

        Pleasure to be participating again after so long!

        The reason that Tech Comms folk don’t do measurement is two-fold, or one-fold, depending on how you think about it. It’s culture, which is a function of history and technology. We don’t have metrics and analytics bred into us from early on because of the cultural roots of help content: they’re offline. Help has its ancestry in manuals, which were, not so long ago, purely paper affairs. Technical Communications has a ‘default’ setting that is far too connected to the world of old.

        Although individuals and teams are changing behaviours fast, culture changes slowly no matter what. Even if electronic help could be measured, it isn’t measurable out of the box. You and your company both have to really invest in building both the tools and the models to measure. Analytics on the web, by contrast, has been there since day one.

        At first we just pored over server logs and tracked stupid things like ‘hits’, then moved on to mildly less stupid things like ‘time on page’. Measurement has been there since the beginning as part of the tools and part of the process. This embedded itself into the culture. Marketing has never had the ‘No one reads our stuff’ attitude. Their reason for being was that people were going to read their stuff whether they liked it or not. The question was then how much they read it and what effect that had.

        Because the TC culture didn’t demand analytics, the tool vendors have been slow (slooooooow!) to supply it. Also, developers and engineers making the product similarly downplayed help as something that was developed when they were done, and read only when they as designers had no time or ability to make something usable enough. They never saw help as something that should be tracked and measured as part of the success/failure evaluation of the product.

        Attitudes are starting to change, but they are far from changed in TC, so naturally, they’re far from changed outside it.

        We’ve talked before about how content usage metrics are a rarely tapped goldmine of product-improvement feedback. When organisations realise this across the various stakeholding departments, we’ll see the tools put in place to really see what users are doing and thereby learn how we can better help them.

        PS – @Mark Baker – loved your insights here!

        1. Mark Baker

          This is a great analysis, Noz. Another cultural factor is that when tech writers do measurement, they generally don’t like what they find. Measurements often show that readers don’t value the things that many writers value.

          Far from this disconnect leading writers to adjust their own values, many see it as a reason to set out to change the reader’s values. Give the reader the decorous and slow documentation that the writer feels the reader should value, and, they hope, the reader will acquire a taste for it. Never happens, of course.

          A related cultural factor is that most writers are graduates of a university culture that teaches them to despise commercial values. That their writing should be judged by the crass commercial criteria of reducing costs, increasing sales, and improving margins is anathema to them.

          As long as you despise the yardstick, you are not going to measure anything.

          1. Noz Urbina

            Oh man, Mark, where have you been all my life? TOTALLY true! Thanks for putting it into words!

            I often shy away from the controversy waiting to explode in my face when I address the fact: Technical Communicators need to let go of being technical writers. The craft of writing is not dying, but it’s being compartmentalised into just an aspect of technical communications.

            As Tom has said, video and other formats of communicating are meshing with technical information because that’s how users want to ingest information. Words and sentences are often the interlinking bits that guide users between other media, which is how they will actually learn.

            It’s ironic: you called reading slow, but in fact, 5 good sentences is far faster to consume than 60 or even 30 seconds of video. BUT most readers will still choose to risk their 1 minute rather than risk the 5 sentences. My theory is that, in the back of their heads, users are concerned that those won’t be 5 good sentences and that they’ll have wasted their time.

            The biggest impediment to readers reading manuals is manuals (or help) themselves. No matter how good writer X makes *their* manuals, they can’t change the User’s holistic experience of “manuals” throughout their lifetime. Globalisation brings in more manuals either poorly translated into the User’s native language, or written on anorexic budgets directly by SMEs instead of professional authors, or written in what is a foreign language for the writers.

            It all means the “They don’t read the manual” thing isn’t going to change, and, as you’ve refreshingly and brazenly pointed out, it’s not in the writer’s nature to accept that they’re being recast and outcast, with the future actually looking bleaker for the craft of writing.

            I actually ducked under my desk a few times while writing this…

            Remember, writers, that it’s from a place of love and support that I’m writing this! I’m trying to help writers help the users, and we have to give them content in the format they want if they’re going to use it.

            Noz – http://lessworkmoreflow.blogspot.com // @nozurbina

            PS – just signed up to your blog feed.

          2. Noz Urbina

            PPS – to all: before anyone mails me a bomb, I’m not suggesting that writers should *stop* writing, but that the focus needs to be more of a matrix approach that combines various ways of conveying knowledge. Every help project should be approached with ‘How can I best communicate knowledge?’ as opposed to ‘What will I need to write to support this user?’

            I’d be pissed off if someone told me I needed to become a video director or a multimedia expert. I think the solace is to be found in the fact that the core, undeniable skill of the technical communicator is understanding, anticipating, and addressing the needs of the user. Videos have scripts, which need writing. Information requirements need to be documented, spec’d out, planned for, and addressed. I’m not saying toss your writing degree aside because it’s obsolete; I’m saying re-assess the deliverable without confirmation bias.

            It’s “How does the user want to learn?” vs “What does the help/manual need to contain?”

  4. Scott

    IBM has a great rubric, very similar to Alice’s, published in Developing Quality Technical Information. Their editors evaluate the quality of their documents based on nine measures within three categories: easy to use—task orientation, accuracy, completeness; easy to understand—clarity, concreteness, style; easy to find—organization, retrievability, and visual effectiveness. They weight these evaluations and determine a final score (out of 100) based on the importance of the categories to their readers/users.

    1. Tom Johnson

      Thanks Scott. I’ll keep that in mind. It would be great if you had a link. I like the categories you mention. Still, I would like to see more connection to measuring the actual user experience.

  5. Mark Baker

    Hi Tom.

    Great post. You are right, of course, that measuring outcomes is more meaningful than measuring the product against a set of mechanical rubrics. The problem, of course, is that measuring outcomes is hard.

    In any field where measuring real quality is hard, we tend to resort to measuring proxy variables that are easier to quantify. The problem is, this is only meaningful if you can demonstrate that the proxy variables correlate strongly with the variable you are really attempting to measure.

    I don’t doubt that all of Alice Jane Emanuel’s rubrics have some degree of correlation to real customer value, but like you I doubt if the correlation is as close or as comprehensive as it would need to be to become a useful measurement.

    My own personal rubric for communications is this: the purpose of all communications is to change behavior. The measure of its success is whether or not the reader’s behavior changed in the way you wanted it to.

    That may sound less altruistic than meeting the user’s needs, but it is a heck of a lot easier to measure. If I create a document with the purpose of reducing calls to tech support by 50%, that is a measurable behavior change.

    It also fits quite well with the hottest trend in business metrics right now, the net promoter score. The net promoter score is a measure of how likely your present customers are to recommend your product to others or, alternatively, to recommend that others not buy it. Changing readers’ behavior from not recommending your product to recommending it is a measurable change in behavior.
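    For concreteness, the standard net promoter calculation asks respondents how likely they are to recommend the product on a 0–10 scale; 9–10 count as promoters, 0–6 as detractors, and the score is the percentage of promoters minus the percentage of detractors. A quick sketch (the sample ratings are invented):

        def net_promoter_score(ratings):
            # ratings: answers to "How likely are you to recommend us?" on a 0-10 scale
            promoters = sum(1 for r in ratings if r >= 9)
            detractors = sum(1 for r in ratings if r <= 6)
            return 100.0 * (promoters - detractors) / len(ratings)

        # Example: 5 promoters, 3 passives, 2 detractors out of 10 responses
        print(net_promoter_score([10, 9, 9, 10, 9, 8, 7, 8, 3, 5]))  # 30.0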

    I also find this rubric to be an excellent aid to composition. Whatever I am thinking of writing, I can ask myself, how am I trying to change my reader’s behavior, and what can I say here to produce that change? It’s a question that really focuses the mind on the task at hand.

    1. Tom Johnson

      Mark, thanks for your always insightful comments. Your explanation of rubrics makes sense — that a rubric is a “proxy variable” that attempts to correlate with the real variable we’re trying to measure. I definitely lean towards certain characteristics in my writing that I think will result in a positive user experience — providing a quick reference guide, including context-sensitive help, creating video tutorials, and so on. But actually connecting those characteristics to measurable end-user impact is a bit more difficult. The trouble is that many times we mislead ourselves and don’t do enough user testing of our content. When I watched a group of users use my help material, their preference for video blew away some of my assumptions. Without this connection to our users, we might place too much confidence in a rubric category that we think is important but has little effect on our end goal. I’m thinking of all those lengthy discussions about style that technical writers are always having.

      You wrote, “the purpose of all communications is to change behavior.” A bold assertion. Can you expand on what you mean by “behavior”? For example, did reading my post change your behavior in any way? Are there any situations where writing is not intended to change behavior? Does reading fiction change behavior? If someone is constantly reading, is that person’s behavior constantly changing? Is this too reductive? It’s a good way to look at writing nonetheless.

      Also, if the goal is to change behavior, then I think we don’t draw upon enough rhetorical tools to persuade people to change. Story changes behavior more than anything else, yet we usually write in a dry, mechanical, instructional style.

  6. David Albrecht

    In my opinion, a rubric is too confining or restrictive.

    Also, I reject the idea that subjectivity should be minimized. As long as good writing remains both skill and art, the quality of writing is a function of the eye of the beholder. And since the instructor is more expert than the students, it is the instructor’s eye that matters most.

    I believe that what is said (ideas) matters as much as how it is said. Therefore, a rubric too often fails to capture the overall quality of a paper.


  8. Pingback: The Purpose of All Communication is to Change Behavior

  9. Pingback: Update: Diigo in Education group (weekly) | ChalkTech
