Every Page Is Page One

The following is a guest post by Mark Baker.

The a-ha moment came for me reading David Weinberger’s Everything is Miscellaneous, a book Tom and I both admire. Weinberger’s central thesis is that miscellany has become more powerful than order. No one ordering of information is ideal for every reader. The web allows readers to find information for themselves, and to organize it for themselves and for others. The power to organize content, Weinberger argues, has passed from the creators of content to the consumers.

What have we always told people our job is? Organizing content so people can find it. That’s how we have justified our existence to sometimes skeptical development and product managers. You need us because we know how to organize information. If you leave it to the developers, it will be a mess and no one will be able to find anything. Miscellany bad. Hierarchy good.

Not so much, it turns out. By digging and liking, and tweeting, and tagging, readers lay down their own paths through the content, paths that are non-hierarchical, miscellaneous, and increasingly well travelled. This is a battle that has been fought out on a global scale. On the side of hierarchy and order was Yahoo, which set out to catalog the entire web with a legion of human editors. On the side of miscellany was Google, with a whacking big search engine. Remember who won? It may put you in mind of an earlier battle between man and machine: John Henry hammering in the mountain, trying to beat the steam drill, until the handle of his hammer caught on fire.

What then must we do? Die with our hammers in our hands? Or find a new way to do this job? Because it is no longer in our power to control the order in which readers read our content. They will read it any way that suits them.

In the age when the search engine is king, writers can no longer direct where the reader will land in the documentation, what they will read first, or what they will read next. The reader may begin anywhere, and that means that any page may be page one for that reader. And if any page may be page one for someone, what that really means is that every page is page one.

That was the a-ha moment: Every page is page one. Any page is as likely as another to be the first page the reader sees. There is no first, last, previous, next, up, or back. There is only page one. This page, that page, every page: all page one.

What if the page that the reader lands on does not work as page one for them? Where do they go? Backward and forward through the document hoping to find page one? No. They go to the next page listed in the search results, click on that, and that becomes their next page one.

Now, you may argue, this may be true on the web, but our docs are not on the web. We have a self-contained help system, and we have the power to designate page one for our readers, to structure the reading experience for them. Alas, it is not so. Twenty years ago, readers came to help systems from the world of books, and they expected them to work like books. Some of the more interesting experiments in help system organization actually got smothered out of existence early on by readers’ demand for a familiar, book-like organization of the help system. Help system vendors and authors complied, and help systems have pretty much been organized that way ever since.

But readers no longer come to a help system expecting it to work like a book. We are all children of the web now, and we come to any information system looking for the search box, and expecting the search to work like Google. Recent user feedback that I have been reviewing made this very clear to me. Our users see the documentation as one thing, not a collection of books, but one seamless whole, and they expect to find things by searching the entire doc set from one place.

If anyone in the tech pubs business thinks they are being innovative by moving from books to topics, I’m afraid that innovative is not the word for it. A little less behind the times than the rest would be a more honest assessment. Our readers are way ahead of us. They are already treating our documentation as if it were a collection of topics, and being disappointed when the first topic they land on does not meet their needs, does not work as their page one.

Every page is page one — this is a fact, not a choice. Making every page we write a good page one — that is a choice. And this, I believe, is the single most important thing we must learn to do as technical writers. To be sure, many trends are claiming our attention, and there are a dozen new things every year that we are told we absolutely must master if we hope to stay relevant. But in the end, we exist to give people the information they need to do their jobs, and the way to do that is to create content that works for them no matter how or where they find it. If we do everything else and forget this, we will fail. If we do this and forget everything else, I think we will still succeed.

We will still succeed, even if we do get everything else wrong, because the iron law of the Internet is that good content gets found. Thousands of recordings, mostly abysmal, are uploaded to YouTube every day. There is no sorting, no editing, no promotion. Yet the great ones go viral, usually within hours. Content that deserves to be found, gets found.

What then must we do? The answer, I am convinced, is that we must write every page as if it were page one. What does that mean? How do we write every page to be page one? That is apt to be a large subject. I don’t have all the answers, but I am exploring the subject on my blog, Every Page is Page One. I hope you will join in the discussion and the exploration, both here, and there.

Mark Baker has been a technical writer for over 20 years and has worked on topic based authoring, structured writing, single sourcing, and SGML/XML for almost as long, including a stint as Director of Communications for SGML pioneer OmniMark Technologies. He has designed and built topic-based authoring systems and has spoken and written frequently on topic based writing and the applications of markup technology. He is currently Senior Staff Technical Writer at Wind River Systems. He blogs at everypageispageone.com and tweets as @mbakeranalecta.



17 thoughts on “Every Page Is Page One”

  1. Jonatan Lundin

    The user behavior outlined above goes hand in hand with the minimalist view of the active user. A user is not interested in reading about the technical product, but in using it. When the user is stuck with the product, information is needed and the user starts searching.

    But to me it is important not to focus only on the single topic (making it page one). When the user is trying to find a topic, s/he is using a search system: a search engine, a faceted search environment, even a table of contents, etc. The search system is a layer on top of the topic content (like a topic map).

    The knowledge about how users are searching is vital when building search systems. In my research I have proposed a search process model for users of technical communication based on a model for school books presented by John Guthrie some 20 years ago (see figure 1 in http://www.sesam-info.net/ISTC%20Communicator%202010%20Winter%20JL.pdf).

    A user must first find an information artifact (web help system, community, manual, help system, web forum, etc.), then find a topic within the artifact, and then read to understand the topic and implement the new knowledge in the user’s situation. A well-built search system can help the user understand what the topic is all about.

    Google is a search system, as is the search engine within a web forum or a help system. So, as previously discussed on this blog, metadata is really important to classify and organize topics in order to build robust search systems.

    Many users have problems finding a topic in technical communication. One reason I have found is that the “design vocabulary” used by the technical communicator to build the search system doesn’t match the search vocabulary the user possesses. Creating topics without organizing them or adding metadata, assuming that users will find a topic anyhow, is to me not aligned with how it works out there. A writer may use a keyword for something that the user would never express, so the user will never find the topic by free-text search. For people searching YouTube the design and search vocabularies are, in many cases, almost 1:1, so it is easy to find things. But for an advanced industrial product, the design vocabulary can be very different from the search vocabulary.

    A previous post discussed multiple entry points. This is something I favor, and a faceted search environment is a good example of a multiple-entry-point search system. My favorite example of a multiple-entry-point system is the phone book. Let’s say that a company in the phone book corresponds to a topic. In a Swedish phone book companies are sorted in alphabetical order, but there are also alternative search systems: companies sorted by type of service (auto repair, hairdresser, etc.), companies on a specific street, and so on.

    Back in 2008 I made a proposal for DITA called “ditatoctemp”, which is a mechanism to allow writers to create a paper manual with multiple tables of contents (like in the phone book). I used a deck of cards to show how this would work. Each card corresponds to a topic, and a card has several pieces of metadata: color, figure/number, ace/heart, etc. Multiple tables of contents in a paper manual are like pre-selections within a faceted search environment.

    1. Marcia Johnston

      Jonatan,

      Your mention of multiple tables of contents for paper manuals reminds me of the seven classification tables from the hymnal in Tom’s March 24, 2011, blog entry. In a hymnal, I can imagine classification tables serving organists and choir directors well (if the facets match their needs).

      For many books, though, an old-fashioned back-of-the-book (BOTB) index would be even more useful than a set of classification tables. Classification tables are flat. Each one works on only one level of granularity: the level of its facets. A good BOTB index includes multiple classifications (entries with subentries) and other entries at all levels of granularity.

      Mark’s point about every page being page one seems more relevant to the Web than to books, but even in a book — at least the kind that technical communicators write — it’s useful to think of every page (or topic) as page one. All the more reason to create a good index.

      If you happen to be writing a book, a BOTB index is your ultimate multiple-entry-point search system.

    2. Mark Baker

      Hi Jonatan,

      I’m with you on multiple tables of contents. While I do think search is king, all navigation methods play a part, something I discuss here: http://bit.ly/j6pTdr.

      But I would make one important caveat. Those multiple TOCs should be generated from metadata, not built by hand (see the sketch after this list). This is important for a couple of reasons:

      1. Building navigation sequences by hand is time-consuming, and we are all short-staffed.

      2. It helps force you to create good metadata and attach it to objects that really merit it.
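
      A minimal sketch of what that generation could look like, in Python. The topic records and facet names here are hypothetical, not taken from any particular tool; the point is just that each TOC falls out of the metadata rather than being built by hand.

      topics = [
          {"title": "Installing on Linux", "product": "Server", "task": "install"},
          {"title": "Installing on Windows", "product": "Client", "task": "install"},
          {"title": "Configuring backups", "product": "Server", "task": "configure"},
      ]

      def toc_by(facet):
          # Group topic titles under each value of the given metadata facet.
          toc = {}
          for topic in topics:
              toc.setdefault(topic[facet], []).append(topic["title"])
          return toc

      # Two different tables of contents from the same metadata, none hand-built.
      print(toc_by("product"))
      print(toc_by("task"))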

      I would caution you about a couple of things, however. The first is that any study of user navigation behavior over the last twenty years (or the next twenty) should only be taken as a point on a curve. Users’ navigation strategies and expectations are developing rapidly, and if you base your design today on a study done five years ago, you will likely be completely out of date by the time you are finished.

      The second is this: your distinction between “design vocabulary” and “search vocabulary” is a useful one, but distinctions based on vocabulary don’t go far enough. Search engines are increasingly good at handling differences in vocabulary. Google automatically applies synonyms to search terms. But the difference between the reader and the writer can go much deeper than vocabulary. The difference that causes real problems is likely to be one of expectations, or “mental model”. If the reader’s mental model is different from that of the writer or the system designer, bridging the vocabulary gap is not going to fix the problem.
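
      To make the vocabulary point concrete, here is a minimal sketch of the kind of synonym expansion a search engine might apply; the synonym map and topic are hypothetical. It bridges a difference in wording, but it cannot bridge a mismatched mental model.

      # Map a query term to the synonyms the engine will also try.
      synonyms = {"remove": {"delete", "uninstall"}}

      topics = {"Uninstalling the client": {"uninstall", "client"}}

      def search(term):
          expanded = {term} | synonyms.get(term, set())
          return [title for title, words in topics.items() if expanded & words]

      print(search("remove"))  # finds the topic despite the different wording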

      1. Jonatan Lundin

        Hi,
        I disagree with Marcia that “classification tables are flat”. I believe it is perfectly fine to build a hierarchy of metadata to use for classifying topics (this is your taxonomy). Check out, for example, topicmaps (ISO/IEC 13250:2003) or the subject scheme in DITA 1.2.

        Establishing a taxonomy, which can be a separate XML file from the actual topics, and then using a “cross-reference” matrix to relate a topic to a node in the taxonomy is a powerful way of classifying.

        The BOTB index as I have understood it means that you must insert keywords into the topics, which makes such a system time-consuming to maintain, since if you need to alter a classification you sometimes need to go into several topics. Also, a taxonomy classification system, like topicmaps or the subject scheme, can express relations between keywords in a much more powerful way than the index “see also”. Furthermore, you cannot build faceted search environments from index keywords (has anyone tried that out?). A faceted search environment gives more possibilities than the traditional index; think of the user who wants to see all topics that meet metadata X AND Y (Boolean operator). This is not possible to achieve with an index of keywords.

        So I totally agree with Mark that a table of contents should of course not be built by hand, but generated from the taxonomy classification. But I do not agree with Mark that “It helps force you to create good metadata and attach it to objects that really merit it”, if attaching metadata means doing something with the actual topic. Hard-coding metadata into topics (either as system attributes or in the actual XML file) and then building navigation systems based on that metadata is, to me, not as powerful as having a separate taxonomy and cross-reference system. A separate taxonomy and cross-reference system allows you to alter the topic classification (adding new classifications or changing them) without actually touching the topics.
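
        A minimal sketch of that separate-taxonomy approach, assuming hypothetical topic IDs and node names. Renaming a taxonomy node touches the taxonomy alone; the topics and their classification are never opened.

        # The taxonomy lives outside the topics...
        taxonomy = {"comp-ab": "Component AB", "comp-ac": "Component AC"}

        # ...and a separate cross-reference matrix maps topics to taxonomy nodes.
        classification = {
            "topic-001": ["comp-ab"],
            "topic-002": ["comp-ab", "comp-ac"],
        }

        def topics_for(node_id):
            # All topics classified under the given taxonomy node.
            return [t for t, nodes in classification.items() if node_id in nodes]

        # Renaming a component is a one-line change; no topic file is touched.
        taxonomy["comp-ab"] = "Component BA"
        print(topics_for("comp-ab"))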

        As said before, building a taxonomy (most often you need several) that reflects what type of information end users are searching for, before you start creating topics, helps you determine which topics you need and also control the topic granularity. This is one of the core ideas behind the SeSAM concept.

        The search and design vocabularies are products of the mental model, so you could reverse the process and get (parts of) the mental model by studying the vocabulary. I agree that the fundamental task we as technical communicators are facing is to understand our audience, which incorporates their mental models. A search system that meets the users’ search vocabulary means that we have understood our users.

        1. Mark Baker

          Jonatan,

          You say:

          “Furthermore, you cannot build faceted search environments from index keywords (has anyone tried that out?). A faceted search environment gives more possibilities than the traditional index; think of the user who wants to see all topics that meet metadata X AND Y (Boolean operator). This is not possible to achieve with an index of keywords.”

          Certainly it is possible. You will find countless examples on the web. Suppose you are building a used car site. You can create a specific metadata schema, with fields for model, make, year, transmission type, etc., and then allow readers to select year=2004 AND style=convertible AND make=Miata. But alternatively you can just tag each ad with a set of keywords such as 2004, convertible, and Miata. The reader can then do a keyword search for convertible, for instance, and be given a list of all the tags that occur with “convertible”. They can then select “Miata”, and be shown a list of ads that contain the tags “convertible” and “Miata”, which might include things like “2004”, “1992”, “red”, “green”, “five-speed” and “automatic”.
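
          A minimal sketch of that narrowing over uncategorized keyword tags, in Python; the ads and tags are hypothetical.

          ads = {
              "ad-1": {"2004", "convertible", "Miata", "red"},
              "ad-2": {"1992", "convertible", "Miata", "automatic"},
              "ad-3": {"2004", "sedan", "green"},
          }

          def narrow(selected):
              # Ads carrying every selected tag, plus co-occurring tags to offer next.
              matches = {ad for ad, tags in ads.items() if selected <= tags}
              seen = set().union(*(ads[ad] for ad in matches)) if matches else set()
              return matches, seen - selected

          print(narrow({"convertible"}))           # suggests Miata, 2004, 1992, ...
          print(narrow({"convertible", "Miata"}))  # narrowed further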

          There are pluses and minuses to both approaches, but the big plus of uncategorized keywords is that they can be opened up so that both readers and writers can tag content. This does a huge amount to overcome the terminology gap that can exist with a structured, author-defined taxonomy, as well as allowing readers to tag facets that are important to them but which do not occur in the author’s metadata schema.

          Grand taxonomy schemes have been with us for a very long time, and they have never lived up to the hype. The problem is, as it has always been, that all language is local, and people classify things in ways that make sense in their own lives, based on their own histories and experiences. You can no more create one taxonomy for the world than you can create one novel or one song.

          On the subject of where metadata should be attached, the important thing is that the topic should merit its metadata, a subject I discuss here: http://everypageispageone.com/2011/04/17/topics-should-merit-their-metadata/. Merited metadata is stable metadata, so you should not worry too much about it changing.

          Externalizing metadata tends to occur when the metadata you are assigning to objects is not intrinsic to the object but is being used to manipulate it based on external criteria. The fundamental problem with that approach is its overhead, an overhead that grows disproportionately as the size of the collection grows.

          You certainly can organize content based on an external taxonomy, and that doubtless has great appeal to writers who are not yet ready to concede Weinberger’s point that the power to organize content has passed to the reader. But external taxonomies are expensive, problematic, and have a track record that does not inspire confidence.

          1. Jonatan Lundin

            Hi Mark,
            I still argue that building, for example, faceted search environments based on keywords in topics is something you probably should avoid. Let me clarify. What I mean by keywords in my discussion above is the approach where you mark up parts of your topic content to allow building an index. The marked-up content is captured in alphabetical lists when the topics are processed.

            Using DITA, the approach means using the <keyword> or <indexterm> elements inside a topic element, for example within “The product X can communicate data of type Y”. Here we are actually not talking about metadata, only marking up “key content” that the writer assumes is included in the user’s search vocabulary.

            Using this approach to manage metadata for building faceted search environments is to me not doable. Do you have any examples where this approach has been used to build faceted search interfaces? Your used car site is probably not built using this technique.

            This is not doable because there are two types of metadata perspectives. One perspective is metadata that the content treats, and the other is metadata that the content can be classified into but does not treat. Content-treating metadata is, for example, a topic saying “Product X is used to…” classified under product X. Content can also be classified into categories the content does not treat, for example the same “Product X is used to…” topic classified for the operator audience. Of course the last topic is not talking about the operator, only product X (DITA gives additional possibilities to insert keyword/indexterm elements in the topic prolog to handle “non-content-treating metadata”). When building search systems you often also want to include “non-content-treating metadata”.

            I don’t see a big difference between the approach where writers or readers can freely attach metadata to a topic as uncategorized keywords and the approach where the writer creates a separate taxonomy file to classify topics. The writer-created taxonomy can also be opened up for readers to allow them to add their own metadata, or readers can even create their own taxonomy file and classify. I see these two approaches as the same thing, where the difference is the technique of classifying.

            How do you distinguish the two approaches for a reader who is re-organizing information in an information viewer? “Do you want to add your own tags by inserting them into the topic or by creating a separate taxonomy?” The reader probably doesn’t care how the classification is done behind the scenes.

            From a modularized, topic-oriented perspective, hard-coding stuff into topics should be avoided; topics shall contain meaningful information. Relations between topics, metadata classifications, etc. shall occur elsewhere, outside the topic content. That’s why DITA has maps, relationship tables, subject schemes, etc. External taxonomy classification is to me very powerful, allowing efficient release management and so on. Hard-coding things sometimes means a heavy maintenance burden (but this is another discussion).

            The real challenge is to identify what type of metadata is needed, not the classification technique. For technical communicators managing content for a technical artifact, metadata must be derived from the user perspective and what type of information end users really need, as discussed in http://dita.xml.org/blog/how-do-you-determine-the-size-of-a-topic.

          2. Mark Baker

            Hi Jonatan,

            For faceted navigation based on uncategorized keywords, see: http://taggalaxy.de/. Start with an initial tag; it then shows you common tags associated with that tag. Choose one of those, and the search is narrowed and a new set of associated tags is shown. Keep narrowing your search until you have a manageable number of hits. I tried “Miata”, “Red”, and “Sunset”.

            The used car site is certainly using fielded metadata, which is to say, a conventional database table with one column for each metadata facet. Every time the user makes a choice, a new query is generated, specifying new values for certain fields, and a new set of choices is displayed.

            In either case, all that is going on is a parametric search of a database. All faceted search does, whether using fielded or unfielded metadata, is help the user build their search bit by bit by suggesting terms to them. (This is a big deal since it takes much of the terminology problem off the table by telling the user what terms are available to be searched on).

            The point, though, is that whether you use fielded or unfielded metadata does not make a lot of difference to your ability to build faceted search. Whether you choose fielded or unfielded metadata is going to be based largely on whether you can reasonably anticipate and constrain the range of facets that will be useful to your reader.
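
            For comparison with the tag example earlier in the thread, a minimal sketch of faceted search over fielded metadata; the records and field names are hypothetical. Each user choice adds a field=value constraint, and the remaining choices are recomputed.

            records = [
                {"make": "Miata", "year": "2004", "style": "convertible"},
                {"make": "Miata", "year": "1992", "style": "convertible"},
                {"make": "Civic", "year": "2004", "style": "sedan"},
            ]

            def search(constraints):
                # Keep records matching every field=value constraint chosen so far.
                hits = [r for r in records
                        if all(r[f] == v for f, v in constraints.items())]
                # For each still-open field, list the values the user can narrow by.
                choices = {f: sorted({r[f] for r in hits})
                           for f in ("make", "year", "style") if f not in constraints}
                return hits, choices

            print(search({"style": "convertible"}))
            print(search({"style": "convertible", "make": "Miata"}))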

            If I am reading you correctly, it sounds like you are assuming that the keywords that writers mark up in a topic must necessarily be unfielded. This is not the case. All of the referencing we do in our topics is fielded, and all of the index entries we attach to the topic are fielded. This has a utility far beyond enabling search. It also provides us with rich linking and auditing capability.

            Because early content management systems largely stored binary objects, such as Word files, and thus any metadata had to be stored externally, many seem to assume that a metadata record must inherently be something separate from the content it applies to. But this is not the case. An XML document is a database record: not a blob in a table, but a fully queryable record in its own right. It can, and should, carry its own metadata.
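
            A minimal sketch of that point: an XML topic that carries its own metadata and is queried directly. The element names are hypothetical, not a real DITA schema.

            import xml.etree.ElementTree as ET

            topic = ET.fromstring("""
            <topic id="topic-001">
              <metadata><component>Component AB</component></metadata>
              <title>Configuring Component AB</title>
              <body>Component AB is configured by...</body>
            </topic>
            """)

            # Query the document itself; no external metadata record is needed.
            print(topic.findtext("./metadata/component"))  # -> Component AB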

            If you have non-content-treating metadata that is intrinsic to the topic, therefore, that is an indication that your XML schema is inadequate.

            You are correct, of course, that the user does not care how the metadata is stored (though they would certainly care how they are asked to contribute to it, so getting them to build their own taxonomy is a non-starter).

            But from a content management perspective, you should care, because the approach you advocate suffers from one very big problem: overhead. Even the most die-hard DITA advocate will usually admit that DITA has a high content management overhead.

            There is no maintenance overhead in keeping intrinsic metadata in the content, because metadata that is intrinsic to the content expresses its real properties, which do not change unless its subject matter changes. Getting the distinction between intrinsic and extrinsic metadata right is absolutely essential, something I discuss here: http://everypageispageone.com/2011/04/17/topics-should-merit-their-metadata/

          3. Jonatan Lundin

            Hi Mark,
            Thanks for the link to the tag galaxy. I love a search system visualized as an image. In my PhD research, one hypothesis is that an image-based search interface will boost search performance over a text-based interface. But image-based systems, while fun in the beginning, tend to become boring in the long run, and sometimes their performance is bad.

            I wonder what architecture the tag galaxy is built on. Is it topicmaps? It is very similar to, for example, the Vizigator (http://www.ontopia.net/page.jsp?id=vizigator). The topicmap standard allows you to state relations between subjects (topics, entities, or whatever you call them; you can also type the relations), which is what you get in the tag galaxy.

            I agree that the technique for managing the classification (internal or external to the topic) is of academic interest compared to the challenge of finding the appropriate metadata. But when I started to modularize documentation some 15 years ago using SGML, I began to put metadata as SGML elements/attributes inside the topics.

            But now I have concluded that an external taxonomy schema is better from a maintenance point of view. The classical example is when you have a taxonomy of product components (component AB, component AC, component AD, etc.) and then classify 1000 topics under component AB by using the topic prolog (maintaining the classification within each topic’s XML file).

            Then later the component name “component AB” is changed to “component BA”. I only have to change the name once (1) in my external classification, but I have to alter 1000 topics when the data is internal. If it takes 1 minute to change one topic, I have wasted another 1000 minutes. What happened to the single-sourcing of metadata?

            But of course a modern CMS must allow you to search-and-replace, or use a key construct to solve this. Or keep the classification as system attributes, where the prolog is updated upon export from the system.

    1. Mark Baker

      Thanks Marcia,

      The problem with the concept of modular writing has always been that people have struggled with the appropriate granularity. Often they have broken things down so finely that the only way to make them comprehensible has been to stitch them back together into a book, effectively leaving both the reader and the writer back where they started.

      So the challenge is to figure out a way to express what a module of modular writing should really look like, and it does seem to me that the best answer is that it should look like page one.

  2. Larry Kunz

    I agree with Marcia. “Every page is page one” is a brilliant way of stating the truth about the way in which modern readers access content. Your point about Google winning over Yahoo is spot-on too.

    I disagree, however, with your assertion that “content that deserves to be found, gets found.” Would that it were so. While it’s nice to think that content always rises or sinks based on its merit, I don’t think that’s always true — any more than it’s true that the best politicians are always elected or the best products always gain the biggest market share. We probably have to settle for “content that deserves to be found, gets found most of the time.”

    1. Mark Baker

      Hi Larry,

      Fair enough. We can certainly do a lot to increase the findability of content besides simply making it worthy to be found. My point was simply that we should focus first on making it worthy to be found. If we don’t have time or energy for anything else, we still have a chance of succeeding if our content deserves to be found. But we can improve our chance of success greatly by taking additional measures to improve findability.

      On the other hand, if our content is not worth finding, increasing its findability is just a way of disappointing our customers faster.

      1. Larry Kunz

        Thanks, Mark. I completely agree that we should focus first on making our content worthy to be found, and that to do otherwise is to betray our customers. Well said!

  3. Erin

    Great article! I have stressed to our user guide creators (the trainers, strangely enough) that every section of the guide needs an intro (how did I get here and what am I supposed to accomplish?), a screenshot, and a list that outlines how to use it. It makes sense to think of every page as page one. Excellent way to put it. Maybe this will sink in with them.

  4. Jon

    I find this also very interesting in connection with the time I spent studying postmodernism and hypertext poetry. The concept is that exploring the poem through hyperlinked words allows for unique experiences for all readers. I am very interested in exploring such a pathway of exploration, one that allows for chaos and organization equally.

    1. Tom Johnson

      Jon, hyperlinking words in a poem sounds interesting. I don’t read as much poetry as I should, but when I do, I think of reading a paper book in a summer cottage somewhere, away from technology.

  5. Pingback: Where are the new ideas? | one man writes

Comments are closed.