Figuring Out Search Algorithms [Organizing Content 10]

Series: Findability / organizing content

by Tom Johnson on May 27, 2010
categories: findability technical-writing

In my last post, I argued that navigation systems can't be entirely discarded in favor of search, because navigation helps users discover the unknown unknown. But now that we've covered navigation systems a bit, it's time to move on to search, because search is undoubtedly a major way that users navigate help content. How can you organize your content so that the topics are findable in search?

Notice that I said "organize." You're still organizing your content, but on a smaller level. Rather than organizing topics within groupings or folders, you're organizing the words within the topics. To make the topic visible, you have to organize the words in a way that maximizes visibility in the search.

But here is where things get confusing. Almost no one understands how search works. Google's search works differently from Flare's search. Flare's search works differently from WordPress's search. I'm guessing that the search in RoboHelp differs from the search in Author-it, which differs from the search in DITA-produced Eclipse help, and so on.

Each search engine has its own algorithm, which ranks results on a set of factors that goes well beyond simple keyword frequency. Here's the key point: you can't optimize for search until you understand the search algorithm you're optimizing for.

Google's search algorithm

Although Google's search algorithm is complex, it's widely accepted that Google's search results are based at least partly on the following factors:

  • The number of links pointing back to your site
  • The authority of the sites pointing back to your site
  • The text used in the links pointing back to your site
  • The location of the keywords the user is searching for, especially in the title and h1, h2, h3 tags
  • The frequency of the right keywords that the user is searching for
  • Your own site's PageRank

What about meta keywords? Not that important. You could stuff a ton of keywords into the header of your topic, but Google was gamed with that trick long ago, so it gives these keywords little weight. The genius of Google's search stems from the collective wisdom of hyperlinks.
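The "collective wisdom of hyperlinks" idea can be sketched with a simplified version of PageRank, the link-analysis method that Google's founders described publicly. This is a toy illustration, not Google's production ranking: the page names, the link graph, and the damping value of 0.85 are all assumptions made up for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank over a dict of page -> list of pages it links to."""
    # Collect every page that appears as a source or as a link target.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical link graph: three sites link to a help topic.
links = {
    "blog": ["help-topic"],
    "forum": ["help-topic"],
    "wiki": ["help-topic", "blog"],
    "help-topic": [],
}

ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(f"{page}: {ranks[page]:.3f}")
```

The help topic ends up with the highest score because the most pages point to it, and a link from a page that itself has inbound links (the blog) passes along more weight than a link from a page nobody links to. No amount of keyword stuffing on the help topic changes that number.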

WordPress's search algorithm

According to Lorelle Van Fossen, WordPress orders search results with the date as a major factor:

When you search in your WordPress blog, your search results are listed chronologically. Not by “most likely”, “most popular”, “most frequent use of the phrase”, or even alphabetically, just by date. And the chronological order runs from most recent to oldest. If the most likely post to provide the information the user is searching for is older, they will have to scroll towards the end of the list to find the most likely candidate for information. What are the odds they will, huh? This frustrates me no end.

Another frustration with WordPress searching is that it only searches posts. It does not search comments nor Pages. Only post content.

Notice that while links pointing to your content factor high in Google, these same backlinks are non-factors in the WordPress search algorithm. In WordPress, the date of the post seems to matter most.
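To see why date ordering frustrates searchers, here is a minimal sketch that sorts the same two matching posts in two ways: newest first, as the quote above describes, and by how often the search term appears. This is not WordPress's actual code (WordPress is written in PHP and queries a database). The posts, field names, and function names are invented for illustration.

```python
from datetime import date

# Hypothetical posts; titles, dates, and bodies are made up for the example.
posts = [
    {
        "title": "Complete guide to search",
        "date": date(2008, 3, 1),
        "body": "search search search tips for search",
    },
    {
        "title": "Weekend notes",
        "date": date(2010, 5, 20),
        "body": "I tried the search box once",
    },
]


def count_term(post, term):
    """Count how many times the term appears as a word in the post body."""
    return post["body"].lower().split().count(term.lower())


def chronological(posts, term):
    """Return matching posts, newest first, ignoring how relevant they are."""
    hits = [p for p in posts if count_term(p, term) > 0]
    return sorted(hits, key=lambda p: p["date"], reverse=True)


def by_frequency(posts, term):
    """Return matching posts, ordered by how often the term appears."""
    hits = [p for p in posts if count_term(p, term) > 0]
    return sorted(hits, key=lambda p: count_term(p, term), reverse=True)


print([p["title"] for p in chronological(posts, "search")])
print([p["title"] for p in by_frequency(posts, "search")])
```

With date ordering, the passing mention in the recent post outranks the older post that is actually about search. With frequency ordering, the order flips.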

MadCap Flare's search algorithm

Since Flare is the help authoring tool I'm currently using, I'll dive into its search algorithm in more depth. According to Rob Houser in "Flare Tip: How the full-text search works," Flare's search gives the most weight to the following three factors:

  • Exact matches
  • Frequency of the words
  • Location of the words

Rob explains,

Flare ranks exact matches of the term the user enters higher than partial matches. An exact match uses the same form of the term (for instance, an exact match for “deleting files” would be “deleting” or “files”). A partial match may share the same root word as the original search term, but the terms aren't exactly the same (for instance, the user might search for “deleting files” and the search would also find “delete”, “deletion”, “file”, and “filed”).

I'm not sure where Rob is getting his information, whether through experimentation with Flare or conversations with Flare developers. But let's assume Rob is right. There are still many questions unanswered, namely how to create an exact match. If the topic title doesn't provide the exact match, how do you create the match?
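To make the exact-versus-partial distinction concrete, here is a minimal sketch of a scorer in which exact word matches count more than partial ones and every occurrence adds to the total. It is not Flare's implementation. The weights are invented, and treating "shares the first four letters" as "shares the same root" is a crude stand-in that will produce false positives (for instance, "delegate" would partially match "deleting"). Word location is not modeled at all.

```python
import re

EXACT_WEIGHT = 3.0    # assumed weight, not a documented Flare value
PARTIAL_WEIGHT = 1.0  # assumed weight, not a documented Flare value
ROOT_LENGTH = 4       # crude stand-in for "shares the same root word"


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def score_topic(query, topic_text):
    """Sum a weight for every topic word that matches a query term."""
    query_terms = tokenize(query)
    query_roots = {term[:ROOT_LENGTH] for term in query_terms}
    score = 0.0
    for word in tokenize(topic_text):
        if word in query_terms:
            score += EXACT_WEIGHT      # same form of the term
        elif word[:ROOT_LENGTH] in query_roots:
            score += PARTIAL_WEIGHT    # different form, same leading letters
    return score


# Hypothetical topics for the search "deleting files".
topic_a = "Deleting files removes them. Select the files and press Delete."
topic_b = "Each file is filed by format."

print(score_topic("deleting files", topic_a))
print(score_topic("deleting files", topic_b))
```

The first topic scores far higher because it contains the exact forms "deleting" and "files" (and "files" twice), while the second only contains the related forms "file" and "filed."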

The Problem with Using Index Keywords

Flare doesn't have a section where you can stuff a file with meta keywords. But you can stuff a topic with index keywords. Index keywords are included in the search algorithm. (Concept keywords are excluded, by the way.)

However, using index keywords as meta keywords presents a dilemma. If exact matches factor highest in the search, you should add index keywords according to the phrases you think users will search for in the help, right? But the way users enter keywords in search boxes differs from the way users browse an index.

With Flare you're forced to make a decision about how you want to use the index keywords. If you use index keywords to beef up your search, your index will look random. If you choose index keywords for readability in a traditional index, you cripple the search.

Let me give an example. Suppose my topic is "Properly Delivering a Burn Notice." Users searching for this topic might enter the following phrases:

  • burn notice
  • deliver burn notice
  • drop off burn notices
  • burn notices sending
  • how to present a burn notice
  • cutting off undercover agents
  • severing ties with field agents
  • burn notice protocol
  • best way to handle burn notice
  • give burn notice to operatives

Because exact matches factor the highest in Flare, it's important to include these phrases as index keywords, and to insert those index keywords into the highest ranking location in the topic -- that is, inside the h1 tags of the topic title.

But if I'm going to also produce an index in the help file or in the printed guide, the keywords and phrases need to be written and ordered differently. A typical string of index keywords might look like this:

  • burn notices, delivering
  • protocol, for burn notices
  • field agents, delivering burn notices
  • operatives, terminating
  • tactics, presenting burn notices

When you browse an index, you look for the main word first, which is usually a noun. Users don't usually browse to "give" or "deliver" when looking for a topic in an index.

But when users search, they usually do include verbs to lead off the phrase. I tend to search for exactly what I'm trying to do. If I want to download an mp3 file of a James Bond soundtrack, I search for "download mp3 James Bond." If I want to figure out how to stop water leaking through my window well cracks, I search for "stop leaks to window wells." I lead with the verb.

Stems?

Another question is how the search algorithm handles stems. Exact matches factor highly, so when I have a phrase like "drop off burn notices," do I need to add "drops off burn notices," "dropping off burn notices," "dropped off burn notice," and so on for each of these search keyword phrases?

I wrote to MadCap to ask them for this information, because it's not in the help file. They replied that the search will automatically look for partial matches, so if I just include "drop off burn notices," Flare will look for partial matches with the other verb forms. But an exact match still trumps a partial match in search engine weight.
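Here is a rough sketch of how one stored keyword phrase could cover the other verb forms through stem matching. The suffix-stripping rules below are deliberately naive and are my own assumption. MadCap did not tell me which stemming method Flare uses, and a real stemmer handles many more cases.

```python
def crude_stem(word):
    """Strip a common English suffix; a naive stand-in for a real stemmer."""
    word = word.lower()
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            stem = word[: -len(suffix)]
            # Undo consonant doubling, so "dropp" becomes "drop".
            if len(stem) >= 2 and stem[-1] == stem[-2] and stem[-1] not in "aeiou":
                stem = stem[:-1]
            return stem
    return word


def match_type(query, keyword_phrase):
    """Classify a query against one stored keyword phrase."""
    query_words = query.lower().split()
    keyword_words = keyword_phrase.lower().split()
    if query_words == keyword_words:
        return "exact"
    if [crude_stem(w) for w in query_words] == [crude_stem(w) for w in keyword_words]:
        return "partial"
    return "none"


keyword = "drop off burn notices"
for query in (
    "drop off burn notices",
    "drops off burn notices",
    "dropping off burn notices",
    "dropped off burn notice",
    "deliver burn notice",
):
    print(f"{query!r}: {match_type(query, keyword)}")
```

Only the first query is an exact match. The next three reduce to the same stems as the stored phrase, so they count as partial matches without being added as separate keywords. The last query uses a different verb entirely, which is why "deliver burn notice" would still need its own keyword phrase.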

What? It's not in the help!

I felt a little bewildered that Flare's help, which is comprehensive and complete in almost every way, had nothing to say about one of the most important elements of help authoring: how to influence the search to increase topic rank.

And then I realized something ironic. In the application I'm documenting at work, search is a key feature, included in four separate places in the interface. And yet beyond keyword matches, I have no idea how the search algorithm works.
