Figuring Out Search Algorithms [Organizing Content 10]

In my last post, I argued that navigation systems can’t be entirely discarded in favor of search, because navigation helps users discover the unknown unknown. But now that we’ve covered navigation systems a bit, it’s time to move on to search, because search is undoubtedly a major way that users navigate help content. How can you organize your content so that the topics are findable in search?

Notice that I said “organize.” You’re still organizing your content, but on a smaller level. Rather than organizing topics within groupings or folders, you’re organizing the words within the topics. To make the topic visible, you have to organize the words in a way that maximizes visibility in the search.

But here is where things get confusing. Almost no one understands how search works. Google’s search works differently from Flare’s search. Flare’s search works differently from WordPress’s search. I’m guessing that the search in RoboHelp differs from the search in Author-it, which differs from the search in DITA-produced Eclipse help, and so on.

Each search engine has its own algorithm that sorts and ranks results using many variables, not just simple keyword frequency. Here’s the key point: you can’t optimize for search until you understand the search algorithm you’re optimizing for.

Google’s search algorithm

Although Google’s search algorithm is complex and unpublished, it’s widely accepted that Google’s search results depend at least in part on the following factors:

  • The number of links pointing back to your site
  • The authority of the sites pointing back to your site
  • The text used in the links pointing back to your site
  • The location of the keywords the user is searching for, especially in the title and h1, h2, h3 tags
  • The frequency of the right keywords that the user is searching for
  • Your own site’s PageRank

What about meta keywords? Not that important. You could stuff a ton of keywords into the header of your topic, but Google was gamed with that trick long ago, so it gives these keywords little weight. The genius of Google’s search stems from the collective wisdom of hyperlinks.

WordPress’ search algorithm

According to Lorelle Van Fossen, WordPress orders search results with the date as a major factor:

When you search in your WordPress blog, your search results are listed chronologically. Not by “most likely”, “most popular”, “most frequent use of the phrase”, or even alphabetically, just by date. And the chronological order runs from most recent to oldest. If the most likely post to provide the information the user is searching for is older, they will have to scroll towards the end of the list to find the most likely candidate for information. What are the odds they will, huh? This frustrates me no end.

Another frustration with WordPress searching is that it only searches posts. It does not search comments nor Pages. Only post content.

Notice that while links pointing to your content weigh heavily in Google, these same backlinks are non-factors in the WordPress search algorithm. In WordPress, the date of the post seems to matter most.
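To make the date-first behavior concrete, here is a minimal sketch in Python. It is not WordPress code. The posts, field names, and the date_ordered_search function are all hypothetical, and it only illustrates what Lorelle describes: any post containing the term is a hit, and hits are listed newest first regardless of relevance.

```python
from datetime import date

# Hypothetical posts; the titles, bodies, and dates are made up for illustration.
posts = [
    {"title": "Search tips", "body": "How search ranks help topics",
     "published": date(2008, 3, 1)},
    {"title": "Weekend notes", "body": "A passing mention of search",
     "published": date(2010, 6, 15)},
    {"title": "Index design", "body": "Nothing relevant here",
     "published": date(2009, 1, 10)},
]


def date_ordered_search(posts, term):
    """Return posts that contain the term, newest first, ignoring relevance."""
    term = term.lower()
    hits = [
        p for p in posts
        if term in p["title"].lower() or term in p["body"].lower()
    ]
    # The only ranking signal is the publication date.
    return sorted(hits, key=lambda p: p["published"], reverse=True)


results = date_ordered_search(posts, "search")
titles = [p["title"] for p in results]
for title in titles:
    print(title)
```

In this sketch the post that only mentions search in passing lands above the post that is actually about search, simply because it is newer.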

MadCap Flare’s search algorithm

Since Flare is the help authoring tool I’m currently using, I’ll dive into its search algorithm in more depth. According to Rob Houser in Flare Tip: How the full-text search works, Flare’s search gives the most weight to the following three factors:

  • Exact matches
  • Frequency of the words
  • Location of the words

Rob explains,

Flare ranks exact matches of the term the user enters higher than partial matches. An exact match uses the same form of the term (for instance, an exact match for “deleting files” would be “deleting” or “files”). A partial match may share the same root word as the original search term, but the terms aren’t exactly the same (for instance, the user might search for “deleting files” and the search would also find “delete”, “deletion”, “file”, and “filed”).
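Here is a rough sketch, in Python, of how the three factors Rob lists could combine into a rank. This is not Flare’s actual algorithm. The weights, the crude naive_stem function, and the two sample topics are all assumptions made up for illustration. The idea is only that an exact word match counts for more than a match on the root, that every occurrence adds to the score, and that a match in the title counts for more than one in the body.

```python
import re

# Assumed weights; Flare's real values are not documented.
EXACT_WEIGHT = 2.0    # same form of the word
PARTIAL_WEIGHT = 1.0  # same root, different form
TITLE_BONUS = 3.0     # multiplier for matches found in the title

SUFFIXES = ("ing", "ion", "ed", "es", "s", "e")


def naive_stem(word):
    """Crude suffix stripping, a stand-in for whatever stemming Flare uses."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(topic, query):
    """Add up exact and partial matches, weighting the title more heavily."""
    query_terms = tokenize(query)
    total = 0.0
    for location, multiplier in (("title", TITLE_BONUS), ("body", 1.0)):
        for word in tokenize(topic[location]):
            for term in query_terms:
                if word == term:
                    total += EXACT_WEIGHT * multiplier
                elif naive_stem(word) == naive_stem(term):
                    total += PARTIAL_WEIGHT * multiplier
    return total


# Hypothetical topics for illustration.
topic_a = {"title": "Deleting files",
           "body": "Deleting files removes them from the project."}
topic_b = {"title": "File deletion",
           "body": "To delete a file, select it and press Delete."}

query = "deleting files"
ranked = sorted([topic_a, topic_b], key=lambda t: score(t, query), reverse=True)
for topic in ranked:
    print(topic["title"], score(topic, query))
```

Under these assumed weights, the topic that uses the same word forms as the query outranks the topic that only shares the roots, even though both are about the same task.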

I’m not sure where Rob is getting his information, whether through experimentation with Flare or conversations with Flare developers. But let’s assume Rob is right. There are still many questions unanswered, namely how to create an exact match. If the topic title doesn’t provide the exact match, how do you create the match?

The Problem with Using Index Keywords

Flare doesn’t have a section where you can stuff a file with meta keywords. But you can stuff a topic with index keywords. Index keywords are included in the search algorithm. (Concept keywords are excluded, by the way.)

However, using index keywords as meta keywords presents a dilemma. If exact matches factor highest in the search, you should add index keywords according to the phrases you think users will search for in the help, right? But the way users enter keywords in search boxes differs from the way users browse an index.

With Flare you’re forced to make a decision about how you want to use the index keywords. If you use index keywords to beef up your search, your index will look random. If you choose index keywords for readability in a traditional index, you cripple the search.

Let me give an example. Suppose my topic is “Properly Delivering a Burn Notice.” Users searching for this topic might enter the following phrases:

  • burn notice
  • deliver burn notice
  • drop off burn notices
  • burn notices sending
  • how to present a burn notice
  • cutting off undercover agents
  • severing ties with field agents
  • burn notice protocol
  • best way to handle burn notice
  • give burn notice to operatives

Because exact matches factor the highest in Flare, it’s important to include these phrases as index keywords, and to insert those index keywords into the highest ranking location in the topic — that is, inside the h1 tags of the topic title.

But if I’m going to also produce an index in the help file or in the printed guide, the keywords and phrases need to be written and ordered differently. A typical string of index keywords might look like this:

  • burn notices, delivering
  • protocol, for burn notices
  • field agents, delivering burn notices
  • operatives, terminating
  • tactics, presenting burn notices

When you browse an index, you look for the main word first, which is usually a noun. Users don’t usually browse to “give” or “deliver” when looking for a topic in an index.

But when users search, they usually do include verbs to lead off the phrase. I tend to search for exactly what I’m trying to do. If I want to download an mp3 file of a James Bond soundtrack, I search for “download mp3 James Bond.” If I want to figure out how to stop water leaking through my window well cracks, I search for “stop leaks to window wells.” I lead with the verb.

Stems?

Another question is how the search algorithm handles stems. Exact matches factor highly, so when I have a phrase like “drop off burn notices,” do I need to add “drops off burn notices,” “dropping off burn notices,” “dropped off burn notices,” and so on for each of these search keyword phrases?

I wrote to MadCap to ask for this information, because it’s not in the help file. They replied that the search automatically looks for partial matches, so if I just include “drop off burn notices,” Flare will also find the other verb forms as partial matches. But an exact match still outranks a partial match.
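A small Python sketch shows why a single keyword phrase can cover the other verb forms. Again, this is not Flare’s stemmer. The naive_stem and classify functions are hypothetical, and the suffix rules are a deliberately crude assumption. The point is only that once each word is reduced to a root, the variants collapse to the same phrase and count as partial matches, while only the identical phrase counts as exact.

```python
SUFFIXES = ("ing", "ed", "es", "s", "e")


def naive_stem(word):
    """Crude suffix stripping, an assumed stand-in for real stemming."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            stem = word[: -len(suffix)]
            # Undo consonant doubling, so "dropp" (from "dropping") becomes "drop".
            if len(stem) >= 2 and stem[-1] == stem[-2] and stem[-1] not in "aeiou":
                stem = stem[:-1]
            return stem
    return word


def stem_phrase(phrase):
    """Reduce every word in the phrase to its root."""
    return tuple(naive_stem(word) for word in phrase.lower().split())


def classify(query, keyword_phrase):
    """Label a query as an exact, partial, or non-match for one keyword phrase."""
    if query.lower() == keyword_phrase.lower():
        return "exact"
    if stem_phrase(query) == stem_phrase(keyword_phrase):
        return "partial"
    return "none"


keyword_phrase = "drop off burn notices"
queries = [
    "drop off burn notices",
    "drops off burn notices",
    "dropping off burn notices",
    "dropped off burn notice",
    "deliver burn notice",
]
for query in queries:
    print(query, "->", classify(query, keyword_phrase))
```

If the search really works this way, adding every verb form as its own keyword phrase only pays off when you want each form to rank as an exact match rather than a partial one.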

What? It’s not in the help!

I felt a little bewildered that Flare’s help, which is comprehensive and complete in almost every way, had nothing to say about one of the most important elements of help authoring: how to influence the search to increase topic rank.

And then I realized something ironic. In the application I’m documenting at work, search is a key feature, included in four separate places in the interface. And yet beyond keyword matches, I have no idea how the search algorithm works.

By Tom Johnson

I'm a technical writer working for The 41st Parameter in San Jose, California. I'm primarily interested in topics related to technical writing, such as visual communication (video tutorials, illustrations), findability (organization, information architecture), API documentation (code examples, programming), and web publishing (web platforms, interactivity) -- pretty much everything related to technical writing. If you're trying to keep up to date about the field of technical communication, subscribe to my blog either by RSS or by email. To learn more about me, see my About page. You can also contact me if you have questions.

3 thoughts on “Figuring Out Search Algorithms [Organizing Content 10]”

  1. Leo Paoletti

    Tom,
    Enjoyed this series, especially this topic covering Flare and search.

    I notice you and Rob Houser haven’t discussed the impact of synonyms. Here’s a snippet from the Flare 6 Help:

    Creating Synonyms to Enhance Search Results
    (This feature is supported in DotNet Help, WebHelp, WebHelp Plus, WebHelp AIR, and WebHelp Mobile output.)

    You can make improvements to your output so that, in the future, users are able to find the search results they need. One way to enhance your output is to create synonyms for search phrases. This way, even if a user enters a search term that is not included anywhere in the project, that person will still be able to find the appropriate information.
    ==========================

    I’d really like to hear from MadCap on search topic ranking.
    Thanks,
    Leo

    1. Tom Johnson

      Leo, good point about the synonyms. I was planning to address this in the post but never did. Even including misspellings in the synonym file can be helpful.

      MadCap said they would work on a KB article sometime this summer that clarifies the search topic ranking more. So just be on the lookout for that.

  2. Pingback: Search Engine Optimizing Your Help Content for Google [Organizing Content 11] | I'd Rather Be Writing