One question I've often wondered about is how technical writers can feel so comfortable stringing words together on paper yet be completely inadequate when it comes to vocalizing the same text in a script, such as reading for a video tutorial.
We talk often about voice and tone, and consider ourselves, as writers, experts in understanding how to communicate. We can adopt different styles in our writing -- friendly or formal, long-winded or brief, plain or "official." We have a keen sense about words, even what's communicated between the lines, and how we communicate by tone without being explicit.
Why is it, then, that in front of a microphone, we often freeze up and sound like robots, or become stiff as if reading at gunpoint? Our breathing quickens or becomes stilted, we stumble on familiar words, we don't sound like ourselves, and before we know it, the whole delivery is awkward -- even if we wrote the script ourselves.
The question is even more poignant when you consider how pervasive speech is. When we write, we may think we're not playing in the realm of speech. But that's not the case. Each word I type, I can hear myself saying it in my head. When I read, I can also hear it in my head. Not always -- for example, not so much when I skim and skip around. But writing is not merely confined to paper and processed without speech. We hear the words in our minds, as if someone were speaking them.
How can we write all day, listening to the words in our heads, but not be able to vocalize them with exactly the same stresses and tones in which we wrote them? Writers should be excellent speakers. Why isn't that the case?
A few weeks ago, I listened to an interesting STC Summit presentation about speech act theory. In it, Jürgen Muthig, a professor in communication and media management at a German university, explores speech act theory and its practical applications for tech comm.
According to Muthig, a sentence can have more than one meaning depending on how you say the sentence. The act of speaking it gives rise to a variety of meanings. In some cases, the vocalizing of words is an act in itself (such as with a pronouncement, blessing, or spell).
Muthig offers an example: take the sentence, The door is open. In writing, there are only so many meanings one might derive from it. But if you could hear me say it 12 different ways, with different inflections, pauses, and stresses, you would hear the many meanings this sentence might have.
The door is open. [Statement of fact.]
The door is open? [A question.]
The door... is open. [Inviting you to leave.]
The door is open! [Mad that the door is open.]
The door is open. [A courteous gesture to return at any time to my office.]
The door is open. [Perhaps a correction to someone who said, I know the door is shut.]
The door is open. [Perhaps in response to someone saying, It appears we have just been robbed.]
The door is open. [In response to someone saying, There's a draft -- is the window open?]
The door is open. [In response to someone asserting that both doors are open. No, there's just one door.]
The door ... is open. :) [In response to someone saying, Where's the dog? Where's my $2,000 pet bulldog terrier?]
The door is open! [In response to someone knocking, while you're washing dishes.]
The door is open! [In response to someone knocking, directed to your spouse or kids because you're in the shower.]
Since it's hard to convey in writing all of the meanings you can express through tone alone, I've created a simple audio file demonstrating the various meanings.
Although you can guess some of the meanings of the written sentence from the context and any font styling and punctuation, you're still somewhat limited in writing. You rely on the context of other sentences around it to gauge the meaning.
Muthig applies linguistics and speech act theory toward another end: He explains how to minimize the number of meanings to provide more straightforward speech across cultures and languages. However, my purpose with this post is to explore differences in speech and writing, and how those differences affect video tutorial scripts, since video tutorials are one common scenario in which technical writers engage in speech.
When we read, we try to infuse the copy with intonations, inflections, stresses, pauses, and so on. Each of these vocal contributions can influence the meaning of what we say. Try ending a sentence with an upward inflection rather than a downward one -- it changes the meaning. Add a pregnant pause somewhere -- it changes the meaning. Read it without any inflection at all -- it changes the meaning.
Thus when we read a sentence, we not only read the words, we make a thousand unconscious decisions about the tone, style, inflection, pauses, stresses, rhythm, and other aspects of a text. Each of these vocal adjustments adds to the meaning of the text.
This is partly why voiceover actors will mark up a script they're reading. If you leave all of these pauses and inflections to be decided as you're reading, it's likely that you'll choose the wrong one, or skip over an inflection you meant to include. Or you may find yourself adding speech emphases that you didn't intend, thus changing the meaning as well. This is why it's so hard to read a script. You're suddenly faced with dozens of unconscious tools that you can use to change the meaning of the sentence.
Since intonation is key to reading a script, you want to be sure you read it the right way. How can you avoid inflecting weirdly, or in an off-beat kind of way? It's quite hard. Read the above paragraph out loud, and listen to all your inflections. Where did you learn to inflect the way you did? Did you think about inflection? Or like breathing, did you just read it in a way that felt natural? If you're like most people, you didn't consciously consider the intonations and inflections. It's just something that you learn and imitate unconsciously.
In Vernacular Eloquence, Peter Elbow says that we internalize the rules of spoken language as early as four years of age:
"[Spoken language is] a language with complex and intricate grammatical rules -- rules that we tend to master by around age four and usually obey without any awareness of them. This is the complex language that comes out of our mouths without planning when we have a thought or feeling to share" (14).
In short, spoken language allows you to create myriad meanings based on your performance of the script. These performances follow a set of unconscious oral rules that we've learned since childhood but never consciously examined.
If we do begin to examine the unconscious rules of spoken language, the words themselves begin to appear odd. I'm sure you've noticed this when you look at one word too long. It starts to look foreign. This is because when we isolate a word, our examination removes the word from its context -- and removed from context, language has little meaning.
Now that I've detailed the difficulty of speech, and why it poses challenges to writers who are intimately familiar with words, I'd like to get to a more practical point: How do you know when and how to intone, and how can you freshen up a script by inflecting, emphasizing, and pausing at the right times?
Imagine you're explaining the concept of intonation to an extraterrestrial. What rules govern when we say a word? Who knows!
Typically, sentences that aren't questions end in a downward tone, while questions end in an upward inflection. But sentences aren't a single upward or downward hill; there are lots of rising and falling tones throughout the sentence (this pattern is called "intonation"). Children learn to interpret some meaning through the tone alone, before understanding the specific meaning of the words.
When we set off a noun with an appositive, renaming it, we drop our tone to more of a whisper, because appositives aren't usually emphasized.
An introductory phrase also doesn't receive the emphasis of the sentence, so it is said with less force. When the introductory clause shifts to the main clause, the actor and action of the main clause receive the major emphasis.
The brain is sophisticated enough to unconsciously recognize many parts of speech and process them differently. It processes language according to a consistent set of intonation rules -- rules that our conscious mind doesn't focus on because doing so would require too much thought and decision-making.
I suspect that one paragraph may have so many shifts in tone that one would have to be an incredibly patient, bored linguist to classify them all -- and to what end? After all this cataloging, are there any rules that we end up with? A general theory of intonation?
I once knew a poetry teacher who was reputed to read anything aloud so well that he could make a phone book sound beautiful. This is one thing that's fascinating about speech over writing -- the way you read a text can freshen up the content, even convert clichés into original-sounding prose. In the reading, you can choose a non-standard yet thoughtful and evocative intonation. And with video tutorial scripts, skillful readers can avoid predictable intonation patterns to provide a fresher, more interesting delivery.
For a practitioner like me, all theory eventually leads to some practical application, which is where I'm now headed. First, let me reiterate what I've been arguing: just because you write a script doesn't mean you've locked down the script's meaning, or even that you're ready to read it, because as you read it, you'll be infusing the sentences with additional meaning through every little inflection and intonation.
Therefore it's important to plan out the way you'll read the script -- to decide just what words you will emphasize, how long you will pause, and so forth. This will help your recording sound right. Voiceover artists call this task "woodshedding" a script. To bring the many unconscious inflections to the surface, annotate the sentences with formatting that describes and defines the vocal adjustments in each sentence.
With software video tutorials, it's quite difficult to perform a script while also driving a mouse to illustrate different tasks in an application. It's better to separate the simulation from the voiceover performance so you can devote more attention to the voice of the script.
The problem is that by separating the two, you compromise the timing between the simulation and the script. One workaround is to record the simulation while reading the script, but not to focus on the script much at all. Instead, focus on getting the simulation right. Then after the recording finishes, dub over the recording with a new voiceover reading. With your re-dub, sync your newly spoken sentences to the timing of the original. This allows you to keep the timing while separating the actions from the performance of the script.
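As a sketch of that final dubbing step, here's one way you might swap the scratch narration for the polished voiceover using ffmpeg. The file names are hypothetical, and this assumes your re-recorded narration already matches the original timing:

```shell
# Replace the screencast's scratch audio with the new voiceover.
# The video stream is copied untouched, so the on-screen timing
# of the simulation is preserved exactly as recorded.
# File names are placeholders.
ffmpeg -i screencast.mp4 -i voiceover.wav \
  -map 0:v:0 -map 1:a:0 \
  -c:v copy -c:a aac \
  -shortest tutorial-final.mp4
```

Because the video is copied rather than re-encoded, only the narration changes; any fine-grained sync adjustments still happen when you record the new voiceover against the original playback.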
Overall, I'm fascinated by the differences between speech and writing, so you'll probably see this topic surface on my blog numerous times in the coming weeks. I've just started reading Peter Elbow's Vernacular Eloquence, and it's fascinating.