Cassette tape with loose tape

Is dictating the new writing? Can speech-to-text engines help you talk your way to a novel?

Thanks to speech-to-text engines it’s now possible for anyone with a smartphone to become one of the great dictators. Not the testosterone-fuelled – my rocket is bigger than your rocket – kind of dictator. I’m talking here about writers who use dictation to speak their way to a story.

Dictation was once the preserve of writers like Barbara Cartland, who worked with an assistant sitting nearby, typing every utterance at a hundred words a minute. Improvements in speech-to-text technology mean that we can all dictate like Barbara. If you dream of writing a book while reclining on a sofa, then read on.

I first tried dictation using Windows Speech Recognition two years ago on a Windows 10 laptop. The experiment wasn’t particularly successful. It was easy enough to dictate at well over 2,000 words an hour, a massive improvement on my glacial keyboard-driven pace. But the quality was extremely disappointing. The error rate was high and the manual correction took much longer than expected. It felt like something that needed time and patience to improve on. Disappointed, I put the dream of greater productivity aside and returned to my keyboard.

A recent change in work routine means that I’m away from my desk more often. There are several periods of the day when I have time to write or record ideas about writing. But I don’t have my laptop with me. The obvious answer is to write using my smartphone’s keyboard. But typing on a small screen is a slow business. So I decided to give dictation another try.

My experiments with speech-to-text are still at an exploratory stage. I don’t want to commit money to it, so I’m sticking for now with freely available tools.

Dictation example using Google speech-to-text on an Android phone.

The text below is an unedited example of dictation using speech-to-text on my Android smartphone. I used earphones with a built-in microphone. The phone had a Wi-Fi internet connection throughout the test. I used the microphone button on the standard Android keyboard and dictated into a Google Docs text file.

The text is taken from chapter 14 of Crossing Live:

Roland woke to the sound of voices outside. He struggled out of bed and stumbled onto the Veranda, finding it hard to believe it was morning already full stop
Suzie, wearing a pair of very dark glasses, was standing on the footpath. She was talking to his elderly neighbour, Mrs Campbell. Roland ran back into the bedroom for his t-shirt. When he returned, Susie was getting into the front seat of a taxi. He called out to her as he ran down the steps but the car reversed onto the road and drove away full stop
Mrs Campbell was holding [hosing] the nature strip in front of her house. She heard Roland and turned to a wave, sending a jet of water onto a passing powawalker [power walker]. Roland chic Reed [she cried].
He realised he would have to go and speak to her.
Who is a dark horse then, Roland Kendall?
I’m sorry question mark
You and that lovely Susan Denning. I just introduced myself and we had a nice little chat. She seemed pleased about this. She held out a newspaper, folded open at the social pages full stop Prominent among the pictures of the usual social set at the week’s functions was one of Susie and Roland, pressed cheek to cheek.

In the text above, I’ve highlighted the errors in red and shown the correct word in square brackets. There were 8 errors in 203 words, an error rate of around 4%. Half of the errors were problems with punctuation. The remainder were understandable mistakes.
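For anyone who wants to track their own results, the arithmetic behind that figure is simple enough to sketch in a few lines of Python. The function name error_rate is just an illustrative label, not part of any dictation tool; the numbers are the ones from the test above.

```python
def error_rate(errors: int, words: int) -> float:
    """Return the percentage of dictated words that came out wrong."""
    if words <= 0:
        raise ValueError("words must be a positive number")
    return 100 * errors / words


# Figures from the test above: 8 errors in 203 words.
rate = error_rate(8, 203)
print(f"{rate:.1f}%")  # prints 3.9%
```

Counting errors by hand is still the slow part, but logging the rate after each session is an easy way to see whether practice is paying off.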

Overall, this is an extremely good result. In other tests, particularly my earliest attempts, the error rate was double this. Dictation quality also drops dramatically when there is no online connection. An offline experiment with the same text quoted above resulted in an error rate of more than 10%.

What I’ve learned about speech-to-text so far.

  • Speech-to-text works fine with my phone’s built-in microphone. But it works better when I use the microphone built into my earphones.
  • Speech-to-text works best when you have an online connection. The errors increase dramatically when you’re offline.
  • Clarity matters more than volume. Google speech-to-text is surprisingly good even when I whisper. The best results come when I speak at a normal volume and make an effort to move my lips. So, channel your inner Eliza Doolittle and think: “The rain in Spain stays mainly on the plain!”
  • Dictation takes practice. If, like me, you’re used to seeing the words on the screen as you type, talking blind can be unsettling. The best results came when I already had a few dot points as a guide to what I was going to say.
  • Google’s speech-to-text engine has limited success with punctuation. In online mode, it usually gets full stop and comma right. It also mostly understands the new paragraph instruction and even understands question mark. Offline, in my experience, it more often than not fails to get the punctuation right.
  • Understandably, names are a problem in dictation. It’s actually surprising when it gets a name right. Don’t get hung up on names and spelling. They can easily be corrected later.
  • You need to maintain a sense of humour and remember that this is actually a surprisingly accurate process. It’s far from perfect, but the fact that it works at all is amazing.
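
Half of the errors in the sample above were spoken punctuation commands that ended up as literal words ("full stop", "question mark"). If you're comfortable running a small script, that kind of slip can be tidied up in bulk. What follows is a minimal sketch in Python, not a feature of Google's engine; the names SPOKEN_PUNCTUATION and fix_spoken_punctuation are my own, and the list covers only the two commands that went wrong in my test.

```python
import re

# Spoken commands that appeared as literal words in the sample above.
# Extend this mapping with any others that trip up your own dictation.
SPOKEN_PUNCTUATION = {
    "full stop": ".",
    "question mark": "?",
}


def fix_spoken_punctuation(text: str) -> str:
    """Replace literal punctuation commands with the marks they stand for."""
    for phrase, mark in SPOKEN_PUNCTUATION.items():
        # Also swallow the space before the phrase, so the mark
        # attaches directly to the preceding word.
        pattern = rf"\s*\b{re.escape(phrase)}\b"
        text = re.sub(pattern, mark, text, flags=re.IGNORECASE)
    return text


print(fix_spoken_punctuation("it was morning already full stop"))
# prints: it was morning already.
```

One caveat: the script can't tell a misheard command from a sentence that genuinely contains the words "full stop", so it's worth skimming the result rather than trusting it blindly.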

Conclusion

Dictation using speech-to-text is a valuable addition to any writer’s toolbox. It’s possible to draft quickly with an acceptable level of errors. But it’s not a perfect process and it will require practice to get it right.

So far, I’m encouraged by the quality of the dictated text. I’m not ready yet to talk my way to an entire novel. But ideas and short scenes are already beginning to flow. I’m going to stick with it and hope that eventually the effort will pay off.