Reporter's Notebook

The art and science of the interview

How to Transcribe


I don’t have time for this post, mostly because I have about a dozen interview transcripts I need to create. But it is the perfect opportunity to talk about transcribing with Dragon NaturallySpeaking v12 (DNSv12), as well as different techniques for creating transcripts. I hope it helps you. I’ve talked about the need to create transcripts before, but this post is about actually creating them.

Dragon NaturallySpeaking v12 works in two ways that I find helpful to different degrees. First, it reads text aloud, which, as you’ll know if you’ve visited my state constitutions website, is critical for voicing impossibly long text documents. It recognizes punctuation, knows when sentences begin and end, and uses rudimentary but adequate accenting and emphasis for words and sentences. This it does very well. The other thing I need it for, transcribing, it doesn’t do as intuitively. I tweeted DNS and suggested that their next version include a timed transcription feature. Their response seemed less than personal and a little rushed. I guess they’re really busy.

I’ll be using an interview I just did with a political candidate as the example.  Candidate Eric Squires is running for HD 26 in Oregon.

DNSv12 lets you drop .wav audio files (not .mp3) into a box called “DragonPad” that turns the audio file into words. The problem is that the software rarely hears punctuation, so it doesn’t know when to end sentences, and it doesn’t capitalize words at the start of sentences. Here is the intro to the interview with Mr. Squires as the software heard it:

“I’m Don Merrill and I’m talking with Eric squires and squires is a Democrat who is running for the seat in House District 26 the district includes Wilsonville Sherwood Kings table Mountain and Greenville squires think very much for talking thanks Raven here.”

This is how the text should read:

“I’m Don Merrill and I’m talking with Eric Squires.  Mr. Squires is a Democrat who is running for the seat in House District 26.  The District includes Wilsonville, Sherwood, Kings City, Bull Mountain and Greenville.  Mr. Squires, thank you very much for talking with me.  Thanks for having me here.”

You may have noticed something else the software doesn’t do: it doesn’t put each speaker on their own line. To fix that, along with the other things it misses, you can go through the text manually after the software has processed the audio. But that means you also have to listen to the audio as you read the text so you can fix it as you go. Otherwise, you’ll miss words that are in the audio but that the software left out of the text. If you run the entire audio file through the software, you have to count the total length of the audio you’re listening to and add the time it takes to make corrections to the text. In the end, you might spend twice the length of the recording, or longer, creating an accurate transcript.
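To get a feel for how much of the machine text needs fixing, the two versions of the intro above can be compared word by word. What follows is only a rough sketch in Python using the standard difflib module; it has nothing to do with Dragon itself, and the names RAW, CORRECTED, words and word_changes are made up for this illustration. It ignores punctuation and capitalization and lists only the places where the words themselves differ.

```python
import difflib
import re

# The intro as the software heard it, and as it should read (from the post).
RAW = ("I'm Don Merrill and I'm talking with Eric squires and squires is a "
       "Democrat who is running for the seat in House District 26 the district "
       "includes Wilsonville Sherwood Kings table Mountain and Greenville "
       "squires think very much for talking thanks Raven here")

CORRECTED = ("I'm Don Merrill and I'm talking with Eric Squires. Mr. Squires is a "
             "Democrat who is running for the seat in House District 26. The District "
             "includes Wilsonville, Sherwood, Kings City, Bull Mountain and Greenville. "
             "Mr. Squires, thank you very much for talking with me. "
             "Thanks for having me here.")


def words(text):
    """Lowercase the text and keep only the words, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_changes(raw, corrected):
    """Return (tag, heard_words, correct_words) for every stretch that differs."""
    a, b = words(raw), words(corrected)
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return [(tag, a[i1:i2], b[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]


if __name__ == "__main__":
    for tag, heard, correct in word_changes(RAW, CORRECTED):
        print(f"{tag:8} heard={' '.join(heard)!r} should be={' '.join(correct)!r}")
```

Every line it prints is a spot where you would have to stop, listen again and retype, and that is before adding any punctuation, capitals or speaker breaks.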

There is another way to transcribe your audio. If you use DNSv12, it can recognize spoken commands like punctuation, line changes, proper nouns, etc.  And once you set it up on your PC with a microphone, you can repeat the interview into the microphone and DNSv12 will create an accurate transcript.  But there is a trick to this.

The audio you want to transcribe has to be on another playback device, like a microcassette player or digital recorder. Put on headphones from that playback device and start the DNSv12 software on your PC. Repeat (in your own voice) into the DNSv12 microphone what you hear through your headphones. And when you speak, remember that you have to speak in a very specific way that the DNSv12 software can understand. Let’s go back to the Eric Squires intro. When speaking into the microphone, this is how you have to speak to get an accurate transcript like the one following the example below:

DM
COMMAND: colon
COMMAND: space
I’m Don Merrill
COMMAND: comma
and I’m talking with Eric squires
COMMAND: period
COMMAND: space
Mr Squires is a Democrat who is running for the seat in House District 26
COMMAND: period
The District includes Wilsonville
COMMAND: comma
Sherwood
COMMAND: comma
Kings City
COMMAND: comma
Bull Mountain and Greenville
COMMAND: period
Mr Squires
COMMAND: comma
thank you very much for talking with me
COMMAND: period
COMMAND: new line
ES
COMMAND: colon
COMMAND: space
Thanks for having me here
COMMAND: period

NOTE: The word “Command” is only there to tell you a command is necessary for the software. Don’t say the word “Command”.

And this is how it should look:
DM: “I’m Don Merrill, and I’m talking with Eric Squires.  Mr. Squires is a Democrat who is running for the seat in House District 26.  The District includes Wilsonville, Sherwood, Kings City, Bull Mountain and Greenville.  Mr. Squires, thank you very much for talking with me.”
ES: “Thanks for having me here.”
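If it helps to see the idea written out, the dictation above can be thought of as a list of things you say, where some items are plain words and some are commands that turn into punctuation or a line break. The Python sketch below is a toy model of that idea only. It is not how Dragon works internally and does not use any Dragon interface; the names COMMANDS, SCRIPT and render are invented for the example, the speaker initials are spoken as ordinary words, and quotation marks are left out.

```python
# Toy model: spoken commands mapped to the characters they should produce.
# This mapping is an assumption for illustration, not Dragon's command list.
COMMANDS = {
    "colon": ":",
    "comma": ",",
    "period": ".",
    "space": " ",
    "new line": "\n",
}

# A shortened version of the intro: ("say", words) or ("cmd", command name).
SCRIPT = [
    ("say", "DM"), ("cmd", "colon"), ("cmd", "space"),
    ("say", "I'm Don Merrill"), ("cmd", "comma"),
    ("say", "and I'm talking with Eric Squires"), ("cmd", "period"),
    ("cmd", "space"),
    ("say", "Mr Squires"), ("cmd", "comma"),
    ("say", "thank you very much for talking with me"), ("cmd", "period"),
    ("cmd", "new line"),
    ("say", "ES"), ("cmd", "colon"), ("cmd", "space"),
    ("say", "Thanks for having me here"), ("cmd", "period"),
]


def render(script):
    """Turn a list of spoken words and commands into transcript text."""
    out = ""
    for kind, value in script:
        if kind == "cmd":
            symbol = COMMANDS[value]
            # Skip a spoken "space" if the text already ends in whitespace.
            if symbol == " " and out.endswith((" ", "\n")):
                continue
            out += symbol
        else:
            # Put a space before spoken words unless one is already there.
            if out and not out[-1].isspace():
                out += " "
            out += value
    return out


if __name__ == "__main__":
    print(render(SCRIPT))
```

The point of the model is that the speaker, not the software, supplies every punctuation mark and line break, which is why this technique produces a formatted transcript in one pass.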

The advantage of this form of transcription is that, once you get the technique down, you can complete an accurate and properly formatted transcript in one reading. If you’re repeating an hour-long interview, it can get tedious, which is why it requires concentration. But the trick, besides knowing how to speak in a way the software recognizes, is to learn how to speak exactly what you hear without thinking about it. The words almost have to flow into your ears and out of your mouth into the microphone. If you do too much thinking, you lose your place, you have to start over and you get frustrated.

In both cases, the software first has to be “trained” to recognize your voice. The disadvantage of the first technique, dropping an audio file into DragonPad, is that in conversation, we tend not to speak as precisely as we would if we were reading specifically for the software. Normal conversation has slurs, shortcuts and slang that the computer would have a hard time recognizing even if it could recognize our voices perfectly. Plus, remember that the software is not trained to recognize other voices. So in the end, you have to do all of the work anyway. It seems to me that the second technique for transcribing is the better of the two.

When I have transcribed using the first technique, this is the process I usually go through and how long each part takes.  On average, a 30-minute interview is around six pages long.

Transcribing Process

1.  00:01:00 – Convert .mp3 file to .wav file in Audacity
2.  00:09:00 – Machine transcription of a 30:00 .wav interview in Dragon
3.  00:05:00 – Transfer to Open Office/MS Word document
4.  00:30:00 – Spacing between interviewer/interviewee segments and adding periods
5.  00:05:00 – Identifying speakers by initials at beginning of sound bites
6.  00:05:00 – Timing pages
7a. 00:16:00 – Deep Corrections Page #1 – Capitalization, punctuation, correcting words and spelling
7b. 00:14:00 – Deep Corrections Page #2 – same
7c. 00:11:00 – Deep Corrections Page #3 – same
7d. 00:18:00 – Deep Corrections Page #4 – same
7e. 00:12:00 – Deep Corrections Page #5 – same
7f. 00:14:00 – Deep Corrections Page #6 – same
8.  00:05:00 – Retime pages

Total Transcribing Time – 2:25:00
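For anyone who wants to check the arithmetic or plug in their own numbers, here is a small Python sketch that adds up the step times listed above and compares the total to the length of the interview. The durations come straight from the list; the names STEPS, AUDIO_LENGTH, parse and total_time are just labels chosen for this example.

```python
from datetime import timedelta

# Step durations from the list above, as HH:MM:SS strings.
STEPS = [
    "00:01:00",  # 1. convert .mp3 to .wav
    "00:09:00",  # 2. machine transcription
    "00:05:00",  # 3. transfer to word processor
    "00:30:00",  # 4. spacing and periods
    "00:05:00",  # 5. speaker initials
    "00:05:00",  # 6. timing pages
    "00:16:00", "00:14:00", "00:11:00",  # 7a-7c. deep corrections
    "00:18:00", "00:12:00", "00:14:00",  # 7d-7f. deep corrections
    "00:05:00",  # 8. retime pages
]

AUDIO_LENGTH = timedelta(minutes=30)  # length of the interview itself


def parse(hms):
    """Convert an HH:MM:SS string into a timedelta."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def total_time(steps):
    """Add up all of the step durations."""
    return sum((parse(step) for step in steps), timedelta())


if __name__ == "__main__":
    total = total_time(STEPS)
    print("Total transcribing time:", total)
    print("Times the audio length:", round(total / AUDIO_LENGTH, 2))
```

The steps do add up to 2:25:00, which works out to a little under five times the length of the 30-minute interview.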

DNSv12 has lots of features, including backspacing, delete and repeat, among others. And its text-reading function is wonderful. Its transcribing function leaves something to be desired, but it is better than manual transcription, which can be hellish.


Written by Interviewer

April 6, 2014 at 02:40
