Speech-to-text gets it so wrong it’s actually hilarious. And not, you know, a waste of your time and money.
With speech-to-text transcription, what are you really saving?
[Patrick Emond contributed to this post]
Last week, IBM trumpeted their latest achievement in automated speech-to-text: a record-low error rate of 5.5 percent. But always, especially with regard to saving money on transcription, you have to read the fine print.
“This was measured on a very difficult speech recognition task: recorded conversations between humans discussing day-to-day topics like ‘buying a car,’” notes the Principal Research Scientist, George Saon. “This recorded corpus [defined as “a collection of written or spoken material in machine-readable form, assembled for the purpose of studying linguistic structures, frequencies, etc.”], known as the Switchboard corpus, has been used for over two decades to benchmark speech recognition systems.”
It is worth noting, however, that our “corpus” is not a mere database of recorded phone conversations, but the real world. Our team of transcription experts includes musicians, writers, bartenders, astrophysicists, ethnomusicologists, film geeks, hockey nuts, and world travelers, all of whom bring real-life experience and a unique knowledge base to your transcription projects.
Saon prefaces this entire milestone with the following claim: “Depending on who you ask, humans miss one or two out of every 20 words that they hear.” It is worth dwelling on that one claim for a moment. We are to believe that humans, when straining to listen or transcribe as this context dictates, miss 5 to 10 percent of everything that they hear? Saon, though, then goes on to explain the realities of speech-to-text:
“As part of our process in reaching today’s milestone, we determined human parity is actually lower than what anyone has yet achieved — at 5.1 percent.
“To determine this number, we worked to reproduce human-level results with the help of our partner , which provides speech and search technology services. And while our breakthrough of 5.5 percent is a big one, this discovery of human parity at 5.1 percent proved to us we have a way to go before we can claim technology is on par with humans.”
IBM tells us that they “worked to reproduce human-level results,” whereas we actually deliver them. An error rate of 5.1 percent, the utterly ludicrous benchmark by which IBM has set its speech-to-text goals, is roughly one error every 20 words. This translates to an error on every single line of your transcript, with hundreds, if not thousands, of errors in total across, for example, a 35-page transcript (or one-hour recording).
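For anyone who wants to check that arithmetic, here is a minimal sketch. The speaking pace of 150 words per minute is our own assumption for illustration, not a figure from IBM or from any particular recording, and the function names are hypothetical.

```python
def words_per_error(error_rate: float) -> float:
    """Average number of words between errors at a given word error rate."""
    return 1 / error_rate


def expected_errors(word_count: int, error_rate: float) -> int:
    """Expected number of wrong words in a transcript of word_count words."""
    return round(word_count * error_rate)


# Assumption: a one-hour recording at about 150 spoken words per minute.
WORDS_PER_MINUTE = 150
one_hour_words = WORDS_PER_MINUTE * 60  # 9,000 words

for label, rate in [("IBM's 5.5 percent", 0.055), ("'human parity' at 5.1 percent", 0.051)]:
    print(
        f"{label}: about one error every {words_per_error(rate):.1f} words, "
        f"or roughly {expected_errors(one_hour_words, rate)} errors per hour of audio"
    )
```

Under that assumed pace, both rates land in the hundreds of errors for a single hour of audio. A faster talker or a longer recording pushes the total higher.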
We deliver transcripts well in excess of 99 percent accuracy with a 100 percent satisfaction guarantee. We are not looking to set any benchmarks; we want to deliver the best transcripts with the fastest turnaround. You don’t want to spend your time and money making hundreds or thousands of corrections; you want to grow your business. You want accurate transcripts.
And that is why we are here, and have been for 50 years. Computer speech-to-text programs may deliver a number, based on a benchmark, based on a corpus, based on a reproduction of a finite number of phone recordings. But the Audio Transcription Center just delivers: near-perfect transcription with no hidden fees when you need it.
In which we ponder how an antiquated Maine labor law, a class-action lawsuit, and a controversial bit of punctuation can make the national news.
Recently, my wife forwarded me a New York Times article about a lawsuit in my home state of Maine. This isn’t a common occurrence, for how often does one really lend much thought to labor disputes in their hometown? But this one had a special flavor to it that speaks to the risk inherent in subpar transcription.
The article, by Daniel Victor, “Lack of Oxford Comma Could Cost Maine Company Millions in Overtime Dispute,” presents a somewhat worst-case-scenario for the Oxford comma (or serial comma if you’re not prone to well-ripened narcissism).
Three truck drivers are suing Oakhurst Dairy for more than four years’ worth of unpaid overtime. The state’s overtime rules indicate that any work performed after 40 hours in one week must be paid out at 1.5 times the normal rate. There are of course exceptions, and the lawsuit, and the $10 million at stake, hinges upon one missing Oxford comma.
An explanation of the Oxford comma (from Oxford Dictionaries no less) for those curious.
In effect, the Oxford rule states that a comma should precede the conjunction before the final item in a list. To use a common example of when the Oxford comma might be prudent:
Oxford comma: I would like to thank my parents, Oprah, and the Pope.
No Oxford comma: I would like to thank my parents, Oprah and the Pope.
So you may be asking: how exactly could a punctuation decision in the Maine Legislative Drafting Manual possibly affect the transcription for my project?
To be brief: Transcription is a subjective interpretation of a recorded medium. You are asking someone to write down not only what was said from a recorded file, but you are asking them to punctuate the content precisely.
Does your transcriptionist understand the Oxford comma? The comma splice? Does your transcriptionist understand that people don’t speak grammatically with any regularity, and know how best to apply grammar to an interview in which it is not regularly used?
These are all important questions you should consider when looking for transcription, and they only scratch the surface. Who do you trust with your transcription?
It turns out, if you’re following the curious case of the Oxford comma, the US Appeals Court sided with the plaintiffs in their decision. In short, as the law reads:
The canning, processing, preserving, freezing, drying, marketing, storing, packing for shipment or distribution of:
(1) Agricultural produce;
(2) Meat and fish products; and
(3) Perishable foods.
And as Victor points out:
If there were a comma after “shipment,” it might have been clear that the law exempted the distribution of perishable foods. But the appeals court on Monday sided with the drivers, saying the absence of a comma produced enough uncertainty to rule in their favor. It reversed a lower court decision.
In other words: Oxford comma defenders won this round.
These little issues in a transcript can add up to confuse, obscure, or otherwise completely change the meaning and intent of an audio or video file. While it is unlikely that such an error will potentially result in the loss of millions with your case trending on The New York Times, it can result in subtly, or even wildly, inaccurate transcripts.
Which rather defeats the purpose, doesn’t it?
Winter Storm Juno or “Snowpocalypse” is arriving in the northeast with a vengeance overnight tonight, so we’re preparing for the worst while still handling all of your transcription needs to the best of our abilities!
Team ATC is ready to make sure your audio and video content aren’t buried beneath the snowdrifts or blown away in the 50+ mph blizzard-like winds.
Thanks to the latest advances in weather forecasting and the Internet, our team is able to “virtually” keep your projects moving (for those projects that allow such work to leave the cozy confines of our downtown Boston World Headquarters).
ATC’s Boston office will remain open TODAY, Monday, January 26, 2015 until 5 p.m. EST (unless we follow up later that we needed to shut down early), but tomorrow (and possibly Wednesday – I really hope not) we need to wait and see if the weather allows us to make it in.
We’ll be available virtually via email at firstname.lastname@example.org.
Our virtual team will be able to keep your important projects moving, and we’ll email them back to you as we’re able.
For those of you in the blizzard’s path, please stay safe, and we will post any updates here to the blog as needed.
Tamar Carroll researches the questions of what motivates community activists to do what they do…
The media content our academically-minded transcription know-it-alls listen to and transcribe on a daily basis is truly second to none—OK, maybe we’re a little biased about our team and our clients’ media content—so we’re always bursting with enthusiasm for these projects. As per our previous post on confidentiality, we can’t always talk about the various subjects we’re transcribing, so we’re super-excited for those times when we are permitted to sing a project’s praises from the second floor of our downtown Boston office. (This may also explain those times when the pigeons fly rapidly away from the director’s window — the bottom window on the right if you were wondering…)
But I digress…
Today we are thrilled to talk about Tamar Carroll of Rochester Institute of Technology and her forthcoming book, Mobilizing New York. We corresponded with Tamar via email, and she was kind enough to take some time to answer our questions and talk in detail about these interviews and what she hopes to learn and understand from them.
ATC: Tamar, tell us about these interviews you’re conducting in more detail.
CARROLL: The interviews I have done with more than 40 activists are research for my book, Mobilizing New York: Community Activism from the War on Poverty through the AIDS Epidemic, which is under contract for publication with the University of North Carolina Press in 2015. The book begins with Mobilization For Youth (MFY), a demonstration project for the War on Poverty located in the Lower East Side, and charts the transformation of this social welfare agency by the civil rights movement and the participation of African American and Puerto Rican mothers. I then follow a young social worker and Congress of Racial Equality (CORE) activist, Jan Peterson, from MFY to Williamsburg/Greenpoint, Brooklyn, where she founded in 1975 the National Congress of Neighborhood Women, a working-class feminist organization that established a college and jobs program as well as the first battered women’s shelter in New York City. Finally, I examine the collaboration between gay men and feminists in the AIDS Coalition to Unleash Power (ACT UP) and Women’s Health Action Mobilization (WHAM!) in the late 1980s and early 1990s, when their spectacular street theater and dramatic poster art reshaped the social geography of the city, leading to the creation of a supportive queer community as well as important changes in public policy on AIDS and medical research more broadly.
Mobilizing New York examines how residents have enacted participatory democracy, using self-education, consciousness-raising, public protest and civil disobedience to make American citizenship more inclusive. I also investigate the conditions that foster collaboration across lines of race, class, gender and sexuality, as well as the challenges posed by differences of identity.
ATC: What do you hope to learn from these interviews?
CARROLL: The interviews help me understand what motivates individuals to become activists and how they think about strategies, tactics, and movement goals. I also learn how they assess the triumphs and failures of the movements they have taken part in, and perhaps most significantly, how taking part in activism shaped their own lives.
ATC: Where and how can these interviews be accessed if made public?
CARROLL: I have donated my interviews with WHAM! and ACT UP members to the Tamiment Library at NYU, where the WHAM! papers are located, and my interviews with MFY and NCNW members to the Sophia Smith Collection at Smith College, where the papers of the NCNW and of Frances Fox Piven and Richard Cloward (Cloward founded MFY and they met there when she worked there) are. Both the audio files and transcripts are available for many of my interviews.
Stay tuned for the published book in 2015, and in the meanwhile ATC will continue to transcribe and blog about other fascinating projects each month (as we’re allowed by our clients).
Finally, you’re now able to ‘Like’ us on Facebook!
What comes to mind when you picture a transcription service? Since 1966, ATC has adjusted with the times by continuously learning from our experiences. We always hire the best and most diverse team of transcription know-it-alls!
No voice recognition software here, just awesome people!
(Blizzard of ’78 picture by David L. Ryan/Globe Staff/file 1978)
The Blizzard of 1978 caught many people by surprise, and Boston was shut down for days afterwards.
So please stay in touch with your orders and we’ll be sure to keep your transcription process moving to exceed your expectations. ATC’s Boston office will close at 2 p.m. EST on Friday, February 8, 2013, but again we’ll be available virtually after that time until Monday at 8 a.m. Call (617) 423-2151 with any needs prior to 2 p.m., or email us at email@example.com.
Our virtual team will be able to keep your important projects moving, and they’ll be safe…wherever they are!
Of course, for those of you in the blizzard’s path, stay warm and stay safe!
Thanksgiving is around the proverbial corner, and this holiday is typically a wonderful opportunity for friends and families to reconnect. People being together offers a perfect time for stories to be passed around the holiday table along with helpings of stuffing and mashed potatoes. The potential for these stories to be handed and passed from generation to generation is at a peak while everyone is together. What better way to collect, share, and save these stories from potentially being forgotten than by recording, archiving, and transcribing them for posterity?
We believe that StoryCorps’s The National Day of Listening is the perfect excuse to talk, listen, record, and transcribe.
We live in a special time when we’re not just able to orally pass stories down the line, but we’re also able to ensure their archival longevity through the recording and transcribing of these personal and oral histories.
Take the time to find a quiet space, and set up your digital recorder. Test the device to make sure you are recording properly. Then, hit the record button and listen to and record the story. It’s that simple, and it will be a gift to read and listen to for generations. This year, StoryCorps suggests honoring a veteran, and offers suggested conversation starters right on their website.
Don’t lose out on your family history and question yourself after it is too late. We speak from our own missed opportunities.
Wishing you a peaceful Thanksgiving, and the opportunity to listen to, record and transcribe a new story never heard before.
In full disclosure, the Audio Transcription Center has partnered with StoryCorps on transcription of their audio recordings for their published books, Listening is an Act of Love, All There Is, and Mom, projects we are humbled and proud to have participated in.