Speech Recognition Maligned?

June 24, 2013

I recently came across a posting on AuntMinnie.com regarding a study published in the American Journal of Roentgenology (AJR, October 2011, Vol. 197:4, pp. 1-5) that describes how “breast imaging reports generated with automated speech recognition software are eight times more likely to contain major errors than those generated with conventional dictation transcription.” The implication is that automated speech recognition (ASR) is somehow to blame for the errors. I would submit that this is indicative of poor ASR implementation practices, and not of the ASR application itself.

I base my conclusion on my own experience as an early reseller of the original IBM MedSpeak/Radiology product, dating back to the mid-1990s. Radiologists are used to dictating and having a transcriptionist edit their dictation. In many situations, the sheer volume of dictated reports results in the radiologist relying on the transcriptionist to catch obvious errors and raise questionable ones with the radiologist, so that signoff is a mere formality. Oftentimes, a radiologist will sign the stack of reports with no further review other than a casual glance. Such an approach worked well, but was time-consuming and expensive.

Enter automated speech recognition (ASR) technology. Such applications were originally designed to enable the dictator to self-edit their dictation and correct any misrecognized words. Later iterations included workflow models that allowed a transcriptionist “editor” to make the corrections, to satisfy radiology groups that felt self-editing was “beneath them” or too time-consuming.

To understand the issue with editing, one has to understand the mechanics of ASR applications. Most use multiple methods of converting speech into text. First, the application listens to the sound and makes a “best guess” as to the word dictated, based on frequency tables of the sounds, or phonemes. Next, many applications use different rule sets to improve the context of the transcription. Specialized dictation limits the available word choices depending on the vocabulary, or topic. For example, radiology’s vocabulary is likely different from that of pathology, so words in the active vocabulary have a higher probability of selection. Lastly, the applications look at the context of the words, and act to predict the likely next word. For example, in the phrase “clinical indication: colon cancer” the punctuation “:” and the word “colon” sound the same. How does the system know that the first sound is punctuation and the second is anatomy? “Clinical indication” is interpreted as a header phrase, which is usually followed by a colon, and punctuation usually does not follow punctuation, so the first is interpreted as punctuation and the second as anatomy.
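The “colon” example can be sketched in a few lines of Python. This is only a toy illustration of the context rule described above, not how any particular ASR product is built; the function names and the list of header phrases are made up for the example.

```python
# Hypothetical header phrases; a real product would ship its own, much larger list.
HEADER_PHRASES = {("clinical", "indication"), ("findings",), ("impression",)}


def ends_with_header(output):
    """Return True if the text so far ends with a known header phrase."""
    for phrase in HEADER_PHRASES:
        tail = tuple(word.lower() for word in output[-len(phrase):])
        if tail == phrase:
            return True
    return False


def transcribe(spoken_words):
    """Convert recognized sounds to text, deciding what each 'colon' means."""
    output = []
    for word in spoken_words:
        # Right after a header phrase, the sound "colon" is taken as punctuation.
        # Once ":" has been written, the text no longer ends with a header,
        # so a second "colon" is kept as the anatomical word.
        if word == "colon" and ends_with_header(output):
            output.append(":")
        else:
            output.append(word)
    return output


result = transcribe(["clinical", "indication", "colon", "colon", "cancer"])
print(result)  # ['clinical', 'indication', ':', 'colon', 'cancer']
```

A real recognizer weighs probabilities rather than applying a single hard rule, but the principle is the same: the words already transcribed shape how the next sound is interpreted.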

With that explanation as background, how does it relate to the errors being blamed on ASR? In ASR applications, the transcribed text remains associated with the speech until the report is saved, at which time the application updates its sound and context rules based on the text associated with the speech. If one accepts or saves a report without correcting errors, the system assumes the text is a correct interpretation of the sound. Therefore, if errors are not corrected, the accuracy of the system declines, because the system is learning from incorrect examples.
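To make the learning effect concrete, here is a deliberately simplified Python sketch. It assumes the system simply counts which text was saved for each sound and prefers the most frequently saved one; actual products use far more sophisticated acoustic and language models, and the class and method names here are invented for illustration.

```python
from collections import Counter, defaultdict


class ToyRecognizer:
    """Toy model: remembers which text was saved for each sound."""

    def __init__(self):
        # Maps a sound label to a count of the texts saved for that sound.
        self.saved = defaultdict(Counter)

    def recognize(self, sound, first_guess):
        """Return the text most often saved for this sound, else the first guess."""
        history = self.saved.get(sound)
        if history:
            return history.most_common(1)[0][0]
        return first_guess

    def save_report(self, sound, text):
        """Saving a report teaches the system that this text matches the sound."""
        self.saved[sound][text] += 1


SOUND = "barn-hart"  # stand-in label for the audio of the dictated word

# Radiologist A signs reports without correcting the misrecognition.
careless = ToyRecognizer()
for _ in range(3):
    guess = careless.recognize(SOUND, first_guess="Barn Yard")
    careless.save_report(SOUND, guess)  # the error is saved as if it were correct

# Radiologist B corrects the text before saving.
careful = ToyRecognizer()
careful.save_report(SOUND, "Barnhart")

print(careless.recognize(SOUND, first_guess="Barn Yard"))  # Barn Yard
print(careful.recognize(SOUND, first_guess="Barn Yard"))   # Barnhart
```

In this toy model, every uncorrected save adds weight to the wrong answer, so the careless user would later need several corrections to undo what the system has learned, while a single early correction is enough for the careful one.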

I would submit that this is the major factor in the errors reported in the aforementioned study in AJR. If the radiologist who dictated the report does not correct the errors, the system will continue to make them. My most shining example of this was a personal experience demonstrating the technology at the University of Iowa many years ago. To preclude any chance of tricking the system with a canned report, they handed me a previously dictated report and asked me to dictate it verbatim. My initial embarrassment over “Barnhart Catheter” being misinterpreted as “Barn Yard Catheter” was alleviated when I corrected the text and added a pronunciation, using the application’s correction utility. My subsequent dictation was accurate, to their utter amazement!

The bottom line is that I believe both vendors and support staff sometimes overlook the value of stressing to users the need to review their dictation and correct errors to improve accuracy. If more attention were paid to behavior modification in the implementation of ASR technology, I submit there would be fewer errors in the resulting reports. Perhaps the researchers who conducted the study should enforce error correction and repeat the study. I would submit that the results would be significantly improved! For those considering ASR applications, don’t be dissuaded by such research! Instead, make sure that adequate training and follow-up are included in the contract.