Founded in 1994, Nuance Communications provides speech recognition technology so physicians can dictate information into EMRs, freeing them to spend more time with patients. The Burlington, MA-based company currently does $1 billion in business in the health space and has more than half of all U.S. physicians on its speech products. Joe Petro, senior vice president of healthcare R&D, recently spoke with Healthcare Dive about how far speech recognition technology has come in the last decade, where it is going and how that affects physician practice and healthcare generally.
Healthcare Dive: How has speech recognition changed as a tool for physicians?
Joe Petro: Speech recognition has changed a lot just in the seven years that I’ve been at Nuance. The way we measure the “goodness” of speech recognition is through accuracy, or word error rate. If you look at where things started 15 years ago, speech recognition was in the 75% accuracy range, which meant one in four words was incorrect. That was a real challenge: with a one-in-four word error rate, it would be very difficult to sustain a workflow and produce something that was actually usable. What ended up happening was that speech was applied in such a way that the problem was very, very narrow. For example, if there were only three or four words you were selecting, and the words sounded quite different from one another, you could ratchet the accuracy way up to 90%. But if you were generally narrating, the error rate was actually quite high, in the 25% range, so only 75% accuracy.
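Accuracy and word error rate, as Petro uses them, are complements: 75% accuracy corresponds to a 25% WER. A minimal sketch of how WER is conventionally computed (word-level edit distance normalized by the length of the reference transcript; the example sentences are invented for illustration):

```python
def word_error_rate(reference, hypothesis):
    """Word error rate: minimum number of word substitutions,
    insertions and deletions needed to turn the hypothesis into
    the reference, divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance over words via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# One wrong word in four is the 75%-accuracy era Petro describes:
print(word_error_rate("patient denies chest pain",
                      "patient denies chess pain"))  # 0.25
```
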
Over the years, speech recognition has become a lot more accurate. Right out of the box, we’re now in the 95% range. The performance is high fidelity and real time, so you can speak into the system and it can write the words down as fast as you can actually say them. And it doesn’t just do speech recognition; it also provides command-and-control interfaces. So with an EMR, you can input the patient’s information and actually control the interface as well.
What are some of the key ways it’s being used to improve physicians’ relationship with technology?
Petro: As you know, EMRs have a mixed level of reception among physicians. They put a lot of demands and regulatory requirements on physicians to capture information inside the EMR, and for many physicians that gets in the way of patient care. We feel, and we have measured these results, that speech recognition gives the physician more time to get back to the bedside. Speech recognition has made the data entry and documentation experience much more natural.
The other thing that’s happened over the last several years is physicians have become mobile, and they want the same documentation experience on their laptop, iPad or iPhone that they have on their desktop. Because we can deliver speech through the cloud now and stream that experience down to the device, we’ve created something called form factor agnostic speech. That essentially means the physician can use our technology anytime, anywhere, on any device and get the same experience. We feel like that’s been a real game changer, and it’s really just come into play over the last few years.
How does this change or enhance the physician-patient relationship?
Petro: This is all about getting the physician back to the bedside, giving them a little more time in the day to actually treat patients. So in terms of documentation, something that would take the physician 15 minutes can now be done in five. It’s really easy to see that speech recognition is creating more bandwidth between the physician and patient and creating a more natural experience. It also improves the doctor-patient experience because the patient can listen to what the physician is actually saying and entering into the record versus the physician privately entering the documentation.
What is the current state of interoperability and functionality?
Petro: From an interoperability point of view, we work with all of the major EMR companies — Epic, Allscripts, eClinicalWorks, you name it. It’s a deeply integrated experience. The physician basically pulls up the EMR, logs into Dragon automatically, clicks into the field, picks up the microphone or uses their iPhone, and enters speech directly at the point of documentation. All of that happens in real time if it’s a front-end experience.
As for functionality, what we’ve added lately has really been focused around this “form factor agnostic” notion of where we’re enabling speech on more devices and making the experience on those devices more powerful. And over the last year and a half to two years, we’ve crossed that line where there’s no functionality gap between the desktop experience and the mobile experience.
Where do you see the integration going in the future?
Petro: What’s coming next is understanding natural speech, or clinical language understanding. We’re marrying that into the speech workflow, so as you’re speaking, we’re mining the things you narrate — patient complaint, active medications, allergies, social habits, etc. By harvesting that out of the narrative, we make that information actionable. So you can leverage this information for things like population health, clinical documentation improvement, quality metrics and clinical outcomes. The functionality is extending not only from an experiential point of view — the quality and accuracy experience — but also into new workflows and new use models that are driven by natural language processing. So speech and natural language processing, or language understanding, are becoming inseparable.
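The mining Petro describes relies on full clinical NLP, but the basic idea of matching narrated text against structured vocabularies so it becomes actionable can be sketched with a toy keyword lookup. The vocabularies, sample note and function name here are invented for illustration, not Nuance’s API:

```python
import re

# Toy vocabularies standing in for a real clinical ontology (hypothetical).
MEDICATIONS = {"lisinopril", "metformin", "aspirin"}
ALLERGENS = {"penicillin", "sulfa", "latex"}

def mine_narrative(text):
    """Pull structured facts out of a dictated narrative so they can
    feed downstream uses like quality metrics. Illustrative only."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return {
        "active_medications": sorted(words & MEDICATIONS),
        "allergies": sorted(words & ALLERGENS),
    }

note = "Patient reports an allergy to penicillin and takes metformin daily."
print(mine_narrative(note))
# {'active_medications': ['metformin'], 'allergies': ['penicillin']}
```

Real clinical language understanding handles negation, abbreviations and context ("denies chest pain" is not a complaint), which is precisely what makes it a research problem rather than a keyword match.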
How fast are hospitals and healthcare practices moving to adopt speech recognition technology?
Petro: We have about 500,000 physicians on our products, between 50% and 66% of the market. From a transcription point of view, we do about 5 billion lines a year, which is about half the market. And as EMRs get installed, the transcription market is converting over to front-end speech recognition. I would say that at any account that installs Dragon, anywhere between 50% and 100% of doctors actually adopt it. We don’t really see any slowdown in the adoption rate for speech technology. Speech is pretty much mainstreamed into the IT infrastructure at this point.
Are there other companies working on speech recognition in the medical space, and do you see this niche heating up as the technology becomes more widespread?
Petro: There are a lot of players out there. Google has it. Microsoft has Cortana. There are a couple of small players in our space as well. We’re the big gorilla in speech, with a $1 billion business now. We feel like we’ve been working in a heated-up market for some time. When I got here, the division was about $220 million, so in seven years we’ve more than quadrupled. The technology is becoming widespread, and I think on the consumer side, Nuance, as well as companies like Google and Microsoft, is making it more and more natural for us to talk to our iPhones. This is all working for us and helping advance the technology, because the more comfortable everybody feels using it, the more speech becomes one of the best ways to communicate.
What do you see as the major challenges in speech recognition going forward?
Petro: From an accuracy point of view, we’re on the last mile. Right out of the box now, just about anybody can expect a 95% accuracy rate. What we’re working on, and we’re spending tens of millions of dollars per year on research, is closing that 5% gap so it turns into 1%. Some radiologists already have a 99% accuracy rate, and we’re working on their 1%.
A lot of the functionality we’ve been adding is new workflows and capabilities around the mobile physician, making sure that the mobile experience is on par with the experience physicians have gotten used to on their desktop. And we’re pretty much there at this point. There are a few things we’re adding here and there, and you’ll see extensions from a technology point of view. The cloud enables a lot of very interesting things: I can pick my phone up and dictate, put my phone down and move to my desktop, and my dictation is already there. I can have the exact same experience if I’m using a partner’s mobile application like Epic’s Haiku or Cerner’s PowerChart Touch, because my ID sort of follows me around. These things are all pointed in the direction of making speech so incredibly natural that it doesn’t feel like a tool anymore; it just feels like part of a very natural experience of working with the electronic medical record.
How sensitive is speech recognition today when you’re dealing with different accents and speech patterns?
Petro: We have accent-based models, and within the first four seconds of speaking, the technology switches you to the appropriate model for your accent. We’ve got several accent models just within the United States, and when our technology hears that you’ve got a southwestern accent, it plugs you into the southwestern model, and that model becomes part of your experience moving forward. The same goes for Irish American, Indian and different Asian accents — those are all different models that we support. We also support about 75 different languages.
How is user experience measured in terms of accuracy and satisfaction?
Petro: We measure a variety of things, but one thing we track in the cloud, for example, is accumulated change rate, or ACR. When a physician is interacting with a document — they speak, text gets delivered, they review and modify the text — we watch how much they fiddle with the document and how many changes they make. If it misses a word like “not,” that’s a big deal, and we try to make sure that we screen for that when we’re looking at ACR. One of the things we do is pool physicians into cohort groups: if we’ve got a number of physicians in the 98% to 99% range whose experience is really positive, we’ll compare and contrast them with a cohort in the 92% to 93% range. We’ll ask ourselves whether there’s anything in the data that tells us they’re doing something differently, and if we have somebody on site, we might interview those physicians to figure out what it is. The cloud has really changed the game. Everything is de-identified, so it’s all HIPAA-compliant, and physicians opt into the program. It’s all part of what we do.
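Nuance hasn’t published the exact ACR formula, but a plausible proxy for “how much the physician fiddled with the delivered text” is the fraction of words that differ between the delivered draft and the final signed document. A sketch under that assumption, with invented example text:

```python
from difflib import SequenceMatcher

def accumulated_change_rate(delivered, final):
    """Fraction of words changed between the text the engine delivered
    and the document the physician signed off on. A hypothetical proxy;
    Nuance's actual ACR definition isn't public."""
    a, b = delivered.split(), final.split()
    matcher = SequenceMatcher(None, a, b)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return 1.0 - matched / max(len(a), len(b))

# A dropped "not" shows up as a change the physician had to make:
delivered = "patient is allergic to penicillin"
final = "patient is not allergic to penicillin"
print(round(accumulated_change_rate(delivered, final), 2))  # 0.17
```

Normalizing by the longer of the two texts makes insertions like the missing “not” register as a nonzero change, which matches Petro’s point that such misses matter.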