‘Hurtling into the future’: The potential and thorny ethics of generative AI in healthcare
Artificial intelligence is already used widely in healthcare, but generative AI is a big — and somewhat risky — step forward due to rapid deployment of the technology paired with a lack of regulation and oversight, industry experts said at the HIMSS conference in Chicago this week.
As a result, tough ethical questions are dogging generative AI like GPT-4, OpenAI’s latest large language model that backs the premium version of the wildly popular chatbot ChatGPT. Still, generative AI has huge potential to help improve U.S. medical care, proponents say.
One of the biggest companies in the space, Microsoft, has been working with GPT-4 developer OpenAI over the last eight months to understand the technology’s implications for healthcare. Generative AI is showing early success in a number of use cases, including simplifying explanation of benefits notices and writing prior authorization request forms, Peter Lee, corporate vice president of Microsoft Healthcare, said in a keynote panel on Tuesday.
Physicians can give the AI prompts to help them interpret novel medical cases or rewrite a patient conversation in a standardized format, Lee said.
Microsoft recently unveiled plans to embed generative AI into clinical software from Epic, the biggest hospital EHR vendor in the U.S.
Last week, Microsoft and Epic went live with the first sites integrating GPT into EHR workflows to automatically draft replies to patient messages. The companies also are bringing generative AI tools to Epic’s hospital database, allowing laypeople to ask the AI general questions instead of needing a data scientist to query specific data.
Epic is investing a significant amount of R&D resources into generative AI, Seth Howard, Epic’s vice president of R&D, said in an interview.
The health IT company also is researching the use of generative AI to summarize a patient’s medical history, and translate patient-facing materials between different languages and reading levels, to improve health literacy.
“That’s kind of a big category of work,” Howard said. “There are many use cases we’ll put out over the next year.”
Google also is making its own large language model, called Med-PaLM 2, available to a select group of customers to explore use cases. Med-PaLM was specifically trained on medical data, allowing it to sift through and make sense of massive amounts of healthcare information. Still, it has room for improvement in the complexity of queries it can perform and meeting product excellence, according to Google’s AI team.
Med-PaLM won’t be used in patient-facing settings, Google Cloud head Aashima Gupta said in an interview. Instead, hospitals could use the AI to analyze data to help diagnose complex diseases, to fill out records or to serve as a concierge for patient portals.
One notable use case for generative AI is streamlining medical notetaking, experts say. Physicians can spend up to six hours a day logging notes in their EHR, which can cut into time with patients and contribute to burnout.
Leading clinical documentation companies are integrating generative AI technology that they say improves their products’ accuracy and speed.
Nuance, a documentation company owned by Microsoft, said last month that it had integrated GPT-4 into its clinical notetaking software. This summer, providers using existing Nuance documentation products can apply to beta test the application, called DAX Express.
Ten to 15 clients, totaling 300 to 500 physicians, will test the product in private preview in mid-June, before it becomes generally available in early fall, said Diana Nole, general manager at Nuance.
Suki, a documentation company that’s partnered with Google, launched its generative AI-powered “Gen 2” voice assistant earlier this month.
Now, Suki can generate a clinical note ambiently from a conversation and fill in the note automatically, CEO Punit Soni told Healthcare Dive. Closer to the end of the year, doctors will also be able to ask the AI questions and give it commands, such as plotting a patient’s A1C levels over the last three months, Soni said.
Gen 2 currently is being piloted with “very few” customers, according to the CEO. Suki plans to make it generally available later this year.
“The interest is unbelievable,” Soni said. “If you could have somebody to just do all your notes for you, wouldn’t that be awesome?”
To be sure, generative AI makes mistakes, raising pernicious ethical questions around accuracy, equity and accountability. That means the medical community needs to have a robust ongoing role in deciding whether, when and how this type of AI is used, experts said at HIMSS.
One issue with AI is the black box problem: How can a result be trusted if it’s unclear how the AI got there?
It’s easy to be fooled into thinking that this type of technology is able to reason, but users have to remember generative AI doesn’t engage in deliberation — just predicts the next set of words that makes the most sense, said medical ethicist Reid Blackman.
“Maybe we’re okay with magic that works. That’s one option. The other option is to say okay, if you’re making a cancer diagnosis, I need to absolutely understand the reasons why you’re making that diagnosis… The problem is that GPT doesn’t give reasons. It gives what looks like reasons,” Blackman said on a Tuesday panel.
Generative AI also has issues with accuracy. Large language models are known to hallucinate, or provide answers that are factually incorrect or irrelevant. GPT-4 “still has many known limitations that we are working to address, such as social biases, hallucinations, and adversarial prompts,” OpenAI’s website reads.
Bias is another key worry with AI, especially in healthcare. If an algorithm is trained on biased data, or its results are applied in biased ways, it will reflect and perpetuate those biases, experts said.
Accountability also remains a concern. Since generative AI is such a new field, it’s unclear which stakeholder — the owner? the user? the developer? — is at fault for any negative consequences stemming from its use, according to Kay Firth-Butterfield, CEO of the Centre for Trustworthy Technology.
“Who do you sue when something goes wrong? Is there somebody to sue?” Firth-Butterfield asked during the Tuesday keynote panel.
More than 27,000 stakeholders have signed an open letter asking AI labs to immediately pause training AI systems more powerful than GPT-4 for at least six months. The letter argues that industry should use the moratorium to develop shared safety protocols and work with policymakers to develop AI governance systems.
“You shouldn’t be exploring, engaging in artificial intelligence, without having some understanding of responsible AI issues,” said Firth-Butterfield, who signed the letter. “We are hurtling into the future without actually taking a step back and designing it for ourselves.”
‘Mysterious,’ ‘awe-inspiring,’ ‘frustrating’
AI companies argue many of these problems can be ameliorated with careful human oversight. Organizations should be explicitly aware of the dataset a model was trained on and when precisely it should be used, experts said.
Microsoft’s Lee said while questions about generative AI are important, especially those around accountability, the black box problem might not be as troubling as some say.
“We’ve been unable to prove it’s incapable [of reason],” Lee said. “It’s a mysterious and awe-inspiring and frustrating research roadblock that we’ve been unable to prove that. I have to admit, as a scientist, that this black box issue, at some point of development starting with GPT-4, actually might not really exist.”
And there are techniques to ensure generative AI gives a reliable output — a field called prompt engineering, where scientists figure out how to query the AI in a way that produces trustworthy results, Epic’s Howard said.
Both Suki and Nuance declined to share the accuracy rates of their generative AI-based notetakers, but said accuracy was high enough that they’re comfortable deploying them in a real-world setting.
Soni said Suki could publish its Gen 2 accuracy rates in a few months. And Nuance has been working with AI for years, Nole said.
“The level of accuracy we were already producing is advanced. We know how it works. We know how it acts. Now, we’re just utilizing GPT to kind of raise the level,” Nole said.
As AI has advanced, some models have outperformed doctors on diagnostic tasks, creating concerns that machines could one day replace clinicians. Experts at HIMSS agreed that day is nowhere close, even with generative AI.
Med-PaLM’s ability to pass a medical exam is “in no way indicative of medical knowledge or competence,” Alan Karthikesalingam, Google’s health AI research lead, said in an interview.
“There are a lot of material safety, bias and ethical concerns that need to be attended to very carefully,” Karthikesalingam said. But replacement? “I think by and large that on its own is hopefully not a huge concern to us physicians. The positioning of these systems is very much as tools.”
Suki and Nuance stressed that their AI-created notes allow clinicians to edit and approve what is finalized in the EHR. Generative AI is an instrument for doctors, not a substitute for them, AI executives said.
Still, “anyone who can make predictions on what this epoch will bring is a fool,” Soni said. “What I can tell you is that in the foreseeable future, I see AI being a very strong and serious assistant that makes a lot of our lives better.”