In public, hospitals rave about artificial intelligence. They trumpet the technology in press releases, plaster its use on billboards, and sprinkle AI into speeches touting its ability to detect diseases earlier and make health care faster, better, and cheaper.
But on the front lines, the hype is smashing into a starkly different reality.
Caregivers complain that AI models are unreliable and of limited value. Tools designed to warn of impending illnesses are inconsistent and sometimes difficult to interpret. Even evaluating them for accuracy and susceptibility to bias is still an unsettled science.
A new report aims to drag these tensions into the open through interviews with physicians and data scientists struggling to implement AI tools in health care organizations nationwide. Their unvarnished reviews, compiled by researchers at Duke University, reveal a yawning gap between the marketing of AI and the months, sometimes years, of toil it takes to get the technology to work the right way in the real world.
“I don’t think we even really have a great understanding of how to measure an algorithm’s performance, let alone its performance across different race and ethnic groups,” one interview subject told researchers. The interviews, kept anonymous to allow for candor, were conducted at a dozen health care organizations, including insurers and large academic hospitals such as Mayo Clinic, Kaiser Permanente, New York Presbyterian, and University of California San Francisco.
The research team, dubbed the Health AI Partnership, has leveraged the findings to build an online guide to help health systems overcome implementation barriers that most organizations now stumble through alone. It’s a desperately needed service at a time when adoption of AI for decision-making in medicine is outpacing efforts to oversee its use.
“We need a safe space where people can come and discuss these problems openly,” said Suresh Balu, an associate dean of innovation at Duke’s medical school who helped lead the research. “We wanted to create something that was simple and effective to help put AI into practice.”
The challenges uncovered by the project point to a dawning realization about AI’s use in health care: building the algorithm is the easiest part of the work. The real difficulty lies in figuring out how to incorporate the technology into the daily routines of doctors and nurses, and the complicated care-delivery and technical systems that surround them. AI must be finely tuned to those environments and evaluated within them, so that its benefits and costs can be clearly understood and compared.
As it stands, health systems are not set up to do that work — at least not across the board. Many are hiring more data scientists and engineers. But those specialists often work in self-contained units that help build or buy AI models and then struggle behind the scenes to keep them working properly.
“Each health system is kind of inventing this on their own,” said Michael Draugelis, a data scientist at Hackensack Meridian Health System in New Jersey. He noted that the problems are not just technical, but also legal and ethical, requiring a broad group of experts to help address them.
The Health AI Partnership’s findings highlight the need for a more systematic approach to that work, especially amid the rush to harness the power of large language models such as ChatGPT. The process shouldn’t start with a press release, experts said, but with a deeper consideration of what problems AI can help solve in health care and how to surround those tools with effective oversight.
STAT spoke with data scientists, lawyers, bioethicists, and other experts from within the partnership about the biggest challenges that emerged during the research, and how they are attacking them on the front lines. Here’s a closer look.
“So, we’ll have situations where faculty will have a connection with a comp