Brown Univ. study warns of AI challenges in radiology, stressing incorrect feedback's impact on interpretations and need for caution.

What are the consequences of incorrect AI feedback for radiology?

Artificial intelligence (AI) is particularly well suited to pattern recognition. For this reason, the use of AI has expanded in the field of diagnostic radiology. AI programmes can be trained to detect breast cancer, rib fractures, and blood clots in the lungs and brain, as well as a host of other lesions. Radiologists are increasingly relying on AI to assist with these diagnoses, and as the technology evolves, this trend is set to continue.

However, there are two significant caveats to consider. First, while AI is an exceptionally powerful tool, it is not infallible. AI systems can flag abnormalities where none exist (false positives, FP) and can occasionally miss abnormalities that are present (false negatives, FN).
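In signal-detection terms, these two error types are simply the off-diagonal cells of a confusion matrix. As a minimal sketch of how such rates are computed (the case labels and counts below are invented for illustration and do not come from the study):

```python
def error_rates(truth, calls):
    """Compute (false-positive rate, false-negative rate).

    truth, calls: parallel lists of 'abnormal' or 'normal' per case,
    where `truth` is the ground truth and `calls` is what the reader
    (human or AI) reported. Assumes both classes are present.
    """
    # False positive: the case is normal, but it was called abnormal.
    fp = sum(t == "normal" and c == "abnormal" for t, c in zip(truth, calls))
    # False negative: the case is abnormal, but it was called normal.
    fn = sum(t == "abnormal" and c == "normal" for t, c in zip(truth, calls))
    fp_rate = fp / truth.count("normal")
    fn_rate = fn / truth.count("abnormal")
    return fp_rate, fn_rate

# Hypothetical four-case example: one FP (case 2) and one FN (case 4).
truth = ["abnormal", "normal", "normal", "abnormal"]
calls = ["abnormal", "abnormal", "normal", "normal"]
print(error_rates(truth, calls))  # (0.5, 0.5)
```

The same arithmetic applies whether the "reader" is the AI system alone or the radiologist aided by AI, which is what allows the two to be compared on the same cases.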

Secondly, we are far from the stage where an AI system can interpret a mammogram on its own, without the oversight of a radiologist who has undergone years of training to identify and diagnose a myriad of often subtle diseases that may be concealed in the nuances of X-rays. In essence, discussions about AI in radiology are effectively about the interplay between AI and radiologists, i.e., the AI-radiologist interaction. AI should therefore be viewed primarily as a supportive tool for radiologists.

Our team at the Brown Radiology Human Factors Lab has been exploring how AI and radiologists jointly interpret images, or, in other words, how AI influences a radiologist's interpretation. In a recent study, we aimed to address two key questions: 1) If AI makes an error, will it mislead the radiologist? 2) If erroneous AI impairs radiologists' performance, is there a way to counteract this effect?

Credit: Midjourney

The research

To examine this empirically, we had radiologists interpret the same 90 chest radiographs on four occasions, each at least one month apart. Half of the radiographs showed lung cancer. In one condition, radiologists did not have any AI to help them. In the other three conditions, radiologists interpreted cases with the help of an AI system that labelled each image as “abnormal” or “normal”. Radiologists thought the purpose of the study was to compare different AI systems, but in reality, the goal was to examine how radiologists responded to the way the AI feedback was presented.

In one condition, radiologists were told that the AI feedback would be deleted from the patient’s file. In another condition, radiologists were told that the AI feedback would be kept in the patient’s file. In the final condition, radiologists were also told AI feedback would be kept, but a box was placed around the putative pathology.

Unknown to the radiologists, they were in fact using the same AI system across all scenarios. The rationale behind this was straightforward: To ascertain whether our alterations in the presentation of feedback influenced the radiologists’ performance, the AI feedback needed to remain consistent.

As stated earlier, the study’s objective was to examine how radiologists react to inaccurate AI feedback. Consequently, we centered our analysis on the 12 (out of 90) instances where the AI made an error. In 8 of these instances, the AI indicated a lung abnormality when the lung was actually normal (FP), whereas in 4 cases, the AI failed to detect a genuine lung abnormality (FN).

Research question 1: Will incorrect AI mislead the radiologist?

The answer to this question was a resounding “yes.” In the 4 instances where the AI produced a False Negative, radiologists rarely overlooked the abnormality when interpreting without the aid of AI; yet when the AI failed to flag it, radiologists frequently missed it too. Likewise, in the 8 instances where the AI produced a False Positive, the AI feedback elevated the radiologists’ False Positive rate compared to when no AI feedback was given.

Research Question 2: Can we mitigate the impact of incorrect AI?

The answer to this question is also clearly “yes.” The False Positive rate diminished when radiologists believed that the (erroneous) AI feedback would be removed from the patient’s record compared to when they thought it would be retained. Furthermore, the False Negative rate was reduced when a box was added around the purported pathology compared to scenarios without such highlighting.

What should we make of these findings?

Radiologists, like everyone else, are susceptible to influence and bias. The susceptibility of a radiologist to inaccurate AI is not unexpected, but it underscores the need for caution when integrating AI into radiology. Radiologists need to feel empowered to disagree with AI, as AI will make mistakes.

Encouragingly, our research indicates that there are methods to enhance the probability of radiologists accurately challenging AI conclusions. Firstly, when AI identifies a pathology, there should be a clear indication of its suspected location. 

Secondly, when AI feedback is stored in the patient’s record, radiologists become less likely to contradict it, perhaps because any disagreement with the AI is then itself on the record. We therefore need to contemplate whether we should store AI results at all, or whether AI should exclusively be a tool accessible to the radiologist. This presents a genuine dilemma. While the medical field rightly gravitates towards increased transparency for patients, we must recognise that such openness, especially in the realm of AI in radiology, might come with repercussions.

In the near future, AI will be integral to almost every radiology practice. A crucial responsibility for experimental psychologists is to evaluate the different ways in which AI can be implemented. While this might appear to be a theoretical concern, given the vast amount of radiological imaging that AI could interpret, the implications for some individuals could be life-altering.


Journal reference

Bernstein, M. H., Atalay, M. K., Dibble, E. H., Maxwell, A. W., Karam, A. R., Agarwal, S., … & Baird, G. L. (2023). Can incorrect artificial intelligence (AI) results impact radiologists, and if so, what can we do about it? A multi-reader pilot study of lung cancer detection with chest radiography. European Radiology, 1-7.

Michael H. Bernstein, Ph.D., is an assistant professor of diagnostic imaging at the Warren Alpert Medical School of Brown University and a research scientist at Rhode Island Hospital. He is the director of the Medical Expectations Lab and an editor of the forthcoming book “The Nocebo Effect: When Words Make You Sick.”

Dr. Atalay is a radiologist at Rhode Island Medical Imaging and a Professor of Diagnostic Imaging and Medicine (Cardiology) at Brown University specializing in cross-sectional imaging and cardiac MRI and CT. He holds MD and PhD degrees in biomedical engineering from The Johns Hopkins School of Medicine where he also completed his radiology training. At Brown, he serves as Vice-Chair of Imaging Research, Director of Cardiac MRI and CT, and Medical Director of the Brown Radiology Human Factors Lab.

Grayson Baird, PhD, PSTAT®, is an Associate Professor in Diagnostic Imaging. In addition to offering design and statistical guidance, Dr. Baird also serves as the director of the Brown Radiology Human Factors Lab. He is a senior research scientist with the Lifespan Biostatistics Core at Rhode Island Hospital.