Artificial intelligence (AI) is especially beneficial for pattern recognition. For this reason, the use of AI has expanded in the field of diagnostic radiology. AI programmes can be trained to detect breast cancer, rib fractures, blood clots in the lungs and brain, as well as a host of other lesions. Radiologists are increasingly relying on AI to assist with these diagnoses. As technology evolves, this trend is set to continue.
However, there are two significant caveats to consider. First, while AI is an exceptionally powerful tool, it isn’t infallible. AI systems can detect abnormalities where none exist (false positives, FP) and might occasionally overlook abnormalities that are present (false negatives, FN).
Secondly, we are far from the stage where an AI system could interpret a mammogram on its own, without the oversight of a radiologist who has undergone years of training to identify and diagnose a myriad of often subtle diseases that may be concealed in the nuances of x-rays. In essence, discussions about AI in radiology are really discussions about the interplay between AI and radiologists, i.e., the AI-radiologist interaction. AI should therefore be viewed primarily as a supportive tool for radiologists.
Our team at the Brown Radiology Human Factors Lab has been exploring how AI and radiologists jointly interpret images, or, in other words, how AI influences a radiologist’s interpretation. In a recent study, we aimed to address two key questions: 1) If AI makes an error, will it mislead the radiologist? 2) If erroneous AI impairs the radiologists’ performance, is there a way to counteract this effect?
To examine this empirically, we had radiologists interpret the same 90 chest radiographs on four occasions at least one month apart. In half of the cases, lung cancer was present. On one occasion, radiologists had no AI to help them. On the other three occasions, radiologists interpreted the cases with the help of an AI system that labelled each image "abnormal" or "normal". Radiologists thought the purpose of the study was to compare three different AI systems, but in reality, the goal was to examine how radiologists responded to the way the AI feedback was presented.
In one condition, radiologists were told that the AI feedback would be deleted from the patient’s file. In another condition, radiologists were told that the AI feedback would be kept in the patient’s file. In the final condition, radiologists were also told AI feedback would be kept, but a box was placed around the putative pathology.
Unknown to the radiologists, they were in fact using the same AI system across all scenarios. The rationale behind this was straightforward: To ascertain whether our alterations in the presentation of feedback influenced the radiologists’ performance, the AI feedback needed to remain consistent.
As stated earlier, the study’s objective was to examine how radiologists react to inaccurate AI feedback. Consequently, we centered our analysis on the 12 (out of 90) instances where the AI made an error. In 8 of these instances, the AI indicated a lung abnormality when the lung was actually normal (FP), whereas in 4 cases, the AI failed to detect a genuine lung abnormality (FN).
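The error tally above follows directly from comparing the AI's calls against the ground truth for each case. A minimal sketch of that bookkeeping is shown below; the data here are toy values constructed only to mirror the error profile reported in the study (90 cases, 8 FP, 4 FN), not the study's actual dataset or analysis code.

```python
# Hypothetical sketch (not the study's analysis code): tally the AI's
# errors by comparing its call on each case against the ground truth.
# Labels: 1 = abnormal (cancer present), 0 = normal.

def ai_error_counts(ai_calls, truth):
    """Count false positives (AI says abnormal, truth normal) and
    false negatives (AI says normal, truth abnormal)."""
    fp = sum(1 for a, t in zip(ai_calls, truth) if a == 1 and t == 0)
    fn = sum(1 for a, t in zip(ai_calls, truth) if a == 0 and t == 1)
    return fp, fn

# Toy data mirroring the reported error profile: 90 cases, 45 abnormal,
# with 4 false negatives and 8 false positives.
truth    = [1] * 45 + [0] * 45
ai_calls = [0] * 4 + [1] * 41 + [1] * 8 + [0] * 37

fp, fn = ai_error_counts(ai_calls, truth)
# fp == 8, fn == 4: 12 AI errors out of 90 cases
```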
Research Question 1: Will incorrect AI mislead the radiologist?
The answer to this question was a resounding "yes." In the four cases where the AI produced a false negative, radiologists rarely overlooked the abnormality when interpreting without AI; when the AI missed the abnormality, however, radiologists frequently missed it too. Likewise, in the 8 cases where the AI produced a false positive, the erroneous AI feedback elevated the radiologists' false-positive rate compared to when no AI feedback was given.
Research Question 2: Can we mitigate the impact of incorrect AI?
The answer to this question is also clearly "yes." The false-positive rate diminished when radiologists believed that the (erroneous) AI feedback would be removed from the patient's record compared to when they thought it would be retained. Furthermore, the false-negative rate was reduced when a box was added around the purported pathology compared to scenarios without such highlighting.
What should we make of these findings?
Radiologists, like everyone else, are susceptible to influence and bias. The susceptibility of a radiologist to inaccurate AI is not unexpected, but it underscores the need for caution when integrating AI into radiology. Radiologists need to feel empowered to disagree with AI, as AI will make mistakes.
Encouragingly, our research indicates that there are methods to enhance the probability of radiologists accurately challenging AI conclusions. Firstly, when AI identifies a pathology, there should be a clear indication of its suspected location.
Secondly, when a record is kept of the AI feedback, and thus of any disagreement with it, radiologists become less likely to contradict the AI. We need to contemplate whether we should store AI results or whether AI should be a tool accessible exclusively to the radiologist. This presents a genuine dilemma. While the medical field rightly gravitates towards increased transparency for patients, we must recognise that such openness, especially in the realm of AI in radiology, might come with repercussions.
In the near future, AI will be integral to almost every radiology practice. A crucial responsibility of experimental psychologists is to evaluate different approaches to how AI is implemented. While this might appear to be a merely theoretical concern, given the vast volume of radiological imaging that AI could interpret, for some patients the implications could be life-altering.
Bernstein, M. H., Atalay, M. K., Dibble, E. H., Maxwell, A. W., Karam, A. R., Agarwal, S., … & Baird, G. L. (2023). Can incorrect artificial intelligence (AI) results impact radiologists, and if so, what can we do about it? A multi-reader pilot study of lung cancer detection with chest radiography. European Radiology, 1-7. https://doi.org/10.1007/s00330-023-09747-1