While active efforts are advancing medical artificial intelligence (AI) model development and clinical translation, the safety issues of these AI models have received little research attention. We perform a study to investigate the behavior of an AI diagnosis model under adversarial images generated by generative adversarial network (GAN) models and to evaluate whether human experts can visually identify potential adversarial images. Our GAN model makes intentional modifications to the diagnosis-sensitive content of mammogram images used in deep learning-based computer-aided diagnosis (CAD) of breast cancer. In our experiments, the adversarial samples fool the AI-CAD model into outputting a wrong diagnosis on 69.1% of the cases that it initially classified correctly. Five breast-imaging radiologists visually identify 29%-71% of the adversarial samples. Our study suggests an imperative need for continued research on the safety of medical AI models and for developing potential defensive solutions against adversarial attacks.
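The study's attack uses a GAN to modify diagnosis-sensitive image content; as a much simpler illustration of the underlying idea of an adversarial example (a small, targeted input change that flips a classifier's output), here is a gradient-sign (FGSM-style) perturbation on a toy logistic model. This is not the paper's method, and all weights and inputs below are hypothetical.

```python
# Illustrative sketch only: a gradient-sign (FGSM-style) adversarial
# perturbation on a toy logistic classifier. The paper itself uses a GAN
# on mammograms; this just shows the "small change flips the output" idea.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_perturb(x, w, b, y_true, eps=0.5):
    """Shift x by eps in the gradient-sign direction that raises the loss."""
    p = sigmoid(w @ x + b)
    grad_x = (p - y_true) * w  # d(log-loss)/dx for a logistic model
    return x + eps * np.sign(grad_x)

rng = np.random.default_rng(0)
w = rng.normal(size=16)      # hypothetical model weights
b = 0.0
x = rng.normal(size=16)      # hypothetical input features
y = 1.0 if sigmoid(w @ x + b) > 0.5 else 0.0  # model's original decision

x_adv = fgsm_perturb(x, w, b, y_true=y)
# The perturbed input's score is pushed away from the original decision.
```

The perturbation moves every feature by the same small amount, yet the logit shifts by `eps * sum(|w|)`, which is why even visually minor changes can flip a high-dimensional classifier.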
Artificial intelligence system reduces false-positive findings in the interpretation of breast ultrasound exams
Abstract Though consistently shown to detect mammographically occult cancers, breast ultrasound has been noted to have high false-positive rates. In this work, we present an AI system that achieves radiologist-level accuracy in identifying breast cancer in ultrasound images. Developed on 288,767 exams, consisting of 5,442,907 B-mode and Color Doppler images, the AI achieves an area under the receiver operating characteristic curve (AUROC) of 0.976 on a test set consisting of 44,755 exams. In a retrospective reader study, the AI achieves a higher AUROC than the average of ten board-certified breast radiologists (AUROC: 0.962 AI, 0.924 ± 0.02 radiologists). With the help of the AI, radiologists decrease their false-positive rates by 37.3% and reduce requested biopsies by 27.8%, while maintaining the same level of sensitivity. This highlights the potential of AI in improving the accuracy, consistency, and efficiency of breast ultrasound diagnosis.
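The AUROC figures reported here (0.976 on the test set, 0.962 vs. 0.924 in the reader study) have a simple probabilistic reading: the chance that a randomly chosen positive exam is scored higher than a randomly chosen negative one. A minimal sketch, with made-up scores and labels:

```python
# AUROC computed directly from its pairwise-ranking definition
# (the Mann-Whitney U statistic); ties count as half a win.
def auroc(scores, labels):
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical exam scores and ground-truth labels:
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
labels = [1,   1,   0,   1,   0,   0]
print(auroc(scores, labels))  # 8 of 9 positive-negative pairs ranked correctly
```

In practice a library routine (e.g., scikit-learn's `roc_auc_score`) would be used on real exam scores; the pairwise form above is just the definition made explicit.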
- Award ID(s):
- 1922658
- PAR ID:
- 10308047
- Publisher / Repository:
- Nature Publishing Group
- Date Published:
- Journal Name:
- Nature Communications
- Volume:
- 12
- Issue:
- 1
- ISSN:
- 2041-1723
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
Abstract Aims: Neural network classifiers can detect aortic stenosis (AS) using limited cardiac ultrasound images. While these networks perform very well on cart-based imaging, they have never been tested or fine-tuned for use with focused cardiac ultrasound (FoCUS) acquisitions obtained on handheld ultrasound devices. Methods and results: Prospective study performed at Tufts Medical Center. All patients ≥65 years of age referred for clinically indicated transthoracic echocardiography (TTE) were eligible for inclusion. Parasternal long-axis and parasternal short-axis imaging was acquired using a commercially available handheld ultrasound device. Our cart-based AS classifier (trained on ∼10 000 images) was tested on FoCUS imaging from 160 patients. The median age was 74 years (inter-quartile range 69–80), and 50% of patients were women. Thirty patients (18.8%) had some degree of AS. The area under the receiver operating characteristic curve (AUROC) of the cart-based model for detecting AS was 0.87 (95% CI 0.75–0.99) on the FoCUS test set. Last-layer fine-tuning on handheld data established a classifier with an AUROC of 0.94 (0.91–0.97). AUROC during temporal external validation was 0.97 (95% CI 0.89–1.0). When performance of the fine-tuned AS classifier was modelled in potential screening environments (2% and 10% AS prevalence), the positive predictive value ranged from 0.72 (0.69–0.76) to 0.88 (0.81–0.97) and the negative predictive value ranged from 0.94 (0.94–0.94) to 0.99 (0.99–0.99), respectively. Conclusion: Our cart-based machine-learning model for AS showed a drop in performance when tested on handheld ultrasound imaging collected by sonographers. Fine-tuning the AS classifier improved performance and demonstrates potential as a novel approach to detecting AS through automated interpretation of handheld imaging.
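The prevalence-modelling step above follows directly from Bayes' rule: fixing a classifier's sensitivity and specificity, positive and negative predictive values can be computed for any assumed disease prevalence (the study models 2% and 10%). A minimal sketch; the sensitivity and specificity values below are hypothetical, not the paper's:

```python
# PPV and NPV at an assumed prevalence, from sensitivity and specificity.
def ppv_npv(sensitivity, specificity, prevalence):
    tp = sensitivity * prevalence              # true positives (per unit pop.)
    fp = (1 - specificity) * (1 - prevalence)  # false positives
    tn = specificity * (1 - prevalence)        # true negatives
    fn = (1 - sensitivity) * prevalence        # false negatives
    return tp / (tp + fp), tn / (tn + fn)

# Hypothetical operating point, evaluated at a 2% screening prevalence:
ppv, npv = ppv_npv(sensitivity=0.90, specificity=0.95, prevalence=0.02)
```

This is why the same classifier shows a much lower PPV at 2% prevalence than at 10%: at low prevalence, false positives from the large disease-free pool dominate the positive calls.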
Generative artificial intelligence (AI) technology is expected to have a profound impact on chemical education. While there are certainly positive uses, some of which are being actively implemented even now, there is reasonable concern about its use in cheating. Efforts are underway to detect generative AI usage in open-ended questions, lab reports, and essays, but its detection on multiple-choice exams is largely unexplored. Here we propose the use of Rasch analysis to identify the unique behavioral pattern of ChatGPT on General Chemistry II multiple-choice exams. While raw statistics (e.g., average, ability, outfit) were insufficient to readily identify ChatGPT instances, a strategy of fixing the ability scale on high-success questions and then refitting the outcomes dramatically enhanced its outlier behavior in terms of the Z-standardized outfit statistic and ability displacement. Setting the detection threshold to a true positive rate (TPR) of 1.0, a false positive rate (FPR) of <0.1 was obtained across a majority of the 20 exams investigated here. Furthermore, the receiver operating characteristic curve (i.e., FPR vs TPR) exhibited outstanding areas under the curve of >0.9 for nearly all exams. While limitations of this method are described and the analysis is by no means exhaustive, these outcomes suggest that the unique behavior patterns of generative AI chatbots can be identified using Rasch modeling and fit statistics.
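The outfit statistic at the heart of this detection scheme has a compact form. In a Rasch model, the probability that a person of ability b answers an item of difficulty d correctly is sigmoid(b − d); the outfit mean-square for a response pattern is the average squared standardized residual, and large values flag patterns that do not fit the model (e.g., missing easy items while hitting hard ones, as a chatbot might). A hedged sketch with made-up abilities and difficulties, not the paper's fitted values:

```python
# Rasch item-response probability and the outfit mean-square fit statistic.
import math

def rasch_p(ability, difficulty):
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def outfit_msq(responses, ability, difficulties):
    total = 0.0
    for x, d in zip(responses, difficulties):
        p = rasch_p(ability, d)
        total += (x - p) ** 2 / (p * (1 - p))  # squared standardized residual
    return total / len(responses)

diffs = [-2.0, -1.0, 0.0, 1.0, 2.0]  # hypothetical item difficulties
typical = [1, 1, 1, 0, 0]    # misses only the hard items: fits the model
aberrant = [0, 0, 1, 1, 1]   # misses easy items, hits hard ones: misfits
```

Misfitting patterns inflate the outfit mean-square well above 1, which (after Z-standardization) is the outlier signal the study exploits.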
Breastfeeding provides both nutrients and immunities necessary for infant growth. Understanding the biomechanics of breastfeeding requires capturing both the positive and negative pressures exerted by infants on the breast. This clinical experimental work utilizes thin, flexible pressure sensors to capture the positive oral pressures of 7 mother-infant dyads during breastfeeding while simultaneously measuring vacuum pressures and imaging the movement of the infant's oral cavity via ultrasound. Methods for denoising signals and evaluating ultrasound images are discussed. Changes and deformations of the nipple are evaluated. The results reveal that pressure from the infant's maxilla and mandible is evenly distributed in an oscillatory pattern corresponding to the vacuum pressure patterns. Variations in nipple dimensions are considerably smaller than variations in either pressure, but the ultrasound shows that positive pressure dominates structural changes during breastfeeding. Clinical implications for infant-led milk expression and data processing are discussed.
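The abstract mentions denoising of the pressure signals but does not specify the pipeline; purely as an illustration of the idea, here is a centered moving-average smoother applied to a synthetic oscillatory "pressure" trace with added noise. The signal shape and noise level are invented for the sketch:

```python
# Minimal illustration of smoothing a noisy oscillatory signal with a
# centered moving average (NOT the paper's specific denoising method).
import math
import random

def moving_average(signal, window=5):
    half = window // 2
    out = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out

random.seed(1)
clean = [math.sin(2 * math.pi * i / 50) for i in range(200)]     # synthetic trace
noisy = [c + random.gauss(0, 0.3) for c in clean]                # added noise
smoothed = moving_average(noisy, window=9)
```

Averaging over a window much shorter than the oscillation period suppresses the noise while leaving the pressure waveform largely intact; wider windows trade more noise reduction for more attenuation of the signal itself.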
Abstract Background: Lung cancer is the deadliest and second most common cancer in the United States, in part because the lack of early symptoms delays diagnosis. Pulmonary nodules are small abnormal regions that can potentially be correlated with the occurrence of lung cancer. Early detection of these nodules is critical because it can significantly improve patients' survival rates. Thoracic thin-sliced computed tomography (CT) scanning has emerged as a widely used method for diagnosing and assessing the prognosis of lung abnormalities. Purpose: The standard clinical workflow of detecting pulmonary nodules relies on radiologists analyzing CT images to assess the risk factors of cancerous nodules. However, this approach can be error-prone because nodules form for various reasons, such as pollutants and infections. Deep learning (DL) algorithms have recently demonstrated remarkable success in medical image classification and segmentation. As DL becomes an ever more important assistant to radiologists in nodule detection, it is imperative to ensure that the DL algorithm and the radiologist can understand each other's decisions. This study aims to develop a framework integrating explainable AI methods to achieve accurate pulmonary nodule detection. Methods: A robust and explainable detection (RXD) framework is proposed, focusing on reducing false positives in pulmonary nodule detection. Its implementation is based on an explanation supervision method, which uses radiologists' nodule contours as supervision signals to force the model to learn nodule morphologies, improving learning ability on small datasets. In addition, two imputation methods are applied to the nodule region annotations to reduce the noise within human annotations and allow the model to produce robust attributions that meet human expectations.
The 480, 265, and 265 CT image sets from the public Lung Image Database Consortium and Image Database Resource Initiative (LIDC-IDRI) dataset are used for training, validation, and testing, respectively. Results: Using only 10, 30, 50, and 100 training samples sequentially, our method consistently improves the classification performance and explanation quality of the baseline in terms of Area Under the Curve (AUC) and Intersection over Union (IoU). In particular, our framework with a learnable imputation kernel improves IoU over the baseline by 24.0% to 80.0%. A pre-defined Gaussian imputation kernel achieves an even greater improvement, from 38.4% to 118.8% over the baseline. Compared to the baseline trained on 100 samples, our method shows less drop in AUC when trained on fewer samples. A comprehensive comparison of interpretability shows that our method aligns better with expert opinions. Conclusions: A pulmonary nodule detection framework was demonstrated using public thoracic CT image datasets. The framework integrates the robust explanation supervision (RES) technique to ensure accurate nodule classification and morphology learning. The method can reduce the workload of radiologists and enable them to focus on the diagnosis and prognosis of potentially cancerous pulmonary nodules at an early stage, improving outcomes for lung cancer patients.
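The explanation-quality metric reported above, Intersection over Union (IoU), compares the model's attribution region against the radiologist's nodule contour. For binary masks it is simply the ratio of overlapping marked pixels to all pixels marked by either mask. A minimal sketch with tiny made-up masks (real masks would be 2-D CT-slice arrays):

```python
# IoU for flat binary masks: intersection count over union count.
def iou(mask_a, mask_b):
    inter = sum(a and b for a, b in zip(mask_a, mask_b))
    union = sum(a or b for a, b in zip(mask_a, mask_b))
    return inter / union if union else 1.0  # two empty masks agree perfectly

pred  = [0, 1, 1, 1, 0, 0]   # hypothetical model attribution mask
truth = [0, 0, 1, 1, 1, 0]   # hypothetical radiologist contour mask
print(iou(pred, truth))      # 2 overlapping pixels / 4 in union = 0.5
```

Because IoU penalizes both spurious attribution (pixels only in `pred`) and missed nodule area (pixels only in `truth`), improving it is a direct measure of how well the supervised explanations match expert contours.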