Title: Machine learning-based lifetime breast cancer risk reclassification compared with the BOADICEA model: impact on screening recommendations
Abstract Background: The clinical utility of machine-learning (ML) algorithms for breast cancer risk prediction and screening practices is unknown. We compared classification of lifetime breast cancer risk based on ML and the BOADICEA model. We explored the differences in risk classification and their clinical impact on screening practices. Methods: We used three different ML algorithms and the BOADICEA model to estimate lifetime breast cancer risk in a sample of 112,587 individuals from 2481 families from the Oncogenetic Unit, Geneva University Hospitals. Performance of algorithms was evaluated using the area under the receiver operating characteristic (AU-ROC) curve. Risk reclassification was compared for 36,146 breast cancer-free women of ages 20–80. The impact on recommendations for mammography surveillance was based on the Swiss Surveillance Protocol. Results: ML-based algorithms had higher predictive accuracy (0.843 ≤ AU-ROC ≤ 0.889) than BOADICEA (AU-ROC = 0.639) and reclassified 35.3% of women into different risk categories. The largest reclassification (20.8%) was observed in women characterised as ‘near population’ risk by BOADICEA. Reclassification had the largest impact on screening practices of women younger than 50. Conclusion: ML-based reclassification of lifetime breast cancer risk occurred in approximately one in three women. Reclassification is important for younger women because it impacts clinical decision-making for the initiation of screening.
Award ID(s):
1734853
PAR ID:
10163688
Publisher / Repository:
Nature Publishing Group
Journal Name:
British Journal of Cancer
Volume:
123
Issue:
5
ISSN:
0007-0920
Page Range / eLocation ID:
p. 860-867
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
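The abstract above compares models by AU-ROC and by the share of women whose risk category changes between models. The following is a minimal Python sketch of both quantities, not the authors' implementation. The toy labels and risk values are invented for illustration, and the category cutoffs (0.17 and 0.30) and the category names other than 'near population' are hypothetical, not taken from the Swiss Surveillance Protocol.

```python
from typing import Sequence, Tuple


def auc_roc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AU-ROC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def risk_category(risk: float, cutoffs: Tuple[float, float]) -> str:
    """Map a lifetime risk estimate to a category.
    The cutoffs and the 'moderate'/'high' names are hypothetical."""
    low, high = cutoffs
    if risk < low:
        return "near population"
    if risk < high:
        return "moderate"
    return "high"


def reclassification_rate(risks_a: Sequence[float],
                          risks_b: Sequence[float],
                          cutoffs: Tuple[float, float]) -> float:
    """Fraction of individuals placed in different categories by two models."""
    if len(risks_a) != len(risks_b) or not risks_a:
        raise ValueError("risk lists must be non-empty and of equal length")
    changed = sum(
        risk_category(a, cutoffs) != risk_category(b, cutoffs)
        for a, b in zip(risks_a, risks_b)
    )
    return changed / len(risks_a)


# Toy data, invented for illustration only.
labels = [0, 0, 1, 0, 1, 1]                       # 1 = developed breast cancer
model_a = [0.05, 0.20, 0.15, 0.10, 0.35, 0.40]    # e.g. a reference model
model_b = [0.05, 0.10, 0.25, 0.12, 0.33, 0.45]    # e.g. an ML model
CUTOFFS = (0.17, 0.30)                            # hypothetical cutoffs

print("AU-ROC model A:", round(auc_roc(labels, model_a), 3))
print("AU-ROC model B:", round(auc_roc(labels, model_b), 3))
print("Reclassified:", round(reclassification_rate(model_a, model_b, CUTOFFS), 3))
```

The pairwise AU-ROC loop is quadratic in sample size, which is fine for a toy example. For a cohort of the size reported in the abstract, a rank-based computation would be the more practical choice.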
More Like this
  1. Abstract Breast cancer is the most common cancer detected in women and current screening methods for the disease are not sensitive. Volatile organic compounds (VOCs) include endogenous metabolites that provide information about health and disease which might be useful to develop a better screening method for breast cancer. The goal of this study was to classify mice with and without tumors and compare tumors localized to the mammary pad and tumor cells injected into the iliac artery by differences in VOCs in urine. After 4T1.2 tumor cells were injected into BALB/c mice either in the mammary pad or into the iliac artery, urine was collected, VOCs from urine headspace were concentrated by solid phase microextraction and results were analyzed by gas chromatography-mass spectrometry quadrupole time-of-flight. Multivariate and univariate statistical analyses were employed to find potential biomarkers for breast cancer and metastatic breast cancer in mouse models. A set of six VOCs classified mice with and without tumors with an area under the receiver operator characteristic curve (ROC AUC) of 0.98 (95% confidence interval [0.85, 1.00]) via five-fold cross validation. Classification of mice with tumors in the mammary pad and iliac artery was executed utilizing a different set of six VOCs, with a ROC AUC of 0.96 (95% confidence interval [0.75, 1.00]).
  2. Abstract Background: Breast cancer poses a significant health risk to women worldwide, with approximately 30% being diagnosed annually in the United States. The identification of cancerous mammary tissues from non-cancerous ones during surgery is crucial for the complete removal of tumors. Results: Our study innovatively utilized machine learning techniques (Random Forest (RF), Support Vector Machine (SVM), and Convolutional Neural Network (CNN)) alongside Raman spectroscopy to streamline and hasten the differentiation of normal and late-stage cancerous mammary tissues in mice. The classification accuracy rates achieved by these models were 94.47% for RF, 96.76% for SVM, and 97.58% for CNN, respectively. To our best knowledge, this study was the first effort in comparing the effectiveness of these three machine-learning techniques in classifying breast cancer tissues based on their Raman spectra. Moreover, we innovatively identified specific spectral peaks that contribute to the molecular characteristics of the murine cancerous and non-cancerous tissues. Conclusions: Consequently, our integrated approach of machine learning and Raman spectroscopy presents a non-invasive, swift diagnostic tool for breast cancer, offering promising applications in intraoperative settings.
  3. Background: We investigated the association between reproductive risk factors and breast cancer subtype in Black women. On the basis of the previous literature, we hypothesized that the relative prevalence of specific breast cancer subtypes might differ according to reproductive factors. Methods: We conducted a pooled analysis of 2,188 (591 premenopausal, 1,597 postmenopausal) Black women with a primary diagnosis of breast cancer from four studies in the southeastern United States. Breast cancers were classified by clinical subtype. Case-only polytomous logistic regression models were used to estimate ORs and 95% confidence intervals (CI) for HER2+ and triple-negative breast cancer (TNBC) status in relation to estrogen receptor–positive (ER+)/HER2− status (referent) for reproductive risk factors. Results: Relative to women who had ER+/HER2− tumors, women who were age 19–24 years at first birth (OR, 1.78; 95% CI, 1.22–2.59) were more likely to have TNBC. Parous women were less likely to be diagnosed with HER2+ breast cancer and more likely to be diagnosed with TNBC relative to ER+/HER2− breast cancer. Postmenopausal parous women who breastfed were less likely to have TNBC [OR, 0.65 (95% CI, 0.43–0.99)]. Conclusions: This large pooled study of Black women with breast cancer revealed etiologic heterogeneity among breast cancer subtypes. Impact: Black parous women who do not breastfeed are more likely to be diagnosed with TNBC, which has a worse prognosis, than with ER+/HER2− breast cancer. 
  4. Abstract Motivation: Breast cancer is a type of cancer that develops in breast tissues, and, after skin cancer, it is the most commonly diagnosed cancer in women in the United States. Given that an early diagnosis is imperative to prevent breast cancer progression, many machine learning models have been developed in recent years to automate the histopathological classification of the different types of carcinomas. However, many of them are not scalable to large-scale datasets. Results: In this study, we propose the novel Primal-Dual Multi-Instance Support Vector Machine to determine which tissue segments in an image exhibit an indication of an abnormality. We derive an efficient optimization algorithm for the proposed objective by bypassing the quadratic programming and least-squares problems, which are commonly employed to optimize Support Vector Machine models. The proposed method is computationally efficient, thereby it is scalable to large-scale datasets. We applied our method to the public BreaKHis dataset and achieved promising prediction performance and scalability for histopathological classification. Availability and implementation: Software is publicly available at: https://1drv.ms/u/s!AiFpD21bgf2wgRLbQq08ixD0SgRD?e=OpqEmY. Supplementary information: Supplementary data are available at Bioinformatics online.
  5. Background: Risk-based screening for lung cancer is currently being considered in several countries; however, the optimal approach to determine eligibility remains unclear. Ensemble machine learning could support the development of highly parsimonious prediction models that maintain the performance of more complex models while maximising simplicity and generalisability, supporting the widespread adoption of personalised screening. In this work, we aimed to develop and validate ensemble machine learning models to determine eligibility for risk-based lung cancer screening. Methods and findings: For model development, we used data from 216,714 ever-smokers recruited between 2006 and 2010 to the UK Biobank prospective cohort and 26,616 high-risk ever-smokers recruited between 2002 and 2004 to the control arm of the US National Lung Screening Trial (NLST), a randomised controlled trial. The NLST randomised high-risk smokers from 33 US centres with at least a 30 pack-year smoking history and fewer than 15 quit-years to annual CT or chest radiography screening for lung cancer. We externally validated our models among 49,593 participants in the chest radiography arm and all 80,659 ever-smoking participants in the US Prostate, Lung, Colorectal and Ovarian (PLCO) Screening Trial. The PLCO trial, recruiting from 1993 to 2001, analysed the impact of chest radiography or no chest radiography for lung cancer screening. We primarily validated in the PLCO chest radiography arm such that we could benchmark against comparator models developed within the PLCO control arm. Models were developed to predict the risk of 2 outcomes within 5 years from baseline: diagnosis of lung cancer and death from lung cancer.
We assessed model discrimination (area under the receiver operating curve, AUC), calibration (calibration curves and expected/observed ratio), overall performance (Brier scores), and net benefit with decision curve analysis. Models predicting lung cancer death (UCL-D) and incidence (UCL-I) using 3 variables (age, smoking duration, and pack-years) achieved or exceeded parity in discrimination, overall performance, and net benefit with comparators currently in use, despite requiring only one-quarter of the predictors. In external validation in the PLCO trial, UCL-D had an AUC of 0.803 (95% CI: 0.783, 0.824) and was well calibrated with an expected/observed (E/O) ratio of 1.05 (95% CI: 0.95, 1.19). UCL-I had an AUC of 0.787 (95% CI: 0.771, 0.802) and an E/O ratio of 1.0 (95% CI: 0.92, 1.07). The sensitivity of UCL-D was 85.5% and UCL-I was 83.9%, at 5-year risk thresholds of 0.68% and 1.17%, respectively, 7.9% and 6.2% higher than the USPSTF-2021 criteria at the same specificity. The main limitation of this study is that the models have not been validated outside of UK and US cohorts. Conclusions: We present parsimonious ensemble machine learning models to predict the risk of lung cancer in ever-smokers, demonstrating a novel approach that could simplify the implementation of risk-based lung cancer screening in multiple settings.
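The last abstract in this list reports calibration (expected/observed ratio), overall performance (Brier score), sensitivity at a risk threshold, and net benefit from decision curve analysis. The sketch below shows the standard definitions of these four quantities in plain Python. It is an illustration under stated assumptions, not the study's code: the predicted risks, outcomes, and thresholds are toy values invented here, and the function names are hypothetical.

```python
from typing import Sequence


def expected_observed_ratio(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """E/O ratio: sum of predicted risks divided by the number of observed events.
    Values near 1 indicate good calibration-in-the-large."""
    observed = sum(outcomes)
    if observed == 0:
        raise ValueError("no observed events")
    return sum(probs) / observed


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared difference between predicted risk and the 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def sensitivity_at_threshold(probs: Sequence[float], outcomes: Sequence[int],
                             threshold: float) -> float:
    """Share of true cases whose predicted risk meets or exceeds the threshold."""
    case_probs = [p for p, y in zip(probs, outcomes) if y == 1]
    if not case_probs:
        raise ValueError("no observed events")
    return sum(p >= threshold for p in case_probs) / len(case_probs)


def net_benefit(probs: Sequence[float], outcomes: Sequence[int],
                threshold: float) -> float:
    """Net benefit at one threshold, as used in decision curve analysis:
    TP/n - FP/n * threshold / (1 - threshold)."""
    n = len(probs)
    tp = sum(1 for p, y in zip(probs, outcomes) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, outcomes) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


# Toy predicted risks and outcomes, invented for illustration only.
probs = [0.1, 0.4, 0.8, 0.2, 0.5]
outcomes = [0, 0, 1, 0, 1]

print("E/O ratio:", round(expected_observed_ratio(probs, outcomes), 3))
print("Brier score:", round(brier_score(probs, outcomes), 3))
print("Sensitivity at 0.6:", sensitivity_at_threshold(probs, outcomes, 0.6))
print("Net benefit at 0.3:", round(net_benefit(probs, outcomes, 0.3), 3))
```

The toy thresholds (0.6 and 0.3) are chosen only to make the arithmetic easy to follow. The abstract's 5-year risk thresholds of 0.68% and 1.17% would be passed as 0.0068 and 0.0117.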