
Title: Generalizability and Transportability of the National Lung Screening Trial Data: Extending Trial Results to Different Populations
Abstract Background:

Randomized controlled trials (RCTs) play a central role in evidence-based healthcare. However, the clinical and policy implications of applying RCT results in clinical practice are difficult to predict because the studied population often differs from the target population to which the results are applied. This study illustrates the concepts of generalizability and transportability and demonstrates their utility in interpreting results from the National Lung Screening Trial (NLST).


Using inverse-odds weighting, we demonstrate how generalizability and transportability techniques can be used to extrapolate treatment effects from (i) a subset of the NLST to the entire NLST population and (ii) the entire NLST to different target populations.
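The inverse-odds weighting idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes only two binary covariates (female sex and current smoking, the characteristics highlighted in the transportability results), a trial sample (S=1) and a separate target sample (S=0), and a treatment effect summarized as a relative mortality reduction, 1 minus the risk ratio, with no person-time or confidence intervals. Each trial participant is weighted by P(S=0|X)/P(S=1|X). Because the covariates are discrete, the sketch estimates that odds with a saturated stratum count model rather than whatever model the authors fitted. The names `Person`, `inverse_odds_weights`, `transported_reduction` and `cohort` are hypothetical, and all data are invented toy numbers, not NLST results.

```python
from collections import Counter
from dataclasses import dataclass


# Hypothetical record type; covariates limited to the two named in the abstract.
@dataclass(frozen=True)
class Person:
    female: bool
    current_smoker: bool
    screened: bool = False  # trial only: LDCT arm (True) vs comparison arm (False)
    died_lc: bool = False   # trial only: lung cancer death during follow-up

    @property
    def stratum(self):
        return (self.female, self.current_smoker)


def inverse_odds_weights(trial, target):
    """Weight each trial participant by P(S=0|X) / P(S=1|X), where S=1 marks the
    trial sample. With only discrete covariates, a saturated model of S given X
    reduces this odds to n_target(x) / n_trial(x) within each stratum x."""
    n_trial = Counter(p.stratum for p in trial)
    n_target = Counter(p.stratum for p in target)
    missing = set(n_target) - set(n_trial)
    if missing:
        # Target strata with no trial participants cannot be reweighted (positivity).
        raise ValueError(f"target strata absent from trial: {missing}")
    # Trial strata absent from the target get weight 0 (Counter returns 0).
    return [n_target[p.stratum] / n_trial[p.stratum] for p in trial]


def weighted_risk(trial, weights, screened):
    """Weighted proportion of lung cancer deaths in one trial arm."""
    arm = [(p, w) for p, w in zip(trial, weights) if p.screened == screened]
    return sum(w for p, w in arm if p.died_lc) / sum(w for _, w in arm)


def transported_reduction(trial, target):
    """Relative mortality reduction (1 - risk ratio) reweighted to the target."""
    w = inverse_odds_weights(trial, target)
    return 1.0 - weighted_risk(trial, w, True) / weighted_risk(trial, w, False)


def cohort(female, smoker, screened, n, deaths):
    """Toy helper: n people in one stratum and arm, the first `deaths` of whom died."""
    return [Person(female, smoker, screened, i < deaths) for i in range(n)]


# Invented toy data: screening helps more among female current smokers.
trial = (cohort(True, True, True, 100, 2) + cohort(True, True, False, 100, 4)
         + cohort(False, False, True, 100, 9) + cohort(False, False, False, 100, 10))
target = [Person(True, True)] * 80 + [Person(False, False)] * 20

print(round(transported_reduction(trial, trial), 3))   # 0.214 (trial's own mix)
print(round(transported_reduction(trial, target), 3))  # 0.346 (80% female current smokers)
```

Reweighting leaves the stratum-specific effects unchanged and only shifts the covariate mix. So when the benefit is larger in one stratum, a target population richer in that stratum shows a larger overall reduction. With continuous covariates such as age or pack-years, one would typically fit a logistic regression for S given X and use the fitted (1 − p̂)/p̂ as the weight instead of cell counts.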


Our generalizability analysis revealed that the lung cancer mortality reduction from low-dose computed tomography (LDCT) screening across the entire NLST [16% (95% confidence interval [CI]: 4–24)] could have been estimated using a smaller subset of NLST participants. Using transportability analysis, we showed that populations with a higher prevalence of females and current smokers had a greater reduction in lung cancer mortality with LDCT screening [e.g., 27% (95% CI, 11–37) for a population composed of 80% females and 80% current smokers] than populations with a lower prevalence of females and current smokers.


This article illustrates how generalizability and transportability methods extend estimation of RCTs' utility beyond trial participants to external populations of interest, including those that more closely mirror real-world populations.


Generalizability and transportability approaches can be used to quantify treatment effects for populations of interest, which may be used to design future trials or adjust lung cancer screening eligibility criteria.

Journal Name:
Cancer Epidemiology, Biomarkers & Prevention
Page Range or eLocation-ID:
p. 2227-2234
DOI PREFIX: 10.1158
Sponsoring Org:
National Science Foundation
More Like This
  1. e20551 Background: Enzyme activity is at the center of all biological processes. When these activities are misregulated by changes in sequence, expression, or activity, pathologies emerge. Misregulation of protease enzymes such as matrix metalloproteinases and cathepsins plays a key role in the pathophysiology of cancer. We describe here a novel class of graphene-based, cost-effective biosensors that can detect altered protease activation in a blood sample from patients with early-stage lung cancer. Methods: The Gene Expression Omnibus (GEO) tool was used to identify proteases differentially expressed in lung cancer and matched normal tissue. Biosensors were assembled on a graphene backbone annotated with one of a panel of fluorescently tagged peptides. The graphene quenches fluorescence until the peptide is either cleaved by active proteases or altered by post-translational modification. Nineteen protease biosensors were evaluated on 431 commercially collected serum samples from non-lung cancer controls (69%) and pathologically confirmed lung cancer cases (31%), tested over two independent cohorts. Serum was incubated with each of the 19 biosensors, and enzyme activity was measured indirectly as a continuous variable by a fluorescence plate reader. Analysis was performed using Emerge, a proprietary predictive and classification modeling system based on massively parallel evolving “Turing machine” algorithms. Each analysis stratified allocation into training and testing sets and reserved an out-of-sample validation set for reporting. Results: 256 clinical samples were initially evaluated, including 35% cancer cases evenly distributed across stages I (29%), II (26%), III (24%), and IV (21%). The controls included comorbidities common in the at-risk population, such as COPD, chronic bronchitis, and benign nodules (19%). Using the Emerge classification analysis, biosensor biomarkers alone (no clinical factors) demonstrated sensitivity (Se.) = 92% (CI 82%–99%) and specificity (Sp.) = 82% (CI 69%–91%) in the out-of-sample set. An independent cohort of 175 clinical cases (age 67 ± 8, 52% male) focused on early detection (26% cancer, 70% Stage I, 30% Stage II/III) was similarly evaluated. Classification showed Se. = 100% (CI 79%–100%) and Sp. = 93% (CI 80%–99%) in the out-of-sample set. For the entire dataset of 175 samples, Se. = 100% (CI 92%–100%) and Sp. = 97% (CI 92%–99%) were observed. Conclusions: Lung cancer can be treated if it is diagnosed while still localized. Despite clear data showing that screening for lung cancer by low-dose computed tomography (LDCT) is effective, screening compliance remains very low. Protease biosensors provide a cost-effective additional specialized tool with high sensitivity and specificity for detecting early-stage lung cancer. A large prospective trial of at-risk smokers with follow-up is being conducted to evaluate a commercial version of this assay.
  2. Abstract Aims

    To investigate whether the cumulative exposure risks of incident type 2 diabetes (T2D) are shared with other common chronic diseases.

    Research design and methods

    We first establish and report the cross-sectional prevalence, cross-sectional co-prevalence, and incidence of seven T2D-associated chronic diseases [hypertension, atrial fibrillation, coronary artery disease, obesity, chronic obstructive pulmonary disease (COPD), and chronic kidney and liver diseases] in the UK Biobank. We use published weights of genetic variants and exposure variables to derive the T2D polygenic (PGS) and polyexposure (PXS) risk scores and test their associations with incident diseases.


    PXS was associated with higher levels of clinical risk factors, including BMI, systolic blood pressure, blood glucose, triglycerides, and HbA1c, in individuals without overt or diagnosed T2D. In addition to predicting incident T2D, PXS and PGS were significantly and positively associated with the incidence of all 7 other chronic diseases. In the bottom deciles of PXS and PGS, 4% and 8% of individuals, respectively, were prediabetic at baseline but had low risks of T2D and other chronic diseases. Compared with the remaining population, individuals in the top deciles of PGS and PXS had particularly high risks of developing chronic diseases. For instance, the hazard ratios of COPD and obesity for individuals in the top T2D PXS decile were 2.82 (95% CI 2.39–3.35, P = 4.00 × 10⁻³³) and 2.54 (95% CI 2.24–2.87, P = 9.86 × 10⁻⁵⁰), respectively, compared with the remaining population. We also found that PXS and PGS were both significantly (P < 0.0001) and positively associated with the total number of incident diseases.


    T2D shares polyexposure risks with other common chronic diseases. Individuals with an elevated genetic and non-genetic risk of T2D also have high risks of cardiovascular, liver, lung, and kidney diseases.

  3. Abstract Objective Randomized controlled trials (RCTs) are the gold standard method for evaluating whether a treatment works in health care but can be difficult to find and make use of. We describe the development and evaluation of a system to automatically find and categorize all new RCT reports. Materials and Methods Trialstreamer continuously monitors PubMed and the World Health Organization International Clinical Trials Registry Platform, looking for new RCTs in humans using a validated classifier. We combine machine learning and rule-based methods to extract information from the RCT abstracts, including free-text descriptions of trial PICO (populations, interventions/comparators, and outcomes) elements, and map these snippets to normalized MeSH (Medical Subject Headings) vocabulary terms. We additionally identify sample sizes, predict the risk of bias, and extract text conveying key findings. We store all extracted data in a database, which we make freely available for download and via a search portal that allows users to enter structured clinical queries. Results are ranked automatically to prioritize larger and higher-quality studies. Results As of early June 2020, we have indexed 673 191 publications of RCTs, of which 22 363 were published in the first 5 months of 2020 (142 per day). We additionally include 304 111 trial registrations from the International Clinical Trials Registry Platform. The median trial sample size was 66. Conclusions We present an automated system for finding and categorizing RCTs. This yields a novel resource: a database of structured information automatically extracted for all published RCTs in humans. We make daily updates of this database available on our website (
  4. Abstract Background Interstitial lung abnormalities (ILA) can be detected on computed tomography (CT) in lung cancer patients and are associated with mortality in patients with advanced non-small cell lung cancer (NSCLC). The aim of this study is to demonstrate the significance of ILA for mortality in patients with stage I NSCLC using the Boston Lung Cancer Study cohort. Methods Two hundred and thirty-one patients with stage I NSCLC from 2000 to 2011 were investigated in this retrospective study (median age, 69 years; 93 males, 138 females). ILA was scored on baseline CT scans prior to treatment using a 3-point scale (0 = no evidence of ILA, 1 = equivocal for ILA, 2 = ILA) by a sequential reading method. An ILA score of 2 was considered to indicate the presence of ILA. Differences in overall survival (OS) across ILA scores were tested via the log-rank test, and multivariable Cox proportional hazards models including ILA score, with age, sex, smoking status, and treatment as confounding variables, were used to estimate hazard ratios (HRs). Results ILA was present in 22 of 231 patients (9.5%) with stage I NSCLC. The presence of ILA was associated with shorter OS (patients with ILA score 2: median 3.85 years [95% confidence interval (CI): 3.36–not reached (NR)]; patients with ILA score 0 or 1: median 10.16 years [95% CI: 8.65–NR]; P < 0.0001). In a Cox proportional hazards model, the presence of ILA remained significantly associated with increased risk of death (HR = 2.88, P = 0.005) after adjusting for age, sex, smoking, and treatment. Conclusions ILA was detected on CT in 9.5% of patients with stage I NSCLC. The presence of ILA was significantly associated with shorter OS and could be an imaging marker of shorter survival in stage I NSCLC.
  5. Abstract Background

    The clinical utility of machine-learning (ML) algorithms for breast cancer risk prediction and screening practices is unknown. We compared classification of lifetime breast cancer risk based on ML and the BOADICEA model. We explored the differences in risk classification and their clinical impact on screening practices.


    We used three different ML algorithms and the BOADICEA model to estimate lifetime breast cancer risk in a sample of 112,587 individuals from 2481 families from the Oncogenetic Unit, Geneva University Hospitals. Performance of algorithms was evaluated using the area under the receiver operating characteristic (AU-ROC) curve. Risk reclassification was compared for 36,146 breast cancer-free women of ages 20–80. The impact on recommendations for mammography surveillance was based on the Swiss Surveillance Protocol.


    The predictive accuracy of the ML-based algorithms (0.843 ≤ AU-ROC ≤ 0.889) was superior to that of BOADICEA (AU-ROC = 0.639), and the ML algorithms reclassified 35.3% of women into different risk categories. The largest reclassification (20.8%) was observed in women characterised as ‘near population’ risk by BOADICEA. Reclassification had the largest impact on screening practices for women younger than 50.


    ML-based reclassification of lifetime breast cancer risk occurred in approximately one in three women. Reclassification is important for younger women because it affects clinical decision-making for the initiation of screening.