Importance: Screening with low-dose computed tomography (CT) has been shown to reduce mortality from lung cancer in randomized clinical trials in which the rate of adherence to follow-up recommendations was over 90%; however, adherence to Lung Computed Tomography Screening Reporting & Data System (Lung-RADS) recommendations has been low in practice. Identifying patients who are at risk of being nonadherent to screening recommendations may enable personalized outreach to improve overall screening adherence.

Objective: To identify factors associated with patient nonadherence to Lung-RADS recommendations across multiple screening time points.

Design, Setting, and Participants: This cohort study was conducted at a single US academic medical center across 10 geographically distributed sites where lung cancer screening is offered. The study enrolled individuals who underwent low-dose CT screening for lung cancer between July 31, 2013, and November 30, 2021.

Exposures: Low-dose CT screening for lung cancer.

Main Outcomes and Measures: The main outcome was nonadherence to follow-up recommendations for lung cancer screening, defined as failing to complete a recommended or more invasive follow-up examination (ie, diagnostic-dose CT, positron emission tomography-CT, or tissue sampling vs low-dose CT) within 15 months (Lung-RADS score, 1 or 2), 9 months (Lung-RADS score, 3), 5 months (Lung-RADS score, 4A), or 3 months (Lung-RADS score, 4B/X). Multivariable logistic regression was used to identify factors associated with patient nonadherence to baseline Lung-RADS recommendations. A generalized estimating equations model was used to assess whether the pattern of longitudinal Lung-RADS scores was associated with patient nonadherence over time.

Results: Among 1979 included patients, 1111 (56.1%) were aged 65 years or older at baseline screening (mean [SD] age, 65.3 [6.6] years), and 1176 (59.4%) were male.
The odds of being nonadherent were lower among patients with a baseline Lung-RADS score of 1 or 2 vs 3 (adjusted odds ratio [AOR], 0.35; 95% CI, 0.25-0.50), 4A (AOR, 0.21; 95% CI, 0.13-0.33), or 4B/X (AOR, 0.10; 95% CI, 0.05-0.19); with a postgraduate vs college degree (AOR, 0.70; 95% CI, 0.53-0.92); with a family history of lung cancer vs no family history (AOR, 0.74; 95% CI, 0.59-0.93); with a high age-adjusted Charlson Comorbidity Index score (≥4) vs a low score (0 or 1) (AOR, 0.67; 95% CI, 0.46-0.98); in the high vs low income category (AOR, 0.79; 95% CI, 0.65-0.98); and among patients referred by physicians from pulmonary or thoracic-related departments vs another department (AOR, 0.56; 95% CI, 0.44-0.73). Among 830 eligible patients who had completed at least 2 screening examinations, the adjusted odds of being nonadherent to Lung-RADS recommendations at the following screening were increased in patients with consecutive Lung-RADS scores of 1 to 2 (AOR, 1.38; 95% CI, 1.12-1.69).

Conclusions and Relevance: In this retrospective cohort study, patients with consecutive negative lung cancer screening results were more likely to be nonadherent with follow-up recommendations. These individuals are potential candidates for tailored outreach to improve adherence to recommended annual lung cancer screening.
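The adjusted odds ratios reported above come from exponentiating multivariable logistic regression coefficients. As a minimal, self-contained sketch of the underlying arithmetic only, the snippet below computes an unadjusted odds ratio and a Wald 95% CI from a 2x2 table; the counts are invented for illustration, and the omission of covariate adjustment is a simplification of the study's actual model.

```python
# Hypothetical sketch: odds ratio and Wald 95% CI from a 2x2 table.
# The study's AORs additionally adjust for covariates via multivariable
# logistic regression, which this simplified example does not do.
import math

def odds_ratio(a, b, c, d):
    """OR from a 2x2 table:
    a = exposed & nonadherent,    b = exposed & adherent,
    c = unexposed & nonadherent,  d = unexposed & adherent."""
    return (a / b) / (c / d)

def wald_ci(a, b, c, d, z=1.96):
    """Approximate 95% CI for the OR (Wald method on the log scale)."""
    log_or = math.log(odds_ratio(a, b, c, d))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)

# Illustrative counts (not from the study):
or_est = odds_ratio(30, 170, 60, 140)  # ~0.41: exposure lowers the odds
lo, hi = wald_ci(30, 170, 60, 140)
```

An OR below 1 with a CI excluding 1, as in several of the reported associations, indicates lower odds of nonadherence in the exposed group.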
Going beyond the means: Exploring the role of bias from digital determinants of health in technologies
Background: In light of recent retrospective studies revealing evidence of disparities in access to medical technology and of bias in measurements, this narrative review assesses digital determinants of health (DDoH) in technologies and medical formulae that demonstrate either evidence of bias or suboptimal performance, identifies potential mechanisms behind such bias, and proposes methods or avenues that can guide future efforts to address these disparities.

Approach: Mechanisms are broadly grouped into physical and biological biases (e.g., pulse oximetry, non-contact infrared thermometry [NCIT]), interaction of human factors and cultural practices (e.g., electroencephalography [EEG]), and interpretation bias (e.g., pulmonary function tests [PFT], optical coherence tomography [OCT], and Humphrey visual field [HVF] testing). The scope of this review specifically excludes technologies incorporating artificial intelligence and machine learning. For each technology, we identify both clinical and research recommendations.

Conclusions: Many of the DDoH mechanisms encountered in medical technologies and formulae result in lower accuracy or lower validity when applied to patients outside the initial scope of development or validation. Our clinical recommendations caution clinical users against completely trusting result validity and suggest correlating with other measurement modalities robust to the DDoH mechanism (e.g., arterial blood gas for pulse oximetry, core temperature for NCIT). Our research recommendations suggest not only increasing diversity in development and validation but also raising awareness of the modalities of diversity required (e.g., skin pigmentation for pulse oximetry, but skin pigmentation and sex/hormonal variation for NCIT). By increasing diversity to better reflect patients in all scenarios of use, we can mitigate DDoH mechanisms and increase trust and validity in clinical practice and research.
- Award ID(s): 1919038
- PAR ID: 10658778
- Editor(s): Marcelo, Alvin
- Publisher / Repository: PLOS Digital Health
- Date Published:
- Journal Name: PLOS Digital Health
- Volume: 2
- Issue: 10
- ISSN: 2767-3170
- Page Range / eLocation ID: e0000244
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Importance: Large language models (LLMs) can assist in various health care activities, but current evaluation approaches may not adequately identify the most useful application areas.

Objective: To summarize existing evaluations of LLMs in health care in terms of 5 components: (1) evaluation data type, (2) health care task, (3) natural language processing (NLP) and natural language understanding (NLU) tasks, (4) dimension of evaluation, and (5) medical specialty.

Data Sources: A systematic search of PubMed and Web of Science was performed for studies published between January 1, 2022, and February 19, 2024.

Study Selection: Studies evaluating 1 or more LLMs in health care.

Data Extraction and Synthesis: Three independent reviewers categorized studies via keyword searches based on the data used, the health care tasks, the NLP and NLU tasks, the dimensions of evaluation, and the medical specialty.

Results: Of 519 studies reviewed, only 5% used real patient care data for LLM evaluation. The most common health care tasks were assessing medical knowledge, such as answering medical licensing examination questions (44.5%), and making diagnoses (19.5%). Administrative tasks such as assigning billing codes (0.2%) and writing prescriptions (0.2%) were less studied. For NLP and NLU tasks, most studies focused on question answering (84.2%), while tasks such as summarization (8.9%) and conversational dialogue (3.3%) were infrequent. Almost all studies (95.4%) used accuracy as the primary dimension of evaluation; fairness, bias, and toxicity (15.8%), deployment considerations (4.6%), and calibration and uncertainty (1.2%) were infrequently measured. Finally, in terms of medical specialty area, most studies were in generic health care applications (25.6%), internal medicine (16.4%), surgery (11.4%), and ophthalmology (6.9%), with nuclear medicine (0.6%), physical medicine (0.4%), and medical genetics (0.2%) being the least represented.
Conclusions and Relevance: Existing evaluations of LLMs mostly focus on accuracy of question answering for medical examinations, without consideration of real patient care data. Dimensions such as fairness, bias, and toxicity and deployment considerations received limited attention. Future evaluations should adopt standardized applications and metrics, use clinical data, and broaden focus to include a wider range of tasks and specialties.
Grewal, Harpreet Singh (Ed.)

Study objective: This study aimed to prospectively validate the performance of an artificially augmented home sleep apnea testing device (WVU-device) and its patented technology.

Methodology: The WVU-device, utilizing patent-pending (US 20210001122A) technology and an algorithm derived from cardiopulmonary physiological parameters, comorbidities, and anthropological information, was prospectively compared with a commercially available, Centers for Medicare & Medicaid Services (CMS)-approved home sleep apnea testing (HSAT) device. The WVU-device and the HSAT device were applied to separate hands of the patient during a single-night study. The oxygen desaturation index (ODI) obtained from the WVU-device was compared to the respiratory event index (REI) derived from the HSAT device.

Results: A total of 78 consecutive patients were included in the prospective study. Of the 78 patients, 38 (48%) were women and 9 (12%) had a Fitzpatrick score of 3 or higher. The ODI obtained from the WVU-device correlated well with the REI derived from the HSAT device, and no significant bias was observed in the Bland-Altman curve. The accuracy for ODI ≥ 5 and REI ≥ 5 was 87%, for ODI ≥ 15 and REI ≥ 15 was 89%, and for ODI ≥ 30 and REI ≥ 30 was 95%. The sensitivity and specificity for these ODI/REI cut-offs were 0.92 and 0.78, 0.91 and 0.86, and 0.94 and 0.95, respectively.

Conclusion: The WVU-device demonstrated good accuracy in predicting REI when compared to an approved HSAT device, even in patients with darker skin tones.
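The accuracy, sensitivity, and specificity figures above are standard confusion-matrix metrics computed at each ODI/REI cutoff. The sketch below shows that computation on fabricated paired readings, treating the HSAT REI as the reference standard; the numbers are illustrative assumptions, not the study's data.

```python
# Hedged sketch: deriving accuracy, sensitivity, and specificity at a
# cutoff (e.g., ODI >= 15 vs REI >= 15) from paired device readings.
# All readings below are made up; only the metric definitions follow the text.

def metrics(odi_values, rei_values, cutoff):
    """Score WVU-device ODI against HSAT REI (reference) at one cutoff."""
    tp = fp = tn = fn = 0
    for odi, rei in zip(odi_values, rei_values):
        pred, truth = odi >= cutoff, rei >= cutoff
        if pred and truth:
            tp += 1
        elif pred and not truth:
            fp += 1
        elif not pred and not truth:
            tn += 1
        else:
            fn += 1
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return accuracy, sensitivity, specificity

# Hypothetical paired readings for 8 patients:
odi = [4, 16, 18, 33, 7, 22, 40, 3]
rei = [5, 10, 20, 30, 6, 25, 35, 2]
acc, sens, spec = metrics(odi, rei, cutoff=15)  # one false positive here
```

Repeating the call at cutoffs 5, 15, and 30 reproduces the kind of per-threshold table the study reports.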
Background: Remote patient monitoring (RPM) technologies can support patients living with chronic conditions through self-monitoring of physiological measures and can enhance clinicians' diagnostic and treatment decisions. However, to date, large-scale pragmatic RPM implementation within health systems has been limited, and understanding of the impacts of RPM technologies on clinical workflows and care experience is lacking.

Objective: In this study, we evaluate the early implementation of operational RPM initiatives for chronic disease management within the ambulatory network of an academic medical center in New York City, focusing on the experiences of "early adopter" clinicians and patients.

Methods: Using a multimethod qualitative approach, we conducted (1) interviews with 13 clinicians across 9 specialties considered early adopters and supporters of RPM and (2) speculative design sessions exploring the future of RPM in clinical care with 21 patients and patient representatives, to better understand experiences, preferences, and expectations of pragmatic RPM use for health care delivery.

Results: We identified themes relevant to RPM implementation in the following areas: (1) data collection and practices, including the impacts of taking real-world measures and issues of data sharing, security, and privacy; (2) proactive and preventive care, including proactive and preventive monitoring and proactive interventions and support; and (3) health disparities and equity, including tailored and flexible care and implicit bias. We also identified evidence for mitigation and support to address challenges in each of these areas.

Conclusions: This study highlights the unique contexts, perceptions, and challenges regarding the deployment of RPM in clinical practice, including its potential implications for clinical workflows and work experiences. Based on these findings, we offer implementation and design recommendations for health systems interested in deploying RPM-enabled health care.
Objective: Neuropsychological testing is essential for both clinical and basic stroke research; however, the in-person nature of this testing is a limitation. Virtual testing overcomes the hurdles of geographic location and mobility issues and permits social distancing, yet its validity has received relatively little investigation, particularly in comparison with in-person testing.

Method: We expand on our prior findings of virtual testing feasibility by assessing virtual versus in-person administration of language and communication tasks with 48 left-hemisphere stroke patients (21 F, 27 M; mean age = 63.4 ± 12 years; mean years of education = 15.3 ± 3.5) in a quasi-test-retest paradigm. Each participant completed two testing sessions: one in their home and one in the research lab. Participants were assigned to one of eight groups, with the testing condition (fully in-person, partially virtual), order of the home session (first, second), and technology (iPad, Windows tablet) varied across groups.

Results: Across six speech-language tasks that utilized varying response modalities and interfaces, we found no significant difference in performance between virtual and in-person testing. However, our results reveal key considerations for successful virtual administration of neuropsychological tests, including technology complications and disparities in internet access.

Conclusions: Virtual administration of neuropsychological assessments demonstrates reliability comparable with in-person data collection involving stroke survivors, though technology issues must be taken into account.
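A quasi-test-retest comparison like the one described pairs each participant's virtual score with their in-person score and tests whether the mean paired difference differs from zero. The sketch below computes a paired t statistic from fabricated scores; the data, and the reduction of the study's analysis to a single paired test, are illustrative assumptions only.

```python
# Illustrative sketch of a paired (test-retest) comparison: virtual vs
# in-person scores for the same participants, summarized as a paired
# t statistic. All scores are fabricated, not from the study.
import math
import statistics

virtual = [88, 92, 75, 81, 95, 70, 84, 90]
in_person = [86, 93, 77, 80, 96, 71, 83, 91]

# Per-participant differences (virtual minus in-person)
diffs = [v - p for v, p in zip(virtual, in_person)]
mean_d = statistics.mean(diffs)
sd_d = statistics.stdev(diffs)
t_stat = mean_d / (sd_d / math.sqrt(len(diffs)))
# A |t| near zero (compared against the t distribution with n-1 degrees of
# freedom) is consistent with no virtual vs in-person difference.
```

In practice one would obtain the p-value from the t distribution (e.g., `scipy.stats.ttest_rel`) rather than stopping at the statistic.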