Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
- 
            Background: Laypeople have easy access to health information through large language models (LLMs), such as ChatGPT, and search engines, such as Google. Search engines transformed health information access, and LLMs offer a new avenue for answering laypeople’s questions. Objective: We aimed to compare the frequency of use and attitudes toward LLMs and search engines as well as their comparative relevance, usefulness, ease of use, and trustworthiness in responding to health queries. Methods: We conducted a screening survey to compare the demographics of LLM users and nonusers seeking health information, analyzing results with logistic regression. LLM users from the screening survey were invited to a follow-up survey to report the types of health information they sought. We compared the frequency of use of LLMs and search engines using ANOVA and Tukey post hoc tests. Lastly, paired-sample Wilcoxon tests compared LLMs and search engines on perceived usefulness, ease of use, trustworthiness, feelings, bias, and anthropomorphism. Results: In total, 2002 US participants recruited on Prolific participated in the screening survey about the use of LLMs and search engines. Of them, 52% (n=1045) of the participants were female, with a mean age of 39 (SD 13) years. Participants were 9.7% (n=194) Asian, 12.1% (n=242) Black, 73.3% (n=1467) White, 1.1% (n=22) Hispanic, and 3.8% (n=77) of other races and ethnicities. Further, 1913 (95.6%) used search engines to look up health queries versus 642 (32.6%) for LLMs. Men had higher odds (odds ratio [OR] 1.63, 95% CI 1.34-1.99; P<.001) of using LLMs for health questions than women. Black (OR 1.90, 95% CI 1.42-2.54; P<.001) and Asian (OR 1.66, 95% CI 1.19-2.30; P<.01) individuals had higher odds than White individuals. Those with excellent perceived health (OR 1.46, 95% CI 1.1-1.93; P=.01) were more likely to use LLMs than those with good health. Higher technical proficiency increased the likelihood of LLM use (OR 1.26, 95% CI 1.14-1.39; P<.001). In a follow-up survey of 281 participants who used LLMs for health questions, most used search engines first (n=174, 62%) to answer health questions, but the second most common first source consulted was LLMs (n=39, 14%). LLMs were perceived as less useful (P<.01) and less relevant (P=.07), but elicited fewer negative feelings (P<.001), appeared more human (LLM: n=160, vs search: n=32), and were seen as less biased (P<.001). Trust (P=.56) and ease of use (P=.27) showed no differences. Conclusions: Search engines are the primary source of health information; yet, positive perceptions of LLMs suggest growing use. Future work could explore whether LLM trust and usefulness are enhanced by supplementing answers with external references and limiting persuasive language to curb overreliance. Collaboration with health organizations can help improve the quality of LLMs’ health output.
            Free, publicly-accessible full text available January 1, 2026
- 
            Importance: Virtual patient-physician communications have increased since 2020 and negatively impacted primary care physician (PCP) well-being. Generative artificial intelligence (GenAI) drafts of patient messages could potentially reduce health care professional (HCP) workload and improve communication quality, but only if the drafts are considered useful. Objectives: To assess PCPs’ perceptions of GenAI drafts and to examine linguistic characteristics associated with equity and perceived empathy. Design, Setting, and Participants: This cross-sectional quality improvement study tested the hypothesis that PCPs’ ratings of GenAI drafts (created using the electronic health record [EHR] standard prompts) would be equivalent to HCP-generated responses on 3 dimensions. The study was conducted at NYU Langone Health using private patient-HCP communications at 3 internal medicine practices piloting GenAI. Exposures: Randomly assigned patient messages coupled with either an HCP message or the draft GenAI response. Main Outcomes and Measures: PCPs rated responses’ information content quality (eg, relevance) and communication quality (eg, verbosity), each using a Likert scale, and whether they would use the draft or start anew (usable vs unusable). Branching logic further probed for empathy, personalization, and professionalism of responses. Computational linguistics methods assessed content differences in HCP vs GenAI responses, focusing on equity and empathy. Results: A total of 16 PCPs (8 [50.0%] female) reviewed 344 messages (175 GenAI drafted; 169 HCP drafted). Both GenAI and HCP responses were rated favorably. GenAI responses were rated higher for communication style than HCP responses (mean [SD], 3.70 [1.15] vs 3.38 [1.20]; P = .01; U = 12 568.5) but were similar to HCPs on information content (mean [SD], 3.53 [1.26] vs 3.41 [1.27]; P = .37; U = 13 981.0) and usable draft proportion (mean [SD], 0.69 [0.48] vs 0.65 [0.47]; P = .49; t = −0.6842). Usable GenAI responses were considered more empathetic than usable HCP responses (32 of 86 [37.2%] vs 13 of 79 [16.5%]; difference, 125.5%), possibly attributable to more subjective (mean [SD], 0.54 [0.16] vs 0.31 [0.23]; P < .001; difference, 74.2%) and positive (mean [SD] polarity, 0.21 [0.14] vs 0.13 [0.25]; P = .02; difference, 61.5%) language. They were also more linguistically complex (mean [SD] score, 125.2 [47.8] vs 95.4 [58.8]; P = .002; difference, 31.2%) and numerically longer (mean [SD] word count, 90.5 [32.0] vs 65.4 [62.6]; difference, 38.4%), although the difference in length was not statistically significant (P = .07). Conclusions: In this cross-sectional study of PCP perceptions of an EHR-integrated GenAI chatbot, GenAI was found to communicate information better and with more empathy than HCPs, highlighting its potential to enhance patient-HCP communication. However, GenAI drafts were less readable than HCPs’, a significant concern for patients with low health or English literacy.
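The "difference" percentages in the results above appear to be relative differences with the HCP value as the baseline. That reading is an assumption, not something the abstract states, but it reproduces the reported figures from the rounded values. A minimal Python sketch of the arithmetic (the helper name `relative_difference` is hypothetical):

```python
def relative_difference(genai: float, hcp: float) -> float:
    """Percentage by which the GenAI value exceeds the HCP value (HCP as baseline)."""
    return (genai - hcp) / hcp * 100


# (GenAI, HCP) pairs copied from the rounded values in the abstract.
reported = {
    "empathetic responses (%)": (37.2, 16.5),
    "subjectivity": (0.54, 0.31),
    "polarity": (0.21, 0.13),
    "word count": (90.5, 65.4),
    "linguistic complexity": (125.2, 95.4),
}

for label, (genai, hcp) in reported.items():
    print(f"{label}: {relative_difference(genai, hcp):.1f}%")
```

Because the inputs are already rounded, small discrepancies from published values would not be surprising in general.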
- 
            Abstract: Objectives: To evaluate the proficiency of a HIPAA-compliant version of GPT-4 in identifying actionable, incidental findings from unstructured radiology reports of Emergency Department patients. To assess appropriateness of artificial intelligence (AI)-generated, patient-facing summaries of these findings. Materials and Methods: Radiology reports extracted from the electronic health record of a large academic medical center were manually reviewed to identify non-emergent, incidental findings with high likelihood of requiring follow-up, further sub-stratified as “definitely actionable” (DA) or “possibly actionable—clinical correlation” (PA-CC). Instruction prompts to GPT-4 were developed and iteratively optimized using a validation set of 50 reports. The optimized prompt was then applied to a test set of 430 unseen reports. GPT-4 performance was primarily graded on accuracy identifying either DA or PA-CC findings, then secondarily for DA findings alone. Outputs were reviewed for hallucinations. AI-generated patient-facing summaries were assessed for appropriateness via Likert scale. Results: For the primary outcome (DA or PA-CC), GPT-4 achieved 99.3% recall, 73.6% precision, and 84.5% F-1. For the secondary outcome (DA only), GPT-4 demonstrated 95.2% recall, 77.3% precision, and 85.3% F-1. No findings were “hallucinated” outright. However, 2.8% of cases included generated text about recommendations that were inferred without specific reference. The majority of True Positive AI-generated summaries required no or minor revision. Conclusion: GPT-4 demonstrates proficiency in detecting actionable, incidental findings after refined instruction prompting. AI-generated patient instructions were most often appropriate, but rarely included inferred recommendations. While this technology shows promise to augment diagnostics, active clinician oversight via “human-in-the-loop” workflows remains critical for clinical implementation.
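As a consistency check on the scores reported above, F-1 is the harmonic mean of precision and recall. The Python sketch below recomputes it from the rounded percentages in the abstract. The function name `f1_from_precision_recall` is illustrative and is not taken from the study.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded values copied from the abstract.
primary = f1_from_precision_recall(precision=0.736, recall=0.993)    # DA or PA-CC
secondary = f1_from_precision_recall(precision=0.773, recall=0.952)  # DA only

print(f"primary F-1: {primary:.1%}")
print(f"secondary F-1: {secondary:.1%}")
```

Both values round to the figures given in the abstract (84.5% and 85.3%).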
- 
            Hastings, Janna (Ed.) Background: Healthcare crowdsourcing events (e.g. hackathons) facilitate interdisciplinary collaboration and encourage innovation. Peer-reviewed research has not yet considered a healthcare crowdsourcing event focusing on generative artificial intelligence (GenAI), which generates text in response to detailed prompts and has vast potential for improving the efficiency of healthcare organizations. Our event, the New York University Langone Health (NYULH) Prompt-a-thon, primarily sought to inspire and build AI fluency within our diverse NYULH community, and foster collaboration and innovation. Secondarily, we sought to analyze how participants’ experience was influenced by their prior GenAI exposure and whether they received sample prompts during the workshop. Methods: Executing the event required the assembly of an expert planning committee, who recruited diverse participants, anticipated technological challenges, and prepared the event. The event was composed of didactics and workshop sessions, which educated participants and allowed them to experiment with using GenAI on real healthcare data. Participants were given novel “project cards” associated with each dataset that illuminated the tasks GenAI could perform and, for a random set of teams, sample prompts to help them achieve each task (the public repository of project cards can be found at https://github.com/smallw03/NYULH-Generative-AI-Prompt-a-thon-Project-Cards). Afterwards, participants were asked to fill out a survey with 7-point Likert-style questions. Results: Our event was successful in educating and inspiring hundreds of enthusiastic in-person and virtual participants across our organization on the responsible use of GenAI in a low-cost and technologically feasible manner. All participants responded positively, on average, to each of the survey questions (e.g., confidence in their ability to use and trust GenAI). Critically, participants reported a self-perceived increase in their likelihood of using and promoting colleagues’ use of GenAI for their daily work. No significant differences were seen in the surveys of those who received sample prompts with their project task descriptions. Conclusion: The first healthcare Prompt-a-thon was an overwhelming success, with minimal technological failures, positive responses from diverse participants and staff, and evidence of post-event engagement. These findings will be integral to planning future events at our institution, and to others looking to engage their workforce in utilizing GenAI.
- 
            Abstract: Health care delivery is undergoing an accelerated period of digital transformation, spurred in part by the COVID-19 pandemic and the use of “virtual-first” care delivery models such as telemedicine. Medical education has responded to this shift with calls for improved digital health training, but there is as yet no universal understanding of the needed competencies, domains, and best practices for teaching these skills. In this paper, we argue that a “digital determinants of health” (DDoH) framework for understanding the intersections of health outcomes, technology, and training is critical to the development of comprehensive digital health competencies in medical education. Much like current social determinants of health models, the DDoH framework can be integrated into undergraduate, graduate, and professional education to guide training interventions as well as competency development and evaluation. We provide possible approaches to integrating this framework into training programs and explore priorities for future research in digitally-competent medical education.
- 
            Background: Visual disability is a growing problem for many middle-aged and older adults. Conventional mobility aids, such as white canes and guide dogs, have notable limitations that have led to increasing interest in electronic travel aids (ETAs). Despite remarkable progress, current ETAs lack empirical evidence and realistic testing environments and often focus on the substitution or augmentation of a single sense. Objective: This study aims to (1) establish a novel virtual reality (VR) environment to test the efficacy of ETAs in complex urban environments for a simulated visual impairment (VI) and (2) evaluate the impact of haptic and audio feedback, individually and combined, on navigation performance, movement behavior, and perception. Through this study, we aim to address gaps to advance the pragmatic development of assistive technologies (ATs) for persons with VI. Methods: The VR platform was designed to resemble a subway station environment with the most common challenges faced by persons with VI during navigation. This environment was used to test our multisensory, AT-integrated VR platform among 72 healthy participants performing an obstacle avoidance task while experiencing symptoms of VI. Each participant performed the task 4 times: once with haptic feedback, once with audio feedback, once with both feedback types, and once without any feedback. Data analysis encompassed metrics such as completion time, head and body orientation, and trajectory length and smoothness. To evaluate the effectiveness and interaction of the 2 feedback modalities, we conducted a 2-way repeated measures ANOVA on continuous metrics and a Scheirer-Ray-Hare test on discrete ones. We also conducted a descriptive statistical analysis of participants’ answers to a questionnaire, assessing their experience and preference for feedback modalities. Results: Haptic feedback significantly reduced collisions (P=.05) and the variability of the pitch angle of the head (P=.02). Audio feedback improved trajectory smoothness (P=.006) and mitigated the increase in the trajectory length from haptic feedback alone (P=.04). Participants reported a high level of engagement during the experiment (52/72, 72%) and found it interesting (42/72, 58%). However, when it came to feedback preferences, fewer than half of the participants (29/72, 40%) favored combined feedback modalities. This indicates that a majority preferred dedicated single modalities over combined ones. Conclusions: AT is crucial for individuals with VI; however, it often lacks user-centered design principles. Research should prioritize consumer-oriented methodologies, testing devices in a staged manner with progression toward more realistic, ecologically valid settings to ensure safety. Our multisensory, AT-integrated VR system takes a holistic approach, offering a first step toward enhancing users’ spatial awareness and promoting safer mobility, and it holds potential for applications in medical treatment, training, and rehabilitation. Technological advancements can further refine such devices, significantly improving independence and quality of life for those with VI.
- 
            Abstract: The COVID-19 pandemic has boosted digital health utilization, raising concerns about increased physicians’ after-hours clinical work (“work-outside-work”). The surge in patients’ digital messages and additional time spent on work-outside-work by telemedicine providers underscores the need to evaluate the connection between digital health utilization and physicians’ after-hours commitments. We examined the impact on physicians’ workload from two types of digital demands: patients’ messages requesting medical advice (PMARs) sent to physicians’ inbox (inbasket), and telemedicine. Our study included 1716 ambulatory-care physicians in New York City regularly practicing between November 2022 and March 2023. Regression analyses assessed primary and interaction effects of PMARs and telemedicine on work-outside-work. The study revealed a significant effect of PMARs on physicians’ work-outside-work and that this relationship is moderated by physicians’ specialties. Non-primary care physicians or specialists experienced a more pronounced effect than their primary care peers. Analysis of their telemedicine load revealed that primary care physicians received fewer PMARs and spent less time in work-outside-work with more telemedicine. Specialists faced increased PMARs and did more work-outside-work as telemedicine visits increased, which could be due to the difference in patient panels. Reducing PMAR volumes and adopting efficient inbasket management strategies are needed to reduce physicians’ work-outside-work. Policymakers need to be cognizant of potential disruptions in physicians’ carefully balanced workload caused by digital health services.
- 
            Free, publicly-accessible full text available July 5, 2026
- 
            Oliveira, Marcos (Ed.) Impulsivity has been proposed as a key driver of obesity. However, evidence linking impulsivity and obesity has relied on the study of individual factors, with limited account for the urban attributes of obesogenic environments. Here, we investigate the relationship between obesity and impulsivity through urban scaling and causal discovery. For 915 cities in the United States of America, we study the prevalence of obesity in adults, attention deficit hyperactivity disorder (ADHD) in children, and relevant urban features. We observe sublinear scaling of obesity and ADHD with population size, these disorders being less prevalent in larger cities. By applying a causal discovery tool to the deviations of cities from the urban scaling laws, we identify an influence of ADHD on obesity, moderated by lifestyle. The strength of these associations is confirmed by individual-level data on a cohort of 19,333 children, wherein we observe that ADHD modulates obesity both directly and indirectly.
            Free, publicly-accessible full text available May 15, 2026
- 
            Remote patient monitoring (RPM) is the use of digital technologies to improve patient care at a distance. However, current RPM solutions are often biased toward tech-savvy patients. To foster health equity, researchers have studied how to address the socio-economic and cognitive needs of diverse patient groups, but their emotional needs have remained largely neglected. We perform the first qualitative study to explore the emotional needs of diverse patients around RPM. Specifically, we conduct a thematic analysis of 18 interviews and 4 focus groups at a large US healthcare organization. We identify emotional needs that lead to four emotional tensions within and across stakeholder groups when applying an equity focus to the design and implementation of RPM technologies. The four emotional tensions are making diverse patients feel: (i) heard vs. exploited; (ii) seen vs. deprioritized for efficiency; (iii) empowered vs. anxious; and (iv) cared for vs. detached from care. To manage these emotional tensions across stakeholders, we develop design recommendations informed by a paradox mindset (i.e., both-and rather than and-or strategies).
            Free, publicly-accessible full text available May 2, 2026
 An official website of the United States government