Title: Cancer incidence data at the ZIP Code Tabulation Area level in the United States interpolated by Monte Carlo simulation with multiple constraints
Abstract: High-quality cancer data are fundamental for public health research and policy, but cancer data for small geographic units and population subgroups in the United States are rarely available due to small-sample suppression rules, spatial coarsening, and data incompleteness. These limitations hinder high-resolution spatial analyses and precision public health interventions. This study provides a high-resolution cancer incidence dataset for the U.S., generated through a multi-constraint Monte Carlo simulation framework that reconstructs suppressed county-level cancer data and systematically disaggregates them to ZIP Code Tabulation Areas (ZCTAs), guided by demographic constraints. This method integrates population subgroup structures and macro-level incidence rates as constraints, ensuring consistency and reliability across spatial scales. The resulting dataset spans multiple geographic units, from state and county levels to ZCTAs, enabling comprehensive analyses of cancer burden, in-depth spatial analyses, and precision public health interventions across multiple scales.
Award ID(s):
1841403
PAR ID:
10595521
Author(s) / Creator(s):
; ;
Publisher / Repository:
Nature Publishing Group
Date Published:
Journal Name:
Scientific Data
Volume:
12
Issue:
1
ISSN:
2052-4463
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like This
  1. Abstract Understanding dynamic human mobility changes and spatial interaction patterns at different geographic scales is crucial for assessing the impacts of non-pharmaceutical interventions (such as stay-at-home orders) during the COVID-19 pandemic. In this data descriptor, we introduce a regularly-updated multiscale dynamic human mobility flow dataset across the United States, with data starting from March 1st, 2020. By analysing millions of anonymous mobile phone users’ visits to various places provided by SafeGraph, the daily and weekly dynamic origin-to-destination (O-D) population flows are computed, aggregated, and inferred at three geographic scales: census tract, county, and state. There is high correlation between our mobility flow dataset and openly available data sources, which shows the reliability of the produced data. Such a high spatiotemporal resolution human mobility flow dataset at different geographic scales over time may help monitor epidemic spreading dynamics, inform public health policy, and deepen our understanding of human behaviour changes under the unprecedented public health crisis. This up-to-date O-D flow open data can support many other social sensing and transportation applications. 
  2. Abstract Overdose deaths involving fentanyl represent a major public health crisis in the USA. This study investigates the spatiotemporal dynamics of fentanyl-involved deaths before, during, and after the COVID-19 pandemic and examines how sociodemographic factors influence these deaths across geographic regions. Using a retrospective ecological approach, we analyzed data on ZIP code-level fentanyl-related deaths in Cook County, IL, between 2018 and 2023, obtained from the Medical Examiner’s Office and linked with sociodemographic data from the American Community Survey. We first mapped area-level death rates to assess their distribution and then conducted global and local clustering analyses to identify spatial autocorrelations and the locations of high- or low-death-rate areas. A geographically weighted Poisson regression (GWPR) model evaluated the associations between area-level fentanyl-related death rates and the area-level proportion of young adults, males, and individuals with at least a college degree, disability rate, and poverty rate. Spatial analyses found stronger spatial autocorrelations during (2020–2021) and after (2022–2023) the pandemic. Initially, high death rates were concentrated in the downtown area of Chicago, and they expanded to the surrounding areas during and after the pandemic. The GWPR model revealed that an increase in the area-level proportions of poverty, disability, and young adult residents increased the fentanyl-related death rates in most of the areas. Our findings highlight the urgent need to address the evolving dynamics of fentanyl-related overdoses through tailored public health interventions that account for the unique socioeconomic determinants of different regions. Importantly, a comprehensive approach to addressing differences in overdose death rates and their risk factors will be crucial to mitigating this public health crisis. 
  3. Abstract CDC WONDER is a web-based tool for the dissemination of epidemiologic data collected by the National Vital Statistics System. While CDC WONDER has built-in privacy protections, they do not satisfy formal privacy protections such as differential privacy and thus are susceptible to targeted attacks. Given the importance of making high-quality public health data publicly available while preserving the privacy of the underlying data subjects, we aim to improve the utility of a recently developed approach for generating Poisson-distributed, differentially private synthetic data by using publicly available information to truncate the range of the synthetic data. Specifically, we utilize county-level population information from the US Census Bureau and national death reports produced by the CDC to inform prior distributions on county-level death rates and infer reasonable ranges for Poisson-distributed, county-level death counts. In doing so, the requirements for satisfying differential privacy for a given privacy budget can be reduced by several orders of magnitude, thereby leading to substantial improvements in utility. To illustrate our proposed approach, we consider a dataset comprised of over 26,000 cancer-related deaths from the Commonwealth of Pennsylvania belonging to over 47,000 combinations of cause-of-death and demographic variables such as age, race, sex, and county-of-residence and demonstrate the proposed framework’s ability to preserve features such as geographic, urban/rural, and racial disparities present in the true data. 
  4. Abstract Increasing incidence of tick-borne human diseases and geographic range expansion of tick vectors elevate the importance of research on characteristics of tick species that transmit pathogens. Despite their global distribution and role as vectors of pathogens such as Rickettsia spp., ticks in the genus Dermacentor Koch, 1844 (Acari: Ixodidae) have recently received less attention than ticks in the genus Ixodes Latreille, 1795 (Acari: Ixodidae). To address this knowledge gap, we compiled an extensive database of Dermacentor tick traits, including morphological characteristics, host range, and geographic distribution. Zoonotic vector status was determined by compiling information about zoonotic pathogens found in Dermacentor species derived from primary literature and data repositories. We trained a machine learning algorithm on this data set to assess which traits were the most important predictors of zoonotic vector status. Our model successfully classified vector species with ~84% accuracy (mean AUC) and identified two additional Dermacentor species as potential zoonotic vectors. Our results suggest that Dermacentor species that are most likely to be zoonotic vectors are broad ranging, both in terms of the range of hosts they infest and the range of ecoregions across which they are found, and also tend to have large hypostomes and be small-bodied as immature ticks. Beyond the patterns we observed, high spatial and species-level resolution of this new, synthetic dataset has the potential to support future analyses of public health relevance, including species distribution modeling and predictive analytics, to draw attention to emerging or newly identified Dermacentor species that warrant closer monitoring for zoonotic pathogens.
  5. Abstract Lyme disease is the most common vector‐borne disease in temperate zones and a growing public health threat in the United States (US). The life cycles of the tick vectors and spirochete pathogen are highly sensitive to climate, but determining the impact of climate change on Lyme disease burden has been challenging due to the complex ecology of the disease and the presence of multiple, interacting drivers of transmission. Here we incorporated 18 years of annual, county‐level Lyme disease case data in a panel data statistical model to investigate prior effects of climate variation on disease incidence while controlling for other putative drivers. We then used these climate–disease relationships to project Lyme disease cases using CMIP5 global climate models and two potential climate scenarios (RCP4.5 and RCP8.5). We find that interannual variation in Lyme disease incidence is associated with climate variation in all US regions encompassing the range of the primary vector species. In all regions, the climate predictors explained less of the variation in Lyme disease incidence than unobserved county‐level heterogeneity, but the strongest climate–disease association detected was between warming annual temperatures and increasing incidence in the Northeast. Lyme disease projections indicate that cases in the Northeast will increase significantly by 2050 (23,619 ± 21,607 additional cases), but only under RCP8.5, and with large uncertainty around this projected increase. Significant case changes are not projected for any other region under either climate scenario. The results demonstrate a regionally variable and nuanced relationship between climate change and Lyme disease, indicating possible nonlinear responses of vector ticks and transmission dynamics to projected climate change. Moreover, our results highlight the need for improved preparedness and public health interventions in endemic regions to minimize the impact of further climate change‐induced increases in Lyme disease burden.