

Title: Are spatial models advantageous for predicting county level HIV epidemiology across the United States?
Predicting human immunodeficiency virus (HIV) epidemiology is vital for achieving public health milestones. Incorporating spatial dependence when data vary by region can often provide better prediction results, at the cost of computational efficiency. However, with the growing number of covariates available that capture the data variability, the benefit of a spatial model could be less crucial. We investigate this conjecture by considering both non-spatial and spatial models for county-level HIV prediction over the US. Because many counties have zero HIV incidence, we utilize a two-part model, with one part estimating the probability of positive HIV rates and the other estimating HIV rates of counties not classified as zero. Based on our data, the combination of logistic regression and a generalized estimating equation outperforms the other candidate models in making predictions. The results suggest that considering spatial correlation for our data is not necessarily advantageous when the purpose is making predictions.
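The two-part structure described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the data are invented toy values, there is a single covariate, the 0.5 classification threshold is arbitrary, and ordinary least squares stands in for the generalized estimating equation used in the paper. Part 1 fits a logistic regression for whether a county has a positive rate; part 2 fits a line to the rates of the positive counties only.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, steps=5000):
    """Part 1: fit P(rate > 0 | x) = sigmoid(a + b*x) by gradient ascent."""
    a = b = 0.0
    n = len(xs)
    for _ in range(steps):
        residuals = [y - sigmoid(a + b * x) for x, y in zip(xs, ys)]
        a += lr * sum(residuals) / n
        b += lr * sum(r * x for r, x in zip(residuals, xs)) / n
    return a, b


def fit_line(xs, ys):
    """Part 2 stand-in: ordinary least squares for y = c + d*x (not a GEE)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    d = sxy / sxx
    return mean_y - d * mean_x, d


def fit_two_part(xs, rates):
    """Fit both parts; the rate model sees only counties with positive rates."""
    is_positive = [1 if r > 0 else 0 for r in rates]
    a, b = fit_logistic(xs, is_positive)
    pos_x = [x for x, r in zip(xs, rates) if r > 0]
    pos_r = [r for r in rates if r > 0]
    c, d = fit_line(pos_x, pos_r)
    return a, b, c, d


def predict(model, x, threshold=0.5):
    """Predict 0 if the county is classified as zero, else the fitted rate."""
    a, b, c, d = model
    if sigmoid(a + b * x) < threshold:
        return 0.0
    return max(0.0, c + d * x)


# Hypothetical toy data: one covariate value and one rate per county.
xs = [0, 1, 2, 3, 4, 5, 6, 7]
rates = [0.0, 0.0, 0.0, 1.0, 0.0, 2.0, 2.5, 3.0]
model = fit_two_part(xs, rates)
print(predict(model, 0), predict(model, 7))
```

The point of the split is that the many zero counties do not distort the rate model: they influence only the classification step, and the rate model is estimated from the positive counties alone.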
Award ID(s):
1922758
NSF-PAR ID:
10291027
Author(s) / Creator(s):
Date Published:
Journal Name:
Spatial and spatiotemporal epidemiology
Volume:
38
Issue:
2021
ISSN:
1877-5845
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Summary

    To meet the need to predict new human immunodeficiency virus (HIV) diagnosis rates from publicly available HIV data that are abundant in space but have few points in time, we propose a class of spatially varying auto-regressive models compounded with conditional auto-regressive spatial correlation structures. We then propose to use the copula approach and a flexible conditional auto-regressive formulation to model the dependence between adjacent counties. These models allow for spatial and temporal correlation as well as space–time interactions and are naturally suitable for predicting HIV cases and other spatiotemporal disease data that feature a similar data structure. We apply the proposed models to HIV data over Florida, California and New England states and compare them with a range of linear mixed models that have been recently popular for modelling spatiotemporal disease data. The results show that for such data our proposed models outperform the others in terms of prediction.

     
  2. Abstract

    Estimating and predicting the state of the atmosphere is a probabilistic problem for which an ensemble modeling approach often is taken to represent uncertainty in the system. Common methods for examining uncertainty and assessing performance for ensembles emphasize pointwise statistics or marginal distributions. However, these methods lose specific information about individual ensemble members. This paper explores contour band depth (cBD), a method of analyzing uncertainty in terms of contours of scalar fields. cBD is fully nonparametric and induces an ordering on ensemble members that leads to box-and-whisker-plot-type visualizations of uncertainty for two-dimensional data. By applying cBD to synthetic ensembles, we demonstrate that it provides enhanced information about the spatial structure of ensemble uncertainty. We also find that the usefulness of the cBD analysis depends on the presence of multiple modes and multiple scales in the ensemble of contours. Finally, we apply cBD to compare various convection-permitting forecasts from different ensemble prediction systems and find that the value it provides in real-world applications compared to standard analysis methods exhibits clear limitations. In some cases, contour boxplots can provide deeper insight into differences in spatial characteristics between the different ensemble forecasts. Nevertheless, identification of outliers using cBD is not always intuitive, and the method can be especially challenging to implement for flow that exhibits multiple spatial scales (e.g., discrete convective cells embedded within a mesoscale weather system).

    Significance Statement

    Predictions of Earth’s atmosphere inherently come with some degree of uncertainty owing to incomplete observations and the chaotic nature of the system. Understanding that uncertainty is critical when drawing scientific conclusions or making policy decisions from model predictions. In this study, we explore a method for describing model uncertainty when the quantities of interest are well represented by contours. The method yields a quantitative visualization of uncertainty in both the location and the shape of contours to an extent that is not possible with standard uncertainty quantification methods and may eventually prove useful for the development of more robust techniques for evaluating and validating numerical weather models.

     
  3. Computer-aided design (CAD) programs are essential to engineering as they allow for better designs through low-cost iterations. While CAD programs are typically taught to undergraduate students as a job skill, such software can also help students learn engineering concepts. A current limitation of CAD programs (even those that are specifically designed for educational purposes) is that they are not capable of providing automated real-time help to students. To encourage CAD programs to build in assistance to students, we used data generated from students using a free, open-source CAD software called Aladdin to demonstrate how student data combined with machine learning techniques can predict how well a particular student will perform in a design task. We challenged students to design a house that consumed zero net energy as part of an introductory engineering technology undergraduate course. Using data from 128 students, along with the scikit-learn Python machine learning library, we tested our models using both total counts of design actions and sequences of design actions as inputs. We found that our models using early design sequence actions are particularly valuable for prediction. Our logistic regression model achieved a >60% chance of predicting if a student would succeed in designing a zero net energy house. Our results suggest that it would be feasible for Aladdin to provide useful feedback to students when they are approximately halfway through their design. Further improvements to these models could lead to earlier predictions and thus provide students feedback sooner to enhance their learning. 
  4. ABSTRACT

    Within the contiguous USA, Florida is unique in having tropical and subtropical climates, a great abundance and diversity of mosquito vectors, and high rates of human travel. These factors contribute to the state being the national ground zero for exotic mosquito-borne diseases, as evidenced by local transmission of viruses spread by Aedes aegypti, including outbreaks of dengue in 2022 and Zika in 2016. Because of limited treatment options, integrated vector management is a key part of mitigating these arboviruses. Practical knowledge of when and where mosquito populations of interest exist is critical for surveillance and control efforts, and habitat predictions at various geographic scales typically rely on ecological niche modeling. However, most of these models, usually created in partnership with academic institutions, demand resources that otherwise may be too time-demanding or difficult for mosquito control programs to replicate and use effectively. Such resources may include intensive computational requirements, high spatiotemporal resolutions of data not regularly available, and/or expert knowledge of statistical analysis. Therefore, our study aims to partner with mosquito control agencies in generating operationally useful mosquito abundance models. Given the increasing threat of mosquito-borne disease transmission in Florida, our analytic approach targets recent Ae. aegypti abundance in the Tampa Bay area. We investigate explanatory variables that: 1) are publicly available, 2) require little to no preprocessing for use, and 3) are known factors associated with Ae. aegypti ecology. Out of our 4 final models, none required more than 5 out of the 36 predictors assessed (13.9%). Similar to previous literature, the strongest predictors were consistently 3- and 4-wk temperature and precipitation lags, followed closely by 1 of 2 environmental predictors: land use/land cover or normalized difference vegetation index. Surprisingly, 3 of our 4 final models included one or more socioeconomic or demographic predictors. In general, larger sample sizes of trap collections and/or citizen science observations should result in greater confidence in model predictions and validation. However, given disparities in trap collections across jurisdictions, individual county models rather than a multicounty conglomerate model would likely yield stronger model fits. Ultimately, we hope that the results of our assessment will enable more accurate and precise mosquito surveillance and control of Ae. aegypti in Florida and beyond.

     
  5. Abstract

    The objective of this study was to investigate the importance of multiple county-level features in the trajectory of COVID-19. We examined feature importance across 2787 counties in the United States using data-driven machine learning models. Existing mathematical models of disease spread usually focused on the case prediction with different infection rates without incorporating multiple heterogeneous features that could impact the spatial and temporal trajectory of COVID-19. Recognizing this, we trained a data-driven model using 23 features representing six key influencing factors affecting the pandemic spread: social demographics of counties, population activities, mobility within the counties, movement across counties, disease attributes, and social network structure. Also, we categorized counties into multiple groups according to their population densities, and we divided the trajectory of COVID-19 into three stages: the outbreak stage, the social distancing stage, and the reopening stage. The study aimed to answer two research questions: (1) the extent to which the importance of heterogeneous features evolved at different stages; (2) the extent to which the importance of heterogeneous features varied across counties with different characteristics. We fitted a set of random forest models to determine weekly feature importance. The results showed that: (1) social demographic features, such as gross domestic product, population density, and minority status, remained high-importance features throughout the stages of COVID-19 across the 2787 studied counties; (2) within-county mobility features had the highest importance in counties with higher population densities; (3) the feature reflecting the social network structure (Facebook social connectedness index) had higher importance for counties with higher population densities. The results showed that the data-driven machine learning models could provide important insights to inform policymakers regarding feature importance for counties with various population densities and at different stages of a pandemic life cycle.