

Title: Spatially Varying Auto-Regressive Models for Prediction of New Human Immunodeficiency Virus Diagnoses
Summary

Motivated by the need to predict new human immunodeficiency virus (HIV) diagnosis rates from publicly available HIV data that are abundant in space but sparse in time, we propose a class of spatially varying auto-regressive models combined with conditional auto-regressive spatial correlation structures. We further propose using a copula approach and a flexible conditional auto-regressive formulation to model the dependence between adjacent counties. These models accommodate spatial and temporal correlation as well as space–time interactions, and they are naturally suited to predicting HIV cases and other spatiotemporal disease data with a similar structure. We apply the proposed models to HIV data from Florida, California and the New England states and compare them with a range of linear mixed models that have recently been popular for modelling spatiotemporal disease data. The results show that, for such data, our proposed models outperform the others in prediction.
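To make the model structure concrete, here is a minimal sketch of a spatially varying AR(1) process with proper conditional auto-regressive (CAR) errors. It assumes the simple form y_t = mu + rho * y_{t-1} + e_t with per-county mu and rho and e_t ~ N(0, Q^{-1}), where Q = tau * (D - alpha * W). This is an illustration only, not the authors' exact specification. It omits the copula dependence, any count likelihood and any pooling across counties. The function names (`car_precision`, `simulate_sv_ar`, `fit_sv_ar_ols`, `predict_one_step`) are hypothetical.

```python
import numpy as np


def car_precision(W, tau, alpha):
    """Proper CAR precision matrix Q = tau * (D - alpha * W).

    W is a symmetric 0/1 county adjacency matrix and D = diag(row sums of W).
    Q is positive definite for |alpha| < 1 when every county has a neighbour.
    """
    W = np.asarray(W, dtype=float)
    D = np.diag(W.sum(axis=1))
    return tau * (D - alpha * W)


def simulate_sv_ar(W, mu, rho, tau, alpha, y0, n_steps, rng):
    """Simulate y_t = mu + rho * y_{t-1} + e_t, with e_t ~ N(0, Q^{-1}).

    mu and rho are per-county arrays, so the AR coefficient varies in space,
    and the CAR errors make neighbouring counties move together.
    """
    cov = np.linalg.inv(car_precision(W, tau, alpha))
    ys = [np.asarray(y0, dtype=float)]
    for _ in range(n_steps):
        eps = rng.multivariate_normal(np.zeros(len(mu)), cov)
        ys.append(mu + rho * ys[-1] + eps)
    return np.array(ys)  # shape (n_steps + 1, n_counties)


def fit_sv_ar_ols(Y):
    """Per-county least-squares estimates of (mu_i, rho_i) from a (T, n) panel.

    Illustrative only: it ignores the spatial error correlation and does no
    borrowing of strength across counties, which matters when T is small.
    """
    mu_hat, rho_hat = [], []
    for i in range(Y.shape[1]):
        X = np.column_stack([np.ones(Y.shape[0] - 1), Y[:-1, i]])
        coef, *_ = np.linalg.lstsq(X, Y[1:, i], rcond=None)
        mu_hat.append(coef[0])
        rho_hat.append(coef[1])
    return np.array(mu_hat), np.array(rho_hat)


def predict_one_step(y_last, mu, rho):
    """One-step-ahead conditional mean E[y_{t+1} | y_t] = mu + rho * y_t."""
    return mu + rho * np.asarray(y_last, dtype=float)
```

For example, with three counties in a row (W = [[0,1,0],[1,0,1],[0,1,0]]), tau = 2 and alpha = 0.5, `car_precision` returns [[2,-1,0],[-1,4,-1],[0,-1,2]]. The per-county OLS fit is reasonable only for long series. With the few time points described above, a hierarchical or Bayesian fit that shares information through the CAR structure would be the natural choice.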

 
NSF-PAR ID:
10398879
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
Journal of the Royal Statistical Society Series C: Applied Statistics
Volume:
67
Issue:
4
ISSN:
0035-9254
Page Range / eLocation ID:
p. 1003-1022
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Larsen, Stefano (Ed.)

    Real-time monitoring using in-situ sensors is becoming a common approach for measuring water-quality within watersheds. High-frequency measurements produce big datasets that present opportunities to conduct new analyses for improved understanding of water-quality dynamics and more effective management of rivers and streams. Of primary importance is enhancing knowledge of the relationships between nitrate, one of the most reactive forms of inorganic nitrogen in the aquatic environment, and other water-quality variables. We analysed high-frequency water-quality data from in-situ sensors deployed in three sites from different watersheds and climate zones within the National Ecological Observatory Network, USA. We used generalised additive mixed models to explain the nonlinear relationships at each site between nitrate concentration and conductivity, turbidity, dissolved oxygen, water temperature, and elevation. Temporal auto-correlation was modelled with an auto-regressive–moving-average (ARIMA) model and we examined the relative importance of the explanatory variables. Total deviance explained by the models was high for all sites (99%). Although variable importance and the smooth regression parameters differed among sites, the models explaining the most variation in nitrate contained the same explanatory variables. This study demonstrates that building a model for nitrate using the same set of explanatory water-quality variables is achievable, even for sites with vastly different environmental and climatic characteristics. Applying such models will assist managers to select cost-effective water-quality variables to monitor when the goals are to gain a spatial and temporal in-depth understanding of nitrate dynamics and adapt management plans accordingly.

     
  2. Abstract

    The coronavirus disease 2019 pandemic has catalyzed the rapid development of mRNA vaccines; however, how to optimize the mRNA sequence of an exogenous gene, such as the severe acute respiratory syndrome coronavirus 2 spike, to fit human cells remains a critical challenge. A new algorithm, iDRO (integrated deep-learning-based mRNA optimization), is developed to optimize multiple components of mRNA sequences based on the given amino acid sequence of a target protein. Considering biological constraints, we divided iDRO into two steps: open reading frame (ORF) optimization, and 5′ untranslated region (UTR) and 3′UTR generation. In ORF optimization, BiLSTM-CRF (bidirectional long short-term memory with conditional random field) is employed to determine the codon for each amino acid. In UTR generation, RNA-Bart (bidirectional auto-regressive transformer) is proposed to output the corresponding UTR. The results show that the optimized sequences of exogenous genes acquired the patterns of human endogenous gene sequences. In experimental validation, the mRNA sequence optimized by our method showed higher protein expression than that produced by a conventional method. To the best of our knowledge, this is the first study to introduce deep-learning methods to integrated mRNA sequence optimization, and these results may contribute to the development of mRNA therapeutics.

     
  3. Abstract

    Understanding power system dynamics is essential for interarea oscillation analysis and the detection of grid instabilities. The FNET/GridEye is a GPS‐synchronized wide‐area frequency measurement network that provides an accurate picture of the normal real‐time operational condition of the power system dynamics, giving rise to new and intricate spatiotemporal patterns of power loads. We propose to model FNET/GridEye grid frequency data from the U.S. Eastern Interconnection with a spatiotemporal statistical model. We predict the frequency data at locations without observations, a critical need during disruption events where measurement data are inaccessible. Spatial information is accounted for either as neighboring measurements in the form of covariates or with a spatiotemporal correlation model captured by a latent Gaussian field. The proposed method is useful in estimating power system dynamic response from limited phasor measurements and holds promise for predicting instability that may lead to undesirable effects such as cascading outages.

     
  4. Abstract

    The joint analysis of spatial and temporal processes poses computational challenges due to the data's high dimensionality. Furthermore, such data are commonly non-Gaussian. In this paper, we introduce a copula-based spatiotemporal model for analyzing spatiotemporal data and propose a semiparametric estimator. The proposed algorithm is computationally simple, since it models the marginal distribution and the spatiotemporal dependence separately. Instead of assuming a parametric distribution, the proposed method models the marginal distributions nonparametrically and thus offers more flexibility. The method also provides a convenient way to construct both point and interval predictions at new times and locations, based on the estimated conditional quantiles. Through a simulation study and an analysis of wind speeds observed along the border between Oregon and Washington, we show that our method produces more accurate point and interval predictions for skewed data than those based on normality assumptions.

     
  5. Abstract

    Whistler‐mode chorus waves play an essential role in the acceleration and loss of energetic electrons in the Earth’s inner magnetosphere, with the more intense waves producing the most dramatic effects. However, it is challenging to predict the amplitude of strong chorus waves due to the imbalanced nature of the data set, that is, there are many more non‐chorus data points than strong chorus waves. Thus, traditional models usually underestimate chorus wave amplitudes significantly during active times. Using an imbalanced regressive (IR) method, we develop a neural network model of lower‐band (LB) chorus waves using 7‐year observations from the EMFISIS instrument onboard Van Allen Probes. The feature selection process suggests that the auroral electrojet index alone captures most of the variations of chorus waves. The large amplitude of strong chorus waves can be predicted for the first time. Furthermore, our model shows that the equatorial LB chorus’s spatiotemporal evolution is similar to the drift path of substorm‐injected electrons. We also show that the chorus waves have a peak amplitude at the equator in the source MLT near midnight, but toward noon, there is a local minimum in amplitude at the equator with two off‐equator amplitude peaks in both hemispheres, likely caused by the bifurcated drift paths of substorm injections on the dayside. The IR‐based chorus model will improve radiation belt prediction by providing chorus wave distributions, especially storm‐time strong chorus. Since data imbalance is ubiquitous and inherent in space physics and other physical systems, imbalanced regressive methods deserve more attention in space physics.

     
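Item 4 above separates the estimation of nonparametric marginal distributions from the modelling of dependence, then predicts through conditional quantiles. A minimal sketch of that two-stage idea follows. It assumes a Gaussian copula across sites with no temporal component, so the item's actual spatiotemporal estimator may differ. The helper names (`normal_scores`, `to_normal_scale`, `conditional_quantile`) are hypothetical.

```python
import numpy as np
from scipy.stats import norm, rankdata


def normal_scores(x):
    """Map a sample to N(0, 1) scores through its rescaled empirical CDF."""
    u = rankdata(x) / (len(x) + 1)  # ranks in (0, 1), so scores stay finite
    return norm.ppf(u)


def to_normal_scale(sample, value):
    """Empirical-CDF transform of a new value, clipped away from 0 and 1."""
    n = len(sample)
    k = np.searchsorted(np.sort(sample), value, side="right")
    return norm.ppf(np.clip(k, 1, n) / (n + 1))


def conditional_quantile(X, target, given, q):
    """Gaussian-copula conditional q-quantile at column `target` of X.

    X: (n_obs, n_sites) training data; given: {site_index: observed value}.
    Stage 1 transforms margins nonparametrically; stage 2 uses the
    correlation of the normal scores as the dependence model.
    """
    X = np.asarray(X, dtype=float)
    Z = np.column_stack([normal_scores(X[:, j]) for j in range(X.shape[1])])
    R = np.corrcoef(Z, rowvar=False)
    idx = list(given)
    z_obs = np.array([to_normal_scale(X[:, j], given[j]) for j in idx])
    r_tg = R[target, idx]
    w = np.linalg.solve(R[np.ix_(idx, idx)], r_tg)
    cond_mean = w @ z_obs                     # r' R^{-1} z
    cond_var = max(1.0 - w @ r_tg, 1e-12)     # 1 - r' R^{-1} r
    z_q = cond_mean + np.sqrt(cond_var) * norm.ppf(q)
    # Back-transform through the target site's empirical quantile function
    return float(np.quantile(X[:, target], norm.cdf(z_q)))
```

Because the back-transform uses each site's empirical quantiles, skewed margins such as wind speeds are respected without assuming normality. An interval prediction is simply the pair of conditional quantiles at, for example, q = 0.05 and q = 0.95.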