-
Spatiotemporal data with massive zeros arise in many areas such as epidemiology and public health. We use a Bayesian framework to fit zero-inflated negative binomial models and employ a set of latent variables from Pólya-Gamma distributions to derive an efficient Gibbs sampler. The proposed model accommodates varying spatial and temporal random effects through Gaussian process (GP) priors, which offer both simplicity and flexibility in modeling nonlinear relationships through a covariance function. To overcome the computational bottleneck that GPs suffer from when the sample size is large, we adopt the nearest-neighbor GP approach, which approximates the covariance matrix using local experts. In the simulation study, we use multiple settings with varying numbers of spatial locations to evaluate how well the proposed model estimates the spatial and temporal random effects, and we compare the results to other methods. We also apply the proposed model to COVID-19 death counts in the state of Florida, USA from 3/25/2020 through 7/29/2020 to examine the relationship between social vulnerability and COVID-19 deaths.
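The zero-inflated negative binomial (ZINB) data model at the core of this approach can be sketched in a few lines. The mixing probability pi0 and the negative binomial parameters r and p below are hypothetical illustration values; this sketch shows only the data model, not the paper's Pólya-Gamma Gibbs sampler:

```python
import numpy as np
from scipy.stats import nbinom

def rzinb(n, pi0, r, p, rng):
    """Draw n ZINB counts: structural zeros w.p. pi0, otherwise NB(r, p)."""
    zeros = rng.random(n) < pi0
    counts = nbinom.rvs(r, p, size=n, random_state=rng)
    counts[zeros] = 0
    return counts

def zinb_pmf(k, pi0, r, p):
    """ZINB pmf: a point mass at zero mixed with an NB component."""
    base = nbinom.pmf(k, r, p)
    return np.where(k == 0, pi0 + (1 - pi0) * base, (1 - pi0) * base)

rng = np.random.default_rng(0)
y = rzinb(10_000, 0.3, 5, 0.4, rng)
# The empirical zero fraction should be close to the model's P(Y = 0).
print(np.mean(y == 0), zinb_pmf(0, 0.3, 5, 0.4))
```

In the spatiotemporal model the NB mean would further depend on covariates and the GP random effects; here those are held fixed for clarity.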
-
We introduce a novel sufficient dimension-reduction (SDR) method that is robust against outliers, using α-distance covariance (dCov) in dimension-reduction problems. Under very mild conditions on the predictors, the central subspace is effectively estimated in a model-free manner, without estimating the link function, based on a projection onto the Stiefel manifold. We establish the convergence of the proposed estimator under some regularity conditions. We compare the performance of our method with existing SDR methods by simulation and real data analysis and show that our algorithm improves computational efficiency and effectiveness.
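A minimal sketch of the sample distance covariance underlying this idea, in the α = 1 special case: the α-dCov of the paper raises the pairwise distances to a power α in (0, 2), which is mimicked here by the hypothetical alpha argument. This is the plain V-statistic estimator, not the paper's SDR procedure:

```python
import numpy as np

def dcov2(x, y, alpha=1.0):
    """Squared sample distance covariance between 1-D samples x and y."""
    a = np.abs(x[:, None] - x[None, :]) ** alpha
    b = np.abs(y[:, None] - y[None, :]) ** alpha
    # Double-center each pairwise-distance matrix.
    A = a - a.mean(0) - a.mean(1)[:, None] + a.mean()
    B = b - b.mean(0) - b.mean(1)[:, None] + b.mean()
    return (A * B).mean()

rng = np.random.default_rng(1)
x = rng.normal(size=500)
print(dcov2(x, x**2))                   # dependent pair: clearly positive
print(dcov2(x, rng.normal(size=500)))   # independent pair: near zero
```

Unlike Pearson correlation, dCov detects the nonlinear dependence between x and x**2, which is what makes it useful as a model-free SDR criterion.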
-
While matrix-covariate regression models have been studied in many existing works, classical statistical and computational methods for regression coefficient estimation are strongly affected by high-dimensional matrix-valued covariates. To address these issues, this paper proposes a framework of matrix-covariate regression models based on a low-rank constraint and an additional regularization term for structured signals, covering models with both continuous and binary responses. We propose an efficient Riemannian-steepest-descent algorithm for regression coefficient estimation. We prove that the estimation error of the proposed estimator is of order O(sqrt{r(q+m)+p}/sqrt{n}), where r is the rank, q x m is the dimension of the coefficient matrix, and p is the dimension of the coefficient vector. When the rank r is small, this rate improves over O(sqrt{qm+p}/sqrt{n}), the rate of the existing work (Li et al. in Electron J Stat 15:1909-1950, 2021) that does not apply a rank constraint. In addition, we prove that all accumulation points of the iterates have similar estimation errors asymptotically and essentially attain the minimax rate. We validate the proposed method on a simulated dataset of two-dimensional shape images and two real datasets of brain signals and microscopic leucorrhea images.
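The rank-constrained estimation can be illustrated with a projected-gradient sketch for a continuous-response model y_i = <B, X_i> + eps_i, where each gradient step is followed by a truncated SVD that projects B back onto the rank-r set. This is a simplified stand-in for, not a reproduction of, the paper's Riemannian-steepest-descent algorithm; the dimensions, step size, and iteration count are hypothetical:

```python
import numpy as np

def lowrank_regress(X, y, r, lr=0.1, iters=300):
    """Least squares over rank-r matrices B via projected gradient descent."""
    n, q, m = X.shape
    B = np.zeros((q, m))
    for _ in range(iters):
        resid = np.tensordot(X, B, axes=([1, 2], [0, 1])) - y  # <X_i, B> - y_i
        grad = np.tensordot(resid, X, axes=(0, 0)) / n
        U, s, Vt = np.linalg.svd(B - lr * grad, full_matrices=False)
        B = (U[:, :r] * s[:r]) @ Vt[:r]     # keep only the top-r singular values
    return B

rng = np.random.default_rng(0)
q, m, r, n = 8, 6, 2, 400
B_true = rng.normal(size=(q, r)) @ rng.normal(size=(r, m))   # rank-r truth
X = rng.normal(size=(n, q, m))
y = np.tensordot(X, B_true, axes=([1, 2], [0, 1])) + 0.1 * rng.normal(size=n)
B_hat = lowrank_regress(X, y, r)
print(np.linalg.norm(B_hat - B_true) / np.linalg.norm(B_true))
```

The SVD truncation is what shrinks the effective number of parameters from qm toward r(q+m), which is the source of the improved rate when r is small.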
-
The advancements in high-throughput technologies provide exciting opportunities to obtain multi-omics data from the same individuals in a biomedical study, and joint analyses of data from multiple sources offer many benefits. However, missing values are an inevitable issue in multi-omics data because measurements such as mRNA gene expression levels often require invasive tissue sampling from patients. Common approaches for addressing missing measurements include analyses based on observations with complete data or multiple imputation methods. In this paper, we propose a novel integrative multi-omics analytical framework based on p-value weight adjustment in order to incorporate observations with incomplete data into the analysis. By splitting the data into a complete set with full information and an incomplete set with missing measurements, we introduce mechanisms to derive weights and weight-adjusted p-values from the two sets. Through simulation analyses, we demonstrate that the proposed framework achieves considerable statistical power gains compared to a complete-case analysis or multiple imputation approaches. We illustrate the implementation of our proposed framework in a study of preterm infant birth weights by a joint analysis of DNA methylation, mRNA, and the phenotypic outcome. Supplementary materials accompanying this paper appear online.
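The weighting idea can be illustrated with a generic weighted-Bonferroni sketch: hypotheses judged more promising by an auxiliary source (here, the incomplete-data set) receive weights above 1, and each p-value is divided by its weight, with the weights constrained to average to 1. The weights and p-values below are made up for illustration; this is not the paper's weight-derivation mechanism:

```python
import numpy as np

def weighted_pvalues(p, w):
    """Divide each p-value by its weight; weights must average to 1."""
    w = np.asarray(w, dtype=float)
    assert np.isclose(w.mean(), 1.0), "weights must average to 1"
    return np.minimum(np.asarray(p, dtype=float) / w, 1.0)

p = np.array([0.004, 0.020, 0.300, 0.700])
w = np.array([2.0, 1.0, 0.6, 0.4])      # hypothetical weights, mean(w) == 1
p_adj = weighted_pvalues(p, w)
alpha, m = 0.05, len(p)
print(p_adj, p_adj <= alpha / m)        # reject where p_j / w_j <= alpha / m
```

Comparing the weighted p-values against alpha/m preserves the usual Bonferroni-level family-wise error control when the weights average to 1, while boosting power for the up-weighted hypotheses.
-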
Positronium lifetime imaging (PLI) is a reconstruction technique in time-of-flight (TOF) positron emission tomography (PET) that measures the lifespan of positronium, a metastable electron-positron pair that arises after a PET tracer molecule releases a positron, prior to its annihilation. We have previously developed a maximum likelihood (ML) algorithm for PLI reconstruction and demonstrated that it can generate quantitatively accurate lifetime images for a 570 ps (picosecond) TOF PET system. In this study, we conducted further investigations into the statistical properties of the algorithm, including the variability of the reconstruction results, the sensitivity of the algorithm to the number of acquired PLI events, and its robustness to hyperparameter choices. Our findings indicate that the proposed ML method produces sufficiently stable lifetime images to enable reliable distinction of regions of interest. Moreover, the number of PLI events required to produce quantitatively accurate lifetime images is practically feasible. These results demonstrate the potential of our ML algorithm for advancing the capabilities of TOF PET imaging.
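As a toy illustration of the quantity being imaged: if decay times were observed directly and followed an exponential law with mean lifetime tau, the ML estimate of tau would simply be the sample average, with standard error shrinking as the number of events grows. The lifetime value and event count below are hypothetical, and this is not the paper's full reconstruction over TOF-PET data:

```python
import numpy as np

rng = np.random.default_rng(0)
tau_true = 1.8                              # hypothetical lifetime, in ns
times = rng.exponential(tau_true, size=5_000)
tau_hat = times.mean()                      # MLE of an exponential mean
se = tau_hat / np.sqrt(len(times))          # large-sample standard error
print(tau_hat, se)
```

The actual problem is harder because decay times are observed only indirectly through TOF measurements, which is what the ML reconstruction addresses.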
-
The North American deer mouse (Peromyscus maniculatus) has been used as an indicator of environmental change in North America. Since precipitation and temperature changes affect plant productivity and deer mouse habitats, they are substantial drivers of sharp variations in deer mouse populations. Therefore, modeling their association is important for monitoring dynamic changes in the number of deer mice caught per trap and its relationships with weather variables such as precipitation and maximum and minimum temperatures. We acquired National Ecological Observatory Network (NEON) data on monthly deer mouse trap counts from 2013 through 2022 in the contiguous United States, from long-term study sites maintained for monitoring spatial differences and temporal changes in populations. We categorize the contiguous United States into six climate-related regions. The proposed method identifies important seasonal patterns in temperature and precipitation, with month and year temporal effects interacting with the proposed climate-related regions.
-
A trajectory is a sequence of observations in time and space, for example, the path formed by maritime vessels, orbital debris, or aircraft. Tracking and reconstructing vessel trajectories from Automatic Identification System (AIS) data is important in real-world applications for maritime navigation safety. In this project, we use the National Science Foundation (NSF) Algorithms for Threat Detection (ATD) program's 2019 Challenge AIS data to develop a novel trajectory reconstruction method. Given a sequence of N unlabeled timestamped observations, the goal is to track trajectories by clustering the AIS points with predicted positions using information from the true trajectories X. A natural way is to connect the observed point x̂_i with the closest point estimated using the location, time, speed, and angle information from the set of points under consideration x_i, for all i in {1, 2, …, N}. The introduced method is an unsupervised clustering-based method that does not train a supervised model, which may incur a significant computational cost, so it leads to a real-time, reliable, and accurate trajectory reconstruction method. Our experimental results show that the proposed method successfully clusters vessel trajectories.
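The nearest-predicted-position step can be sketched as follows: each track's last known point is propagated forward using its speed and heading, and a new observation is assigned to the track whose predicted position is closest. The state layout and the simple dead-reckoning step are illustrative assumptions, not the challenge's exact scheme:

```python
import numpy as np

def predict(track, t_new):
    """Propagate a track state (x, y, t, speed, angle) forward to time t_new."""
    x, y, t, speed, angle = track
    dt = t_new - t
    return np.array([x + speed * dt * np.cos(angle),
                     y + speed * dt * np.sin(angle)])

def assign(obs_xy, t_new, tracks):
    """Index of the track whose predicted position is nearest to obs_xy."""
    preds = np.array([predict(tr, t_new) for tr in tracks])
    d = np.linalg.norm(preds - obs_xy, axis=1)
    return int(np.argmin(d))

# Two hypothetical tracks starting at the origin at t = 0:
tracks = [(0.0, 0.0, 0.0, 1.0, 0.0),          # heading east, speed 1
          (0.0, 0.0, 0.0, 1.0, np.pi / 2)]    # heading north, speed 1
print(assign(np.array([4.8, 0.3]), 5.0, tracks))  # near (5, 0): track 0
```

Because no supervised model is trained, each new point costs only one distance computation per candidate track, which is what enables real-time reconstruction.
-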
Anomaly detection plays an important role in traffic operations and control. Missingness in spatial-temporal datasets prevents anomaly detection algorithms from learning characteristic rules and patterns because large amounts of data are unavailable. This paper proposes an anomaly detection scheme for the 2021 Algorithms for Threat Detection (ATD) challenge based on Gaussian process models that generate features used in a logistic regression model, which leads to high prediction accuracy for sparse traffic flow data with a large proportion of missingness. The dataset is provided by the National Science Foundation (NSF) in conjunction with the National Geospatial-Intelligence Agency (NGA), and it consists of thousands of labeled traffic flow records for 400 sensors from 2011 to 2020. Each sensor's data are purposely downsampled by NSF and NGA to simulate data missing completely at random, with missing rates of 99%, 98%, 95%, and 90%. Hence, it is challenging to detect anomalies from the sparse traffic flow data. The proposed scheme uses traffic patterns at different times of day and on different days of the week to recover the complete data. The proposed anomaly detection scheme is computationally efficient because it allows parallel computation across sensors. The proposed method is one of the two top-performing algorithms in the 2021 ATD challenge.
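The pattern-based recovery step can be sketched in a simple form: a missing flow value is filled with the average of observed values sharing the same weekly (day-of-week, hour) slot, exploiting the periodicity of traffic. The toy series, its 90% missing rate, and the slot-mean fill are illustrative assumptions, not the challenge pipeline's GP model:

```python
import numpy as np

rng = np.random.default_rng(0)
hours = np.arange(24 * 7 * 20)                   # 20 weeks of hourly readings
slot = hours % (24 * 7)                          # weekly (day, hour) slot index
flow = 100 + 30 * np.sin(2 * np.pi * slot / 24) + rng.normal(0, 5, hours.size)
observed = rng.random(hours.size) > 0.90         # keep only ~10% of readings

# Mean of observed values in each weekly slot (NaN if a slot has none).
slot_mean = np.full(24 * 7, np.nan)
for s in range(24 * 7):
    vals = flow[observed & (slot == s)]
    if vals.size:
        slot_mean[s] = vals.mean()

recovered = np.where(observed, flow, slot_mean[slot])
print(np.nanmean(np.abs(recovered - flow)))      # small vs. the +/-30 daily swing
```

A GP over time, as in the paper, generalizes this by sharing information smoothly across nearby slots rather than averaging within each slot independently.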
-
A prior for Bayesian nonparametric clustering called the Table Invitation Prior (TIP) is used to cluster gene expression data. TIP uses information concerning the pairwise distances between subjects (e.g., gene expression samples) and automatically estimates the number of clusters. TIP's hyperparameters are estimated using a univariate multiple change point detection algorithm applied to the subject distances, so TIP does not require an analyst's intervention for estimating hyperparameters. A Gibbs sampling algorithm is provided, and TIP is used in conjunction with a Normal-Inverse-Wishart likelihood to cluster 801 gene expression samples, each of which belongs to one of five different types of cancer.
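The change-point idea behind the hyperparameter estimation can be illustrated on a sorted vector of pairwise distances: a split point is chosen to minimize the within-segment squared error, separating "small" within-cluster distances from "large" between-cluster ones. This single-split search is a simplified stand-in for the multiple change point algorithm TIP actually uses, and the two-group distance data are hypothetical:

```python
import numpy as np

def one_changepoint(v):
    """Best single split of sorted v minimizing within-segment squared error."""
    v = np.sort(np.asarray(v, dtype=float))
    best_k, best_cost = None, np.inf
    for k in range(1, len(v)):
        left, right = v[:k], v[k:]
        cost = ((left - left.mean())**2).sum() + ((right - right.mean())**2).sum()
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k

# Hypothetical distances: small within clusters, large between clusters.
d = np.concatenate([np.random.default_rng(0).normal(1.0, 0.1, 30),
                    np.random.default_rng(1).normal(5.0, 0.1, 20)])
print(one_changepoint(d))   # split lands between the two distance groups
```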