Search for: All records

Award ID contains: 1934568


  1. Abstract

    Summary

    The accurate estimation of prediction errors in time series is an important problem. It immediately affects the accuracy of prediction intervals but also the quality of a number of widely used time series model selection criteria such as AIC and others. Except for simple cases, however, it is difficult or even infeasible to obtain exact analytical expressions for one-step and multi-step predictions. This may be one of the reasons that, unlike in the independent case (see Efron, 2004), until today there has been no fully established methodology for time series prediction error estimation. Starting from an approximation to the bias-variance decomposition of the squared prediction error, this work is therefore concerned with the estimation of prediction errors in both univariate and multivariate stationary time series. In particular, several estimates are developed for a general class of predictors that includes most of the popular linear, nonlinear, parametric and nonparametric time series models used in practice, where causal invertible ARMA and nonparametric AR processes are discussed as lead examples. Simulation results indicate that the proposed estimators perform quite well in finite samples. The estimates may also be used for model selection when the purpose of modeling is prediction.

     
  2. Abstract

    With continued fossil‐fuel dependence, anthropogenic aerosols over South Asia are projected to increase until the mid‐21st century along with greenhouse gases (GHGs). Using the Community Earth System Model (CESM1) Large Ensemble, we quantify the influence of aerosols and GHGs on South Asian seasonal precipitation patterns over the 21st century under a very high‐emissions (RCP 8.5) trajectory. We find that increasing local aerosol concentrations could continue to suppress precipitation over South Asia in the near‐term, delaying the emergence of precipitation increases in response to GHGs by several decades in the monsoon season and a decade in the post‐monsoon season. Emergence of this wetting signal is expected in both seasons by the mid‐21st century. Our results demonstrate that the trajectory of local aerosols together with GHGs will shape near‐future precipitation patterns over South Asia. Therefore, constraining precipitation response to different trajectories of both forcers is critical for informing near‐term adaptation efforts.

     
  3. Abstract

    Data from high-energy observations are usually obtained as lists of photon events. A common analysis task for such data is to identify whether diffuse emission exists, and to estimate its surface brightness, even in the presence of point sources that may be superposed. We have developed a novel nonparametric event list segmentation algorithm to divide up the field of view into distinct emission components. We use photon location data directly, without binning them into an image. We first construct a graph from the Voronoi tessellation of the observed photon locations and then grow segments using a new adaptation of seeded region growing that we call Seeded Region Growing on Graph, after which the overall method is named SRGonG. Starting with a set of seed locations, this results in an oversegmented data set, which SRGonG then coalesces using a greedy algorithm where adjacent segments are merged to minimize a model comparison statistic; we use the Bayesian Information Criterion. Using SRGonG we are able to identify point-like and diffuse extended sources in the data with equal facility. We validate SRGonG using simulations, demonstrating that it is capable of discerning irregularly shaped low-surface-brightness emission structures as well as point-like sources with strengths comparable to that seen in typical X-ray data. We demonstrate SRGonG’s use on the Chandra data of the Antennae galaxies and show that it segments the complex structures appropriately.

     
  4. Abstract

    We derive precise asymptotic results that are directly usable for confidence intervals and Wald hypothesis tests for likelihood-based generalized linear mixed model analysis. The essence of our approach is to derive the exact leading term behaviour of the Fisher information matrix when both the number of groups and number of observations within each group diverge. This leads to asymptotic normality results with simple studentizable forms. Similar analyses result in tractable leading term forms for the determination of approximate locally D-optimal designs.

     
  5. Abstract

    Extending computational harmonic analysis tools from the classical setting of regular lattices to the more general setting of graphs and networks is very important, and much research has been done recently. The generalized Haar–Walsh transform (GHWT) developed by Irion and Saito (2014) is a multiscale transform for signals on graphs, which is a generalization of the classical Haar and Walsh–Hadamard transforms. We propose the extended generalized Haar–Walsh transform (eGHWT), which is a generalization of the adapted time–frequency tilings of Thiele and Villemoes (1996). The eGHWT examines not only the efficiency of graph-domain partitions but also that of “sequency-domain” partitions simultaneously. Consequently, the eGHWT and its associated best-basis selection algorithm for graph signals significantly improve the performance of the previous GHWT at a similar computational cost, $O(N \log N)$, where $N$ is the number of nodes of an input graph. While the GHWT best-basis algorithm seeks the most suitable orthonormal basis for a given task among more than $(1.5)^N$ possible orthonormal bases in $\mathbb{R}^N$, the eGHWT best-basis algorithm can find a better one by searching through more than $0.618\cdot(1.84)^N$ possible orthonormal bases in $\mathbb{R}^N$. This article describes the details of the eGHWT best-basis algorithm and demonstrates its superiority using several examples including genuine graph signals as well as conventional digital images viewed as graph signals. Furthermore, we also show how the eGHWT can be extended to 2D signals and matrix-form data by viewing them as a tensor product of graphs generated from their columns and rows and demonstrate its effectiveness on applications such as image approximation.

     
  6. Abstract

    How to design experiments that accelerate knowledge discovery on complex biological landscapes remains a tantalizing question. We present an optimal experimental design method (coined OPEX) to identify informative omics experiments using machine learning models for both experimental space exploration and model training. OPEX-guided exploration of Escherichia coli populations exposed to biocide and antibiotic combinations led to more accurate predictive models of gene expression with 44% less data. Analysis of the proposed experiments shows that broad exploration of the experimental space followed by fine-tuning emerges as the optimal strategy. Additionally, analysis of the experimental data reveals 29 cases of cross-stress protection and 4 cases of cross-stress vulnerability. Further validation reveals the central role of chaperones, stress response proteins and transport pumps in cross-stress exposure. This work demonstrates how active learning can be used to guide omics data collection for training predictive models, making evidence-driven decisions and accelerating knowledge discovery in life sciences.

     
  7. Abstract

    The most dynamic electromagnetic coupling between the magnetosphere and ionosphere occurs in the polar upper atmosphere. It is critical to quantify the electromagnetic energy and momentum input associated with this coupling, as its impacts on the ionosphere and thermosphere system are global and major, often leading to considerable disturbances in near‐Earth space environments. The current general circulation models of the upper atmosphere exhibit systematic biases that can be attributed to an inadequate representation of the Joule heating rate resulting from unaccounted stochastic fluctuations of electric fields associated with the magnetosphere‐ionosphere coupling. These biases exist regardless of geomagnetic activity levels. To overcome this limitation, a new multiresolution random field modeling approach is developed, and the efficacy of the approach is demonstrated using Super Dual Auroral Radar Network (SuperDARN) data carefully curated for the study during a largely quiet 4‐hour period on February 29, 2012. Regional small‐scale electrostatic fields sampled at different resolutions from a probabilistic distribution of electric field variability conditioned on actual SuperDARN line-of-sight (LOS) observations exhibit considerably more localized fine‐scale features in comparison to global large‐scale fields modeled using the SuperDARN Assimilative Mapping procedure. The overall hemispherically integrated Joule heating rate is increased by a factor of about 1.5 due to the effect of random regional small‐scale electric fields, which is close to the lower end of the arbitrarily adjusted Joule heating multiplicative factors, 1.5 to 2.5, typically used in upper atmosphere general circulation models. The study represents an important step toward data‐driven ensemble modeling of magnetosphere‐ionosphere‐atmosphere coupling processes.

     
  8. Free, publicly-accessible full text available September 1, 2025
  9. Discrimination-aware classification methods remedy socioeconomic disparities exacerbated by machine learning systems. In this paper, we propose a novel data pre-processing technique that assigns weights to training instances in order to reduce discrimination without changing any of the inputs or labels. While the existing reweighing approach only looks into sensitive attributes, we refine the weights by utilizing both sensitive and insensitive ones. We formulate our weight assignment as a linear programming problem. The weights can be directly used in any classification model into which they are incorporated. We demonstrate three advantages of our approach on synthetic and benchmark datasets. First, discrimination reduction comes at a small cost in accuracy. Second, our method is more scalable than most other pre-processing methods. Third, the trade-off between fairness and accuracy can be explicitly monitored by model users. Code is available at https://github.com/frnliang/refined_reweighing.

     
    Free, publicly-accessible full text available August 20, 2025
  10. Free, publicly-accessible full text available March 30, 2025