Search for: All records

Creators/Authors contains: "Yu, Guo"


  1. Abstract

     The Coronavirus Disease 2019 (COVID-19) has had a profound impact on global health and the economy, making it crucial to build accurate and interpretable data-driven predictive models for COVID-19 cases to improve public policy making. The extremely large scale of the pandemic and the intrinsically changing transmission characteristics pose a great challenge for effectively predicting COVID-19 cases. To address this challenge, we propose a novel hybrid model in which the interpretability of the autoregressive model (AR) and the predictive power of long short-term memory neural networks (LSTM) join forces. The proposed hybrid model is formalized as a neural network with an architecture that connects the two composing model blocks, whose relative contributions are determined data-adaptively during training. We demonstrate the favorable performance of the hybrid model over its two single composing models, as well as other popular predictive models, through comprehensive numerical studies on two data sources under multiple evaluation metrics. Specifically, on county-level data for 8 California counties, our hybrid model achieves 4.173% MAPE, outperforming the composing AR (5.629%) and LSTM (4.934%) alone on average. On country-level datasets, our hybrid model outperforms widely used predictive models such as AR, LSTM, Support Vector Machines, Gradient Boosting, and Random Forest in predicting COVID-19 cases in Japan, Canada, Brazil, Argentina, Singapore, Italy, and the United Kingdom. In addition to the predictive performance, we illustrate the interpretability of our proposed hybrid model using the estimated AR component, a key feature that is not shared by most black-box predictive models for COVID-19 cases. Our study provides a new and promising direction for building effective and interpretable data-driven models for COVID-19 cases, which could have significant implications for public health policy making and control of the current COVID-19 and potential future pandemics.
    Free, publicly-accessible full text available December 1, 2024
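To make the architecture concrete, here is a minimal PyTorch sketch of the hybrid idea described in the abstract above: a linear AR head and an LSTM head read the same lag window, and a learnable gate decides their relative contribution during training. The class and parameter names (HybridARLSTM, n_lags, hidden, mix) are illustrative, not from the paper.

```python
import torch
import torch.nn as nn

class HybridARLSTM(nn.Module):
    """Sketch of a hybrid AR + LSTM forecaster: a linear AR head and an
    LSTM head share the same lag window; a learnable gate blends them."""

    def __init__(self, n_lags: int, hidden: int = 32):
        super().__init__()
        self.ar = nn.Linear(n_lags, 1)           # interpretable AR coefficients
        self.lstm = nn.LSTM(1, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)
        self.mix = nn.Parameter(torch.zeros(1))  # data-adaptive mixing weight

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_lags) window of recent case counts
        ar_pred = self.ar(x)
        out, _ = self.lstm(x.unsqueeze(-1))      # (batch, n_lags, hidden)
        lstm_pred = self.head(out[:, -1, :])
        alpha = torch.sigmoid(self.mix)          # relative contribution in (0, 1)
        return alpha * ar_pred + (1 - alpha) * lstm_pred
```

Trained end to end (e.g., with an MSE loss), the sigmoid of the mix parameter settles data-adaptively, while the weights of the ar layer remain directly inspectable, which mirrors the interpretability argument the abstract makes.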
  2. Free, publicly-accessible full text available June 13, 2024
  3. Abstract

    In the present paper, we are concerned with an integrable discretization of a modified Camassa–Holm (mCH) equation with a linear dispersion term. The key to the construction is the semidiscrete analog for a set of bilinear equations of the mCH equation. First, we show that these bilinear equations and their determinant solutions, in either Gram-type or Casorati-type form, can be reduced from the discrete Kadomtsev–Petviashvili (KP) equation through the Miwa transformation. Then, by scrutinizing the reduction process, we obtain a set of semidiscrete bilinear equations and their general soliton solutions in Gram-type or Casorati-type determinant form. Finally, by defining dependent variables and discrete hodograph transformations, we derive an integrable semidiscrete analog of the mCH equation. It is also shown that the semidiscrete mCH equation converges to the continuous one in the continuum limit.
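For orientation, the mCH equation with a linear dispersion term is commonly written as follows (our rendering of the standard form; the paper's exact notation and sign conventions may differ):

```latex
% Modified Camassa--Holm (mCH) equation with linear dispersion,
% where m is the momentum variable and kappa the dispersion parameter.
\begin{align}
  m_t + \left[(u^2 - u_x^2)\, m\right]_x + \kappa\, u_x &= 0,\\
  m &= u - u_{xx}.
\end{align}
```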

  4. The traditional framework for feature selection treats all features as costing the same amount. However, in reality, a scientist often has considerable discretion regarding which variables to measure, and the decision involves a tradeoff between model accuracy and cost (where cost can refer to money, time, difficulty, or intrusiveness). In particular, unnecessarily including an expensive feature in a model is worse than unnecessarily including a cheap feature. We propose a procedure, which we call cheap knockoffs, for performing feature selection in a cost-conscious manner. The key idea behind our method is to force higher-cost features to compete with more knockoffs than cheaper features. We derive an upper bound on the weighted false discovery proportion associated with this procedure, which corresponds to the fraction of the feature cost that is wasted on unimportant features. We prove that this bound holds simultaneously with high probability over a path of selected variable sets of increasing size. A user may thus select a set of features based, for example, on the overall budget, while knowing that no more than a particular fraction of feature cost is wasted. We investigate, through simulation and a biomedical application, the practical importance of incorporating cost considerations into the feature selection process.
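The competing-against-more-knockoffs idea lends itself to a compact illustration. The toy sketch below uses permuted columns in place of a proper knockoff construction (a deliberate simplification; valid knockoffs require more care) and a hypothetical allocation rule giving costlier features proportionally more knockoff copies. The function cheap_knockoff_filter and all parameter names are ours, not the paper's.

```python
import numpy as np

def cheap_knockoff_filter(X, y, costs, base_knockoffs=1, rng=None):
    """Toy sketch of the 'cheap knockoffs' idea: each feature competes
    against a number of knockoffs that grows with its cost, so expensive
    features must clear a higher bar before being selected."""
    rng = np.random.default_rng(rng)
    n, p = X.shape
    costs = np.asarray(costs, dtype=float)
    # Hypothetical allocation: knockoff count scales with relative cost.
    k = np.maximum(base_knockoffs,
                   np.round(base_knockoffs * costs / costs.min())).astype(int)
    selected = []
    for j in range(p):
        stat = abs(np.corrcoef(X[:, j], y)[0, 1])  # simple importance statistic
        knock_stats = [abs(np.corrcoef(rng.permutation(X[:, j]), y)[0, 1])
                       for _ in range(k[j])]
        if stat > max(knock_stats):                # must beat every knockoff copy
            selected.append(j)
    return selected
```

A real implementation would use valid knockoff variables and the paper's weighted false discovery bound; this sketch only conveys the intended asymmetry between cheap and expensive features.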
  5. null (Ed.)
  6. Abstract

    Estimating the probabilities of rare floods in mountainous watersheds is challenging due to the hydrometeorological complexity of seasonally varying snowmelt and soil moisture dynamics, as well as spatiotemporal variability in extreme precipitation. Design storm methods and statistical flood frequency analyses often overlook these complexities and how they shape the probabilities of rare floods. This study presents a process-based approach that combines gridded precipitation, stochastic storm transposition (SST), and physics-based distributed rainfall-runoff modeling to simulate flood peak and volume distributions up to the 10,000-year recurrence interval and to provide insights into the hydrometeorological drivers of those events. The approach is applied to a small mountainous watershed in the Colorado Front Range in the United States. We show that storm transposition in the Front Range can be justified under existing definitions of regional precipitation homogeneity. The process-based results show close agreement with a statistically based mixture distribution that considers underlying flood drivers. We further demonstrate that antecedent conditions and snowmelt drive frequent peak discharges and rarer flood volumes, while the upper tail of the flood peak distribution appears to be controlled by heavy rainfall and rain-on-snow. In particular, we highlight the important role of early fall extreme rainfall in controlling rare flood peaks (but not volumes), despite only one such event having been observed in recent decades. Notwithstanding issues related to the accuracy of gridded precipitation datasets, these findings highlight the potential of SST and process-based modeling to help understand the relationships between flood drivers and flood frequencies.
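To convey the flavor of the SST Monte Carlo machinery, here is a heavily simplified sketch: observed storm depths are resampled and randomly rescaled to mimic transposition across the region, pushed through a toy runoff coefficient (no snowmelt, soil moisture, or distributed routing, unlike the paper's physics-based model), and the ranked annual maxima yield empirical recurrence intervals. All function and parameter names are illustrative.

```python
import numpy as np

def sst_flood_frequency(storm_depths, n_years=10_000, storms_per_year=5,
                        transposition_range=(0.7, 1.3), runoff_coef=0.6,
                        rng=None):
    """Minimal Monte Carlo sketch of stochastic storm transposition (SST):
    resample observed storms, rescale them to mimic transposition, map
    rainfall to a peak via a toy runoff coefficient, and take annual maxima."""
    rng = np.random.default_rng(rng)
    annual_max = np.empty(n_years)
    for yr in range(n_years):
        depths = rng.choice(storm_depths, size=storms_per_year, replace=True)
        shift = rng.uniform(*transposition_range, size=storms_per_year)
        peaks = runoff_coef * depths * shift      # toy rainfall-runoff mapping
        annual_max[yr] = peaks.max()
    # Empirical recurrence intervals from ranked annual maxima
    # (Weibull plotting positions: T = (n + 1) / rank).
    ranked = np.sort(annual_max)[::-1]
    T = (n_years + 1) / np.arange(1, n_years + 1)
    return ranked, T
```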