


Creators/Authors contains: "Shang, Zuofeng"


  1. Free, publicly-accessible full text available January 1, 2025
  2. Free, publicly-accessible full text available January 1, 2025
  3. Abstract

    We propose a new approach, called the functional deep neural network (FDNN), for classifying multidimensional functional data. Specifically, a deep neural network is trained on the principal components of the training data and is then used to predict the class label of a future data function. Unlike popular functional discriminant analysis approaches, which only work for one‐dimensional functional data, the proposed FDNN approach applies to general non‐Gaussian multidimensional functional data. Moreover, when the log density ratio possesses a locally connected functional modular structure, we show that FDNN achieves minimax optimality. The superiority of our approach is demonstrated on both simulated and real‐world datasets.

     
  4. High‐dimensional classification is a fundamentally important research problem in high‐dimensional data analysis. In this paper, we derive a nonasymptotic rate for the minimax excess misclassification risk when the feature dimension diverges exponentially with the sample size and the Bayes classifier possesses a complicated modular structure. We also show that classifiers based on deep neural networks can attain this rate and hence are minimax optimal.

     
  5. Abstract

    Solar flares, especially the M- and X-class flares, are often associated with coronal mass ejections. They are the most important sources of space weather effects, which can severely impact the near-Earth environment. Thus it is essential to forecast flares (especially the M- and X-class ones) to mitigate their destructive and hazardous consequences. Here, we introduce several statistical and machine-learning approaches to the prediction of an active region’s (AR) flare index (FI), which quantifies the flare productivity of an AR by taking into account the number of flares of each class within a certain time interval. Specifically, our sample includes 563 ARs that appeared on the solar disk from 2010 May to 2017 December. The 25 magnetic parameters, provided by the Space-weather HMI Active Region Patches (SHARP) from the Helioseismic and Magnetic Imager on board the Solar Dynamics Observatory, characterize coronal magnetic energy stored in ARs by proxy and are used as the predictors. We investigate the relationship between these SHARP parameters and the FI of ARs with a machine-learning algorithm (spline regression) and a resampling method (Synthetic Minority Oversampling Technique for Regression with Gaussian Noise). Based on the established relationship, we are able to predict the FI value of a given AR within the next 1-day period. Compared with four other popular machine-learning algorithms, our methods improve the accuracy of FI prediction, especially for a large FI. In addition, we rank the SHARP parameters by importance using the Borda count method, computed from the ranks rendered by nine different machine-learning methods.

     
  6. In many applications, scientists are particularly interested in detecting which of the predictors are truly associated with a multivariate response. It is more accurate to model multiple responses as one vector than to treat each component separately. This is particularly true for complex traits having multiple correlated components. A Bayesian multivariate variable selection (BMVS) approach is proposed to select important predictors influencing the multivariate response from a candidate pool with ultrahigh dimension. By applying sample‐size‐dependent spike and slab priors, the BMVS approach satisfies the strong selection consistency property under certain conditions, which is an advantage of BMVS over other existing Bayesian multivariate regression‐based approaches. The proposed approach considers the covariance structure of multiple responses without assuming independence and integrates the estimation of covariance‐related parameters together with all regression parameters into one framework through a fast‐updating Markov chain Monte Carlo (MCMC) procedure. It is demonstrated through simulations that the BMVS approach outperforms some other relevant frequentist and Bayesian approaches. The proposed BMVS approach is flexible enough for a wide range of applications, including genome‐wide association studies with multiple correlated phenotypes and a large number of genetic variants and/or environmental variables, as demonstrated in the real data analyses section. The computer code and test data of the proposed method are available as an R package.

     
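
The solar-flare abstract (item 5) mentions combining the parameter ranks from nine machine-learning methods with the Borda count method. As a rough illustration only, the idea could be sketched as follows. This is not the authors' implementation: the function name `borda_count`, the placeholder parameter names, and the three example rankings are all hypothetical, and the tie-breaking rule is an assumption made here for determinism.

```python
from collections import defaultdict


def borda_count(rankings):
    """Aggregate several rankings (each listed best-first) into one ordering."""
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, item in enumerate(ranking):
            # The top item earns n - 1 points; the last item earns 0.
            scores[item] += n - 1 - position
    # Highest total score first; ties broken alphabetically (an assumption).
    return sorted(scores, key=lambda item: (-scores[item], item))


# Hypothetical rankings of three placeholder parameters from three methods.
rankings = [
    ["param_a", "param_b", "param_c"],
    ["param_b", "param_a", "param_c"],
    ["param_a", "param_c", "param_b"],
]
print(borda_count(rankings))
```

Each method contributes points according to where it places a parameter, so a parameter ranked highly by most methods ends up near the top of the combined ordering even if one method disagrees.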