Abstract
Fusion learning methods, developed to analyze datasets from many different sources, have become a popular research topic in recent years. Individualized inference through fusion learning extends these methods to inference problems over a heterogeneous population, where similar individuals are fused together to enhance inference for a target individual. Both classical fusion learning and its individualized counterparts are built on weighted aggregation of individual information, but in the latter the weights are localized to the target individual. This article reviews two individualized inference methods through fusion learning, iFusion and iGroup, which are developed under different asymptotic settings. Both procedures guarantee optimal asymptotic theoretical performance and computational scalability. This article is categorized under:
- Statistical Learning and Exploratory Methods of the Data Sciences > Manifold Learning
- Statistical Learning and Exploratory Methods of the Data Sciences > Modeling Methods
- Statistical and Graphical Methods of Data Analysis > Nonparametric Methods
- Data: Types and Structure > Massive Data
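The localized-weighting idea can be conveyed with a toy sketch. The code below is not the iFusion or iGroup procedure itself; the Gaussian similarity kernel, the bandwidth `h`, and the plug-in standard error are illustrative assumptions. It only shows how weights concentrated near the target individual drive the aggregation.

```python
import numpy as np

def individualized_fusion(theta_hat, se, target, h=0.5):
    """Weighted aggregation of individual estimates, localized to `target`."""
    dist = np.abs(theta_hat - theta_hat[target])   # similarity to the target
    w = np.exp(-(dist / h) ** 2) / se ** 2         # kernel weight x precision
    est = np.sum(w * theta_hat) / np.sum(w)        # fused point estimate
    se_fused = np.sqrt(1.0 / np.sum(w))            # rough plug-in standard error
    return est, se_fused

# toy population: a cluster of individuals similar to the target plus unrelated ones
rng = np.random.default_rng(0)
theta_true = np.concatenate([np.full(20, 1.0), rng.uniform(-3.0, 3.0, 30)])
se = np.full(50, 0.3)
theta_hat = theta_true + se * rng.normal(size=50)
print(individualized_fusion(theta_hat, se, target=0))
```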
Information criteria for model selection
Abstract
The rapid development of modeling techniques has brought many opportunities for data‐driven discovery and prediction. However, this also leads to the challenge of selecting the most appropriate model for any particular data task. Information criteria, such as the Akaike information criterion (AIC) and Bayesian information criterion (BIC), have been developed as a general class of model selection methods with profound connections with foundational thoughts in statistics and information theory. Many perspectives and theoretical justifications have been developed to understand when and how to use information criteria, which often depend on particular data circumstances. This review article will revisit information criteria by summarizing their key concepts, evaluation metrics, fundamental properties, interconnections, recent advancements, and common misconceptions to enrich the understanding of model selection in general. This article is categorized under:
- Data: Types and Structure > Traditional Statistical Data
- Statistical Learning and Exploratory Methods of the Data Sciences > Modeling Methods
- Statistical and Graphical Methods of Data Analysis > Information Theoretic Methods
- Statistical Models > Model Selection
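As a concrete illustration of how the two criteria trade fit against complexity, the sketch below fits nested polynomial regressions to synthetic data and scores them with the standard definitions AIC = 2k - 2*loglik and BIC = k*log(n) - 2*loglik. The data-generating model and candidate set are invented for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(-2, 2, n)
y = 1.0 + 0.5 * x - 0.8 * x**2 + rng.normal(scale=0.7, size=n)   # true degree is 2

def fit_and_score(degree):
    X = np.vander(x, degree + 1, increasing=True)   # design matrix [1, x, ..., x^degree]
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / n                      # Gaussian MLE of the error variance
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    k = degree + 2                                  # regression coefficients + variance
    return 2 * k - 2 * loglik, k * np.log(n) - 2 * loglik

for d in range(6):
    aic, bic = fit_and_score(d)
    print(f"degree {d}:  AIC = {aic:7.1f}   BIC = {bic:7.1f}")
# Both criteria should favor degree 2 here; BIC penalizes extra terms more
# heavily (log(200) > 2), so it guards harder against overfitting.
```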
- PAR ID:
- 10397958
- Publisher / Repository:
- Wiley Blackwell (John Wiley & Sons)
- Date Published:
- Journal Name:
- WIREs Computational Statistics
- Volume:
- 15
- Issue:
- 5
- ISSN:
- 1939-5108
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
Abstract
Gaussian process (GP) is a staple in the toolkit of a spatial statistician. Well‐documented computing roadblocks in the analysis of large geospatial datasets using GPs have now largely been mitigated via several recent statistical innovations. The nearest neighbor Gaussian process (NNGP) has emerged as one of the leading candidates for such massive‐scale geospatial analysis owing to its empirical success. This article reviews the connection of the NNGP to sparse Cholesky factors of the spatial precision (inverse‐covariance) matrix. The focus of the review is on these sparse Cholesky matrices, which are versatile and have recently found many diverse applications beyond the primary use of the NNGP for fast parameter estimation and prediction in spatial (generalized) linear models. In particular, we discuss applications of sparse NNGP Cholesky matrices to address multifaceted computational issues in spatial bootstrapping, simulation of large‐scale realizations of Gaussian random fields, and extensions to nonparametric mean function estimation of a GP using random forests. We also review a sparse‐Cholesky‐based model for areal (geographically aggregated) data that addresses long‐established interpretability issues of existing areal models. Finally, we highlight some yet‐to‐be‐addressed issues of such sparse Cholesky approximations that warrant further research. This article is categorized under:
- Algorithms and Computational Methods > Algorithms
- Algorithms and Computational Methods > Numerical Methods
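A minimal sketch of the Vecchia/NNGP conditioning idea behind such sparse factors is given below, assuming an exponential covariance with illustrative parameters sigma2 and phi. It builds the strictly lower-triangular weight matrix A and conditional variances d so that the implied precision is (I - A)' D^{-1} (I - A), returned via an upper-triangular factor U with Q approximately U U'. Dense arrays are used for readability; this is not the production NNGP software reviewed in the article.

```python
import numpy as np
from scipy.spatial.distance import cdist

def exp_cov(D, sigma2=1.0, phi=5.0):
    """Exponential covariance C(d) = sigma2 * exp(-phi * d) (illustrative choice)."""
    return sigma2 * np.exp(-phi * D)

def nngp_factor(coords, m=10, sigma2=1.0, phi=5.0):
    """Vecchia/NNGP-type conditioning: w_i | w_{N(i)} ~ N(b_i' w_{N(i)}, d_i)."""
    n = coords.shape[0]
    A = np.zeros((n, n))        # strictly lower-triangular conditioning weights
    d = np.full(n, sigma2)      # conditional variances
    # locations are taken in their given order; in practice a careful ordering helps
    for i in range(1, n):
        dist = np.linalg.norm(coords[:i] - coords[i], axis=1)
        nbr = np.argsort(dist)[: min(m, i)]               # m nearest "past" neighbors
        C_nn = exp_cov(cdist(coords[nbr], coords[nbr]), sigma2, phi)
        c_in = exp_cov(dist[nbr], sigma2, phi)
        b = np.linalg.solve(C_nn, c_in)
        A[i, nbr] = b
        d[i] = sigma2 - b @ c_in
    U = (np.eye(n) - A).T / np.sqrt(d)   # sparse Cholesky-type factor of the precision
    return U

# toy usage on 300 random 2-D locations
rng = np.random.default_rng(0)
coords = rng.uniform(size=(300, 2))
U = nngp_factor(coords, m=10)
Q = U @ U.T                              # approximate precision matrix
```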
Abstract
Background: Emerging evidence indicates an elevated risk of post-concussion musculoskeletal (MSK) injuries in collegiate athletes; however, how to identify the athletes at highest risk remains to be elucidated.
Objective: The purpose of this study was to model post-concussion MSK injury risk in collegiate athletes by integrating a comprehensive set of variables by machine learning.
Methods: A risk model was developed and tested on a dataset of 194 athletes (155 in the training set and 39 in the test set) with 135 variables entered into the analysis, which included participants' health and athletic history, concussion injury and recovery-specific criteria, and outcomes from a diverse array of concussion assessments. The machine learning approach involved transforming variables by the Weight of Evidence method, variable selection using L1-penalized logistic regression, model selection via the Akaike Information Criterion, and a final L2-regularized logistic regression fit.
Results: A model with 48 predictive variables yielded significant predictive performance for subsequent MSK injury with an area under the curve of 0.82. Top predictors included cognitive, balance, and reaction measures at the Baseline and Acute timepoints. At a specified false positive rate of 6.67%, the model achieves a true positive rate (sensitivity) of 79% and a precision (positive predictive value) of 95% for identifying at-risk athletes via a well-calibrated composite risk score.
Conclusion: These results support the development of a sensitive and specific injury risk model using standard data combined with a novel methodological approach that may allow clinicians to target high-injury-risk student-athletes. The development and refinement of predictive models, incorporating machine learning and utilizing comprehensive datasets, could lead to improved identification of high-risk athletes and allow for the implementation of targeted injury risk reduction strategies by identifying the student-athletes most at risk for post-concussion MSK injury.
Key Points:
- There is a well-established elevated risk of post-concussion subsequent musculoskeletal injury; however, prior efforts have failed to identify risk factors.
- This study developed a composite risk score model with an AUC of 0.82 from common concussion clinical measures and participant demographics.
- By identifying athletes at elevated risk, clinicians may be able to reduce injury risk through targeted injury risk reduction programs.
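A schematic sketch of the described pipeline is shown below, using scikit-learn on synthetic data. The Weight of Evidence binning, penalty strengths, and the made-up predictors X and outcome y are illustrative stand-ins, not the study's actual variables or tuned settings.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import KBinsDiscretizer

def weight_of_evidence(x, y, n_bins=5, eps=0.5):
    """WoE-transform one continuous predictor against a binary outcome."""
    bins = KBinsDiscretizer(n_bins=n_bins, encode="ordinal", strategy="quantile")
    b = bins.fit_transform(x.reshape(-1, 1)).ravel().astype(int)
    woe = np.zeros_like(x, dtype=float)
    n_event, n_nonevent = (y == 1).sum(), (y == 0).sum()
    for k in np.unique(b):
        events = (y[b == k] == 1).sum() + eps          # smoothed counts per bin
        nonevents = (y[b == k] == 0).sum() + eps
        woe[b == k] = np.log((events / n_event) / (nonevents / n_nonevent))
    return woe

# hypothetical data standing in for the 155 training athletes and their predictors
rng = np.random.default_rng(1)
X = rng.normal(size=(155, 20))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=155) > 0).astype(int)

X_woe = np.column_stack([weight_of_evidence(X[:, j], y) for j in range(X.shape[1])])

# 1) variable screening with L1-penalized logistic regression
l1 = LogisticRegression(penalty="l1", solver="liblinear", C=0.5).fit(X_woe, y)
selected = np.flatnonzero(l1.coef_.ravel() != 0)
if selected.size == 0:                                 # guard for the toy data
    selected = np.arange(X_woe.shape[1])

# 2) AIC = 2k - 2*loglik for comparing candidate fits
def aic(model, X, y):
    p = np.clip(model.predict_proba(X)[:, 1], 1e-12, 1 - 1e-12)
    loglik = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    k = np.count_nonzero(model.coef_) + 1              # coefficients + intercept
    return 2 * k - 2 * loglik

# 3) final L2-regularized fit on the selected variables -> composite risk score
final = LogisticRegression(penalty="l2", C=1.0).fit(X_woe[:, selected], y)
risk_score = final.predict_proba(X_woe[:, selected])[:, 1]
print(f"{selected.size} variables kept, AIC of final fit: "
      f"{aic(final, X_woe[:, selected], y):.1f}")
```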
Abstract
Since the very first detection of gravitational waves from the coalescence of two black holes in 2015, Bayesian statistical methods have been routinely applied by LIGO and Virgo to extract the signal out of noisy interferometric measurements, obtain point estimates of the physical parameters responsible for producing the signal, and rigorously quantify their uncertainties. Different computational techniques have been devised depending on the source of the gravitational radiation and the gravitational waveform model used. Prominent sources of gravitational waves are binary black hole or neutron star mergers, the only objects that have been observed by detectors to date. Gravitational waves from core‐collapse supernovae, rapidly rotating neutron stars, and the stochastic gravitational‐wave background are also in the sensitivity band of the ground‐based interferometers and are expected to be observable in future observation runs. As nonlinearities of the complex waveforms and the high‐dimensional parameter spaces preclude analytic evaluation of the posterior distribution, posterior inference for all these sources relies on computer‐intensive simulation techniques such as Markov chain Monte Carlo methods. A review of state‐of‐the‐art Bayesian statistical parameter estimation methods will be given for researchers in this cross‐disciplinary area of gravitational wave data analysis. This article is categorized under:
- Applications of Computational Statistics > Signal and Image Processing and Coding
- Statistical and Graphical Methods of Data Analysis > Markov Chain Monte Carlo (MCMC)
- Statistical Models > Time Series Models
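The flavor of such simulation-based posterior inference can be conveyed with a toy example: a random-walk Metropolis sampler recovering the amplitude and frequency of a damped sinusoid buried in Gaussian noise. The signal model, priors, and step sizes below are illustrative placeholders, not an actual gravitational-waveform model or the samplers used by LIGO/Virgo.

```python
import numpy as np

# Toy stand-in for a gravitational-wave signal: a damped sinusoid in Gaussian noise.
# theta = (amplitude A, frequency f); the noise level sigma is assumed known.
rng = np.random.default_rng(42)
t = np.linspace(0.0, 1.0, 512)
sigma = 1.0

def waveform(theta, t):
    A, f = theta
    return A * np.sin(2 * np.pi * f * t) * np.exp(-2.0 * t)

true_theta = np.array([3.0, 10.0])
data = waveform(true_theta, t) + sigma * rng.normal(size=t.size)

def log_posterior(theta):
    A, f = theta
    if A <= 0 or not (5.0 < f < 20.0):          # flat priors on a bounded box
        return -np.inf
    resid = data - waveform(theta, t)
    return -0.5 * np.sum(resid**2) / sigma**2   # Gaussian log-likelihood (up to a constant)

# random-walk Metropolis sampler
n_steps, step = 20000, np.array([0.1, 0.05])
chain = np.empty((n_steps, 2))
theta = np.array([1.0, 12.0])
lp = log_posterior(theta)
for i in range(n_steps):
    prop = theta + step * rng.normal(size=2)
    lp_prop = log_posterior(prop)
    if np.log(rng.uniform()) < lp_prop - lp:    # Metropolis accept/reject
        theta, lp = prop, lp_prop
    chain[i] = theta

burn = chain[n_steps // 2:]                     # discard the first half as burn-in
print("posterior mean:", burn.mean(axis=0), "posterior sd:", burn.std(axis=0))
```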
Abstract
The potential energy of molecular species and their conformers can be computed with a wide range of computational chemistry methods, from molecular mechanics to ab initio quantum chemistry. However, choosing a computational approach that balances computational cost against the reliability of the calculated energies is a dilemma, especially for large molecules. This dilemma becomes even more problematic for studies that require hundreds or thousands of calculations, such as drug discovery. On the other hand, driven by their pattern recognition capabilities, neural networks have gained popularity in the computational chemistry community. During the last decade, many neural network potentials have been developed to predict a variety of chemical information for different systems. Neural network potentials have been shown to predict chemical properties with accuracy comparable to quantum mechanical approaches but at a cost approaching that of molecular mechanics calculations. As a result, the development of more reliable, transferable, and extensible neural network potentials has become an attractive field of study. In this review, we provide an overview of the status of current neural network potentials and strategies to improve their accuracy. We provide recent examples of studies that demonstrate the applicability of these potentials. We also discuss the capabilities and shortcomings of the current models and the challenges and future directions of their development and applications. It is expected that this review will provide guidance for the development of neural network potentials and the exploitation of their applicability. This article is categorized under:
- Data Science > Artificial Intelligence/Machine Learning
- Molecular and Statistical Mechanics > Molecular Interactions
- Software > Molecular Modeling
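A minimal sketch of the idea behind many neural network potentials (a Behler–Parrinello-style atomic decomposition) is shown below: the total energy is a sum of per-atom contributions, each predicted by a small network applied to a radial descriptor of that atom's local environment. The descriptor, network size, and (untrained) random weights are illustrative only, not a usable potential.

```python
import numpy as np

def radial_descriptor(coords, i, etas=(0.5, 1.0, 2.0), cutoff=4.0):
    """Simple radial symmetry functions: sum_j exp(-eta * r_ij^2) * f_cut(r_ij)."""
    r = np.linalg.norm(coords - coords[i], axis=1)
    r = r[(r > 1e-8) & (r < cutoff)]                   # drop self-distance, apply cutoff
    fcut = 0.5 * (np.cos(np.pi * r / cutoff) + 1.0)    # smooth cutoff function
    return np.array([np.sum(np.exp(-eta * r**2) * fcut) for eta in etas])

def atomic_nn(g, W1, b1, W2, b2):
    """One hidden layer with tanh activation -> scalar atomic energy contribution."""
    return (np.tanh(g @ W1 + b1) @ W2 + b2).item()

def total_energy(coords, params):
    W1, b1, W2, b2 = params
    return sum(atomic_nn(radial_descriptor(coords, i), W1, b1, W2, b2)
               for i in range(len(coords)))

# random (untrained) weights and a toy 5-atom configuration
rng = np.random.default_rng(0)
params = (rng.normal(size=(3, 8)), rng.normal(size=8),
          rng.normal(size=8), rng.normal())
coords = rng.uniform(0.0, 3.0, size=(5, 3))
print("predicted energy (arbitrary units):", total_energy(coords, params))
```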