NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Sharpen data-driven prediction rules of individual large earthquakes with aid of Fourier and Gauss

https://doi.org/10.1038/s41598-023-43181-z

Cho, In Ho (December 2023, Scientific Reports)

Abstract Predicting individual large earthquakes (EQs)’ locations, magnitudes, and timing remains unreachable. The author’s prior study shows that individual large EQs have unique signatures obtained from multi-layered data transformations. Via spatio-temporal convolutions, decades-long EQ catalog data are transformed into pseudo-physics quantities (e.g., energy, power, vorticity, and Laplacian), which turn into surface-like information via Gauss curvatures. Using these new features, a rule-learning machine learning approach unravels promising prediction rules. This paper suggests further data transformation via Fourier transformation (FT). Results show that FT-based new feature can help sharpen the prediction rules. Feasibility tests of large EQs ($$M\ge$$ $M \geq$ 6.5) over the past 40 years in the western U.S. show promise, shedding light on data-driven prediction of individual large EQs. The handshake among ML methods, Fourier, and Gauss may help answer the long-standing enigma of seismogenesis.
more » « less
Full Text Available
Statistical inference with semiparametric nonignorable nonresponse models

https://doi.org/10.1111/sjos.12652

Uehara, Masatoshi; Lee, Danhyang; Kim, Jae‐Kwang (May 2023, Scandinavian Journal of Statistics)

Abstract How to deal with nonignorable response is often a challenging problem encountered in statistical analysis with missing data. Parametric model assumption for the response mechanism is sensitive to model misspecification. We consider a semiparametric response model that relaxes the parametric model assumption in the response mechanism. Two types of efficient estimators, profile maximum likelihood estimator and profile calibration estimator, are proposed, and their asymptotic properties are investigated. Two extensive simulation studies are used to compare with some existing methods. We present an application of our method using data from the Korean Labor and Income Panel Survey.
more » « less
Pursuit of hidden rules behind the irregularity of nano capillary lithography by hybrid intelligence

https://doi.org/10.1038/s41598-023-41022-7

Cho, In Ho; Ji, Myung Gi; Kim, Jaeyoun (August 2023, Scientific Reports)

Abstract Nature finds a way to leverage nanotextures to achieve desired functions. Recent advances in nanotechnologies endow fascinating multi-functionalities to nanotextures by modulating the nanopixel’s height. But nanoscale height control is a daunting task involving chemical and/or physical processes. As a facile, cost-effective, and potentially scalable remedy, the nanoscale capillary force lithography (CFL) receives notable attention. The key enabler is optical pre-modification of photopolymer’s characteristics via ultraviolet (UV) exposure. Still, the underlying physics of the nanoscale CFL is not well understood, and unexplained phenomena such as the “forbidden gap” in the nano capillary rise (unreachable height) abound. Due to the lack of large data, small length scales, and the absence of first principles, direct adoptions of machine learning or analytical approaches have been difficult. This paper proposes a hybrid intelligence approach in which both artificial and human intelligence coherently work together to unravel the hidden rules with small data. Our results show promising performance in identifying transparent, physics-retained rules of air diffusivity, dynamic viscosity, and surface tension, which collectively appear to explain the forbidden gap in the nanoscale CFL. This paper promotes synergistic collaborations of humans and AI for advancing nanotechnology and beyond.
more » « less
Soft calibration for selection bias problems under mixed-effects models

https://doi.org/10.1093/biomet/asad016

Gao, Chenyin; Yang, Shu; Kim, Jae Kwang (March 2023, Biometrika)

Abstract Calibration weighting has been widely used to correct selection biases in nonprobability sampling, missing data and causal inference. The main idea is to calibrate the biased sample to the benchmark by adjusting the subject weights. However, hard calibration can produce enormous weights when an exact calibration is enforced on a large set of extraneous covariates. This article proposes a soft calibration scheme, where the outcome and the selection indicator follow mixed-effect models. The scheme imposes an exact calibration on the fixed effects and an approximate calibration on the random effects. On the one hand, our soft calibration has an intrinsic connection with best linear unbiased prediction, which results in a more efficient estimation compared to hard calibration. On the other hand, soft calibration weighting estimation can be envisioned as penalized propensity score weight estimation, with the penalty term motivated by the mixed-effect structure. The asymptotic distribution and a valid variance estimator are derived for soft calibration. We demonstrate the superiority of the proposed estimator over other competitors in simulation studies and using a real-world data application on the effect of BMI screening on childhood obesity.
more » « less
Full Text Available
Gauss curvature-based unique signatures of individual large earthquakes and its implications for customized data-driven prediction

https://doi.org/10.1038/s41598-022-12575-w

Cho, In Ho (May 2022, Scientific Reports)

Abstract Statistical descriptions of earthquakes offer important probabilistic information, and newly emerging technologies of high-precision observations and machine learning collectively advance our knowledge regarding complex earthquake behaviors. Still, there remains a formidable knowledge gap for predicting individual large earthquakes’ locations and magnitudes. Here, this study shows that the individual large earthquakes may have unique signatures that can be represented by new high-dimensional features—Gauss curvature-based coordinates. Particularly, the observed earthquake catalog data are transformed into a number of pseudo physics quantities (i.e., energy, power, vorticity, and Laplacian) which turn into smooth surface-like information via spatio-temporal convolution, giving rise to the new high-dimensional coordinates. Validations with 40-year earthquakes in the West U.S. region show that the new coordinates appear to hold uniqueness for individual large earthquakes ($$M_w \ge 7.0$$ $M_{w} \geq 7.0$ ), and the pseudo physics quantities help identify a customized data-driven prediction model. A Bayesian evolutionary algorithm in conjunction with flexible bases can identify a data-driven model, demonstrating its promising reproduction of individual large earthquake’s location and magnitude. Results imply that an individual large earthquake can be distinguished and remembered while its best-so-far model can be customized by machine learning. This study paves a new way to data-driven automated evolution of individual earthquake prediction.
more » « less
Unraveling hidden rules behind the wet-to-dry transition of bubble array by glass-box physics rule learner

https://doi.org/10.1038/s41598-022-07170-y

Cho, In Ho; Yeom, Sinchul; Sarkar, Tanmoy; Oh, Tae-Sik (February 2022, Scientific Reports)

Abstract A liquid–gas foam, here called bubble array, is a ubiquitous phenomenon widely observed in daily lives, food, pharmaceutical and cosmetic products, and even bio- and nano-technologies. This intriguing phenomenon has been often studied in a well-controlled environment in laboratories, computations, or analytical models. Still, real-world bubble undergoes complex nonlinear transitions from wet to dry conditions, which are hard to describe by unified rules as a whole. Here, we show that a few early-phase snapshots of bubble array can be learned by a glass-box physics rule learner (GPRL) leading to prediction rules of future bubble array. Unlike the black-box machine learning approach, the glass-box approach seeks to unravel expressive rules of the phenomenon that can evolve. Without known principles, GPRL identifies plausible rules of bubble prediction with an elongated bubble array data that transitions from wet to dry states. Then, the best-so-far GPRL-identified rule is applied to an independent circular bubble array, demonstrating the potential generality of the rule. We explain how GPRL uses the spatio-temporal convolved information of early bubbles to mimic the scientist’s perception of bubble sides, shapes, and inter-bubble influences. This research will help combine foam physics and machine learning to better understand and control bubbles.
more » « less
Semiparametric imputation using conditional Gaussian mixture models under item nonresponse

https://doi.org/10.1111/biom.13410

Lee, Danhyang; Kim, Jae Kwang (December 2020, Biometrics)

Abstract Imputation is a popular technique for handling item nonresponse. Parametric imputation is based on a parametric model for imputation and is not robust against the failure of the imputation model. Nonparametric imputation is fully robust but is not applicable when the dimension of covariates is large due to the curse of dimensionality. Semiparametric imputation is another robust imputation based on a flexible model where the number of model parameters can increase with the sample size. In this paper, we propose a new semiparametric imputation based on a more flexible model assumption than the Gaussian mixture model. In the proposed mixture model, we assume a conditional Gaussian model for the study variable given the auxiliary variables, but the marginal distribution of the auxiliary variables is not necessarily Gaussian. The proposed mixture model is more flexible and achieves a better approximation than the Gaussian mixture models. The proposed method is applicable to high‐dimensional covariate problem by including a penalty function in the conditional log‐likelihood function. The proposed method is applied to the 2017 Korean Household Income and Expenditure Survey conducted by Statistics Korea.
more » « less
A framework for glass-box physics rule learner and its application to nano-scale phenomena

https://doi.org/10.1038/s42005-020-0339-x

Cho, In Ho; Li, Qiang; Biswas, Rana; Kim, Jaeyoun (May 2020, Communications Physics)

Abstract Attempts to use machine learning to discover hidden physical rules are in their infancy, and such attempts confront more challenges when experiments involve multifaceted measurements over three-dimensional objects. Here we propose a framework that can infuse scientists’ basic knowledge into a glass-box rule learner to extract hidden physical rules behind complex physics phenomena. A “convolved information index” is proposed to handle physical measurements over three-dimensional nano-scale specimens, and the multi-layered convolutions are “externalized” over multiple depths at the information level, not in the opaque networks. A transparent, flexible link function is proposed as a mathematical expression generator, thereby pursuing “glass-box” prediction. Consistent evolution is realized by integrating a Bayesian update and evolutionary algorithms. The framework is applied to nano-scale contact electrification phenomena, and results show promising performances in unraveling transparent expressions of a hidden physical rule. The proposed approach will catalyze a synergistic machine learning-physics partnership.
more » « less
Ultra Data-Oriented Parallel Fractional Hot-Deck Imputation With Efficient Linearized Variance Estimation

https://doi.org/10.1109/TKDE.2023.3249567

Yang, Yicheng; Kwon, Yonghyun; Kim, Jae Kwang; Cho, In Ho (September 2023, IEEE Transactions on Knowledge and Data Engineering)

Full Text Available
Review: Evolution of Fractional Hot Deck Imputation for Curing Incomplete Data-From Small to Ultra Large Sizes

https://doi.org/10.5121/csit.2023.131315

Cho, In Ho; Kim, Jae-Kwang; Yang, Yicheng; Kwon, Yonghyun; Chapagain, Ashish (July 2023, International Conference on Computer Science and Information Technology)

Machine learning (ML) advancements hinge upon data - the vital ingredient for training. Statistically-curing the missing data is called imputation, and there are many imputation theories and tools. Butthey often require difficult statistical and/or discipline-specific assumptions, lacking general tools capable of curing large data. Fractional hot deck imputation (FHDI) can cure data by filling nonresponses with observed values (thus, hot-deck) without resorting to assumptions. The review paper summarizes how FHDI evolves to ultra dataoriented parallel version (UP-FHDI).Here, ultra data have concurrently large instances (bign) and high dimensionality (big-p). The evolution is made possible with specialized parallelism and fast variance estimation technique. Validations with scientific and engineering data confirm that UP-FHDI can cure ultra data(p >10,000& n > 1M), and the cured data sets can improve the prediction accuracy of subsequent ML. The evolved FHDI will help promote reliable ML with cured big data.
more » « less
Full Text Available

« Prev Next »

Search for: All records