- Award ID(s):
- 1741340
- PAR ID:
- 10178114
- Date Published:
- Journal Name:
- Proceedings of the National Academy of Sciences
- Volume:
- 116
- Issue:
- 44
- ISSN:
- 0027-8424
- Page Range / eLocation ID:
- 22071 to 22080
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
null (Ed.)Abstract Machine learning methods have been widely applied to big data analysis in genomics and epigenomics research. Although accuracy and efficiency are common goals in many modeling tasks, model interpretability is especially important to these studies towards understanding the underlying molecular and cellular mechanisms. Deep neural networks (DNNs) have recently gained popularity in various types of genomic and epigenomic studies due to their capabilities in utilizing large-scale high-throughput bioinformatics data and achieving high accuracy in predictions and classifications. However, DNNs are often challenged by their potential to explain the predictions due to their black-box nature. In this review, we present current development in the model interpretation of DNNs, focusing on their applications in genomics and epigenomics. We first describe state-of-the-art DNN interpretation methods in representative machine learning fields. We then summarize the DNN interpretation methods in recent studies on genomics and epigenomics, focusing on current data- and computing-intensive topics such as sequence motif identification, genetic variations, gene expression, chromatin interactions and non-coding RNAs. We also present the biological discoveries that resulted from these interpretation methods. We finally discuss the advantages and limitations of current interpretation approaches in the context of genomic and epigenomic studies. Contact:xiaoman@mail.ucf.edu, haihu@cs.ucf.edumore » « less
-
Abstract The first step in constructing a machine learning model is defining the features of the dataset that can be used for optimal learning. In this work, we discuss feature selection methods, which can be used to build better models, as well as achieve model interpretability. We applied these methods in the context of stress hotspot classification problem, to determine what microstructural characteristics can cause stress to build up in certain grains during uniaxial tensile deformation. The results show how some feature selection techniques are biased and demonstrate a preferred technique to get feature rankings for physical interpretations.
-
Abstract Statistical and machine learning methods have become paramount in order to handle large size claims data as part of health care fraud detection frameworks. Among these, predictive methods such as regression and classification algorithms are widely used with labeled data. However, the imbalanced nature of health care claims data and skewness of fraud distributions result with challenges in practical applications. This paper presents the use of various classification algorithms and data preāprocessing methods on claim payment populations and overpayment scenarios with different characteristics. It can help the health care practitioners evaluate the advantages and disadvantages of these analytical methods, and choose the right classification method and apply them properly for their specific circumstances. We utilize publicly available U.S. Medicare Part B health care claims payment data from the hospitals with a number of fraud label scenarios to demonstrate potential fraud patterns. We discuss the computational demand and accuracy of the methods.
-
Abstract Robust quantification of predictive uncertainty is a critical addition needed for machine learning applied to weather and climate problems to improve the understanding of what is driving prediction sensitivity. Ensembles of machine learning models provide predictive uncertainty estimates in a conceptually simple way but require multiple models for training and prediction, increasing computational cost and latency. Parametric deep learning can estimate uncertainty with one model by predicting the parameters of a probability distribution but does not account for epistemic uncertainty. Evidential deep learning, a technique that extends parametric deep learning to higher-order distributions, can account for both aleatoric and epistemic uncertainties with one model. This study compares the uncertainty derived from evidential neural networks to that obtained from ensembles. Through applications of the classification of winter precipitation type and regression of surface-layer fluxes, we show evidential deep learning models attaining predictive accuracy rivaling standard methods while robustly quantifying both sources of uncertainty. We evaluate the uncertainty in terms of how well the predictions are calibrated and how well the uncertainty correlates with prediction error. Analyses of uncertainty in the context of the inputs reveal sensitivities to underlying meteorological processes, facilitating interpretation of the models. The conceptual simplicity, interpretability, and computational efficiency of evidential neural networks make them highly extensible, offering a promising approach for reliable and practical uncertainty quantification in Earth system science modeling. To encourage broader adoption of evidential deep learning, we have developed a new Python package, Machine Integration and Learning for Earth Systems (MILES) group Generalized Uncertainty for Earth System Science (GUESS) (MILES-GUESS) (
https://github.com/ai2es/miles-guess ), that enables users to train and evaluate both evidential and ensemble deep learning.Significance Statement This study demonstrates a new technique, evidential deep learning, for robust and computationally efficient uncertainty quantification in modeling the Earth system. The method integrates probabilistic principles into deep neural networks, enabling the estimation of both aleatoric uncertainty from noisy data and epistemic uncertainty from model limitations using a single model. Our analyses reveal how decomposing these uncertainties provides valuable insights into reliability, accuracy, and model shortcomings. We show that the approach can rival standard methods in classification and regression tasks within atmospheric science while offering practical advantages such as computational efficiency. With further advances, evidential networks have the potential to enhance risk assessment and decision-making across meteorology by improving uncertainty quantification, a longstanding challenge. This work establishes a strong foundation and motivation for the broader adoption of evidential learning, where properly quantifying uncertainties is critical yet lacking.
-
Machine learning (ML) plays an increasingly important role in improving a user's experience. However, most UX practitioners face challenges in understanding ML's capabilities or envisioning what it might be. We interviewed 13 designers who had many years of experience designing the UX of ML-enhanced products and services. We probed them to characterize their practices. They shared they do not view themselves as ML experts, nor do they think learning more about ML would make them better designers. Instead, our participants appeared to be the most successful when they engaged in ongoing collaboration with data scientists to help envision what to make and when they embraced a data-centric culture. We discuss the implications of these findings in terms of UX education and as opportunities for additional design research in support of UX designers working with ML.more » « less