Abstract: We develop a new methodology for the multi-resolution assimilation of electric fields by extending a Gaussian process model (Lattice Kriging), originally developed for scalar fields, to vector fields. The method takes a background empirical model as a priori knowledge and fuses real observations with it under the Gaussian process framework. A comparison of assimilated results under two different background models and three different resolutions suggests that (a) the new method significantly reduces fitting errors compared with global spherical harmonic fitting (SHF) because it uses range-limited basis functions that are well suited to local fitting, and (b) the fitting resolution, determined by the number of basis functions, is adjustable, with higher resolution yielding smaller errors, indicating that more structure in the data is captured. We also test the sensitivity of the fitting results to the total amount of input data: (a) as the amount of data increases, the fitting results deviate from the background model and become increasingly data-driven, and (b) the impact of the data can reach remote regions where no observations are available. The assimilation also captures short-period variations in local PFISR measurements better than the SHF while maintaining a pattern coherent with the surrounding region. Multi-resolution Lattice Kriging is examined by assigning basis functions to multiple levels with different resolutions (the fine level is placed in the region with observations). Such multi-resolution fitting has the smallest error and the shortest computation time, making regional high-resolution modeling efficient. Our method can be adapted to achieve multi-resolution assimilation of other vector fields from unevenly distributed observations.
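To make the basis-function idea concrete, here is a minimal 1-D scalar sketch of Lattice-Kriging-style fitting: compactly supported (range-limited) basis functions at a coarse and a fine resolution level, with the background model as the prior mean and a ridge penalty standing in for the process prior on the coefficients. This is a simplified analogue on assumed toy data, not the authors' vector-field implementation; the names (`wendland`, `basis_matrix`, `background`) and all parameter values are illustrative.

```python
import numpy as np

def wendland(r):
    """Compactly supported (range-limited) radial basis: zero for r >= 1."""
    r = np.minimum(r, 1.0)
    return (1.0 - r) ** 4 * (4.0 * r + 1.0)

def basis_matrix(x, centers, width):
    """Evaluate the range-limited basis functions centered at `centers`."""
    return wendland(np.abs(x[:, None] - centers[None, :]) / width)

def background(x):
    """Stand-in for the background empirical model (the prior mean)."""
    return np.sin(0.5 * x)

# Toy 1-D scalar analogue: background plus a localized structure that only the
# observations know about.
rng = np.random.default_rng(0)
x_obs = rng.uniform(0.0, 10.0, 200)
truth = background(x_obs) + 0.3 * np.sin(4.0 * x_obs) * ((x_obs > 4.0) & (x_obs < 6.0))
y_obs = truth + 0.05 * rng.standard_normal(x_obs.size)

# Two resolution levels: coarse basis everywhere, fine basis only where data are dense.
coarse_centers = np.linspace(0.0, 10.0, 12)
fine_centers = np.linspace(4.0, 6.0, 20)
Phi = np.hstack([basis_matrix(x_obs, coarse_centers, 2.0),
                 basis_matrix(x_obs, fine_centers, 0.4)])

# Penalized least squares on the residual from the background; the ridge penalty
# plays the role of the process prior on the basis coefficients.
lam = 1e-2
resid = y_obs - background(x_obs)
coef = np.linalg.solve(Phi.T @ Phi + lam * np.eye(Phi.shape[1]), Phi.T @ resid)

# Assimilated field = background prior + data-driven multi-resolution correction.
x_grid = np.linspace(0.0, 10.0, 500)
Phi_grid = np.hstack([basis_matrix(x_grid, coarse_centers, 2.0),
                      basis_matrix(x_grid, fine_centers, 0.4)])
assimilated = background(x_grid) + Phi_grid @ coef
print(assimilated[:5])
```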
Traditional kriging versus modern Gaussian processes for large‐scale mining data
The canonical technique for nonlinear modeling of spatial/point-referenced data is known as kriging in geostatistics, and as Gaussian process (GP) regression for surrogate modeling and statistical learning. This article reviews many similarities shared between kriging and GPs, but also highlights some important differences. One is that GPs impose a process that can be used to automate kernel/variogram inference, thus removing the human from the loop. The GP framework also suggests a probabilistically valid means of scaling to handle a large corpus of training data, that is, an alternative to ordinary kriging. Finally, recent GP implementations are tailored to make the most of modern computing architectures, such as multi-core workstations and multi-node supercomputers. We argue that such distinctions are important even in classically geostatistical settings. To back that up, we present out-of-sample validation exercises using two real, large-scale borehole data sets acquired in the mining of gold and other minerals. We compare classic kriging with several variations of modern GPs and conclude that the latter are more economical (requiring fewer human and compute resources), more accurate, and offer better uncertainty quantification. We go on to show how the fully generative modeling apparatus provided by GPs can gracefully accommodate left-censoring of small measurements, as commonly occurs in mining data and other borehole assays.
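As a hedged illustration of the GP-regression side of this comparison, the sketch below uses scikit-learn's GaussianProcessRegressor, in which kernel hyperparameters (anisotropic lengthscales, signal variance, nugget) are inferred by maximizing the marginal likelihood rather than by hand-fitting a variogram. The three-column inputs and "grade" response are synthetic placeholders, not the paper's borehole data, and the kernel choice is an assumption.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel, ConstantKernel

# Synthetic stand-in for borehole data: (easting, northing, depth) -> grade.
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(500, 3))
y = (np.sin(6.0 * X[:, 0]) * np.cos(4.0 * X[:, 1]) + 0.5 * X[:, 2]
     + 0.1 * rng.standard_normal(500))

# Hyperparameters (lengthscales, signal variance, nugget) are estimated by
# maximizing the marginal likelihood -- the automated kernel/variogram
# inference the GP framework provides.
kernel = (ConstantKernel(1.0) * RBF(length_scale=[0.2, 0.2, 0.2])
          + WhiteKernel(noise_level=1e-2))
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
gp.fit(X[:400], y[:400])

# Out-of-sample prediction with uncertainty quantification.
mean, std = gp.predict(X[400:], return_std=True)
rmse = np.sqrt(np.mean((mean - y[400:]) ** 2))
print(f"held-out RMSE: {rmse:.3f}, mean predictive std: {std.mean():.3f}")
```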
- Award ID(s): 1822108
- PAR ID: 10652343
- Publisher / Repository: Wiley
- Date Published:
- Journal Name: Statistical Analysis and Data Mining: The ASA Data Science Journal
- Volume: 16
- Issue: 5
- ISSN: 1932-1864
- Page Range / eLocation ID: 488 to 506
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- We introduce a novel method for Gaussian process (GP) modeling of massive datasets called globally approximate Gaussian process (GAGP). Unlike most large-scale supervised learners such as neural networks and trees, GAGP is easy to fit and its model behavior is interpretable, making it particularly useful in engineering design with big data. The key idea of GAGP is to build a collection of independent GPs that use the same hyperparameters but randomly distribute the entire training dataset among themselves. This is based on our observation that the GP hyperparameter estimates change negligibly once the size of the training data exceeds a certain level, which can be estimated systematically. For inference, the predictions from all GPs in the collection are pooled, allowing the entire training dataset to be efficiently exploited for prediction. Through analytical examples, we demonstrate that GAGP achieves very high predictive power, matching (and in some cases exceeding) that of state-of-the-art supervised learning methods. We illustrate the application of GAGP in engineering design with a problem on data-driven metamaterials, using it to link reduced-dimension geometrical descriptors of unit cells and their properties. Searching for new unit cell designs with desired properties is then achieved by employing GAGP in inverse optimization. (A minimal sketch of this shared-hyperparameter ensemble idea appears after this list.)
- Gaussian processes (GPs) are very widely used for modeling unknown functions or surfaces in applications ranging from regression to classification to spatial processes. Although there is an increasingly vast literature on applications, methods, theory, and algorithms related to GPs, the overwhelming majority of this literature focuses on the case in which the input domain corresponds to a Euclidean space. However, particularly in recent years with the increasing collection of complex data, it is commonly the case that the input domain does not have such a simple form. For example, it is common for the inputs to be restricted to a non-Euclidean manifold, the case that motivates this article. In particular, we propose a general extrinsic framework for GP modeling on manifolds, which relies on embedding the manifold into a Euclidean space and then constructing extrinsic kernels for GPs on its image. These extrinsic Gaussian processes (eGPs) are used as prior distributions for unknown functions in Bayesian inference. Our approach is simple and general, and we show that eGPs inherit fine theoretical properties from GP models in Euclidean spaces. We consider applications of our models to regression and classification problems with predictors lying in a large class of manifolds, including spheres, planar shape spaces, a space of positive definite matrices, and Grassmannians. Our models can be readily used by practitioners in the biological sciences for various regression and classification problems, such as disease diagnosis or detection. Our work is also likely to have an impact in spatial statistics when spatial locations lie on the sphere or other geometric spaces. (A minimal sketch of the extrinsic-kernel construction appears after this list.)
- Individual differences have been recognized as an important factor in the learning process. However, there are few successes in using known dimensions of individual differences to solve the important problem of predicting student performance and engagement in online learning. At the same time, learning analytics research has demonstrated that the large volume of learning data collected by modern e-learning systems can be used to recognize student behavior patterns and to connect these patterns with measures of student performance. Our paper attempts to bridge these two research directions. By applying a sequence mining approach to a large volume of learner data collected by an online learning system, we build models of student learning behavior. However, instead of following modern work on behavior mining (i.e., using this behavior directly for performance prediction tasks), we follow traditional work on modeling individual differences by quantifying this behavior on a latent, data-driven personality scale. Our research shows that this data-driven model of individual differences performs significantly better than several traditional models of individual differences in predicting important parameters of the learning process, such as success and engagement.
- Gaussian processes (GPs) are a powerful framework for modeling expensive black-box functions and have thus been adopted for various challenging modeling and optimization problems. In GP-based modeling, we typically default to a stationary covariance kernel to model the underlying function over the input domain, but many real-world applications, such as controls and cyber-physical system safety, often require modeling and optimization of functions that are locally stationary and globally non-stationary across the domain; using standard GPs with a stationary kernel often yields poor modeling performance in such scenarios. In this paper, we propose a novel modeling technique called Class-GP (Class Gaussian Process) to model a class of heterogeneous functions, i.e., non-stationary functions which can be divided into locally stationary functions over partitions of the input space, with one active stationary function in each partition. We provide theoretical insights into the modeling power of Class-GP and demonstrate its benefits over standard modeling techniques via extensive empirical evaluations. (A minimal sketch of the one-GP-per-partition idea appears after this list.)
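As a rough sketch of the GAGP idea summarized in the first related item above (shared hyperparameters, the training set randomly distributed across independent GPs, pooled predictions), the following uses scikit-learn; the group count, the subset size used to estimate hyperparameters, and the synthetic data are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def fit_gagp(X, y, n_groups=8, seed=0):
    """GAGP-style ensemble: estimate hyperparameters once, then fit independent
    GPs that share those hyperparameters on random splits of the training data."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))

    # Hyperparameter estimates stabilize once the subset is large enough;
    # estimate them once on a random subset.
    subset = idx[: min(len(X), 300)]
    base = GaussianProcessRegressor(kernel=RBF(0.5) + WhiteKernel(1e-2),
                                    normalize_y=True)
    base.fit(X[subset], y[subset])
    shared_kernel = base.kernel_  # fitted kernel, reused without re-optimization

    # Randomly distribute the entire training set among independent GPs.
    models = []
    for part in np.array_split(idx, n_groups):
        gp = GaussianProcessRegressor(kernel=shared_kernel, optimizer=None,
                                      normalize_y=True)
        gp.fit(X[part], y[part])
        models.append(gp)
    return models

def predict_gagp(models, X_new):
    """Pool (average) the predictions from every GP in the collection."""
    return np.mean([m.predict(X_new) for m in models], axis=0)

# Toy usage with synthetic data.
rng = np.random.default_rng(1)
X = rng.uniform(size=(2000, 2))
y = np.sin(3.0 * X[:, 0]) + X[:, 1] ** 2 + 0.05 * rng.standard_normal(2000)
models = fit_gagp(X, y)
print(predict_gagp(models, X[:3]), y[:3])
```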
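The extrinsic-GP construction in the second related item can be sketched by embedding the manifold (here, the unit sphere) into Euclidean space and applying an ordinary kernel to the embedded coordinates. The embedding, kernel, and synthetic response below are illustrative assumptions rather than the eGP authors' code.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def embed_sphere(lon_deg, lat_deg):
    """Embed points on the unit sphere (the manifold) into Euclidean R^3."""
    lon, lat = np.radians(lon_deg), np.radians(lat_deg)
    return np.column_stack([np.cos(lat) * np.cos(lon),
                            np.cos(lat) * np.sin(lon),
                            np.sin(lat)])

# Synthetic response observed at scattered locations on the sphere.
rng = np.random.default_rng(2)
lon = rng.uniform(-180.0, 180.0, 300)
lat = rng.uniform(-90.0, 90.0, 300)
y = (np.sin(np.radians(lat)) + 0.5 * np.cos(2.0 * np.radians(lon))
     + 0.05 * rng.standard_normal(300))

# Extrinsic kernel: an ordinary Euclidean RBF applied to the embedded images,
# which induces a valid covariance between points on the manifold itself.
X = embed_sphere(lon, lat)
egp = GaussianProcessRegressor(kernel=RBF(0.5) + WhiteKernel(1e-2),
                               normalize_y=True)
egp.fit(X[:250], y[:250])
mean, std = egp.predict(X[250:], return_std=True)
print(f"held-out RMSE: {np.sqrt(np.mean((mean - y[250:]) ** 2)):.3f}")
```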
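The Class-GP idea from the last related item, one stationary GP per partition of a piecewise-stationary function, can be sketched as follows; the partition boundary, kernels, and toy function are assumptions for illustration and omit the paper's theoretical machinery.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Piecewise-stationary 1-D function: slowly varying on [0, 5), rapidly on [5, 10].
rng = np.random.default_rng(3)
X = rng.uniform(0.0, 10.0, 400).reshape(-1, 1)
y = (np.where(X[:, 0] < 5.0, np.sin(0.8 * X[:, 0]), np.sin(6.0 * X[:, 0]))
     + 0.05 * rng.standard_normal(400))

# Known partition of the input space; fit one stationary GP per partition so
# that each partition can learn its own lengthscale.
masks = [X[:, 0] < 5.0, X[:, 0] >= 5.0]
models = []
for mask in masks:
    gp = GaussianProcessRegressor(kernel=RBF(1.0) + WhiteKernel(1e-2),
                                  normalize_y=True)
    gp.fit(X[mask], y[mask])
    models.append(gp)

def predict_piecewise(x_new):
    """Route each query point to the GP whose partition contains it."""
    x_new = np.atleast_2d(x_new)
    out = np.empty(len(x_new))
    new_masks = [x_new[:, 0] < 5.0, x_new[:, 0] >= 5.0]
    for gp, mask in zip(models, new_masks):
        if mask.any():
            out[mask] = gp.predict(x_new[mask])
    return out

print(predict_piecewise(np.array([[2.0], [7.5]])))
```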