

Search for: All records

Creators/Authors contains: "Vakayil, Akhil"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. In this work, we propose a novel framework for large-scale Gaussian process (GP) modeling. In contrast to the purely global and purely local approximations proposed in the literature to address the computational bottleneck of exact GP modeling, we employ a combined global-local approach in building the approximation. Our framework uses a subset-of-data approach where the subset is the union of a set of global points, designed to capture the global trend in the data, and a set of local points specific to a given testing location, capturing the local trend around that location. The correlation function is likewise modeled as a combination of a global and a local kernel. The predictive performance of our framework, which we refer to as TwinGP, is comparable to state-of-the-art GP modeling methods, but at a fraction of their computational cost.
    Free, publicly-accessible full text available April 2, 2025
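    The global-local subset idea above can be sketched in a few lines. This is an illustrative stand-in, not the TwinGP method itself: the global points are drawn at random rather than via a space-filling design, a single squared-exponential kernel replaces the combined global-local kernel, and the function name and parameters (`subset_gp_predict`, `n_global`, `n_local`) are invented for the example.

    ```python
    import numpy as np

    def rbf(A, B, lengthscale):
        """Squared-exponential kernel between row sets A (m, d) and B (n, d)."""
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-0.5 * d2 / lengthscale ** 2)

    def subset_gp_predict(X, y, x_star, n_global=20, n_local=10,
                          lengthscale=0.2, noise=1e-4, rng=None):
        rng = np.random.default_rng(rng)
        # Global points: a random subset standing in for a space-filling design.
        g_idx = rng.choice(len(X), size=min(n_global, len(X)), replace=False)
        # Local points: nearest neighbours of the testing location x_star.
        dist = np.linalg.norm(X - x_star, axis=1)
        l_idx = np.argsort(dist)[:n_local]
        # Fit an exact GP on the union of the two subsets only.
        idx = np.unique(np.concatenate([g_idx, l_idx]))
        Xs, ys = X[idx], y[idx]
        K = rbf(Xs, Xs, lengthscale) + noise * np.eye(len(Xs))
        k_star = rbf(x_star[None, :], Xs, lengthscale)
        alpha = np.linalg.solve(K, ys)
        return (k_star @ alpha).item()
    ```

    Because the GP is refit on a small subset per testing location, the cubic cost of exact GP inference applies only to `n_global + n_local` points rather than the full dataset.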
  2. In this article, we propose an optimal method, referred to as SPlit, for splitting a dataset into training and testing sets. SPlit is based on the method of support points (SP), which was initially developed for finding the optimal representative points of a continuous distribution. We adapt SP for subsampling from a dataset using a sequential nearest neighbor algorithm. We also extend SP to handle categorical variables so that SPlit can be applied to both regression and classification problems. Applying SPlit to real datasets shows substantial improvement in the worst-case testing performance for several modeling methods compared to the commonly used random splitting procedure.
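    The splitting step can be illustrated with a toy sketch. Note the stand-ins: k-means centroids are used as a crude substitute for actual support points (computing support points requires its own optimization), while the sequential nearest-neighbor assignment mirrors the subsampling idea described in the abstract. All function names here are hypothetical.

    ```python
    import numpy as np

    def simple_kmeans(X, k, iters=20, rng=None):
        """A tiny Lloyd's k-means; centroids stand in for support points."""
        rng = np.random.default_rng(rng)
        centers = X[rng.choice(len(X), k, replace=False)]
        for _ in range(iters):
            d = ((X[:, None] - centers[None]) ** 2).sum(-1)
            labels = d.argmin(1)
            for j in range(k):
                pts = X[labels == j]
                if len(pts):
                    centers[j] = pts.mean(0)
        return centers

    def split_dataset(X, test_frac=0.2, rng=None):
        """Assign each representative point its nearest unused data row,
        sequentially, to form the testing set; the rest is training."""
        k = max(1, int(round(test_frac * len(X))))
        reps = simple_kmeans(X, k, rng=rng)
        available = np.ones(len(X), bool)
        test_idx = []
        for r in reps:
            d = np.linalg.norm(X - r, axis=1)
            d[~available] = np.inf  # skip rows already taken
            i = int(d.argmin())
            test_idx.append(i)
            available[i] = False
        return np.where(available)[0], np.array(test_idx)
    ```

    The sequential (without-replacement) assignment is what keeps the testing set spread across the data rather than clustered, in the spirit of the representative-point idea.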
  3. Abstract

    In this work, we develop a method named Twinning for partitioning a dataset into statistically similar twin sets. Twinning is based on SPlit, a recently proposed model-independent method for optimally splitting a dataset into training and testing sets. Twinning is orders of magnitude faster than the SPlit algorithm, which makes it applicable to Big Data problems such as data compression. Twinning can also be used for generating multiple splits of a given dataset to aid divide-and-conquer procedures and k-fold cross validation.

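    A minimal sketch of the twin-set idea, assuming a greedy nearest-neighbour pairing: each pair of close points contributes one member to each twin, so the two halves cover the same regions of the data. This is a simplification for illustration, not the actual Twinning algorithm, and `twin_split` is a hypothetical name.

    ```python
    import numpy as np

    def twin_split(X):
        """Greedily pair each point with its nearest unassigned neighbour
        and send one member of each pair to each twin set."""
        n = len(X)
        unassigned = set(range(n))
        twin_a, twin_b = [], []
        while len(unassigned) >= 2:
            i = min(unassigned)  # deterministic seed point
            unassigned.remove(i)
            rest = np.array(sorted(unassigned))
            d = np.linalg.norm(X[rest] - X[i], axis=1)
            j = int(rest[d.argmin()])
            unassigned.remove(j)
            twin_a.append(i)
            twin_b.append(j)
        if unassigned:  # odd n: leftover point goes to the first twin
            twin_a.append(unassigned.pop())
        return np.array(twin_a), np.array(twin_b)
    ```

    Because each twin receives one member of every nearest-neighbour pair, summary statistics of the two halves stay close, which is the property that makes such splits useful for cross validation and divide-and-conquer fitting.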