Title: Informed Down-Sampled Lexicase Selection: Identifying Productive Training Cases for Efficient Problem Solving
Abstract: Genetic Programming (GP) often uses large training sets and requires all individuals to be evaluated on all training cases during selection. Random down-sampled lexicase selection evaluates individuals on only a random subset of the training cases, allowing more individuals to be explored with the same number of program executions. However, random sampling can exclude important cases from the down-sample for a number of generations, while cases that measure the same behavior (synonymous cases) may be overused. In this work, we introduce Informed Down-Sampled Lexicase Selection. This method leverages population statistics to build down-samples that contain more distinct, and therefore more informative, training cases. Through an empirical investigation across two different GP systems (PushGP and Grammar-Guided GP), we find that informed down-sampling significantly outperforms random down-sampling on a set of contemporary program synthesis benchmark problems. Through an analysis of the created down-samples, we find that important training cases are consistently included in the down-sample across independent evolutionary runs and systems. We hypothesize that this improvement can be attributed to the ability of Informed Down-Sampled Lexicase Selection to maintain more specialist individuals over the course of evolution, while still benefiting from reduced per-evaluation costs.
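As a rough illustration of the two ingredients the abstract describes, the sketch below pairs standard lexicase selection with an informed down-sampling step that greedily assembles behaviorally distinct cases via a farthest-first traversal over per-case pass/fail vectors. The function names, data layout, and Hamming-distance criterion are illustrative assumptions, not the paper's exact implementation.

```python
import random

def lexicase_select(population, cases, errors):
    """Lexicase selection: shuffle the cases, then repeatedly keep only
    the candidates that are elite on the next case."""
    candidates = list(population)
    for case in random.sample(cases, len(cases)):
        best = min(errors[ind][case] for ind in candidates)
        candidates = [ind for ind in candidates if errors[ind][case] == best]
        if len(candidates) == 1:
            break
    return random.choice(candidates)

def informed_down_sample(cases, solve_vectors, sample_size):
    """Build a down-sample of distinct cases: start from a random case, then
    greedily add the case whose population pass/fail vector is farthest
    (Hamming distance) from its nearest neighbor already in the sample."""
    def dist(a, b):
        return sum(x != y for x, y in zip(solve_vectors[a], solve_vectors[b]))
    sample = [random.choice(cases)]
    while len(sample) < sample_size:
        remaining = [c for c in cases if c not in sample]
        sample.append(max(remaining, key=lambda c: min(dist(c, s) for s in sample)))
    return sample
```

In a loop over generations, `informed_down_sample` would be called once per generation (using solve vectors gathered from earlier evaluations), and `lexicase_select` would then operate only on that subset of cases.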
Award ID(s): 2117377
PAR ID: 10567995
Publisher / Repository: Massachusetts Institute of Technology
Journal Name: Evolutionary Computation
Volume: 32
Issue: 4
ISSN: 1530-9304
Page Range / eLocation ID: 307 to 337
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Lexicase selection has proven highly successful for finding effective solutions to problems in genetic programming, especially for test-based problems where there are many distinct test cases that must all be passed. However, lexicase (like most selection schemes) requires all prospective solutions to be evaluated against most test cases each generation, which can be computationally expensive. Here, we propose reducing the number of per-generation evaluations required by applying random subsampling: using a subset of test cases each generation (down-sampling) or assigning test cases to subgroups of the population (cohort assignment). Tests are randomly reassigned each generation, and candidate solutions are only ever evaluated on the test cases they are assigned to, radically reducing the total number of evaluations needed while ensuring that each lineage eventually encounters all test cases. We tested these lexicase variants on five different program synthesis problems, across a range of down-sampling levels and cohort sizes. We demonstrate that these simple techniques for reducing the number of per-generation evaluations in lexicase can substantially improve overall performance for equivalent computational effort.
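A compact sketch of the two evaluation-reduction schemes described above, assuming the population and case set are plain Python lists; the interleaved slicing used to form cohorts is an illustrative choice, not the paper's exact mechanism:

```python
import random

def random_down_sample(cases, rate):
    """Down-sampling: keep a random fraction of the test cases this generation."""
    k = max(1, int(len(cases) * rate))
    return random.sample(cases, k)

def assign_cohorts(population, cases, n_cohorts):
    """Cohort assignment: partition both the population and the test cases
    into cohorts; each cohort of individuals is evaluated only on its
    paired cohort of cases, and pairings are redrawn every generation."""
    inds = random.sample(population, len(population))   # shuffled copies
    tests = random.sample(cases, len(cases))
    pop_cohorts = [inds[i::n_cohorts] for i in range(n_cohorts)]
    case_cohorts = [tests[i::n_cohorts] for i in range(n_cohorts)]
    return list(zip(pop_cohorts, case_cohorts))
```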
  2. An ensemble data-learning approach based on proper orthogonal decomposition (POD) and Galerkin projection (EnPOD-GP) is proposed for thermal simulation of multi-core CPUs, improving both training efficiency and model accuracy relative to a previously developed global POD-GP method (GPOD-GP). GPOD-GP generates one set of basis functions (POD modes) to account for thermal behavior in response to variations in dynamic power maps (PMs) across the entire chip, which is computationally intensive if it is to cover the possible variations of all power sources. EnPOD-GP instead acquires multiple sets of POD modes, significantly improving training efficiency and effectiveness, and its simulation accuracy is independent of the dynamic PM. Compared to finite element simulation, both GPOD-GP and EnPOD-GP offer a computational speedup of more than three orders of magnitude. For a processor with a small number of cores, GPOD-GP provides the more efficient approach. When high accuracy is desired and/or a processor with more cores is involved, EnPOD-GP is preferable in terms of training effort, simulation accuracy, and efficiency. Additionally, the error resulting from EnPOD-GP can be precisely predicted for any random spatiotemporal power excitation.
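For readers unfamiliar with POD, the core training step can be sketched in a few lines: collect temperature snapshots, extract the dominant modes with a thin SVD, and, for the ensemble variant, repeat per heat-source region. The snapshot layout and per-region partitioning here are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def pod_modes(snapshots, n_modes):
    """snapshots: (n_points, n_times) matrix of sampled temperature fields.
    Returns the n_modes dominant spatial modes as columns of U."""
    U, _, _ = np.linalg.svd(snapshots, full_matrices=False)
    return U[:, :n_modes]

def reduced_coords(modes, field):
    """Galerkin-style projection of a full temperature field onto the basis:
    a = U^T T, with reconstruction T ~ U a."""
    return modes.T @ field

# Global POD-GP: one basis for the entire chip.
# Ensemble POD-GP (sketch): one basis per core/power-source region.
def ensemble_pod(region_snapshots, n_modes):
    return [pod_modes(S, n_modes) for S in region_snapshots]
```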
  3. It is important to understand how point measurements across spatially heterogeneous ecosystems are scaled to represent these systems. Stream biogeochemistry presents an illustrative example because water quality concerns within stream networks and recipient water bodies motivate heterogeneous watershed studies. Measurements of the stream water–groundwater (SW–GW) interface (i.e., the shallow stream subsurface) are well documented at point-scale sampling densities (i.e., cm²–m² features) but poorly characterized at network-scale sampling densities (i.e., km²; stream reaches and networks). Sampling the SW–GW interface is more time- and labor-intensive than surface water sampling, meaning sample point selection must be made with care for network-scale analyses. In this study, we endeavor to determine which of two common spatial sampling schemes is appropriate for characterizing SW–GW interface biogeochemistry across a third-order stream network, focusing on dissolved organic carbon. The first scheme, called Local Sampling, focuses on characterizing small-scale (< 10 m²) variability produced by local physical and biogeochemical heterogeneity, with fewer points across the stream network. The second scheme, called Longitudinal Sampling, distributes approximately the same number of measurements over many more points across the stream network, with less characterization of local variability. This comparison reveals that the choice between a Local Sampling and a Longitudinal Sampling scheme influences the interpretation of biogeochemical patterns at the stream network scale. Additionally, this study found that increasing observation effort at the local scale added limited information for reach- to network-scale biogeochemical patterns, suggesting that emphasis should be placed on characterizing variability across broader spatial scales with the Longitudinal Sampling approach.
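To make the contrast between the two schemes concrete, here is a toy allocation of a fixed sampling budget under each; the site identifiers, budget, and replicate counts are purely hypothetical:

```python
import random

SITES = [f"site_{i:03d}" for i in range(200)]   # hypothetical network sites

def local_sampling(budget=60, n_focus_sites=6):
    """Local Sampling: few sites, many co-located replicates each,
    to resolve small-scale (< 10 m^2) heterogeneity."""
    reps = budget // n_focus_sites
    focus = random.sample(SITES, n_focus_sites)
    return [(site, rep) for site in focus for rep in range(reps)]

def longitudinal_sampling(budget=60):
    """Longitudinal Sampling: the same budget spread across many sites,
    one measurement each."""
    return [(site, 0) for site in random.sample(SITES, budget)]
```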
  4. Training example order in SGD has long been known to affect convergence rate. Recent results show that accelerated rates are possible in a variety of cases for permutation-based sample orders, in which each example from the training set is used once before any example is reused. In this paper, we develop a broad condition on the sequence of examples used by SGD that is sufficient to prove tight convergence rates in both strongly convex and non-convex settings. We show that our approach suffices to recover, and in some cases improve upon, previous state-of-the-art analyses for four known example-selection schemes: (1) shuffle once, (2) random reshuffling, (3) random reshuffling with data echoing, and (4) Markov Chain Gradient Descent. Motivated by our theory, we propose two new example-selection approaches. First, using quasi-Monte-Carlo methods, we achieve unprecedented accelerated convergence rates for learning with data augmentation. Second, we greedily choose a fixed scan-order to minimize the metric used in our condition and show that we can obtain more accurate solutions from the same number of epochs of SGD. We conclude by empirically demonstrating the utility of our approach for both convex linear-model and deep learning tasks. Our code is available at: https://github.com/EugeneLYC/qmc-ordering. 
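Two of the four example-selection schemes analyzed above can be written down in a few lines each. These generators of example indices are a minimal sketch, not the code at the repository linked above:

```python
import random

def shuffle_once(n, epochs):
    """Shuffle Once: fix a single random permutation and reuse it every epoch."""
    order = random.sample(range(n), n)
    for _ in range(epochs):
        yield from order

def random_reshuffling(n, epochs):
    """Random Reshuffling: draw a fresh permutation each epoch, so every
    example is used exactly once per epoch before any example is reused."""
    for _ in range(epochs):
        yield from random.sample(range(n), n)

# Usage sketch: for i in random_reshuffling(len(dataset), 10): step(dataset[i])
```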
  5. Sampling-based methods promise scalability improvements when paired with stochastic gradient descent for training Graph Convolutional Networks (GCNs). While effective in alleviating the neighborhood explosion, these methods incur computational overheads from preprocessing and loading new samples in heterogeneous systems, due to bandwidth and memory bottlenecks, which significantly deteriorates sampling performance. By decoupling the frequency of sampling from the sampling strategy, we propose LazyGCN, a general yet effective framework that can be integrated with any sampling strategy to substantially improve training time. The basic idea behind LazyGCN is to perform sampling periodically and effectively recycle the sampled nodes to mitigate data-preparation overhead. We theoretically analyze the proposed algorithm and show that, under a mild condition on the recycling size, reducing the variance of the inner layers lets us obtain the same convergence rate as the underlying sampling method. We also provide corroborating empirical evidence on large real-world graphs, demonstrating that the proposed scheme can significantly reduce the number of sampling steps and yield superior speedup without compromising accuracy.
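The recycling idea reduces to a small change in the training loop: resample only every few iterations and reuse the cached subgraph in between. The callable names and the fixed period below are illustrative assumptions, not LazyGCN's actual interface:

```python
def lazy_training_loop(n_iters, recycle_period, draw_sample, train_step):
    """LazyGCN-style loop (sketch): `draw_sample` performs the expensive
    subgraph sampling and loading; `train_step` runs one SGD update on a
    cached sample. Sampling happens only every `recycle_period` iterations,
    amortizing the data-preparation cost over several updates."""
    sample = None
    for t in range(n_iters):
        if t % recycle_period == 0:
            sample = draw_sample()   # costly: sample and load a new subgraph
        train_step(sample)           # cheap: recycle the cached subgraph
```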