

This content will become publicly available on June 17, 2026

Title: Alsatian: Optimizing Model Search for Deep Transfer Learning
Transfer learning is an effective technique for tuning a deep learning model when training data or computational resources are limited. Instead of training a new model from scratch, the parameters of an existing base model are adjusted for the new task. The accuracy of such a fine-tuned model depends on the suitability of the chosen base model. Model search automates this selection by evaluating the suitability of candidate models for a specific task, which entails running inference with each candidate model on task-specific data. With thousands of models available through model stores, the computational cost of model search is a major bottleneck for efficient transfer learning. In this work, we present Alsatian, a novel model search system. Based on the observation that many candidate models overlap to a significant extent, and following a careful bottleneck analysis, we propose optimization techniques that are applicable to many model search frameworks. These optimizations include: (i) splitting models into individual blocks that can be shared across models, (ii) caching intermediate inference results and model blocks, and (iii) selecting a beneficial search order for models to maximize sharing of cached results. In our evaluation on state-of-the-art deep learning models from computer vision and natural language processing, we show that Alsatian outperforms baselines by up to 14x.
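The interplay of the three optimizations can be illustrated with a minimal sketch. All names here (block identifiers, the toy "inference" functions, the `search` helper) are hypothetical and stand in for real model blocks; this is not Alsatian's actual code or API:

```python
# Minimal sketch of prefix-sharing model search (hypothetical; not Alsatian's API).
# Each candidate model is a sequence of named blocks; models that share a prefix
# of blocks can reuse cached intermediate results for that prefix.

# Toy "blocks": each transforms an intermediate result (here, just an int).
BLOCKS = {
    "conv1": lambda x: x + 1,
    "conv2": lambda x: x * 2,
    "head_a": lambda x: x - 3,
    "head_b": lambda x: x + 7,
}

def search(models, data):
    """Run every candidate model on `data`, caching per-prefix results."""
    cache = {}
    results, computed = {}, 0
    # (iii) order models so shared prefixes are evaluated consecutively,
    # maximizing cache hits.
    for name, blocks in sorted(models.items(), key=lambda kv: kv[1]):
        x = data
        for i, b in enumerate(blocks):
            prefix = tuple(blocks[: i + 1])
            if prefix in cache:        # (ii) reuse cached intermediate result
                x = cache[prefix]
            else:                      # (i) run only the missing block
                x = BLOCKS[b](x)
                cache[prefix] = x
                computed += 1
        results[name] = x
    return results, computed

models = {"m1": ["conv1", "conv2", "head_a"], "m2": ["conv1", "conv2", "head_b"]}
results, computed = search(models, data=5)
# The shared prefix conv1 -> conv2 executes once: 4 block runs instead of 6.
```

With thousands of candidates drawn from a few base architectures, the shared prefixes are long, which is what makes this kind of sharing pay off.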
Award ID(s):
2420577, 2420691
PAR ID:
10618186
Author(s) / Creator(s):
Publisher / Repository:
SIGMOD 2025
Date Published:
Journal Name:
Proceedings of the ACM on Management of Data
Volume:
3
Issue:
3
ISSN:
2836-6573
Page Range / eLocation ID:
1 to 27
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The underlying factors that lead specific strains within a species to emerge as human pathogens remain mostly enigmatic. The diarrheal disease cholera is caused by strains from a phylogenetically confined group within the Vibrio cholerae species, the pandemic cholera group (PCG), making it an ideal model system for tackling this puzzling phenomenon. Comprehensive analyses of over 1,840 V. cholerae genomes, including environmental isolates from this study, reveal that the species consists of eleven groups, with the PCG belonging to the largest and located within a lineage shared with environmental strains. This hierarchical classification provided us with a framework to unravel the ecoevolutionary dynamics of the genetic determinants associated with the emergence of toxigenic V. cholerae. Our analyses indicate that this phenomenon is largely dependent on the acquisition of unique modular gene clusters and allelic variations that confer a competitive advantage during intestinal colonization. We determined that certain PCG-associated alleles are essential for successful colonization, whereas others provide a nonlinear competitive advantage, acting as a critical bottleneck that clarifies the isolated emergence of PCG. For instance, toxigenic strains encoding non-PCG alleles of (a) tcpF or (b) a sextuple allelic exchange mutant for genes tcpA, toxT, VC0176, VC1791, rfbT, and ompU lose their ability to colonize the intestine. Interestingly, these alleles do not play a role in the colonization of newly established model environmental reservoirs. Our study uncovers the evolutionary roots of toxigenic V. cholerae, offering a tractable approach for investigating the emergence of pathogenic clones within an environmental population.
  2. Deep Learning (DL) is a class of machine learning algorithms used in a wide variety of applications. Like any software system, DL programs can have bugs, and several tools have been proposed to support bug localization in DL programs. Most bugs that arise from an improper model structure, known as structural bugs, lead to inadequate performance during training, making it challenging for developers to identify their root cause and address them. To support bug detection and localization in DL programs, in this article we propose Theia, which detects and localizes structural bugs in DL programs. Unlike previous works, Theia considers the characteristics of the training dataset to automatically detect bugs in DL programs developed using two DL libraries, Keras and PyTorch. Since training DL models is a time-consuming process, Theia detects these bugs at the beginning of the training process and alerts the developer with informative messages containing the bug's location and actionable fixes that help them improve the structure of the model. We evaluated Theia on a benchmark of 40 real-world buggy DL programs obtained from Stack Overflow. Our results show that Theia successfully localizes 57/75 structural bugs in the 40 buggy programs, whereas NeuraLint, a state-of-the-art approach capable of localizing structural bugs before training, localizes 17/75 bugs.
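A pre-training structural check of the kind described above can be sketched as follows. The rule set and the `check_structure` helper are hypothetical illustrations of the idea, not Theia's actual implementation:

```python
# Hypothetical pre-training structural check, inspired by the idea of catching
# structural bugs before training starts (not Theia's actual rules or code).

def check_structure(layers, num_classes):
    """Return human-readable warnings for common output-layer structural bugs,
    given a list of (units, activation) layer specs and the number of classes."""
    warnings = []
    units, activation = layers[-1]
    if units != num_classes:
        warnings.append(
            f"output layer has {units} unit(s) but the dataset has {num_classes} classes"
        )
    if num_classes > 2 and activation != "softmax":
        warnings.append(f"multi-class output uses '{activation}'; expected 'softmax'")
    if num_classes == 2 and activation not in ("sigmoid", "softmax"):
        warnings.append(f"binary output uses '{activation}'; expected 'sigmoid'")
    return warnings

# A buggy model: 10-class task, but the head has 1 sigmoid unit.
msgs = check_structure([(128, "relu"), (1, "sigmoid")], num_classes=10)
```

Running such checks before the first epoch costs almost nothing, whereas discovering the same bug from a flat loss curve can waste hours of training.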
  3. Objective: Cognitive training may benefit older adults with mild cognitive impairment (MCI), but the prognostic factors are not well established. Methods: This study analyzed data from a 78-week trial with 107 participants with MCI, comparing computerized cognitive training (CCT) and computerized crossword puzzle training (CPT). Outcomes were changes in cognitive and functional measures from baseline. Linear mixed-effect models were used to identify prognostic factors for each intervention. Results: Baseline neuropsychological composite z-score was positively associated with cognitive and functional improvements for both interventions in univariable models, retaining significance in the final multivariable model for functional outcome in CPT (P < 0.001). Apolipoprotein E e4 carriers had worse cognitive (P = 0.023) and functional (P = 0.001) outcomes than noncarriers for CPT but not CCT. African Americans showed greater functional improvements than non-African Americans in both CPT (P = 0.001) and CCT (P = 0.010). Better baseline odor identification was correlated with cognitive improvements in CPT (P = 0.006) and functional improvements in CCT (P < 0.001). Conclusion: Baseline cognitive test performance, African American background, and odor identification ability are potential prognostic factors for improved outcomes with cognitive interventions in older adults with MCI. Apolipoprotein E e4 is associated with poor outcomes. Replication of these findings may improve the selection of cognitive interventions for individuals with MCI.
  4. Probabilistic inference is fundamentally hard, yet many tasks require optimization on top of inference, which is even harder. We present a new optimization-via-compilation strategy to scalably solve a certain class of such problems. In particular, we introduce a new intermediate representation (IR), binary decision diagrams weighted by a novel notion of branch-and-bound semiring, that enables a scalable branch-and-bound-based optimization procedure. This IR automatically factorizes problems through program structure and prunes suboptimal values via a straightforward branch-and-bound-style algorithm to find optima. Additionally, the IR is naturally amenable to staged compilation, allowing the programmer to query for optima mid-compilation to inform further executions of the program. We showcase the effectiveness and flexibility of the IR by implementing two performant languages that both compile to it: dappl and pineappl. dappl is a functional language that solves maximum expected utility problems with first-class support for rewards, decision making, and conditioning. pineappl is an imperative language that performs exact probabilistic inference with support for nested marginal maximum a posteriori (MMAP) optimization via staging.
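The pruning idea behind branch-and-bound can be shown on a toy weighted decision tree. This sketch is purely illustrative and unrelated to the paper's semiring-weighted BDD IR; the tree encoding and the `best` helper are assumptions made for the example:

```python
# Toy branch-and-bound over a weighted decision tree (illustrative only; not
# the dappl/pineappl IR). A leaf is a number; an internal node is a pair
# (optimistic_upper_bound, children). Subtrees whose upper bound cannot beat
# the best value found so far are pruned without visiting their leaves.

def best(node, bound=float("-inf"), stats=None):
    """Return (max leaf weight under `node`, visit statistics)."""
    stats = stats if stats is not None else {"visited": 0}
    stats["visited"] += 1
    if isinstance(node, (int, float)):   # leaf: its weight
        return node, stats
    upper, children = node
    if upper <= bound:                   # prune: cannot improve on `bound`
        return float("-inf"), stats
    for child in children:
        val, _ = best(child, bound, stats)
        bound = max(bound, val)          # tighten the bound as we go
    return bound, stats

# Left subtree yields 8; the right subtree's bound (6) cannot beat it,
# so its leaves are never visited: 5 nodes touched instead of 7.
tree = (10, [(9, [8, 2]), (6, [5, 4])])
val, stats = best(tree)
```

In the paper's setting the same pruning happens over shared BDD structure rather than a tree, so common subproblems are additionally factored out instead of re-explored.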
  5. Deep learning has become the most popular direction in machine learning and artificial intelligence. However, the preparation of training data, as well as model training, is often time-consuming and becomes the bottleneck of the end-to-end machine learning lifecycle. Reusing models to run inference on a dataset can avoid the costs of retraining; however, when there are multiple candidate models, it is challenging to discover the right model for reuse. Although there exist a number of model-sharing platforms such as ModelDB, TensorFlow Hub, PyTorch Hub, and DLHub, most of these systems require model uploaders to manually specify the details of each model and model downloaders to screen keyword search results when selecting a model. What is missing is a highly productive model search tool that selects models for deployment without the need for any manual inspection and/or labeled data from the target domain. This paper proposes multiple model search strategies, including various similarity-based and non-similarity-based approaches. We design, implement, and evaluate these approaches on multiple model inference scenarios, including activity recognition, image recognition, text classification, natural language processing, and entity matching. The experimental evaluation showed that our proposed asymmetric similarity-based measurement, adaptivity, outperformed symmetric similarity-based and non-similarity-based measurements in most of the workloads.
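Why an asymmetric measure can matter for model reuse is easy to see with a minimal sketch. The `coverage` function below is a hypothetical stand-in, not the paper's "adaptivity" measure:

```python
# Hypothetical asymmetric similarity between a candidate model's training data
# and a target dataset (illustrative only; not the paper's adaptivity measure).

def coverage(source_vocab, target_vocab):
    """Fraction of the target's features covered by the source.
    Asymmetric: coverage(a, b) generally differs from coverage(b, a)."""
    target = set(target_vocab)
    if not target:
        return 0.0
    return len(set(source_vocab) & target) / len(target)

src = {"cat", "dog", "car", "tree"}   # candidate model's training vocabulary
tgt = {"cat", "dog"}                  # target task's vocabulary

a = coverage(src, tgt)   # src fully covers the target task
b = coverage(tgt, src)   # the reverse direction covers only half
```

A symmetric measure would penalize the broad source model for its extra features, even though it covers everything the target task needs; an asymmetric measure distinguishes the two directions.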