This content will become publicly available on August 28, 2026

Title: Incorporating local step-size adaptivity into the no-U-turn sampler using Gibbs self-tuning
Adapting the step size locally in the no-U-turn sampler (NUTS) is challenging because the step-size and path-length tuning parameters are interdependent. The determination of an optimal path length requires a predefined step size, while the ideal step size must account for errors along the selected path. Ensuring reversibility further complicates this tuning problem. In this paper, we present a method for locally adapting the step size in NUTS that is an instance of the Gibbs self-tuning (GIST) framework. Our approach guarantees reversibility with an acceptance probability that depends exclusively on the conditional distribution of the step size. We validate our step-size-adaptive NUTS method on Neal’s funnel density and a high-dimensional normal distribution, demonstrating its effectiveness in challenging scenarios.
Award ID(s):
2111224
PAR ID:
10634292
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
AIP Publishing
Date Published:
Journal Name:
The Journal of Chemical Physics
Volume:
163
Issue:
8
ISSN:
0021-9606
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Monte Carlo (MC) methods are widely used in many research areas such as physical simulation, statistical analysis, and machine learning. Applying MC methods requires drawing fast-mixing samples from a given probability distribution. Among existing sampling methods, Hamiltonian Monte Carlo (HMC) uses gradient information during Hamiltonian simulation and can produce fast-mixing samples at the highest efficiency. However, without carefully chosen simulation parameters for a specific problem, HMC generally suffers from simulation locality and wasted computation. As a result, the No-U-Turn Sampler (NUTS) was proposed to tune these parameters automatically during simulation and is the current state-of-the-art sampling algorithm. However, NUTS requires frequent gradient calculations for a given distribution and high-volume vector processing, especially for large-scale problems, which makes drawing a large number of samples expensive and motivates hardware acceleration. While some hardware acceleration work has been proposed for traditional Markov chain Monte Carlo (MCMC) and HMC methods, no existing work targets hardware acceleration of the NUTS algorithm. In this paper, we present the first NUTS accelerator on FPGA while addressing the high complexity of this state-of-the-art algorithm. Our hardware and algorithm co-optimizations include an incremental resampling technique, which leads to a more memory-efficient architecture, and pipeline optimization for multi-chain sampling to maximize throughput. We also explore three levels of parallelism in the NUTS accelerator to further boost performance. Compared with RSTAN, an optimized C++ NUTS package, our NUTS accelerator reaches a maximum speedup of 50.6X and an energy improvement of 189.7X.
  2. A common approach to transfer learning under distribution shift is to fine-tune the last few layers of a pre-trained model, preserving learned features while also adapting to the new task. This paper shows that in such settings, selectively fine-tuning a subset of layers (which we term surgical fine-tuning) matches or outperforms commonly used fine-tuning approaches. Moreover, the type of distribution shift influences which subset is more effective to tune: for example, for image corruptions, fine-tuning only the first few layers works best. We validate our findings systematically across seven real-world data tasks spanning three types of distribution shifts. Theoretically, we prove that for two-layer neural networks in an idealized setting, first-layer tuning can outperform fine-tuning all layers. Intuitively, fine-tuning more parameters on a small target dataset can cause information learned during pre-training to be forgotten, and the relevant information depends on the type of shift. 
  3. Cavities in large-scale photonic integrated circuits (PICs) often suffer from a wider distribution of resonance frequencies due to fabrication errors. It is crucial to adjust the resonances of cavities using post-processing methods to minimize the frequency distribution. We have developed a concept of passive tuning by manipulating the mode index of a portion of a microring cavity, which we named mode index engineering (MIE). Through analytical studies and numerical experiments, we have found that depositing a thin film of dielectric material on top of the cavity or etching the material enables us to fine-tune the resonances and minimize the frequency distribution. This versatile method allows for the selective tuning of each cavity’s resonance in a large set of cavities in a post-fabrication step, providing robust passive tuning in large-scale PICs. We show that the proposed method achieves a tuning resolution below 1/Q and a range of up to 10^3/Q for visible to near-infrared wavelengths. Furthermore, this method can be applied and explored in various integrated photonic cavities and material configurations.
  4. Tree search algorithms, such as branch-and-bound, are the most widely used tools for solving combinatorial and non-convex problems. For example, they are the foremost method for solving (mixed) integer programs and constraint satisfaction problems. Tree search algorithms come with a variety of tunable parameters that are notoriously challenging to tune by hand. A growing body of research has demonstrated the power of using a data-driven approach to automatically optimize the parameters of tree search algorithms. These techniques use a training set of integer programs sampled from an application-specific instance distribution to find a parameter setting that has strong average performance over the training set. However, with too few samples, a parameter setting may have strong average performance on the training set but poor expected performance on future integer programs from the same application. Our main contribution is to provide the first sample complexity guarantees for tree search parameter tuning. These guarantees bound the number of samples sufficient to ensure that the average performance of tree search over the samples nearly matches its future expected performance on the unknown instance distribution. In particular, the parameters we analyze weight scoring rules used for variable selection. Proving these guarantees is challenging because tree size is a volatile function of these parameters: we prove that, for any discretization (uniform or not) of the parameter space, there exists a distribution over integer programs such that every parameter setting in the discretization results in a tree with exponential expected size, yet there exist parameter settings between the discretized points that result in trees of constant size. In addition, we provide data-dependent guarantees that depend on the volatility of these tree-size functions: our guarantees improve if the tree-size functions can be well approximated by simpler functions. Finally, via experiments, we illustrate that learning an optimal weighting of scoring rules reduces tree size.
  5. Instruction tuning is critical for adapting large language models (LLMs) to downstream tasks, and recent studies have demonstrated that small amounts of human-curated data can outperform larger datasets, challenging traditional data scaling laws. While LLM-based data quality rating systems offer a cost-effective alternative to human annotation, they often suffer from inaccuracies and biases, even in powerful models like GPT-4. In this work, we introduce DS2, a Diversity-aware Score curation method for Data Selection. By systematically modeling error patterns through a score transition matrix, DS2 corrects LLM-based scores and promotes diversity in the selected data samples. Our approach shows that a curated subset (just 3.3% of the original dataset) outperforms full-scale datasets (300k samples) across various machine-alignment benchmarks, and matches or surpasses human-aligned datasets such as LIMA with the same sample size (1k samples). These findings challenge conventional data scaling assumptions, highlighting that redundant, low-quality samples can degrade performance and reaffirming that "more can be less." 