Title: Measuring and Controlling Instruction (In)Stability in Language Model Dialogs
System-prompting is a standard tool for customizing language-model chatbots, enabling them to follow a specific instruction. An implicit assumption in the use of system prompts is that they will be stable, so the chatbot will continue to generate text according to the stipulated instructions for the duration of a conversation. We propose a quantitative benchmark to test this assumption, evaluating instruction stability via self-chats between two instructed chatbots. Testing popular models like LLaMA2-chat-70B and GPT-3.5, we reveal a significant instruction drift within eight rounds of conversations. An empirical and theoretical analysis of this phenomenon suggests the transformer attention mechanism plays a role, due to attention decay over long exchanges. To combat attention decay and instruction drift, we propose a lightweight method called split-softmax, which compares favorably against two strong baselines.
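The attention-decay idea in the abstract can be illustrated with a toy calculation. The sketch below is not the paper's benchmark or its split-softmax method, neither of which is specified in this record. It assumes a single softmax attention step in which every token receives the same score, and the names `prompt_attention_mass`, `PROMPT_LEN`, and `TOKENS_PER_ROUND` are hypothetical. Under that assumption, the share of attention that falls on a fixed-length system prompt shrinks as the dialog grows.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def prompt_attention_mass(prompt_len, dialog_len):
    """Total attention weight on the first `prompt_len` tokens.

    Assumption (illustrative only): all tokens get the same score,
    so softmax spreads attention uniformly over the context.
    """
    scores = [0.0] * (prompt_len + dialog_len)
    weights = softmax(scores)
    return sum(weights[:prompt_len])


if __name__ == "__main__":
    PROMPT_LEN = 20        # hypothetical system-prompt length, in tokens
    TOKENS_PER_ROUND = 60  # hypothetical tokens added per dialog round
    for rounds in range(0, 9, 2):
        mass = prompt_attention_mass(PROMPT_LEN, rounds * TOKENS_PER_ROUND)
        print(f"round {rounds}: prompt attention mass = {mass:.3f}")
```

Real attention scores are not uniform, so this only shows the dilution effect of a growing context, not the measured drift reported for LLaMA2-chat-70B or GPT-3.5.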
Award ID(s):
1901030
PAR ID:
10560275
Publisher / Repository:
COLM
Format(s):
Medium: X
Location:
Conference on Language Modeling
Sponsoring Org:
National Science Foundation
More Like this
  1. Embeddings for instructions have been shown to be essential for software reverse engineering and automated program analysis. However, due to the complexity of dependencies and inherent variability of instructions, instruction embeddings using models that are successful for natural language processing may not be effective. In this paper, we perform geometric analysis of instruction embeddings at the token level and instruction family level, showing much greater variability and leading to degraded performance on intrinsic analyses. Then we propose to use metric learning to improve the relationships among instructions using triplet loss. Our results on a large dataset of instruction groups show significant improvements. We also provide a theoretical analysis of the instruction embeddings by looking at the BERT components and characteristics of inner-product matrices for attention in the transformer blocks. The code will be available publicly after the paper is accepted for publication.
  2. In this paper, we focus on numerical solutions for the random genetic drift problem, which is governed by a degenerate convection-dominated parabolic equation. Due to the fixation phenomenon of genes, Dirac delta singularities will develop at boundary points as time evolves. Based on an energetic variational approach (EnVarA), a balance between the maximal dissipation principle (MDP) and least action principle (LAP), we obtain the trajectory equation. In turn, a numerical scheme is proposed using a convex splitting technique, with the unique solvability (on a convex set) and the energy decay property (in time) justified at a theoretical level. Numerical examples are presented for cases of pure drift and drift with semi-selection. The remarkable advantage of this method is its ability to capture the Dirac delta singularity close to machine precision over any equidistant grid.
  3. Unmanned Aerial Vehicle (UAV) Networks have recently attracted great attention as being able to provide convenient and fast wireless connections. One central question is how to allocate a limited number of UAVs to provide wireless services across a large number of regions, where each region has dynamic arriving flows and flows depart from the system once they receive the desired amount of service (referred to as the flow-level dynamic model). In this paper, we propose a MaxWeight-type scheduling algorithm that takes sharp flow-level dynamics into account and efficiently redirects UAVs across a large number of regions. However, in our considered model, each flow experiences an independent fading channel and will immediately leave the system once it completes its service, which makes its evolution quite different from the traditional queueing model for wireless networks. This poses significant challenges in our performance analysis. Nevertheless, we incorporate sharp flow-level dynamics into the Lyapunov-drift analysis framework, and successfully establish both throughput and heavy-traffic optimality of the proposed algorithm. Extensive simulations are performed to validate the effectiveness of our proposed algorithm.
  4. DUNE is an international experiment dedicated to addressing some of the questions at the forefront of particle physics and astrophysics, including the mystifying preponderance of matter over antimatter in the early universe. The dual-site experiment will employ an intense neutrino beam focused on a near and a far detector as it aims to determine the neutrino mass hierarchy and to make high-precision measurements of the PMNS matrix parameters, including the CP-violating phase. It will also stand ready to observe supernova neutrino bursts, and seeks to observe nucleon decay as a signature of a grand unified theory underlying the standard model. The DUNE far detector implements liquid argon time-projection chamber (LArTPC) technology, and combines the many tens-of-kiloton fiducial mass necessary for rare event searches with the sub-centimeter spatial resolution required to image those events with high precision. The addition of a photon detection system enhances physics capabilities for all DUNE physics drivers and opens prospects for further physics explorations. Given its size, the far detector will be implemented as a set of modules, with LArTPC designs that differ from one another as newer technologies arise. In the vertical drift LArTPC design, a horizontal cathode bisects the detector, creating two stacked drift volumes in which ionization charges drift towards anodes at either the top or bottom. The anodes are composed of perforated PCB layers with conductive strips, enabling reconstruction in 3D. Light-trap-style photon detection modules are placed both on the cryostat's side walls and on the central cathode where they are optically powered. This Technical Design Report describes in detail the technical implementations of each subsystem of this LArTPC that, together with the other far detector modules and the near detector, will enable DUNE to achieve its physics goals.
  5. Olanoff, D.; Johnson, K.; Spitzer, S. M. (Ed.)
    Mathematics education needs measures that can be used to research and/or evaluate the impact of professional development for constructs that are broadly relevant to the field. To address this need we developed the Priorities for Mathematics Instruction (PMI) survey consisting of two scales focused on the constructs of Explicit Attention to Concepts (EAC) and Student Opportunities to Struggle (SOS) – which have been linked to increased student understanding and achievement. We identified the most critical assumptions that underlie the proposed interpretation and use of the scale scores and then examined the related validity evidence. We found the evidence for each assumption supports the proposed interpretation and use of the scale scores. 