skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: A Logic for Expressing Log-Precision Transformers
One way to interpret the reasoning power of transformer-based language models is to describe the types of logical rules they can resolve over some input text. Recently, Chiang et al. (2023) showed that finite-precision transformers can be equivalently expressed in a generalization of first-order logic. However, finite-precision transformers are a weak transformer variant because, as we show, a single head can only attend to a constant number of tokens and, in particular, cannot represent uniform attention. Since attending broadly is a core capability for transformers, we ask whether a minimally more expressive model that can attend universally can also be characterized in logic. To this end, we analyze transformers whose forward pass is computed in logn precision on contexts of length n. We prove that any log-precision transformer can be equivalently expressed as a first-order logic sentence that, in addition to standard universal and existential quantifiers, may also contain majority-vote quantifiers. This is the tightest known upper bound and first logical characterization of log-precision transformers.  more » « less
Award ID(s):
1922658
PAR ID:
10535866
Author(s) / Creator(s):
;
Publisher / Repository:
NeurIPS 2023
Date Published:
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Antonis Achilleos; Dario Della Monica (Ed.)
    In this paper, we explore the descriptive complexity theory of finite groups by examining the power of the second Ehrenfeucht-Fraisse bijective pebble game in Hella's (Ann. Pure Appl. Log., 1989) hierarchy. This is a Spoiler-Duplicator game in which Spoiler can place up to two pebbles each round. While it trivially solves graph isomorphism, it may be nontrivial for finite groups, and other ternary relational structures. We first provide a novel generalization of Weisfeiler-Leman (WL) coloring, which we call 2-ary WL. We then show that the 2-ary WL is equivalent to the second Ehrenfeucht-Fraisse bijective pebble game in Hella's hierarchy. Our main result is that, in the pebble game characterization, only O(1) pebbles and O(1) rounds are sufficient to identify all groups without Abelian normal subgroups (a class of groups for which isomorphism testing is known to be in P; Babai, Codenotti, & Qiao, ICALP 2012). In particular, we show that within the first few rounds, Spoiler can force Duplicator to select an isomorphism between two such groups at each subsequent round. By Hella's results (ibid.), this is equivalent to saying that these groups are identified by formulas in first-order logic with generalized 2-ary quantifiers, using only O(1) variables and O(1) quantifier depth. 
    more » « less
  2. The cofinality quantifiers were introduced by Shelah as an example of a compact logic stronger than first-order logic. We show that the classes of models axiomatized by these quantifiers can be turned into an Abstract Elementary Class by restricting to positive and deliberate uses. Rather than using an ad hoc proof, we give a general framework of abstract Skolemizations. This method gives a uniform proof that a wide range of classes are Abstract Elementary Classes. 
    more » « less
  3. Recent work has shown that Transformers trained from scratch can successfully solve various arithmetic and algorithmic tasks, such as adding numbers and computing parity. While these Transformers generalize well on unseen inputs of the same length, they struggle with length generalization, i.e., handling inputs of unseen lengths. In this work, we demonstrate that looped Transformers with an adaptive number of steps significantly improve length generalization. We focus on tasks with a known iterative solution, involving multiple iterations of a RASP-L operation—a length-generalizable operation that can be expressed by a finite-sized Transformer. We train looped Transformers using our proposed learning algorithm and observe that they learn highly length-generalizable solutions for various tasks. 
    more » « less
  4. This paper addresses the problem of creating abstract transformers automatically. The method we present automates the construction of static analyzers in a fashion similar to the wayyaccautomates the construction of parsers. Our method treats the problem as a program-synthesis problem. The user provides specifications of (i) the concrete semantics of a given operationop, (ii) the abstract domainAto be used by the analyzer, and (iii) the semantics of a domain-specific languageLin which the abstract transformer is to be expressed. As output, our method creates an abstract transformer foropin abstract domainA, expressed inL(an “L-transformer foropoverA”). Moreover, the abstract transformer obtained is a most-preciseL-transformer foropoverA; that is, there is no otherL-transformer foropoverAthat is strictly more precise. We implemented our method in a tool called AMURTH. We used AMURTH to create sets of replacement abstract transformers for those used in two existing analyzers, and obtained essentially identical performance. However, when we compared the existing transformers with the transformers obtained using AMURTH, we discovered that four of the existing transformers were unsound, which demonstrates the risk of using manually created transformers. 
    more » « less
  5. We show that in an ultraproduct of finite fields, the mod-n nonstandard size of definable sets varies definably in families. Moreover, if K is any pseudofinite field, then one can assign "nonstandard sizes mod n" to definable sets in K. As n varies, these nonstandard sizes assemble into a definable strong Euler characteristic on K, taking values in the profinite completion hat(Z) of the integers. The strong Euler characteristic is not canonical, but depends on the choice of a nonstandard Frobenius. When Abs(K) is finite, the Euler characteristic has some funny properties for two choices of the nonstandard Frobenius. Additionally, we show that the theory of finite fields remains decidable when first-order logic is expanded with parity quantifiers. However, the proof depends on a computational algebraic geometry statement whose proof is deferred to a later paper. 
    more » « less