Title: Syntax and Semantics Meet in the “Middle”: Probing the Syntax-Semantics Interface of LMs Through Agentivity
Award ID(s):
2211951
PAR ID:
10440568
Author(s) / Creator(s):
Date Published:
Journal Name:
Proceedings of the 12th Joint Conference on Lexical and Computational Semantics (*SEM 2023)
Page Range / eLocation ID:
149 to 164
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We present a grammar for a robust class of data types that includes algebraic data types (ADTs), (truly) nested types, generalized algebraic data types (GADTs), and their higher-kinded analogues. All of the data types our grammar defines, as well as their associated type constructors, are shown to have fully functorial initial algebra semantics in locally presentable categories. Since local presentability is a modest hypothesis, needed for such semantics for even the simplest ADTs, our semantic framework is actually quite conservative. Our results thus provide evidence that if a category supports fully functorial initial algebra semantics for standard ADTs, then it does so for advanced higher-kinded data types as well. To give our semantics we introduce a new type former called Lan that captures on the syntactic level the categorical notion of a left Kan extension. We show how left Kan extensions capture propagation of a data type’s syntactic generators across the entire universe of types, via a certain completion procedure, so that the type constructor associated with a data type becomes a bona fide functor with a canonical action on morphisms. A by-product of our semantics is a precise measure of the semantic complexity of data types, given by the least cardinal λ for which the functor underlying a data type is λ-accessible. The proof of our main result allows this cardinal to be read off from a data type definition without much effort. It also gives a sufficient condition for a data type to have semantic complexity ω, thus characterizing those data types whose data elements are effectively enumerable. (For reference, the textbook definition of a left Kan extension is recalled in a note after this list.)
  2. Inspired by humans’ exceptional ability to master arithmetic and generalize to new problems, we present a new dataset, Handwritten arithmetic with INTegers (HINT), to examine machines’ capability of learning generalizable concepts at three levels: perception, syntax, and semantics. In HINT, machines are tasked with learning how concepts are perceived from raw signals such as images (i.e., perception), how multiple concepts are structurally combined to form a valid expression (i.e., syntax), and how concepts are realized to afford various reasoning tasks (i.e., semantics), all in a weakly supervised manner. Focusing on systematic generalization, we carefully design a five-fold test set to evaluate both the interpolation and the extrapolation of learned concepts w.r.t. the three levels. Further, we design a few-shot learning split to determine whether or not models can rapidly learn new concepts and generalize them to more complex scenarios. To comprehend existing models’ limitations, we undertake extensive experiments with various sequence-to-sequence models, including RNNs, Transformers, and GPT-3 (with chain-of-thought prompting). The results indicate that current models struggle to extrapolate to long-range syntactic dependencies and semantics. Models exhibit a considerable gap relative to human-level generalization when evaluated with new concepts in a few-shot setting. Moreover, we discover that it is infeasible to solve HINT by merely scaling up the dataset and the model size; this strategy contributes little to the extrapolation of syntax and semantics. Finally, in zero-shot GPT-3 experiments, chain-of-thought prompting exhibits impressive results and significantly boosts test accuracy. We believe the HINT dataset and the experimental findings are of great interest to the learning community working on systematic generalization. (A toy parse-and-evaluate illustration of the syntax and semantics levels appears after this list.)
  3. As the spread of information has received a compelling boost due to the pervasive use of social media, so has the spread of misinformation. The sheer volume of data has rendered traditional methods of expert-driven manual fact-checking largely infeasible. As a result, computational linguistics and data-driven algorithms have been explored in recent years. Despite this progress, identifying and prioritizing what needs to be checked has received little attention. Given that expert-driven manual intervention is likely to remain an important component of fact-checking, especially in specific domains (e.g., politics, environmental science), this identification and prioritization is critical. A successful algorithmic ranking of “check-worthy” claims can help an expert-in-the-loop fact-checking system, thereby reducing the expert’s workload while still tackling the most salient bits of misinformation. In this work, we explore how linguistic syntax, semantics, and the contextual meaning of words play a role in determining the check-worthiness of claims. Our preliminary experiments used explicit stylometric features and simple word embeddings on the English-language dataset in the Check-worthiness task of the CLEF-2018 Fact-Checking Lab, where our primary solution outperformed the other systems in terms of mean average precision, R-precision, reciprocal rank, and precision at k for multiple values of k. Here, we present an extension of this approach with more sophisticated word embeddings and report further improvements on this task. (A short sketch of these ranking metrics appears after this list.)
  4. Building on recent work in subregular syntax, we argue that syntactic constraints are best understood as operating not over trees but rather over strings that track structural relations such as dominance and c-command. Even constraints that seem intrinsically tied to trees (e.g., constraints on tree tiers) can be reduced to such strings. We define serial constraints as an abstraction that decomposes string constraints into a context function (which associates nodes with strings) and a requirement function (which enforces constraints on these strings). We provide a general procedure for implementing serial constraints as deterministic tree automata. The construction reveals that the many types of constraints found in subregular syntax are variants of the same computational template. Our findings open up a string-based perspective on syntactic constraints and provide a new, very general approach to the automata-theoretic study of subregular complexity. (A toy rendering of the context/requirement decomposition appears after this list.)
  5. Deng, Dun; Liu, Fenrong; Liu, Mingming; Westerstahl, Dag (Ed.)
    Extending previous work on monotonicity in morphology and morphosyntax, I argue that some of the most important constraints in syntax can be analyzed in terms of monotonic functions that map specific kinds of syntactic representations to fixed, universal hierarchies. I cover the Ban Against Improper Movement, the Williams Cycle, the Ban Against Improper Case, and omnivorous number. The general method of analysis is remarkably similar across all phenomena, which suggests that monotonicity provides a unified perspective on a wide range of phenomena in syntax as well as morphology and morphosyntax. I also argue that syntax, thanks to extensive work in computational syntax, provides a unique opportunity to probe whether the prevalence of monotonicity principles in natural language is due to computational complexity considerations. Not only is it possible, then, to extend the purview of monotonicity from semantics to syntax; doing so might also yield new insights into monotonicity that would not be obtainable otherwise. (A minimal monotonicity check in this spirit appears after this list.)
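
Item 1 above introduces a type former Lan that captures left Kan extensions at the syntactic level. For orientation only, the note below recalls the standard categorical definition of a left Kan extension (the pointwise coend formula and the defining adjunction); it is a textbook fact, not the paper's syntactic rules for Lan.

    % Left Kan extension of F : C -> E along K : C -> D (textbook definition).
    % Lan_K is left adjoint to precomposition with K; when the extension exists
    % pointwise it is given by the coend below, where \cdot is the copower,
    % i.e. a D(Kc, d)-indexed coproduct of copies of Fc (equivalently, a colimit
    % over the comma category K \downarrow d).
    \[
      (\mathrm{Lan}_K F)(d) \;\cong\; \int^{c \in \mathcal{C}} \mathcal{D}(Kc,\, d) \cdot Fc
    \]
    \[
      \mathcal{E}^{\mathcal{D}}\bigl(\mathrm{Lan}_K F,\; G\bigr)
        \;\cong\; \mathcal{E}^{\mathcal{C}}\bigl(F,\; G \circ K\bigr)
    \]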
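
Item 2 above separates concept learning into perception, syntax, and semantics. The toy Python sketch below assumes the perception step has already mapped handwritten symbols to tokens, and only illustrates the remaining two levels, parsing a token sequence (syntax) versus evaluating it (semantics); it is not the HINT pipeline or any of the evaluated models.

    # Toy illustration of the syntax and semantics levels for integer arithmetic,
    # assuming perception has already produced a token sequence.

    def parse(tokens):
        """Syntax: combine tokens into a nested expression, with '*' and '/'
        binding tighter than '+' and '-' (single-digit operands only)."""
        def parse_term(i):
            node, i = tokens[i], i + 1
            while i < len(tokens) and tokens[i] in "*/":
                op, rhs, i = tokens[i], tokens[i + 1], i + 2
                node = (op, node, rhs)
            return node, i
        expr, i = parse_term(0)
        while i < len(tokens) and tokens[i] in "+-":
            op = tokens[i]
            rhs, i = parse_term(i + 1)
            expr = (op, expr, rhs)
        return expr

    def evaluate(node):
        """Semantics: realize the parsed expression as an integer value."""
        if isinstance(node, str):
            return int(node)
        op, left, right = node
        ops = {"+": lambda a, b: a + b, "-": lambda a, b: a - b,
               "*": lambda a, b: a * b, "/": lambda a, b: a // b}
        return ops[op](evaluate(left), evaluate(right))

    tokens = ["3", "+", "4", "*", "2"]   # hypothetical recognized symbols
    tree = parse(tokens)                 # syntax:    ('+', '3', ('*', '4', '2'))
    print(evaluate(tree))                # semantics: 11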
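
Item 3 above reports results in terms of mean average precision, reciprocal rank, and precision at k. The Python sketch below merely spells out those standard ranking metrics on a hypothetical ranked list of claims (1 = annotated check-worthy); the labels are invented for illustration, and the stylometric-feature and word-embedding pipeline of the actual systems is not reproduced here.

    # Standard ranking metrics over a list of claims sorted by a model's
    # check-worthiness score; labels are hypothetical.

    def precision_at_k(labels, k):
        """Fraction of check-worthy claims among the top k of the ranking."""
        return sum(labels[:k]) / k

    def average_precision(labels):
        """Average of precision@k over the ranks k at which a check-worthy
        claim appears in the ranking."""
        hits, total = 0, 0.0
        for rank, label in enumerate(labels, start=1):
            if label:
                hits += 1
                total += hits / rank
        return total / hits if hits else 0.0

    def reciprocal_rank(labels):
        """1 / rank of the first check-worthy claim in the ranking."""
        for rank, label in enumerate(labels, start=1):
            if label:
                return 1.0 / rank
        return 0.0

    ranked_labels = [1, 0, 1, 0, 0, 1]        # hypothetical ranked annotations
    print(precision_at_k(ranked_labels, 3))   # 0.666...
    print(average_precision(ranked_labels))   # (1/1 + 2/3 + 3/6) / 3 ≈ 0.722
    print(reciprocal_rank(ranked_labels))     # 1.0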
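
Item 4 above decomposes string constraints into a context function (associating each node with a string) and a requirement function (enforcing a condition on that string). The toy Python sketch below renders that split over a nested-tuple tree encoding; the dominance-string context and the "an 'npi' node must be dominated by a 'neg' node" requirement are invented for illustration and are not the paper's construction or its tree-automaton implementation.

    # Toy tree encoding: (label, [children]).  The context function maps each
    # node to the string of its dominating nodes' labels (a tuple over the
    # label alphabet); the requirement function checks that string.

    def nodes_with_ancestors(tree, ancestors=()):
        label, children = tree
        yield label, ancestors
        for child in children:
            yield from nodes_with_ancestors(child, ancestors + (label,))

    def context(label, ancestors):
        return ancestors                 # context: the dominance string

    def requirement(label, ctx):
        if label == "npi":               # toy constraint: an "npi" node needs
            return "neg" in ctx          # a dominating "neg" node
        return True

    def satisfies(tree):
        return all(requirement(lbl, context(lbl, anc))
                   for lbl, anc in nodes_with_ancestors(tree))

    licensed   = ("tp", [("neg", [("vp", [("npi", [])])])])
    unlicensed = ("tp", [("vp", [("npi", [])])])
    print(satisfies(licensed), satisfies(unlicensed))   # True False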
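
Item 5 above analyzes constraints such as the Ban Against Improper Movement as monotonic maps into fixed, universal hierarchies. The Python sketch below only checks order preservation for a toy two-point hierarchy (A below A-bar) over invented movement paths; the hierarchy, the paths, and the encoding are illustrative assumptions rather than the paper's actual analyses.

    # A mapping from successive landing sites of a movement path (ordered
    # bottom-up) into a fixed hierarchy is monotone iff the assigned ranks
    # never decrease along the path.  Hierarchy and paths are hypothetical.

    HIERARCHY = {"A": 0, "A-bar": 1}     # toy fixed, universal ranking

    def is_monotone(path):
        """True iff successive landing sites map to non-decreasing ranks."""
        ranks = [HIERARCHY[site] for site in path]
        return all(earlier <= later for earlier, later in zip(ranks, ranks[1:]))

    proper   = ["A", "A", "A-bar"]       # A-movement feeding A-bar movement
    improper = ["A-bar", "A"]            # A-bar position feeding an A position
    print(is_monotone(proper), is_monotone(improper))   # True False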