Title: Failures and successes to learn a core conceptual distinction from the statistics of language
Generic statements like “tigers are striped” and “cars have radios” communicate information that is, in general, true. However, while the first statement is true *in principle*, the second is true only statistically. People are exquisitely sensitive to this principled-vs-statistical distinction. It has been argued that this ability to distinguish between something being true by virtue of category membership and something being true because of mere statistical regularity is a general property of people’s conceptual machinery and cannot itself be learned. We investigate whether the distinction between principled and statistical properties can be learned from language itself. If so, it raises the possibility that language experience can bootstrap core conceptual distinctions and that it is possible to learn sophisticated causal models directly from language. We find that language models are all sensitive to statistical prevalence but struggle to represent the principled-vs-statistical distinction once prevalence is controlled for; GPT-4 is the first to succeed.
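A minimal Python sketch of the kind of probe described above, assuming the Hugging Face transformers library; GPT-2 is a stand-in for the models actually tested, and the sentence pair is illustrative rather than a prevalence-controlled item from the study:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def sentence_logprob(sentence):
    """Total log-probability the model assigns to a sentence."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        # With labels == inputs, .loss is the mean token cross-entropy.
        loss = model(ids, labels=ids).loss
    return -loss.item() * (ids.shape[1] - 1)  # mean -> total over predicted tokens

# Principled property (true of the kind) vs. statistical property
# (true only by prevalence).
for s in ("tigers are striped", "cars have radios"):
    print(s, sentence_logprob(s))

The study's actual test additionally controls for prevalence, asking whether models treat the two kinds of truths differently even when the properties are equally common; raw sentence probability, as above, does not make that distinction by itself.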
Award ID(s):
2020969
PAR ID:
10547759
Author(s) / Creator(s):
; ;
Editor(s):
Nölle, J; Raviv, L; Graham, E; Hartmann, S; Jadoul, Y; Josserand, M; Matzinger, T; Mudd, K; Pleyer, M; Slonimska, A; Wacewicz, S; Watson, S
Publisher / Repository:
The Evolution of Language: Proceedings of the 15th International Conference (Evolang XV)
Date Published:
Format(s):
Medium: X
Location:
Madison, WI
Sponsoring Org:
National Science Foundation
More Like this
  1. Satirical news is regularly shared on modern social media because it is entertaining, with smartly embedded humor. However, it can be harmful to society because, due to its deceptive character, it can be mistaken for factual news. We found that in satirical news, the lexical and pragmatic attributes of the context are the key factors in amusing readers. In this work, we propose a method that differentiates satirical news from true news. It takes advantage of satirical writing evidence by leveraging the difference between the prediction losses of two language models, one trained on true news and the other on satirical news, when given a new news article. We compute several statistical metrics of language-model prediction loss as features, which are then used for downstream classification. The proposed method is computationally effective because the language models capture the differences in language usage between satirical and traditional news documents and are sensitive when applied to documents outside their domains.
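    A hedged Python sketch of the loss-difference idea (not the authors' released code): score an article under two causal language models and turn statistics of the per-token losses into classifier features. The GPT-2 checkpoints are placeholders for the true-news and satire LMs.

    import numpy as np
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    lm_true = AutoModelForCausalLM.from_pretrained("gpt2")    # stand-in: LM trained on true news
    lm_satire = AutoModelForCausalLM.from_pretrained("gpt2")  # stand-in: LM trained on satire

    def token_losses(model, text):
        """Per-token cross-entropy of `text` under `model`."""
        ids = tok(text, return_tensors="pt", truncation=True).input_ids
        with torch.no_grad():
            logits = model(ids).logits
        # Shift so that each position predicts the following token.
        return torch.nn.functional.cross_entropy(
            logits[0, :-1], ids[0, 1:], reduction="none"
        ).numpy()

    def loss_features(text):
        """Statistical metrics of the two models' losses, as in the abstract."""
        a, b = token_losses(lm_true, text), token_losses(lm_satire, text)
        diff = a - b  # satire should surprise the true-news LM more
        return np.array([a.mean(), b.mean(), diff.mean(), diff.std(), diff.max()])

    # Downstream classification, assuming labeled articles `texts`, `labels`:
    #   X = np.stack([loss_features(t) for t in texts])
    #   clf = sklearn.linear_model.LogisticRegression().fit(X, labels)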
  2. Both historically and in terms of practiced academic organization, one would anticipate a flourishing synergistic interface between statistics and operations research in general, and between spatial statistics/econometrics and spatial optimization in particular. Unfortunately, for the most part, this expectation is false. The purpose of this paper is to address this missing link by focusing on the beneficial contributions of spatial statistics to spatial optimization, via spatial autocorrelation (i.e., dis/similar attribute values tend to cluster together on a map), in order to encourage considerably more future collaboration and interaction between contributors to these two parent bodies of knowledge. The key basic statistical concept in this pursuit is the median in its bivariate form, with special reference to the global spatial median and to sets of regional spatial medians. One-dimensional examples illustrate situations that the narrative then extends to two-dimensional illustrations, which, in turn, connect these treatments to the centrography theme of spatial statistics. Because of computational time constraints (reported results include some from timing experiments), the summarized analysis restricts attention to problems involving one global and two or three regional spatial medians. The fundamental spatial-statistical conceptual tool employed here is spatial autocorrelation: geographically informed sampling designs (which acknowledge a non-random mixture of geographic demand weight values that manifests itself as local, homogeneous spatial clusters of these values) can help spatial optimization techniques determine the spatial optima, at least for location-allocation problems. A valuable discovery of this study is that existing but ignored spatial autocorrelation latent in georeferenced demand point weights undermines spatial optimization algorithms. All in all, this paper should help dissipate the existing isolation between statistics and operations research and hopefully inspire substantially more collaborative work by their professionals in the future.
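    As an illustrative Python sketch (not from the paper), the weighted bivariate spatial median at the heart of this approach can be computed with the classical Weiszfeld algorithm; the two clustered groups of demand weights below mimic the spatially autocorrelated values the abstract describes:

    import numpy as np

    def weiszfeld(points, weights, tol=1e-8, max_iter=1000):
        """Weighted geometric (spatial) median of 2-D points."""
        median = np.average(points, axis=0, weights=weights)  # start at the weighted centroid
        for _ in range(max_iter):
            dists = np.maximum(np.linalg.norm(points - median, axis=1), 1e-12)
            w = weights / dists
            new = (w[:, None] * points).sum(axis=0) / w.sum()
            if np.linalg.norm(new - median) < tol:
                return new
            median = new
        return median

    # Spatially autocorrelated demand: two homogeneous clusters of weights.
    rng = np.random.default_rng(0)
    pts = np.vstack([rng.normal([0.0, 0.0], 0.5, (50, 2)),
                     rng.normal([5.0, 5.0], 0.5, (50, 2))])
    wts = np.concatenate([np.full(50, 3.0), np.full(50, 1.0)])
    print("global spatial median:", weiszfeld(pts, wts))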
  3. Unlabeled data is a key component of modern machine learning. In general, the role of unlabeled data is to impose a form of smoothness, usually derived from the similarity information encoded in a base kernel, such as the ε-neighbor kernel or the adjacency matrix of a graph. This work revisits the classical idea of spectrally transformed kernel regression (STKR) and provides a new class of general and scalable STKR estimators able to leverage unlabeled data. Intuitively, via spectral transformation, STKR exploits the data distribution, about which unlabeled data can provide additional information. First, we show that STKR is a principled and general approach by characterizing a universal type of “target smoothness” and proving that any sufficiently smooth function can be learned by STKR. Second, we provide scalable STKR implementations for the inductive setting and a general transformation function, whereas prior work is mostly limited to the transductive setting. Third, we derive statistical guarantees for two scenarios: STKR with a known polynomial transformation, and STKR with kernel PCA when the transformation is unknown. Overall, we believe that this work helps deepen our understanding of how to work with unlabeled data, and that its generality will help inspire new methods.
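    A minimal Python sketch of STKR with a known polynomial spectral transformation, shown in the transductive setting for brevity (the paper's scalable inductive implementations are not attempted here); the RBF base kernel and the polynomial coefficients are illustrative choices:

    import numpy as np

    def rbf_kernel(A, B, gamma=1.0):
        """Base similarity kernel between row sets A and B."""
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)

    def stkr_transductive(X_lab, y_lab, X_unl, coefs=(0.0, 1.0, 0.5), lam=1e-3):
        """Kernel ridge regression with a polynomially transformed spectrum;
        unlabeled points enter through the eigendecomposition of the full kernel."""
        X = np.vstack([X_lab, X_unl])
        s, U = np.linalg.eigh(rbf_kernel(X, X))           # base kernel spectrum
        s_t = sum(c * s**k for k, c in enumerate(coefs))  # p(s) = c0 + c1*s + c2*s^2
        Kt = (U * s_t) @ U.T                              # spectrally transformed kernel
        n = len(X_lab)
        alpha = np.linalg.solve(Kt[:n, :n] + lam * np.eye(n), y_lab)
        return Kt[n:, :n] @ alpha                         # predictions on unlabeled points

    rng = np.random.default_rng(0)
    X_lab = rng.uniform(-1, 1, (20, 1))
    X_unl = rng.uniform(-1, 1, (100, 1))
    print(stkr_transductive(X_lab, np.sin(3 * X_lab[:, 0]), X_unl)[:5])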
  4. We discuss the challenges of principled statistical inference in modern data science. Conditionality principles are argued to be key to achieving valid statistical inference, particularly when inference is performed after selecting a model from the sample data itself.
  5. Much of the progress in contemporary NLP has come from learning representations, such as masked language model (MLM) contextual embeddings, that turn challenging problems into simple classification tasks. But how do we quantify and explain this effect? We adapt general tools from computational learning theory to fit the specific characteristics of text datasets and present a method to evaluate the compatibility between representations and tasks. Even though many tasks can be easily solved with simple bag-of-words (BOW) representations, BOW does poorly on hard natural language inference tasks. For one such task we find that BOW cannot distinguish between real and randomized labelings, while pre-trained MLM representations show 72x greater distinction between real and random labelings than BOW. This method provides a calibrated, quantitative measure of the difficulty of a classification-based NLP task, enabling comparisons between representations without requiring empirical evaluations that may be sensitive to initializations and hyperparameters. The method provides a fresh perspective on the patterns in a dataset and the alignment of those patterns with specific labels. 
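    A hedged Python sketch of the real-versus-randomized-labels probe (an illustration of the idea, not the authors' learning-theoretic method); the corpus is a toy stand-in for a real dataset:

    import numpy as np
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression

    def label_gap(X, y, seed=0):
        """Fit accuracy on real labels minus fit accuracy on shuffled labels:
        a representation compatible with the task separates the two."""
        y_rand = np.random.default_rng(seed).permutation(y)
        real = LogisticRegression(max_iter=1000).fit(X, y).score(X, y)
        rand = LogisticRegression(max_iter=1000).fit(X, y_rand).score(X, y_rand)
        return real - rand

    texts = ["great movie", "terrible film", "loved it", "hated it"] * 25
    labels = np.array([1, 0, 1, 0] * 25)
    X_bow = CountVectorizer().fit_transform(texts)  # bag-of-words features
    print("BOW real-vs-random gap:", label_gap(X_bow, labels))
    # The same probe on pre-trained MLM embeddings would swap in those features.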