Abstract While research in heritage language phonology has found that transfer from the majority language can lead to divergent attainment in adult heritage language grammars, the extent to which language transfer develops during a heritage speaker's lifespan is understudied. To explore such cross-linguistic transfer, I examine the rate of glottalization in consonant-to-vowel sequences at word junctures produced by child and adult Spanish heritage speakers (HSs) in both languages. My results show that, in Spanish, child HSs produce greater rates of vowel-initial glottal phonation than their age-matched monolingually-raised Spanish counterparts, suggesting that the Spanish child HSs’ grammars are more permeable to transfer than those of the adult HSs. In English, child and adult HSs show similarly low rates of glottal phonation when compared to their age-matched monolingually-raised English counterparts. The findings for English can be explained by an account of transfer at either the individual level or the community level.
Depth-Bounded Statistical PCFG Induction as a Model of Human Grammar Acquisition
Abstract This article describes a simple PCFG induction model with a fixed category domain that predicts a large majority of attested constituent boundaries, and predicts labels consistent with nearly half of attested constituent labels on a standard evaluation data set of child-directed speech. The article then explores the idea that the difference between simple grammars exhibited by child learners and fully recursive grammars exhibited by adult learners may be an effect of increasing working memory capacity, where the shallow grammars are constrained images of the recursive grammars. An implementation of these memory bounds as limits on center embedding in a depth-specific transform of a recursive grammar yields a significant improvement over an equivalent but unbounded baseline, suggesting that this arrangement may indeed confer a learning advantage.
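As a rough illustration of what a bound on center embedding constrains, the sketch below measures an embedding depth for a binary bracketing. The counting convention is an assumption made for this example and is not taken from the article: depth starts at 1 and grows by one whenever a branching left child hangs off a right child. The names `embedding_depth` and `within_bound` are hypothetical.

```python
def embedding_depth(tree, is_right_child=False, depth=1):
    """Return the maximum embedding depth of a binary tree.

    A tree is either a word (str) or a pair (left, right).
    Assumed convention: a branching left child of a right child
    opens a new depth level; every other node inherits its parent's depth.
    """
    if isinstance(tree, str):
        return depth
    left, right = tree
    left_is_branching = not isinstance(left, str)
    left_depth = depth + 1 if (is_right_child and left_is_branching) else depth
    return max(
        embedding_depth(left, False, left_depth),
        embedding_depth(right, True, depth),
    )


def within_bound(tree, max_depth):
    """Check whether a tree respects a fixed depth bound."""
    return embedding_depth(tree) <= max_depth


right_branching = ("a", ("b", ("c", "d")))
left_branching = ((("a", "b"), "c"), "d")
center_embedded = ("a", (("b", "c"), "d"))
```

Under this convention, purely left-branching and purely right-branching trees stay at depth 1, while the center-embedded bracketing reaches depth 2, so a learner bounded at depth 1 would exclude only the last of the three.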
- Award ID(s):
- 1816891
- PAR ID:
- 10285336
- Date Published:
- Journal Name:
- Computational Linguistics
- Volume:
- 47
- Issue:
- 1
- ISSN:
- 0891-2017
- Page Range / eLocation ID:
- 181 to 216
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
Aldrich, Jonathan; Salvaneschi, Guido (Ed.) While programmers know that memory representation of data structures can have significant effects on performance, compiler support to optimize the layout of those structures is an under-explored field. Prior work has optimized the layout of individual, non-recursive structures without considering how collections of those objects in linked or recursive data structures are laid out. This work introduces Marmoset, a compiler that optimizes the layouts of algebraic datatypes, with a special focus on producing highly optimized, packed data layouts where recursive structures can be traversed with minimal pointer chasing. Marmoset performs an analysis of how a recursive ADT is used across functions to choose a global layout that promotes simple, strided access for that ADT in memory. It does so by building and solving a constraint system to minimize an abstract cost model, yielding a predicted efficient layout for the ADT. Marmoset then builds on top of Gibbon, a prior compiler for packed, mostly-serial representations, to synthesize optimized ADTs. We show experimentally that Marmoset is able to choose optimal layouts across a series of microbenchmarks and case studies, outperforming both Gibbon’s baseline approach and MLton, a Standard ML compiler that uses traditional pointer-heavy representations.
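To make the idea of a packed, pointer-free layout concrete, here is a minimal sketch that serializes a binary tree in preorder into one flat buffer and then sums its leaves with a single left-to-right scan. It is an illustration only, not Marmoset's or Gibbon's actual representation; the tag values and the names `pack` and `sum_leaves` are assumptions made for this example.

```python
LEAF, NODE = 0, 1  # hypothetical tags marking what follows in the buffer


def pack(tree, out=None):
    """Serialize a tree (an int leaf or a (left, right) pair) in preorder."""
    if out is None:
        out = []
    if isinstance(tree, int):
        out.extend((LEAF, tree))  # tag followed by the leaf payload
    else:
        left, right = tree
        out.append(NODE)          # tag only; children follow inline
        pack(left, out)
        pack(right, out)
    return out


def sum_leaves(buf):
    """Sum all leaf payloads by scanning the buffer once, with no pointers."""
    total, i = 0, 0
    while i < len(buf):
        if buf[i] == LEAF:
            total += buf[i + 1]
            i += 2                # skip tag and payload
        else:
            i += 1                # skip the NODE tag
    return total


packed = pack((1, (2, 3)))
```

Because the traversal order matches the serialization order, the scan touches memory strictly sequentially. A traversal that visited the right child first would not line up with this buffer, which is the kind of mismatch a layout-choosing compiler tries to avoid.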
-
There has been growing interest in developing ubiquitous technologies to analyze adult-child speech in naturalistic settings such as free play in order to support children's social and academic development, language acquisition, and parent-child interactions. However, these technologies often rely on off-the-shelf speech processing tools that have not been evaluated on child speech or child-directed adult speech, whose unique characteristics might result in significant performance gaps when using models trained on adult speech. This work introduces the Playlogue dataset containing over 33 hours of long-form, naturalistic, play-based adult-child conversations from three different corpora of preschool-aged children. Playlogue enables researchers to train and evaluate speaker diarization and automatic speech recognition models on child-centered speech. We demonstrate the lack of generalizability of existing state-of-the-art models when evaluated on Playlogue, and show how fine-tuning models on adult-child speech mitigates the performance gap to some extent but still leaves considerable room for improvement. We further annotate over 5 hours of the Playlogue dataset with 8668 validated adult and child speech act labels, which can be used to train and evaluate models to provide clinically relevant feedback on parent-child interactions. We investigate the performance of state-of-the-art language models at automatically predicting these speech act labels, achieving significant accuracy with simple chain-of-thought prompting or minimal fine-tuning. We use in-home pilot data to validate the generalizability of models trained on Playlogue, demonstrating its utility in improving speech and language technologies for child-centered conversations. The Playlogue dataset is available for download at https://huggingface.co/datasets/playlogue/playlogue-v1.
-
We present a novel framework to automatically derive highly efficient parametric multi-way recursive divide-&-conquer algorithms for a class of dynamic programming (DP) problems. Standard two-way or any fixed R-way recursive divide-&-conquer algorithms may not fully exploit many-core processors. To run efficiently on a given machine, the value of R may need to be different for every level of recursion based on the number of processors available and the sizes of memory/caches at different levels of the memory hierarchy. The set of R values that work well on a given machine may not work efficiently on another machine with a different set of machine parameters. To improve portability and efficiency, Multi-way Autogen generates parametric multi-way recursive divide-&-conquer algorithms where the value of R can be changed on the fly for every level of recursion. We present experimental results demonstrating the performance and scalability of the parallel programs produced by our framework.
-
Abstract Distinguishing between continuous and first-order phase transitions is a major challenge in random discrete systems. We study the topic for events with recursive structure on Galton–Watson trees. For example, let $\mathcal{T}_1$ be the event that a Galton–Watson tree is infinite and let $\mathcal{T}_2$ be the event that it contains an infinite binary tree starting from its root. These events satisfy similar recursive properties: $\mathcal{T}_1$ holds if and only if $\mathcal{T}_1$ holds for at least one of the trees initiated by children of the root, and $\mathcal{T}_2$ holds if and only if $\mathcal{T}_2$ holds for at least two of these trees. The probability of $\mathcal{T}_1$ has a continuous phase transition, increasing from 0 when the mean of the child distribution increases above 1. On the other hand, the probability of $\mathcal{T}_2$ has a first-order phase transition, jumping discontinuously to a non-zero value at criticality. Given the recursive property satisfied by the event, we describe the critical child distributions where a continuous phase transition takes place. In many cases, we also characterise the event undergoing the phase transition.
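The two recursive properties described above can be turned into fixed-point equations and iterated numerically. The sketch below assumes a Poisson child distribution with mean `lam`, which is a choice made for this example and is not stated in the abstract. If each child's subtree satisfies the event independently with probability p, the number of such children is Poisson with mean `lam * p`, so the event needs at least one such child in the first case and at least two in the second. Starting from p = 1 and iterating gives the largest fixed point. The function names are hypothetical.

```python
import math


def survival_probability(lam, iterations=2000):
    """Probability that the tree is infinite (at least one good child).

    Assumes Poisson(lam) offspring; iterates p = 1 - exp(-lam * p) from p = 1.
    """
    p = 1.0
    for _ in range(iterations):
        p = 1.0 - math.exp(-lam * p)
    return p


def binary_subtree_probability(lam, iterations=2000):
    """Probability of an infinite binary subtree from the root (at least two good children).

    Assumes Poisson(lam) offspring; iterates p = 1 - exp(-x) * (1 + x), x = lam * p.
    """
    p = 1.0
    for _ in range(iterations):
        x = lam * p
        # Clamp at zero to guard against tiny negative values from rounding.
        p = max(0.0, 1.0 - math.exp(-x) * (1.0 + x))
    return p
```

Under this Poisson assumption, `survival_probability` rises smoothly from 0 once `lam` passes 1, whereas `binary_subtree_probability` is 0 at `lam = 3.0` and already above 0.5 at `lam = 3.5`, which is the discontinuous jump the abstract calls a first-order transition.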