<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>The clarity of word repetitions in American English infant-directed speech</title></titleStmt>
			<publicationStmt>
				<publisher>Cambridge University Press</publisher>
				<date>09/15/2025</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10668922</idno>
					<idno type="doi">10.1017/S0305000925100263</idno>
					<title level='j'>Journal of Child Language</title>
<idno>0305-0009</idno>
<biblScope unit="volume"></biblScope>
<biblScope unit="issue"></biblScope>					

					<author>Daniel Swingley</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[<title>Abstract</title> <p>Words in infant-directed speech (IDS) are often phonetically reduced. This likely renders words harder for infants to learn and recognize. This difficulty might be mitigated by the repetitive nature of IDS, in particular if reduced instances are often preceded by clear instances (i.e., the first-mention effect). To characterize phonetic clarity in American English word repetitions, words were extracted from the IDS of eight mothers and presented to adults (n=36) who judged their clarity. First mentions of repeated words were found to be clearer than second mentions, though this effect was small. Clarity was rated as greater for less common words and for utterance-final words. Clarity was also greater for words parents thought their child knew. The results help guide intuitions about the phonetic problem infants face when learning their first words.</p>]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>categorization of words, but descriptive work has left more unknown than known about how phonetic reduction plays out in infant-directed speech.</p><p>A premise of the present work is that understanding how infants start discovering words requires a better characterization of patterns of reduction and hyperarticulation in infant-directed speech. If infants often fail to encode hypoarticulated instances of words, or do not consider them equivalent to their more exaggerated, canonical forms, then simulating early development depends on knowing how common reduced forms are, and how they are distributed. For example, studies of discourse among adults and among parent-child dyads show that when a word is used twice in close succession, the first instance is often hyperarticulated relative to the second (e.g., <ref type="bibr">Bortfeld &amp; Morgan, 2010;</ref> <ref type="bibr">Fisher &amp; Tokura, 1995;</ref> <ref type="bibr">Fowler &amp; Housum, 1987;</ref> <ref type="bibr">Tippenhauer, Fourakis, Watson, &amp; Lew-Williams, 2020)</ref>. If one of the ways infants discover words is by noting repeated stretches of speech in nearby utterances <ref type="bibr">(Nencheva, Schwab, Lew-Williams, &amp; Fausey, 2024)</ref>, we can learn more about how this process might work by assessing the magnitude and consistency of first-mention effects. If first-mention effects are large and consistent, it would suggest that infants could use a strategy of listening for hyperarticulated portions of sentences, and monitoring subsequent sentences for the presence of similar-sounding stretches of speech, but this would only work if infants were capable of relating hyperarticulated forms to their reduced variants.</p><p>Here, in a sample of American English infant-directed speech, we assessed the magnitude and distribution of first-mention effects on spoken word clarity. 
Based on prior research, we expected that the clearest, most emphatic, or most hyperarticulated instances of words would tend to be those occurring for the first time in a discourse, and that the subsequent mention of these words would be substantially reduced. In addition, we evaluated several predictors we thought might relate to the first-mention effect, or to spoken-word clarity more generally: word frequency, whether the parent thought the child knew the word, and whether the word was utterance-final.</p><p>To summarize, then, our goal was to estimate the following effects on speech clarity, as measured using rating and transcription tasks: (a) the size of first-mention effects; (b) whether first-mention effects would be modulated by word frequency, child knowledge of the relevant word, or utterance-final word position; and (c) whether word clarity, independent of first or second mention, would be affected by frequency, child word knowledge, or sentence position.</p><p>In contrast to some prior research, our goal was to characterize the problem facing the infant, and not to describe the difference between infant-directed speech and adult-directed speech. As a result, we were able to examine this question using a speech corpus of natural, unscripted interaction, rather than speech hedged in by constraining situational devices meant to channel conversations with adult and child addressees onto similar paths.</p><p>We selected word repetitions from the Brent corpus of infant-directed interaction <ref type="bibr">(Brent &amp; Siskind, 2001)</ref>, and played them to adult native English speakers for transcription and judgments of clarity. These responses permitted us to evaluate the magnitude of first-mention effects and the conditions under which they appear.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Experiment</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Methods</head><p>Stimuli. This work was originally done (and preregistered) as two separate experiments with different stimulus sets, selected according to slightly different criteria.</p><p>The original purpose of the second experiment was to clarify certain effects present in the first dataset but that seemed likely to have been carried by a small number of stimuli. Indeed, the primary result of the second study was that these effects did not replicate with a stimulus set that was better balanced on the relevant variables. In the interest of contributing a more concise report that avoids such detours, these two experiments have been merged here.<ref type="foot">foot_0</ref> </p><p>Our first step was to extract repeated word tokens from the recordings of eight mother-child dyads in the Brent corpus (the same portion of the corpus that was analyzed in <ref type="bibr">Swingley &amp; Humphrey, 2017)</ref>. Repetitions were defined as word types of identical orthographic transcription repeated in consecutive utterances. The first mention was always the first instance in its source recording session. Types occurring for the first time within the initial 125 utterances of the session were excluded, on the grounds that they might have also been spoken not long before the recording began. Items repeated within an utterance were excluded, because in many cases they were not in real sentences (e.g., "teeth teeth teeth."). "Utterance" was defined as in the original Brent corpus transcription: a stretch of speech from one talker with no pauses any longer than 300 ms.</p><p>These sets of repetition pairs were then restricted to just those words that appear on the Words and Gestures version of the MacArthur-Bates Communicative Development Inventory <ref type="bibr">(Fenson et al., 1994)</ref>, which had been completed at the researchers' 12-month visit by 7 of the 8 mothers, and at the 15-month visit by all 8. 
This allowed us to evaluate, in a quite preliminary way, whether intelligibility varied with the parent's belief that her child understood the word she was saying.</p><p>In creating the first stimulus set, we chose words that were used by at least four of the eight mothers, to restrict the set to broadly common types. We dropped this restriction in creating the second stimulus set, to expand the number of possible items. The second stimulus set also excluded the dyad for which no 12-month CDI was available (mom 's2'), and selected items in such a way that among the pairs, the first-mention and second-mention tokens that were utterance-final or not utterance-final were balanced as well as possible, and that the counts of these in turn were as similar as possible among words known and unknown on the 12-month CDI. The two stimulus sets were otherwise selected the same way.</p><p>We excluded items that are often function words that rarely receive emphatic focus (e.g., am, gonna, for, is), words functioning as proper names (e.g., mama, Piglet), words that are onomatopoeic or occur in stereotyped interactions (e.g., pattycake, hi, yum), and words that exist outside the usual phonological constraints of English (e.g., hmm, mmhm). Some word types were included as pairs more than once, but in such cases the pairs were always drawn from different dyads.</p><p>Once a pool of potential items had been created this way, two members of our lab listened to each available token to evaluate its recording quality for undesirable features like hiss, overlapping speech, or unusually quiet vocalization, without considering the degree of emphasis or articulation of the words. 
To be considered for inclusion in the study, both tokens of a pair needed a recording-quality rating of at least 3 out of 5 from both raters.</p><p>Once the potential item set was narrowed by these constraints, a quasirandom process was implemented to select the final set of tokens, sampling as evenly as possible from the dyads.</p></div>
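<div xmlns="http://www.tei-c.org/ns/1.0"><p>The repetition criteria described above can be illustrated with a minimal sketch. It is not the authors' extraction code: the function name, the list-of-lists session format, and the toy session are hypothetical, and only the three mechanical criteria from the text are applied (first instance in the session, not within the initial 125 utterances, repeated in the consecutive utterance but not within an utterance). The CDI, word-class, and recording-quality filters are not modeled.</p>
<eg><![CDATA[
```python
from typing import Dict, List, Tuple

# From the text: types first occurring within the initial 125 utterances are excluded.
SKIP_INITIAL_UTTERANCES = 125


def find_repetition_pairs(
    utterances: List[List[str]],
    skip_initial: int = SKIP_INITIAL_UTTERANCES,
) -> List[Tuple[str, int, int]]:
    """Return (word, first_index, second_index) candidates for one session.

    `utterances` is a hypothetical representation: one list of orthographic
    word tokens per utterance, in session order (0-based indices).
    """
    # Index of the utterance in which each word type first occurs.
    first_seen: Dict[str, int] = {}
    for index, words in enumerate(utterances):
        for word in words:
            first_seen.setdefault(word, index)

    pairs = []
    for word, first in sorted(first_seen.items(), key=lambda item: item[1]):
        second = first + 1
        if first < skip_initial or second >= len(utterances):
            continue  # too early in the session, or no following utterance
        if utterances[first].count(word) != 1:
            continue  # repeated within the first utterance ("teeth teeth teeth")
        if utterances[second].count(word) != 1:
            continue  # absent from, or repeated within, the next utterance
        pairs.append((word, first, second))
    return pairs


# Toy session (invented for illustration); skip_initial is lowered to fit it.
session = [
    ["look", "at", "the", "ball"],
    ["the", "ball"],
    ["see", "the", "dog"],
    ["the", "dog", "is", "big"],
    ["teeth", "teeth", "teeth"],
    ["teeth"],
]
print(find_repetition_pairs(session, skip_initial=2))
```
]]></eg>
<p>In the toy session only "dog" qualifies: "the" and "ball" first occur too early, and "teeth" is repeated within its first utterance.</p></div>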
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 1</head><p>Stimulus distribution over Brent dyads. Dyad labels (c1, d1...) are as given in the Brent dataset.</p><table><row role="label"><cell>dyad</cell><cell>c1</cell><cell>d1</cell><cell>f1</cell><cell>i1</cell><cell>s1</cell><cell>s2</cell><cell>v1</cell><cell>v2</cell></row><row><cell>count Expt 1</cell><cell>25</cell><cell>24</cell><cell>24</cell><cell>17</cell><cell>28</cell><cell>26</cell><cell>10</cell><cell>18</cell></row><row><cell>count Expt 2</cell><cell>16</cell><cell>17</cell><cell>11</cell><cell>17</cell><cell>23</cell><cell>0</cell><cell>21</cell><cell>8</cell></row></table><p>The final complete item stock included 172 pairs in the first set and 113 pairs in the second set. Their distribution by speaker is given in Table 1. Infant age at the time of the recording, for both the first and the second stimulus sets, ranged from 8 months, 28 days, up to 15 months, 8 days, with the mean age for the first stimulus set 12 months, 9 days, and the second set 12 months, 11 days. The words were typical of the vocabulary of the CDI: airplane, bug, cry, open, store, where, etc. The full list of words is given in the publicly available dataset.</p><p>The words evaluated came from different sentence positions. Considering whether an item was from a one-word sentence ("isolated"), the first or last word in a multi-word utterance, or utterance-medially, the words may be counted as shown in Table <ref type="table">2</ref>. Across the first and second utterances, the mean number of words occurring between the first mention and the second was 3.5 (sd, 2.3; 25th percentile, 2; 75th percentile, 5). Once identified, the audio tokens were extracted from their sentences and scaled to have equivalent maximum amplitude using the norm function of the utility sox.</p><p>Procedures.</p><p>All judges completed three tasks: rating, transcription, and paired comparison. In the rating and transcription tasks, judges heard one word at a time and indicated on a scale from 1 to 5 how clearly realized they thought the word was, and then typed in as a free response what word they thought they had heard. Listeners were permitted to re-play the word as often as they desired (though they usually did not; most listeners only re-played a few items). 
In the paired comparison task, both members of a first-mention / second-mention pair were played, and judges indicated one or the other as the more clearly articulated and easier to understand. Re-plays were not permitted for this task. The rating and transcription tasks were completed first, and then the paired comparison task. The instructions for the rating tasks are given in the Supporting Materials.</p><p>All items were used in all three tasks for every participant.</p><p>To set up the presentation orders for the paired comparison task, items were divided into an a set and a b set, where each set included one pair of each word (i.e., each word type: car, ball, ...). In trial-order 1, for the a set, the first-mention token was presented first on the paired trials, and for the b set, the second-mention token was presented first on the paired trials. The reverse was true in trial-order 2. In this way, every token, whether a first mention or a second mention in the corpus, was presented first or second in a paired trial equally often. All trials were randomly ordered at presentation time. Each listener heard half of the items from the a set and half from the b set.</p><p>Out of a concern that vagaries of the participant recruitment process might introduce confounding variation across the (original) Experiments 1 and 2 datasets, a quasirandom selection of pairs from the first study was included in the stimulus set of the second study.</p><p>This matched set of stimuli allowed us to evaluate the possibility that participants in the two experiments were different enough in their performance to limit the comparability of the experiments. The items were selected by averaging the participant ratings of the first and second mentions of each pair, placing the pairs into quintiles, and randomly sampling five pairs from each quintile. 
These 25 pairs (50 token clips) were analyzed separately to compare the participant pools for the two stimulus sets.</p><p>Participants were recruited using the Prolific platform, an online testing site. Thirty-six judges were included in the final sample, evenly divided between the Experiment 1 and Experiment 2 stimulus sets. PennController software <ref type="bibr">(Zehr &amp; Schwarz, 2018)</ref> handled stimulus presentation and data recording. At the start of the experimental session, participants were asked a few questions about their demographic background. All participants indicated that English was their first language and their language of daily use.</p><p>To guard against respondents who were not actually English speakers or who might have been software programs rather than humans, a few introductory multiple-choice questions were asked about the names of household objects shown in an image (a sieve, pliers), and a few simple linguistic word problems were given. Performance on these was variable, but almost all participants got most of the questions right, and the total score on these tests did not correlate with performance in the transcription task and was not related to the degree to which a given participant's responses were correlated with the average responses of the other participants. Responses were also checked for any suspicious behaviors like conspicuous patterns in the ratings or long stretches of strange transcription responses. No participants were excluded on the basis of any of these checks. The whole rating procedure took judges about 45 minutes.</p><p>judges to rank the clarity of a word that was originally present in two consecutive sentences. To begin, Figure <ref type="figure">1</ref> provides an overview of the first-mention advantage, showing every item from each of the dyads as the proportion of judges picking the first mention as the clearer token. 
There was substantial variability in this effect, and it was not large, but it was shown by all eight of the dyads.</p><p>We tested a series of possible stimulus characteristics that might influence judges' choice of the first-mention token as the clearer one. First, we evaluated whether the order in which a given token appeared on a given trial affected performance. Indeed, it did: judges chose the first mention as clearer 66.3% of the time when they heard it second (se 2.4% over items) but only 49.6% of the time when they heard it first (se 2.5% over items). This difference was significant in a paired t-test (t=12.6, df=284, p&lt;0.0001).</p><p>We speculate that this effect came about because judges hearing the word a second time were primed by the first time, and the resulting ease of recognition of the second token biased their rating of clarity of articulation. This effect was large and consistent, but did not interact with any variable of interest in any of the subsequent analyses. Given that every item was counterbalanced over this variable, and was responded to by the same number of judges in each presentation condition, this order effect does not interfere with interpretation of the other variables; however, because there was some variability in the size of the trial position effect between judges, we included a random effect that included both judge and trial position in regressions predicting the size of the first-mention effect.</p><p>We first evaluated whether first-mention effects would be moderated by whether the parent thought the child knew the word. The dataset included parent report on the CDI checklist instrument for each tested word at 12 months and 15 months. The recordings spanned the ages of 8 to 15 months. This means that for some children and some words, the CDI results were informative about the mother's estimate of the child's knowledge of the words she was saying; for other words, the CDI results were ambiguous. 
For example, a "known" on the CDI at 12 months indicates that in a 13-month-old recording, the parent probably thought the child knew the word; but in a 10-month-old recording, we do not really know, because maybe the child learned the word at 11 months. Similarly, if the parent indicated "not known" on the 12 month CDI and "known" at 15 months, we may assume "not known" for an 11 month recording but cannot be sure for a 14 month recording. When we follow these considerations to their conclusions for each item in the dataset, 140 words were indicated as not known at the time of recording, 52 were indicated as known, and 93 were ambiguous and excluded from these analyses. (We also computed a parallel set of analyses that included all of the words and used the result on the 12-month CDI as the CDI measure. These analyses yielded very similar results.)</p><p>In this analysis we also evaluated potential effects of word frequency on the first mention effect, considering the possibility that parents might speak a word more clearly the first time if it were rarer, but then back off this extra hyperarticulation after the first mention. Word frequency was estimated based on frequency of occurrence in the relevant speaker's own data within the Brent corpus. Thus, this analysis predicted first-mention choice (yes or no) from trial position, CDI status, centered (log) word frequency, and the interaction of CDI status and frequency, with random effects for subject and trial position, and word.</p><p>Parents did not exhibit stronger or weaker first mention effects for words they thought their children knew. In addition, neither frequency nor its interaction with CDI status predicted first-mention choice. Table <ref type="table">3</ref> displays the analysis. 
Another element of parental speech we considered was the tendency to place important words sentence-finally in infant-directed speech (e.g., <ref type="bibr">Aslin, Woodward, LaMendola, &amp; Bever, 1996;</ref><ref type="bibr">Fernald &amp; Mazzie, 1991)</ref>. Utterance-final words often have longer vowels than utterance-medial words in infant-directed speech (e.g., <ref type="bibr">Swingley, 2019)</ref> and, in hyperarticulated contexts, are recognized more easily <ref type="bibr">(Fernald, McRoberts, &amp; Swingley, 2001</ref>; see also <ref type="bibr">Seidl &amp; Johnson, 2006)</ref>. There is some evidence that words that appear frequently in utterance-final position are learned more easily than words that do not <ref type="bibr">(Frank et al., 2017</ref>; though see <ref type="bibr">Swingley &amp; Humphrey, 2017)</ref>. Thus, it seemed reasonable to consider whether utterance-finality might relate to first-mention hyperarticulation. We would expect that first-mention choices would be more likely when the first-mention was utterance-final and less likely when the second-mention was utterance final. An interaction might suggest that when parents place successive instances of a word utterance-finally, they are using a teaching register that could be immune to the typical discourse effects of second mentions.</p><p>The regression revealed a significant effect of utterance-final positioning of the first mention, favoring its selection; and a nonsignificant complementary effect in the (expected) opposite direction for utterance-final positioning of the second mention; but no interaction.</p><p>Frequency modulated the effect of utterance position on first-mention effects. When words were more common, utterance-final first mentions yielded larger first-mention advantages (p=.021), and utterance final second mentions yielded smaller first-mention advantages for more common words, though this latter effect was not significant (p=.075). 
See Table <ref type="table">4</ref>. This may be viewed graphically in Figure <ref type="figure">2</ref>. When words were low in frequency (on the left side of each facet), utterance finality had a minimal impact on choice of the first-mention token. As words gained in frequency, listeners' choice of the first or second mention as the clearer one was increasingly dominated by utterance position, on those items for which the two tokens differed in utterance-finality. Thus, for high-frequency words, listeners chose the first mention when only it was utterance-final, and the second mention when only it was utterance-final. When utterance-finality was equivalent for the first and second mentions, frequency did not have a significant impact on the size of the first-mention effect. We confirmed this result in a variant of the above regression analysis that restricted the dataset to the 207 items for which neither or both tokens were utterance-final. In this restricted dataset, frequency did not have a significant impact on first-mention choice, whether on its own (coef. = -.05, p &gt; 0.5) or in interaction with utterance position (i.e., both-final vs. neither-final; coef. = .07, p &gt; 0.5). The interaction of frequency and utterance position on the first-mention effect is probably not really "about" the first-mention effect per se; its effects on first- vs. second-mention clarity may be best viewed as collateral effects of the fact that first and second mention are confounded with utterance position when considering pairs in which utterance position varies between tokens. This is</p><table><head>Table 5. Analysis of first-mention and second-mention mean difference scores in ratings of word clarity, including CDI result and utterance-final positioning.</head><row role="label"><cell/><cell>coef.</cell><cell>stderr</cell><cell>t value</cell><cell>p value</cell></row><row><cell>(Intercept)</cell><cell>0.067</cell><cell>0.105</cell><cell>0.634</cell><cell>0.5266</cell></row><row><cell>cdi [known]</cell><cell>0.079</cell><cell>0.123</cell><cell>0.638</cell><cell>0.5240</cell></row><row><cell>log freq.</cell><cell>-0.035</cell><cell>0.049</cell><cell>-0.705</cell><cell>0.4817</cell></row><row><cell>1st mention utt-final [yes]</cell><cell>0.746</cell><cell>0.170</cell><cell>4.383</cell><cell>0.0000</cell></row><row><cell>2nd mention utt-final [yes]</cell><cell>-0.145</cell><cell>0.187</cell><cell>-0.774</cell><cell>0.4398</cell></row><row><cell>1st men. utt-final [yes] : 2nd men. utt-final [yes]</cell><cell>-0.451</cell><cell>0.245</cell><cell>-1.837</cell><cell>0.0678</cell></row></table><p>The analysis showed no impact of CDI knowledge on the size of the first-mention effect. Word frequency was also unrelated to the size of the first-mention effect. Regarding utterance position, the first-mention effect was significantly larger when the first-mention token was utterance-final, and tended to be smaller when the second-mention token was utterance-final, though this latter difference was not reliable. As one might expect, the first-mention advantage when the first-mention token was utterance-final was diminished when the second-mention token was also utterance-final, though this interaction effect was only marginally significant (p=0.068).</p><p>Given that there was no sign of an impact of CDI knowledge on the first-mention effect, a follow-up analysis excluded the CDI and therefore could include all 5130 ratings in the dataset (the 18 judges' first-mention minus second-mention difference scores for 285 items). Predictors were frequency, utterance position (utterance-final or not), and their interactions.</p><p>The results were consistent with the outcome of the choice task. Ratings were affected by utterance-final position, so when only one of the tokens was utterance-final, that token was rated more highly, either increasing the first-mention effect (if the first mention was utterance-final) or decreasing it, though not significantly (if the second mention was utterance-final). Word frequency modulated these first-mention effects, with more common words showing utterance-finality enhancement effects more strongly. 
These outcomes are enumerated in Table <ref type="table">6</ref> and displayed graphically in Figure <ref type="figure">4</ref>.</p><p>Considering Figure <ref type="figure">4</ref>, we can estimate the first-mention effect as the difference between the dark blue regression lines and the lighter red ones. The effect is essentially unaffected by frequency (the lines are nearly parallel) when neither word was utterance-final (leftmost panel) or when both were (rightmost panel). The two center panels show the outcome of mixing an utterance-final advantage with the first-mention advantage. The first- vs. second-mention difference changes with frequency in both cases, though in opposite directions, as one would expect: strengthening when the first mention is utterance-final, becoming increasingly negative when the second mention is final. The ratings data reveal a feature of this effect that was not visible in the choice data, namely that frequency's influence on the first-mention effect is carried mainly by the utterance-medial word.</p><p>Ratings of tokens that were not utterance-final fell off strongly with frequency (r = -.270, t(232) = -4.27, p &lt; 0.0001), whereas ratings of utterance-final tokens fell off less strongly (r = -.010, t(334) = -1.82, p = 0.069). This suggests that mothers maintain hyperarticulation for utterance-final words, but as words become more common, they allow phonetic reduction to take place in less privileged sentence positions. This generalization appears to be true independently of first-mention effects.</p><p>second), utterance-final position, centered log-frequency, CDI, the interaction of CDI with mention and utterance position, and the interaction of log frequency and utterance position.</p><p>Listener identity and item-pair were included as random effects, with random slopes for mention. The CDI interactions were not significant, but are included here in keeping with the purpose of the analysis. 
The results show that second mentions were rated lower, more frequent words were rated lower, utterance-final words were rated higher, and words that the mother thought her child knew were rated higher. Variants of this analysis that included other interactions (not shown) indicated that none were significant or near significant.</p><p>Most of these effects are familiar from previous analyses; for example, the frequency effect's interaction with utterance position is visible in Figure <ref type="figure">4</ref>. Note the steep slope of the frequency effect in the "neither" panel of that figure (both tokens utterance-medial) relative to the shallower slope in the "both" panel (both tokens utterance-final), and the analogous findings in the middle panels. Utterance-finality clearly protects words from frequency-based reduction, to some degree.</p><p>The CDI effect is in the wrong direction for the hypothesis that mothers would hyperarticulate words more strongly or more consistently when children do not know them yet. Instead, mothers seem to hyperarticulate when they think it will help them be understood. We return to this theme in the Discussion.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 7</head><p>Ordinal regression predicting ratings of words. Data were entered at the trial level. Contrasts were treatment coded. [yes] and [known] in brackets refer to the level designated as treatment. All data for which CDI scores were available were included.</p><table><row role="label"><cell/><cell>coef.</cell><cell>exp(coef)</cell><cell>std. err.</cell><cell>z value</cell><cell>p value</cell></row><row><cell>mention [second]</cell><cell>-0.446</cell><cell>0.640</cell><cell>0.140</cell><cell>-3.178</cell><cell>0.0015</cell></row><row><cell>log freq.</cell><cell>-0.724</cell><cell>0.485</cell><cell>0.162</cell><cell>-4.469</cell><cell>0.0000</cell></row><row><cell>cdi [known]</cell><cell>1.243</cell><cell>3.466</cell><cell>0.429</cell><cell>2.896</cell><cell>0.0038</cell></row><row><cell>utterance-final [yes]</cell><cell>1.658</cell><cell>5.249</cell><cell>0.239</cell><cell>6.939</cell><cell>0.0000</cell></row><row><cell>mention [2nd]: cdi [known]</cell><cell>-0.184</cell><cell>0.832</cell><cell>0.271</cell><cell>-0.678</cell><cell>0.4979</cell></row><row><cell>utt.-final [yes]: cdi [known]</cell><cell>-0.029</cell><cell>0.971</cell><cell>0.420</cell><cell>-0.070</cell><cell>0.9442</cell></row><row><cell>log freq.: utt.-final [yes]</cell><cell>0.488</cell><cell>1.629</cell><cell>0.174</cell><cell>2.801</cell><cell>0.0051</cell></row></table><p>Transcriptions. Finally, we considered judges' transcriptions of the words. This task was intended to address a limitation of rating methods, which is that it is difficult to calibrate rating differences to functional consequences. It could be that ratings are just aesthetic judgements whose range is limited enough that even poorly rated words would be perfectly recognizable.</p><p>Judges' free transcription responses were not always words. Responses not in the PronLex dictionary <ref type="bibr">(Kingsbury, Strassel, McLemore, &amp; MacIntyre, 1994)</ref> were evaluated one by one. When they appeared to be typographical errors or misspellings, they were corrected (coffe for coffee, manget for magnet). When they were nonwords but seemed to be plausibly intended as such, they were retained, and pronunciations were estimated for them by analogy to other words (myan in response to lion was assumed to rhyme with lion). When responses were English words with more than one pronunciation (like read), the pronunciation was assumed to be the one closest to the transcription of the word in the corpus. 
Pronunciations were tabulated to evaluate the phonological distance between responses and the corpus transcription, possibly providing a more sensitive measure than the binary outcome of whether a response matched exactly. Distances were computed using the R stringdist function's implementation of the Levenshtein distance metric (van der Loo, 2014). The Levenshtein distance, also known as the edit distance, is the minimum number of additions, removals, or substitutions required to convert one string into another. Most words were recognized; that is, most responses matched the corpus transcription. Among first mentions, 70.5% of trials' responses matched; among second mentions, 66.4%. This difference was significant by proportion test (&#967;&#178; = 19.34, p &lt; 0.0001). The full table of distances is given in Table <ref type="table">8</ref>, enumerated over responses without averaging. For each pair, the mean Levenshtein distance of the responses to the second mention was subtracted from the mean distance of the responses to the first mention, giving a distribution of difference scores that was roughly normal in form, with a mean of -0.133 and standard deviation of 0.713 (first quartile, -0.333; third quartile, 0.167). The mean was significantly less than zero by two-tailed one-sample t-test (t(284) = -3.141, p = 0.0019). Note that because the measure is a distance, a negative mean corresponds to a tendency for first mentions to be closer to the correct pronunciation than the second mentions.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Predictors of transcription accuracy were evaluated using negative binomial regression analysis (similar to Poisson regression, but taking into account overdispersion in the outcome distribution). Data were entered at the trial level, with the outcome being the Levenshtein distance of the response from the canonical pronunciation of the spoken word.</p><p>Predictors were mention, (log) frequency in the maternal corpus, word knowledge on the CDI, whether the word was utterance-final, the interaction of these predictors with mention, the interaction of frequency and utterance-finality, and random effects terms for subject and for item pair, with a slope term for mention in the item effect. This analysis is given in Table <ref type="table">9</ref>.</p></div>
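<div xmlns="http://www.tei-c.org/ns/1.0"><p>The distance measure used as the regression outcome can be illustrated with a minimal sketch. The study computed distances with the R stringdist package; the pure-Python function below is a stand-in written for illustration, and the helper names and toy inputs are assumptions. It accepts any sequences, so it could be applied to lists of phoneme symbols as well as to spellings. The second helper shows the per-pair difference score (first mention minus second mention), which is negative when responses to the first mention are closer to the target.</p>
<eg><![CDATA[
```python
from statistics import mean
from typing import Sequence


def levenshtein(a: Sequence, b: Sequence) -> int:
    """Minimum number of insertions, deletions, or substitutions turning a into b."""
    # previous[j] holds the distance between the first i-1 items of a
    # and the first j items of b; row 0 is all insertions.
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]  # deleting the first i items of a
        for j, item_b in enumerate(b, start=1):
            substitution_cost = 0 if item_a == item_b else 1
            current.append(
                min(
                    previous[j] + 1,  # deletion from a
                    current[j - 1] + 1,  # insertion into a
                    previous[j - 1] + substitution_cost,  # match or substitution
                )
            )
        previous = current
    return previous[-1]


def pair_difference(first_distances: Sequence[float],
                    second_distances: Sequence[float]) -> float:
    """Mean distance for the first mention minus mean distance for the second."""
    return mean(first_distances) - mean(second_distances)


# Spelling examples taken from the misspellings mentioned in the text.
print(levenshtein("coffee", "coffe"))   # one deletion
print(levenshtein("magnet", "manget"))  # two substitutions
# Invented distances for one pair: negative means the first mention was closer.
print(pair_difference([0, 0, 1], [1, 2, 0]))
```
]]></eg>
<p>Because a swap of adjacent symbols counts as two substitutions under this metric, "manget" is at distance 2 from "magnet", not 1.</p></div>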
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 9</head><p>Negative binomial regression predicting the phonological distance of transcriptions from the correct word. For the 'mention' predictor, regression coefficients are negative when first-mention effects are stronger. Exponentiated coefficients (exp(coef)) give the multiplicative change in phonological distance expected given a unit change in the predictor. Contrasts are treatment coded. Material in brackets, like [yes], refers to the level designated as treatment. Random effects were listener and item pair with random slopes for mention.</p>
<table>
<row role="label"><cell></cell><cell>coef.</cell><cell>exp(coef)</cell><cell>std. err.</cell><cell>z value</cell><cell>p value</cell></row>
<row><cell>(Intercept)</cell><cell>-0.822</cell><cell>0.440</cell><cell>0.201</cell><cell>-4.085</cell><cell>0.0000</cell></row>
<row><cell>mention [2nd]</cell><cell>0.250</cell><cell>1.284</cell><cell>0.183</cell><cell>1.367</cell><cell>0.1716</cell></row>
<row><cell>log freq.</cell><cell>0.408</cell><cell>1.504</cell><cell>0.141</cell><cell>2.903</cell><cell>0.0037</cell></row>
<row><cell>cdi [yes]</cell><cell>-0.764</cell><cell>0.466</cell><cell>0.301</cell><cell>-2.540</cell><cell>0.0111</cell></row>
<row><cell>utt.-final [yes]</cell><cell>-0.891</cell><cell>0.410</cell><cell>0.215</cell><cell>-4.137</cell><cell>0.0000</cell></row>
<row><cell>mention [2nd]: log freq.</cell><cell>-0.143</cell><cell>0.867</cell><cell>0.095</cell><cell>-1.496</cell><cell>0.1346</cell></row>
<row><cell>mention [2nd]: utt.-final [yes]</cell><cell>0.182</cell><cell>1.200</cell><cell>0.239</cell><cell>0.762</cell><cell>0.4462</cell></row>
<row><cell>mention [2nd]: cdi [yes]</cell><cell>0.197</cell><cell>1.218</cell><cell>0.246</cell><cell>0.800</cell><cell>0.4234</cell></row>
<row><cell>log freq.: utt.-final [yes]</cell><cell>-0.383</cell><cell>0.682</cell><cell>0.141</cell><cell>-2.714</cell><cell>0.0067</cell></row>
</table>
<p>No interaction terms for mention were significant; thus, the results provided no robust evidence for an effect of word frequency, CDI reporting of word knowledge, or utterance position on the magnitude of the first-mention effect on transcription accuracy.</p><p>Removing nonsignificant interaction predictors and re-running that analysis resulted in the outcome presented in Table <ref type="table">10</ref>.</p><p>Transcriptions of second mentions were significantly less close to the target than transcriptions of first mentions. Higher maternal word frequency was associated with greater distance, in keeping with the typical effects of frequency on reduction, but this effect was attenuated in utterance-final position. In general, utterance-finality was linked to closer proximity to the canonical form. 
Words were also closer to the canonical form when mothers thought their child knew them.</p><p>The effects of frequency and sentence position on transcription accuracy are shown in the right panel of Figure <ref type="figure">5</ref>, with the analogous ratings data in the left panel. Recall that for the transcription measure, greater distance (higher on the y axis) corresponds to lower clarity. Over trials, 77.6% of the utterance-final transcription distances were zero; only 55.4% of the utterance-medial distances were zero. The effect of utterance position on transcription accuracy was quite large, and became more pronounced with greater word frequency.</p></div>
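<div xmlns="http://www.tei-c.org/ns/1.0"><p>The exp(coef) column of Table 9 can be read with a small arithmetic sketch in Python. The coefficients below are copied from Table 9; the helper names (multiplicative_change, percent_change) are hypothetical, and the sketch only re-expresses log-scale coefficients as multiplicative and percentage changes in expected distance. It does not refit the model.</p>
<p>
```python
import math

# Log-scale coefficients copied from Table 9 (negative binomial model).
coefficients = {
    "mention [2nd]": 0.250,
    "log freq.": 0.408,
    "cdi [yes]": -0.764,
    "utt.-final [yes]": -0.891,
}


def multiplicative_change(coef):
    """Factor by which expected distance is multiplied per unit change."""
    return math.exp(coef)


def percent_change(coef):
    """The same change expressed as a percentage of the baseline."""
    return 100.0 * (math.exp(coef) - 1.0)


if __name__ == "__main__":
    for name, coef in coefficients.items():
        factor = multiplicative_change(coef)
        print(f"{name}: factor {factor:.3f} ({percent_change(coef):+.1f}%)")
```
</p>
<p>Under this reading, a factor below 1 (as for utterance-final words) indicates a smaller expected distance from the canonical form, and a factor above 1 (as for log frequency) indicates a larger one, with the other predictors held at their reference levels.</p></div>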
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 10</head><p>Negative binomial regression predicting the phonological distance of transcriptions from the correct word, with nonsignificant interactions removed from the formula (cf. Table 9).</p>
<table>
<row role="label"><cell></cell><cell>coef.</cell><cell>exp(coef)</cell><cell>z value</cell><cell>p value</cell></row>
<row><cell>(Intercept)</cell><cell>-0.894</cell><cell>0.409</cell><cell>-6.196</cell><cell>0.0000</cell></row>
<row><cell>mention [2nd]</cell><cell>0.223</cell><cell>1.250</cell><cell>6.304</cell><cell>0.0000</cell></row>
<row><cell>log freq.</cell><cell>0.338</cell><cell>1.402</cell><cell>3.322</cell><cell>0.0009</cell></row>
<row><cell>cdi [known]</cell><cell>-0.636</cell><cell>0.529</cell><cell>-2.544</cell><cell>0.0109</cell></row>
<row><cell>utt-final [yes]</cell><cell>-0.340</cell><cell>0.712</cell><cell>-5.148</cell><cell>0.0000</cell></row>
<row><cell>log freq.:utt-final</cell><cell>-0.340</cell><cell>0.712</cell><cell>-5.907</cell><cell>0.0000</cell></row>
</table>
<p>Summary of the results across the three tasks.</p><p>The three tasks yielded similar results: however we asked judges to evaluate words, first mentions were found to be moderately clearer phonetically than second mentions, as expected. This effect was nevertheless quite variable over items, and its magnitude was not predictable based on our measurements of word frequency, maternal estimates of their child's knowledge of the word, or utterance-finality. The results therefore failed to support a number of hypotheses about maternal discourse effects: for example, that maintenance of more emphatic realizations from first to second mention might coincide with maternal beliefs about the child's knowledge of the word; or that mothers might tend to compensate for placing a word utterance-medially in a second mention by hyperarticulating it a bit more. If such effects are present in English infant-directed speech, they may be too weak to emerge from the myriad other influences on the phonetic realization of words.</p><p>Because known words (by definition) do not need to be taught, parents might feel more free to offer reduced second mentions of known words relative to unknown words. 
Instead, we found the clarity of known words to be greater than the clarity of unknown words independently of mention.</p><p>If maternal conversation with infants were primarily dedicated to word teaching, we might expect the reverse, namely that as-yet-unknown words would be the clearest of all.</p><p>Instead, the data show considerable heterogeneity in clarity. Why might this be? One possibility is that a word's referent might be independently clear from the situational context. For example, if a toy car figured prominently in a play interaction before being mentioned, a parent might reasonably refrain from hyperarticulation because the lexical concept was already "given" in the discourse.</p><p>Figure <ref type="figure">6</ref>. Over all items, mean rating for the first mention and the second mention of each pair. The first-mention advantage is shown by color and position. When judges gave higher (better) clarity ratings to the first mention than the second mention, the plot point for that item falls in the upper left portion of the plot.</p><p>Another possibility is that the parent's priority is not always to maximize the likelihood of being understood. Parents speaking with their infants have a range of goals. One of these is just to maintain an ongoing social connection, a goal that might lead parents to produce a stream of talk without necessarily ensuring that its details be linguistically interpretable. Parents also sometimes talk simply to entertain themselves while changing diapers or washing bottles. Perhaps this diversity of speech functions underlies some of the variability in the clarity effects found here <ref type="bibr">(Beech &amp; Swingley, 2024</ref>).</p><p>Consider, for example, Figure <ref type="figure">6</ref>. The dominant feature of the ratings distribution is that within adjacent utterances, ratings of repetitions of words are correlated, the first with the second (r = 0.775, t(283) = 20.6, p &lt; 0.0001). 
It appears that there are sentence pairs in which common words were relevant enough to be repeated, but that were still spoken with low clarity, and other sentence pairs in which both words were hyperarticulated. If hyperarticulation is linked primarily to the speaker's desire to be understood, rather than to the speaker's interest in teaching new words, it follows that there are some utterance pairs in which the parent does not make extra efforts to use clear articulation, either because conveying a given word is not a priority, or because she considers it unnecessary in context.</p><p>Utterance-finality is well known to be linked with clarity, particularly in infant-directed speech (e.g., <ref type="bibr">Fernald &amp; Mazzie, 1991)</ref>. Parents place words that are significant in the discourse in utterance-final position, and frequently also speak those words with noticeable pitch peaks and increased duration in the words' segments (e.g., <ref type="bibr">Aslin et al., 1996;</ref><ref type="bibr">Swingley, 2019)</ref>. Fernald and Mazzie found that parents maintained this pitch feature on the second mentions of words to a greater degree when talking to children than adults did in conversation with other adults. This behavior would be expected to reduce first-mention effects.</p><p>The frequency effect found here is a familiar one in psycholinguistics, and broadly attested <ref type="bibr">(Clopper &amp; Turnbull, 2018;</ref><ref type="bibr">Jurafsky, Bell, Gregory, &amp; Raymond, 2001)</ref>. There is some debate in the literature about whether reduction of high-frequency words reflects production processes or audience design. Perhaps words uttered more frequently are spoken along well-canalized production pathways that have weathered away some of the distinguishing features of the component sounds. Or, perhaps speakers are alert to the in-the-moment needs of the listener, and intuit that listeners require clearer realizations for lower-probability words. 
If the latter hypothesis explains our results here, the lack of interaction with the CDI results is surprising; a priori one would expect the word knowledge variable to dominate the frequency variable. If parents think their child does not know a word, despite its high occurrence frequency, an audience-tuned phonetic approach would suggest hyperarticulating such words independently of their frequency. This would be revealed as an interaction that we did not observe. Thus, our results suggest that the frequency effects may emerge from psycholinguistic processes in the speakers having to do with word retrieval and production representations rather than fine attunement to the needs of infant listeners.</p><p>What do these results mean for infant word-finding early in language learning? The words we tested here were relatively privileged words, first of all for being repeated in successive utterances, and second for being present on the CDI. These are the kinds of words that make up children's early lexicons. The substantial phonetic variability with which they are apparently presented to children suggests that we should not assume that a word's presence in a transcript implies that the word is available to the infant, particularly if it is not yet familiar. It is not yet known whether presentation of a very clear token as a first mention followed by a relatively hypoarticulated token helps infants to accept the second as an instance of the same type as the first (much as our adult listeners found the second token they heard to be considerably more interpretable than the first). If so, it might also help educate infants about the typical form of phonetic reduction.</p><p>This study has some limitations. The listening conditions of the participants, and the participants' language backgrounds, cannot be guaranteed, so it is possible that there is some contamination in the data. 
In addition, a dataset with more vocabulary measurements would be quite useful; although we established some robust relations between CDI status and word clarity, their strength would be better estimated with vocabulary measurements made closer in time to the speech samples. Our conclusions about vocabulary knowledge would be stronger, too, if the CDI comparisons could be made within words across children. Here, although some words were tested with more than one pair (in different dyads), there were not enough CDI-discordant items to perform a within-word test of the relationship between clarity and word knowledge, and as a result it is theoretically possible that the CDI effects are actually facts about the particular set of words children tend to know and that an unmeasured variable is responsible for the greater average clarity of the "known" words. This could be resolved using a design specifically targeting words whose "known" status varies across children, or developmentally within children.</p><p>Computational models of infant word-finding typically make the simplifying assumptions that words are always spoken the same way, or with only minor contextually dependent variations, and that infants are capable of reliably extracting a veridical phonological transcription of spoken words as strings of syllables or phones. This is unlikely to be the case, given the marked variability in word realizations noted here and in prior research (e.g., <ref type="bibr">Bard &amp; Anderson, 1983</ref>). An alternative possibility is that at first infants operate over the phonetic signal directly rather than over phonetic categorizations, and derive both words and phones at the same time (e.g., <ref type="bibr">Feldman, Griffiths, Goldwater, &amp; Morgan, 2013;</ref><ref type="bibr">Swingley, 2009)</ref>. 
Given how consistently we found that utterance-final words were easier to identify, and rated as clearer than other words, it might be useful to contemplate an alternative model of the infant as listening for prominent utterance-final chunks of speech, or for sequences that are repeated in close succession <ref type="bibr">(McInnes &amp; Goldwater, 2011;</ref><ref type="bibr">Nencheva et al., 2024)</ref>, and building the initial lexicon from these.</p></div><note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0"><p>The other deviation from the preregistration is that the initial preregistration proposed making phonetic measurements of the tokens and entering those into a separate series of regressions. Ultimately we decided that these phonetic variables are manifestations of hyperarticulation or clarity, rather than predictors of clarity, and that the analyses would be more coherent without these variables.</p></note>
		</body>
		</text>
</TEI>
