The goal of this research is to understand how bilingual and monolingual parents adjust their speech when talking to infants. We examined pitch characteristics of infant-directed speech (IDS) and adult-directed speech (ADS) in Spanish-English bilingual and English monolingual parents and their infants (8–20 months of age). Thirty-eight parent-infant dyads participated in two naturalistic play tasks, and parents spoke with a bilingual researcher to provide samples of ADS. Results showed that both parent groups produced higher maximum and average fundamental frequency in IDS than in ADS, suggesting that bilingual and monolingual caregivers make similar pitch adjustments across registers. However, for bilinguals the IDS-ADS difference was larger in English than in Spanish: bilingual parents differentiated their IDS adjustments across languages. Analyses of word repetitions revealed no change in pitch between the first and second repetitions of target words in bilingual parents' IDS, even when the repetitions occurred in different languages. Taken together, the results suggest that bilingual parents adjust their IDS pitch similarly to English-speaking monolinguals while differentiating their English and Spanish IDS. Overall, this project contributes to our understanding of parents' register adjustments across multilingual language-learning contexts.
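To make the pitch measures concrete, here is a minimal sketch of how per-recording mean and maximum fundamental frequency (F0) could be extracted. The abstract does not name an analysis tool, so the use of the parselmouth library (a Python interface to Praat), the 75-600 Hz pitch range, and the file names are assumptions for illustration only.

```python
# A minimal sketch, assuming Praat-style pitch tracking via parselmouth
# (the abstract does not name its analysis tool), of extracting the mean
# and maximum F0 values used to compare IDS and ADS registers.
import numpy as np
import parselmouth

def f0_stats(wav_path: str, fmin: float = 75.0, fmax: float = 600.0):
    """Return (mean F0, max F0) in Hz over the voiced frames of a recording."""
    snd = parselmouth.Sound(wav_path)
    pitch = snd.to_pitch(pitch_floor=fmin, pitch_ceiling=fmax)
    f0 = pitch.selected_array["frequency"]
    voiced = f0[f0 > 0]  # Praat reports unvoiced frames as 0 Hz
    return float(np.mean(voiced)), float(np.max(voiced))

# Hypothetical per-parent register comparison:
# print(f0_stats("parent01_IDS.wav"), f0_stats("parent01_ADS.wav"))
```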
The intelligibility of consonants in American English infant-directed speech
To begin learning their language, infants must locate words in the speech signal. Some models of word discovery presuppose that the discovery process depends on identifying phonetic segments (phones) in speech. To test the plausibility of models arguing that infants can reliably categorize consonants in speech, adult native speakers were asked to identify the consonant in vowel-consonant-vowel sequences extracted from spontaneous English infant-directed speech. Listeners could consistently identify some instances of consonants (for example, correctly indicating that an /s/ was an /s/). But many tokens (about half) were not consistently identifiable. Performance was significantly worse for codas than onsets. Providing the full utterance context in low-pass-filtered form did not aid recognition, nor did familiarization with the talker. In a second task, listeners were barely above chance in guessing whether a consonant was a word onset or a word-final coda. Performance on infant-directed speech was not markedly better than performance on a comparison set of adult-directed speech consonants. Erroneous responses frequently had little systematic resemblance to the correct answer. The results suggest that it is not plausible that infants can parse most utterances exhaustively into strings of uttered speech sounds and feed those strings into a statistical clustering mechanism.
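The token-level consistency measure implied above can be illustrated with a small sketch: for each extracted VCV token, compute how strongly listeners agree on a single consonant label. The response data below are invented placeholders, not the study's materials.

```python
# Hypothetical sketch of per-token identification consistency: the
# proportion of listeners whose label matches the modal response.
from collections import Counter

responses = {  # token id -> listener labels (placeholder values)
    "tok01": ["s", "s", "s", "f", "s"],
    "tok02": ["b", "d", "d", "g", "b"],
}

for tok, labels in responses.items():
    label, n = Counter(labels).most_common(1)[0]
    print(f"{tok}: modal response '{label}', agreement {n / len(labels):.2f}")
```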
- PAR ID: 10668921
- Publisher / Repository: Elsevier (publisher); OSF (repository)
- Date Published:
- Journal Name: Cognitive Psychology
- Volume: 161
- Issue: C
- ISSN: 0010-0285
- Page Range / eLocation ID: 101766
- Subject(s) / Keyword(s): language acquisition; categorization; speech; phonetics; unsupervised learning; infant development
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Abstract: The current study utilized eye-tracking to investigate the effects of intersensory redundancy and language on infant visual attention and detection of a change in prosody in audiovisual speech. Twelve-month-old monolingual English-learning infants viewed either synchronous (redundant) or asynchronous (non-redundant) presentations of a woman speaking in native or non-native speech. Halfway through each trial, the speaker changed prosody from infant-directed speech (IDS) to adult-directed speech (ADS) or vice versa. Infants focused more on the speaker's mouth on IDS trials than on ADS trials, regardless of language or intersensory redundancy. Additionally, infants showed greater detection of prosody changes from IDS to ADS in native speech. Planned comparisons indicated that infants detected prosody changes across a broader range of conditions during redundant stimulus presentations. These findings shed light on the influence of language and prosody on infant attention and highlight the complexity of audiovisual speech processing in infancy.
Abstract: Computational models of infant word-finding typically operate over transcriptions of infant-directed speech corpora. It is now possible to test models of word segmentation on speech materials rather than transcriptions of speech. We propose that such modeling efforts be conducted over the speech of the experimental stimuli used in studies measuring infants' capacity for learning from spoken sentences; correspondence with infant outcomes in such experiments is an appropriate benchmark for models of infants. We demonstrate such an analysis by applying the DP-Parse model of Algayres and colleagues to auditory stimuli used in infant psycholinguistic experiments by Pelucchi and colleagues. DP-Parse takes speech as input and creates multiple overlapping embeddings from each utterance. Prospective words are identified as clusters of similar embedded segments, which allows segmentation of each utterance into possible words using a dynamic programming method that maximizes the frequency of constituent segments. We show that DP-Parse mimics American English learners' performance in extracting words from Italian sentences, favoring the segmentation of words with high syllabic transitional probability. This kind of computational analysis over actual stimuli from infant experiments may be helpful in tuning future models to match human performance.
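To illustrate the dynamic-programming step described above, here is a toy, text-based sketch in the spirit of DP-Parse: syllable strings stand in for clustered speech embeddings, and each utterance is segmented to maximize the log-probability of its constituent segments. This is an illustrative simplification with invented data, not the authors' model, which operates on actual speech embeddings.

```python
# Toy sketch in the spirit of DP-Parse (not the authors' code): count how
# often each contiguous chunk recurs across "utterances", then segment each
# utterance to maximize the summed log-probability of its chunks.
from collections import Counter
from math import log

utterances = [["ka", "su", "ti", "ra"],
              ["ti", "ra", "mo", "ka", "su"],
              ["mo", "ka", "su", "ti", "ra"]]
MAX_LEN = 3  # longest candidate "word", in syllables

# Count every contiguous segment up to MAX_LEN (a stand-in for finding
# clusters of similar embedded speech chunks across the corpus).
counts = Counter(tuple(utt[i:j])
                 for utt in utterances
                 for i in range(len(utt))
                 for j in range(i + 1, min(i + MAX_LEN, len(utt)) + 1))
total = sum(counts.values())

def segment(utt):
    """Viterbi-style segmentation maximizing summed log segment probability."""
    n = len(utt)
    best = [0.0] + [float("-inf")] * n  # best[j]: score of utt[:j]
    back = [0] * (n + 1)                # backpointers for reconstruction
    for j in range(1, n + 1):
        for i in range(max(0, j - MAX_LEN), j):
            score = best[i] + log(counts[tuple(utt[i:j])] / total)
            if score > best[j]:
                best[j], back[j] = score, i
    words, j = [], n
    while j > 0:
        words.append("".join(utt[back[j]:j]))
        j = back[j]
    return list(reversed(words))

for utt in utterances:
    print(segment(utt))  # frequent chunks like "kasu" and "tira" emerge as words
```

Because the score is a product of segment probabilities, over-segmentation is penalized naturally: splitting a frequent chunk into pieces multiplies in extra probabilities below one.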
Children are adept at learning their language's speech-sound categories, but just how these categories function in their developing lexicon has not been mapped out in detail. Here, we addressed whether, in a language-guided looking procedure, 2-year-olds would respond to a mispronunciation of the voicing of the initial consonant of a newly learned word. First, to provide a baseline of mature native-speaker performance, adults were taught a new word under training conditions of low prosodic variability. In a second experiment, 24- and 30-month-olds were taught a new word under training conditions of high or low prosodic variability. Children and adults showed evidence of learning the taught word. Adults' target looking was reduced when the novel word was realized at test with a change in the voicing of the initial consonant, but children did not show any such decrement in target fixation. For both children and adults, most learners did not treat the phonologically distinct variant as a different word. Acoustic-phonetic variability during teaching did not have consistent effects. Thus, under conditions of intensive short-term training, 24- and 30-month-olds did not differentiate a newly learned word from a variant differing only in consonant voicing. High task complexity during training could explain why mispronunciation detection was weaker here than in some prior studies.
Comparing human and machine's use of coarticulatory vowel nasalization for linguistic classification
Anticipatory coarticulation is a highly informative cue to upcoming linguistic information: listeners can identify that the word is ben and not bed by hearing the vowel alone. The present study compares the relative performance of human listeners and a self-supervised pre-trained speech model (wav2vec 2.0) in the use of nasal coarticulation to classify vowels. Stimuli consisted of nasalized (from CVN words) and non-nasalized (from CVC words) American English vowels produced by 60 human talkers and generated in 36 TTS voices. In aggregate, wav2vec 2.0 performance is similar to human listener performance. Broken down by vowel type, both wav2vec 2.0 and listeners perform better on non-nasalized vowels produced naturally by humans, but wav2vec 2.0 shows higher correct classification for nasalized than for non-nasalized vowels from TTS voices. Speaker-level patterns reveal that listeners' use of coarticulation is highly variable across talkers; wav2vec 2.0 also shows cross-talker variability in performance. Analyses also reveal differences in the use of multiple acoustic cues in nasalized-vowel classification across listeners and wav2vec 2.0. The findings have implications for understanding how coarticulatory variation is used in speech perception and can provide insight into how neural systems learn to attend to the unique acoustic features of coarticulation.
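The abstract does not describe how wav2vec 2.0 was turned into a classifier, so the following is a hedged sketch of one standard probing setup: mean-pool the model's frame-level hidden states over each vowel interval and train a linear probe on the pooled vectors. The checkpoint name, pooling choice, and logistic-regression probe are assumptions, not the study's pipeline.

```python
# Hedged sketch of a linear probe over wav2vec 2.0 features for vowel
# classification (one common setup; not necessarily the paper's method).
import numpy as np
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model
from sklearn.linear_model import LogisticRegression

extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base")
model = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base").eval()

def embed(waveform: np.ndarray, sr: int = 16000) -> np.ndarray:
    """Mean-pool wav2vec 2.0 hidden states over one excised vowel interval."""
    inputs = extractor(waveform, sampling_rate=sr, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # shape (1, frames, 768)
    return hidden.mean(dim=1).squeeze(0).numpy()

# train_clips / test_clips: lists of 1-D float arrays excised around each
# vowel; labels are vowel categories (hypothetical data loading omitted).
# X_train = np.stack([embed(w) for w in train_clips])
# probe = LogisticRegression(max_iter=1000).fit(X_train, train_labels)
# X_test = np.stack([embed(w) for w in test_clips])
# print(probe.score(X_test, test_labels))
```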