<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Leveraging interdisciplinary perspectives to optimize auditory training for cochlear implant users</title></titleStmt>
			<publicationStmt>
				<publisher></publisher>
				<date>09/01/2020</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10280757</idno>
					<idno type="doi">10.1111/lnc3.12394</idno>
					<title level='j'>Language and Linguistics Compass</title>
					<idno type="issn">1749-818X</idno>
					<biblScope unit="volume">14</biblScope>
					<biblScope unit="issue">9</biblScope>

					<author>Julia R. Drouin</author><author>Rachel M. Theodore</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Over the past 50 years, significant advancements have been made in the clinical management of severe to profound sensorineural hearing loss. Cochlear implants (CIs) are now widely considered to be an effective treatment option for both pediatric and adult patients. CIs are surgically implanted medical devices that use electrical stimulation of the auditory nerve to create the sensation of hearing. In the CI system, acoustic energy is detected by an external microphone and sent to a speech processor that is worn on the head. The speech processor extracts relevant acoustic features and determines electrode stimulation patterns.</p><p>The signal is then transmitted via radio waves to an internal component, which transforms the input into electrical pulses to stimulate the auditory nerve (e.g., <ref type="bibr">Loizou, 1998)</ref>. Current electrode arrays are designed to provide stimulation for up to 22 intra-cochlear electrodes. The way in which electrodes are stimulated varies depending on the selected processing strategy. For example, in a continuous interleaved sampling (CIS) processing strategy, acoustic information is bandpass-filtered based on the number of electrodes, and electrical pulses are generated with amplitudes proportional to the amount of energy contained in each bandpass channel. Electrodes are stimulated in a non-simultaneous fashion to minimize channel interaction (e.g., <ref type="bibr">Loizou, 1998;</ref><ref type="bibr">Loizou, 2006)</ref>. Conversely, an advanced combination encoder (ACE) processing strategy uses a subset of the available electrodes, with stimulation occurring for the frequency bands that contain the largest amplitudes (e.g., <ref type="bibr">Hu &amp; Loizou, 2008)</ref>.</p><p>Listening through an implant is a markedly different experience than acoustic hearing. In the normal-hearing (NH) system, the cochlea contains thousands of hair cells that code fine-grained differences in frequency, which supports a high-resolution representation of auditory information in the environment. Conversely, most CI users have at most 22 electrodes available to represent the auditory signal, which results in a coarser signal representation. As a consequence, the implant does not afford access to the fine-grained spectral resolution that is available to NH listeners. Paradoxically, studies examining speech perception outcomes in CI users have demonstrated that listeners do not necessarily need access to the richness of spectral information present in the natural speech signal in order to achieve reliable speech comprehension: many CI users recognize speech transmitted through the implant, as do NH participants listening to spectrally degraded speech (e.g., <ref type="bibr">Remez, Rubin, Pisoni, &amp; Carrell, 1981;</ref><ref type="bibr">Shannon, Zeng, Kamath, Wygonski, &amp; Ekelid, 1995)</ref>. However, there is tremendous individual variability in speech perception outcomes among CI users. While some users show performance at or near the typical range, many others do not (e.g., <ref type="bibr">Busby, Roberts, Tong, &amp; Clark, 1991;</ref><ref type="bibr">Dawson &amp; Clark, 1997;</ref><ref type="bibr">Niparko et al., 2010)</ref>. A variety of patient-related factors have been examined to determine their relative contributions to speech perception outcomes.</p>
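<p>Before turning to those factors, it is useful to make the difference between the CIS and ACE strategies described above concrete. The sketch below contrasts the two channel-selection schemes in Python; it is a simplified illustration under our own assumptions (the channel count, filter settings, and function names are ours), not a model of any manufacturer's implementation.</p><p><eg><![CDATA[
# Simplified sketch of CIS- vs. ACE-style channel selection.
# Illustrative only: real processors add pre-emphasis, compression maps,
# and patient-specific parameters that are not modeled here.
import numpy as np
from scipy.signal import butter, sosfilt

FS = 16000          # sampling rate (Hz)
N_CHANNELS = 22     # one analysis band per intra-cochlear electrode
N_MAXIMA = 8        # bands kept per frame in an ACE-like "n-of-m" scheme

def filterbank_energies(frame):
    """Bandpass the frame into N_CHANNELS bands; return per-band RMS energy."""
    edges = np.logspace(np.log10(200), np.log10(7000), N_CHANNELS + 1)
    energies = np.empty(N_CHANNELS)
    for ch in range(N_CHANNELS):
        sos = butter(2, [edges[ch], edges[ch + 1]], btype="bandpass",
                     fs=FS, output="sos")
        energies[ch] = np.sqrt(np.mean(sosfilt(sos, frame) ** 2))
    return energies

def cis_stimulation(energies):
    """CIS-like: every electrode gets a pulse whose amplitude tracks its band
    energy; pulses are delivered one electrode at a time (interleaved in time)
    to minimize channel interaction."""
    return [(ch, energies[ch]) for ch in range(N_CHANNELS)]

def ace_stimulation(energies):
    """ACE-like: only the N_MAXIMA highest-energy bands are stimulated."""
    selected = np.sort(np.argsort(energies)[-N_MAXIMA:])
    return [(ch, energies[ch]) for ch in selected]

frame = np.random.default_rng(0).standard_normal(FS // 100)  # a 10-ms frame
e = filterbank_energies(frame)
print(len(cis_stimulation(e)), "pulses (CIS) vs.",
      len(ace_stimulation(e)), "pulses (ACE)")
]]></eg></p><p>In a CIS-like scheme, every channel is stimulated on every frame, whereas an ACE-like scheme re-selects the most energetic subset frame by frame; in both cases only the per-channel envelope is conveyed, consistent with the coarse spectral representation described above.</p>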
<p>Research has found that many factors contribute to patient performance, including age of implantation (e.g., <ref type="bibr">Gantz et al., 1988;</ref><ref type="bibr">Parkin, Stewart, Dankowski, &amp; Haas, 1990;</ref><ref type="bibr">Shea, Domio, &amp; Orchik, 1990)</ref>, pre- vs. post-lingual language status at time of implantation (e.g., <ref type="bibr">Dawson et al., 1992)</ref>, etiology of deafness (e.g., <ref type="bibr">Battmer et al., 1995)</ref>, and residual hearing following implantation (e.g., <ref type="bibr">Gantz et al., 1988;</ref><ref type="bibr">van Dijk, van Olphen, Langereis, Mens, Brokx, &amp; Smoorenburg, 1999;</ref><ref type="bibr">Friedland, Venick, &amp; Niparko, 2003)</ref>. In addition, other work has sought to link parameters of the device itself to outcome performance, including number of active electrodes (e.g., <ref type="bibr">Friesen, Shannon, Baskent, &amp; Wang, 2001)</ref>, coding strategy (e.g., <ref type="bibr">Skinner, Holden, Whitford, Plant, Psarros, &amp; Holden, 2002)</ref>, and insertion depth (e.g., <ref type="bibr">Finley &amp; Skinner, 2009)</ref>. Both patient- and device-related factors have been shown to partially account for the variance observed in patient outcomes following implantation. However, to date there is no comprehensive set of predictors that can reliably account for differences in outcome performance observed with the device. That is, while a substantial portion of our current knowledge base has established the efficacy of the CI, demonstrating a benefit of the device at an implementation level (e.g., <ref type="bibr">Pisoni, Kronenberger, Harris, &amp; Moberly, 2017)</ref>, we have a more limited understanding as to why a CI works well for some patients, yet poorly for others. As a result, the current evidence base does not account for why patients with comparable language status, medical history, demographic background, and device may perform differently with the implant (e.g., <ref type="bibr">Moberly, Bates, Harris, &amp; Pisoni, 2016;</ref><ref type="bibr">Pisoni, Cleary, Geers, &amp; Tobey, 1999)</ref>.</p><p>One account as to why some CI users show poor performance outcomes is the lack of availability of acoustic cues. However, research examining NH participants has found that individuals can recognize speech in the absence of acoustic cues to which they routinely have access. For example, listeners readily demonstrate comprehension of compressed speech <ref type="bibr">(Licklider &amp; Pollack, 1948)</ref>, sine-wave speech (e.g., <ref type="bibr">Remez, Rubin, Pisoni, &amp; Carrell, 1981)</ref>, and noise-vocoded speech (e.g., <ref type="bibr">Shannon et al., 1995)</ref>, suggesting that availability of all acoustic cues is not necessary to achieve comprehension; instead, listeners dynamically learn to adapt to the input to which they are exposed. This is an important consideration in the context of CI rehabilitation, as it demonstrates that listeners do not necessarily need access to the full spectrum of acoustic cues in natural speech because adaptation mechanisms can help facilitate comprehension of acoustically degraded input. Thus, an alternate approach to understanding performance outcomes in CI users might focus not on which acoustic cues are available per se, but rather on how the individual learns to adapt to the input transmitted through the CI.</p><p>One proposed technique to improve listening through a CI is auditory training.
In general, auditory training can be characterized as passive (i.e., incidental) or active. In passive auditory training, listeners learn to adapt to the input transmitted through the CI in an unsupervised and unstructured manner. Passive training is a relatively common recommendation in the clinical domain, particularly for adult patients, as it allows the patient to integrate use of the CI into everyday life. However, passive adaptation may be a time-consuming process, requiring frequent exposure to routine sounds in order to accurately pair an auditory stimulus with its meaning. One study illustrated this point by examining the role of passive learning on adaptation for CI users undergoing an update to their processing strategy, a manipulation that requires even experienced users to adjust the mappings between auditory input and linguistic representations <ref type="bibr">(Fu, Shannon, &amp; Galvin, 2002)</ref>. In this study, the frequency mapping was modified for a group of experienced CI users, who were allowed a three-month period to passively adapt to the new parameters. The CI users showed only minimal adaptation (i.e., no or modest improvements in speech recognition compared to baseline measures) during the period of passive adjustment, though performance did improve with continued experience. These results suggest that unsupervised learning may not promote peak performance, particularly in cases where the CI map is novel or is not optimally configured for the user.</p><p>In contrast, active auditory training engages listeners in listening tasks that target a specific goal, such as phoneme, word, or sentence comprehension. As such, learning can be facilitated using rich contexts and feedback to meet the goal. Active auditory training may be beneficial for all CI users; it can help poor performers by providing structured opportunities to learn meaningful linguistic information, and can aid high performers by providing learning opportunities under difficult listening conditions (e.g., poor signal-to-noise ratio, novel talkers; <ref type="bibr">Fu &amp; Galvin, 2008)</ref>. Moreover, CI manufacturers regularly update processing strategies and technology, requiring even experienced CI users to adapt to novel input. However, few CI users participate in active auditory training as part of rehabilitation, with even fewer programs targeted at post-lingually deafened adults <ref type="bibr">(Fu &amp; Galvin, 2007;</ref><ref type="bibr">Prendergast &amp; Kelley, 2002)</ref>. As a consequence, most CI users adapt to the implant in a passive fashion, though growing research suggests that this form of adaptation may not be sufficient to maximize listeners' performance (e.g., <ref type="bibr">Fu &amp; Galvin, 2008)</ref>.</p><p>Many critical challenges exist for implementing auditory training as part of standardized aural rehabilitation. First, there is significant variability in speech perception outcomes following training; some individuals demonstrate a benefit, while others do not <ref type="bibr">(Fu et al., 2005;</ref><ref type="bibr">Wu, Yang, Lin, &amp; Fu, 2007;</ref><ref type="bibr">Busby et al., 1991)</ref>. As a consequence, it is unclear how recommendations might differ across patients, and whether training may only be beneficial for patients demonstrating poor performance relative to patients who perform at a high level.
In addition, because there are no concrete clinical recommendations on the structure of active training (i.e., tasks, training duration, outcome measures), few clinicians are likely to prioritize training as part of aural rehabilitation given the lack of evidence-based recommendations. Another challenge is that gains are often measured only immediately following training, with few studies examining whether performance is maintained over time, a critical factor to establish for any rehabilitation program. This also contributes to systemic billing challenges for audiologists who do provide training. Finally, even in cases where auditory training appears to promote improvements in speech understanding, the outcome variables (e.g., task accuracy) may not explain why the improvement occurred because the measures do not index mechanistic processing changes. Standard clinical assessments of speech perception outcomes include word (e.g., Maryland CNC lists) and sentence (e.g., AzBio lists) comprehension in quiet and in noise. Using these metrics alone, it is difficult to determine why one patient shows a greater benefit from training relative to a patient who has a similar profile yet gleans little benefit from training. Other physiologic measures may better assess changes at lower processing levels that precede changes in behavioral responses, though the relationship between physiologic changes and mechanistic changes is an area of continued research.</p><p>Here we review current forms of active auditory training, including bottom-up focused training, in which attention is directed towards fine-grained acoustic differences in the signal, and top-down focused training paradigms, in which attention is directed towards the global structure of a sentence and/or contextual cues. Given the nature of this review, our focus is on examining task-specific differences across training types, with the population of interest being post-lingually deafened adults, unless otherwise stated. We adopt an interdisciplinary perspective to bridge findings from the clinical rehabilitation literature with findings from the psycholinguistics domain, highlighting the role of top-down lexical feedback on adaptation to acoustically degraded speech in NH listeners. In this review, we argue that active auditory training may be necessary to optimally target the plasticity mechanisms underlying improvements in speech perception for listeners adapting to a CI.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Current forms of active auditory training in aural rehabilitation</head><p>Formal auditory training is aimed at enhancing auditory skills and improving speech understanding through a variety of listening exercises <ref type="bibr">(Sweetow &amp; Palmer, 2005;</ref><ref type="bibr">Sweetow &amp; Sabes, 2007;</ref><ref type="bibr">Rayes, Malky, &amp; Vickers, 2019)</ref>. It has been argued that auditory training should be a standard part of aural rehabilitation as a way to structure how the listener learns to understand the transmitted signal. However, few CI users engage in formal training, with even fewer programs targeted at post-lingually deafened adults, which may be due to a failure to prioritize training as part of the aural rehabilitation process for these users <ref type="bibr">(Fu &amp; Galvin, 2007;</ref><ref type="bibr">Prendergast &amp; Kelley, 2002)</ref>. Thus, current training protocols are often targeted at the pediatric population, which demonstrates greater degrees of plasticity. While neural structure and function are maximally plastic during the first few years of life, plasticity mechanisms are present throughout the lifespan (e.g., <ref type="bibr">Schramm, Fitzpatrick, &amp; Seguin, 2002;</ref><ref type="bibr">Tong, Busby, &amp; Clark, 1988;</ref><ref type="bibr">Sharma, Gilley, Dorman, &amp; Baldwin, 2007;</ref><ref type="bibr">Most, Shrem, &amp; Duvdevani, 2010)</ref>. Adults continue to learn new skills, but often require more exposure or practice to do so. This principle underlies flexibility in the CI candidacy criteria, which now allow pre-lingually or post-lingually deafened adults the opportunity to receive an implant. Theories of residual plasticity would predict that implanted adults could reach the performance of younger implanted individuals with sufficient and prolonged experience <ref type="bibr">(Kral &amp; Sharma, 2012)</ref>. However, research has shown that late-implanted individuals, in this case pre-lingually deafened participants over the age of 12 who used their CI for an extended period of time (i.e., at least six months post-implantation), showed tremendous variability in open-set sentence recognition <ref type="bibr">(Schramm, Fitzpatrick, &amp; Seguin, 2002)</ref>.</p><p>Of the 15 participants in their sample, 10 showed an improvement from pre- to post-implantation performance on open-set word recognition. However, nine participants performed at or below 40%, a level considered to represent minimal benefit. Thus, not all of the patients showed an improvement post-implantation, and of those patients who did demonstrate improvement, most did not perform at a level consistent with an acceptable benefit. The results of <ref type="bibr">Schramm et al. (2002)</ref> therefore indicate that extended passive listening through a CI may not maximize speech understanding. Studies following newly implanted individuals suggest that most gains occur within the first 3-6 months following implantation <ref type="bibr">(Kessler, Loeb, &amp; Barker, 1995;</ref><ref type="bibr">Spivak &amp; Waltzman, 1990;</ref><ref type="bibr">Waltzman, Cohen, &amp; Shapiro, 1986;</ref><ref type="bibr">Fu &amp; Galvin, 2008)</ref>, with less improvement observed after this time point (e.g., <ref type="bibr">Dorman, Loizou, &amp; Rainey, 1997;</ref><ref type="bibr">Pelizzone, Cosendai, &amp; Tinembart, 1999;</ref><ref type="bibr">Helms et al., 2004)</ref>.
This has prompted discussion in the audiological field that CI users may need other rehabilitative support to fully acclimate to their device (e.g., <ref type="bibr">Fu &amp; Galvin, 2008)</ref>.</p><p>There are two primary approaches to active auditory training for rehabilitation: bottom-up focused training and top-down focused training. Both approaches have the same end goal of improving speech understanding for CI users, though they use different strategies to meet that goal. Bottom-up approaches, also called analytic training, focus on the building blocks of speech perception, including sensitivity to fine-grained acoustic details in the speech signal. Top-down approaches, also called synthetic training, focus on training the listener to use lexical and contextual cues to fill in perceptual gaps while processing speech <ref type="bibr">(Fu &amp; Galvin, 2007)</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">Bottom-up focused auditory training</head><p>The bottom-up approach encourages listeners to focus on the acoustic signal itself. True bottom-up training is designed to promote improvements in processing efficiency of the sensory signal <ref type="bibr">(Fu &amp; Galvin, 2007)</ref>. This is particularly prominent for CI users, as they experience a spectral mismatch in perception because the auditory input is filtered through a limited number of electrodes. While the amplitude envelope within each channel is relatively preserved, the spectral structure of the channel is poorly conveyed (e.g., <ref type="bibr">Shannon et al., 1995;</ref><ref type="bibr">Nelson, Jin, Carney, &amp; Nelson, 2003;</ref><ref type="bibr">Moberly et al., 2014)</ref>. Previous research has demonstrated that NH listeners rely on a variety of spectral and temporal cues for phonemic decisions, including formants (e.g., <ref type="bibr">Hillenbrand, Getty, Clark, &amp; Wheeler, 1995;</ref><ref type="bibr">Neel, 2008)</ref>, formant transitions (e.g., <ref type="bibr">Delattre, Liberman, &amp; Cooper, 1955;</ref><ref type="bibr">Stevens &amp; Klatt, 1974;</ref><ref type="bibr">Stevens &amp; Blumstein, 1978;</ref><ref type="bibr">Walley &amp; Carrell, 1983)</ref>, spectral center (e.g., <ref type="bibr">van Son &amp; Pols, 1997)</ref>, and periodicity (e.g., <ref type="bibr">Cole &amp; Cooper, 1975)</ref>. A critical question is the degree to which CI users rely on the same acoustic cues that NH listeners use to achieve comprehension. Research has found evidence that some CI users demonstrate acoustic cue weighting strategies similar to those of NH listeners. For example, <ref type="bibr">Iverson, Smith, and Evans (2006)</ref> found that vowel recognition in CI users was adversely affected when the speech signal was processed to remove formant movement or equate vowel duration, suggesting that these cues, as with NH listeners, are strongly weighted in perception for CI users. However, there is also evidence suggesting that CI users may weight acoustic cues differently than NH listeners. For example, <ref type="bibr">Hedrick and Carney (1997)</ref> studied four post-lingually deafened CI users to determine the relative contributions of amplitude and formant transition information to consonant-vowel identification. They found that, compared to a NH control group, the CI users consistently relied more on the relative amplitude cues while the NH listeners used both formant structure and amplitude information, suggesting a difference in perceptual weighting strategies between the groups. Similar findings were demonstrated by <ref type="bibr">Moberly et al. (2014)</ref>, who examined labeling patterns of the /b&#593;/-/w&#593;/ contrast in a sample of 20 post-lingually deafened CI users. While NH listeners weighted the spectral structure over amplitude structure exclusively, CI users showed variability in which cues were prioritized. Some users relied on amplitude structure to a greater degree, in line with a theory of shifted perceptual strategies in which CI users weight the acoustic cues that are most saliently delivered.</p>
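<p>Perceptual cue weighting of this sort is commonly quantified by regressing listeners' category responses on the manipulated cue dimensions. The sketch below, using simulated data, shows one such analysis in Python; it illustrates the general logic rather than the specific procedure of the studies cited above, and all variable names and parameter values are our own.</p><p><eg><![CDATA[
# Sketch: estimating relative perceptual cue weights from labeling data.
# The listener data are simulated, and the analysis (logistic regression
# over normalized cue dimensions) is one common approach, not the exact
# procedure used in the studies cited above.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Stimulus grid: a spectral cue (e.g., formant transition) crossed with an
# amplitude cue (e.g., amplitude rise time), both scaled to [0, 1].
spectral, amplitude = np.meshgrid(np.linspace(0, 1, 7), np.linspace(0, 1, 7))
X = np.column_stack([spectral.ravel(), amplitude.ravel()])

# Simulated listener who weights the spectral cue about three times more
# heavily than the amplitude cue when labeling /ba/ vs. /wa/.
logit = 6.0 * (X[:, 0] - 0.5) + 2.0 * (X[:, 1] - 0.5)
y = rng.random(len(logit)) < 1.0 / (1.0 + np.exp(-logit))  # True = "ba"

model = LogisticRegression().fit(X, y)
w_spec, w_amp = np.abs(model.coef_[0])
print(f"relative spectral weight:  {w_spec / (w_spec + w_amp):.2f}")
print(f"relative amplitude weight: {w_amp / (w_spec + w_amp):.2f}")
]]></eg></p><p>Comparing such normalized weights across listeners offers a compact way to describe the variability in weighting strategies noted above.</p>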
<p>Critically, however, <ref type="bibr">Moberly et al. (2014)</ref> found that the CI users who adopted a strategy closer to that of NH listeners, in which spectral structure was weighted over amplitude cues, demonstrated the best word recognition performance overall.</p><p>It is thought that bottom-up training approaches promote changes in how acoustic information is weighted by providing structured training in which listeners engage in tasks that direct attention towards fine-grained, meaningful acoustic cues. A limited number of studies have examined bottom-up auditory training in CI users, and the results have been equivocal. For example, <ref type="bibr">Busby, Roberts, Tong, and Clark (1991)</ref> did not observe significant improvements in vowel perception for a small sample of pre-lingually deafened CI patients who completed 10 one-hour training sessions. Conversely, <ref type="bibr">Dawson and Clark (1997)</ref> examined five CI patients following 10 weeks of bottom-up focused training sessions using explicit vowel training tasks that focused on rhyme generation, discrimination, and identification. They found that four of the five participants showed some improvement in at least one of the trained areas, and one patient showed generalization to novel contexts. Similar results were reported by <ref type="bibr">Fu, Galvin, Wang, and Nogaki (2005)</ref>, who examined 10 CI patients who completed at-home training that was customized based on their baseline performance. Participants completed a three-alternative forced-choice discrimination task. On each trial of this task, participants heard two identical sounds and one outlier sound that differed maximally with respect to acoustic features. Over the course of the experiment, the outlier sound became increasingly similar to the identical sounds, making the task more difficult as the experiment progressed. Participants also completed three-alternative forced-choice vowel, consonant, and nonsense syllable identification tasks that likewise increased in difficulty with improved performance. They found improvements in vowel and consonant recognition, with some participants also showing generalization to sentence recognition tasks.</p><p>Overall, it is difficult to conclude whether strict bottom-up focused training promotes significant improvements in speech comprehension, as these studies suffer from limited sample sizes, document wide variability in benefits across patients, and show limited generalization. However, the remaining low-level acoustic cues available to CI users may be more amenable to auditory training and may play a greater role when attending to speech in acoustically simple environments, like listening in quiet (e.g., <ref type="bibr">Fu &amp; Galvin, 2007)</ref>. A key limitation of the bottom-up approach is that the tasks used during training may be considered less functionally relevant than top-down approaches. Specifically, training focused on recognition of single phonemes is not typically a task required to achieve comprehension under everyday listening conditions. Instead, most listeners must comprehend a continuous stream of sentences in everyday environments and have access to broad contextual cues. Spoken language comprehension occurs through interactive processes at both the low-level acoustic and higher-order linguistic levels.
Future research is needed to explicate the degree to which bottom-up focused training generalizes to more functionally relevant communication tasks, such as single words or sentences.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">Top-down focused auditory training</head><p>Top-down approaches to auditory training emphasize the use of lexical and contextual cues to decode the speech signal. This type of training targets skills that support both auditory processing and language comprehension. While bottom-up approaches first direct attention towards the fine-grained structure of the input, top-down approaches first direct attention towards the global structure of the signal, which is thought to closely mirror how NH listeners learn language <ref type="bibr">(Nittrouer &amp; Caldwell-Tarr, 2016)</ref>. Unlike analytic approaches, the focus of the top-down approach is on using contextual cues to facilitate comprehension. Materials and training tasks for top-down auditory training include transcribing words or sentences under clear or challenging listening conditions, such as in background noise or with multiple talkers.</p><p>One common form of top-down auditory training is connected discourse tracking <ref type="bibr">(DeFilippo &amp; Scott, 1978)</ref>. This type of training requires a talker (sender) and a listener (receiver), along with a set of speech materials for the talker to read. The talker reads the prepared text and the receiver must repeat back exactly what was said, with access to visual and/or auditory information. If the listener makes an error, the talker may paraphrase the materials or prompt the listener with additional cues <ref type="bibr">(Levitt, Waltzman, Shapiro, &amp; Cohen, 1986)</ref>. Clinicians assess the number of words correct per minute and can compare performance in auditory-only, visual-only, or audiovisual conditions to determine the individual contributions of each domain to comprehension. Research utilizing connected discourse tracking has found variable evidence for its efficacy. <ref type="bibr">Levitt et al. (1986)</ref> examined the performance of five CI patients using connected discourse tracking over a 10-week training period. They report that all five subjects showed significant improvements in tracking rate, though the magnitude of learning differed considerably, and none of the participants reached the threshold considered within normal listening levels. The use of connected discourse tracking has been a topic of debate in the audiology domain because the paradigm has many uncontrolled variables (e.g., talker differences, listener differences, text materials, repeated presentations) and the guidelines to compare performance within subjects are not well established <ref type="bibr">(Tye-Murray &amp; Tyler, 1988)</ref>.</p><p>However, top-down focused training paradigms may hold promise for aural rehabilitation, as they are commonly used in the psycholinguistics literature to study adaptation to degraded speech input in NH listeners. In the following section, we review the role of lexical information in speech perception for NH listeners, and highlight findings demonstrating that contextual cues play an influential role in adaptation to atypical speech input.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Lexical influences on speech perception</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Lexical effects in NH listeners</head><p>Everyday listening environments present a challenge for even the most skilled NH individuals because listening often occurs in suboptimal environments and talkers show a high degree of variability in the acoustic realization of speech sounds (e.g., <ref type="bibr">Delattre, Liberman, &amp; Cooper, 1955;</ref><ref type="bibr">Liberman, Cooper, Shankweiler, &amp; Studdert-Kennedy, 1967;</ref><ref type="bibr">Peterson &amp; Barney, 1952;</ref><ref type="bibr">Jusczyk, 1997;</ref><ref type="bibr">Theodore, Miller, &amp; DeSteno, 2009;</ref><ref type="bibr">Newman et al., 2001)</ref>. These findings have been formalized as the "lack of invariance" problem for speech perception, which reflects the fact that there is no one-to-one mapping between the acoustic signal and a given speech sound. However, structured variability is not inherently negative; it can give listeners information about the talker, for example, which listeners can use to dynamically modify the structure of phonetic category representations (e.g., <ref type="bibr">Allen, Miller, &amp; DeSteno, 2003;</ref><ref type="bibr">Theodore, Miller, &amp; DeSteno, 2009;</ref><ref type="bibr">Clayards, Tanenhaus, Aslin, &amp; Jacobs, 2008;</ref><ref type="bibr">Miller, 1994;</ref><ref type="bibr">Drouin, Theodore, &amp; Myers, 2016)</ref>.</p><p>Seminal research has shown that listeners use lexical information to guide speech perception, which is particularly useful when the speech signal is variable, ambiguous, or degraded. Phonemic restoration is one demonstration of a lexical effect on perception, in which listeners fill in missing or degraded input using contextual knowledge. <ref type="bibr">Warren (1970)</ref> first demonstrated that when a cough or tone replaced the first /s/ in a word like legislatures, listeners could not reliably identify when the gap occurred, and many participants reported that the word was fully intact, suggesting an apparent restoration of the missing phoneme. Phonemic restoration has also been observed in listeners with hearing loss <ref type="bibr">(Ba&#351;kent, Eiler, &amp; Edwards, 2010)</ref>. The lexical effect has also been observed in the perception of ambiguous speech input. For example, <ref type="bibr">Ganong (1980)</ref> exposed listeners to stimuli varying along a voice-onset-time continuum where one end of the continuum yielded perception of a lexical item (e.g., kiss or gift), while the other end did not (e.g., giss or kift). He found that listeners were biased to perceive more of the speech continuum as /g/ in the context of /&#618;ft/, where only the /g/ interpretation yields a real English word. However, when the same voice-onset-time variants were placed in the context of /&#618;s/, listeners perceived more of the continuum as /k/, as only the /k/ interpretation yields a real word.</p><p>Collectively, these findings demonstrate how speech perception for NH listeners reflects interactions at both the phonemic and lexical levels, all of which occur with relative ease and within milliseconds as the speech signal unfolds in time. Indeed, many studies on the time course of lexical access have shown that access to even a single phoneme creates lexical competition, supporting activation of lexical candidates consistent with the phonemic input (e.g., <ref type="bibr">Allopenna, Magnuson, &amp; Tanenhaus, 1998)</ref>. For example, <ref type="bibr">Allopenna et al.
(1998)</ref> used a visual world eye-tracking paradigm in which listeners heard a target word (e.g., beaker) while simultaneously viewing an array of pictures containing a cohort competitor (e.g., beetle), a rhyme competitor (e.g., speaker), and an unrelated word (e.g., dolphin). As the first few hundred milliseconds of the target word unfolded over time, the cohort competitor showed the greatest initial competition. As the end of the target word was processed, the rhyme competitor emerged as the dominant competitor. Such patterns have been modeled using interactive accounts of spoken word recognition, such as TRACE, in which lower-level perceptual processing is altered based on top-down lexical feedback that unfolds over time (e.g., <ref type="bibr">McClelland &amp; Elman, 1986;</ref><ref type="bibr">Allopenna et al., 1998)</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Lexical effects in CI users</head><p>An important consideration is how lexical information may be weighted differently depending on the specific listening context. When the quality of the signal is good (e.g., clear speech, minimal background noise), bottom-up information may contribute more to processing. However, when the stimulus quality is poor (e.g., degraded input, background noise), top-down feedback may play a more dominant role in speech processing (e.g., <ref type="bibr">Norris et al., 2003;</ref><ref type="bibr">Fu &amp; Galvin, 2007)</ref>. Under this view, top-down feedback might contribute more to speech processing for CI users because the transmitted signal contains limited spectral information. Strikingly, previous research has shown atypical patterns of lexical influences on speech perception for CI users. For example, recent work on lexical access in CI users using a visual world paradigm has demonstrated significant delays in lexical activation, which may be more pronounced for pre-lingually deafened CI users <ref type="bibr">(Farris-Trimble, McMurray, Cigrand, &amp; Tomblin, 2014;</ref><ref type="bibr">McMurray, Farris-Trimble, &amp; Rigler, 2017)</ref>. Delays in lexical access at the word level may have detrimental, cascading effects at the sentence level because the opportunity to build sentence structure is impaired as the delay lengthens. Indeed, one study found that pediatric CI users do not use sentence context to facilitate word recognition; instead, they appear to process sentences as a string of unrelated words <ref type="bibr">(Conway, Deocampo, Walk, Anaya, &amp; Pisoni, 2014)</ref>. Future research is needed to examine how auditory training for CI users may be used to foster a perceptual system that uses lexical cues to improve processing efficiency. In the next section, we review the role of both lexically oriented and non-linguistic training paradigms in improving perception of degraded speech signals for na&#239;ve NH listeners, work that lays the foundation for optimal translation to the clinical population.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Factors that promote perceptual learning for degraded speech signals</head><p>A common occurrence in everyday listening conditions is the experience of poor speech understanding when initially parsing an acoustically degraded speech signal, which improves with sufficient exposure or sufficient contextual cues. This occurs regularly in the initial activation period for CI patients, who may not perceive speech at activation, yet show perceptual improvements with extended use. Audiologists counsel users on the importance of wearing the device regularly so that the brain learns the new sensory signals transmitted through the CI. This also occurs for NH listeners who encounter a novel talker with an unfamiliar accent or dialect. Despite initial difficulty, listeners report a rapid improvement in their speech understanding with exposure. This phenomenon is referred to in the psychology literature as perceptual learning. Perceptual learning can be defined as adaptive changes in an organism's perceptual system that enhance the system for future interactions <ref type="bibr">(Goldstone, 1998)</ref>.</p><p>In the speech domain, perceptual learning can result in long-lasting changes to the mapping process between the speech signal and phonetic categories <ref type="bibr">(Goldstone, 1998;</ref><ref type="bibr">Norris, Cutler, &amp; McQueen, 2003)</ref>. A body of research has studied perceptual learning in NH listeners in lab-based settings to examine the factors that underlie improvements in speech understanding. Across many studies and different forms of acoustic degradation, research has demonstrated that NH listeners consistently rely on lexical information to promote improvements in comprehension of challenging speech signals, and that improvements can be achieved through a brief training period.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">A role for lexical information</head><p>One example of the influence of lexical information on speech learning comes from the lexically guided perceptual learning paradigm. In this paradigm, listeners complete a brief training exposure phase followed by a test phase. During the training phase, participants listen to words and nonwords and complete a lexical decision task <ref type="bibr">(Norris et al., 2003;</ref><ref type="bibr">Kraljic &amp; Samuel, 2005;</ref><ref type="bibr">Drouin et al., 2016;</ref><ref type="bibr">Drouin et al., 2018)</ref>. Critically, listeners are exposed to an atypical production during the training phase, such as a fricative that is spectrally ambiguous between /s/ and /&#643;/. For one group of listeners (e.g., the /s/-bias group), the ambiguous sound replaces the medial /s/ in words that normally contain an /s/ (e.g., dinosaur). For a different group of listeners (e.g., the /&#643;/-bias group), the same ambiguous sound replaces the medial /&#643;/ in words that normally contain an /&#643;/ (e.g., publisher). Accordingly, lexical context can be used to resolve the ambiguity in the atypical productions. In the test phase, listeners categorize items from a nonword continuum (e.g., /&#593;&#643;i/-/&#593;si/) to assess changes in the mapping between speech acoustics and the phonetic categories. The standard result in this paradigm is that performance differs between the two exposure groups at test in line with their exposure during training; specifically, listeners in the /s/-bias training group categorize more test items as /s/ compared to listeners in the /&#643;/-bias group. This finding demonstrates a persistent influence of lexical context on speech perception, even when disambiguating lexical context is removed. It has been confirmed that this effect is explicitly driven by lexical context, as learning is absent when the ambiguous sound is placed in a nonword context during exposure (e.g., <ref type="bibr">Norris et al., 2003)</ref>.</p><p>Lexically guided perceptual learning is also observed for listeners learning to adapt to noise-vocoded speech. Noise-vocoding is a digital signal manipulation that is frequently used in the research domain to approximate the auditory experience of an individual using a CI. Noise-vocoding consists of dividing the natural speech signal into frequency bands, extracting the amplitude envelope of each band, and replacing the fine structure in each band with noise (e.g., <ref type="bibr">Faulkner, Rosen, &amp; Smith, 2000;</ref><ref type="bibr">Loizou, Dorman, &amp; Tu, 1999;</ref><ref type="bibr">Shannon et al., 1995;</ref><ref type="bibr">Davis et al., 2005)</ref>. This manipulation has been useful in research studies to determine the spectral resolution that is necessary for proficient speech comprehension. Standard noise-vocoded learning paradigms consist of a pre-test, or baseline measure of speech understanding, followed by a brief period of linguistic training with feedback. A post-test transcription measure is then compared to baseline to gauge improvement as a function of training.</p>
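<p>The vocoding manipulation just described can be summarized in a few lines of signal processing. The sketch below is a minimal channel vocoder in Python, offered as an illustration under our own assumptions; published studies vary in band spacing, filter order, and envelope smoothing, and the settings here are not those of any particular study.</p><p><eg><![CDATA[
# Minimal noise-vocoder sketch: divide speech into frequency bands, extract
# each band's amplitude envelope, and use the envelope to modulate
# band-limited noise. Filter settings here are illustrative choices.
import numpy as np
from scipy.signal import butter, sosfilt

def noise_vocode(speech, fs, n_bands=4, env_cutoff=160.0):
    edges = np.logspace(np.log10(100.0), np.log10(min(8000.0, fs / 2 - 1)),
                        n_bands + 1)
    env_sos = butter(2, env_cutoff, btype="lowpass", fs=fs, output="sos")
    noise = np.random.default_rng(0).standard_normal(len(speech))
    out = np.zeros(len(speech))
    for lo, hi in zip(edges[:-1], edges[1:]):
        band_sos = butter(3, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfilt(band_sos, speech)
        envelope = np.clip(sosfilt(env_sos, np.abs(band)), 0.0, None)
        carrier = sosfilt(band_sos, noise)     # band-limited noise carrier
        out += envelope * carrier              # fine structure replaced by noise
    return out / (np.max(np.abs(out)) + 1e-9)  # normalize to unit peak

# Example: vocode one second of a synthetic harmonic complex at 16 kHz.
fs = 16000
t = np.arange(fs) / fs
speech_sig = sum(np.sin(2 * np.pi * f * t) for f in (220, 440, 660))
vocoded = noise_vocode(speech_sig, fs, n_bands=4)
]]></eg></p><p>With a small number of bands, the output preserves the slow temporal envelope cues while discarding fine spectral structure, which is the trade-off that makes vocoded speech a useful laboratory analogue of CI listening.</p>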
<p>Numerous training studies have found that while perception of noise-vocoded speech is poor at initial exposure, listeners show robust perceptual learning with only minutes of training exposure, and this learning generalizes across sentences (e.g., <ref type="bibr">Loebach, Bent, &amp; Pisoni, 2008)</ref>, talkers <ref type="bibr">(Huyck, Smith, Hawkins, &amp; Johnsrude, 2017)</ref>, and stimulus types (e.g., <ref type="bibr">Hervais-Adelman, Davis, Johnsrude, &amp; Carlyon, 2008;</ref><ref type="bibr">Loebach, Pisoni, &amp; Svirsky, 2009)</ref>.</p><p>Previous research suggests that perceptual learning of noise-vocoded speech, like that of other atypical input, is strongly influenced by stimulus lexicality and feedback. In a series of experiments, <ref type="bibr">Davis et al. (2005)</ref> trained NH listeners on noise-vocoded sentences composed of either words or nonwords. During training, listeners were provided with interleaved feedback in which they heard the noise-vocoded sentence, then heard the clear version, and finally heard the same noise-vocoded sentence repeated again. At test, participants were asked to report as many words as possible and no feedback was provided. They found that listeners trained with vocoded sentences composed of words performed significantly better than those trained with nonword sentences. In fact, the participants trained with nonword sentences were indistinguishable from na&#239;ve listeners with no training, highlighting the role of lexical context in facilitating comprehension of degraded input.</p><p>In addition to stimulus lexicality, research has also examined the role of lexical feedback in perceptual learning of noise-vocoded speech. Listeners do not necessarily require explicit feedback to learn noise-vocoded speech <ref type="bibr">(Davis et al., 2005)</ref>; however, feedback may allow learning to occur more efficiently. Studies have examined the roles of both the type of feedback and the time course of feedback in learning noise-vocoded speech. Top-down approaches to adaptation posit that learning occurs via a comparison process between a given input and its target representation (e.g., <ref type="bibr">Norris et al., 2003;</ref><ref type="bibr">Mirman, McClelland, &amp; Holt, 2006)</ref>. Providing explicit feedback as to what the target item is can allow ambiguous representations to be adjusted to reflect the intended item <ref type="bibr">(Hervais-Adelman et al., 2008)</ref>. Research has shown that both written and auditory feedback are effective during training for impoverished input <ref type="bibr">(Davis et al., 2005;</ref><ref type="bibr">Schwab, Nusbaum, &amp; Pisoni, 1985;</ref><ref type="bibr">Greenspan, Nusbaum, &amp; Pisoni, 1988)</ref> and that perceptual learning occurs more rapidly for listeners who receive feedback prior to hearing the stimulus, suggesting that knowing the identity of the target allows learning to proceed more efficiently <ref type="bibr">(Davis et al., 2005;</ref><ref type="bibr">Hervais-Adelman et al., 2005)</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Alternate routes to perceptual learning</head><p>The architecture of the TRACE model of spoken word recognition <ref type="bibr">(McClelland &amp; Elman, 1986</ref>) is consistent with a facilitative role for top-down auditory training on speech perception. In this framework, engaging the lexicon during training allows for improved tuning of the mapping from the sensory signal to prelexical representations through a top-down feedback mechanism. Top-down approaches to adaptation posit that learning occurs via a comparison process between a given input and a target representation (e.g., <ref type="bibr">Norris et al., 2003)</ref>. Providing explicit lexical feedback as to what the target item is can allow potentially ambiguous representations to be adjusted to reflect the intended item, leading to a refinement in the mapping process with experience <ref type="bibr">(Hervais-Adelman et al., 2005)</ref>. Support for this framework comes from the finding that perceptual learning of noise-vocoded speech is absent for listeners who are trained with nonword sentences <ref type="bibr">(Davis et al., 2005)</ref>, in line with other findings from the psycholinguistic domain (e.g., <ref type="bibr">Norris et al., 2003)</ref>. However, this finding did not replicate for listeners trained on single nonwords, suggesting that when the memory load is sufficiently reduced, listeners can utilize non-linguistic information for speech adaptation. Nevertheless, lexical context may maximize perceptual learning for vocoded speech, as the magnitude of learning was smaller for single-word training (e.g., <ref type="bibr">Hervais-Adelman et al., 2008)</ref> than for training with sentences (e.g., <ref type="bibr">Davis, Johnsrude, Hervais-Adelman, Taylor, &amp; McGettigan, 2005)</ref>. Research has shown that both written and auditory lexical feedback are effective for training with impoverished input <ref type="bibr">(Davis et al., 2005;</ref><ref type="bibr">Schwab, Nusbaum, &amp; Pisoni, 1985;</ref><ref type="bibr">Greenspan, Nusbaum, &amp; Pisoni, 1988)</ref> and that perceptual learning occurs more rapidly for listeners who receive feedback prior to hearing the stimulus, which suggests that knowing the identity of the target can accelerate learning <ref type="bibr">(Davis et al., 2005;</ref><ref type="bibr">Hervais-Adelman et al., 2005)</ref>. This finding is in line with the TRACE supervised learning framework, which allows feedback to propagate between processing layers to integrate available contextual information into the percept <ref type="bibr">(McClelland &amp; Elman, 1986;</ref><ref type="bibr">Davis et al., 2005;</ref><ref type="bibr">Hervais-Adelman et al., 2005)</ref>. Under this framework, lexically rich contexts during training promote an optimal environment in which to learn degraded input.</p><p>An alternative account posits that learning may be mediated by the similarity of the tasks used during training and test. In a transfer-appropriate processing (TAP) framework (e.g., <ref type="bibr">Franks et al., 2000)</ref>, performance is maximized when the task used to assess learning at test is identical to the task used during training. Under this theoretical framework, learning occurs because of a match between training and test tasks. Support for this view comes from studies demonstrating perceptual learning on tasks that mirror the test task (i.e., <ref type="bibr">Davis et al., 2005)</ref>.</p>
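<p>Before weighing these competing accounts, it may help to make the top-down feedback mechanism concrete. The toy simulation below is a minimal sketch in the spirit of interactive-activation models such as TRACE, not the model itself; the two-unit lexicon and parameter values are our own choices. It shows how lexical feedback can bias an acoustically ambiguous onset, mirroring the Ganong-style effect reviewed earlier.</p><p><eg><![CDATA[
# Toy interactive-activation sketch (in the spirit of TRACE, not the actual
# model): lexical units consistent with the context feed activation back to
# their constituent phonemes, biasing an ambiguous onset.
import numpy as np

def settle(bottom_up, lexical_support, n_iter=30, up=0.3, down=0.15, decay=0.1):
    """bottom_up: acoustic evidence for each candidate onset (e.g., /g/, /k/).
    lexical_support: 1.0 if that onset completes a real word in the current
    frame, else 0.0."""
    phon = np.array(bottom_up, dtype=float)     # phoneme-level activations
    words = np.zeros_like(phon)                 # one word candidate per onset
    for _ in range(n_iter):
        words += up * phon * lexical_support - decay * words    # bottom-up
        phon += down * np.maximum(words, 0.0) - decay * phon    # top-down
        phon = np.clip(phon, 0.0, 1.0)
    return phon

# An onset ambiguous between /g/ and /k/ (equal acoustic evidence) heard in
# the frame _ift: only "gift" is a word, so only /g/ receives lexical
# feedback, and the percept settles toward /g/.
g_act, k_act = settle(bottom_up=[0.5, 0.5], lexical_support=np.array([1.0, 0.0]))
print(f"/g/ activation: {g_act:.2f}   /k/ activation: {k_act:.2f}")
]]></eg></p><p>In this caricature, the lexically supported onset is driven to ceiling while its competitor decays, which is the sense in which engaging the lexicon during training could refine the mapping from degraded input to prelexical representations.</p>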
<p>Advocates of the TAP view thus argue that linguistically focused training tasks are necessary to observe a linguistic benefit. This view has been challenged by findings that demonstrate a linguistic benefit using training tasks that direct attention towards non-linguistic characteristics of the signal. For example, recent work has demonstrated perceptual learning of noise-vocoded speech using non-linguistic training tasks that differ from the test task <ref type="bibr">(Loebach, Bent, &amp; Pisoni, 2008)</ref>. In this study, three groups of NH listeners heard noise-vocoded sentences produced by multiple talkers during a training period. During training, one group was asked to transcribe the sentence, one group was asked to identify the talker, and another group was asked to identify the gender of the talker. All listeners received feedback in line with the assigned task. At test, all participants, regardless of training group, were asked to complete a transcription task to measure language comprehension. The results showed that perceptual learning was equivalent for listeners trained with either the linguistic transcription task or the non-linguistic talker identification task. This finding suggests that a linguistic benefit can be obtained regardless of whether the training phase mirrors the task used during test, and it is in line with other forms of perceptual learning for adaptation to ambiguous speech sounds (e.g., <ref type="bibr">Drouin et al., 2018)</ref>.</p><p>The findings from <ref type="bibr">Loebach et al. (2008)</ref> can be interpreted within a depth-of-processing framework (e.g., <ref type="bibr">Craik &amp; Lockhart, 1972)</ref>. In this framework, perceptual learning occurs if the task used during training is sufficiently challenging that the listener needs to attend to fine-grained acoustic information in order to accurately perform the task. Under this view, perceptual learning is directly mediated by the attentional requirements of the task and thus can be achieved using a variety of top-down or bottom-up focused training strategies, so long as the listener is sufficiently challenged to engage with the stimuli. Support for this view comes from the <ref type="bibr">Loebach et al. (2008)</ref> finding that perceptual learning, as measured through a linguistic transcription task, was equally robust for the two groups of listeners who completed a challenging task during training (i.e., lexical transcription and talker identification), but was less robust for listeners who completed the easier gender identification task during training. They argue that this finding reflects the attentional requirements of the training task. Specifically, the talker identification and transcription tasks were more cognitively challenging than the gender identification task, which resulted in a disparity in how listeners encoded and learned the novel input. Together, these findings suggest that the similarity between training and test tasks, as well as the attentional requirements of the task, are important factors to consider when designing adaptation studies where the outcome goal is comprehension.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Translating principles of learning to auditory training for CI users</head><p>Predicting variability in speech perception outcomes following implantation remains a challenge in the CI literature. Here we suggest that active auditory training may offer a means to maximize adaptation for both low and high performers. In this review, we have focused on findings from the psycholinguistics literature demonstrating that robust perceptual learning can occur even in the face of acoustic degradation. Human speech perception is a highly plastic and adaptive process that allows listeners to compensate for a variety of listening conditions with relative ease. The conditions that promote ease of understanding for atypical speech input have been the primary focus of this review. We have examined the influential effects of lexical knowledge on processing impoverished input for NH listeners, with a focus on how active top-down feedback can propagate to improve the mapping between a sensory signal and prelexical representations. This is an important consideration with respect to the rehabilitation literature because it suggests that passive listening through the implant, without engagement in active training, may not sufficiently maximize opportunities to improve perception. Translational research characterizing how CI users benefit from lexical context, and how lexically driven training approaches promote improvements in perception, is needed to begin to formalize concrete rehabilitation recommendations. While many outstanding questions remain, the literature reviewed here for NH listeners lays the foundation for establishing a benefit of lexical context and feedback for degraded speech signals, raising the possibility that modeling training paradigms in this way for CI users may promote similar benefits. If lexical context is indeed the factor driving perceptual learning of degraded speech input transmitted through a CI, then listeners trained using lexically oriented paradigms may show the greatest long-term benefit from their device.</p><p>We have also reviewed alternate routes to perceptual learning under different theoretical frameworks and considered how task-based listening strategies may mediate learning outcomes. This is particularly important when considering how to design optimal training paradigms. If training using cognitively easier tasks (e.g., talker identification instead of lexical transcription) promotes equivalent learning outcomes on linguistic tasks, then it opens the opportunity to better customize training to patient needs. An understudied area in the current research domain, with respect to both bottom-up and top-down focused listening strategies, is the degree to which gains are maintained over the long term. Only through longitudinal designs may we better document potential differences among training groups. Namely, would two groups of listeners trained using different tasks show equivalent performance in the short term, but differ with respect to the long-term maintenance of gains? Currently, we do not have the evidence base to definitively answer this question, which contributes to hesitation about recommending a specific training protocol.</p><p>Finally, there is a significant need to better understand how individual patient factors interact with training outcomes and to characterize differences in plasticity across patient profiles.
Even within the NH population, there remains significant variation in how individuals adapt to acoustically poor input. While most training studies have focused exclusively on outcomes at the group level, characterizing individual differences in learning is emerging as an active area of research. As reviewed, both bottom-up and top-down processes drive perceptual adaptation to acoustically poor input; therefore, differences may emerge from variation in how listeners code the low-level sensory signal or in how listeners integrate higher-order contextual cues in the mapping process. Could customized training programs be used to specifically target weaknesses in either domain to close the gap between high and low performers? We propose that the next phase of auditory training research should be aimed at examining the use of novel outcome measures, long-term assessments, training variables (e.g., task, duration, time of intervention), and individual difference metrics. By establishing these parameters, scientists and clinicians will be better positioned to make recommendations about the conditions that promote general speech learning, as well as about how training paradigms might be customized to the needs of individual CI users.</p></div>		</body>
		</text>
</TEI>
