<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>The role of speech fidelity in the irrelevant sound effect: Insights from noise-vocoded speech backgrounds</title></titleStmt>
			<publicationStmt>
				<publisher></publisher>
				<date>01/01/2018</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10078770</idno>
					<idno type="doi">10.1177/1747021817739257</idno>
					<title level='j'>Quarterly Journal of Experimental Psychology</title>
<idno>1747-0218</idno>
<biblScope unit="volume">71</biblScope>
<biblScope unit="issue">10</biblScope>					

					<author>Josh Dorsi</author><author>Navin Viswanathan</author><author>Lawrence D Rosenblum</author><author>James W Dias</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[The Irrelevant Sound Effect (ISE) is the finding that background sound impairs accuracy for visually presented serial recall tasks. Among various auditory backgrounds, speech typically acts as the strongest distractor. Based on the changing-state hypothesis, speech is a disruptive background because it is more complex than other nonspeech backgrounds. In the current study, we evaluate an alternative explanation by examining whether the speech-likeness of the background (speech fidelity) contributes, beyond signal complexity, to the ISE. We did this by using noise-vocoded speech as a background. In Experiment 1, we varied the complexity of the background by manipulating the number of vocoding channels. Results indicate that the ISE increases with the number of channels, suggesting that more complex signals produce greater ISEs. In Experiment 2, we varied complexity and speech fidelity independently. At each channel level, we selectively reversed a subset of channels to design a low-fidelity signal that was equated in overall complexity. Experiment 2 results indicated that speech-like noise-vocoded speech produces a larger ISE than selectively reversed noise-vocoded speech. Finally, in Experiment 3, we evaluated the locus of the speech-fidelity effect by assessing the distraction produced by these stimuli in a missing-item task. In this task, even though noise-vocoded speech disrupted task performance relative to silence, neither its complexity nor speech fidelity contributed to this effect. Together, these findings indicate a clear role for speech fidelity of the background beyond its changing-state quality and its attention capture potential.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The Irrelevant Sound Effect (ISE) is the observation that irrelevant background sounds, such as speech or tones, reduce the accuracy of serial recall, relative to background noise or silence <ref type="bibr">(Colle &amp; Welsh, 1976;</ref><ref type="bibr">Jones &amp; Macken, 1993;</ref><ref type="bibr">Salam&#233; &amp; Baddeley, 1987)</ref>. For example, participants in a typical ISE paradigm view a sequence of letters or numbers appearing one at a time while a background sound is presented through headphones or speakers (e.g., <ref type="bibr">Colle &amp; Welsh, 1976;</ref><ref type="bibr">Jones &amp; Macken, 1993;</ref><ref type="bibr">Salam&#233; &amp; Baddeley, 1987)</ref>. Even when participants are instructed to ignore these backgrounds, recall of the visually presented sequence is impaired <ref type="bibr">(Colle &amp; Welsh, 1976;</ref><ref type="bibr">Jones &amp; Macken, 1993;</ref><ref type="bibr">Salam&#233; &amp; Baddeley, 1987)</ref>. Interestingly, the effect of speech is found even for foreign speech or non-words <ref type="bibr">(Colle &amp; Welsh, 1976;</ref><ref type="bibr">Jones, Miles, &amp; Page, 1990;</ref><ref type="bibr">Salam&#233; &amp; Baddeley, 1982)</ref>.</p><p>These early findings highlighted the utility of the ISE for studying short-term memory and for understanding auditory distraction, and thus have motivated a sizable literature. For example, the ISE has been cited as strong evidence for the Working Memory Model <ref type="bibr">(Baddeley &amp; Hitch, 1974)</ref>. On this model, the phonological loop of working memory maintains the serial order of targets (letters or numbers), while speech gains automatic access to this memory system <ref type="bibr">(Salam&#233; &amp; Baddeley, 1987)</ref>. 
The finding that non-speech auditory stimuli, such as music or even tones, can also produce the ISE challenges this model's speech-specificity assumption <ref type="bibr">(Jones, 1993;</ref><ref type="bibr">Jones &amp; Macken, 1993;</ref><ref type="bibr">Salame &amp; Baddeley, 1989)</ref>.</p><p>Such findings prompted researchers to investigate the general properties of sound, not specific to speech, that affect serial recall accuracy. This investigation revealed the changing-state effect: serial recall accuracy changes inversely with the number of perceived auditory segments in the background, a characteristic known as the sound's changing-state quality <ref type="bibr">(Jones &amp; Macken, 1993;</ref><ref type="bibr">Macken, 2014)</ref>. For example, an irrelevant background of different tones will impair serial recall accuracy more than a single repeating tone (e.g., <ref type="bibr">Jones &amp; Macken, 1993)</ref>. This finding motivated the changing-state hypothesis, explaining that this pattern of effects occurs because, relative to the different tone condition, the changing-state nature of the single repeating tone is reduced. This highlights the essential stimulus-to-disruption relationship of the changing-state hypothesis; the ISE corresponds to the functional (acoustic/perceptual) complexity of the signal (henceforth the signal's 'changing-state complexity').</p><p>The changing-state hypothesis has the benefit of accounting for nonspeech effects in the ISE by suggesting that the ISE is the result of conflict between two serial processes; the process involved in the focal serial recall task, and the organisation of the auditory objects (e.g., <ref type="bibr">Jones &amp; Macken, 1993;</ref><ref type="bibr">Macken, 2014)</ref>. Under this account, backgrounds with alternating tones are organised resulting in cues indicating the order of the changes in a sound sequence. 
These order cues then interfere with the maintenance of the serial order of the to-be-remembered items.</p><p>Despite this parsimony, this account has a notable limitation: a growing literature suggests that the content of speech, in addition to its changing-state complexity, can influence the amount of serial recall disruption. For example, participants have lower serial recall accuracy when the irrelevant background contains their name (e.g., <ref type="bibr">R&#246;er, Bell, &amp; Buchner, 2013)</ref>. Similarly, negative valence words such as "Apathetic" produce greater disruption than do neutral valence words such as "Curious" (Buchner, Rothermund, Wentura, &amp; Mehl, 2004). Such findings challenge the notion that the general changing-state complexity of an auditory signal is the sole cause of the ISE.</p><p>A recent account of the ISE that can reconcile the reliable changing-state effect with the effects of speech content is the duplex-mechanism account offered by <ref type="bibr">Hughes (2014;</ref> see also <ref type="bibr">Hughes, Vachon, &amp; Jones, 2007)</ref>. This account assumes that irrelevant sound disrupts serial recall at two loci: the "interference-by-process" and the "attention-capture" mechanisms. Here, the interference-by-process mechanism is the same mechanism invoked by the changing-state hypothesis (discussed above). The other mechanism, attention capture, can be "specific" when the background sound is "&#8230; meaningful or of interest to the individual" <ref type="bibr">(Hughes, 2014, p. 31)</ref>, as when the background contains the listener's name (e.g., <ref type="bibr">R&#246;er et al., 2013)</ref>. Attention capture may also be "aspecific," when a sound token is meaningless on its own, but differences between it and the other tokens in the irrelevant sound cause it to exogenously capture attention <ref type="bibr">(Hughes, 2014)</ref>. 
According to Hughes, aspecific attention capture requires that an item within the auditory stimulus violate the listeners' expectations <ref type="bibr">(Hughes, 2014)</ref>. For example, aspecific attention capture may occur when a single word is spoken by a male voice within a stream of words produced by a female speaker <ref type="bibr">(Hughes et al., 2007</ref>; see also <ref type="bibr">Hughes, 2014)</ref>.</p><p>The changing-state hypothesis and the duplex-mechanism account share the basic approach of comparing serial recall disruption associated with different backgrounds in order to make inferences about the structure of underlying cognitive mechanisms. Other researchers have focused on the structure of the irrelevant stimulus to identify acoustic characteristics shared across backgrounds in order to offer a more precise explanation of "changing-state complexity" or to otherwise better characterise the stimulus-to-recall disruption relationship. This research revealed that even though both speech and nonspeech backgrounds produce the ISE, many of the largest disruptive effects are produced by speech. For instance, a recent study systematically examined ISEs produced by 40 different auditory backgrounds from several studies conducted within a single lab. These backgrounds included speech, tones, music, and traffic and office noise <ref type="bibr">(Schlittmeier, Weissgerber, Kerber, Fastl, &amp; Hellbr&#252;ck, 2012)</ref>. Remarkably, across these diverse backgrounds, speech produced the largest ISE <ref type="bibr">(Schlittmeier et al., 2012)</ref>. 
Compatibly, a recent analysis that compared the ISE reported for several different types of background sounds also found that speech, including foreign, reversed, and laboratory transformations of speech, is consistently more disruptive than non-speech backgrounds <ref type="bibr">(Ellermeier &amp; Zimmer, 2014)</ref>.</p><p>The question of why speech backgrounds produce the strongest ISE is the focus of the current study. From a changing-state perspective, the potency of speech is attributed to its greater changing-state complexity relative to nonspeech backgrounds. For instance, <ref type="bibr">Tremblay, Nicholls, Alford, and Jones (2000)</ref> used sinewave-speech to investigate the role of speech perception in the ISE. Sinewave-speech is an acoustic transformation of natural speech that preserves the spectrotemporal relationships of the speech signal in a series of time-varying sinusoids (see <ref type="bibr">Remez, Rubin, Pisoni, &amp; Carell, 1981</ref>). An interesting quality of sinewave-speech is that listeners may hear it as either speech or non-speech. While naive listeners may report that sinewave-speech sounds like computer beeps or bird sounds, listeners informed about the nature of sinewave-speech can perceive its linguistic content <ref type="bibr">(Remez et al., 1981)</ref>. <ref type="bibr">Tremblay et al. (2000)</ref> investigated whether the ISE was dependent on whether perceivers identified the irrelevant sound as speech or non-speech. This permitted them to equate changing-state complexity of the signal while examining the effect of different percepts. Their results demonstrated that irrespective of training, both groups showed the same level of serial recall disruption. 
The authors concluded that the ISE produced by sinewave-speech was driven by its changing-state complexity and was independent of speech-likeness.</p><p>In a follow-up study, <ref type="bibr">Viswanathan, Dorsi, and George (2014a)</ref> investigated this conclusion further. First, they noted that the sinewave-speech signal preserved the acoustic structure of speech irrespective of how listeners were trained to perceive it. In other words, regardless of whether the sinewave-speech was identified as speech, the acoustic structure was still lawfully related to meaningful articulatory (speech) gestures. To determine if the acoustic structure produced by articulation (speech fidelity) or changing-state complexity was responsible for the ISE, <ref type="bibr">Viswanathan et al. (2014a)</ref> created a special type of sinewave-speech in which they reversed two of the three sinusoids that made up the sinewave-speech signal (also see <ref type="bibr">Viswanathan, Magnuson, &amp; Fowler, 2014b)</ref>. This manipulation disrupted the dynamic time-varying acoustic structure of the speech stimuli, reducing the lawful relationship to natural speech, while preserving acoustic complexity. In other words, sinewave-speech contains both speech and changing-state complexity information, while selectively reversed sinewave-speech contains only changing-state complexity information. By comparing the effects of sinewave-speech, which maintained the signal's fidelity and its changing-state complexity, to selectively reversed sinewave-speech, which maintained its complexity but not its speech fidelity, the researchers isolated the effect of speech structure. Their results showed that higher speech-fidelity backgrounds (sinewave-speech) are more disruptive than lower (no) speech-fidelity ones (selectively reversed sinewave-speech), indicating that speech fidelity of the acoustic signal, beyond changing-state complexity, contributes to the disruptive properties of speech in the ISE. 
While this study offers preliminary evidence, because it did not independently manipulate complexity, it does not conclusively indicate that speech fidelity always contributes to the ISE. It is possible that the effect of speech fidelity is only critical in reduced signals like 3-formant sinewaves.</p><p>Taken together, the studies reviewed above prompt the critical question: why is speech more disruptive than other background sounds? The goal of the current study is to evaluate the effects of speech fidelity (as was done previously by <ref type="bibr">Viswanathan et al., 2014a)</ref> and speech signal complexity on the ISE using a different transformation of the speech signal that allows speech fidelity and changing-state complexity to be manipulated independently. To do this, we used noise-vocoded speech as the irrelevant background during a series of serial recall tasks. Noise-vocoded speech is a transformation of natural speech that is generated by dividing speech into frequency channels, mapping the intensity variation within each channel, and then applying these intensity variations to corresponding channels in white noise <ref type="bibr">(Davis, Johnsrude, Hervais-Adelman, Taylor, &amp; McGettigan, 2005;</ref><ref type="bibr">Shannon, Zeng, Kamath, Wygonski, &amp; Ekelid, 1995)</ref>. Despite lacking the fine spectral detail of natural speech, noise-vocoded speech preserves its amplitude variations and can still be intelligible <ref type="bibr">(Shannon et al., 1995)</ref>. Prior research demonstrates that increasing the number of channels in noise-vocoded speech, from 1 to 20 channels, makes it more disruptive to serial recall <ref type="bibr">(Ellermeier, Kattner, Ueda, Doumoto, &amp; Nakajima, 2015</ref>; see also <ref type="bibr">W&#246;stmann &amp; Obleser, 2016)</ref>.</p><p>While the work of <ref type="bibr">Ellermeier et al. 
(2015)</ref> shows that increases in channel number increase the ISE, their study only used 1, 2, 4, and 20 channelled noise-vocoded speech.</p><p>In Experiment 1, we investigate the effect of frequency channel number on the ISE further by examining the channels 3, 6, 9, and 12; spanning the lower range of intelligible noise-vocoded speech. While manipulating the number of vocoding channels in the background allows Experiment 1 to control the changing-state complexity of the backgrounds, this manipulation does not allow increasing changing-state complexity to be dissociated from increased speech-fidelity. Thus, in Experiment 2, we apply the selective-reversal process used in <ref type="bibr">Viswanathan et al. (2014)</ref> to noise-vocoded speech to determine if the effect of channel number is independent of the effect of speech fidelity. In the context of the duplex mechanism account, it is not clear from Experiment 2 if this dissociation results from the interference-by-process or attention-capture mechanisms. To determine the locus of the effect of speech fidelity on the ISE, Experiment 3 presents the typical and selectively reversed noise-vocoded speech from Experiment 2 in the context of a missing-item task (e.g., <ref type="bibr">Hughes et al., 2007)</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Experiment 1</head><p>Experiment 1 investigates the effect of the number of channels in noise-vocoded speech on the ISE. Decreases in serial recall accuracy associated with increasing channel number (e.g., <ref type="bibr">Ellermeier et al., 2015)</ref> may be the result of increased changing-state complexity as well as increased speech fidelity of the signal. This is because higher channel noise-vocoded speech preserves more speech information, as evident from its greater intelligibility. To better understand the influence of channel number (changing-state complexity) and speech-fidelity on the ISE, we chose 3, 6, 9, and 12 channelled noise-vocoded speech as an irrelevant background. This range of noise-vocoding channels was selected because it spans from minimally or non-intelligible to easily intelligible (e.g., <ref type="bibr">Shannon et al., 1995)</ref>. To provide a strong test of whether noise-vocoded speech produces the ISE, we chose to compare its disruptive effects to the effect of white noise, which has the same intensity as noise-vocoded speech but lacks the speech-like amplitude variation.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Method</head><p>Participants. Eighty-one students from the State University of New York at New Paltz received course credit for their participation. Participants were randomly assigned to one of four experimental groups: 3, 6, 9, and 12 channelled noise-vocoded speech. All subjects were native English speakers and reported normal hearing and normal or corrected-to-normal vision.</p><p>Materials. Noise-vocoded backgrounds were generated from the natural (non-SWS) speech tokens used by Viswanathan et al. (2014a); these tokens were as follows: bowls, boy, day, dog, go, than, and view. Noise-vocoded speech was synthesised using a Praat script <ref type="bibr">(Davis et al., 2005)</ref>. This script was modified to generate the four noise-vocoded speech conditions: 3, 6, 9, and 12 channels that were used in this study (see Appendix for additional details). White noise segments were matched in average intensity and duration to the noise-vocoded speech tokens.</p><p>The background tokens were arranged into four randomly ordered lists. In order to coincide with the presentation of the to-be-remembered items, each irrelevant sound list was 10 s long. For each list, the silent interval between tokens was between 150 and 300 ms. To reach the 10 s duration of the serial recall list, and to avoid long intervals between tokens, each list repeated each of the seven irrelevant words once (see <ref type="bibr">Viswanathan et al., 2014a)</ref>. Participants heard these acoustic stimuli through sound-insulated headphones at 70 dB. Each trial consisted of one randomly selected background list, and each experiment session repeated each list 6 times (see <ref type="bibr">Viswanathan et al., 2014a)</ref>. Every participant was presented with both noise-vocoded speech and white noise backgrounds. 
We were concerned that the potentially high intelligibility of the 12 and 9 channel conditions (e.g., <ref type="bibr">Loizou, Dorman, &amp; Tu, 1999)</ref> would bias the perception of the 3 and 6 channel conditions (e.g., <ref type="bibr">Davis et al., 2005)</ref>, and as such, the number of noise-vocoded channels was manipulated between subjects.</p><p>The recall task consisted of visually presenting the targets: L R T S M K F <ref type="bibr">(Tremblay et al., 2000;</ref><ref type="bibr">Viswanathan et al., 2014a)</ref>. Each trial consisted of a random ordering of these target items. Participants saw these targets on a computer screen for 1000 ms each, with a 500 ms interval between items. Participants sat three feet from the computer screen, and targets appeared at the centre of the display 500 ms following a "***" fixation point. The first target appeared simultaneously with the first irrelevant sound item and the irrelevant sound persisted throughout the duration of the trial (see <ref type="bibr">Viswanathan et al., 2014a</ref> for more details).</p><p>Procedure. Participants were told that they would see a series of letters on the screen and hear sounds through their headphones. Participants were instructed to report the presented letter sequence in the correct order and to ignore any sounds they heard. Participants initiated each serial recall task by pressing the spacebar on their keyboard. Participants were prompted to type in their response by a blinking cursor in the upper left corner of the computer screen 1000 ms following the presentation of the last visual item.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Results and discussion</head><p>Serial recall accuracy was measured as the number of to-be-remembered letters that were reported in their correct serial position. In total, 21 participants were placed into the 3 channel condition, 19 in the 6 channel, 19 in the 9 channel, and 21 in the 12 channel conditions.</p><p>To determine whether noise-vocoded speech produced the ISE, we compared serial recall accuracy for white noise and noise-vocoded speech conditions (see Table <ref type="table">1</ref>) in a one-tailed paired sample t-test. This analysis found that noise-vocoded speech produced significantly lower serial recall accuracy, t(79) = 5.259, p &lt; .001, r = .509, confirming that our stimuli produced the ISE. As there were not multiple levels of white noise, a factorial analysis of variance (ANOVA) of these data was not appropriate. To determine the effect of the number of channels in noise-vocoded speech, we next transformed our data into difference scores by subtracting accuracy in the noise-vocoded speech condition from accuracy in the white noise condition (see Figure <ref type="figure">1</ref>). These difference scores were submitted to a 4 level (3, 6, 9, and 12 channels) one-way ANOVA. We found that the number of noise-vocoded channels affected the degree of serial recall disruption, F(3, 76) = 4.400, p &lt; .007, partial &#951;&#178; = .148. In a follow-up analysis, we found a significant linear trend of channel number on serial recall accuracy, F(3, 76) = 4.400, p = .002, partial &#951;&#178; = .148, supporting a linear relationship between channel and recall accuracy. These results indicate that increasing the number of channels in noise-vocoded speech results in increased serial recall disruption. What remains is to determine what about noise-vocoded speech channels affects serial recall disruption. 
Interestingly, pilot data demonstrated that the effect of vocoding channel quantity on the ISE does not correspond to the intelligibility of the noise-vocoded speech. Is the effect of channel number in noise-vocoded speech due to increasing changing-state complexity, or increased speech fidelity of these backgrounds? To determine this, we conducted Experiment 2 in which we manipulated both changing-state complexity and speech fidelity independently.</p></div>
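<div xmlns="http://www.tei-c.org/ns/1.0"><p>The scoring rule and the difference scores described above can be sketched as follows. This is a minimal illustration rather than the analysis code used in the study: the function names and the example responses and accuracy values are hypothetical, and only the scoring rule (letters reported in their correct serial position) and the subtraction (control accuracy minus noise-vocoded accuracy) are taken from the text.</p>
<ab><![CDATA[
```python
from statistics import mean


def strict_serial_score(presented, reported):
    """Count the letters reported in their correct serial position."""
    return sum(1 for p, r in zip(presented, reported) if p == r)


def ise_difference(control_scores, sound_scores):
    """Control-minus-sound difference in mean accuracy.

    Larger positive values mean more disruption by the background sound.
    """
    return mean(control_scores) - mean(sound_scores)


# Hypothetical trial: the seven targets, with two adjacent letters swapped
# in the response, so five letters remain in their correct positions.
presented = list("LRTSMKF")
reported = list("LRSTMKF")
trial_score = strict_serial_score(presented, reported)

# Hypothetical per-trial scores for one participant in each background.
white_noise_scores = [6, 5]
vocoded_scores = [4, 5]
disruption = ise_difference(white_noise_scores, vocoded_scores)
```
]]></ab>
<p>Under this convention, a participant who recalls the list perfectly scores 7 on a trial, and a difference score of zero indicates no disruption relative to the control background.</p></div>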
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Experiment 2</head><p>Experiment 2 extends <ref type="bibr">Viswanathan et al. (2014a)</ref> by using selectively reversed noise-vocoded speech. In selectively reversed noise-vocoded speech, the lower two-thirds of the vocoded channels are temporally reversed relative to the upper third. This manipulation is analogous to selectively reversed sinewave-speech and will likewise distort the speech information contained in the acoustic signal. Importantly, while selective reversal allows us to manipulate speech fidelity of an acoustic speech signal, the results of Experiment 1 suggest that the number of noise-vocoded channels allows us to manipulate changing-state complexity.</p><p>If the effect of noise-vocoded speech channels on serial recall accuracy is solely due to changing-state complexity, then within each channel condition selectively reversed and typical noise-vocoded speech should produce the same serial recall disruption; that is, selective-reversal should not interact with channel number. Alternatively, if the speech fidelity of vocoded speech also matters, then, within different channel conditions, selectively reversed noise-vocoded speech, with its low speech fidelity, should be less disruptive than typical noise-vocoded speech.</p><p>Experiment 1 investigated the effect of changing-state complexity in the ISE by measuring serial recall disruption associated with four different channel groups of noise-vocoded speech. These four different noise-vocoded channel conditions were presented between subjects so as to avoid cross condition learning effects. 
To eliminate the possibility that differences associated with noise-vocoded channels could be attributed to pre-existing group differences, Experiment 2 presented channel conditions within subjects, and used a blocked counterbalanced design to account for any learning effects.</p><p>To focus its investigation, Experiment 2 reduced its conditions to three levels of noise-vocoded channels. The three noise-vocoded channel groups in Experiment 2 were 6, 12, and 18, each differing from the next by 6 channels and thus, as in Experiment 1, channel number increased linearly across conditions. The 18 channel condition was included to expand the range of channel conditions beyond what was used in Experiment 1. Paired comparisons conducted in Experiment 1 failed to find a difference between any adjacent channel conditions; the increased channel difference between groups used in Experiment 2 may enhance any difference caused by channel number and thereby offer a better opportunity to detect channel differences. Additionally, Experiment 2 adopted larger sample sizes, which were more similar to prior work (e.g., <ref type="bibr">Tremblay et al., 2000)</ref>.</p></div>
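<div xmlns="http://www.tei-c.org/ns/1.0"><p>The selective-reversal manipulation introduced above can be illustrated with a short sketch. It is not the Praat procedure used to build the stimuli: it assumes that each vocoded channel is already available as a list of samples, ordered from the lowest to the highest frequency band, and the function names and toy sample values are hypothetical. The only detail taken from the text is that the lower two-thirds of the channels are reversed in time relative to the upper third.</p>
<ab><![CDATA[
```python
def selectively_reverse(channels, reversed_fraction=2 / 3):
    """Time-reverse the lowest-frequency channels and leave the rest intact.

    channels: equal-length sample lists ordered from low to high frequency
    (an assumption of this sketch, not a detail given in the text).
    """
    n_reversed = round(len(channels) * reversed_fraction)
    result = []
    for index, samples in enumerate(channels):
        if index < n_reversed:
            result.append(list(reversed(samples)))
        else:
            result.append(list(samples))
    return result


def mix(channels):
    """Sum the channels sample by sample into a single waveform."""
    return [sum(column) for column in zip(*channels)]


# Toy example: three channels of three samples each.
toy_channels = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
reversed_channels = selectively_reverse(toy_channels)
waveform = mix(reversed_channels)
```
]]></ab>
<p>Because reversal only reorders the samples within a channel, each channel keeps the same set of amplitude values, which is the sense in which the manipulation leaves the number of channels unchanged while altering their temporal alignment.</p></div>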
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Method</head><p>Participants. Experiment 2 consisted of 77 participants from the University of California, Riverside. All participants were native English speakers, had normal hearing and normal or corrected-to-normal vision. All participants received course credit for their participation.</p><p>Materials. Experiment 2 used the same background words as Experiment 1. However, these words were recorded, synthesised into noise-vocoded speech, and arranged into randomly ordered lists specifically for Experiment 2. To make selectively reversed noise-vocoded speech, two-thirds of the frequency channels for each noise-vocoded speech token (approximately corresponding to the 0-1700 Hz range of the acoustic signal, see Appendix) were reversed relative to the remaining channels. Having established that noise-vocoded speech produces the ISE in Experiment 1, we opted to use silence (instead of white noise) as a control, in line with many studies of the ISE (e.g., <ref type="bibr">Ellermeier et al., 2015;</ref><ref type="bibr">Elliott &amp; Briganti, 2012;</ref><ref type="bibr">Elliott et al., 2016)</ref>. All other aspects of the stimuli used in Experiment 2 were the same as those used in Experiment 1.</p><p>Procedure. The procedure of Experiment 1 was followed for the serial recall task of Experiment 2. A slight alteration was made to the visual presentation of the target items such that they were presented in the centre of a 1.5-inch square border located in the centre of the computer monitor. Participants were prompted to type in their response by a ":" presented in the upper left corner of the display box, 1500 ms following the presentation of the last visual item. These changes affected all conditions equally. Instructions were provided orally by researchers from a prepared script. On-screen instructions at the start of the experiment re-iterated the verbal instructions provided by researchers. The order of channels and the speech fidelity were manipulated within subjects with their order of presentation counterbalanced across different subjects. Prior to our main analyses we tested for and found no effect of sequence of channel presentation on the effects of channel number or selective-reversal.</p><p>Table <ref type="table">1</ref>. Raw serial recall accuracy scores for all conditions in Experiments 1 and 2. Bonferroni-corrected paired comparisons in Experiment 1 found that all except the 3 channel condition were significantly different from white noise; across the noise-vocoded speech conditions both the 9 and 12 channel conditions were different from the 3 channel condition.</p><p>Figure note. Error bars represent standard error of the mean.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Results and discussion</head><p>In Experiment 2, we compared the serial recall accuracy associated with noise-vocoded speech and selectively reversed noise-vocoded speech across three levels of vocoded channel composition. As was done for Experiment 1, scores for the serial recall task were calculated as the average number of letters reported in the correct serial position.</p><p>We first conducted paired samples t-tests comparing noise-vocoded speech and selectively reversed noise-vocoded speech to silence. These tests were Bonferroni corrected for 2 comparisons (alpha = .025). These tests confirmed that both noise-vocoded speech, t(76) = 6.408, p &lt; .001, r = .592, and selectively reversed noise-vocoded speech, t(76) = 4.078, p &lt; .001, r = .423, were significantly different from silence, showing that our stimuli were successful in producing the ISE. We then calculated the amount of ISE in each condition by subtracting the accuracy for noise-vocoded trials from the accuracy in silent trials in the same block (see Figure <ref type="figure">2</ref>). These difference scores were used for all subsequent analyses.</p><p>These difference scores were submitted to a 2 (Background: noise-vocoded vs. selectively reversed) &#215; 3 (Channel: 6, 12, and 18) repeated measures ANOVA. This analysis revealed a main effect of background, F(1, 76) = 9.195, p = .003, partial &#951;&#178; = .108, demonstrating that noise-vocoded speech was more disruptive than selectively reversed noise-vocoded speech, and consistent with our hypothesis that the ISE is sensitive to speech fidelity. This analysis also found a main effect of channel, F(2, 152) = 5.426, p = .005, partial &#951;&#178; = .067, consistent with the results of Experiment 1 and prior research showing that serial recall accuracy is sensitive to the complexity of the irrelevant background signal. 
No interaction was found, F(2, 152) = .575, p = .564, partial &#951;&#178; = .008, indicating no evidence that the speech fidelity effect depended on the number of channels.</p><p>To determine the locus of the effect of channel found in the serial recall task, post hoc contrast analyses of the channel conditions for noise-vocoded speech and selectively reversed noise-vocoded speech were conducted. For noise-vocoded speech, this analysis found a marginal linear effect of channel, F(1, 76) = 3.766, p = .056, partial &#951;&#178; = .047, consistent with the result of Experiment 1. Interestingly, the selectively reversed conditions also resulted in a significant linear effect of channel, F(1, 76) = 3.985, p = .049, partial &#951;&#178; = .050. The results of this trend analysis indicate that the effect of channel is robust, being present even in the selectively reversed conditions. Collectively, these results suggest that the effect of changing-state complexity is present for both speech and non-speech conditions.</p><p>Based on these results it is clear that speech fidelity is important to the ISE; the selectively reversed noise-vocoded speech with less speech fidelity caused less disruption. This effect was observed despite the typical and selectively reversed noise-vocoded speech being matched in channel number. This makes it unlikely that the effect of speech fidelity could be attributed to differences in changing-state complexity.</p><p>There are two possible loci for the effect of speech fidelity on serial recall. First, it could be that the effect of speech fidelity is specific to the ISE and occurs by disrupting the serial process. Second, this effect could be reflective of signals with high speech fidelity preferentially engaging the attention-capture mechanisms that are posited by the duplex mechanism account <ref type="bibr">(Hughes, 2014)</ref>. </p></div>
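<div xmlns="http://www.tei-c.org/ns/1.0"><p>For readers who wish to check the effect sizes reported above, the sketch below applies two standard conversions: r from a t statistic and its degrees of freedom, and partial eta squared from an F statistic and its degrees of freedom. That the reported values were obtained in this way is an assumption of this sketch, and the function names are illustrative; the test statistics themselves are copied from the text.</p>
<ab><![CDATA[
```python
from math import sqrt


def r_from_t(t, df):
    """Effect size r for a t statistic: sqrt(t^2 / (t^2 + df))."""
    return sqrt(t ** 2 / (t ** 2 + df))


def partial_eta_squared(f, df_effect, df_error):
    """Partial eta squared for an F statistic.

    Computed as (F * df_effect) / (F * df_effect + df_error).
    """
    return (f * df_effect) / (f * df_effect + df_error)


# Test statistics reported for Experiment 2.
r_vocoded = r_from_t(6.408, 76)                      # vocoded vs. silence
eta_background = partial_eta_squared(9.195, 1, 76)   # main effect of background
eta_channel = partial_eta_squared(5.426, 2, 152)     # main effect of channel
eta_interaction = partial_eta_squared(0.575, 2, 152) # background by channel
```
]]></ab>
<p>Rounded to three decimals, these expressions give .592, .108, .067, and .008, in line with the values reported for Experiment 2.</p></div>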
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Experiment 3</head><p>We designed Experiment 3 to dissociate whether the effect of speech fidelity was due to interference-by-process or attention capture. Experiment 3 used the same stimuli as Experiment 2 but used a missing-item task instead of a serial recall task. The missing-item task presents participants with a sequence of items from a pre-defined limited set. The task for the participant is to indicate which item from the set was not included in the presentation. This paradigm shares many characteristics with the serial recall task (i.e., a small set of to-be-remembered items presented sequentially), the critical difference being that the missing-item task does not require the participant to maintain serial order information.</p><p>The missing-item task has been used previously to determine whether effects observed in the ISE can be attributed to attention capture or interference-by-process (e.g., <ref type="bibr">Hughes et al., 2007)</ref>. This is because the missing-item task does not require serial order information and therefore is not susceptible to interference from the serial information from the background sound <ref type="bibr">(Hughes, 2014)</ref>. As noted above, from the results of Experiment 2, it is unclear which process posited by the duplex mechanism account supports the observed effect of speech fidelity in the ISE. If selectively reversed noise-vocoded speech produces disruption in the missing-item task, then the effect of selective reversal can be attributed more generally to the preferential engagement of the attention-capture mechanism <ref type="bibr">(Hughes, 2014)</ref>. Likewise, if the typical noise-vocoded speech produces more disruption than the selectively reversed noise-vocoded speech, then the effect of speech fidelity is likely supported by attention capture. 
Alternatively, if we fail to find an effect of selective reversal, then the effect of speech fidelity found in Experiment 2 can be more easily attributed to interference-by-process. This outcome would be consistent with our hypothesis that the ISE is sensitive to the speech fidelity of the acoustic signal.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Method</head><p>Participants. In all, 77 participants from the University of California, Riverside, participated in this experiment. All participants were native English speakers, had normal hearing and normal or corrected-to-normal vision, and received course credit for their participation.</p><p>Materials. Experiment 3 used the same materials as Experiment 2.</p><p>Procedure. To make the results comparable between experiments, Experiment 3 used the same letter set for its missing-item task as was used in the serial recall task of Experiment 2. The missing-item task consisted of randomly ordered sequences of six of the letters from the seven-letter set used in Experiment 2 (F K L M R S T). Participants were informed that their task would be to view sequences of six letters drawn from the seven-letter set and report the missing item. Participants were given the complete seven-letter set prior to beginning the missing-item task. As stated above, Experiment 3 used the same irrelevant backgrounds as Experiment 2.</p></div>
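<div xmlns="http://www.tei-c.org/ns/1.0"><p>The trial structure described in the Procedure can be sketched in a few lines of code. This is a minimal illustration, not the software used in the experiment: the function names and the use of a seeded random generator are assumptions, and the timing, display, and background sound are not modelled.</p>

```python
import random

# Seven-letter set named in the Procedure.
LETTER_SET = ["F", "K", "L", "M", "R", "S", "T"]


def make_trial(rng):
    """Build one missing-item trial.

    Returns (sequence, missing): six letters from the set in random
    order, plus the one letter that was left out.
    """
    missing = rng.choice(LETTER_SET)
    sequence = [letter for letter in LETTER_SET if letter != missing]
    rng.shuffle(sequence)
    return sequence, missing


def score_response(response, missing):
    """A response is correct only if it names the missing letter.

    Presented letters and letters outside the set both count as errors.
    """
    return response == missing


rng = random.Random(0)  # hypothetical seed, for reproducibility only
sequence, missing = make_trial(rng)
print(sequence, "missing:", missing)
```

<p>Because scoring depends only on which letter is absent, the sketch makes explicit that no serial order information is needed to respond correctly.</p></div>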
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Results and discussion</head><p>For the missing-item task, each response was scored as correct if the participant identified the missing item, and as incorrect if the participant indicated an item that was present or an item not from the set. We next calculated Bonferroni corrected paired samples t-tests comparing noise-vocoded speech and selectively reversed noise-vocoded speech to silence (alpha = .025). These analyses found a significant effect of noise-vocoded speech, t(76) = 2.814, p = .006, r = .307, showing that noise-vocoded speech caused attention capture. The effect of selectively reversed noise-vocoded speech was not significant, t(76) = 1.439, p = .14, r = .162. Data from this experiment were next converted into difference scores to examine the effect of changing-state complexity and speech fidelity on disruption (see Figure <ref type="figure">3</ref>).</p><p>The ANOVA for the missing-item task failed to show an effect of channel, F(2, 152) = .841, p = .433, partial &#951;&#178; = .011, demonstrating that the effect of channel found in Experiment 2 cannot be attributed to the attention-capture mechanism. This result is consistent with prior studies that have also failed to find a changing-state effect in the missing-item task (e.g., <ref type="bibr">Hughes et al., 2007)</ref>. Critically, this analysis also failed to find an effect of background, F(1, 76) = 2.232, p = .193, partial &#951;&#178; = .029, suggesting that performance in the missing-item task was not affected by the speech fidelity of the signal. This indicates that the effect of selective reversal reported for Experiment 2 cannot be attributed to attention capture. This analysis also failed to find an interaction, F(2, 152) = .443, p = .643, partial &#951;&#178; = .006. Collectively, these analyses indicate that even though the presence of noise-vocoded backgrounds disrupts performance in this task, there is no effect of speech fidelity. 
Thus, the effects observed for the serial recall task from Experiment 2 cannot be attributed to the attention-capture mechanism.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>General discussion</head><p>The potency of speech as a disruptive background is illustrated in recent reviews (e.g., <ref type="bibr">Ellermeier &amp; Zimmer, 2014;</ref><ref type="bibr">Schlittmeier et al., 2012)</ref> as well as recent empirical work (e.g., <ref type="bibr">Viswanathan et al., 2014a)</ref>. The goal of the research presented here was to understand what makes speech a more disruptive background than non-speech. One explanation is that this effect is solely the result of the greater changing-state complexity of speech relative to non-speech. This hypothesis was tested against an alternative explanation: that the speech fidelity of the background, even when controlled for its overall complexity, makes it particularly disruptive.</p><p>To test these explanations, we compared serial recall accuracy in noise-vocoded speech backgrounds. Experiment 1 confirmed that serial recall disruption increased linearly with the number of vocoding channels, a finding that, according to the changing-state hypothesis, confirms that changing-state complexity increases with channel number. Experiment 2 assessed the roles of changing-state complexity (operationally defined as the number of vocoding channels) and the speech fidelity of the acoustic speech signal. Speech fidelity of the noise-vocoded speech was manipulated by selectively reversing a subset of the vocoding channels. Critically, we found that across different channel conditions, selectively reversed noise-vocoded speech was less disruptive than its normal (non-reversed) noise-vocoded speech counterpart despite sharing the same changing-state complexity (number of channels). This indicates that speech fidelity has an effect on the ISE beyond the overall complexity of the signal. Experiment 3 demonstrated that overall, noise-vocoded backgrounds produce more disruption than silence in a missing-item task. 
Note that the null effect of channel in the missing-item task of Experiment 3, in contrast to the effect of channel on the serial recall task of Experiment 2, indicates that channel number affects the ISE through the interference-by-process mechanism. Thus, these findings are consistent with our conclusion that channel number influences the changing-state complexity of the background. Critically, however, speech fidelity also did not contribute to performance on the missing-item task. This suggests that the effect of speech fidelity on the ISE is not reducible to speech's ability to preferentially capture attention. Instead, the speech structure appears to specifically interfere with the serial rehearsal process. Together, the results of these experiments present interesting implications for ISE accounts. The essential prediction of the changing-state hypothesis is that the degree of serial recall disruption will correspond to the number of auditory states in the sound, and as such, speech should only be as disruptive as its changing-state complexity. We operationally defined changing-state complexity as the number of channels in the noise-vocoded speech. Consistent with the predictions of the changing-state hypothesis, our results show that serial recall disruption is related to the number of vocoding channels. However, the results of Experiment 2 show that selectively reversed noise-vocoded speech is less disruptive than typical noise-vocoded speech despite being composed of the same number of vocoding channels. Similar to <ref type="bibr">Viswanathan et al. (2014a)</ref>, these results highlight a role for speech fidelity and are inconsistent with the changing-state hypothesis that the only driver of the ISE is the changing-state complexity of the signal.</p><p>The duplex-mechanism theory proposes two mechanisms that can account for the ISE: an interference-by-process mechanism and an attention-capture mechanism. 
Recall that under this account, one mechanism for the ISE is the interference-by-process mechanism, which supports the changing-state effect. The interference-by-process mechanism, as outlined in the preceding discussion, does not account for the differential disruption between noise-vocoded and selectively reversed noise-vocoded speech. This leaves the attention-capture mechanism to account for effects of speech fidelity. However, this explanation is ruled out by the results of Experiment 3.</p><p>While <ref type="bibr">Viswanathan et al. (2014b)</ref> used a secondary measure of speech perception to demonstrate that the selective-reversal process disrupts speech fidelity, no such measure was used here, and it is possible that the selective-reversal process affects speech fidelity differently in sinewave speech (e.g., <ref type="bibr">Viswanathan et al., 2014a</ref>, <ref type="bibr">2014b)</ref> and noise-vocoded speech (i.e., the present study). More importantly, no independent measure exists to determine a signal's changing-state quality. Lacking such an independent measure makes it difficult to determine whether the main effect of speech fidelity found in Experiment 2 constitutes an independent effect on the ISE, or whether speech fidelity (in addition to channel composition) influences the signal's changing-state quality, which in turn influences the ISE.</p><p>A direct influence of speech fidelity on the ISE could be explained by appealing to phonological interference (e.g., <ref type="bibr">Larsen, Baddeley, &amp; Andrade, 2000)</ref>; however, this account and similar accounts are, on their own, unable to explain the breadth of findings that the changing-state and duplex accounts successfully account for (e.g., see <ref type="bibr">Jones, Macken, &amp; Nicholls, 2004)</ref>. 
The 'Perceptual-Motor Account' proposes that subvocal rehearsal converts to-be-remembered items into a perceptual-motor plan that maintains serial order by taking advantage of the inherent serial nature of speech, such as coarticulation (see <ref type="bibr">Hughes &amp; Marsh, 2017)</ref>. On this account, speech fidelity would disrupt the maintenance of the serial order of the to-be-remembered items by introducing a signal with obligatory access to the perceptual-motor plan.</p><p>While proponents of the perceptual-motor account indicate that such obligatory access is associated with auditory stimuli generally (see <ref type="bibr">Hughes &amp; Marsh, 2017, p. 3)</ref>, this proposed mechanism converges with work in the speech literature. Specifically, it has been found that listening to speech produces subtle activity in the listener's articulatory tract that matches the place of articulation of the heard speech (e.g., <ref type="bibr">Fadiga, Craighero, Buccino, &amp; Rizzolatti, 2002;</ref><ref type="bibr">Sundara, Namasivayam, &amp; Chen, 2001</ref>). The results of <ref type="bibr">Sundara et al. (2001)</ref> are consistent with the perceptual-motor account's prediction that speech affects the ISE through obligatory activation of the articulatory motor system; however, it is unclear how other auditory stimuli such as tones might produce similar effects, or whether this effect might be modulated by the selective-reversal process used in the present study. In short, the current results do not completely conform to established theoretical accounts.</p><p>To conclude, as reviewed earlier, speech-like signals produce the strongest ISE <ref type="bibr">(Ellermeier &amp; Zimmer, 2014;</ref><ref type="bibr">Schlittmeier et al., 2012;</ref><ref type="bibr">Viswanathan et al., 2014a)</ref>. The current set of experiments suggests that the potency of speech signals is not solely due to their changing-state complexity and attention-capturing capability. 
Instead, there appears to be a clear need for any account of the ISE to incorporate mechanisms for speech sensitivity that go beyond complexity and the content of speech and consider the dynamic stimulus-level structure specific to speech.</p></div><note xmlns="http://www.tei-c.org/ns/1.0" place="foot" xml:id="foot_0"><p>Three participant groups; 6, 12, and 18 channelled noise-vocoded speech used in Experiment 2. Error bars represent standard error of the mean.</p></note>
		</body>
		</text>
</TEI>
