<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Statistical learning subserves a higher purpose: Novelty detection in an information foraging system.</title></titleStmt>
			<publicationStmt>
				<publisher>American Psychological Association</publisher>
				<date>02/24/2025</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10586649</idno>
					<idno type="doi">10.1037/rev0000547</idno>
					<title level='j'>Psychological Review</title>
<idno>0033-295X</idno>
<biblScope unit="volume"></biblScope>
<biblScope unit="issue"></biblScope>					

					<author>Ram Frost</author><author>Louisa Bogaerts</author><author>Arthur G Samuel</author><author>James S Magnuson</author><author>Lori L Holt</author><author>Morten H Christiansen</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[Statistical learning (SL) is typically assumed to be a core mechanism by which organisms learn covarying structures and recurrent patterns in the environment, with the main purpose of facilitating processing of expected events. Within this theoretical framework, the environment is viewed as relatively stable, and SL ‘captures’ the regularities therein through implicit unsupervised learning by mere exposure. Focusing primarily on language— the domain in which SL theory has been most influential—we review evidence that the environment is far from fixed: it is dynamic, in continual flux, and learners are far from passive absorbers of regularities; they interact with their environments, thereby selecting and even altering the patterns they learn from. We therefore argue for an alternative cognitive architecture, where SL serves as a subcomponent of an information foraging (IF) system. IF aims to detect and assimilate novel recurrent patterns in the input that deviate from randomness, for which SL supplies a baseline. The broad implications of this viewpoint and their relevance to recent debates in cognitive neuroscience are discussed.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>1 Some neural network models, e.g., SRNs <ref type="bibr">(Elman, 1990)</ref> and Large Language Models [LLMs] such as <ref type="bibr">GPT-3.0 (Brown et al., 2020)</ref>, learn statistical patterns in language through self-supervised learning, where the networks make predictions for what should come next and use the subsequent actual input as indirect feedback on the correctness of those predictions, adjusting their weights accordingly (see Contreras <ref type="bibr">Kallens et al., 2023, for discussion)</ref>. Thus, self-supervised learning is arguably a form of unsupervised learning, and could be a plausible candidate mechanism supporting SL. Interestingly, LLMs are able to reproduce or emulate human language more closely when they also receive supervised learning from human feedback in addition to self-supervised learning (as in <ref type="bibr">GPT-Instruct and ChatGPT;</ref><ref type="bibr">Ouyang et al., 2022)</ref>.</p><p>changing. Hence, the primary goal of the system is to detect these changes. The IF system thus aims to detect novel recurrent patterns in the input that deviate from randomness and from past baseline regularities, in the service of efficient and continuously adaptive processing. In support of this proposal, we draw primarily on examples from language because this is the domain for which SL theory has been most influential, but as we discuss further on, the motivation and principles we describe apply more generally.</p><p>The IF approach to cognitive architecture we put forward departs from the conventional SL view across a range of important dimensions, as summarized in Table <ref type="table">1</ref>, which also provides a roadmap for this article. 
In the following, we explicate each of these dimensions in detail, provide behavioral and neurobiological evidence in support of the proposed architecture, discuss how the IF approach aligns with important higher cognitive functions such as curiosity and creativity, and outline a blueprint for a accommodation. While there are of course stable aspects to the linguistic environment at abstract levels of description, deviations from central tendencies constantly occur at the level of the actual input. These deviations signal high informational value. Efficient language processing thus requires that the constant changes in regularities be perceived rapidly, allowing effective comprehension of the novel linguistic input. This can be easily demonstrated in the domains of speech, print, or sentence processing.</p><p>In spoken language processing, the inventory of speech sounds comprising a language can be relatively well-defined, but the statistical regularities in speech streams are characterized by varying phonetic realizations. Speakers' productions diverge due to variation in physical characteristics (sex, age, size), social characteristics (gender, social identity), dialect, accent, speech rate, vocabulary, and cultural backgrounds. 
Listeners must not only constantly accommodate this variation to achieve phonetic constancy (see <ref type="bibr">Luthra, 2023</ref>, for a recent overview), but also simultaneously leverage the information it provides for purposes ranging from talker identification (e.g., <ref type="bibr">Perrachione, Pierrehumbert, &amp; Wong, 2009)</ref>, to learning talker-specific phonetic idiosyncrasies that improve later perception (e.g., <ref type="bibr">Norris, Cutler &amp; McQueen, 2003;</ref><ref type="bibr">Nygaard &amp; Pisoni, 1998)</ref>, to making a surprising array of physical and social inferences (e.g., <ref type="bibr">Krauss, Freyberg, &amp; Morsella, 2002;</ref><ref type="bibr">Munson &amp; Babel, 2007)</ref>.</p><p>The predictability of word sequences significantly varies across different age groups and in different linguistic environments. In everyday conversations, speakers shift swiftly between different registers and codes, adapting their style of talking to their interlocutors, whether that be authority figures, like a police officer or our boss, or more affiliative dialogic partners, like parents, children, or friends (see <ref type="bibr">Goulart et al., 2020)</ref>. In such different linguistic environments, interlocutors' language further tends to rapidly adapt to context-specific parameters of the interaction at every level, potentially mitigating those sources of variability <ref type="bibr">(Garrod &amp; Pickering, 2004)</ref>. 
In many cases, conversational partners' speech becomes more similar in terms of articulatory, acoustic, and prosodic details (e.g., <ref type="bibr">Kim, Horton, &amp; Bradlow, 2011;</ref><ref type="bibr">Lee et al., 2018;</ref><ref type="bibr">Pardo, 2006)</ref>, syntax (e.g., <ref type="bibr">Bock, 1986)</ref>, semantics (e.g., <ref type="bibr">Dideriksen, Christiansen, Tyl&#233;n, Dingemanse, &amp; Fusaroli, 2023)</ref>, and kinetic alignment of head and hands <ref type="bibr">(Trujillo, Dideriksen, Tyl&#233;n, Christiansen, &amp; Fusaroli, 2023)</ref>. But in other cases, interlocutors will deviate from one another to provide new information when it is helpful to solve a particular task (e.g., <ref type="bibr">Dideriksen et al., 2023;</ref><ref type="bibr">Trujillo et al., 2023;</ref><ref type="bibr">see Fusaroli, R&#261;czaszek-Leonardi, &amp; Tyl&#233;n, 2014, for discussion)</ref>. Speakers and listeners appear to attune exquisitely to novel information in conversation, with speakers producing words referring to new information for the first time with greater clarity than words referring to given (old) information (e.g., <ref type="bibr">Fowler &amp; Housum, 1987)</ref>. Speakers (and writers) also appear to strive to maintain 'uniform information density' over time, via word choices and prosodic structure (e.g., <ref type="bibr">Aylett &amp; Turk, 2004;</ref><ref type="bibr">Frank &amp; Jaeger, 2008;</ref><ref type="bibr">Genzel &amp; Charniak, 2002;</ref><ref type="bibr">Gibson et al., 2019)</ref>, even to the level of discourse properties <ref type="bibr">(Asr &amp; Demberg, 2015)</ref>. Speakers and listeners are sensitive to dynamic changes in discourse-relevant semantic and phonological neighborhoods even as they alter those neighborhoods themselves (e.g., by settling on a shared vocabulary in a novel task, as in <ref type="bibr">Brown-Schmidt &amp; Tanenhaus, 2008</ref><ref type="bibr">; see Brown-Schmidt et al., 2015</ref>, for a review). 
Hence, the statistical properties of the speech environment are anything but fixed or stable, and the main game in conversational context is to perceive and adapt to the ever-changing novel structure, not simply to assimilate stable aspects of the environment.</p><p>A similar state of affairs characterizes reading. Predictability of letters within words is to a large extent stable, constrained by orthographic and phonotactic rules (e.g., <ref type="bibr">Siegelman, Kearns, &amp; Rueckl, 2020)</ref>, and linguistic form enables some general predictions at higher levels of abstraction (e.g., <ref type="bibr">Snell &amp; Theeuwes, 2020)</ref>. However, beyond this, readers are constantly faced with novel regularities. The predictability of printed words, which drives ocular movements during text reading, changes significantly as a function of variation in writing style, the period when the text was written, and the type of text being processed, from movie subtitles to newspapers. Efficient processing thus requires the reading system to detect and adjust to such changes as rapidly as possible. Recent research tracking eye-movements indeed shows that readers rapidly perceive and adapt to specific syntactic structures characteristic of the writing style <ref type="bibr">(Yan &amp; Jaeger, 2020)</ref>, and to expected sequences of word-lengths from sentence onset in a given text, to optimize ocular movements <ref type="bibr">(Snell &amp; Theeuwes, 2020)</ref>. For example, when presented with sentences of uniform word-length, readers adjust their preferred saccade length incredibly rapidly; just a few exemplars of a given word-length suffice <ref type="bibr">(Cutter, Drieghe, &amp; Liversedge 2017</ref><ref type="bibr">, 2018)</ref>. 
Such context-dependent adjustments are well documented for speech as well.</p><p>If listeners hear speech segments that are intentionally made ambiguous, with the lexical context providing disambiguation, they rapidly recalibrate their specification of the segments accordingly (e.g., <ref type="bibr">Norris, McQueen, &amp; Cutler, 2003)</ref>. Language users also attune rapidly to changes in phonotactics when they produce speech <ref type="bibr">(Dell, Reed, Adams, &amp; Meyer, 2000)</ref>, or simply listen to it (e.g., in the context of a lexical decision task; <ref type="bibr">Onishi, Chambers, &amp; Fisher, 2002)</ref>.</p><p>In the same vein, syntactic structures in spoken and written language are anything but uniform, reflecting the immense creativity characterizing human linguistic interaction <ref type="bibr">(Christiansen &amp; Chater, 2022)</ref>. These distributional changes have a direct impact on online sentence processing <ref type="bibr">(Wells, Christiansen, Race, Acheson, &amp; MacDonald, 2009</ref>). In addition, readers must also contend with the different distributions of syntactic regularities associated with different genres of writing <ref type="bibr">(Snell &amp; Theeuwes, 2020)</ref>, from academic treatises and newspaper articles to fictional books and blog postings <ref type="bibr">(Goulart et al., 2020)</ref>.</p><p>To summarize, while linguistic input allows for predictions at various levels of abstraction, speakers, listeners, and readers must constantly adapt to novel, ever-changing structures in the input stream, rather than merely encoding stable ones.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>The active nature of the learner</head><p>The main appeal of SL is in the robustness and power of implicit learning mechanisms, which are already operating in newborns (e.g., <ref type="bibr">Bulf, Johnson, &amp; Valenza, 2011)</ref>, and which do not require overt attention <ref type="bibr">(Saffran et al., 1997)</ref>. In the latter seminal study of Saffran and colleagues, children showed learning of transitional probabilities of speech sounds heard in the background while they drew pictures.</p><p>In typical experimental studies of SL, learners are not informed about the existence of statistical regularities, nor warned of a subsequent test of their knowledge of them. The exposure phase involves passive listening or viewing of continuous input streams, yet participants (on the average) consistently show learning of the recurrent patterns in the input. These results paint the statistical learner as an efficient passive absorber of environmental regularities, which are assimilated via robust implicit learning mechanisms. In Figure <ref type="figure">1</ref>, this view is reflected in the "sea sponge metaphor": the learner is immersed in statistical regularities and assimilates them (see <ref type="bibr">Tandoc et al., 2024</ref>, for a similar description).</p><p>The IF perspective does not dispute the existence of implicit unsupervised learning of recurrent patterns; indeed, such learning has been demonstrated from humans and other primates to songbirds (e.g., <ref type="bibr">Santolin &amp; Saffran, 2018;</ref><ref type="bibr">Lu &amp; Vicario, 2014)</ref>, and serves as a key subcomponent of IF. Crucially, though, the focus of IF is on the learner as an active explorer, an information forager, who registers recurrent patterns, but actively intervenes in regularity learning by exploring, interacting with, and altering its environment. 
In Figure <ref type="figure">1</ref>, this view is depicted in the octopus metaphor, in contrast to the passive sea sponge. Regularities inform the learner where to forage: away from highly predictable patterns and away from randomness related to simple noise (i.e., the environment is not only dynamic but also noisy), allowing it to direct attention and action to times, places, and non-spurious events with potentially high informational content or reward. Deviations from regularity are one important source (among many) that guide perception, attention, and action in service of adaptive learning. Note that while the sponge is a metaphor for the system(s) comprising passive SL through mere exposure, the octopus is a metaphor for both SL system(s) and the mechanisms supporting IF. These comprise the organism itself: its body and perceptual/cognitive systems can be actively and intentionally directed towards information seeking and learning.</p><p>Our take is that regularity learning, like all aspects of cognition, is inextricably linked to perception, action, and the environment <ref type="bibr">(Sheya &amp; Smith, 2019)</ref>, as well as neural and genetic activity <ref type="bibr">(Gottlieb, 2007;</ref><ref type="bibr">Gottlieb &amp; Oudeyer, 2018)</ref>. Theoretical perspectives such as dynamical system approaches to development (e.g., <ref type="bibr">Smith &amp; Thelen, 2003)</ref> and probabilistic epigenesis <ref type="bibr">(Gottlieb, 2007)</ref>, emphasize bidirectional interactions among all these levels. Hence, a learner is not simply shaped by the environment and its regularities. An active learner alters the environment through their actions, whether in speech communication as we have described above, or, say, learning words or linking them to objects. For example, as Smith, Yu and colleagues have documented, an infant's actions modulate the multimodal context for language learning. 
When infants hold and manipulate an object, they change the visual and haptic context, and such actions, as well as where they direct attention in the visual world, can also modulate the language produced by adults around them (e.g., <ref type="bibr">Slone, Abney, Smith, &amp; Yu, 2023;</ref><ref type="bibr">Suanda, Barnhart, Smith, &amp; Yu, 2018;</ref><ref type="bibr">Suanda, Smith, &amp; Yu, 2017)</ref>. Similar interactions with people around them influence other domains of learning (e.g., <ref type="bibr">Karmazyn-Raz &amp; Smith, 2023;</ref><ref type="bibr">Smith, Jayaraman, Clerkin, &amp; Yu, 2018;</ref><ref type="bibr">Suarez-Rivera, Smith, &amp; Yu, 2019;</ref><ref type="bibr">Yu, Zhang, Slone, &amp; Smith, 2021)</ref>.</p><p>How the learning of regularities is contingent on active responses has been recently shown in the singleton paradigm, where participants search for a shape singleton target and are asked to respond to it. <ref type="bibr">Li, van Moorselaar, and Theeuwes (2024)</ref> reported that if a target's location is predicted by the location of a target in the preceding trial, the execution of an arbitrary key response for both trials of the pair is needed for learning the across-trial statistical regularity. Passively attending to target locations did not result in any learning of the target's spatial contingencies.</p><p>To clarify, the active nature of the learner does not imply conscious awareness or meta-awareness. It implies that learners do not automatically absorb the recurrent patterns in sensory input, but continuously interact with their environment and change it, whether consciously or not. The earliest ways in which infants shape their environments are subtle and likely occur in the absence of any awareness (when an object or entity in the visual world captures an infant's attention, adults are more likely to speak about it). 
However, over experience, infants increasingly leverage this to intentionally shape adults' behavior by directing their gaze or body (by pointing, approaching, or handling) selectively at aspects of the world they find interesting, especially deviations from regularity, including deviations they themselves cause.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Aim of the system</head><p>Given the ever-changing environment, the IF approach to cognitive architecture assumes that the priority of cognitive systems lies in continuously detecting novel co-occurrences and other coherent covariations in the input. It regards organisms as "information foragers" (see <ref type="bibr">Pirolli &amp; Card, 1999, for</ref> coining this concept), where "information" means that a meaningful change in patterns of co-occurrences in sensory input has occurred. Our approach in this context is related in part to the information-theoretic notion of information <ref type="bibr">(Shannon, 1948)</ref>, which ties the informational load that events carry to the inverse of their predictability. By this view, highly predictable events carry very little information. However, in the context of regularity learning, random events are also uninformative.</p><p>Hence, an information forager discounts random events as well as highly predictable events since both carry little information. If the primary aim is to detect the novel regularities in the environment, this requires a mechanism that generates a reference against which changes in input regularities can be perceived. Conventionally, SL has been seen as a mechanism aiming to perceive and assimilate stable structural regularities in the environment. In the IF approach, SL mechanisms that track statistical regularities have a different core purpose: they provide the baseline from which a change in co-occurrence can be detected. To be clear, we consider the detection of novel regularities and the generation of baseline references as functionally distinct at the cognitive level of description. 
As discussed further below, extant data raise the question of whether they can be implemented within one computationally and neurobiologically unified mechanism.</p><p>Consider a typical visual SL task that presents a continuous stream of shapes or artificial letters appearing in triplets, where elements within triplets are fully predictable, and elements following triplet boundaries are less predictable (e.g., <ref type="bibr">Fiser &amp; Aslin, 2002;</ref><ref type="bibr">Turk-Browne et al., 2005;</ref><ref type="bibr">Saffran et al., 1996;</ref><ref type="bibr">Siegelman &amp; Frost, 2015)</ref>. Conventional SL theory assumes that learning serves to facilitate processing of the predictable stimuli (see for example, Turk-Browne, <ref type="bibr">Scholl, Johnson, &amp; Chun, 2010;</ref><ref type="bibr">Siegelman, Bogaerts, Kronenfeld, &amp; Frost, 2018)</ref>. Some recent findings, however, suggest that learners in such tasks track novelty. For example, a study that tracked EEG activity during the continuous presentation of triplets of visual stimuli revealed that increased pattern repetitions resulted in increased beta-band activity, which has been associated with sensory prediction (e.g., <ref type="bibr">Arnal &amp; Giraud, 2012)</ref> and top-down modulation (e.g., <ref type="bibr">Hipp, Engel, &amp; Siegel, 2011)</ref>. Importantly, however, this top-down modulation was present at triplet transitions, where a novel shape is about to appear, and not within triplets, where shapes are predictable <ref type="bibr">(Bogaerts, Richter, Landau, &amp; Frost, 2020)</ref>. As we further discuss in detail, the suggestion that probabilistic knowledge can upweight surprising rather than predictable events, favoring novelty over familiarity, is now acknowledged across different domains of cognitive neuroscience.</p></div>
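<div xmlns="http://www.tei-c.org/ns/1.0"><p>As a rough illustration of the triplet design and the information-theoretic notion described in this section, the following minimal Python sketch builds a toy triplet stream, estimates transitional probabilities from counts of adjacent elements, and converts them to Shannon surprisal (the log of the inverse probability). The four letter triplets, the stream length, the no-immediate-repeat constraint, and all function names are assumptions made for illustration only; they are not taken from any of the cited studies. Note also that raw surprisal is only one ingredient: on the IF account, purely random events are treated as uninformative as well, which this sketch does not attempt to model.</p>

```python
import math
import random
from collections import Counter

# Hypothetical triplets; real studies use shapes, syllables, or artificial letters.
TRIPLETS = [("A", "B", "C"), ("D", "E", "F"), ("G", "H", "I"), ("J", "K", "L")]


def make_triplet_stream(triplets, n_triplets, seed=0):
    """Concatenate randomly chosen triplets, never repeating one back to back."""
    rng = random.Random(seed)
    stream, previous = [], None
    for _ in range(n_triplets):
        current = rng.choice([t for t in triplets if t != previous])
        stream.extend(current)
        previous = current
    return stream


def transitional_probabilities(stream):
    """Estimate P(next | current) from counts of adjacent pairs."""
    pair_counts = Counter(zip(stream, stream[1:]))
    first_counts = Counter(stream[:-1])
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}


def surprisal_bits(p):
    """Shannon surprisal in bits: log2 of the inverse probability."""
    return math.log2(1.0 / p)


stream = make_triplet_stream(TRIPLETS, n_triplets=400)
tp = transitional_probabilities(stream)

# Within-triplet transition: "A" is always followed by "B".
within = tp[("A", "B")]
# Boundary transitions: whatever follows "C", the last element of the first triplet.
boundary = [p for (first, _), p in tp.items() if first == "C"]

print(f"within-triplet:  p = {within:.2f}, surprisal = {surprisal_bits(within):.2f} bits")
for p in boundary:
    print(f"across boundary: p = {p:.2f}, surprisal = {surprisal_bits(p):.2f} bits")
```

<p>Under these assumptions, within-triplet transitions have an estimated probability of 1 and therefore zero surprisal, whereas transitions across triplet boundaries are spread over several possible continuations and carry positive surprisal. In this toy setting, triplet boundaries are the points at which a less predictable element is about to appear.</p></div>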
<div xmlns="http://www.tei-c.org/ns/1.0"><head>The target of perception</head><p>SL theory assumes that learning regularities in the environment enables their exploitation by facilitating perception of and action upon expected events. As such it considers likely recurrent events to be the foremost target of perception and learning. In contrast, an IF approach holds that while predictable events can be favored by various top-down mechanisms, unexpected patterns of events constitute informational novelty, and are, therefore, the main target of perception. Indeed, as early as <ref type="bibr">Pavlov (1927)</ref>, orienting behavior has been taken as a primary mechanism aimed at detecting the slightest change in the environment. Further, it was argued that recurrent presentations of stimuli result in neuronal representations that encapsulate the stimuli's specific features, so that all sensory input could be compared with the existing neuronal models, and a mismatch between novel input and the models would result in an orienting reaction <ref type="bibr">(Sokolov, 1963)</ref>. The competition between baseline habituation and novelty has been shown to drive orienting behavior and foraging for visual information (e.g., <ref type="bibr">Sirois &amp; Mareschal, 2004)</ref>. While there is extensive variability in the definition of what constitutes "novelty" in this context (see, e.g., <ref type="bibr">Gati &amp; Ben-Shakhar, 1990</ref>, for discussion), the view that neuronal models of recurrent and expected events serve as a baseline to flag novel and unexpected recurrent events converges with our view (and see, e.g., <ref type="bibr">Egner et al., 2010;</ref><ref type="bibr">Meyer &amp; Olson, 2010;</ref><ref type="bibr">Kumar, Kaposvari, &amp; Vogels, 2017;</ref><ref type="bibr">Richter et al., 2018</ref>, for evidence of surprisal detection in the neural domain).</p><p>Recent work on attention provides additional support for the IF approach. 
The learned distractor suppression literature shows that when the distractors in a series of search displays frequently occur in the same location, they capture attention significantly less (see Theeuwes, Bogaerts, &amp; van Moorselaar, 2022 for review). These results suggest that attention is modulated by distributional regularities in the environment, prioritizing novelty. Only distractors occurring at unexpected locations compete strongly for attention. Learned suppression of predictable distractors has also been observed for distractor features. For example, a distractor in a specific color loses its ability to capture attention with repeated exposure (Vatterott &amp; Vecera, 2012; see Geng, Won, &amp; Carlisle, 2019, for review). In addition, if distractors are highly frequent this can eliminate capture, even if their location and features are unpredictable <ref type="bibr">(Bogaerts, van Moorselaar, &amp; Theeuwes, 2022;</ref><ref type="bibr">Won, Kosoyan, &amp; Geng, 2019)</ref>.</p><p>Consistent with our approach, it seems that when distractors become part of the baseline, and are no longer novel or surprising, they become less salient. Similar effects may be at play in the speech domain.</p><p>Understanding what a stranger has said is easier in a 'cocktail party' scenario when simultaneous, competing speech is produced by a highly familiar talker (one's spouse) vs. when the competing voice is that of a novel talker <ref type="bibr">(Johnsrude et al., 2013)</ref>. Moreover, frequently occurring sounds lead to reduced perception of similar sounds, an effect that has been extensively studied in the selective adaptation literature <ref type="bibr">(Eimas &amp; Corbit, 1973;</ref><ref type="bibr">see Samuel, 1986</ref>, for a review).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>The system's end state</head><p>The successful assimilation of the structural regularities present in the environment is typically viewed as the SL system's end state. While SL naturally assumes that the learned regularities can be continuously updated given gradual changes, the view is that the system has done its job when it has learned the current regularities in a given domain. This view is well reflected in the experimental tasks devised to probe SL, which by and large, do not engage in change, perception of change, adaptability to change, and continuity (see <ref type="bibr">Frost et al., 2019</ref>, for a review). Rather, participants are typically presented with an input sequence and then tested on whether they have learned the regularities that were embedded therein. Learning is inferred when there is evidence that participants have assimilated the structural regularity embedded in the input stream, whether visual or auditory (e.g., <ref type="bibr">Siegelman, Bogaerts, Christiansen, &amp; Frost, 2017)</ref>. While, admittedly, these tasks were initially designed to provide a proof of concept that the statistical regularities in the input stream can be learned, in our view, they implicitly entrenched theoretical approaches to SL to consider it as a system whose end state is to map the existing structure of a stable environment.</p><p>In contrast, IF assumes ongoing foraging for novel information, and thus that there is no end state.</p><p>In the domain of language, given the dynamic changes in the input, IF requires the perceiver to be constantly adapting to what is different. 
For example, phoneme perception is immediately impacted upon encountering a new voice that shifts in mean acoustic spectra <ref type="bibr">(Holt, 2005;</ref><ref type="bibr">Huang &amp; Holt, 2012)</ref>, or when hearing a foreign accent in which acoustic input dimensions differ in their correlation <ref type="bibr">(Idemaru &amp; Holt, 2011;</ref><ref type="bibr">Hodson, Shinn-Cunningham, &amp; Holt, 2023)</ref>. Indeed, as we discuss further below, given that most SL experiments involve stimuli that are often very novel, these experiments may be better construed as implicating, at least initially, sensitivity to new structure.</p><p>Of course, at the end of the day, organisms do represent knowledge about stable structural properties of the environment, language included, and this knowledge facilitates perception and action.</p><p>From the IF perspective we are proposing, SL mechanisms are continuously at work, and if baseline regularities are recurrently registered across time, without much change, this information will be represented (subserving interaction with the environment; see, e.g., <ref type="bibr">Schapiro et al., 2017)</ref> and updated to reflect gradual changes that may emerge over time. One could argue that, given this, IF subserves the SL system rather than the other way around. However, since the statistical regularities in sensory input are dynamic and continuously changing, adaptation to these novel changes is the system's primary goal. 
Importantly, how much knowledge of stable real-world linguistic regularities is acquired from mere exposure to input regularities through implicit and unsupervised SL mechanisms, and to what extent additional kinds of learning (e.g., supervised) might be necessary, remains to be determined (see <ref type="bibr">Br&#246;ker et al., 2024, for review)</ref>.</p><p>For example, concurring with our IF octopus metaphor, the alignment of statistical input regularities with active behavior amplifies learning above and beyond what is possible across passive exposure alone. An illustrative example comes from novel nonspeech sound clusters designed to mimic the statistical structure and complexity of English consonant categories. These categories are not acquired with passive exposure (e.g., <ref type="bibr">Wade &amp; Holt, 2005;</ref><ref type="bibr">Emberson, Liu &amp; Zevin, 2013;</ref><ref type="bibr">Roark, Lehet, Dick, &amp; Holt, 2022)</ref>. Nonetheless, they are rapidly learned across the same time course when they are embedded in an unrelated active task structured such that the sounds are not essential to the task, but learning their structure supports success. This learning robustly generalizes to novel exemplars <ref type="bibr">(Roark et al., 2022)</ref>, alters cortical representations <ref type="bibr">(Leech et al., 2019)</ref>, and persists over days <ref type="bibr">(Gabay, Karni, &amp; Holt, 2023)</ref>, even without knowledge that categories exist. The structure is discovered via its utility in supporting behavior. Active engagement in a rich, multimodal perceptual environment (typical of most natural behaviors) may encourage foraging for information that directs learners to specific statistical regularities among the essentially infinite informational contingencies that exist even in simple real-world environments <ref type="bibr">(Roark et al. 2022</ref>). 
Thus, regularity that is difficult to extract across passive listening is readily learned, perhaps in a form of 'self-supervised' learning <ref type="bibr">(Lim, Fiez, &amp; Holt, 2019)</ref>, by virtue of coarse alignment of statistically structured input with behaviorally relevant actions, objects, and events. In this context, both endogenous generation of feedback as in self-supervised learning and exogenous (supervised) feedback fit with the view that the learner actively engages with the environment. We assume that the division of labor between the different learning mechanisms is likely to differ across domains (e.g., reading, syntax, second-language acquisition), and potentially across individuals. For example, explicit instruction and feedback play a significant role in reading acquisition (see <ref type="bibr">Rastle, Lally, Davis, &amp; Taylor, 2021)</ref>, which is not the case for native spoken language acquisition.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Selectivity</head><p>In conventional SL theory and in typical SL research, the structural regularities in the environment are defined independently of whether they are informative to the learner and of the specific context of learning (e.g., <ref type="bibr">Lelonkiewicz, Ktori, &amp; Crepaldi, 2020;</ref><ref type="bibr">Fiser &amp; Aslin, 2002)</ref>. Indeed, implicit passive learning of regularities present in sensory input (whatever computational mechanisms are assumed) is taken to be nonselective, in the sense that the system is set to assimilate and absorb the statistical co-occurrences present in the input through mere exposure, whatever the regularity is, regardless of its informational value to the learner. While having a mechanism that registers recurrent regularities nonselectively might have some advantages in terms of simplicity, we argue that passive, non-selective SL mechanisms are limited in their usefulness for driving higher-order behavior. Instead, their importance comes from their role as a baseline-providing subcomponent of a higher-order IF system, which is selective. The IF system must be selective because any given environment presents the learner with a myriad of possible regularities, and it is the current context that determines which are "informative" and which are not for a given organism.</p><p>The importance of selectivity and the role of context have not gone completely unnoticed within the domain of SL research. For example, a good demonstration that learners are not passive absorbers of regularities in sensory input and that the informativeness of the signal modulates learning is children's ability to assimilate regularities such as ABB (e.g., generalizing le-di-di to ko-ga-ga; see <ref type="bibr">Marcus et al., 1999)</ref>. 
<ref type="bibr">Marcus, Fernandes, and Johnson (2007)</ref> have shown that when children hear ABB in speech sounds, they learn the patterns, but when they hear ABB in non-speech sounds, such as sinewave tones, they do not. Importantly, in a subsequent study, <ref type="bibr">Ferguson and Lew-Williams (2016)</ref> demonstrated that if children were previously exposed to a video of two persons communicating in tones (a communicative context), learning does occur for tones, just as it does for speech sounds (see <ref type="bibr">Saffran et al., 2007</ref>, for a related finding with images children found interesting [dogs] vs. not [shapes, used by <ref type="bibr">Marcus et al., 2007]</ref>). This shows that there are preferences regarding what regularities should be attended to and what regularities can be ignored. The context determines what in the environment carries important information for a given species and what does not. In this example, regularities such as speech sounds that subserve communication within humans appear to be a primary filter for selection. This tension between informative and non-informative regularities has also been acknowledged in the domain of perception with a similar argument; it is computationally infeasible to sample all available information in the very complex real-world environment <ref type="bibr">(Braunlich &amp; Love, 2022)</ref>. Learning is selective because IF requires learners to eventually ignore regularities in the input that are uninformative (with respect to goals relevant for the organism to thrive in its niche), and focus on the informative ones. Note that making sense of which variations in the input are informative and which are not in itself requires learning. 
In fact, an essential part of development can be regarded as mastering this distinction.</p><p>For example, very young infants are sensitive to subcategorical (subphonemic) variations in speech (allowing them to distinguish non-native speech sound contrasts, such as the /r/-/l/ distinction for Japanese infants), but during the first year of life they learn to divide acoustic-phonetic space in ways that are optimized for the language they are immersed in, and they lose the ability to distinguish most contrasts that are not relevant for that language (e.g., <ref type="bibr">Werker, Gilbert, Humphrey, &amp; Tees, 1981)</ref>. In the domain of reading, proficiency has been shown to be related to the extent to which a reader relies on systematic (e.g., orthography to phonology) regularities versus spurious (e.g., arbitrary semantic cues such as imageability) regularities that are characteristic of the orthography <ref type="bibr">(Siegelman et al., 2020)</ref>. Successful SL will result in lasting representations aligned with language regularities. This internal knowledge provides another form of selectivity for IF. In lexically-mediated perceptual learning <ref type="bibr">(Norris et al., 2003)</ref>, informativeness is seen in at least two ways. First, shifts in phoneme boundaries seem typically to be learned in a talker-specific way (segregating novel statistics experiences in speech from a particular talker from baseline, rather than assuming the overall baseline for the language has changed; e.g., <ref type="bibr">Eisner &amp; McQueen, 2005)</ref>. Second, learning is blocked when an alternative explanation is available for phoneme boundary shifts (e.g., seeing that the speaker has a pen in her mouth; <ref type="bibr">Kraljic, Samuel, &amp; Brennan, 2008)</ref>. 
In dimension-based statistical learning in speech, individual differences in acoustic cue weighting (a reflection of long-term speech representations) predict how local speech input regularities shift speech categorization <ref type="bibr">(Wu &amp; Holt, 2022)</ref>. Thus, statistical learning under control of information foraging is selective, can be adaptively conservative in its scope, and can involve foraging of internal representational space as well as local input.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Neurobiological characteristics</head><p>In the conventional view, SL serves to provide the organism with an internal mirror of external regularities in the environment, so that greater external regularity should lead to faster learning. From a neurobiological perspective, the assumption of this view is that the more regularity there is in the input, the more neural activity there should be in the medial temporal lobe (MTL) as well as in early sensory cortices, leading to faster assimilation of the external statistics (e.g., <ref type="bibr">Schapiro et al., 2017)</ref>. IF systems operate differently, because both full randomness and full regularity are uninformative, as they do not represent meaningful novelty. This view of a "Goldilocks" range of information concurs with behavioral findings showing that children direct attention to events that are neither too simple nor too complex (e.g., <ref type="bibr">Kidd et al., 2012</ref>; see also <ref type="bibr">Forest, Siegelman, &amp; Finn, 2022, for adults)</ref>. Hence, the neural architecture of an IF system should not be responsive to randomness, nor to completely predictable patterns. Indeed, neuroimaging studies have identified neural populations that track uncertainty nonmonotonically following an inverted U-shaped function (with stronger response for moderately unexpected inputs, but low response for both highly expected and highly random inputs), in both the visual and auditory cortices <ref type="bibr">(Nastase, Iacovella, &amp; Hasson, 2014;</ref><ref type="bibr">Hasson, 2017)</ref>. 
These neural systems do not respond to full randomness or full regularity as these are alike in terms of informativeness (or lack thereof); instead, they are tuned to the moderate regularities in the sensory input.</p><p>Extensive neuroscience work has investigated the interconnection of the reward system and salience network <ref type="bibr">(Seeley et al., 2007)</ref> with the violation of expectation. For example, substantial research has tied the amygdala, insula, and dorsal anterior cingulate cortex to computations flagging novelty and surprise (e.g., <ref type="bibr">Kolling, Wittmann, &amp; Behrens, 2016;</ref><ref type="bibr">see Vassena, Holroyd, &amp; Alexander, 2017</ref>, for review), thus potentially mediating information foraging. A different body of evidence comes from work on reinforcement learning and the dopaminergic system showing that novel and/or infrequent information is encoded by dopaminergic neurons in the striatum (e.g., <ref type="bibr">Schultz, Dayan, &amp; Montague, 1997)</ref>. This research, however, has mainly focused on prediction errors regarding upcoming reward given changes in its probability (e.g., <ref type="bibr">Behrens, Woolrich, Walton, &amp; Rushworth, 2007)</ref>, and not on general deviation from baseline regularity in implicit unsupervised learning. Hence, the question of whether and to what extent IF is rewarding beyond paradigms of supervised learning requires further investigation.</p><p>A hint comes from neuroimaging of the learning of nonspeech categories, described above. Recall that these regularities are not learned with passive exposure but are robustly acquired when the regularity aligns with behavior in an active task <ref type="bibr">(Wade &amp; Holt, 2005;</ref><ref type="bibr">Roark et al., 2022;</ref><ref type="bibr">Gabay et al., 2023)</ref>. Examining such learning with fMRI reveals that the posterior striatum (especially caudate and putamen) is sensitive to statistical regularity. 
When actions and events in an active task incidentally align with sound categories that are defined by well-structured statistical regularities, the posterior striatum is recruited to a greater degree than it is among participants who engage in the same task with statistically less-well-structured categories. The magnitude of striatal activation is associated with better behavioral learning outcomes. Thus, when statistical regularities align with actions and events in the environment, 'self-supervised' learning signals available via the posterior striatum may boost learning beyond what is possible through passive exposure alone <ref type="bibr">(Lim, Fiez &amp; Holt, 2019)</ref>.</p><p>Along with our present IF approach, a recent framework tying learning and memory to curiosity (the Prediction, Appraisal, Curiosity, and Exploration [PACE] Framework; <ref type="bibr">Gruber &amp; Ranganath, 2019)</ref> demonstrates how, in general, prediction errors and detection of novelty increase attention and exploration through modulation of activity in dopaminergic circuits. Hence, while SL research has typically focused on the role of the MTL in registering regularities mainly through hippocampal activation (e.g., <ref type="bibr">Schapiro et al., 2017)</ref>, an IF approach would incorporate a larger set of neurobiological mechanisms that simultaneously consider a) sensitivity to deviation from regularity, b) reward systems, and c) memory.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Individual differences</head><p>Both SL and IF accounts assume that there are individual differences in pattern sensitivity (see <ref type="bibr">Frost et al., 2015;</ref><ref type="bibr">Siegelman et al., 2017, for discussion)</ref>. Since the environment is not only dynamic but also noisy, the challenge to any learning system, SL and IF alike, is to distinguish meaningful patterns related to regularities from those related to noise. Separating noise from signal requires something akin to a time window across which random noise will average out because it has no predictive value, whereas regularity of the signal will remain. If the sampling window is too short, the system will change/reorient/relearn with every bit of fluctuating noise. If the time window is too wide, shifting patterns could go unnoticed. One possibility is that such a sampling window may be implemented in a literal fashion, as in a time series analysis. However, this may not be a likely solution because learners cannot hold on to multiple input patterns before processing them, and our limited memory abilities leave little room for backtracking (cf. the Now-or-Never bottleneck, <ref type="bibr">Christiansen &amp; Chater, 2016)</ref>. Instead, we lean toward a more metaphorical interpretation of the sampling window, such as what might be observed in recurrent networks trained on sequences. Here, there is no explicit sampling window, but coherent signals will triumph over noise because consistent patterns (relative to dynamic contexts) will be the primary driver of weight changes over time. Interestingly, work by <ref type="bibr">Karuza et al. (2016)</ref> indeed suggests that the assimilation of structure leads to decreased environment sampling, causing learners to overlook pattern shifts and to display a bias toward their initial experiences (see also <ref type="bibr">Bruner &amp; Postman, 1949)</ref>. 
We assume that individuals differ in this respect, that is, in how they optimize their 'sampling window' in a given context.</p><p>Whereas research from an SL perspective targets individual sensitivity to detecting regularities (e.g., <ref type="bibr">Misyak, Christiansen, &amp; Tomblin, 2010;</ref><ref type="bibr">Siegelman &amp; Frost, 2015)</ref>, IF targets individual sensitivity to detecting changes in regularity. Given that the environment is typically dynamic rather than stable, individuals are expected to differ in their perceptual sensitivity to the ongoing dynamic changes in regularities in sensory input, and their efficiency in acting on these changes. This again would be tied to differences in optimizing the sampling window in a given context. Whereas it is possible that sensitivity to stable regularities goes hand in hand with sensitivity to change, it is also possible that these would be two dissociable dimensions of inter-individual variance. Note that IF also assumes substantial individual differences in sensitivity to informativeness. These differences are found not only across development (efficient IF requires the ongoing learning of which regularities to assimilate and which to ignore), but also differentiate individuals at a given point in development (see <ref type="bibr">Forest et al., 2023;</ref><ref type="bibr">Saffran &amp; Kirkham, 2018</ref>, for reviews of changes in regularity learning across development). Thus, we posit that substantial individual differences should be revealed in the ability to perceive and learn which continuously encountered variations in the input are relevant or informative and which are not. This perspective also offers novel avenues for understanding the wide individual differences in response to interventions in clinical populations. 
ADHD (attention deficit hyperactivity disorder), for example, has been tied to heightened novelty seeking which leads to the suboptimal reward-related decision-making characteristic of this population <ref type="bibr">(Lieder et al., 2019)</ref>. For autism, findings have been mixed. Comparing trial-by-trial performance in a serial detection task, neurotypical participants were found to overweight recent statistics and quickly update their internal sensory models, which is adaptive in changing environments, whereas individuals with autism were found to rely atypically heavily on long-term statistics <ref type="bibr">(Lieder et al., 2019)</ref>. Other studies, however, provided evidence for faster rather than slower updating of internal models by individuals with autism <ref type="bibr">(Goris et al., 2022;</ref><ref type="bibr">Lawson, Mathys, &amp; Rees, 2017)</ref>.</p></div>
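<div xmlns="http://www.tei-c.org/ns/1.0"><p>The tradeoff in setting the 'sampling window' can be illustrated with a minimal sketch. It is not a model proposed in this article: it assumes, purely for illustration, a binary event stream whose rate shifts abruptly (from 0.8 to 0.2) and an exponentially weighted running estimate whose update rate stands in for the inverse of the window width. All function names and parameter values are hypothetical.</p>

```python
import random
import statistics

def make_stream(n_stable=500, n_shifted=500, p_stable=0.8, p_shifted=0.2, seed=0):
    """Binary events whose underlying rate changes abruptly (illustrative values)."""
    rng = random.Random(seed)
    stable = [1 if p_stable > rng.random() else 0 for _ in range(n_stable)]
    shifted = [1 if p_shifted > rng.random() else 0 for _ in range(n_shifted)]
    return stable + shifted

def running_estimate(stream, rate, start=0.5):
    """Exponentially weighted estimate; 1 / rate acts as a rough window width."""
    estimate = start
    trace = []
    for event in stream:
        estimate += rate * (event - estimate)
        trace.append(estimate)
    return trace

def jitter(trace, begin, end):
    """Variability of the estimate while the environment is in fact stable."""
    return statistics.pstdev(trace[begin:end])

def detection_lag(trace, shift_index, criterion=0.5):
    """Trials after the shift until the estimate falls to the criterion."""
    for lag, value in enumerate(trace[shift_index:], start=1):
        if criterion >= value:
            return lag
    return None

stream = make_stream()
narrow = running_estimate(stream, rate=0.5)   # short window: tracks noise
wide = running_estimate(stream, rate=0.01)    # long window: slow to register change

print("jitter (narrow, wide):", jitter(narrow, 200, 500), jitter(wide, 200, 500))
print("lag (narrow, wide):", detection_lag(narrow, 500), detection_lag(wide, 500))
```

<p>Under these assumptions, the narrow window reorients with every fluctuation but registers the shift almost immediately, whereas the wide window is stable but registers the shift only after many trials. Individual differences could then be thought of, loosely, as differences in where a learner sits along this continuum in a given context.</p></div>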
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Role in the cognitive system</head><p>Here we come to an important distinction regarding cognitive architecture, and an important clarification: Per our Figure <ref type="figure">1</ref>, IF does not preclude mechanisms of SL and does not aim to replace SL as a theoretical construct. Rather, while conventional SL theory regards SL as a stand-alone mechanism, from our perspective, SL computations form a subcomponent of IF, with SL playing a critical role in providing baselines against which change can be detected. This distinction is a corollary of the view of the environment as being fundamentally dynamic, with the changes in regularities as the main target of perception. To identify such changes, a baseline of current covariation is needed, from which novel information can be detected. Hence, per our view, SL is a primary mechanism that non-selectively attunes the system to regularities in sensory input, but in service of the higher purpose of novelty detection. While we have argued so far that IF is a necessary component of learning regularities, as we outline below, from a wider perspective, we consider IF to be a unifying principle that underlies more complex behavior such as curiosity, exploration, and creativity. This perspective opens a novel set of questions regarding how SL and IF interact, and what outputs they produce, to enable efficient perception and action in a dynamic and everchanging environment.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>One system or two?</head><p>An important theoretical question is whether registering recurrent regularities in the input for establishing baselines (SL), and sensitivity to changes in regularities that deviate from baseline (IF), require two independent and distinct systems, or whether one system can account for both. In other words, we ask whether a mechanism that registers recurrent regularities in sensory input as established by SL research could also at the same time allow for the fast accommodation of changes therein.</p><p>One could argue, for example, that a unified Bayesian perspective could, in principle, accommodate both SL and IF. In a Bayesian framework, if the environment is inherently ever-changing, then the prior distribution of the perceived regularities would become less and less informative. With a relatively less informative distribution, deviating evidence (i.e., changes in regularity) has substantial weight, thereby significantly changing the posterior distribution and leading to increased sensitivity to novelty, aligning with our IF approach. In contrast, if the environment is stable and characterized by recurrent regularities, this would result in increasingly strong priors for expected events. Strong priors would lead to facilitation in the perception and/or processing of likely events in the input (e.g., <ref type="bibr">Friston, 2005</ref>, <ref type="bibr">2009</ref>, <ref type="bibr">2010;</ref><ref type="bibr">Kok, Jehee, &amp; de Lange, 2012)</ref>, aligning with the conventional SL approach and its empirical findings. 
The main appeal of such an architecture is that one unified system generates different behaviors to the extent that the environment is stable versus ever-changing in a given domain.</p><p>While this Bayesian approach has the advantage of parsimony, it predicts, for example, that the more exposure a learner has to a repeated set of regularities, the less weight a surprising deviation from this pattern will have in updating the posterior distribution. This is because repeated regularities produce increasingly strong priors, so that substantial evidence is eventually required for updating beliefs about the structural properties of the input.<ref type="foot">foot_0</ref> This goes counter to the prediction we would make based on our IF account. IF argues that effective novelty detection and behavioral adaptation on the basis thereof, take place when there is a violation of structure in a so-far stable patterned environment (which has served as a baseline). In general, humans and other organisms adapt very rapidly to changing statistics while seeming to provisionally segregate changing statistics from regularities learned over the long term (e.g., <ref type="bibr">Dell, Reed, Adams, &amp; Meyer, 2000;</ref><ref type="bibr">Kraljic &amp; Samuel, 2005;</ref><ref type="bibr">Onishi, Chambers &amp; Fisher, 2002)</ref>, and link them causally to contexts (e.g., <ref type="bibr">Kraljic &amp; Samuel, 2011)</ref>. Evidence shows that detection and adaptation to deviation from recurrent regularities is exceedingly fast regardless of lengthy past experience. 
For example, participants were found to adjust their preferred saccade length (which reflected their prolonged reading experience) when presented with sentences containing words of uniform length, even when they had only one trial to adapt <ref type="bibr">(Cutter et al., 2018)</ref>.</p><p>While we opted to exemplify this problem using Bayesian terms, we should emphasize that virtually any learning system will face a challenge when it has been immersed in very strong regularities, and those regularities begin to change: Systems will respond sluggishly to changes in a previously highly regular environment. For example, a neural network (e.g., a simple recurrent network; <ref type="bibr">Elman, 1990</ref>, <ref type="bibr">1991)</ref> may require substantial experience to overcome previous learning, and it might well lose significant aspects of prior learning if they are no longer reinforced (so-called catastrophic interference; <ref type="bibr">McCloskey &amp; Cohen, 1989;</ref><ref type="bibr">Bower, Thompson-Schill, &amp; Tulving, 1994;</ref><ref type="bibr">Merhav, Karni, &amp; Gilboa, 2014</ref>), a phenomenon that is not generally observed in biology. This suggests that a system that targets the tracking of stable regularities, and a system that prioritizes novelty detection over stable regularities, might operate with different computational mechanisms. This discussion resonates with our initial claim that a theory of learning regularities should first consider the nature of the environment that is the object of learning. We take it as evident that a computational mechanism tracking regularities in the input could naturally assimilate gradual and slow changes, as original patterns will recur but at lower and lower frequency, and novel patterns will emerge at a higher and higher frequency. 
However, if the input is characterized by abrupt changes in regularities which would flag informational novelty, and the target of perception is precisely these changes, then slow adaptive learning mechanisms would probably not suffice. This perspective raises important questions that should be the focus of future empirical and computational research efforts: What is the overlap in mechanisms that underlie the assimilation and updating of baseline regularities and those involved in detecting novel regularities? How do they interact with one another, and with what brain networks? Are different behavioral phenomena indicating sensitivity to novel regularities (spanning different cognitive domains and different timescales of adaptation) all tapping into one and the same IF system? Finally, are representations of baseline regularities overwritten/adjusted by IF, or do the representations of baseline regularities and those of novel regularities co-exist? As we outline in the following section, these questions regarding regularity learning are echoed across other domains in cognitive science.</p></div>
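<div xmlns="http://www.tei-c.org/ns/1.0"><p>The Bayesian prediction discussed in this section, together with the caveat in the accompanying footnote, can be made concrete with a small sketch. It assumes, for illustration only, a learner that tracks a single binary regularity with a Beta(1, 1) prior over the rate of deviating events; this conjugate model and all function names are our own simplifying assumptions, not a model drawn from the cited literature.</p>

```python
import math

def predictive_prob_of_deviation(conforming, deviating):
    """Posterior predictive probability that the next event deviates from the
    learned pattern, assuming a Beta(1, 1) prior and the given event counts."""
    return (deviating + 1) / (conforming + deviating + 2)

def surprisal_bits(conforming, deviating):
    """Surprise (in bits) carried by one further deviating event."""
    return -math.log2(predictive_prob_of_deviation(conforming, deviating))

def deviations_needed_to_flip(conforming, deviating):
    """Consecutive deviating events required before the learner expects a
    deviation at least as strongly as a conforming event."""
    extra = 0
    while 0.5 > predictive_prob_of_deviation(conforming, deviating + extra):
        extra += 1
    return extra

# Same 90 percent regularity, increasing amounts of prior exposure.
for conforming, deviating in [(9, 1), (90, 10), (900, 100)]:
    print(
        "exposure:", conforming + deviating,
        "| surprisal of next deviation (bits):",
        round(surprisal_bits(conforming, deviating), 2),
        "| deviations needed to flip expectation:",
        deviations_needed_to_flip(conforming, deviating),
    )
```

<p>In this sketch the number of deviating events needed to overturn the expectation grows in proportion to prior exposure (8, 80, and 800 for the three cases), while the surprise carried by a single deviation grows only slightly. A learner of this kind can therefore register a deviation as highly surprising and still be slow to revise its beliefs, which is the sluggishness that the rapid-adaptation findings cited above appear to contradict.</p></div>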
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Parallel debates across cognitive neuroscience</head><p>Since <ref type="bibr">Helmholtz (1863)</ref>, cognitive neuroscience has grappled with how sensory input and prior knowledge (expectations) interact to establish a percept (e.g., <ref type="bibr">Geisler &amp; Kersten, 2002;</ref><ref type="bibr">Friston, 2009;</ref><ref type="bibr">Heilbron &amp; Chait, 2018)</ref>. Contemporary studies examine how expectations are established <ref type="bibr">(Jabar &amp; Fougnie, 2022)</ref>, how they shape behavior and neural response <ref type="bibr">(Egger, Remington, Chang, &amp; Jazayeri, 2019)</ref>, and, importantly, how they change in a world with dynamic, fluctuating regularities <ref type="bibr">(Hodson et al., 2023)</ref>. Whereas this paper centers on SL, parallel debates on the importance of novelty are seen in multiple domains of cognition.</p><p>The most relevant framework is that of predictive processing. This theoretical approach to cognition views the brain as a Bayesian inference machine, where predictions regarding sensory input are continuously made to minimize free energy, a proxy for uncertainty and surprise, thereby facilitating perception and action (e.g., <ref type="bibr">Friston, 2005</ref>, <ref type="bibr">2009</ref>, <ref type="bibr">2010)</ref>. While we have discussed the problem of exceedingly fast adaptation with reference to a Bayesian perspective regarding enhanced sensitivity to changes in regularities, parallels between current theories of predictive processing and our IF view certainly exist. Predictive processing, similar to IF, regards learners as active in the sense that they continuously make inferences, and it highlights the role of explorative behavior in learning (e.g., <ref type="bibr">Friston, 2016</ref>, <ref type="bibr">2017;</ref><ref type="bibr">see Schwartenbeck et al., 2013, for discussion)</ref>. 
Like IF, predictive processing centers on adaptation, offering computational mechanisms for it through the notion of minimizing prediction errors and a continuous process of updating priors. Given this, both frameworks reject the idea of an end state when the environmental regularities have been assimilated. However, an important basic difference remains. While models of predictive coding also offer computational accounts for novelty seeking, curiosity, and creativity (e.g., <ref type="bibr">Schwartenbeck et al., 2013;</ref><ref type="bibr">see Clark, 2017, for discussion)</ref>, in essence, they center on the notion of minimizing surprisal, while IF centers on the prioritization of changes in regularity. This distinction echoes our above discussion on one versus two systems.</p><p>In the domain of visual perception, the contrast between prioritizing expected input (as in the SL approach) vs. upweighting novel and surprising input (as in the proposed IF approach), is discussed in the context of sensory perception of unitary events vs. sensory-motor predictions. This debate, dubbed the Perceptual Prediction Paradox (see <ref type="bibr">Press, Kok, &amp; Yon, 2020)</ref>, outlines the difference between, say, perceiving a cup, which requires fast recognition of a familiar object, vs. sensing the cup slipping from one's grip, which requires fast detection of deviation from what is expected regarding this object in terms of sensory information.</p><p>From a neurobiological perspective, a key signature of expectations is a weakened neural response to stimuli that are anticipated (see Heilbron &amp; Chait, 2018; de Lange, Heilbron, &amp; Kok, 2018, for reviews). There are, however, different accounts of the specific neural mechanism responsible for this suppression, which directly relate to whether perception tilts toward the input we expect, or if it instead prioritizes unexpected and novel input (see <ref type="bibr">Press et al., 2020, for discussion)</ref>. 
According to sharpening models, neural populations that are not tuned to the anticipated stimulus are particularly affected by expectations, resulting in a neural response that is overall diminished in magnitude but carries a more precise representation of the stimulus (e.g., <ref type="bibr">Kok et al., 2012;</ref><ref type="bibr">Bell, Summerfield, Morin, Malecek, &amp; Ungerleider, 2016)</ref>. This "sharpening" process biases perception in accordance with the perceiver's expectations, and is consistent with data demonstrating that predicted events are perceived with greater clarity and detected or processed faster, as typically assumed in SL theory. In contrast, damping models propose that neural populations that are tuned toward an expected stimulus are suppressed, leading to reduced cortical activation for expected input <ref type="bibr">(Blakemore, Wolpert, &amp; Frith, 1998;</ref><ref type="bibr">Summerfield &amp; de Lange 2014)</ref>. This leads to the prioritization of novelty, as the brain now favors the processing of surprising information, per an IF approach.</p><p>How these two seemingly incompatible explanations (and the data supporting each) can be reconciled is still a matter of debate (see, Richter, Heilbron, &amp; de Lange, 2022, for discussion). While some theoretical accounts hypothesize that sharpening and damping can occur in parallel in different neural populations <ref type="bibr">(Friston, 2005)</ref>, others propose different time courses for sharpening and damping so that initial processing favors the expected, but later processing highlights signals that depart from these expectations to allow us to accommodate change (e.g., <ref type="bibr">Press et al., 2020)</ref>. These proposals for reconciliations, however, are silent regarding how the tradeoff between prediction and novelty detection is modulated by context, per our IF approach. 
The alignment of input statistics with active behavior may play an important role in such modulation <ref type="bibr">(Roark, Lehet, Dick, &amp; Holt, 2022;</ref><ref type="bibr">Lim, Fiez, &amp; Holt, 2019)</ref>.</p><p>Integrating SL into an IF framework sets a novel research agenda for understanding how input patterns acquired by SL may lead to predictions about upcoming input, and how departures from these patterns are flagged and accommodated in behavior and neural response. In the IF framework, future SL research will benefit from cross-fertilization with cognitive neuroscience literatures that examine the nature of predictive processes and novelty detection. Such research should focus on the precise neural systems that are implicated in dynamically tracking ever-changing regularities in sensory input, and to what extent they are related to reward networks driving curiosity and exploration.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Information foraging and higher cognitive functions</head><p>We consider the prioritization of novel information to be a domain-general feature and a unifying principle that explains a wide range of behaviors. Starting from early life, cognitive development is often cast in terms of constructing internal models that serve as a baseline for the detection of important novel information <ref type="bibr">(Atzil, Gao, Fradkin, &amp; Barrett, 2018)</ref>. As Twomey and Westermann (2018) suggest, infants drive their cognitive development by searching for structure in their environment, and maximal learning emerges when stimulus novelty is maximized in reference to their internal models. However, we propose that IF is also an explanatory principle for more complex behaviors. For example, foraging for information has been a cornerstone principle in the study of curiosity and its neurobiological underpinning (see e.g., <ref type="bibr">Loewenstein, 1994;</ref><ref type="bibr">Kidd &amp; Hayden, 2015;</ref><ref type="bibr">Gottlieb &amp; Oudeyer, 2018, for review)</ref>. Within this research area, novelty is taken to act as an intrinsic reward in exploration (e.g., <ref type="bibr">Gottlieb, Oudeyer, Lopes, &amp; Baranes, 2013)</ref>, and such curiosity-driven explorations overall lead to improvement of prediction, a reduction in uncertainty, and assimilating more complex structures <ref type="bibr">(Oudeyer &amp; Smith, 2016)</ref>. As long as three decades ago, tracking ocular movements, <ref type="bibr">Berlyne (1996)</ref> found that when presented with pairs of stimuli, participants spend less and less time inspecting recurrent patterns and more and more time looking at novel patterns. While at the time, this was labeled "perceptual curiosity", it coincides well with our IF approach. 
In general, current theories of curiosity converge on the assumption that the automatic bias towards novel and surprising events is rooted in the motivation to reduce uncertainty in the environment <ref type="bibr">(van Lieshout, de Lange, &amp; Cools, 2020)</ref>, so that the model of the environment is continuously updated. This has been extensively shown in how young children forage for visual information. Infants tend to focus on familiar visual stimuli as long as they offer learning progress, but they switch to novel stimuli when learning progress drops <ref type="bibr">(Poli et al., 2020;</ref><ref type="bibr">and see Addyman &amp; Mareschal, 2013</ref>, for how redundancy governs spontaneous orientation). Indeed, if the environment is not stable but ever-changing, such updating is a top priority. From this perspective, highly predictable events are uninformative and do not contribute to uncertainty reduction for updating our models of the world. Similar to our approach, novelty-based theories of curiosity suggest that new and highly uncertain stimuli drive curiosity, and, in general, the causal structure of the environment and its predictability will determine whether high or moderate uncertainty should drive curiosity <ref type="bibr">(Dubey &amp; Griffiths, 2020)</ref>.</p><p>Discussions in the domain of creativity parallel the proposed contrast between SL and IF. In creativity research, "creative foraging", as a theoretical construct, is taken to balance two main processes, exploitation and exploration <ref type="bibr">(Hills, Todd, Lazer, Redish, &amp; Couzin, 2015;</ref><ref type="bibr">Hart et al., 2018)</ref>. Exploitation is operationally defined as taking advantage of the specific regularities within a search space, repeatedly applying identical or similar computations. Opposite to exploitation is exploration, defined as moving to a novel search space, applying novel computations, to increase gain. 
While exploitation would maximize reward if the environment is stable, a maximizing organism would not detect superior reward associated with different regions of space or different computations (indeed, exploitation may limit reward, if resources are depleted, especially if multiple organisms compete for resources; e.g., <ref type="bibr">Gallistel, 1993)</ref>. While exploitation is driven by predictable outcomes akin to SL, exploration, and in essence, curiosity behavior, is driven by the promise of information gain that would come in unpredictable novel search spaces <ref type="bibr">(Liquin &amp; Lombrozo, 2020)</ref>. In fact, exploration is defined only with reference to exploitation, just as novelty in IF is defined with reference to baseline statistical regularity.</p></div>
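<div xmlns="http://www.tei-c.org/ns/1.0"><p>The claim that a purely exploiting organism cannot detect superior reward elsewhere can be sketched with a toy two-patch foraging problem. The sketch uses an epsilon-greedy rule, a standard device from the reinforcement learning literature that we substitute here for illustration; it is not a mechanism proposed in this article, and the payoffs, the change point, and the function name are all hypothetical.</p>

```python
import random

def forage(epsilon, n_trials=1000, change_at=500, rate=0.5, seed=0):
    """Two-patch forager. Patch 0 starts out better; after the change point,
    patch 1 becomes better while patch 0 still pays more than the forager's
    stale estimate of patch 1. Returns total reward and final estimates."""
    rng = random.Random(seed)
    estimates = [1.0, 0.5]  # assumed to match the initial payoffs
    total = 0.0
    for trial in range(n_trials):
        payoffs = (1.0, 0.5) if change_at > trial else (0.6, 1.0)
        if epsilon > rng.random():
            patch = rng.randrange(2)  # explore: sample a patch at random
        else:
            patch = 0 if estimates[0] >= estimates[1] else 1  # exploit
        reward = payoffs[patch]
        estimates[patch] += rate * (reward - estimates[patch])
        total += reward
    return total, estimates

exploit_total, exploit_estimates = forage(epsilon=0.0)
explore_total, explore_estimates = forage(epsilon=0.1)

print("pure exploitation:", exploit_total, exploit_estimates)
print("with exploration: ", explore_total, explore_estimates)
```

<p>With these illustrative settings, the purely exploiting forager never revisits the second patch, so its estimate of that patch stays at the outdated value and the improved payoff goes undetected. The forager that occasionally explores pays a small cost while the environment is stable but discovers the change and ends with a higher total reward.</p></div>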
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Concluding remarks and future directions</head><p>Our present theoretical perspective on IF and its role in cognitive architecture lays the groundwork and raises new questions for future research on the processing of regularities. Critically, experimental investigations of IF should focus on paradigms that involve changes in regularity, tracking participants' perception of these changes in real time and mapping the precise parameters that determine efficient detection of deviations from baseline (e.g., <ref type="bibr">Weiss, Gerfen, &amp; Mitchel, 2009;</ref><ref type="bibr">Ryskin, Qi, Duff, &amp; Brown-Schmidt, 2017;</ref><ref type="bibr">Wang &amp; Theeuwes, 2020;</ref><ref type="bibr">Hodson et al., 2023)</ref>. When it comes to laboratory experiments with artificial stimuli, this will require tracking behavior in significantly longer experimental sessions than conventional SL research (see, e.g., <ref type="bibr">Frank, Tannenbaum, &amp; Gibson, 2013)</ref>, where participants are processing input streams that vary in regularity as the session proceeds. In the domain of language, where the statistical co-occurrences of linguistic elements can be determined by considering large databases, experimental work can focus on presenting participants with input streams that do or do not conform to the distributional properties that characterize their linguistic environment. This approach can be used to measure the manipulation's impact on performance, for speech, print, or any linguistic input (e.g., <ref type="bibr">Idemaru &amp; Holt, 2011;</ref><ref type="bibr">Isbilen, McCauley, &amp; Christiansen, 2022;</ref><ref type="bibr">and see Elazar, Alhama, Bogaerts, &amp; Frost, 2022, for discussion)</ref>. In the same vein, from the perspective of individual differences, studies should focus on individual sensitivity to a change in statistical regularities, individual plasticity in adapting to novel structural properties, and individual efficiency in determining which regularities are informative and which are not, given the particular context of learning.</p><p>To map the neurobiological underpinnings of IF, research could focus on the range of neural mechanisms that are tuned to track deviation from baseline, and mechanisms that flag alterations in patterns of quasi-regularity. A recent example is the role of norepinephrine in tracking unexpected uncertainty and deviation from regularity <ref type="bibr">(Zhao et al., 2019)</ref>. Importantly, such investigation should go beyond simple oddball paradigms and their concurrent mismatch negativity responses (e.g., N&#228;&#228;t&#228;nen &amp; Alho, 1995). As we point out above, and in contrast to conventional SL, mechanisms of IF most probably involve complex interactions with systems that govern exploration, attention, and reward. Establishing the neurocircuitry that is implicated in the foraging of change, and the necessary conditions for its consolidation in memory, is an important priority.</p></div>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_0"><p>It should be noted that the magnitude of surprise can be greater when encountering a highly unexpected input in the context of a strong prior (characterized by a narrow distribution) compared to a weak prior (characterized by a broad distribution). This is due to the potentially larger discrepancy between the unexpected input and the concentrated probability mass of a strong prior.</p></note>
		</body>
		</text>
</TEI>
