<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Project AIM: Autism intervention meta-analysis for studies of young children.</title></titleStmt>
			<publicationStmt>
				<publisher></publisher>
				<date>01/01/2020</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10286318</idno>
					<idno type="doi">10.1037/bul0000215</idno>
					<title level='j'>Psychological Bulletin</title>
					<idno>0033-2909</idno>
					<biblScope unit="volume">146</biblScope>
					<biblScope unit="issue">1</biblScope>
					<author>Micheal Sandbank</author><author>Kristen Bottema-Beutel</author><author>Shannon Crowley</author><author>Margaret Cassidy</author><author>Kacie Dunham</author><author>Jacob I. Feldman</author><author>Jenna Crank</author><author>Susanne A. Albarran</author><author>Sweeya Raj</author><author>Prachy Mahbub</author><author>Tiffany G. Woynaroski</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[In this comprehensive systematic review and meta-analysis of group design studies of nonpharmacological early interventions designed for young children with autism spectrum disorder (ASD), we report summary effects across seven early intervention types (behavioral, developmental, naturalistic developmental behavioral intervention [NDBI], TEACCH, sensory-based, animal-assisted, and technology-based) and 15 outcome categories indexing core and related ASD symptoms. A total of 1,615 effect sizes were gathered from 130 independent participant samples. A total of 6,240 participants, who ranged in age from 0-8 years, are represented across the studies. We synthesized effects within intervention and outcome type using a robust variance estimation approach to account for the nesting of effect sizes within studies. We also tracked study quality indicators, and report an additional set of summary effect sizes that restrict included studies to those meeting pre-specified quality indicators. Finally, we conducted moderator analyses to evaluate whether summary effects across intervention types were larger for proximal as compared to distal effects, and for context-bound as compared to generalized effects. When study quality indicators were not taken into account, we found significant positive effects for behavioral, developmental, and NDBI intervention types. When effect size estimation was limited to studies with randomized controlled trial (RCT) designs, evidence of positive summary effects existed only for developmental and NDBI intervention types. This was also the case when outcomes measured by parent report were excluded. Finally, when effect estimation was limited to RCT designs and to outcomes for which there was no risk of detection bias, no intervention types showed significant effects on any outcome.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Public Significance Statement</head><p>This comprehensive meta-analysis of interventions for young children with autism spectrum disorder (ASD) suggests that naturalistic developmental behavioral interventions and developmental intervention approaches have amassed enough quality evidence to be considered promising for supporting children with ASD in achieving a range of developmental outcomes.</p><p>Behavioral intervention approaches also show evidence of effectiveness, but methodological rigor remains a pressing concern in this area of research. There is little evidence to support the effectiveness of TEACCH, sensory-based interventions, animal-assisted interventions, and interventions mediated solely through technology at this time.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Project AIM: Autism Intervention Meta-Analysis for Studies of Young Children</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Autism Spectrum Disorder</head><p>Autism spectrum disorder (ASD) is a relatively common neurodevelopmental disorder with a varied impact. Current prevalence estimates suggest that 1 in 59 children meet the criteria for ASD, though prevalence varies by sex, with males approximately four times more likely than females to be affected <ref type="bibr">(Baio et al., 2018)</ref>. The diagnosis is primarily associated with core challenges in social communication, as well as restricted interests, repetitive behaviors, and differences in sensory function (American Psychiatric Association [APA], 2013). Individuals with ASD, however, may also exhibit difficulty in a number of related areas, such as language, adaptive behavior, and academic achievement.</p><p>A substantial portion of autistic<note place="foot" n="1">Though researchers and clinicians often feel more comfortable with and advocate for person-first language such as "individuals with autism," some autistic individuals and their parents have endorsed identity-first language, which incorporates autism as a component of identity, over person-first language <ref type="bibr">(Gernsbacher, 2017;</ref><ref type="bibr">Kenny et al., 2016)</ref>. In this manuscript, we flexibly use identity-first and person-first language to acknowledge the diversity of opinions on this issue within the broader autism community (see <ref type="bibr">Robison, 2019)</ref>.</note> individuals report drawing a sense of identity and empowerment from the diagnosis, and advocate for a neurodiversity conceptualization of ASD as a natural form of human difference <ref type="bibr">(den Houting, 2019)</ref>. Researchers have recently articulated a view of early intervention that is consistent with a neurodiversity framework (e.g., <ref type="bibr">Fletcher-Watson, 2018)</ref>. Specifically, early intervention services provided throughout childhood may support children with ASD in developing competencies that will allow them to navigate into adulthood in ways they see fit. At present, long-term life outcomes of autistic individuals vary widely. Though a number of individuals who receive early diagnoses go on to develop adaptive and communicative skills within the average range, most require at least some support, and many require substantial support into adulthood <ref type="bibr">(Renty &amp; Roeyers, 2006)</ref>. Importantly, quality of life among autistic adults also varies between individuals <ref type="bibr">(Howlin &amp; Magiati, 2017)</ref>. Improving the quality of intervention provided in early childhood may be one way to increase the likelihood that long-term life satisfaction is attainable for all autistic people.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Research on Interventions in Early Childhood</head><p>Common intervention recommendations. Recommendations abound regarding the nature and amount of intervention that should be provided to support development in children with ASD. Scholars and professionals have routinely asserted three things. First, intervention should be provided as early as possible, beginning at or even before diagnosis in toddlerhood or infancy. Second, it should be intensive (i.e., provided for 25-40 hours per week for a year or longer). Third, it should be comprehensive (i.e., targeting broader development rather than specific skills; <ref type="bibr">Boyd, Odom, Humphreys, &amp; Sam, 2010;</ref><ref type="bibr">Lovaas, 1987;</ref><ref type="bibr">McEachin, Smith, &amp; Lovaas, 1993;</ref><ref type="bibr">Lord et al., 2001;</ref><ref type="bibr">Odom, Boyd, Hall, &amp; Hume, 2010)</ref>. These recommendations are motivated by the theory that interventions provided in early childhood are likely to yield optimal effects by capitalizing on the neuroplasticity of the developing brain <ref type="bibr">(Dawson &amp; Zanolli, 2003;</ref><ref type="bibr">Kolb &amp; Gibb, 2011)</ref>. They are also rooted in early influential studies suggesting that intensive intervention yielded substantial cognitive gains, and that those gains varied according to age at the onset of intervention (e.g., <ref type="bibr">Lovaas, 1987;</ref><ref type="bibr">McEachin, Smith, &amp; Lovaas, 1993)</ref>. Notably, however, some subsequent studies exploring putative predictors of treatment response have reported that age at intake was not significantly associated with intervention outcomes (e.g., <ref type="bibr">Eikeseth, Smith, Jahr, &amp; Eldevik, 2007;</ref><ref type="bibr">Eikeseth, Klintwall, Jahr, &amp; Karlsson, 2012)</ref>.</p><p>Types of intervention approaches. 
Several approaches to intervention aim to address the core and related challenges associated with ASD. These approaches vary in their underlying theories of ASD and development, as well as in their procedures and instructional modalities.</p><p>Behavioral approaches. Behavioral interventions were among the first approaches developed and clinically tested for improving outcomes in children with autism <ref type="bibr">(Ferster &amp; DeMeyer, 1962)</ref>. These approaches are derived from operant learning theory. They are characterized by three elements:
<list>
<item>the discrete presentation of information (i.e., a stimulus);</item>
<item>the prompted exhibition of target responses (i.e., desired academic, adaptive, and communicative behaviors); and</item>
<item>the provision of extrinsic positive reinforcement (e.g., edible treats, toys, or stickers) contingent on those responses.</item>
</list>
Target skills are chosen based on functional areas of child need. Skills are typically targeted first in highly structured interactions within isolated clinical contexts (e.g., one-on-one sessions with a therapist at a clinic). More natural settings and interaction partners (e.g., mainstream classrooms and other children) are then gradually integrated as a child demonstrates progress. Initial studies suggested that Early Intensive Behavioral Intervention (EIBI) could yield marked improvements in cognitive and academic placement outcomes for children with ASD, especially when provided before school age and with sufficient intensity <ref type="bibr">(Lovaas, 1987;</ref><ref type="bibr">McEachin, Smith, &amp; Lovaas, 1993)</ref>. In the wake of such research, a number of behavioral approaches were further developed and refined. The Behavior Analyst Certification Board (BACB) was established to oversee clinical certification in this approach. 
Other behavioral interventions include Discrete Trial Training (DTT), the Picture Exchange Communication System (PECS), and Positive Behavioral Supports (PBS). Together, these interventions are sometimes loosely described as Applied Behavior Analysis (ABA) therapy. According to parent and provider reports, ABA therapy now constitutes the primary approach used in clinical practice <ref type="bibr">(Green et al., 2006;</ref><ref type="bibr">Stahmer, Collings, &amp; Palinkas, 2005)</ref>.</p><p>Developmental approaches. Developmental interventions are derived from developmental theories of learning and are at times viewed in contrast to the traditional behavioral interventions described above (e.g., <ref type="bibr">Ospina et al., 2008;</ref><ref type="bibr">Prizant &amp; Wetherby, 1998)</ref>. These interventions are rooted in constructivist theory, which posits that development results from children's active exploration of their physical and social surroundings. This exploration is far from a solitary endeavor. Children are supported in social and language development by their interactions with more competent partners, such as caregivers <ref type="bibr">(Bruner, 1982;</ref><ref type="bibr">Vygotsky, 1978)</ref>. Foundational research on ASD within the developmental tradition has suggested that early deficits in social processes in children with ASD may lead to difficulties in early caregiver-child social interactions. Joint attention is a social process of particular importance here. These early deficits are thus viewed as disrupting the primary context for subsequent language and social communication development.</p><p>As such, developmental interventions focus on improving the synchrony, reciprocity, and duration of parent-child or child-child interactions. This is intended as a pathway for ameliorating deficits in social communication and generating cascading improvements in developmentally related skills. 
Examples of developmental interventions include DIR/Floortime <ref type="bibr">(Greenspan &amp; Wieder, 2007)</ref> and Hanen models <ref type="bibr">(Carter et al., 2011)</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Naturalistic Developmental Behavioral Interventions (NDBIs).</head><p>In 2015, several interventions were categorized as belonging to a third type of intervention approach, with theoretical underpinnings in both behavioral and developmental theories of learning. Naturalistic Developmental Behavioral Interventions (NDBIs) use behavioral principles of learning to teach skills chosen from a developmental sequence. This teaching takes place in naturalistic environments and uses natural rewards <ref type="bibr">(Schreibman et al., 2015)</ref>. Skills selected for intervention are those that allow the child to participate more fully in reciprocal interactions with the adult. These interventions are delivered primarily in the context of play. Within play, the child and the adult share control of the interaction through balanced turn-taking. Interventions categorized as NDBIs include the following:
<list>
<item>the Early Start Denver Model <ref type="bibr">(Rogers &amp; Dawson, 2010)</ref>;</item>
<item>Enhanced Milieu Teaching (EMT; <ref type="bibr">Kaiser, 1993)</ref>;</item>
<item>Pivotal Response Treatment <ref type="bibr">(Koegel, Koegel, &amp; Carter, 1999)</ref>; and</item>
<item>Joint Attention, Symbolic Play, Engagement, and Regulation (JASPER; <ref type="bibr">Kasari, Freeman, &amp; Paparella, 2006)</ref>.</item>
</list></p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>TEACCH.</head><p>The TEACCH (Treatment and Education of Autistic and related Communication-handicapped Children) program was developed in 1972 by Eric Schopler and is based primarily in the state of North Carolina <ref type="bibr">(Mesibov, Shea, &amp; Schopler, 2004)</ref>. We consider this intervention distinct from other approaches because of its explicit focus on structured environmental design and self-monitoring. None of the other interventions of interest to the present synthesis emphasizes these features. The theoretical foundations of TEACCH are rooted in neither behavioral nor developmental theories of learning. Rather, TEACCH procedures were designed according to Schopler's theorized profile of the learning strengths, preferences, and needs of individuals with ASD. This profile includes relative visual strength and comfort with consistent routines. Thus, the TEACCH program is characterized by highly structured work routines and a heavy reliance on the visual presentation of information.</p><p>TEACCH "work systems" organize individual student tasks to visually convey four pieces of information:
<list>
<item>(1) what activity the student will complete;</item>
<item>(2) how many items need to be completed;</item>
<item>(3) how to identify when the work is finished; and</item>
<item>(4) what will happen after task completion.</item>
</list>
TEACCH classrooms tend to feature carefully planned and structured environmental arrangements, work areas with minimal distractions, consistent routines, and extensive use of visual schedules and supports.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Sensory-based interventions.</head><p>Sensory-based interventions are motivated by the theory that sensory function is foundational in nature. On this view, sensory disruptions, particularly early in life, may produce cascading effects on development across a number of domains, ultimately yielding the constellation of core and related characteristics associated with ASD (e.g., <ref type="bibr">Bahrick &amp; Todd, 2012)</ref>. Within this framework, it is hypothesized that targeted treatments may not only ameliorate reported sensory differences but also translate to effects on higher-order social, communication, and cognitive skills in children with ASD <ref type="bibr">(Cascio, Woynaroski, Baranek, &amp; Wallace, 2016)</ref>. The best-known of these sensory-based approaches is Sensory Integration Therapy. In this therapy, children are presented with a series of individualized sensory-motor experiences intended to build foundational skills that facilitate their engagement and participation in a range of activities of daily living <ref type="bibr">(Ayres, 1979;</ref><ref type="bibr">Ayres, 2005)</ref>. Broadly conceptualized, other sensory-based interventions may include:
<list>
<item>activities such as brushing and swinging;</item>
<item>the use of weighted vests and blankets to improve sensory processing; and</item>
<item>music therapy and auditory integration training approaches that aim to scaffold motor, social, and emotional development (e.g., <ref type="bibr">Baranek, 2002;</ref><ref type="bibr">Case-Smith &amp; Arbesman, 2008)</ref>.</item>
</list>
Sensory-based approaches are most often provided by occupational therapists in clinical contexts. They may also be delivered by caregivers, educators, and/or other service providers across a broader range of home and community settings.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Animal-assisted interventions.</head><p>Animal-assisted interventions are those that rely on interactions with animals as the primary context for facilitating developmental change (e.g., <ref type="bibr">O'Haire, 2013;</ref><ref type="bibr">2017;</ref><ref type="bibr">Trzmiel, Purandare, Michalak, Zasadzka, &amp; Pawlaczyk, 2018)</ref>. In the ASD intervention literature, the intervention most prominently represented in this category is equine-assisted activities and therapy (EAAT; see <ref type="bibr">Gabriels et al., 2012</ref>, for a review of related terminology). Proponents of EAAT contend that horse riding and horse care provide a multisensory experience that allows children to practice skills across multiple domains. More broadly, animal-assisted interventions are theoretically motivated by the possibility that human-animal interactions are highly motivating. Such interactions may also provide calming contexts that support improved psychological well-being and social function.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Technology-based interventions.</head><p>Technology-based interventions use one or more of a variety of technologies (e.g., computers, videos, video games, robots) as the primary medium for delivering instruction. These interventions attempt to capitalize on the special interest in computer technology reported by many autistic individuals <ref type="bibr">(Grynszpan, Weiss, Perez-Diaz, &amp; Gal, 2014)</ref>. They also draw on technology's predictable formats of information delivery <ref type="bibr">(Baron-Cohen, Golan, &amp; Ashwin, 2012)</ref>, which allow users to control the pace of the interaction <ref type="bibr">(Knight, McKissick, &amp; Saunders, 2013)</ref>. Examples of technology-based interventions include computer-assisted instruction and The Transporters&#8482; DVD series (e.g., <ref type="bibr">Young &amp; Posselt, 2012)</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Previous Syntheses of Intervention Literature</head><p>The National Professional Development Center (NPDC) on Autism Spectrum Disorders generated a list of 27 evidence-based practices for improving outcomes in individuals with ASD, based on prior reviews of single-subject and group design research <ref type="bibr">(Wong et al., 2015)</ref>.</p><p>Similarly, the National Standards Project (NSP, 2015) reviewed single-subject and group design literature. It described 14 intervention practices as established for children with ASD and an additional 18 as emerging. In 2011, Warren and colleagues systematically reviewed 34 group design studies examining interventions in children with ASD. Notably, only two of the studies included in the review by <ref type="bibr">Warren et al. (2011)</ref> were randomized controlled trials (RCTs), and only one of those was rated as high quality. Very recent systematic reviews suggest that the publication of RCTs in ASD has increased sharply since that synthesis by Warren and collaborators. For example, <ref type="bibr">French and Kennedy (2017)</ref> systematically reviewed RCTs of interventions targeting any outcome in children with ASD below age 6. They found a total of 48 RCTs, 40 of which had been published since 2010.</p><p>Previous efforts to synthesize this literature have a number of shortcomings. First, NPDC and NSP review procedures attempted to synthesize evidence from RCTs, quasi-experimental studies, and single-subject design studies (SSDs), even though there is currently no agreed-upon way of doing so. Multiple methodologies can contribute to knowledge about effective practices. However, studies employing group designs, and high-quality RCTs in particular, are best equipped to control for alternative explanations and threats to internal validity. 
Syntheses that attempt to combine RCTs, quasi-experimental studies, and SSDs may therefore overestimate the effectiveness of a given intervention approach. Inclusion of SSDs also limits the extent to which summary effects of intervention can be quantified with meta-analytic approaches. Effect sizes that quantify change observed in SSDs have been proposed, but many of these approaches have serious limitations <ref type="bibr">(Wolery, Busick, Reichow, &amp; Barton, 2010;</ref><ref type="bibr">Zimmerman et al., 2018)</ref>:
<list>
<item>they fail to account for first-order autocorrelation of data;</item>
<item>they ignore the logic of within-study replication that is critical to interpretation of SSD data; and</item>
<item>they yield highly inflated and positively biased effect sizes that are not comparable to the mean group differences that index treatment effects in group design studies.</item>
</list></p><p>Second, previous reviews gave limited consideration to the nature of outcomes measured. That is, prior syntheses of intervention literature have predominantly sought to ascertain whether various approaches to intervention are "evidence-based." They have largely failed to summarize the extent to which interventions effected meaningful change.</p><p>These reviews generally did not distinguish between two kinds of interventions:
<list>
<item>interventions shown to effect change that was overly specific to intervention targets; and</item>
<item>interventions that affected scores on broader standardized assessments of developmentally advanced skills administered by independent assessors.</item>
</list>
A synthesis is needed that asks not only "what works and for whom," but also "for what?" Third, none of the prior reviews of the broad range of interventions for young children with ASD attempted to estimate the summary effects of varied interventions on any outcomes using meta-analytic tools. 
A narrative synthesis approach allows for tallying the number of studies that have shown an effect for a given outcome. It does not, however, allow for deriving an estimate of the combined magnitude of the effect, or for determining whether the combined effect differs significantly from zero.</p><p>Additionally, narrative synthesis methods cannot offer information about variables that may moderate effect sizes. Moderator effects offer vital information for understanding for whom interventions are effective. They also help identify study design features that may inflate effect sizes.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Crucial Quality Considerations</head><p>Systematic reviews and meta-analyses are purported to provide the most reliable summary of evidence of intervention effects. Even so, their conclusions are limited by the quality of the evidence that they summarize <ref type="bibr">(Higgins et al., 2011;</ref><ref type="bibr">Murad, Asi, Alsawas, &amp; Alahdab, 2016)</ref>. Several aspects of study design pose a risk of biasing outcomes. Thus, examination of any body of intervention literature must include an assessment of several study-level quality indicators.</p><p>We outline here those indicators that are particularly important in studies of nonpharmacological interventions for children with ASD.</p><p>Random assignment. Some have questioned the feasibility of conducting randomized controlled trials to test the effects of "real world" interventions with individuals with disabilities <ref type="bibr">(Oliver et al., 2002)</ref>. Nevertheless, random assignment remains the most rigorous control for rival explanations of findings. Random assignment does not ensure pretreatment statistical equivalence between groups on all variables. It is, however, the best procedural guard against systematic differences between groups, which would limit confidence in conclusions about causal associations between the intervention and dependent variables <ref type="bibr">(Kasari, 2002)</ref>. Historically, randomized tests of interventions have been exceptionally rare in ASD research <ref type="bibr">(Warren et al., 2011)</ref>. However, the recent proliferation of RCTs in this field suggests that random assignment is feasible. It is also employed frequently enough to permit a comparison of evidence from randomized trials with evidence from quasi-experimental studies.</p><p>Independence of assessors. Detection bias refers to the risk of bias that arises when assessors are aware of the group assignment of individual participants. 
This type of bias manifests in different ways in studies of autism intervention. The degree of risk may vary depending on the extent to which non-independent assessors are involved in outcome assessment. Detection bias likely poses the greatest threat when caregivers participate in outcome assessment, either as reporters or as interaction partners. The threat is still substantial, however, when outcomes are assessed or coded by professionals who are aware of group assignment.</p><p>Caregiver/teacher report. In pediatric psychology and adjacent fields, researchers commonly rely on parents or teachers to assess outcomes via standardized interviews and/or report forms. Caregivers observe and engage with children for extended periods of time across a variety of contexts. They can therefore draw on their cross-context knowledge of a child's abilities when reporting on an outcome. As a result, they may produce scores that are more representative of a child's generalized abilities than scores derived from brief assessments administered by unfamiliar examiners. However, parents and teachers are virtually always aware of the extent and nature of a child's participation in an intervention study. Moreover, they are likely to be personally invested in the outcome of intervention. This combination of awareness of group assignment and strong investment in positive outcomes can yield a "placebo by proxy" effect, which can positively bias results in favor of the treatment group <ref type="bibr">(Grelotti &amp; Kaptchuk, 2011)</ref>. Prior placebo-controlled studies of pharmacological interventions such as secretin have demonstrated that these effects can be rather large <ref type="bibr">(Williams, Wray, &amp; Wheeler, 2012)</ref>. Such effects are present even in simulated clinical trials in which no intervention was provided <ref type="bibr">(Jones, Carberry, Hamo, &amp; Lord, 2017)</ref>. 
Thus, outcomes from caregiver report are highly subject to systematic measurement error and may positively bias summary estimates of intervention effects.</p><p>Outcomes assessed in interactions with caregivers. Even in situations that do not involve standardized report, caregivers can exert undue influence on outcome measurement. This occurs when caregivers participate as interaction partners in observational measures of outcomes of interest. Autism researchers frequently use observational measurement to capture social communication and related skills in the natural contexts in which they arise. For example, language and communication scores are often derived from free play sessions with parents or from interactions with teachers in the classroom. These scores are fundamentally dyadic. They are often assumed to represent solely the skills or behavior of the child, but they actually index the child's response to the interaction partner. When interaction partners are aware of the administration of a treatment, they may consciously or unconsciously shift their behavior to better elicit skill demonstration from the child. This threat arises often in studies of interventions on language and communication outcomes, but it is not limited to measures of those domains. Therefore, outcomes measured in the context of natural interaction with caregivers are also subject to bias and may distort estimates of intervention effect sizes.</p><p>Outcomes assessed or coded by professionals aware of group assignment. Even unfamiliar professionals can influence outcomes when administering standardized assessments or coding observational measures of behavior. One recent systematic review examined medical trials that assessed binary outcomes with both independent and non-independent assessors. It found that assessors who were aware of group assignment exaggerated odds ratios by 36% on average <ref type="bibr">(Hr&#243;bjartsson et al., 2012)</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Influential Outcome Characteristics</head><p>The Cochrane Collaboration has delineated a set of quality indicators that are applicable to intervention literature in most fields. However, additional field-specific sources of bias also exist in the autism early intervention literature. Further, in studies of intervention for children with ASD, we contend that various aspects of outcome measurement can also serve as sources of bias and should therefore be considered <ref type="bibr">(Yoder, Bottema-Beutel, Woynaroski, Chandrasekhar, &amp; Sandbank, 2013)</ref>. Below, we summarize two particularly important dimensions of outcome variables: boundedness and proximity. We also review one additional source of bias related to study design that we hypothesize may influence effect sizes observed across studies of treatment effects on outcomes of young children with ASD. This source is correlated measurement error (CME), which arises when parents or teachers are trained in the intervention and then participate in data collection.</p><p>Boundedness of outcomes to intervention context. Whether an intervention effects change that generalizes beyond the intervention context is a question of great importance. The context of intervention is generally contrived and temporary. Yet changes effected by intervention are often assumed (or at least intended) to extend to natural environments and the routines of daily life. However, dependent variables vary in the extent to which they index generalized change. Some are measured within the context of intervention, or in a context that is similar to intervention across several dimensions (i.e., materials, setting, interaction partners, and interaction style). Such variables may reflect changes that are potentially bound to the intervention context. 
In contrast, dependent variables that are measured in a context that differs from the intervention on several dimensions should reflect highly generalized changes. For example, consider a hypothetical study of an intervention administered during play with a therapist. Outcomes measured in a play-based interaction with a familiar therapist and similar toys may index change that is bound to that context. Such an outcome measure does not afford any confidence that the treatment has induced changes in child behavior that would generalize to other contexts. In contrast, some outcomes would likely reflect change that reaches across a wide range of contexts. These include outcomes measured using standardized assessment procedures (i.e., different interaction style and materials) administered by an unfamiliar examiner (i.e., different interaction partner). Similarly, outcomes measured in the home environment in an interaction with a parent would serve as a naturalistic assessment of highly generalized change in this hypothetical study. Such a measure differs in setting, interaction partner, and interaction style, assuming the parent has not been trained in the intervention. In theory, generalized change is more difficult to effect than context-bound change. Effect sizes for generalized outcomes are therefore likely to be smaller than effect sizes for outcomes that are potentially context-bound.</p><p>Proximity of outcomes to intervention targets. Outcomes may also vary in their proximity to the targets or goals of the intervention. Ideally, interventions would demonstrate change on two kinds of outcomes. The first kind is outcomes that are directly taught or addressed by the intervention (i.e., proximal outcomes). The second is outcomes that are developmentally downstream from what is directly taught or addressed (i.e., distal outcomes). 
When interventions demonstrate growth on distal outcomes, they provide evidence that the intervention is influencing children's development, which may mean that its effects will persist long after the intervention has ended. However, prior best-evidence syntheses have found that early interventions for children with ASD yield much larger effects for proximal than for distal outcomes <ref type="bibr">(Yoder et al., 2013)</ref>. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Correlated measurement error in parent</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Study Purpose and Research Questions</head><p>The purpose of this study is to gather and synthesize all available studies of nonpharmacological interventions targeting any outcome in children with ASD below the age of 8 years. Our specific research questions were:</p><p>1. Across all eligible quasi-experimental and experimental studies, are summary effects positive and significant for targeted outcomes for each of seven intervention types (behavioral, developmental, NDBI, TEACCH, sensory-based, animal-assisted, and technology-based)?</p><p>2. Are summary effects positive and significant for targeted outcomes for each of the aforementioned seven intervention types when only outcomes from studies with basic quality controls (i.e., random assignment, independent assessors) are included?</p><p>3. Across intervention and outcome types, are summary effects for proximal outcomes larger than summary effects for distal outcomes?</p><p>4. Across intervention and outcome types, are summary effects for outcomes that measure context-bound behaviors larger than summary effects for outcomes that measure more highly generalized characteristics?</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Method</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Search</head><p>Search terms and databases. To gather the peer-reviewed literature included in the current meta-analysis, the following nine online databases were searched: Academic Search Complete, CINAHL Plus with Full Text, Education Source, Educational Administration Abstracts, ERIC, MEDLINE, PsycINFO, Psychology and Behavioral Sciences Collection, and SocINDEX with Full Text. Search terms were used in various combinations to capture the diagnostic criteria and intervention designs included within the search. The individual databases were searched using the following terms: autis*, ASD, PDD, Aspergers, intervention, therapy, teach*, treat*, program, package, assign*, control group, BAU, "wait list", RCT, random*, quasi, "treatment group", "intervention group", "group design", and trial. This initial search yielded 12,933 results from academic journals, dissertations, books, reports, conference materials, and reviews.</p><p>To gather grey literature, or studies not published in peer-reviewed journals, investigators who received federal grants to study autism were identified through a search of the National Database for Autism Research (NDAR), the National Institutes of Health (NIH) Matchmaker, and the Institute of Education Sciences (IES) websites. A list of researchers (n = 106) was generated, and 90 of these investigators were emailed with a request for eligible data; contact information for the remaining 16 investigators could not be found.</p><p>Screening process. A preliminary screen of abstracts was first completed using abstrackr <ref type="bibr">(Wallace, Small, Brodley, Lau, &amp; Thomas, 2012)</ref>. 
Studies were screened at the full-text level if they met the following inclusion criteria: (a) published in English, (b) published from 1970 to the present, (c) group design that included both an intervention group and a control group, (d) a simple majority of participants were reported to have a diagnosis of ASD, and (e) the average age of included participants was between 0 and 8 years. In many instances, studies met inclusion criteria but did not provide enough information to extract unadjusted effect sizes. In these cases, authors were identified and emailed with a request to provide unadjusted postintervention means and standard deviations. The PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) diagram in Figure <ref type="figure">1</ref> summarizes the search process and provides justifications for exclusion of articles.</p></div>
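<div xmlns="http://www.tei-c.org/ns/1.0"><p>The five full-text criteria above combine into a single screening predicate. The Python sketch below is illustrative only: the `Study` record and its fields are hypothetical (not drawn from the authors' coding manual), "present" is passed in as `current_year`, and the age bounds in criterion (e) are assumed to be inclusive.</p><ab type="code">
```python
from dataclasses import dataclass


@dataclass
class Study:
    """Hypothetical screening record; field names are illustrative only."""
    language: str
    year: int
    has_intervention_group: bool
    has_control_group: bool
    n_participants: int
    n_with_asd: int
    mean_age_years: float


def meets_inclusion_criteria(study: Study, current_year: int) -> bool:
    """Apply full-text criteria (a) to (e) as stated in the text."""
    checks = [
        study.language == "English",                               # (a)
        study.year >= 1970 and current_year >= study.year,         # (b)
        study.has_intervention_group and study.has_control_group,  # (c)
        2 * study.n_with_asd > study.n_participants,               # (d) simple majority
        study.mean_age_years >= 0 and 8 >= study.mean_age_years,   # (e) bounds assumed inclusive
    ]
    return all(checks)
```
</ab><p>Criterion (d) is written as a strict inequality, so a sample in which exactly half of the participants have an ASD diagnosis does not qualify as a simple majority.</p></div>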
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Coding Procedures</head><p>Included studies were coded for participant characteristics, intervention characteristics, study characteristics (including quality indicators), outcome characteristics, and effect size information. The coding manual is available upon request from the first author.</p><p>Participant characteristics. Participant characteristics coded from studies included the average age of participant samples in months, the percentage of the sample that was male, and the average language age in months (receptive, expressive, or total) whenever it was reported.</p><p>Intervention characteristics. Intervention approaches were categorized based on the specific techniques used and the underlying philosophies that motivated the approach. A set of candidate categories (behavioral, developmental, NDBI, sensory-based, technology-based, cognitive behavior therapy, other) was drafted in the first version of the coding manual for this synthesis, based on the authors' knowledge of the intervention literature. Based on the results of our literature search and screening process, as well as the range of intervention approaches encountered during our team's initial training on coding precision and reliability, the intervention categories were further refined to include 'animal-assisted therapy'. This intervention approach was found to be motivated by a distinct theoretical framework and to have amassed a sufficient number of group design studies to permit prior systematic review and meta-analysis <ref type="bibr">(O'Haire, 2013;</ref><ref type="bibr">2017;</ref><ref type="bibr">Trzmiel, Purandare, Michalak, Zasadzka, &amp; Pawlaczyk, 2018)</ref>. Thus, interventions were initially coded as animal-assisted therapy, behavioral, developmental, NDBI, cognitive behavior therapy, sensory-based, technology-based, or other. 
After coding was complete, the set of interventions coded as 'other' was re-examined to determine whether there existed a sufficient set of similar studies (e.g., five or more) that could be meaningfully combined into an additional category. This was the case for studies of the TEACCH intervention, so studies of TEACCH that were initially coded as 'other' were re-coded as 'TEACCH'.</p><p>Animal-assisted therapy. Interventions coded as animal-assisted therapy were those mediated through the presence of an animal. Equine-assisted therapy was an example listed in the coding manual. </p></div>
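<div xmlns="http://www.tei-c.org/ns/1.0"><p>The re-examination of studies coded as 'other' amounts to a frequency threshold on groups of similar studies. A minimal sketch, assuming each 'other' study already carries a free-text label naming its specific approach (a hypothetical field; the text does not say how similarity was judged) and using the five-study threshold given above as an example:</p><ab type="code">
```python
from collections import Counter


def promote_frequent_labels(codes, approach_labels, min_studies=5):
    """Re-code 'other' studies whose approach label occurs in at least
    `min_studies` of the studies coded 'other'.

    codes: intervention codes, one per study (e.g., 'behavioral', 'other').
    approach_labels: parallel list of hypothetical free-text labels naming the
        specific approach (e.g., 'TEACCH'); only consulted for 'other' studies.
    """
    counts = Counter(
        label for code, label in zip(codes, approach_labels) if code == "other"
    )
    frequent = {label for label, n in counts.items() if n >= min_studies}
    return [
        label if code == "other" and label in frequent else code
        for code, label in zip(codes, approach_labels)
    ]
```
</ab><p>Studies already assigned to a named category keep their code even if their label matches a promoted approach, mirroring the text's description of re-coding only the 'other' set.</p></div>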
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Sensory-based interventions.</head><p>Interventions were coded as sensory-based if they incorporated targeted exposure to sensory or multisensory (e.g., auditory, visual, tactile, olfactory) stimuli. Examples listed in the coding manual included sensory integration, music therapy, massage, acupuncture, auditory integration, and weighted blankets. This category was drafted based on precedent across prior reviews of sensory-based interventions <ref type="bibr">(Baranek, 2002;</ref><ref type="bibr">Case-Smith &amp; Arbesman, 2008;</ref><ref type="bibr">Weitlauf et al., 2017)</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Technology-based interventions.</head><p>Interventions were coded as technology-based if the intervention was primarily delivered on a computer or electronic device (e.g., iPad, DVD).</p><p>TEACCH. Interventions were re-coded as TEACCH if a study explicitly reported using this method.</p><p>Other. Interventions that did not fit into the previously defined categories were coded as other.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Study characteristics.</head><p>Study-level characteristics that were coded include design type (i.e., randomized controlled trial [RCT] or quasi-experimental), publication status (i.e., indexed, non-indexed, unpublished), and several features of study quality. Studies were coded as randomized controlled trials if the text indicated that participants were randomly assigned to an intervention group and a control group or contrasting treatment, or if the authors referred to the study as "randomized." Studies were coded as quasi-experimental when authors gave no indication that group allocation was random. If a contrasting treatment model was used, the group receiving the treatment that the authors hypothesized would effect greater change was considered the treatment group. In studies comparing multiple active treatment groups with a passive control group, the characteristics and effects of each treatment group were coded separately relative to the control group.</p><p>Reports were coded as published or unpublished. Published studies included those in indexed and non-indexed journals, and unpublished studies included dissertations and theses. Despite our extensive attempts to locate, obtain, and include unpublished data apart from dissertations and theses, no researchers provided us with data or effect sizes from unpublished reports.</p><p>Studies were coded for several indicators of study quality. These indicators included those specified by the Cochrane Collaboration's tool for assessing risk of bias (e.g., selection bias, performance bias, detection bias, reporting bias; <ref type="bibr">Higgins et al., 2011)</ref>, as well as additional indicators that we proposed in prior work (e.g., potential presence of CME related to parent/teacher training, sufficient number of participants to justify statistical analysis, reliance on parent or teacher report; <ref type="bibr">Yoder et al., 2013)</ref>. 
Selection bias related to insufficient randomization procedures and allocation concealment was coded as "high", "low", or "unclear" for studies coded as randomized controlled trials, and as "not applicable" for quasi-experimental studies.</p><p>For the remaining Cochrane quality indicators, risk of bias was coded as "high" or "low" if studies explicitly indicated, or provided sufficient information to ascertain, the presence or absence of such risk, and as "unclear" if information related to risk potential was not detailed. Risk of selective reporting bias was coded as high if outcomes were reported to have been collected at posttest but were not reported in the results, or if an entire assessment was administered but only selected subscores were reported without sufficient justification. Risk of performance bias was assessed based on participants' and families' awareness of their group assignment.</p><p>Detection bias accounted for the independence of assessors and coders. We elected to include interaction partners in naturalistic observational measures as "assessors," given that they may transiently influence child behavior during interactions. Attrition bias was coded with respect to the number of participants recruited and the number of participants included in analysis. Specifically, risk of attrition bias was considered low if attrition was lower than 20% or if intent-to-treat analysis was used.</p><p>Outcome characteristics. Outcomes were coded as context-bound if they were measured in or very near the context of the intervention, and as generalized if they were measured in a context that differed from the context of intervention on multiple dimensions (e.g., interaction partners, materials, setting, interaction style). 
Outcomes taken from standardized parent/teacher reports were coded as potentially context-bound if reporters were also the primary mediators of intervention, on the rationale that their reports could be based on observing the outcome as it occurred within the intervention they provided. Outcomes were coded as proximal if they indexed skills that were directly taught, modeled, or prompted during the intervention, and otherwise as distal. Outcomes indexed by developmentally scaled assessments were automatically coded as distal, on the reasoning that these assessments are meant to tap generalized development rather than specific skills. We recognize that an intervention could directly target specific items of a developmentally scaled assessment, but reasoned that, in the absence of an extremely detailed description of intervention procedures, we should assume these assessments captured constructs beyond what was directly taught in intervention. Decision trees used to judge distality and boundedness are presented in Figures <ref type="figure">2</ref> and <ref type="figure">3</ref>, respectively. Correlated measurement error related to parent/teacher training was coded as potentially present when parents or teachers served as both the mediators of intervention and the outcome assessors.</p><p>Outcome categorization. Each dependent variable was categorized as either a core feature of ASD (i.e., social communication; restricted/repetitive patterns of behaviors, interests, or activities; sensory) or a related outcome (i.e., language, motor, adaptive, cognitive, academic, play, sleep, brain imaging, social emotional/challenging behavior). If outcomes were reported at multiple time points, immediate and follow-up outcomes were coded separately.</p><p>Effect size information. 
Unadjusted means, SDs, and ns were extracted from all eligible studies that reported a group difference between participants receiving the specified intervention and those not receiving it. Group difference effect sizes were calculated for each outcome as the standardized mean difference (d), derived via the Campbell Collaboration Practical Meta-Analysis Effect Size Calculator <ref type="bibr">(Lipsey &amp; Wilson, 2001)</ref>, and then converted to the effect size metric used for analyses, Hedges' g (g). Effect sizes were signed so that higher values of g indicated superior performance in the treatment group.</p><p>We were unable to extract effect sizes from some eligible studies due to insufficient information (e.g., authors did not report means and SDs, reported only mean change scores, or reported means and SDs that were adjusted for baseline covariates and therefore could not be meta-analyzed with unadjusted means and SDs). When this occurred for articles published within the last ten years, we contacted the corresponding author(s) in an attempt to obtain either the unadjusted post means and SDs, or any other statistical information that would allow us to calculate the standardized mean difference between treatment and control/contrast groups after intervention. Fifty-five studies did not have sufficient information to allow effect size extraction for all outcomes. For nine of these studies, effect size extraction was possible for some but not all outcomes, so the eligible outcomes from those studies were included. Authors of 14 additional studies responded and supplied effect size information.</p></div>
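<div xmlns="http://www.tei-c.org/ns/1.0"><p>Several of the study- and outcome-level coding rules described above are explicit enough to write down as functions. The Python sketch below encodes only the rules stated in the text, not the full decision trees in Figures 2 and 3; the `Outcome` record and its fields are hypothetical, and reading "multiple dimensions" as two or more is an assumption.</p><ab type="code">
```python
from dataclasses import dataclass, field


@dataclass
class Outcome:
    """Hypothetical outcome record holding the facts the coding rules use."""
    is_caregiver_report: bool   # standardized parent/teacher report form
    mediator_is_assessor: bool  # a parent/teacher both delivered the intervention and assessed/reported
    differing_dimensions: set = field(default_factory=set)  # e.g. {"setting", "materials"}
    developmentally_scaled: bool = False
    directly_targeted: bool = False  # skill directly taught, modeled, or prompted


def selection_bias(is_rct: bool, rating: str) -> str:
    """Randomization/concealment ratings apply to RCTs only."""
    return rating if is_rct else "not applicable"


def attrition_bias(n_recruited: int, n_analyzed: int, intent_to_treat: bool) -> str:
    """Low risk if attrition is under 20% or intent-to-treat analysis was used."""
    attrition = (n_recruited - n_analyzed) / n_recruited
    return "low" if intent_to_treat or 0.20 > attrition else "high"


def cme_potentially_present(o: Outcome) -> bool:
    """Correlated measurement error: the mediator is also the assessor."""
    return o.mediator_is_assessor


def boundedness(o: Outcome) -> str:
    if o.is_caregiver_report and o.mediator_is_assessor:
        return "potentially context-bound"
    # Assumption: "multiple dimensions" is read as two or more.
    return "generalized" if len(o.differing_dimensions) >= 2 else "context-bound"


def distality(o: Outcome) -> str:
    if o.developmentally_scaled:
        return "distal"  # developmentally scaled assessments are always distal
    return "proximal" if o.directly_targeted else "distal"
```
</ab><p>Note that the distality rule checks the developmental-scaling condition first, so a developmentally scaled assessment is coded distal even if some of its items overlap with what was taught.</p></div>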
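<div xmlns="http://www.tei-c.org/ns/1.0"><p>For reference, the standard textbook conversion from unadjusted posttest means, SDs, and ns to d and then to Hedges' g is sketched below. This is a minimal illustration, not the Campbell Collaboration calculator itself: the variance formula is the common large-sample approximation, and the `higher_is_better` flag is an illustrative assumption for outcomes (such as challenging behavior) where lower scores indicate improvement.</p><ab type="code">
```python
import math


def hedges_g(m_t, sd_t, n_t, m_c, sd_c, n_c, higher_is_better=True):
    """Return (g, var_g) for a treatment-versus-control posttest comparison.

    g is signed so that positive values favor the treatment group; set
    higher_is_better=False for outcomes where lower scores are better.
    """
    df = n_t + n_c - 2
    # Pooled within-group standard deviation.
    sd_pooled = math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / df)
    d = (m_t - m_c) / sd_pooled
    if not higher_is_better:
        d = -d
    # Common large-sample approximation to the sampling variance of d.
    var_d = (n_t + n_c) / (n_t * n_c) + d**2 / (2 * (n_t + n_c))
    # Hedges' small-sample correction factor, J = 1 - 3 / (4 * df - 1).
    j = 1 - 3 / (4 * df - 1)
    return j * d, j**2 * var_d
```
</ab><p>For example, two groups of 20 with posttest means of 10 and 8 and SDs of 2 give d = 1.0, J = 148/151, and g of about 0.98; the correction matters most for the small samples common in this literature.</p></div>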
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Reliability</head><p>A primary coder (the first author) read and coded all studies. All studies were also independently coded for reliability by one coder from a team of nine. Both coding sheets were then sent to a separate coding auditor, who examined them for discrepancies and reported any disagreements between coders. Original primary and reliability codes were then saved in a separate folder for reliability analyses, and all disagreements were addressed in discrepancy discussions between the primary and reliability coders. Discrepancies were considered resolved once both coders agreed on a final consensus code, which was then added to the dataset used for the final analyses. Therefore, we are able to report reliability data from the original coding and also confirm that all disagreements were resolved prior to statistical analysis.</p><p>All reliability calculations were completed in RStudio (R Core Team, 2017) using the irr package <ref type="bibr">(Gamer, Lemon, Fellows, &amp; Singh, 2012)</ref>. Reliability was indexed using unweighted kappa for all categorical variables (Cohen, 1960) and one-way random intraclass correlation coefficients for all continuous variables (ICC; <ref type="bibr">Shrout &amp; Fleiss, 1979)</ref>. Kappas ranged from 0.602 to 0.923, and the average kappa across all categorical variables was 0.751. ICCs ranged from 0.676 to 0.999, and the average ICC across all continuous variables was 0.916.</p></div>
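<div xmlns="http://www.tei-c.org/ns/1.0"><p>The two agreement statistics named above can be computed from first principles. This is a minimal sketch rather than the irr package's implementation: it assumes two raters per item (the primary coder and one reliability coder, as described above) and uses the single-rater form of the one-way ICC, ICC(1,1) in Shrout and Fleiss's notation, which is an assumption because the text does not say whether single or average measures were reported.</p><ab type="code">
```python
from collections import Counter


def cohens_kappa(rater1, rater2):
    """Unweighted Cohen's kappa for two raters' categorical codes."""
    n = len(rater1)
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    c1, c2 = Counter(rater1), Counter(rater2)
    # Chance agreement from each rater's marginal category proportions.
    expected = sum(c1[k] * c2[k] for k in c1) / n**2
    return (observed - expected) / (1 - expected)


def icc_oneway_single(ratings):
    """ICC(1,1): one-way random effects, single-rater reliability.

    ratings: one row per coded item, each row holding the k raters' values.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    means = [sum(row) / k for row in ratings]
    ss_between = k * sum((m - grand) ** 2 for m in means)
    ss_within = sum((x - m) ** 2 for row, m in zip(ratings, means) for x in row)
    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
```
</ab><p>For instance, if two coders agree on three of four binary codes and their marginals imply 50% chance agreement, kappa is (0.75 - 0.50) / (1 - 0.50) = 0.50, which shows why kappa values sit well below raw percent agreement.</p></div>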
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Analysis</head><p>All analyses were conducted in R (R Core Team, 2017). To account for the nesting of multiple effect sizes within overlapping participant samples, we used robust variance estimation (RVE) with small-sample adjustments when synthesizing effect sizes and conducting meta-regressions <ref type="bibr">(Hedges, Tipton, &amp; Johnson, 2010;</ref><ref type="bibr">Tanner-Smith, Tipton, &amp; Polanin, 2016)</ref>. These procedures account for the non-independence of effect size statistics drawn from overlapping samples, and provide test statistics and confidence intervals that are adjusted based on how the effect sizes are clustered.</p><p>Effect sizes were aggregated by type of outcome (see Outcome characteristics) within each type of intervention (see Intervention characteristics), yielding a summary statistic for the effect of each intervention type on each outcome type. Meta-regression analyses were conducted on the coded variables of distality and boundedness (see Outcome characteristics) to determine whether the magnitude of effects across intervention and outcome types was moderated by these categorical characteristics of measurement. The significance threshold for these tests was set at p &lt; .10, because we had clear directional hypotheses for each potential moderator, which merited one-tailed tests of significance (a two-tailed p of .10 corresponds to a one-tailed p of .05 for an effect in the hypothesized direction). To examine the potential presence of publication bias, we examined funnel plots of effect size estimates against their standard errors, and the corresponding Egger's tests of funnel plot asymmetry, for each summary effect estimate. Because of the large number of significance tests this required, we applied the Benjamini-Yekutieli false discovery rate correction to the p-values from the Egger's tests, using the Hmisc package in R <ref type="bibr">(Harrell, 2018)</ref>, to guard against spurious findings. 
The robumeta package in R <ref type="bibr">(Fisher, Tipton, &amp; Zhipeng, 2017)</ref> was used to conduct these analyses, and the metafor package <ref type="bibr">(Viechtbauer, 2010)</ref> was used to draw the forest plots and funnel plots.</p></div>
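<div xmlns="http://www.tei-c.org/ns/1.0"><p>The core of the correlated-effects RVE model cited above can be sketched in a few lines. This Python version is illustrative only and is not the robumeta implementation: it fits an intercept-only model, treats the between-study variance `tau2` as known (robumeta estimates it from the data), and applies only a simple m/(m - 1) multiplier in place of the more elaborate small-sample adjustments used in the analyses. The (sample ID, g, variance) input layout is a hypothetical choice.</p><ab type="code">
```python
import math
from collections import defaultdict


def rve_intercept(effects, tau2=0.0):
    """Correlated-effects RVE summary estimate (intercept-only model).

    effects: iterable of (sample_id, g, var_g); several effect sizes may
        share a sample_id.
    tau2: between-study variance, treated as known here (an assumption).
    Returns (estimate, robust_se, number_of_clusters).
    """
    clusters = defaultdict(list)
    for sample_id, g, v in effects:
        clusters[sample_id].append((g, v))

    # Correlated-effects weights: every effect size in cluster j receives
    # 1 / (k_j * (mean sampling variance in cluster j + tau2)).
    weighted = []  # (sample_id, g, w)
    for sample_id, rows in clusters.items():
        k = len(rows)
        v_bar = sum(v for _, v in rows) / k
        w = 1.0 / (k * (v_bar + tau2))
        weighted.extend((sample_id, g, w) for g, _ in rows)

    total_w = sum(w for _, _, w in weighted)
    estimate = sum(w * g for _, g, w in weighted) / total_w

    # Sandwich variance: sum weighted residuals within each cluster, then square.
    resid_sums = defaultdict(float)
    for sample_id, g, w in weighted:
        resid_sums[sample_id] += w * (g - estimate)
    m = len(clusters)
    v_robust = sum(s**2 for s in resid_sums.values()) / total_w**2
    v_robust *= m / (m - 1)  # simple small-sample multiplier (one coefficient)
    return estimate, math.sqrt(v_robust), m
```
</ab><p>Because each sample's total weight is shared among its k effect sizes, a sample that reports many outcomes (up to 100 in this dataset) does not dominate the summary estimate simply by contributing more rows; and because residuals are summed within samples before squaring, correlation among a sample's effect sizes is reflected in the standard error.</p></div>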
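<div xmlns="http://www.tei-c.org/ns/1.0"><p>Both publication-bias steps can also be sketched with the standard library. The Python below is an illustration, not the authors' code: the Egger function is the classic ordinary-least-squares version, which treats effect sizes as independent (an assumption; the text does not say how nesting was handled in these tests), and it returns a t statistic with n - 2 degrees of freedom rather than a p-value, which would require a t distribution (for example, `scipy.stats.t.sf`). The Benjamini-Yekutieli function follows the same step-up logic as the "BY" option of base R's `p.adjust`.</p><ab type="code">
```python
import math


def egger_test(effects, std_errors):
    """Classic Egger regression: OLS of g/se on precision (1/se).

    Returns (intercept, t_statistic, df). An intercept far from zero
    signals funnel-plot asymmetry.
    """
    z = [g / se for g, se in zip(effects, std_errors)]
    x = [1.0 / se for se in std_errors]
    n = len(z)
    x_bar, z_bar = sum(x) / n, sum(z) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z)) / sxx
    intercept = z_bar - slope * x_bar
    resid = [zi - intercept - slope * xi for xi, zi in zip(x, z)]
    sigma2 = sum(r**2 for r in resid) / (n - 2)
    se_intercept = math.sqrt(sigma2 * (1.0 / n + x_bar**2 / sxx))
    return intercept, intercept / se_intercept, n - 2


def benjamini_yekutieli(p_values):
    """BY false-discovery-rate adjusted p-values, returned in input order."""
    m = len(p_values)
    c_m = sum(1.0 / i for i in range(1, m + 1))  # harmonic correction term
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    for rank_from_top, idx in enumerate(order):
        rank = m - rank_from_top  # rank of this p-value in ascending order
        running_min = min(running_min, p_values[idx] * m * c_m / rank)
        adjusted[idx] = running_min
    return adjusted
```
</ab><p>The extra harmonic factor c(m) makes the BY adjustment more conservative than the Benjamini-Hochberg procedure, in exchange for controlling the false discovery rate under arbitrary dependence among the tests.</p></div>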
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Results</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Descriptives of Included Study Samples and Outcomes</head><p>The search and screening process yielded 1,615 effect sizes gathered from 130 independent study samples (from a total of 149 reports) representing 6,240 participants. Across all studies, the average age of participants was 54.21 months (SD = 18.98), the average proportion of male participants per sample was 0.84 (SD = 0.07), and the average language age of participants in studies for which it was reported was 22.68 months (SD = 11.91). On average, 12.4 outcomes were reported per study sample (median = 8, range = 1 to 100). Summary effect estimation required that at least five studies contribute to the generation of summary effect sizes, so the studies representing animal-assisted intervention (n = 4), cognitive behavioral therapy (n = 2), and other varied approaches that could not be meaningfully combined into intervention types (n = 29) were excluded from summary effect estimation.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Participant characteristics according to intervention type are reported in</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Study Quality</head><p>Figures 4 and 5 illustrate the proportion of studies or outcomes that received each quality rating (i.e., low risk of bias, high risk of bias, unable to determine) for seven key quality indicators, according to intervention type. These figures include only studies that contributed to summary effect estimation. Because participants in studies of nonpharmacological interventions for ASD can almost never be kept unaware of the intervention they receive, performance bias was rated as high for all but one study included in summary effect estimation and is therefore not reported separately for each intervention type (see <ref type="bibr">Corbett, Schickman, &amp; Ferrer, 2008</ref> for the lone exception).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Behavioral intervention studies.</head><p>Figure 4 reflects quality indicator ratings for studies of behavioral interventions. Notably, only 29.63% of studies of behavioral interventions were RCTs. Detection bias was rated as high for 77.05% of outcomes in behavioral studies. High detection bias in this set of studies was largely driven by an overreliance on reports completed by individuals who were aware of intervention assignment: 60.33% of outcomes were based on parent or teacher report. Correlated measurement error related to parent/teacher training threatened 53.77% of outcomes reported in behavioral studies. Because many of the studies relied on standardized report forms, and most of these studies only loosely described intervention targets, 86.23% of outcomes tracked in behavioral intervention studies were categorized as distal to the intervention targets. Half (50.49%) of outcomes were categorized as generalized, and 10.49% were categorized as context-bound. The remaining 39.02% of outcomes were categorized as potentially context-bound, because they were derived from caregiver reports in studies where caregivers participated as interventionists (meaning that it is unclear whether the outcome could be demonstrated in interactions with individuals who were not trained as interventionists). Bias related to substantial attrition (i.e., &gt; 20% of the study sample) was rated as high for 15.41% of all outcomes.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Developmental intervention studies.</head><p>Figure 4 reflects quality indicator ratings for studies of developmental interventions. A large majority (78.57%) of included developmental studies were RCTs. Detection bias was rated as high for 53.97% of outcomes, but this was not due entirely to over-reliance on caregiver report: nearly a third (29%) of outcomes were taken from parent/teacher report. The remainder of outcomes flagged for high detection bias (approximately half of the outcomes tracked in these studies) reflect the common practice of measuring language and communication outcomes in the context of interactions with natural communication partners (primarily parents, who were aware of group assignment). CME related to parent/teacher training threatened three quarters (75%) of all outcomes in developmental studies. Because many of the developmental interventions were explicitly described as targeting language and social communication, and many of the outcomes were observational measures of language and social communicative behaviors, just over half (53.57%) of outcomes were categorized as proximal to intervention targets. Approximately a quarter (27.84%) of outcomes were categorized as generalized, a quarter (25%) were categorized as potentially context-bound, and approximately one half (47.16%) were categorized as context-bound. Over a third (34.66%) of all outcomes were rated as having high risk of bias from substantial attrition.</p><p>Naturalistic developmental behavioral intervention studies. Figure <ref type="figure">4</ref> illustrates quality indicator ratings for included studies of NDBIs. A large majority (76.92%) of included studies of NDBIs were RCTs. Detection bias was rated as high for 59.42% of outcomes. This was due, in part, to the common use of observational measures of skills coded from natural interactions with partners who were aware of group assignment. 
Only 17% of outcomes (the lowest proportion of any intervention type) were collected from parent/teacher report. However, CME related to parent/teacher training threatened 47.09% of outcomes, due to the prevalence of parent-training studies that included outcomes derived from parent-child interactions. Because many NDBIs were described as specifically targeting symbolic play, early social communication, and language, researcher-created measures of these skills were coded as proximal to intervention targets. Thus, nearly half (47.59%) of outcomes in NDBI studies were categorized as proximal. Nearly a quarter (22.22%) were categorized as generalized, 52.41% were categorized as potentially context-bound, and just over a quarter (26.36%) were categorized as context-bound. Only 7.25% of outcomes were subject to bias from high attrition.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Sensory-based intervention studies.</head><p>Figure 5 reflects quality indicator ratings for sensory-based intervention studies that were included in summary effect size estimation. All seven studies included in effect size estimation were RCTs. Because language was the only outcome category for which there were a sufficient number of sensory-based intervention studies to permit summary effect size estimation, the following outcome-level quality indicator ratings apply only to the language outcomes (n = 13) tracked in these studies. Detection bias was rated as high for nearly half (46.15%) of all language outcomes. Nearly a third (30.77%) of all outcomes were based on parent/teacher report, and these same outcomes were also subject to CME related to parent training. The overwhelming majority (92%) of outcomes were categorized as distal, because few sensory-based interventions were described as directly targeting language.</p><p>Nearly a third (30.77%) of outcomes were categorized as generalized, 53.86% were categorized as potentially context-bound, and 15.38% were categorized as context-bound. Attrition bias was rated as high for 15.38% of outcomes.</p><p>TEACCH studies. Figure <ref type="figure">5</ref> illustrates quality indicator ratings for studies of TEACCH that were included in summary effect size estimation (n = 6). Only two (33%) of these studies were RCTs. Detection bias was rated as high for the majority (81.81%) of outcomes, largely because of an over-reliance on parent/teacher report, from which 77.27% of outcomes were derived. CME related to parent/teacher training threatened half (50%) of all outcomes.</p><p>Given that the explicit individual intervention targets of TEACCH were not thoroughly described, and that the majority of outcomes were taken from standardized parent/teacher reports, almost all (95.45%) outcomes were assumed to be distal. 
Nearly half (45.45%) of outcomes were categorized as generalized, half (50%) were categorized as potentially context-bound, and the remaining 4.54% were categorized as context-bound. None (0%) of the studies reported substantial attrition.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Technology-based intervention studies.</head><p>Figure 5 illustrates quality indicator ratings for studies of technology-based interventions. Of the ten technology-based intervention studies included in summary effect estimation, eight (80%) were RCTs. Detection bias was rated as high for 64.28% of all outcomes. Over a third (38.1%) of outcomes were taken from parent/teacher report. CME related to parent/teacher training threatened 30.95% of outcomes. Over half (53.57%) of outcomes were categorized as distal. Nearly a third (30.95%) of outcomes were categorized as generalized, nearly half (47.62%) were categorized as potentially context-bound, and 21.43% were categorized as context-bound. Bias related to substantial attrition was rated as high for 15.38% of outcomes.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Summary Effects by Intervention and Outcome Type</head><p>Summary effects across all studies without consideration of quality indicators. Figure <ref type="figure">6</ref> reflects summary effect size estimates within intervention and outcome types. These estimates were derived using all available effect sizes, from both quasi-experimental studies and RCTs. Summary effects were computed when effect sizes associated with a given outcome and intervention type were available from at least five independent participant samples. Thus, we were able to estimate the summary effects of behavioral interventions on adaptive outcomes, cognitive outcomes, language outcomes, motor outcomes, social communication outcomes, social emotional/challenging behavior outcomes, and outcomes quantifying broader autism symptomatology. Summary effects for behavioral interventions across outcome types ranged from 0.24 to 0.46 and were all statistically significant. For developmental interventions, only language and social communication outcomes were measured in a sufficient number of studies to permit the estimation of summary effects. The summary effects of developmental interventions on these outcomes were 0.06 and 0.30, respectively, and only the estimate for social communication was statistically significant. The summary effects of NDBIs were separately estimated for adaptive outcomes, cognitive outcomes, language outcomes, play outcomes, restricted and repetitive behaviors, social communication outcomes, social emotional/challenging behavior outcomes, and outcomes that quantified broader autism symptomatology. These summary effects ranged from -0.01 to 0.35. The summary effect estimates of NDBIs on cognition, language, play, and social communication outcomes were statistically significant. 
For sensory-based interventions, only language outcomes were measured in a sufficient number of studies to permit the estimation of summary effects. This summary effect estimate was 0.28 and was not significant. For TEACCH, summary effects could be generated only for social communication outcomes. This summary effect estimate was -0.11 and was not significant. For technology-based interventions, the most frequently tracked outcomes were social communication and social emotional/challenging behavior. Summary effect estimates for these outcomes were 0.05 and 0.42, respectively, and neither was significant.</p><p>Summary effects from RCTs. Figure <ref type="figure">7</ref> reflects summary effect size estimates derived exclusively from outcomes extracted from RCTs, according to intervention and outcome type.</p><p>There were not enough RCTs of behavioral interventions to permit summary effect estimation for any outcome type. For developmental interventions, the summary effect across social communication outcomes from RCTs was 0.27 and significant. For NDBIs, a sufficient number of RCTs permitted the estimation of summary effects on cognition, language, play, and social communication. These estimates ranged from 0.18 to 0.42 and were significant for language, play, and social communication. All of the studies tracking the effect of sensory-based interventions on language outcomes were RCTs, so this summary effect estimate is identical to that of the initial model. For technology-based interventions, there were only enough RCTs to permit estimation of a summary effect for social communication; this estimate was 0.06 and was not significant. There were no RCTs examining the effects of the TEACCH intervention on any outcome.</p><p>Summary effects from RCTs excluding outcomes from caregiver reports. Figure <ref type="figure">8</ref> reflects summary effects estimated exclusively from outcomes that were extracted from RCTs and that were not based on caregiver report. 
For developmental interventions, a sufficient number of studies and outcomes permitted the estimation of a summary effect for social communication, which was 0.31 and statistically significant. For NDBIs, summary effect estimation was possible for cognition, language, play, and social communication. These effects ranged from 0.18 to 0.47, and were significant in the cases of play and social communication outcomes. For sensory-based interventions, summary effect estimation was possible for language only. This estimate was 0.28 and was not significant.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Summary effects from RCTs excluding all outcomes subject to a high threat of detection bias.</head><p>Figure <ref type="figure">9</ref> reflects summary effects estimated exclusively from outcomes that were extracted from RCTs in which assessors were unaware of group assignment. There were enough studies and effect sizes of this kind to permit estimation of the summary effects of NDBIs on language and social communication only. Both estimates were 0.17, and neither was significant.</p></div>
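The successive restrictions reported above (all designs, RCTs only, RCTs without caregiver report, and RCTs without detection bias) amount to progressively stricter filters over a table of coded effect sizes, followed by a check that enough independent studies remain in each intervention-by-outcome cell. The Python sketch below illustrates that logic only. The record fields (`rct`, `caregiver_report`, `assessor_blind`) and the `MIN_STUDIES` threshold of four are hypothetical choices, not this meta-analysis's codebook or its actual criterion for estimating a summary effect.

```python
from dataclasses import dataclass


@dataclass
class EffectSize:
    """Hypothetical effect-size record; field names are illustrative, not the study's codebook."""
    study_id: str
    intervention: str
    outcome: str
    g: float
    rct: bool
    caregiver_report: bool
    assessor_blind: bool


# Each model tightens the inclusion rule of the one before it
MODELS = {
    "all designs": lambda e: True,
    "RCTs only": lambda e: e.rct,
    "RCTs, no caregiver report": lambda e: e.rct and not e.caregiver_report,
    "RCTs, no detection bias": lambda e: e.rct and not e.caregiver_report and e.assessor_blind,
}

MIN_STUDIES = 4  # hypothetical minimum number of independent studies per cell


def estimable_cells(effects, keep):
    """Count independent studies per (intervention, outcome) cell that survive the filter."""
    studies_per_cell = {}
    for e in effects:
        if keep(e):
            studies_per_cell.setdefault((e.intervention, e.outcome), set()).add(e.study_id)
    # Keep only cells with enough distinct studies to attempt a summary estimate
    return {
        cell: len(ids)
        for cell, ids in studies_per_cell.items()
        if len(ids) >= MIN_STUDIES
    }
```

Studies are counted by distinct `study_id` rather than by row because a single study can contribute many effect sizes; a cell holding many effect sizes from only a few studies would still be too sparse.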
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Publication Bias Analyses</head><p>Funnel plots and Egger's test results are included in the supplementary materials accompanying this report. Corrected p-values for Egger's tests for funnel plot asymmetry were significant for adaptive and social communication outcomes from studies of NDBIs, suggesting that publication bias may threaten the validity of these summary estimates.</p></div>
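Egger's test regresses each standardized effect (the effect size divided by its standard error) on its precision (the reciprocal of the standard error); an intercept that differs from zero indicates funnel plot asymmetry. The sketch below is the classic ordinary least squares version, which treats effect sizes as independent. It is a simplification: it does not account for effect sizes nested within studies and does not reproduce the corrected p-values reported here. A two-sided p-value could be obtained from a t distribution with k - 2 degrees of freedom (for example, `2 * scipy.stats.t.sf(abs(t), k - 2)`).

```python
import math


def egger_test(effects, std_errors):
    """Classic Egger regression: (g / se) = b0 + b1 * (1 / se) + error, fit by OLS.

    Returns the intercept b0 (the asymmetry estimate), its standard error,
    and the t statistic b0 / se(b0), which has len(effects) - 2 degrees of freedom.
    """
    k = len(effects)
    if k != len(std_errors):
        raise ValueError("effects and std_errors must have the same length")
    if 3 > k:
        raise ValueError("need at least three effect sizes")
    y = [g / se for g, se in zip(effects, std_errors)]  # standardized effects
    x = [1.0 / se for se in std_errors]                  # precision
    x_bar, y_bar = sum(x) / k, sum(y) / k
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual variance and the usual OLS standard error of the intercept
    residuals = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    sigma2 = sum(r * r for r in residuals) / (k - 2)
    se_intercept = math.sqrt(sigma2 * (1.0 / k + x_bar ** 2 / sxx))
    return intercept, se_intercept, intercept / se_intercept
```

A positive intercept means that less precise (typically smaller) studies tend to report larger effects, the pattern expected if small null studies go unpublished, although other design flaws can produce the same asymmetry.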
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Moderator Analyses</head><p>Meta-regression analyses across the entire dataset suggested that summary effects were significantly larger for outcomes that were proximal compared to those that were distal (&#946; = 0.171, p = .024). Boundedness was also a significant source of effect size variance; effect sizes coded as generalized (&#946; = -0.170, p = .076) were smaller than those coded as potentially context-bound or context-bound (&#946; = -0.115, p = .22).</p></div>
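This report synthesizes effects with robust variance estimation (RVE), which retains multiple effect sizes per study and computes standard errors that remain valid without knowing how those effect sizes are correlated. A minimal Python sketch of an RVE meta-regression with one moderator (for example, proximal coded 1 and distal coded 0) follows. It assumes correlated-effects weights and a user-supplied between-study variance `tau2`, and it uses the basic CR0 sandwich estimator without the small-sample corrections that dedicated RVE software applies, so its standard errors will tend to be too small when studies are few. The function name and signature are hypothetical.

```python
import math
from collections import defaultdict


def rve_meta_regression(effects, variances, moderator, study_ids, tau2=0.0):
    """Fit g = b0 + b1 * moderator with correlated-effects weights and CR0 robust SEs.

    Returns (b0, b1, se_b1). No small-sample correction is applied.
    """
    # Group effect-size indices by study (the clusters)
    clusters = defaultdict(list)
    for i, sid in enumerate(study_ids):
        clusters[sid].append(i)

    # Correlated-effects weights: a study's k_j effect sizes share a total
    # weight of 1 / (mean sampling variance in that study + tau2)
    w = [0.0] * len(effects)
    for idx in clusters.values():
        v_bar = sum(variances[i] for i in idx) / len(idx)
        for i in idx:
            w[i] = 1.0 / (len(idx) * (v_bar + tau2))

    # "Bread": inverse of X'WX for the design matrix X = [1, moderator]
    s0 = sum(w)
    s1 = sum(wi * xi for wi, xi in zip(w, moderator))
    s2 = sum(wi * xi * xi for wi, xi in zip(w, moderator))
    det = s0 * s2 - s1 * s1
    bread = [[s2 / det, -s1 / det], [-s1 / det, s0 / det]]

    # Weighted least-squares coefficients: bread times X'Wy
    t0 = sum(wi * yi for wi, yi in zip(w, effects))
    t1 = sum(wi * xi * yi for wi, xi, yi in zip(w, moderator, effects))
    b0 = bread[0][0] * t0 + bread[0][1] * t1
    b1 = bread[1][0] * t0 + bread[1][1] * t1

    # "Meat": sum over studies of u_j u_j', where u_j = X_j' W_j e_j
    meat = [[0.0, 0.0], [0.0, 0.0]]
    for idx in clusters.values():
        resid = {i: effects[i] - b0 - b1 * moderator[i] for i in idx}
        u = (sum(w[i] * resid[i] for i in idx),
             sum(w[i] * moderator[i] * resid[i] for i in idx))
        for r in range(2):
            for c in range(2):
                meat[r][c] += u[r] * u[c]

    # Sandwich variance of the slope: (bread @ meat @ bread)[1][1]
    var_b1 = sum(bread[1][r] * meat[r][c] * bread[c][1] for r in range(2) for c in range(2))
    return b0, b1, math.sqrt(max(var_b1, 0.0))
```

With a binary moderator, the slope `b1` is simply the difference between the weighted mean effects of the two groups; the sandwich step is what allows several effect sizes from the same study to be treated as one cluster.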
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Discussion</head><p>The purpose of this study was to locate, evaluate, and synthesize all available quasi-experimental and RCT investigations of nonpharmacological interventions for children with ASD in terms of methodological quality and summary effect. Results suggest that some intervention approaches show promise for improving a range of outcomes, while others have amassed relatively limited evidence of effectiveness to date. The number of RCT investigations in this area has increased precipitously, but low methodological rigor remains a concern.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Promising Intervention Types</head><p>We consider an intervention type 'promising' if it showed a significant summary effect on at least one outcome after two important quality indicators were taken into account: randomization and the exclusion of caregiver-report outcomes. NDBIs and developmental interventions meet these criteria.</p><p>NDBI. This is the first paper to report summary effects of NDBIs since the 2015 consensus paper that established this new category of intervention as a blend of traditional behavioral and developmental approaches. NDBIs have emerged as, by far, the intervention type most supported by evidence from RCTs. These studies suggest NDBIs may be particularly useful for supporting the development of social communication, language, and play skills. Studies of NDBIs were also the least likely to rely on caregiver report as a primary index of intervention effectiveness. However, we note that when outcomes subject to any form of detection bias were excluded from summary effect estimation, no outcome category reached significance for this intervention type. In addition, our results suggest that publication bias may have threatened overall summary estimates for adaptive and social communication outcome types, although asymmetry in these funnel plots may also reflect other methodological flaws, such as detection bias.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Developmental.</head><p>Evidence suggests that developmental interventions may be particularly effective for supporting the acquisition of social communication skills, a core challenge for young children with ASD. This conclusion holds even when outcomes from quasi-experimental studies and caregiver report are excluded. However, a substantial portion of outcomes were subject to high detection bias because interaction partners or assessors were aware of group assignment. When these outcomes are excluded, the remaining studies are too few to permit summary effect estimation for any outcome type. A key assumption of developmental interventions is that targeted gains in social communication will facilitate cascading developments in the domain of language. Our meta-analysis did not support this assumption, as the summary effect of developmental interventions on language outcomes was not significant. However, we did locate compelling evidence suggesting that early targeted improvements in the synchrony of parent-child interactions can yield longitudinal improvements in the core challenges associated with ASD, which are detectable with standardized, independently administered assessments <ref type="bibr">(Green et al., 2010;</ref><ref type="bibr">Pickles et al., 2016)</ref>. The study of the Preschool Autism Communication Trial (PACT) by <ref type="bibr">Green and colleagues (2010)</ref> supports the notion that proximal changes effected by intervention can facilitate long-term change in developmentally distal outcomes, even in the absence of continued intervention. It also exemplifies the methodological rigor to which the field should aspire, as it employed random assignment, pre-registered analyses, independent evaluators, and clearly defined proximal and distal outcomes.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Intervention Types with Some Evidence of Effectiveness</head><p>Behavioral. Behavioral intervention, specifically EIBI and related variants, is the most commonly recommended intervention approach for children with ASD, and many states explicitly specify behavioral interventions in insurance coverage mandates ("Autism and insurance coverage," 2018). Indeed, the large number of behavioral intervention studies (n = 27) that met our search criteria also suggests that this is the most studied intervention approach for this population. Considered as a whole, without regard to quality of evidence, these studies support the effectiveness of behavioral interventions for improving a wide range of outcomes for children with ASD. However, only a fraction of past studies exploring the effects of traditional behavioral interventions were RCTs, and the majority of outcomes contributing to summary effect sizes were taken from caregiver report. Thus, the relatively low quality of this body of intervention literature limits our confidence in the accuracy of the summary effect sizes estimated in the initial model. A notable exception is the sole RCT that examined the effects of EIBI on standardized measures of cognition and language administered by independent evaluators <ref type="bibr">(Smith, Groen, &amp; Wynn, 2000)</ref>. Though the positive results of this study are encouraging, they have gone unreplicated for nearly 20 years. The dramatic increase in published RCTs since this study's publication demonstrates that high-quality group experimental investigations of autism-specific interventions are possible; such investigations are also necessary to firmly establish the effectiveness of interventions that are so routinely recommended. 
In the meantime, clinicians are encouraged to expand their knowledge and skills to include naturalistic approaches that center the principles of early childhood development. States with insurance mandates that explicitly cover traditional behavioral interventions should also revise their policies to include NDBI and developmental approaches, given that these approaches have now accrued substantial evidence of effects in young children on the autism spectrum from recently published RCTs.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Intervention Types with Little Evidence of Effectiveness</head><p>Sensory. Several previous systematic reviews have concluded that sensory-based interventions have amassed little evidence supporting their effectiveness to date (e.g., <ref type="bibr">Barton, Reichow, Schnitz, Smith, &amp; Sherlock, 2015;</ref><ref type="bibr">Case-Smith, Weaver, &amp; Fristad, 2015)</ref>. Our results are consistent with these conclusions. We located relatively few group design studies of sensory-based interventions specifically focused on young children with ASD (i.e., with a mean age &lt; 8 years). Furthermore, too few studies measured and reported sensory outcomes in a manner that permitted extraction of effect size information, so we could not estimate the summary effect of this intervention approach on what would presumably be its most proximal outcome (i.e., improvements in sensory function). This is particularly concerning given that sensory differences are highly prevalent in this population (e.g., <ref type="bibr">Ausderau, Sideris, Furlong, Little, Bulluck, &amp; Baranek, 2014;</ref><ref type="bibr">Ben-Sasson, Hen, Fluss, Cermak, Engel-Yeger, &amp; Gal, 2009;</ref><ref type="bibr">Leekam, Nieto, Libby, Wing, &amp; Gould, 2007)</ref> and have been found to be associated with some aspects of child stress <ref type="bibr">(Corbett, Schupp, Levine, &amp; Mendoza, 2009)</ref>.</p><p>Unfortunately, across all included studies, we found no evidence that any intervention type had the potential to influence sensory outcomes in children with ASD. When we were able to estimate summary effects of sensory-based interventions, as was the case for language outcomes, the relative paucity of studies limited the precision of our estimates. 
Though the summary effect estimate for sensory-based interventions on language outcomes is similar in magnitude to those of behavioral and NDBI approaches, it is surrounded by a much wider confidence interval, which includes zero (i.e., the effect is not significant).</p><p>It should be noted that our category of sensory-based interventions was broad and included approaches as distinct as Sensory Integration Therapy, Tomatis Sound Therapy&#8482;, and music therapy. The heterogeneity of these approaches may limit the conclusions that can be drawn from this summary effect size estimate, as their theoretical underpinnings and clinical procedures vary. It may be useful to consider the evidence for each of these approaches separately, though the limited number of studies for each prevented us from computing subgroup effect sizes here. However, we did not come across any noteworthy high-quality studies suggesting that any of the aforementioned approaches had markedly positive effects on outcomes (though see <ref type="bibr">Schaaf et al., 2014</ref>, which unfortunately did not report outcome data in a manner that would permit derivation of effect size information for synthesis). We did locate two exceptionally high-quality studies demonstrating null effects of two sensory-based interventions, music therapy <ref type="bibr">(Bieleninik et al., 2017)</ref> and auditory stimulation <ref type="bibr">(Corbett et al., 2008)</ref>. Therefore, our conclusion that there is limited high-quality evidence to date to support sensory-based interventions for young children with ASD rests on our quantitative findings as well as on more fine-grained qualitative observations about this body of literature. 
Given that sensory features are now included in the core diagnostic criteria for ASD (APA, 2013), and given the already widespread implementation of sensory-based interventions for this population (e.g., <ref type="bibr">Goin-Kochel, Mackintosh, &amp; Myers, 2009;</ref><ref type="bibr">Schaaf &amp; Case-Smith, 2014)</ref>, we suggest that more rigorous research be conducted to determine precisely the effects of these interventions for children with ASD.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>TEACCH.</head><p>Though TEACCH was among the first interventions designed specifically for individuals with ASD, it remains relatively under-studied compared to several other intervention approaches geared toward this population. Few eligible studies of TEACCH were located, and most were quasi-experimental. This may be because TEACCH is often conceptualized as a classroom-wide intervention, necessitating large cluster-randomized trials that are substantially more expensive to implement than clinic-based RCTs. The negative (and nonsignificant) summary effect estimated across these studies suggests that there is limited evidence to support the effectiveness of TEACCH for improving social communication skills, and almost no evidence to support its effectiveness for improving other core and related symptoms of ASD.</p></div>
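The cost argument above can be made concrete with the standard design effect for cluster randomization, DE = 1 + (m - 1) * ICC, where m is the number of children per classroom and ICC is the intraclass correlation of the outcome within classrooms. A minimal Python sketch follows; the sample sizes and ICC in the example are hypothetical, not values from any TEACCH trial.

```python
import math


def design_effect(cluster_size, icc):
    """Variance inflation from randomizing intact groups: 1 + (m - 1) * ICC."""
    return 1.0 + (cluster_size - 1) * icc


def clusters_needed(n_individual, cluster_size, icc):
    """Clusters needed to match the precision of an individually randomized trial of n_individual children."""
    n_total = n_individual * design_effect(cluster_size, icc)
    return math.ceil(n_total / cluster_size)
```

For example, `clusters_needed(128, 8, 0.10)` returns 28 classrooms (224 children) in place of 128 individually randomized children, because an ICC of 0.10 with eight children per classroom inflates the variance by a factor of 1.7.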
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Technology-based interventions.</head><p>Although assistive technology is an important support that must be accessible to autistic individuals, early interventions mediated entirely through technology have little evidence to support their effectiveness for improving social communication or social emotional outcomes in children with ASD. Both summary effect sizes for these outcome types had confidence intervals that included zero. The majority of technology-based interventions represented in this meta-analysis were DVDs or video games that targeted social emotional learning and social communication skills. The limited effectiveness of these interventions may be attributable to the near or total absence of a human interaction partner in these intervention contexts. Though technological supports have characteristics that might make them particularly useful to autistic people (e.g., predictable formats of information delivery, self-paced usage, highly motivating content), these supports likely need to be integrated into interpersonal interactions, which could include computer-mediated interpersonal interactions, rather than replacing interaction partners entirely in learning situations. This may be particularly true when the targeted developmental achievements are social in nature. In fact, integrating technological supports into other interaction-based interventions is an approach supported by high-quality studies. For example, <ref type="bibr">Kasari and colleagues (2014)</ref> integrated speech generating devices (SGD) into their JASP-EMT early intervention approach and found gains on a variety of communication outcomes for preschoolers who were initially minimally verbal, compared to those who received the same intervention without the SGD. 
In this study, technology was integrated into an already well-developed intervention that had amassed some degree of empirical support.</p><p>This may be a sensible path forward for conceptualizing the utility of new technologies for early intervention. That is, technology may be most useful when it is integrated into previously developed and validated approaches as a means to expand the populations of children with ASD for whom the intervention is accessible, rather than as an intervention in its own right.</p><p>In this regard, it is important to consider that the ultimate use of a technology is usually separable from the means by which children are taught to use it, so even the most intuitively designed technologies will still need to be paired with a validated teaching approach to ensure that children learn to use them in a meaningful way.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Animal-assisted interventions.</head><p>Although we did locate studies of animal-assisted interventions, there were too few to permit estimation of summary effect sizes for any outcomes.</p><p>The two interventions represented in these studies were EAAT and canine assistance. Several of these studies relied on caregiver report to index change, and two were flagged for possible unreported conflicts of interest, as the authors currently provide the interventions in question for profit <ref type="bibr">(Bass, Duchowny, &amp; Llabre, 2009;</ref><ref type="bibr">Page, 2012)</ref>. Therefore, there is currently little high-quality evidence to support the effectiveness of animal-assisted interventions for any outcomes in children with ASD.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Issues Related to Quality Indicators</head><p>The results of this study indicate that study quality remains an issue plaguing intervention research in young children with ASD. Three issues are especially important: the preponderance of quasi-experimental group designs, reliance on caregiver/teacher report, and correlated measurement error due to interaction partners or assessors who participated in the intervention.</p><p>Although it is well established that randomized controlled trials offer the best protection against alternative explanations for intervention effects, quasi-experimental studies continue to be relied upon in autism intervention research. There are some circumstances in which quasi-experimental methods may be appropriate, such as studies aiming to move established interventions into community settings where groups are already intact and randomizing participants would be prohibitively costly (e.g., <ref type="bibr">Vivanti et al., 2014)</ref>. However, our results suggest that no intervention type can yet be considered 'established' to an extent that would warrant this strategy. Because there were too few studies to permit the estimation of summary effects once study design and performance bias were taken into account, we suggest that researcher and funding resources should continue to focus on establishing intervention efficacy using the highest quality designs.</p><p>Another area of particular concern is continued reliance on parent/teacher report. These measures are nearly impossible to administer in such a way that the respondent is unaware of the child's participation in an intervention. 
Indeed, research has shown that when caregivers complete such measures, an apparent intervention 'effect' can emerge if they believe their child is receiving an intervention, even when no intervention has actually occurred <ref type="bibr">(Jones, Carberry, Hamo, &amp; Lord, 2017)</ref>. We therefore suggest that early intervention researchers avoid relying on such measures and instead seek alternative measurement systems that can be administered and scored by assessors who are unaware of group assignment.</p><p>Finally, correlated measurement error that occurs when parents or teachers are trained in an intervention and also participate as assessors is a common threat to validity that has received little attention from the field. Continued use of observational measures taken from interactions with trained caregivers may be fruitful for mediation analyses, which can verify whether posttreatment group differences in developmentally distal and generalized outcomes are explained at least in part by changes in reciprocal interactions with caregivers within the context of intervention. However, researchers should recognize that these measures are biased in favor of the intervention group and should therefore not rely on them as a primary index of intervention effects. Researchers should instead employ valid, standardized, independently administered assessments as primary outcomes whenever possible. While changes in interactions between a trained caregiver and child may be important to measure if those interactions are expected to be the 'mechanism' through which the child achieves later developmental milestones, these interactions may not themselves index improvements in the child's interactional repertoire. 
If researchers consider interactions with a familiar person to be the most valid context for outcome assessment, they can avoid this threat to validity by relying on observational measures taken from interactions with familiar but untrained interaction partners (e.g., untrained teachers, parents, siblings, or peers). Using untrained interaction partners who are also naive to group assignment will further help researchers address the added threat of detection bias.</p></div>
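The mediation analyses mentioned above are often estimated with the product-of-coefficients approach: path a regresses the proximal measure (for example, caregiver-child interaction quality) on group assignment, path b regresses the distal outcome on the proximal measure while controlling for group, and the indirect effect is a * b. The Python sketch below is a bare-bones illustration using ordinary least squares; the function names are hypothetical, and a real trial analysis would add baseline covariates and a bootstrap confidence interval for a * b.

```python
def ols_slope(x, y):
    """Slope from a simple linear regression of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def ols_two_predictors(x1, x2, y):
    """Coefficients (b1, b2) of y on x1 and x2 (with intercept), via centered normal equations."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    c1 = [v - m1 for v in x1]
    c2 = [v - m2 for v in x2]
    cy = [v - my for v in y]
    s11 = sum(a * a for a in c1)
    s22 = sum(b * b for b in c2)
    s12 = sum(a * b for a, b in zip(c1, c2))
    s1y = sum(a * c for a, c in zip(c1, cy))
    s2y = sum(b * c for b, c in zip(c2, cy))
    det = s11 * s22 - s12 * s12
    # Cramer's rule on the 2 x 2 system of normal equations
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


def indirect_effect(group, proximal, distal):
    """Product-of-coefficients indirect effect a * b, plus the direct effect c'."""
    a = ols_slope(group, proximal)                            # group -> proximal
    c_prime, b = ols_two_predictors(group, proximal, distal)  # distal ~ group + proximal
    return a * b, c_prime
```

With data simulated so that group assignment raises the proximal measure by 0.5 and each unit of the proximal measure raises the distal outcome by 0.4, the indirect effect should come out near 0.2.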
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Understanding Intervention Outcomes: Boundedness and Proximity</head><p>Replicating previous research syntheses <ref type="bibr">(Yoder et al., 2013;</ref><ref type="bibr">Fuller &amp; Kaiser, 2019</ref>) and confirming our hypotheses, effect sizes were larger for indices of context-bound behaviors than for generalized child characteristics. This finding indicates that interventions (broadly considered) produce larger effects on behaviors that are potentially bound to the treatment context, which are likely easier to change, than on more highly generalized characteristics of young children with ASD. In certain circumstances, context-bound behavior change may be considered important. For example, if a study aims to improve children's classroom engagement, many would consider it acceptable if these effects did not generalize beyond the classroom, as the effects are likely relevant only in classroom contexts.</p><p>However, many stakeholders may expect interventions aiming to improve child characteristics associated with longer-term development (e.g., social communication) to produce gains that generalize to contexts beyond intervention settings. If developmentally important effects cannot be demonstrated outside intervention settings, it is unlikely that they will remain part of the child's behavioral repertoire, in any context, once the intervention has stopped.</p><p>Unfortunately, researchers do not always indicate whether their measurement system was restricted to detecting context-bound behaviors or was able to detect gains in generalized child characteristics. 
We encourage researchers to make this distinction clear when presenting their study design, and when describing potential limitations of studies that exclusively examine context-bound behavior change.</p><p>Our hypothesis was also confirmed with regard to proximity: effect sizes for proximal outcomes were larger than those for distal outcomes. Parallel to our findings on boundedness, this indicates that interventions are more effective at achieving gains on outcomes that reflect what was directly addressed in the intervention than on outcomes that are broader or beyond what was directly taught. Distal effects provide some evidence that the intervention is tapping into a developmental pathway, which can give researchers confidence that the intervention will continue to influence children's development after the intervention period is over.</p><p>There are some caveats to our approach to categorizing outcome proximity. First, this concept is likely better described as continuous rather than binary; there are degrees of proximity and distality that we could not capture by restricting our coding to two categories. Second, we were limited to the information about the intervention provided by study authors, which was often quite sparse. When delineating the focus of the intervention, authors did not always clarify whether they were describing the immediate targets of the intervention or a developmentally downstream target. Similarly, many studies did not offer a detailed description of the intervention, which hampered our ability to determine which outcomes were directly addressed by intervention procedures. Finally, proximity and distality are conflated with type of measurement system. Norm-referenced, standardized measures generally assess broad constructs that, by definition, cannot be directly targeted by intervention procedures, and are therefore categorized as distal. 
On the other hand, observational measures of particular behaviors are often designed by researchers specifically to detect the most immediate effects of intervention (e.g., observational measures of joint engagement for interventions that seek to increase the amount of time children spend jointly engaged), and these would be categorized as proximal. Thus, proximal measures may be more sensitive to change than distal measures, while distal measures are likely more construct valid than researcher-created proximal measures.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Interpreting Findings in Light of the Exclusion of Evidence from SSDs</head><p>It should be reiterated that we exclusively synthesized findings from randomized and nonrandomized group design studies of interventions for children with ASD. By excluding studies with single-group pretest-posttest designs and SSDs, we have omitted a substantial body of research that has been used to draw conclusions about evidence-based practice, particularly with regard to the effectiveness of behavior analytic approaches. In fact, as of 2015, the majority of the available studies of intervention techniques for children with autism employed SSDs <ref type="bibr">(Wong et al., 2015)</ref>, though our review and other reviews published since attest to the recent precipitous increase in group design literature in this field <ref type="bibr">(French &amp; Kennedy, 2017)</ref>.</p><p>Our decision to exclude SSDs from this meta-analysis was rooted primarily in the lack of adequate and agreed-upon effect size metrics for synthesizing their effects <ref type="bibr">(Kratochwill et al., 2013)</ref>. However, we believe there are additional insights to be gained from limiting our conclusions specifically to evidence offered by group design studies. Though SSDs are well equipped to identify effective techniques for teaching specific targeted skills, group design studies are particularly useful for determining whether interventions can facilitate gains in generalized development. The repeated measurement that is a hallmark of SSDs may allow investigators to understand variability in specific behaviors associated with careful and controlled changes in the independent variable, but it limits the use of validated standardized assessments as outcome measures. 
Such assessments, though often time-consuming to administer, are likely better equipped to tap improvements in generalized development than researcher-created operationalizations of specific behaviors. Thus, if we wish to evaluate whether intervention facilitates developmental progress in young children with autism on average, an evaluation of group design studies may arguably be better suited methodologically for this purpose. However, even though group design studies may be preferable in this regard, our work and other recent work have shown that a substantial portion of the outcome measures used in clinical trials were overly specific to the intervention context and targets <ref type="bibr">(Provenzani et al., 2019)</ref>. Thus, fragmented measurement approaches continue to limit the conclusions that can be drawn regarding the effectiveness of autism interventions, in both SSDs and group design studies. This remains a limitation both for the body of evidence as a whole and for our conclusions here.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Recommendations for Primary Intervention Research</head><p>Given the results of this series of meta-analyses, we propose several recommendations. While our confidence in summary effect estimates for any intervention type is hindered by a lack of high quality studies, we do have single examples of studies that meet the majority of quality indicators (e.g., <ref type="bibr">Green et al., 2010)</ref>. This suggests that designing a high-quality study is not an unreachable challenge for early intervention researchers. It would perhaps incentivize future high-quality research if funding agencies held investigators to a higher standard and required basic quality features such as randomized trials and measurement systems that can be administered in such a way that assessors remain naive to treatment status. At the very least, caregiver and teacher reports should likely be discarded altogether, as it is already clear that they introduce bias and render findings largely uninterpretable <ref type="bibr">(Jones et al., 2017)</ref>. For some domains, this may mean that new measures will need to be developed and validated that are low-cost to administer and adequately sensitive to change.</p><p>A second recommendation, also related to measurement systems, is that researchers should provide detailed descriptions of each measure (especially if they are researcher-created), and the assessment process in which each measure is used. This will allow for an adequate assessment of the kinds of bias introduced or avoided by particular approaches to measurement, and will allow for a determination of whether measures are capturing context-bound behavior change or generalized characteristics. 
To make this latter determination, aspects such as the measurement context, who administered the measure, and the materials and activities used during measurement should be made clear.</p><p>Third, we were struck by how little information many studies contained about the intervention that was tested. Though it is not necessary for every study of a given intervention to describe the procedures in minute detail, it would be helpful if at least one manualized protocol were available for each intervention, describing the full set of strategies and activities involved in implementing it. This would encourage independent replication of intervention studies and would allow for a determination of whether the outcomes measured were proximal or distal to the intervention procedures. To make this distinction, researchers need to go beyond describing the aims of the intervention; they need to describe the protocol specifically enough that the immediate outcomes of implementing the intervention are readily discernible.</p><p>Fourth, fifty studies were excluded because relevant effect size information was not published or extractable. In many cases, this was due to exclusive reporting of change scores or post-intervention means adjusted for various baseline covariates, which should not be meta-analyzed alongside standardized mean differences extracted from unadjusted means <ref type="bibr">(Deeks, Higgins, &amp; Altman, 2008)</ref>. Though we contacted the authors of every such study less than 10 years old, many failed to respond. 
Therefore, we recommend that authors reporting results that control for covariates include unadjusted means and SDs in supplementary materials, to facilitate future attempts at meta-analysis.</p><p>Finally, we suggest that an "optimal" intervention design would include paired proximal and distal measures (or perhaps even include a third, far distal measure) that are expected to be developmentally connected and malleable to change. The proximal measure should be selected to capture the immediate effects of the intervention, while the distal measure should be selected to measure effects hypothesized to be developmentally downstream from proximal effects.</p><p>Mediation analyses, in which the proximal measure is the mediating variable and the distal measure is the outcome variable, could then confirm whether the proposed developmental pathway between proximal and distal effects was activated by participation in the intervention. This would allow for a better understanding of the mechanisms or 'active ingredients' through which interventions achieve cascading developmental gains.</p></div>
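The fourth recommendation above matters because unadjusted post-intervention means, SDs, and group sizes are exactly what a standardized mean difference requires. A minimal Python sketch of Hedges' g and its approximate sampling variance follows, using the standard textbook formulas (pooled SD and the small-sample correction J = 1 - 3 / (4 * df - 1)); it is not necessarily the exact computation used in this meta-analysis, and the function name is hypothetical.

```python
import math


def hedges_g(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Standardized mean difference from unadjusted post-test means and SDs.

    Returns (g, var_g): Cohen's d on the pooled SD, multiplied by the
    small-sample correction J = 1 - 3 / (4 * df - 1), where df = n_t + n_c - 2.
    """
    df = n_t + n_c - 2
    sd_pooled = math.sqrt(((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / df)
    d = (mean_t - mean_c) / sd_pooled
    # Large-sample approximation to the sampling variance of d
    var_d = (n_t + n_c) / (n_t * n_c) + d ** 2 / (2 * (n_t + n_c))
    j = 1 - 3 / (4 * df - 1)
    return j * d, j ** 2 * var_d
```

For two groups of 20 with means of 10 and 8 and SDs of 4, d = 0.5 and g is about 0.49. Change scores or covariate-adjusted means would be divided by a different SD, which is why they should not be pooled with g values computed this way.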
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Limitations and Future Meta-analytic Research</head><p>There are at least three limitations to consider when interpreting the results of this study.</p><p>First, despite our best efforts, we were unable to collect any unpublished effect sizes or datasets apart from dissertations and theses. This could mean that the effect size estimates presented here are larger than the 'true' effects (an interpretation supported by inspection of funnel plots). Our attempts to gather unpublished effect sizes included searching the NIH, NDAR, and IES databases and requesting data directly from investigators who were reported to have received funding for group design intervention research in children with ASD. However, no researchers provided unpublished data, suggesting there may be some reticence among researchers to share such data. This is unfortunate, as access to unpublished data is critical for accurately estimating effect sizes and assessing the 'state of the science'. Further, data sharing practices are critical to ensuring the replicability of findings <ref type="bibr">(Nuijten, 2018)</ref>.</p><p>A second limitation is that there were too few studies to adequately synthesize effect sizes for all outcome and intervention types. This was especially true when quality indicators were taken into consideration. Researchers will need to commit to conducting high-quality intervention research so that future syntheses can draw accurate conclusions about intervention effectiveness for outcomes of interest in children with autism.</p><p>Finally, the heterogeneity of the variables within each "outcome type" and of the treatments within each "intervention type" may limit the interpretability of our summary effect estimates. 
Although variables and intervention approaches were similar enough to be categorized with high reliability (kappa coefficients for outcome type and intervention type coding were 0.862 and 0.907, respectively), categorizing items that differ along a continuum inevitably loses information, and this information may be important for understanding the key components that drive intervention effects. For example, the same intervention provided at different intensities (i.e., number of hours per week) may yield different effects. Similarly, intervention effects may differ for variables that share a domain but are distinct (e.g., social communication variables such as responding to joint attention and initiating joint attention). As the literature base on treatment effects in children with ASD continues to grow, more fine-grained analyses within each outcome type could allow us to answer questions about putative moderators and to calculate subgroup effect sizes for identical outcome measures across studies (e.g., Vineland scores) or identical interventions (e.g., PECS).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Conclusions</head><p>The current study differs from existing reviews of intervention in children with ASD in three important ways. First, this study is one of the few attempts to consider all intervention types and intervention outcomes as broadly as possible. This allows us to report the state of the science regarding which interventions have accrued the most convincing evidence of effectiveness for young children with ASD, and to report on the full range of outcomes that these interventions are able to influence. Second, this study applies rigorous quality criteria that are common considerations in other areas of psychology but are applied less often in evaluations of autism research (e.g., <ref type="bibr">Reichow, Volkmar, &amp; Cicchetti, 2008)</ref>. Finally, several syntheses that are similar to ours in scope consider some of the design factors of included studies in order to classify intervention types according to levels of evidence (e.g., <ref type="bibr">Wong et al., 2015)</ref>. However, these syntheses have not examined intervention effects according to characteristics of the outcome variable, which prevents researchers from drawing conclusions in</p></div><note xmlns="http://www.tei-c.org/ns/1.0" place="foot" xml:id="foot_0"><p>*Kasari, C., Paparella, T., Freeman, S., &amp; Jahromi, L. B. (2008). Language outcome in autism:</p></note>
		</body>
		</text>
</TEI>
