<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Not adding up: free ridership and spillover calculations in energy efficiency evaluations</title></titleStmt>
			<publicationStmt>
				<publisher></publisher>
				<date>06/01/2020</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10194578</idno>
					<idno type="doi">10.1007/s12053-020-09872-6</idno>
					<title level='j'>Energy Efficiency</title>
<idno>1570-646X</idno>
<biblScope unit="volume">13</biblScope>
<biblScope unit="issue">5</biblScope>					

					<author>Zachary Froio</author><author>Pranay Kumar</author><author>Frank A. Felder</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[A key element of evaluation, measurement, and verification (EM&V) studies for energy efficiency programs involves estimation of net energy savings that account for free ridership, spillover, and induced market effects. The existing literature recognizes these effects to be significant and provides detailed guidelines to estimate them. However, there appears to be a disconnect between these guidelines and field evaluations conducted in practice. Our meta-analysis of 120 studies from 2006 to 2018 indicates that most free ridership and spillover estimates are based on survey results and expressed in percentage terms. We note that simply adding these percentages numerically without converting them into a common unit is inaccurate and obscures a program's true impact. Additionally, there exist wide variations in nomenclature, classification, and methodologies adopted to estimate these metrics across programs and jurisdictions. Our scatterplot analysis of the reviewed EM&V reports indicates that with few exceptions, free ridership and spillover do not necessarily offset each other. We propose an alternative approach to estimate free ridership and spillover in energy units with costs in dollar terms, e.g., as the difference between a program participant's total willingness-to-pay and the total financial impact of the program's existence. We also feel that a consistent, transparent, and reliable evaluation methodology to estimate free ridership and spillover effects across programs and jurisdictions based on randomized or quasi-experimental designs will not only improve accuracy but also improve comparability for informed policy decisions in the future.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Introduction</head><p>Energy efficiency (EE) is often considered the lowest-cost energy resource available, and many government organizations, regulators, and utilities support the establishment and expansion of EE programs as a "first fuel" of choice among competing alternatives to accommodate energy demand <ref type="bibr">(Friedrich et al. 2009</ref>). Energy efficiency policy options are considered win-win solutions not only in terms of energy resource planning but also as an important greenhouse gas emission reduction strategy. Not surprisingly, worldwide investment in energy efficiency was estimated at $231 billion in 2016, with the building sector accounting for 58% of total investment dollars (IEA 2017). In the USA, states invested approximately $8 billion in energy efficiency and reportedly saved 27.1 million megawatt-hours in 2018 <ref type="bibr">(ACEEE Scorecard 2019)</ref>.</p><p>A key element of effective evaluation, measurement, and verification (EM&amp;V) studies for EE programs is the estimation of net energy savings, defined by the US Department of Energy (DOE) Uniform Methods Project (UMP) protocol document as "changes in energy use that are attributable to a particular EE program," which may "implicitly or explicitly include the effects of free ridership, spillover, and induced market effects" (FR, SO, and ME, respectively). According to the UMP, free riders are "program participants who would have implemented a program measure or practice in the absence of program," and these participants have been divided into total, partial, and/or deferred free riders. 
Furthermore, program spillover refers to "additional reductions in energy consumption or demand that are due to program influences beyond those directly associated with program participation" and is broadly categorized into two types: participant spillover (further subcategorized into inside, outside, like, and unlike spillover) and non-participant spillover. Similarly, market effects include "long-lasting, substantive changes in market structures or participant behavior that is reflective of an increased adoption of EE measures which is causally related to market interventions" <ref type="bibr">(Li et al. 2018)</ref>. Most evaluation protocols, including the DOE UMP guidelines, provide for estimating net savings by netting the estimated values of FR, SO, and ME (including negative values) against gross savings. Furthermore, the ratio of a program's net savings to gross savings is commonly referred to as the net-to-gross (NTG) ratio. This calculation is often employed in a variation of the following series of equations as demonstrated by the UMP:</p><p>(1) Net savings = (gross savings) - (free ridership) + (spillover) + (market effects savings not already captured by SO)</p><p>(2) Net-to-gross ratio = 1 - (free ridership)/(gross savings) + (spillover)/(gross savings) + (market effects)/(gross savings)</p><p>(3) Net savings = (net-to-gross ratio) × (gross savings) <ref type="bibr">(Li et al. 2018</ref>).</p><p>However, the addition of FR, SO, and ME is problematic for two major reasons. First, this approach omits essential information regarding the effectiveness of a program's design. The varying methodologies used to estimate free ridership and spillover, as well as the inconsistent usage of both spillover and market effects within net savings calculations, inhibit the meaningful comparison of net savings across different programs. 
Furthermore, prescriptive changes to these program designs (for example, to reduce free ridership or to increase spillover) will differ between programs with varying degrees of these effects, regardless of the reported net savings estimations for each respective program. Second, when evaluations express FR, SO, and ME in percentages (namely with respect to total program participants), adding these percentages often does not yield the same outcome as performing the calculation in units of energy, e.g., kilowatt-hours, given the inherent differences in consumption and saving potential across participants. We deliberate upon these arguments, which challenge the internal validity of the theoretical construct adopted for estimating the net energy savings of EE programs. Furthermore, our review of EM&amp;V reports reveals wide variations in the nomenclature, classification, and methodologies adopted in practice for evaluating EE programs across jurisdictions. We find a clear disconnect between available guidelines and actual field evaluations conducted in practice. As such, even if we overlook the inherent limitations of the measurement guidelines for a moment, the wider discrepancies and inconsistencies in the evaluation methods adopted in practice make the net savings values less accurate and difficult to compare across programs and jurisdictions.</p><p>The remainder of this paper explores these major findings in detail. 
The "Overview of EM&amp;V studies that estimate FR, SO, and ME to determine net savings" section begins with an overview of EM&amp;V studies that estimate FR, SO, and ME to determine net savings; the "Adding FR and SO loses essential information about a program's design" section demonstrates how the addition of FR and SO loses important information about a program's design; the "Discussion and alternative approaches" section discusses the implications of these findings and makes the case for alternative approaches to estimate net savings; and the "Conclusions" section concludes with policy recommendations and areas for further research.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Overview of EM&amp;V studies that estimate FR, SO, and ME to determine net savings</head><p>Estimation of net energy savings not only presents technical challenges associated with the quality, effectiveness, and validity of EE programs but also has important long-term policy implications for different stakeholders and an impact on GHG emissions at a global level <ref type="bibr">(Vine et al. 2010)</ref>. A comprehensive policy analysis would require cost-benefit analysis from the perspectives of different stakeholders, depending upon the policy context, using standard tests <ref type="bibr">(Felder and Athawale 2017)</ref>. However, our current analysis is limited to the estimation of net energy savings for impact evaluation of EE programs from the ratepayer's perspective. Accordingly, we find that free ridership has important implications not just for the assessment of the cost-effectiveness of EE programs in terms of energy units saved per program dollar spent, but also for equity considerations involving potentially free-riding program participants and the non-participants who fund such programs. A fundamental and problematic issue in estimating free ridership is constructing a satisfactory baseline scenario, i.e., a counterfactual. 
In other words, to quantify the net cause-effect relationship between program participation and energy savings, it is important to isolate those participants who would have implemented the particular EE measure even without any intervention. Failure to account for such participants might overstate a program's effects. Similarly, failure to assess spillover effects might result in biased estimates of program impacts and inappropriate policy recommendations <ref type="bibr">(Angelucci and Maro 2015)</ref>. For estimating net savings, it is also important to consider the program influence on all market actors over successive stages of market development. Evaluations of market transformation programs for EE products and services have significant policy implications and focus more on markets than on programs <ref type="bibr">(Vine et al. 2009</ref>). As such, estimating market effects requires careful consideration and more elaborate studies employing a mix of qualitative and quantitative methods over a longer period of time <ref type="bibr">(Violette and Rathbun 2017)</ref>. However, current EE program evaluation practices are marked by varied terminologies and inconsistent methodologies that are compounded by a lack of publicly available information to assess program analyses. For these reasons, the program evaluation field has been subject to a long and continued debate that has eluded consensus <ref type="bibr">(Jaccard 2010)</ref>.</p><p>The American Council for an Energy-Efficient Economy (ACEEE) compiles and compares the energy saving performance of all large cities, states, and territories in the USA and ranks them based on their respective performance scorecards. An important metric of the ACEEE scorecard is the net annual incremental electricity savings as a percentage of annual state utility retail sales. Saving information is obtained from public service commission databases, and sales data are sourced from the US Energy Information Administration (EIA). 
From this report, it is observed that there is wide diversity among states in the usage of net versus gross energy savings for program evaluation purposes, with just 12 states reporting gross savings, 21 states reporting net savings, and another 9 states reporting both values (ACEEE Scorecard 2019, p. 30). States report energy savings as either net or gross, with net savings accounting for free riders, spillovers, and market effects, and gross savings not accounting for these metrics.</p><p>Our literature review of EE program experience in EU countries reveals a pattern similar to that in the USA. We observe that EM&amp;V protocols, methodologies, and outcomes are not consistently applied across the European Union either. Recently, most EU countries have started estimating gross savings, but only the United Kingdom and Denmark estimate net energy savings. Whereas the free rider effect (also referred to as deadweight) is factored into the estimation of net energy savings in a suppliers' obligation program in the UK, the same is not accounted for in a building rehabilitation program in Germany <ref type="bibr">(Rosenow and Ray 2012)</ref>. An examination of the costs and benefits of a Norwegian energy conservation program that provided financial support to participants revealed that about 70% of those participants were free riders <ref type="bibr">(Haugland 1996)</ref>. In Italy, an evaluation of an incentive program for energy upgrades used a survey of 3,000 households and found a free rider effect of up to 70% <ref type="bibr">(Alberini and Bigano 2014)</ref>. Furthermore, comprehensive EM&amp;V protocols for EE projects are at various stages of development in China and India <ref type="bibr">(Slote et al. 
2014)</ref>.</p><p>For this study, two separate datasets of comprehensive EM&amp;V studies were compiled to conduct two distinct analyses: (1) a review of the methodologies</p><p>The vast majority of these 100 evaluations report gross savings, with 91.1% of those also reporting net savings estimations. Of the evaluations that report net savings, 89.3% directly report free ridership estimations, and all of those that detail their methodologies use survey techniques to estimate FR values. The evaluations that report free ridership do so in percentage terms of participants, energy savings, or both at the measure level (12.7%), the program level (10.1%), both (64.6%), or do not explicitly specify (12.7%). By comparison, only 66.7% of net saving evaluations estimate spillover effects. These studies report spillover in percentage terms of participants and/or energy savings at the measure level (13.1%), program level (16.4%), both (52.5%), or do not specify (18%). As with free ridership, all reports that detail their methodologies estimate spillover effects using survey techniques. A notable example is the Rocky Mountain Power Wyoming Home Energy Saving evaluation, which aggregated individual measures under six broad categories to estimate FR and SO with a statistically significant sample population. Of the net saving studies, fewer than 5% reported market effects, all of which estimated ME separately from spillover. In all, the majority of net saving studies (64.8%) weighted net savings by the percentage of energy savings.</p><p>The lack of ME estimations is noteworthy, and many reports acknowledge the need for further research to quantify these effects. Notable examples that assess market effects include Rocky Mountain Power Idaho's</p><p>Of the 82 studies that report net savings, there is wide variation in the categories and terminology used to describe and thus estimate net saving components. 
Studies that estimate free ridership categorize the effect in terms of full-only (4.1%), full and partial (14.9%), full, partial, and deferred (9.5%), or otherwise do not explicitly specify (71.6%). Little consistency exists across the categorization of spillover effects as well, with studies estimating SO as like (4.1%), participant (29.7%), participant and non-participant (12.2%), or otherwise do not specify (51.4%). Only two studies-2.7% of those that report spillover-categorize all types: like, unlike, participant, and non-participant spillover.</p><p>The majority of the studies that estimate both FR and SO employ some variation of the NTG ratio saving calculation illustrated above, in which percentage estimates of free ridership and spillover are deducted and added, respectively, from gross saving estimates. However, the methodologies used to estimate FR and SO values vary across these reports. Most of the studies arrive at these estimates using a combination of yes/no responses to a series of participant survey questions. Of the 100 reviewed studies, none employed randomized controlled trials or quasi-experiments to estimate FR and SO. Others, however, use novel methods such as demand response modeling, trade ally interviews, or standard deemed values for specific measures, e.g., from statewide technical reference manuals. For example, Wisconsin's Focus on Energy program evaluation estimates FR and SO values for lighting measures using a combination of four methods: demand elasticity modeling, national sales modeling, corporate retailer interviews, and manufacturer interviews. In the same program, net savings for select HVAC measures were estimated using the Standard Market Practice (SMP) approach. Under this method, the evaluation team first established the average market baseline consumption with available market data for equipment sold outside the program. 
Ignoring spillover, savings net of free ridership were estimated as the difference between the market baseline and the average energy consumed by the measures installed under the program. Other jurisdictions (in Indiana and Delaware, for example) also recognize net savings estimations that employ the SMP methodology. SMP approaches typically assume that the standard market values can act as the baseline for program evaluation, avoiding the need to account for free ridership separately. However, this may or may not be the case, depending upon the difference between the actual energy consumption levels of program participants and the standard market baseline, and therefore needs to be tested further <ref type="bibr">(Ridge et al. 2013)</ref>. Another example of non-survey-based net saving estimations is CPS Energy's Home Efficiency Program, which uses the Texas Public Utility Commission's approved deemed values and engineering calculations for individual measures.</p><p>The second analysis for this report draws from 20 comprehensive, publicly available EM&amp;V studies published on public utility commission websites or found through search engine results using key energy efficiency EM&amp;V terms (see Table <ref type="table">2</ref> following the "Conclusions" section). These studies comprise programs offered from 2006 through 2015 and span 11 states (California, Connecticut, Delaware, Illinois, Indiana, Maine, Maryland, Massachusetts, New Hampshire, New York, and Rhode Island). All study programs are further organized into subprograms by both industry (i.e., residential, commercial, and industrial) and the specific energy efficiency measures administered within each subprogram.</p><p>These reports consist of 83 total subprograms from which data was obtained for the following variables: reported and evaluated program gross savings, program net savings, subprogram and measure-level NTG ratios, free ridership, spillover, and market effects. 
Of these 83 subprograms, all reported their estimated NTG ratios and 75 estimated their reported and evaluated gross savings and the realization rate of program gross savings. A total of 46 subprograms reported values for subprogram free ridership, while 21 (approximately 25% of all compiled subprograms) reported both free ridership and spillover effects to arrive at their respective net-to-gross ratios. Of the 20 total reports, only one, the National Grid USA 2009 Commercial and Industrial Programs Free-ridership and Spillover Study, attempted to estimate program market effects, under what it considers to be "non-participant spillover effects" <ref type="bibr">(Kraft et al. 2010)</ref>. The remaining evaluations either mention but do not attempt to estimate ME or simply overlook these effects. Furthermore, only nine reports (less than half of the evaluated studies) reported net saving metrics at the measure level.</p><p>Using the FR and SO percentage estimates from the 20 reviewed studies, a scatterplot was drawn as shown in Fig. <ref type="figure">1</ref> above. For ease of analysis, the plot area was further subdivided into four quadrants by intersecting the two axes at 50% FR and SO values. One obvious observation from the scatterplot is that most programs report only free ridership percentage values and overlook spillover effects. Using the same format, measure-level FR and SO values sourced from the Massachusetts Technical Resource Manual (TRM) were drawn as shown in Fig. <ref type="figure">2</ref> below. In the measure-level plot, most of the observations were found to be in the bottom left quadrant, representing low SO and FR effects.</p><p>From the figures above, it would be incorrect to draw any general conclusions regarding the program effects. However, we feel that not assessing all components of the net savings does not capture the true program effects and might lead to incorrect policy decisions. 
Furthermore, we also observe that the FR and SO percentages are not uniformly distributed across all four quadrants, suggesting that free ridership and spillover percentages vary significantly across the measures and might not offset each other. These findings do not support the argument that NTG ratios are likely to be estimated as 1 in the majority of existing evaluation studies <ref type="bibr">(Haeri and Khawaja 2012)</ref> except perhaps for low-income, hard-to-reach small business, and new programs, in which values for FR and SO are estimated to be negligible <ref type="bibr">(PWP, Inc. 2017)</ref>. Numerical values for market effects were not separately available in the compiled studies, but NTG ratios vary widely from 0.49 to 1.98 across reviewed programs and measures in the Massachusetts TRM. The review of these evaluation reports indicates that NTG ratios vary significantly across measures, programs, and states, suggesting that free ridership and spillover effects do not necessarily offset each other, which is what some of the states are implicitly assuming if they do not require the adjustment of gross savings for FR and SO. As such, assuming an NTG ratio equal to 1 may not capture the true program effects and may either over- or understate the actual savings attributable to a given program.</p></div>
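<div xmlns="http://www.tei-c.org/ns/1.0"><p>The consequence of assuming an NTG ratio of 1 can be illustrated with a short numerical sketch of equations (1) to (3). The program figures below are hypothetical and are not drawn from any reviewed evaluation, and the function name net_to_gross and its parameters are illustrative labels only. All quantities are assumed to be expressed in the same energy unit (MWh), which is the condition under which the additions in equation (1) are meaningful.</p>
<ab><![CDATA[
```python
def net_to_gross(gross, free_ridership, spillover, market_effects=0.0):
    """Apply equations (1) and (2); every input is in the same energy unit (e.g. MWh)."""
    if gross <= 0:
        raise ValueError("gross savings must be positive")
    # Equation (1): net savings in energy units.
    net = gross - free_ridership + spillover + market_effects
    # Equation (2): net-to-gross ratio.
    ntg = 1 - free_ridership / gross + spillover / gross + market_effects / gross
    return net, ntg


# Hypothetical program (assumed values): 1,000 MWh gross savings,
# 300 MWh attributable to free riders, 50 MWh of spillover.
gross_mwh = 1000.0
net_mwh, ntg_ratio = net_to_gross(gross_mwh, free_ridership=300.0, spillover=50.0)

# Error introduced by assuming NTG = 1, i.e. reporting gross savings as net.
overstatement_mwh = gross_mwh - net_mwh

print(f"net savings: {net_mwh:.1f} MWh, NTG ratio: {ntg_ratio:.2f}")
print(f"overstatement if NTG is assumed to be 1: {overstatement_mwh:.1f} MWh")
```
]]></ab>
<p>In this hypothetical case, equation (3) recovers the same net savings as equation (1), and treating the NTG ratio as 1 would overstate attributable savings by a quarter of gross savings.</p></div>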
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Adding FR and SO loses essential information about a program's design</head><p>Based on our review of the free ridership and spillover values estimated in the EM&amp;V reports, we discuss the policy implications from the perspective of cost-benefit analysis of program impacts on ratepayers. Figure <ref type="figure">3</ref> illustrates four possible scenarios for a hypothetical EE program, with combinations of free ridership and spillover percentage scores of either "low" or "high" values. The scenario depicted in Quadrant III indicates that a program's free ridership percentage is low, and that spillover is high. This may be considered a relatively cost-effective scenario, as little program expenditure will incur energy savings from program participants, while non-participants will accrue higher benefits relative to programs falling within the other quadrants. Alternatively, the scenario in Quadrant I is suboptimal, as free ridership outweighs spillover, indicating a lower level of cost-effectiveness with a relatively lower level of positive spillover and market effects to provide offsetting benefits. Given this inefficient allocation of resources, the program will likely invite criticism, potentially requiring it to be redesigned, more effectively implemented, or discontinued.</p><p>From a cursory review, the scenarios in Quadrants II and IV appear to have similar outcomes, as the effects of free ridership and spillover appear to offset each other. However, in practice, the two scenarios are dissimilar, suggesting different outcomes and implications about the effectiveness of each program. Low free ridership and spillover percentages (Quadrant II) indicate a well-designed and implemented program compared with those illustrated in the remaining three quadrants. 
The low percentage values of free ridership and spillover in Quadrant II are not likely to have significant financial or distributional implications, even if these effects do not offset each other completely. However, the presence of high free ridership and spillover in Quadrant IV may have significant implications for actual savings, even if the difference is small in percentage terms, if gross energy savings are considerably high. Moreover, high free ridership and spillover percentages indicate a mature market for a given technology, positioned further up along a typical s-curve illustration of market penetration. Thus, for technologies that already achieve sufficient market saturation without additional incentives, it is most practical and cost-effective to allow these to compete naturally in the market without incurring the costs of program intervention.</p><p>Additionally, although the existing literature acknowledges with broad consensus that FR and SO exist within established EE programs and are significant enough to impact program savings, few studies to date acknowledge or attempt to measure the overlap between free ridership and spillover effects. A notable exception is Mahone and Hall's acknowledgement that EE programs with aggressive efficiency goals create an almost unavoidable overlap with similar programs within a given market <ref type="bibr">(Mahone and Hall 2010)</ref>. In such a case, one program's participants contributing positively to net savings may be considered free riders detracting from another program's own savings. For example, suppose Program A, a hypothetical residential lighting rebate program, induces its participants to enroll in hypothetical Program B, an existing smart thermostat rebate program. The resulting savings must be recognized as attributable to Program A, even though they are also assumed to be accounted for in Program B's tracking system. 
However, in this scenario, because Program A induces the additional spillover savings, Program B should not attribute these savings to its own program at the risk of double-counting net savings for each program. This scenario also suggests that spillover savings attributable to Program A may in fact cause free ridership in Program B, given that a participant would have adopted Program B's efficiency measure in the absence of Program B. Furthermore, even if properly attributed as Program B free ridership and Program A spillover savings, the realized savings will not be offset under each program's respective evaluation method, given that energy savings are not equivalent across unlike measures. As such, simply adding together estimated values may not capture the dynamic overlaps between these effects and potentially result in inaccurate estimations of actual savings.</p><p>Given these considerations, EM&amp;V practices must account for these overlaps in their respective evaluation and survey techniques, with similar logic extended to emphasize the overlap between the various categories of FR, SO, and ME across the measure, program, and portfolio levels over time. For example, Itron's EmPOWER Maryland evaluation does not attribute spillover savings estimates to its residential appliance recycling subprogram to avoid potentially double-counting these savings, given that survey respondents indicated that they had adopted measures eligible for incentives through other EE programs (Itron, Inc. 2014).</p><p>Fig. <ref type="figure">2</ref> Measure-level free ridership vs. spillover scatterplot (Massachusetts TRM)</p><p>Fig. <ref type="figure">3</ref> Free ridership vs. spillover matrix</p></div>
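<div xmlns="http://www.tei-c.org/ns/1.0"><p>The quadrant logic of the free ridership vs. spillover matrix can be sketched as a simple classifier. This is only an illustration: the 50% cut-off mirrors the axis intersection used for the scatterplots, but applying it to the "low" and "high" labels of the matrix is an assumption, as are the function name and the two example programs with their gross savings.</p>
<ab><![CDATA[
```python
def quadrant(fr, so, threshold=0.5):
    """Map free ridership and spillover fractions to the quadrants described in the text.

    I:   high FR, low SO     II: low FR, low SO
    III: low FR, high SO     IV: high FR, high SO
    The 0.5 threshold is an assumed cut-off, not a value prescribed for the matrix.
    """
    high_fr = fr >= threshold
    high_so = so >= threshold
    if high_fr and not high_so:
        return "I"
    if not high_fr and not high_so:
        return "II"
    if not high_fr and high_so:
        return "III"
    return "IV"


# Two hypothetical programs whose FR and SO percentages cancel out exactly.
gross_mwh = 10_000.0  # assumed gross savings for both programs
programs = {"P": (0.10, 0.10), "Q": (0.60, 0.60)}

results = {}
for name, (fr, so) in programs.items():
    ntg = 1 - fr + so          # identical for both programs
    fr_mwh = fr * gross_mwh    # savings that would have occurred anyway
    results[name] = (quadrant(fr, so), ntg, fr_mwh)
    print(name, quadrant(fr, so), round(ntg, 2), round(fr_mwh))
```
]]></ab>
<p>Both hypothetical programs report the same NTG ratio, yet one falls in Quadrant II and the other in Quadrant IV, and the volume of savings paid for but not caused by the program differs sixfold. The single ratio hides this difference.</p></div>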
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Discussion and alternative approaches</head><p>Another major finding of our review is that most of the evaluation reports do not explicitly estimate FR and SO values in energy units. Instead, they express FR and SO in percentage terms and erroneously add within and between these categories to arrive at NTG estimates. This analysis argues against this approach, given that percentages must correspond to physical quantities in the same units in order to be added numerically. Thus, for FR and SO to be combined congruently, these values must be expressed in common energy units (i.e., megawatt-hours, therms, or megawatts for demand reductions) or in percentages of the same units. Furthermore, given their disparate functions, energy and demand savings cannot be added across unlike EE measures because they consume energy and/or reduce its usage in inherently different ways. Because unlike EE measures vary across the type of energy resource saved (e.g., natural gas or coal) and the timing of savings (e.g., hourly or seasonally), these savings are inherently unequal and cannot be appropriately added together to holistically express total net savings.</p><p>In most of the evaluated studies for this report, FR and SO are estimated based on numerical scores assigned to a set of survey questions and expressed in percentage terms. These numbers are then added together to estimate total energy savings, with a program's cost-effectiveness based upon the incremental costs incurred on a portfolio of measures. Survey-based methods are among the most commonly used approaches to estimate FR, SO, and NTG ratios due to their cost-effectiveness and flexibility. However, survey results are susceptible to social desirability bias and the arbitrary assignment of FR scores based on the subjectivity of the surveyor or evaluator <ref type="bibr">(Li et al. 2018</ref>). 
Furthermore, even for those reports that detail their respective survey methodologies and publish sample questionnaires, it is not possible for a reader to answer these questions and arrive at specific free ridership or spillover percentages because the underlying assumptions and functions used to calculate these values are not made publicly available. Thus, a transparent comparison of FR and SO across programs is not possible by reviewing published EM&amp;V studies alone due to the differences between their respective methodologies, terminologies, and irreproducible survey results.</p><p>An alternate method to estimate FR could be in terms of the difference between a program participant's total willingness-to-pay for a given measure and the program's observed investment cost expressed in dollar terms <ref type="bibr">(Grosche and Vance 2009)</ref>. This methodology may also provide a more accurate illustration of a program's cost-effectiveness in terms of dollars per unit of reduced energy consumption or in terms of dollars per ton of carbon dioxide abated. Another more accurate and reliable approach to estimate net savings is to measure the difference in energy usage between program participants (the "treatment group") and a similar comparison group of non-participants (the "control group") at the same time based on randomized controlled trials or quasi-experimental methods (State and Local Energy Efficiency Action Network 2012). For example, by comparing inframarginal program participants and non-participants who did not meet program eligibility thresholds, a study of Mexican subsidies for replacement of refrigerators and air conditioners found that approximately half of all participants would have purchased the energy-efficient appliances in the absence of the program <ref type="bibr">(Boomhower and Davis 2014)</ref>. In a recent working paper on cross-program spillover, Jessoe et al. 
employed randomized controlled trials to test the effects of social norm messaging about residential water usage on electricity consumption <ref type="bibr">(Jessoe et al. 2017)</ref>. Their results provide experimental evidence that behavioral interventions spill over to untreated sectors by altering consumer choice. Such techniques, especially randomized controlled trials, are often considered the highest standard of practice in the social sciences to establish causality <ref type="bibr">(Li et al. 2018</ref>). These methods may have higher costs and time requirements and other limitations pertaining to ethical issues and applicability for large-scale evaluations. However, with appropriate application, they may significantly limit bias and enhance the comparability and reliability of evaluation results.</p></div>
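<div xmlns="http://www.tei-c.org/ns/1.0"><p>The argument that percentages of participants cannot stand in for percentages of energy can be made concrete with a small worked sketch. The three participants and their savings below are invented for illustration, and the variable names are our own labels; the point is only that the same survey answers produce different NTG ratios depending on whether FR and SO are expressed as shares of participants or in kilowatt-hours.</p>
<ab><![CDATA[
```python
# Each tuple: (gross savings in kWh, is a free rider, spillover savings in kWh).
# All values are hypothetical.
participants = [
    (100.0, True, 0.0),
    (500.0, True, 0.0),
    (4400.0, False, 200.0),
]

n = len(participants)
gross_kwh = sum(g for g, _, _ in participants)

# Headcount view: FR and SO as shares of participants, then added as percentages.
fr_share = sum(1 for _, is_fr, _ in participants if is_fr) / n
so_share = sum(1 for _, _, so in participants if so > 0) / n
ntg_by_headcount = 1 - fr_share + so_share

# Energy view: FR and SO in kWh, combined in common units before forming the ratio.
fr_kwh = sum(g for g, is_fr, _ in participants if is_fr)
so_kwh = sum(so for _, _, so in participants)
ntg_by_energy = (gross_kwh - fr_kwh + so_kwh) / gross_kwh

print(f"NTG from participant percentages: {ntg_by_headcount:.2f}")
print(f"NTG from energy units:            {ntg_by_energy:.2f}")
```
]]></ab>
<p>Here two of three participants are free riders, but they account for only 600 of 5,000 kWh of gross savings, so the headcount-based ratio (about 0.67) understates the energy-based ratio (0.92) considerably. With a different mix of participants the error could run in the opposite direction.</p></div>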
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Conclusions</head><p>Based on our review of the impact evaluation reports of EE programs for cost-benefit analysis from the ratepayers' perspective, we find a clear disconnect between the available guidelines and the practices actually employed. Our review of the existing EM&amp;V reports indicates that the majority of FR and SO estimates are based on survey results and expressed in percentage terms. Although percentages serve as a simple and useful indicator, the analysis in the "Adding FR and SO loses essential information about a program's design" section and the "Discussion and alternative approaches" section suggests that adding these percentages without converting the individual components into a common unit is inaccurate and obscures a program's true effects. Such methodologies may also fail to capture the dynamic overlaps between these effects and potentially result in inaccurate estimations of true program savings. Furthermore, wide variation exists in the methodologies and estimated values of FR and SO across different states, thus inhibiting meaningful comparisons across various program designs. The scatterplot analysis of the reviewed EM&amp;V reports indicates that FR and SO do not necessarily offset each other, except perhaps for low-income, hard-to-reach small business, and new programs, for which values for these effects are estimated to be close to zero. As such, simply adding the percentages without converting them into common units, ignoring the overlaps between FR, SO, and ME, and assuming an NTG ratio equal to 1 may over- or understate the actual savings attributable to a given program.</p><p>An alternative approach for the analysis of FR and SO is to estimate and report these values in terms of energy units and analyze them in dollar terms to more accurately determine program effectiveness. 
Based on the dynamic behavior of FR as a function of a program's presence in a given market, another method would be to estimate free ridership in terms of a participant's total willingness-to-pay for EE measures in the absence of the program. This method may provide a more accurate illustration of a program's cost-effectiveness in terms of dollars per unit of reduced energy consumption or in terms of dollars per ton of carbon dioxide abated. As such, FR, SO, and ME should instead be reported in terms of a program's gross energy savings and associated dollar savings for a more transparent program evaluation design. Additionally, for a meaningful comparison across the measure, program, portfolio, and utility levels, it is recommended that a consistent and reliable methodology be uniformly adopted across all EM&amp;V studies. As a step forward, randomized or quasi-experimental designs can be tried for more accurate impact evaluations and for better comparability of EE programs across jurisdictions.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Compliance with ethical standards</head><p>Conflict of interest The authors declare that they have no conflict of interest.</p></div><note xmlns="http://www.tei-c.org/ns/1.0" place="foot" xml:id="foot_0"><p>Energy Efficiency (2020) 13:991-1005</p></note>
		</body>
		</text>
</TEI>
