<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Multiblock Parameter Calibration in Computer Models</title></titleStmt>
			<publicationStmt>
				<publisher>INFORMS</publisher>
				<date>10/01/2023</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10481380</idno>
					<idno type="doi">10.1287/ijds.2023.0029</idno>
					<title level='j'>INFORMS Journal on Data Science</title>
<idno type="issn">2694-4022</idno>
<biblScope unit="volume">2</biblScope>
<biblScope unit="issue">2</biblScope>					

					<author>Cheoljoon Jeong</author><author>Ziang Xu</author><author>Albert S. Berahas</author><author>Eunshin Byon</author><author>Kristen Cetin</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[<p>Parameter calibration aims to estimate unobservable parameters used in a computer model by using physical process responses and computer model outputs. In the literature, existing studies calibrate all parameters simultaneously using an entire data set. However, in certain applications, some parameters are associated with only a subset of data. For example, in the building energy simulation, cooling (heating) season parameters should be calibrated using data collected during the cooling (heating) season only. This study provides a new multiblock calibration approach that considers such heterogeneity. Unlike existing studies that build emulators for the computer model response, such as the widely used Bayesian calibration approach, we consider multiple loss functions to be minimized, each for a block of parameters that use the corresponding data set, and estimate the parameters using a nonlinear optimization technique. We present the convergence properties under certain conditions and quantify the parameter estimation uncertainties. The superiority of our approach is demonstrated through numerical studies and a real-world building energy simulation case study.</p> <p>History: Bianca Maria Colosimo served as the senior editor for this article.</p> <p>Funding: This work was partially supported by the National Science Foundation [Grants CMMI-1662553, CMMI-2226348, and CBET-1804321].</p> <p>Data Ethics & Reproducibility Note: The code capsule is available on Code Ocean at https://codeocean.com/capsule/8623151/tree/v1 and in the e-Companion to this article (available at https://doi.org/10.1287/ijds.2023.0029 ).</p>]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Recent advancements in computing have enabled the adoption of computer models as an essential tool to design and control systems. In general, a computer model consists of a set of mathematical functions based on complex physics-based first principles that closely resemble a real-world system. A computer model is typically run through a simulator. A simulator receives input variables that are typically observable or controllable, such as operational conditions, and it generates outputs through simulation. Besides the input variables, additional parameters need to be specified a priori to run a simulation. These parameters are oftentimes not observable, and thus their values should be estimated using either physical laws or data. When physical laws that identify appropriate parameter values are not available, one can use a data-driven approach with data collected from the physical system. Such a data-driven procedure is called parameter calibration in the literature <ref type="bibr">(Kennedy and O'Hagan 2001)</ref>.</p><p>Over the last couple of decades, parameter calibration has been studied in many fields such as biology, chemistry, climatology, and engineering <ref type="bibr">(Santner et al. 2018)</ref>. We provide a detailed literature review in Section 3. Existing studies, by and large, utilize limited datasets when physical trials are limited and computer models are expensive to run. 
To accommodate data scarcity, they build surrogate models for the computer models and perform calibration with these surrogates. Typically, they generate data from a computer model a priori through a fixed set of simulation runs. With the resulting data, they build a response surface (or emulator) for the computer model output using Gaussian processes (GPs) or other statistical models <ref type="bibr">(Higdon et al. 2004</ref><ref type="bibr">, Tuo et al. 2021)</ref>.</p><p>Consequently, calibration accuracy highly depends on the emulator accuracy. When the computer model data are generated at less informative design points, e.g., values far from the (unknown) true parameter values, the emulator may not accurately characterize the response surface near the true parameter values, leading to inaccurate calibration results.</p><p>Recently, <ref type="bibr">Liu et al. (2021)</ref> discussed the necessity of developing a new calibration procedure for big data settings. In many operational systems, thanks to recent advances in sensing technology and data acquisition systems, massive amounts of observational data from actual physical systems have become available. Further, a sufficiently large amount of simulation data can be generated through a medium- or low-fidelity simulator or using advanced computing facilities, e.g., supercomputers. When a large number of field observations and simulation data are available, <ref type="bibr">Liu et al. (2021)</ref> presented a new approach to directly utilize data without building emulators.</p><p>The aforementioned existing literature calibrates all parameters in a similar manner despite heterogeneous data requirements. That is, they calibrate all parameters at once with the same dataset. In some applications, however, some parameters are associated with specific operating conditions or a subset of the data. Consider a building energy model (BEM). 
Among many parameters that need to be calibrated, some parameters are season-dependent, and thus they should be employed for simulating building energy operations only in specific seasons <ref type="bibr">(Xu et al. 2021)</ref>.</p><p>For example, "cooling supply air flow rate" is a cooling-season parameter, whereas "heating supply air flow rate" is associated with the heating season. On the other hand, other parameters, e.g., "fan total efficiency", are global parameters that need to be employed throughout the year. This BEM parameter calibration, as our motivating application, will be explained in more detail in Section 2.</p><p>From a data science point of view, these season-dependent (or block-dependent) parameters should be calibrated using data (both field operational and simulation-generated data) associated with the corresponding season (or block) only. That is, one needs to use a subset of data. When the required datasets for different groups of parameters are mutually exclusive, one can calibrate parameters separately for each group. However, when the subsets for different groups overlap, prior methods become inappropriate.</p><p>This study develops a new calibration approach where parameters can be divided into multiple groups and each group of parameters is associated with a subset (or block) of the entire dataset. Specifically, we estimate the parameters by minimizing multiple loss functions, each of which is associated with one group of parameters. Our proposed approach is useful when large computer experiments can be conducted, e.g., with a low- to medium-fidelity computer model.</p><p>We summarize and highlight the key contributions of this paper as follows.</p><p>&#8226; Our approach estimates unobservable parameters employed in a computer model where parameters are associated with different subsets of observational data that possibly overlap with one another. 
To the best of our knowledge, this is the first calibration study that takes this special problem structure into consideration in the calibration procedure.</p><p>&#8226; One salient feature of the proposed approach is that it adaptively generates data on the fly from the computer model during the calibration procedure so that the most appropriate data can be produced from the computer model. This is different from traditional frameworks that generate simulation data at pre-selected design points a priori.</p><p>&#8226; To minimize multiple loss functions, we design a nonlinear optimization algorithm and derive its associated convergence guarantees. Under certain (reasonable) conditions, our analysis provides strong guarantees for our algorithm for different classes of functions.</p><p>&#8226; We conduct numerical experiments across a wide range of settings and a case study with real data to demonstrate the performance of the proposed approach. The results indicate its superiority with respect to multiple criteria, including estimation accuracy, uncertainty quantification, computational efficiency, and scalability, over alternative approaches.</p><p>A preliminary version of this work was presented in a short conference paper <ref type="bibr">(Xu et al. 2021)</ref>.</p><p>In this paper, we substantially extend the analysis to connect the calibration procedure with statistical parameter estimation, derive and present convergence guarantees, quantify estimation uncertainties, and provide extensive empirical results via numerical examples and a case study.</p><p>The rest of this paper is organized as follows. Section 2 discusses the BEM application as our motivating example and other examples from various applications. Section 3 provides a literature review of related methods. Section 4 gives a formulation of the problem. Section 5 develops a new algorithm to solve the problem and studies its convergence properties. 
We evaluate the performance of our methodology using numerical examples and a building energy simulation case study in Sections 6 and 7, respectively. Finally, Section 8 provides a brief conclusion.</p><p>Article submitted to INFORMS Journal on Data Science; manuscript no. IJDS-2022-0008</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Applications</head><p>In this section, we introduce the motivating building energy application and discuss the wide applicability of our proposed method.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">Motivating Application: Building Energy Model</head><p>A physics-based BEM, which simulates building energy operations, can play an important role in optimizing building design and operations. To effectively use the BEM, several parameters that specify building characteristics need to be estimated using operational data. Table <ref type="table">1</ref> lists the parameters employed in the BEM simulator, called EnergyPlus, which is developed and maintained by the U.S. Department of Energy's National Renewable Energy Laboratory (U.S. Department of Energy 2019).</p><p>A unique aspect of the BEM is the seasonal dependency of the parameters, which is the focus of our study. Among the parameters, some related to lighting, domestic hot water, window material, and ventilation can be considered global parameters that are associated with the entire year-long operational data and simulation. On the contrary, cooling- and heating-related parameters are only associated with their seasonal portions of the year-long data. Interestingly, different seasons may overlap during some periods. For example, in Texas, buildings operate heating devices from November to April, whereas cooling systems run from March to November.</p><p>Hence, during some periods of time, they operate both heating and cooling devices. Motivated by the BEM parameter calibration, this study develops a new calibration approach when parameters need to be calibrated with different portions of data.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">Other Applications</head><p>There are various applications where our proposed method can be utilized. First, consider the hydrological computer model that simulates water flow to estimate natural system behaviors affected by climate change, land use, etc. This model can be used to investigate the rainfall-runoff relationship for the windrow composting pad to estimate the amount of runoff <ref type="bibr">(Duncan et al. 2013</ref><ref type="bibr">, Bhattacharjee et al. 2019)</ref>. To accurately simulate rainfall-runoff dynamics, several parameters, including the depth of surface, depth of subsurface, saturated hydraulic conductivity of the gravel media, and saturated hydraulic conductivity of the supporting soil below the media, need to be calibrated. Among them, the depth of surface and depth of subsurface can be regarded as global parameters that are associated with the entire year-long data, whereas the others are possibly season-dependent parameters associated only with their seasonal portions of data, because they are generally affected by weather.</p><p>Another application is the wind flow model that characterizes spatially heterogeneous wind patterns within a wind farm due to the interactions among turbines <ref type="bibr">(You et al. 2017, 2018)</ref>.</p><p>Recently, <ref type="bibr">Howland et al. (2022)</ref> presented a flow model that consists of two sub-models: the power-yaw model and the wake effect model. The power-yaw model predicts the power production of upwind turbines (yawed turbines) with a parameter λp that adjusts the power generation amount in a yaw-misalignment setting. The wake effect model predicts the wind speed deficit at downstream turbines (waked turbines) due to the shading effect of upwind turbines <ref type="bibr">(Liu et al. 2021)</ref> so that it estimates the power production of the waked turbines. 
This wake effect model takes the wake spreading coefficient kw and the proportionality constant of the modeled Gaussian wake σ0 as parameters. Among the parameters, λp can be considered a global parameter that is employed in all wind conditions. On the other hand, kw and σ0 depend on atmospheric conditions <ref type="bibr">(Howland et al. 2022)</ref>, and they are calibrated using a portion of data collected under specific conditions.</p><p>Lastly, consider a system with a hierarchical block structure. One example could be a pandemic simulation model for infectious diseases. When an outbreak, such as COVID-19, occurs, it is important to predict the spread of infectious diseases. In such simulation models, some parameters, such as the infection rate, reinfection rate, and positive test rate, should be calibrated to achieve accurate predictions, and they are usually associated with specific geographical areas. These areas overlap hierarchically, from the country level down to the state, county, and city levels, and each parameter should be calibrated using the data obtained from the associated area of interest. For instance, data to predict the infection rate at the city level overlap with those at the county level.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Literature Review</head><p>In this section, we provide a literature review of several relevant methodologies.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Statistical Calibration Studies</head><p>The seminal work in modern parameter calibration is Bayesian calibration <ref type="bibr">(Kennedy and O'Hagan 2001, Higdon et al. 2004)</ref>. It uses a linear linkage model to connect the responses of physical processes with computer model outputs, and the calibration parameters and the hyperparameters in the mean vectors and covariance matrices of the GP emulators are estimated from posterior distributions. The Bayesian calibration methodology has gained popularity due to its ability to quantify estimation uncertainties with limited field observations and/or scant simulation data <ref type="bibr">(Kennedy and O'Hagan 2001</ref><ref type="bibr">, Higdon et al. 2004</ref><ref type="bibr">, Gramacy et al. 2015)</ref>. Recently, <ref type="bibr">Tuo and Wu (2016)</ref> suggested that the Bayesian approach could result in unreasonable estimates for imperfect computer models. Here, an imperfect computer model implies that the outputs of the model, even with the optimally calibrated parameters, are different from the expected outcomes of a physical process. To address this limitation, <ref type="bibr">Tuo and Wu (2015)</ref> presented a new calibration approach from a frequentist point of view.</p><p>Among these statistical methods, studies in the building energy literature have mainly adopted the Bayesian calibration approach <ref type="bibr">(Chong and Menberg 2018, Coakley et al. 2014)</ref>. However, the Bayesian approach has several limitations. First, it is computationally intensive, so studies mostly use aggregated data, e.g., monthly <ref type="bibr">(Heo et al. 2012</ref><ref type="bibr">, 2013</ref><ref type="bibr">, 2015</ref><ref type="bibr">, Kim and Park 2016</ref><ref type="bibr">, Kristensen et al. 2017b</ref><ref type="bibr">, Li et al. 
2016</ref><ref type="bibr">, Lim and Zhai 2017</ref><ref type="bibr">, Sokol et al. 2017</ref><ref type="bibr">, Tian et al. 2016)</ref> or annual <ref type="bibr">(Booth et al. 2013)</ref> data. Some studies use hourly data, but they use a small subset of the hourly data, instead of the whole dataset collected over a sufficiently long period of time <ref type="bibr">(Chong et al. 2017</ref><ref type="bibr">, Chong and Lam 2017</ref><ref type="bibr">, Manfren et al. 2013</ref><ref type="bibr">, Menberg et al. 2017</ref><ref type="bibr">, Kristensen et al. 2017a</ref>).</p><p>In <ref type="bibr">Chong et al. (2017)</ref>, only 80 hourly data samples are selected from a dataset collected over three months. To relieve the computational burden, a lightweight Bayesian calibration that uses a linear regression emulator is presented in <ref type="bibr">Li et al. (2016)</ref>. <ref type="bibr">Menberg et al. (2017)</ref> employ Hamiltonian Monte Carlo sampling to obtain the posterior distribution more efficiently. Despite these advances, the Bayesian calibration approach still remains computationally demanding, which limits the use of large datasets.</p><p>Second, the results from Bayesian calibration are sensitive to the prior specification <ref type="bibr">(Liu et al. 2021)</ref>. Our analysis, which will be discussed in more detail in Section 7, indicates that the posterior density is heavily affected by the prior specification. When a non-informative prior is employed, the resulting posterior tends to become non-informative, showing a relatively flat posterior distribution. We believe this is because the use of limited data (either aggregated monthly or yearly data, or a small number of selected hourly data) provides insufficient information to generate a meaningful posterior. 
While a more informative prior leads to a sharper posterior, it requires domain knowledge to set the prior appropriately.</p><p>Moreover, the Bayesian approach in BEM calibration studies limits the number of calibration parameters to 2 to 5, because the hyperparameter estimation of the covariance function becomes computationally prohibitive as the problem size grows. In addition, compounded with the aforementioned issues of limited data, when multiple parameters are calibrated, overparameterization may occur, resulting in a relatively flat posterior that does not generate meaningful calibration results.</p><p>Most importantly, existing approaches cannot handle multiple blocks of overlapping data. In particular, when the block sizes are different, one might suggest that the problem is similar to the class imbalance problem, where the data size of each class is substantially different <ref type="bibr">(Byon et al. 2010)</ref>. In such cases, one possible solution could be to impose different weights on each data block.</p><p>However, how to differentiate weights across blocks is not straightforward. In particular, when data blocks overlap, if the algorithm calibrates the parameters associated with a small portion of data using a higher weight, that weight would also affect other blocks with larger portions of data. Thus, the weight should account for the overlapping portion as well as the data size of each block; devising a proper weighting scheme with rigorous justification is challenging.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Other Related Studies</head><p>We also review two other streams of work that are relevant to this study, namely, adaptive sampling and multi-task learning. Adaptive sampling, also known as sequential design in the statistical literature and active learning in the machine learning literature, has been actively studied. The idea is to collect informative data points sequentially in order to construct surrogates and/or minimize a black box function by accounting for the trade-off between exploration and exploitation <ref type="bibr">(Liu et al. 2018)</ref>. First, in surrogate modeling studies that often use GPs, adaptive sampling strategies choose the next design point that maximizes the predictive variance or the average reduction in variance <ref type="bibr">(Gramacy 2020</ref><ref type="bibr">, Liu et al. 2018)</ref>. These studies aim to construct accurate surrogates.</p><p>Second, Bayesian optimization (BO) has recently received much attention for minimizing expensive black box functions. BO proceeds by modeling the objective function via a GP, optimizing an acquisition function to find the next design point, and then updating the posterior distribution of the GP. Acquisition functions, e.g., expected improvement <ref type="bibr">(Jones et al. 1998)</ref> and the lower confidence bound <ref type="bibr">(Srinivas et al. 2010)</ref>, attempt to strike a balance between the exploration of a new design point with high uncertainty and exploitation with a low objective function value. More details on BO can be found in <ref type="bibr">Shahriari et al. (2015)</ref> and <ref type="bibr">Frazier (2018)</ref>. Typically, BO aims to minimize a single objective. Although the multi-task BO <ref type="bibr">(Swersky et al. 
2013</ref>) has been proposed to handle multiple objective functions, it uses the same training data for all objectives to perform optimization.</p><p>Our proposed method solves a multi-objective problem with multiple blocks of data that possibly overlap with one another.</p><p>On the other hand, multi-task learning aims to learn multiple tasks simultaneously by leveraging knowledge learned from one or some tasks while exploiting their commonalities and differences <ref type="bibr">(Zhang and Yang 2021)</ref>. It helps to improve prediction accuracy and computational efficiency for learning multiple tasks, compared to individual task learning. In principle, the proposed multi-block calibration methodology can be regarded as a special case of multi-task learning if we consider calibrating each block of parameters as a task; updated parameters associated with one block affect the loss functions of other blocks, so that loss function values benefit from or are hurt by one another. However, multi-task learning is typically performed in a supervised learning setting with a fixed dataset, whereas our proposed method exploits adaptive sampling. Moreover, in multi-task learning, each task is learned with its own dataset, which is disjoint from other tasks' data; in contrast, our approach accounts for overlapping blocks of data.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Problem Formulation</head><p>We first present a calibration problem in a general single-block calibration setting in Section 4.1 and extend it to a multi-block parameter calibration problem in Section 4.2.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Single-block Parameter Calibration</head><p>Let x ∈ Ω ⊆ R^d denote the vector of physically observable input variables of dimension d ∈ Z+ in a system, where Ω is a convex and compact region of input variables that is a subset of R^d. Let θ ∈ Θ ⊆ R^p denote a set of calibration parameters of dimension p ∈ Z+. Let y(x) ∈ R denote the noisy field observation of the true physical process ζ(x) at input x. Let η(x, θ) ∈ R denote the response from the computer model, simulating the true process ζ(x) at input x. For instance, in the building energy simulation, x can be an hourly environmental condition such as temperature, y(x) the actual hourly electricity consumption, and η(x, θ) the electricity consumption generated from the BEM. The unknown calibration parameter vector θ needs to be estimated using y(x) and η(x, θ).</p><p>Assuming the simulator represents the physical process accurately, <ref type="bibr">Higdon et al. (2004)</ref> present the regression model connecting physical process responses with computer model outputs as</p><p>y(x_j) = η(x_j, θ) + ϵ_j, j = 1, 2, . . . , n, (1)</p><p>where n denotes the number of observations and ϵ_j an observation error that follows an independently and identically distributed (iid) normal distribution, ϵ_j ∼ N(0, σ²).</p><p>The likelihood function of θ is then</p><p>L(θ, σ²) = ∏_{j=1}^{n} (2πσ²)^{−1/2} exp(−(y(x_j) − η(x_j, θ))² / (2σ²)), (2)</p><p>or more explicitly,</p><p>L(θ, σ²) = (2πσ²)^{−n/2} exp(−(1/(2σ²)) ∑_{j=1}^{n} (y(x_j) − η(x_j, θ))²). (3)</p><p>To estimate θ, let us consider the maximum likelihood estimation (MLE) method. The log-likelihood function is</p><p>ℓ(θ, σ²) = −(n/2) log(2πσ²) − (1/(2σ²)) ∑_{j=1}^{n} (y(x_j) − η(x_j, θ))². (4)</p><p>We now maximize the log-likelihood to obtain θ̂_MLE given σ² as</p><p>θ̂_MLE = arg max_θ ℓ(θ, σ²) = arg min_θ F(θ), where F(θ) = (1/n) ∑_{j=1}^{n} (y(x_j) − η(x_j, θ))². (5)</p><p>Note that maximizing the likelihood function is equivalent to minimizing the empirical loss function F(θ), which quantifies the discrepancy between the computer model output η(x, θ) and the physical process data y(x) with the L2 norm. 
This calibration procedure that minimizes the empirical loss aligns with the idea of L2 calibration studied in <ref type="bibr">Tuo and Wu (2015)</ref>, which aimed to find the parameters that minimize the L2 distance between the physical process responses and the computer model outputs. Under a limited data setting, they built emulators for both computer model outputs and physical process responses. However, we consider a situation where a sufficiently large amount of physical observational data is available and the computer model is relatively cheap to run.</p><p>With this data availability, instead of generating data from the computer model at pre-defined design points, we want to generate computer model data more effectively so that more useful data can be used for the calibration purpose, and we directly use these data to estimate the parameters without building emulators. To this end, we minimize the loss function in (5) with respect to θ. As the computer model output η(x, θ) is likely nonlinear, we consider nonlinear optimization techniques. Consider a first-order optimization method, i.e., gradient descent (GD). The GD method works by iteratively updating θ with θ ← θ − α∇F(θ) until a suitable termination criterion is satisfied, where F(θ) is the empirical loss function at θ, ∇F(θ) represents its gradient, and α is a step size. Here, because η(x, θ), as the output from a black box computer model, does not have a closed-form expression, ∇F(θ) can be approximated using the central finite difference</p><p>∇F(θ)_i ≈ [F(θ + h e_i) − F(θ − h e_i)] / (2h), i = 1, 2, . . . , p, (6)</p><p>where h &gt; 0 is a small disturbance to the i th component of θ, and e_i denotes a p × 1 vector with the i th element being one and the others zero.</p><p>One key advantage of using nonlinear optimization is that it can adaptively generate computer model data η(x, θ). In other words, as θ is updated through the iterative GD process, we can run simulations to get η(x, θ) at the newly refined θ. This allows us to get more informative data toward minimizing the empirical loss (or maximizing the likelihood).</p></div>
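The single-block procedure above can be sketched in a few lines: gradient descent on the empirical loss, with the gradient approximated by central finite differences as in (6). The simulator eta below is an illustrative stand-in assumption (a real application would call an expensive computer model such as EnergyPlus), and all parameter values are hypothetical.

```python
import numpy as np

# Toy stand-in "simulator" eta(x, theta); in practice this is a black box run.
def eta(x, theta):
    return theta[0] * np.sin(x) + theta[1] * x**2

def empirical_loss(theta, x, y):
    # F(theta) = (1/n) * sum_j (y(x_j) - eta(x_j, theta))^2, as in (5)
    return np.mean((y - eta(x, theta)) ** 2)

def fd_gradient(F, theta, h=1e-5):
    # Central finite difference, as in (6): each component i is perturbed by +/- h,
    # so each gradient evaluation triggers 2p fresh "simulation" runs.
    grad = np.zeros_like(theta)
    for i in range(len(theta)):
        e = np.zeros_like(theta)
        e[i] = h
        grad[i] = (F(theta + e) - F(theta - e)) / (2 * h)
    return grad

# Synthetic field observations y(x) = eta(x, theta_true) + noise
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 2.0, size=200)
theta_true = np.array([1.5, 0.7])
y = eta(x, theta_true) + rng.normal(0.0, 0.05, size=200)

# Gradient descent: theta <- theta - alpha * grad F(theta)
theta = np.array([0.0, 0.0])
F = lambda t: empirical_loss(t, x, y)
for _ in range(500):
    theta = theta - 0.1 * fd_gradient(F, theta)
```

Note how the simulator is queried at each newly refined theta rather than at a pre-selected design, which is the adaptive data-generation idea described above.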
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Multi-block Parameter Calibration</head><p>The single-block parameter calibration presented in Section 4.1 treats all parameters equally using the entire dataset. We propose a new block-wise calibration approach for when some parameters are associated with a specific subset of the entire dataset.</p><p>Recall that in the BEM application, we have three distinct groups of parameters, denoted by θ_g, θ_c, and θ_h in Table <ref type="table">1</ref>. The parameter set θ_g consists of global parameters, so it is associated with the entire dataset, whereas θ_c and θ_h are associated with the cooling- and heating-season portions of the data, respectively. In general, suppose the parameters are divided into B blocks, θ_1, θ_2, . . . , θ_B, where θ_b is associated with the b th data block for b = 1, 2, . . . , B, and let θ_−b denote the parameters in all blocks other than the b th block. With this problem structure, we calibrate θ_b using its corresponding portion of the dataset, i.e., the b th data block. By using the right data block, we expect to obtain more accurate estimates, compared to the single-block calibration that does not differentiate the block-dependent heterogeneity.</p><p>To account for the block dependency, we consider multiple likelihood functions, each associated with the b th block of data, as follows:</p><p>L_b(θ_b; θ_−b, σ_b²) = ∏_{j ∈ D_b} (2πσ_b²)^{−1/2} exp(−(y(x_j) − η(x_j, θ_b; θ_−b))² / (2σ_b²)), (7)</p><p>where D_b denotes the index set of the b th data block with n_b = |D_b| observations and σ_b² is the observation error variance in the b th block. The corresponding log-likelihood is</p><p>ℓ_b(θ_b; θ_−b, σ_b²) = −(n_b/2) log(2πσ_b²) − (1/(2σ_b²)) ∑_{j ∈ D_b} (y(x_j) − η(x_j, θ_b; θ_−b))². (8)</p><p>To maximize the log-likelihood, we consider the corresponding empirical loss function</p><p>F_b(θ_b; θ_−b) = (1/n_b) ∑_{j ∈ D_b} (y(x_j) − η(x_j, θ_b; θ_−b))², (9)</p><p>and obtain the estimates for the b th block of parameters θ_b by</p><p>θ*_b = arg min_{θ_b} F_b(θ_b; θ_−b). (10)</p><p>We refer to the proposed block-wise calibration approach as multi-block calibration (M-BC for short) in the subsequent discussion.</p><p>It should be noted that the simulation output η(x_j, θ_b; θ_−b) could depend on θ_−b, as well as θ_b, when the b th block of data overlaps with other blocks associated with some parameters in θ_−b.</p><p>For instance, let us consider the cooling-season parameter θ_c. The simulation output during the cooling season (March-November) also depends on the setting of the global parameters θ_g. 
Also, from March to April, when the cooling and heating seasons overlap, the heating-season parameters also affect the energy consumption output during a part of the cooling season. Therefore, in solving (10), the resulting θ*_b may also depend on θ_−b. Our ultimate goal is to find θ*_b that minimizes F_b while, at the same time, the other block parameters minimize their own loss functions. In the next section, we provide an iterative algorithm and its convergence properties under certain conditions.</p></div>
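To make the block structure concrete, the sketch below builds overlapping index sets and the per-block empirical losses of (9) for a toy year-long dataset. The simulator, the season definitions, and the parameter values are illustrative assumptions only, not the actual BEM.

```python
import numpy as np

# Toy simulator: a global term is active all year, while the cooling and
# heating terms are active only in their (overlapping) seasons.
def eta(x, month, theta_g, theta_c, theta_h):
    cooling = (3 <= month) & (month <= 11)   # cooling season: Mar-Nov
    heating = (month >= 11) | (month <= 4)   # heating season: Nov-Apr
    out = theta_g[0] * x
    out = out + np.where(cooling, theta_c[0] * x**2, 0.0)
    out = out + np.where(heating, theta_h[0] * np.sqrt(x), 0.0)
    return out

rng = np.random.default_rng(1)
month = rng.integers(1, 13, size=500)
x = rng.uniform(1.0, 5.0, size=500)
y = eta(x, month, [2.0], [0.5], [1.2]) + rng.normal(0.0, 0.1, size=500)

# Index set D_b for each block; D_c and D_h overlap in March, April, and November.
D = {"g": np.full(500, True),
     "c": (3 <= month) & (month <= 11),
     "h": (month >= 11) | (month <= 4)}

def F_b(block, theta_g, theta_c, theta_h):
    # Empirical loss (9) for block b, computed only over its own data block D_b.
    idx = D[block]
    r = y[idx] - eta(x[idx], month[idx], theta_g, theta_c, theta_h)
    return np.mean(r ** 2)
```

Because the blocks overlap, F_b for the cooling block still depends on the global and heating parameters through the simulator output on the shared months, which is exactly why θ*_b in (10) depends on θ_−b.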
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Parameter Estimation and Convergence Properties in Multi-block Calibration</head><p>In this section, we propose a new algorithm to calibrate parameters in a block-wise manner. We also provide convergence guarantees for the algorithm under certain conditions. Finally, given convergence, we employ the MLE's asymptotic properties and construct confidence intervals to quantify estimation uncertainties.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1.">Multi-block Calibration Algorithm</head><p>Recall that existing studies typically pre-generate data and build emulators for η(&#8226;) using the generated data. This approach is not directly applicable to the multi-block parameter calibration problem. One may attempt to build B emulators, say, η̂_1(x, θ_1; θ_−1), . . . , η̂_B(x, θ_B; θ_−B), using each data block; however, this would inherit the limitations of emulator-based calibration discussed earlier. To this end, we design a new algorithm to solve the block-wise calibration problem. The fundamental idea is to devise a block-wise optimization algorithm so that the parameters in different blocks can be calibrated with their own objective functions over iterations. Specifically, we propose cyclically optimizing one block of parameters at a time while keeping the parameters in the other blocks fixed at their most up-to-date values. For instance, in the BEM calibration, we have θ_1 = θ_g, θ_2 = θ_c, and θ_3 = θ_h with B = 3. We iteratively optimize θ_g to minimize F_g(θ_g; θ_−g), while the parameters in θ_c and θ_h are fixed at their current iterates. Then we optimize θ_c for F_c(θ_c; θ_−c) with the previously optimized θ_g, and continue with a similar procedure for F_h(θ_h; θ_−h) with the previously optimized θ_g and θ_c. This procedure is repeated until pre-specified stopping criteria are satisfied.</p><p>Let θ_b^k denote the iterate of θ_b after finishing the k th iteration for b = 1, 2, . . . , B. At the (k+1) th iteration, let F_b(θ_b; θ_−b^{k+1}) in (<ref type="formula">9</ref>) denote the loss function for the b th block of parameters θ_b, given the remaining parameter iterates θ_−b^{k+1}. Without loss of generality, we optimize each block in ascending order, starting from the first block, i.e., b = 1. 
Therefore, we optimize &#952; b to minimize F b (&#952; b ; &#952; k+1 -b ).</p><p>Here, note that the parameters before the b th block have already been updated, whereas those after the b th block are not refined yet. In this scheme, the loss function for the b th block in (9) at the (k + 1) th iteration can be written as</p><p>The computer model output &#951;(&#8226;) is likely nonlinear, as most computer models consist of a set of complex mathematical functions. Hence, we consider nonlinear optimization methods to find &#952; b that minimizes F b (&#952; b ; &#952; k+1 -b ), given &#952; k+1 -b . Among various optimization methods, we use gradient descent (GD), the most common choice. Our approach can be easily extended to second-order optimization methods such as Newton's method or quasi-Newton methods.</p><p>In minimizing F b (&#952; b ; &#952; k+1 -b ) over &#952; b , we further consider an inner iteration, given &#952; k+1 -b . Here, the inner iteration performs the GD-based parameter updates in each block, and the outer iteration is the cycle over all blocks. Let &#952; k,m b denote the iterate of &#952; b at the m th inner iteration of the outer (k + 1) th iteration. We update &#952; k,m b by</p><p>where &#945; k,m b &gt; 0 is the step size and the gradient collects the partial derivatives with respect to each &#952; b,i ,</p><p>where &#952; b,i is the i th parameter in &#952; b for i = 1, 2, . . . , p b , with p b being the number of parameters in &#952; b for b = 1, 2, . . . , B. Since the loss functions have no closed-form mathematical expressions due to the black-box nature of the simulator, similar to (6), we use finite differencing to approximate the gradient for each loss function as follows:</p><p>where h &gt; 0 is a small perturbation to the i th parameter of each &#952; b and e b,i is a p b &#215; 1 vector with its i th element being one and the others zero.</p><p>We repeat the inner iteration for each block until some termination criteria are met. 
Then, we move to the next (b + 1) th block to minimize its loss function.</p><p>After we finish the cycle with all blocks, we set k = k + 1 and repeat the procedure. We finish the iterations when either the norm of the gradient of each loss function is less than a small tolerance &#964; 1 , the relative difference of loss function values is less than a small tolerance &#964; 2 , or the number of iterations exceeds some maximum value &#964; 3 . Algorithm 1 summarizes the proposed M-BC procedure.</p><p>We would like to highlight that the proposed cyclic calibration approach is different from the block coordinate descent (BCD) algorithm. The major difference is that our approach handles multiple objective functions, each of which is associated with its corresponding subset of the dataset. On the other hand, BCD considers a single objective function.</p><p>As discussed earlier, an important feature of our approach is that it adaptively generates computer model outputs during the calibration procedure, unlike existing approaches that use precollected samples. As the optimization proceeds, newer parameter iterates are used to run fresh simulations of the computer model.</p><p>As a remark, in multi-block calibration one can randomly choose the next block, but in this section we consider the fixed ordering that proceeds in a cyclical way. In Section 6, we empirically compare the cyclic ordering with a stochastic ordering. Additionally, we note that, depending on the problem structure, the order of blocks may matter. However, in our numerical examples over a wide range of settings, we obtain comparable results with different orderings.</p></div>
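To make the cyclic double-loop structure concrete, here is a minimal Python sketch of the core of Algorithm 1 under simplifying assumptions: plain callables stand in for the black-box simulator losses F_b, a constant step size replaces the line search, and the names (`mbc`, `fd_grad`) are our own, not the paper's code capsule.

```python
import numpy as np

def fd_grad(F, x, h=1e-6):
    # Forward-difference gradient of a black-box loss, in the spirit of (13)
    f0 = F(x)
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = 1.0
        g[i] = (F(x + h * e) - f0) / h
    return g

def mbc(losses, theta0, alpha=0.1, tol=1e-5, max_outer=100, max_inner=200):
    # Cyclic multi-block calibration: losses[b] maps the full list of block
    # vectors to the b-th loss F_b; each inner loop runs GD on block b while
    # all other blocks stay fixed at their most up-to-date values.
    theta = [np.asarray(t, dtype=float).copy() for t in theta0]
    for k in range(max_outer):                 # outer cycle over blocks
        moved = 0.0
        for b in range(len(theta)):
            Fb = lambda tb: losses[b](theta[:b] + [tb] + theta[b + 1:])
            for m in range(max_inner):         # inner GD iterations
                g = fd_grad(Fb, theta[b])
                if np.linalg.norm(g) < tol:    # inner stop (cf. tau_1)
                    break
                theta[b] = theta[b] - alpha * g
                moved = max(moved, alpha * np.linalg.norm(g))
        if moved < tol:                        # outer stop
            break
    return theta
```

On a toy two-block problem with non-competing quadratic losses, the iterates converge to each block's minimizer even though the second loss depends on both blocks.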
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2.">Convergence Analysis</head><p>This section presents the convergence properties when each loss function F b , b = 1, 2, . . . , B, is a general nonconvex, strongly convex, or convex function. While we utilize the standard analysis procedure of GD, the block-wise treatment with multiple loss functions poses substantial challenges in analyzing the convergence properties of the proposed M-BC approach. In each of the three settings, we impose conditions on F b and possibly the iterates generated by Algorithm 1 to achieve certain convergence properties. In the special case where the number of blocks is equal to 1, our results recover the standard results for GD. The detailed proofs of all lemmas and theorems are available in the online supplementary document.</p><p>Throughout the convergence analysis, we use the notations summarized in Table <ref type="table">2</ref>. We acknowledge that the notations are complex, mainly due to the double (inner and outer) iteration nature of M-BC. For ease of understanding, Figure <ref type="figure">2</ref> visualizes the iteration scheme. In our analysis, we make the following assumption.</p><p>Table <ref type="table">2</ref> Nomenclature for convergence analysis</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Category Symbol Meaning</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Superscript</head><p>k: the iteration number of the outer iteration. m: the iteration number of the inner iteration. M b (k): the number of inner iterations for the b th block at the k th outer iteration.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>M b</head><p>A fixed value of M b (k) that does not depend on the outer iteration k.</p><p>Iterate</p><p>The iterate after all blocks are updated at the k th outer iteration.</p><p>The unique global minimizer of F b in the strongly convex case; a point at which the minimum of F b is attained in the convex case.</p><p>The iterate of the b th block parameters after finishing the inner iteration of the b th block at the k th outer iteration.</p><p>The unique global minimizer of F b given &#952; * -b in the strongly convex case; a point at which the minimum is attained in the convex case.</p><p>The iterate of the b th block parameter at the m th inner iteration of the (k + 1) th outer iteration, given &#952; k+1 -b .</p><p>The unique global minimizer of F b given &#952; k+1 -b in the strongly convex case; a point at which the minimum is attained in the convex case.</p><p>The result above is analogous to that of GD. Namely, the progress made over a sequence of iterations (for M-BC, this is over the inner iterations) is proportional to the sum of the squared norms of the gradients. Using Lemma 1, we first analyze the convergence properties of M-BC in the general nonconvex case in Theorem 1.</p><p>Theorem 1 shows that under certain conditions, the iterate &#952; k converges to a stationary point &#952; * as k &#8594; &#8734;, i.e., the norm of the gradient of all objective functions vanishes in the limit.</p><p>Theorem 1 (General nonconvex case). Suppose Assumption 1 holds. Further, suppose that</p><p>We make a few remarks about the conditions in ( <ref type="formula">16</ref>). From Lemma 1, it follows that</p><p>Thus, the inequalities ( <ref type="formula">16</ref>) imply that F b (&#952; k ) &#8805; F b (&#952; k+1 ), i.e., the function value for the b th block decreases after all blocks have updated their associated parameters (across outer iterations). This may seem to be a strong assumption in that it requires the blocks not to compete over the course of optimization. 
However, this does not require that F b be non-increasing every time another block updates its parameters; it only requires that F b be non-increasing across outer iterations.</p><p>The result in Theorem 1 has an important implication. When we eventually find the stationary point &#952; * b for the parameters in the b th block, the other blocks' parameters also converge to a stationary point &#952; * -b , and</p><p>It implies that even though we calibrate parameters in a block-wise manner, our approach obtains concurrent convergence of the whole parameter vector. We discuss a possible extension of the conditions in (16) at the end of this section. Additionally, we relax the conditions in ( <ref type="formula">16</ref>) and prove a similar result in Corollary 1 by instead assuming the weaker technical condition in (18).</p><p>Corollary 1. Suppose Assumption 1 holds. Further, suppose that the infinite cumulative sum of the decreases of the function values during the inner iterations is finite, i.e.,</p><p>Then, for all b,</p><p>Corollary 1 states that as long as the sum of loss function decreases is finite, M-BC converges to a stationary point. It allows other block calibrations to increase the b th block loss, so the non-competing condition in ( <ref type="formula">16</ref>) is not needed. Thus, this result is more general than that in Theorem 1.</p><p>Next, we show the convergence rates for the strongly convex (Theorem 2) and convex (Theorem 3) cases.</p><p>Theorem 2 (Strongly convex case). Suppose Assumption 1 holds. Further, assume that the</p><p>for all k &#8805; 0 and b, where</p><p>Then, for all k and b,</p><p>Both conditions in (20) bound the optimality gap between the progress made in inner iterations versus the progress made in outer iterations. 
Since the objective function of each block changes across outer iterations, the absolute value safeguards against the possibility of a current iterate having a lower function value than the optimal solution. We note that this is possible in our setting since the optimal solution &#952; * = (&#952; * b ; &#952; * -b ) is defined as the point at which all objectives are optimized and thus can potentially have a higher function value than the point (&#952; k, * b ; &#952; k -b ) has. Note that the conditions for Theorem 2 do not require that the function value of the b th block be monotonically non-increasing across outer iterations. That being said, these assumptions are relatively weak and can be tightened so that the result still shows a linear convergence rate, possibly with a slower constant. We provide this result in Corollary 2.</p><p>Corollary 2. Suppose Assumption 1 holds. Further, assume that the function F b is &#181; b -strongly convex over &#952; b for any fixed &#952; -b for b = 1, 2, . . . , B. Moreover, suppose that</p><p>for all k &#8805; 0 and b, where</p><p>for all k and b. Let &#945; b &#8712; (0, 1/L b ] for all b. Then, for all k and b,</p><p>Theorem 2 and Corollary 2 show linear convergence of the iterates generated by the M-BC algorithm, analogous to that of GD (in the special case where B = 1, we recover the result of GD). While convergence is derived across outer iterations, the rate of convergence depends on the number of inner iterations. Note that &#961; = c 1 c 2 (1 -&#945; b &#181; b ) M b &lt; 1; thus, the larger the number of inner iterations, the faster the rate (constant), or alternatively, the larger the errors that are acceptable in ( <ref type="formula">22</ref>) (larger constants c 1 and c 2 ). 
Note that, compared with Theorem 2, Corollary 2 gives a somewhat stronger result: it shows that the outer function values F b (&#952; k ) are non-increasing and decrease at a linear rate.</p><p>Additionally, Theorem 3 shows the convergence rate for the convex case under certain assumptions.</p><p>Theorem 3 (Convex case). Suppose Assumption 1 holds. Further, assume that the function</p><p>for all k &#8805; 0 and b, where</p><p>Then, for all k &#8805; 1 and b,</p><p>The first inequality in (25) is the same as the first condition in the strongly convex setting (see ( <ref type="formula">20</ref>)). The second bounds, across two consecutive outer iterations, the gap between the current iterate's function value and the optimal solution's function value.</p><p>Note that the first assumption in ( <ref type="formula">25</ref>) is relatively weak and can be tightened so that the result still shows a sublinear convergence rate, possibly slower. We provide this result in Corollary 3.</p><p>Corollary 3. Suppose Assumption 1 holds. Further, assume that the function F b is convex over &#952; b for any fixed &#952; -b for b = 1, 2, . . . , B. Moreover, assume that</p><p>for all k &#8805; 0 and b, where c 1 &gt; 0. Additionally, assume that</p><p>for all k and b. Let &#945; b &#8712; (0, 1/L b ] for all b. Then, for all k and b,</p><p>Theorem 3 and Corollary 3 show that in the convex setting, the iterates generated by the M-BC algorithm converge to a minimizer at a sublinear rate that depends on both the number of inner and outer iterations. Again, this is analogous to GD, and in the special case with B = 1, we recover the result of GD. 
Note that Corollary 3 gives a somewhat stronger result: it shows that the function values at the outer iterates &#952; k are non-increasing, converging at the sublinear rate O(1/(kM b )).</p><p>We now comment on the number of iterations required for our algorithm to converge (i.e., the number of inner iterations for a given block and the number of outer iterations) and provide the computational complexity of M-BC. Of course, the number of iterations highly depends on the convergence criterion used in Algorithm 1. To give a concrete example, consider the case of strongly convex functions, where the convergence criterion in the inner loop is based on the optimality gap, and suppose the inner and outer termination tolerances for relative differences are set as 0 &lt; &#1013; outer &#8804; &#1013; inner &lt; 1. In this case, one can show that the number of inner iterations required for each block is O(log(1/&#1013; inner )), and the total number of iterations is O(log &#1013; inner (1/&#1013; outer )). Thus, combining the time for data generation and for parameter estimation, the computational complexity becomes O(n log &#1013; inner (1/&#1013; outer )), assuming that the simulation running time is proportional to the data size n. Similar results can be shown for the convex and nonconvex cases under appropriate termination conditions.</p><p>Finally, we would like to make additional remarks about the conditions in our analysis. While the conditions we impose in deriving convergence properties, for example, that the objective functions do not compete for all &#952; b , may seem strong, we make these assumptions so that the analysis of the M-BC algorithm is analogous to that of GD. Furthermore, the conditions are less restrictive than they appear, as they are imposed only along the iterates generated by the algorithm, not over the entire parameter space. 
These conditions may not be verifiable except in relatively simple settings, as they require information about the optimal solution and/or problem-specific parameters. In practice, after running the method on a specific problem, one can inspect the trajectories to see whether the conditions are satisfied. In our numerical examples over a wide range of settings and in the BEM case study, we did not observe severe violations of the non-competing conditions.</p><p>That being said, we conclude this section by discussing possible extensions of the proposed approach and convergence results. Suppose that, in the general nonconvex case, instead of assuming ( <ref type="formula">16</ref>), we relax the conditions to</p><p>for all k &#8805; 0 and b = 1, 2, . . . , B with &#948; k b,1 &gt; 0 and &#948; k b,2 &gt; 0. That is, we allow the objective function of a block to increase by some amounts (&#948; k b,1 and &#948; k b,2 ) after calibrating parameters in other blocks.</p><p>Under this assumption, if &#948; k b,1 = &#948; 1 and &#948; k b,2 = &#948; 2 for all b = 1, 2, . . . , B, then we conjecture that one can only show convergence to a neighborhood of the solution that depends on &#948; 1 and &#948; 2 . On the other hand, if &#948; k b,1 and &#948; k b,2 are summable, then a result similar to Theorem 1 can be proven. Similarly, the conditions for the results in the strongly convex and convex cases can be relaxed, and under appropriate conditions, convergence (and rates) can be proven. Our BEM case study in Section 7 demonstrates that the non-competing condition in ( <ref type="formula">16</ref>) is satisfied. However, we plan to further investigate competing cases with (30) in our future study, as they may arise in other applications.</p></div>
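Since the non-competing condition can be checked along trajectories, the snippet below runs the cyclic scheme on a hypothetical pair of coupled block losses (our own toy, not one of the paper's examples), records each F_b at every outer iteration, and verifies that the recorded values are non-increasing, in the spirit of condition (16).

```python
import numpy as np

# Hypothetical non-competing pair: F_2 depends on block 1's parameter, yet
# block 1's progress never raises F_2 across outer iterations.
F = [
    lambda t: (t[0] - 1.0) ** 2,
    lambda t: (t[1] - 2.0) ** 2 + 0.5 * (t[0] - 1.0) ** 2,
]

theta = np.zeros(2)
alpha, n_inner = 0.1, 60
history = [[Fb(theta) for Fb in F]]          # F_b(theta^k) trajectory
for k in range(30):                          # outer iterations
    for b in range(2):                       # cyclic order over blocks
        for m in range(n_inner):             # inner GD on block b only
            e = np.zeros(2)
            e[b] = 1e-6
            g = (F[b](theta + e) - F[b](theta - e)) / 2e-6   # central diff
            theta[b] -= alpha * g
    history.append([Fb(theta) for Fb in F])

hist = np.array(history)
# Non-competing check: every F_b is non-increasing across outer iterations
noncompeting = bool(np.all(np.diff(hist, axis=0) <= 1e-9))
```

On this toy, `noncompeting` is `True` and the iterates reach the common minimizer; a strongly competing coupling would make the check fail, signaling that only the relaxed results (e.g., condition (30)) apply.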
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.3.">Asymptotic Properties of Calibration Parameters</head><p>When &#952; b converges to a minimizer &#952; * b for each b based on the results in the previous section, we obtain the MLE for the calibration parameters. Thus, we can exploit its appealing theoretical properties, such as consistency and asymptotic normality. From the likelihood function in (8), the consistency property says that &#952; b,MLE converges in probability to the true parameter vector, denoted by &#952; b,0 , as n b &#8594; &#8734;, i.e., &#952; b,MLE &#8594; &#952; b,0 in probability for each b. Further, asymptotic normality indicates that the estimator satisfies</p><p>where I(&#952; b,0 ) is the expected Fisher information matrix. We use the MLE's asymptotic properties to construct the confidence intervals of the calibration parameters for uncertainty quantification.</p><p>Based on (8), the expected Fisher information matrix is represented by</p><p>where l b (&#952; b |y(x), &#952; -b ) is the log-likelihood function from a single observation for the b th block, given as</p><p>The expected Fisher information matrix can be approximated by its empirical counterpart,</p><p>Using the chain rule, we obtain the (i, i &#8242; )-entry of &#8706; 2 l b (&#952; b |y(x j ), &#952; -b )/&#8706;&#952; 2 b as follows.</p><p>Here, since the first- and second-order partial derivatives of &#951;(x j , &#952; b ; &#952; -b ) are not analytically available, we obtain them numerically using central finite differences <ref type="bibr">(Abramowitz and Stegun 1972)</ref> as follows:</p><p>for i, i &#8242; = 1, 2, . . . , p b , where h, h &#8242; &gt; 0 are small perturbation values. 
Further, assuming the physical process data y(x) have the same variance &#963; 2 across multiple blocks, we can find its MLE as</p><p>Then we can obtain the asymptotic 100(1 -&#945;)% Wald confidence interval (CI) for each component of &#952; b,MLE as follows:</p><p>for b = 1, 2, . . . , B, where z 1-&#945;/2 is the critical value of the standard normal distribution and I -1 ii denotes the i th diagonal entry of the inverse of the Fisher information matrix I.</p></div>
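As an illustration of this subsection, the sketch below computes Wald CIs for one block with finite-difference derivatives of a black-box model &#951;, assuming a Gaussian likelihood with common variance. For brevity it uses the Gauss-Newton form of the empirical Fisher information, dropping the residual-weighted second-derivative term of (34)-(35), which vanishes in expectation; the name `wald_ci` is ours.

```python
import numpy as np

def wald_ci(eta, theta_hat, x, y, z=1.96, h=1e-5):
    # Wald CIs for one block: eta(x, theta) is the black-box computer model
    # (vectorized in x); Gaussian errors with a common variance are assumed.
    theta_hat = np.asarray(theta_hat, dtype=float)
    n, p = len(y), theta_hat.size
    resid = y - eta(x, theta_hat)
    sigma2 = np.mean(resid ** 2)               # MLE of the noise variance

    def d_eta(i):                              # central first-order difference
        e = np.zeros(p)
        e[i] = h
        return (eta(x, theta_hat + e) - eta(x, theta_hat - e)) / (2 * h)

    # Empirical Fisher information, Gauss-Newton approximation
    J = np.column_stack([d_eta(i) for i in range(p)])
    I_hat = J.T @ J / (n * sigma2)
    half = z * np.sqrt(np.diag(np.linalg.inv(I_hat)) / n)
    return theta_hat - half, theta_hat + half
```

On a simple linear toy model the half-lengths shrink at the usual 1/sqrt(n) rate, matching the asymptotic normality used above.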
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.4.">Implementation Details</head><p>We provide guidance on setting the algorithmic parameters in M-BC (Algorithm 1).</p><p>&#8226; &#945; k,m b &gt; 0 (step size of the b th block at the (k + 1) th outer iteration and m th inner iteration): The step size &#945; k,m b can be obtained by a backtracking line search, as described in Algorithm 2 below. In the backtracking line search, the parameters can be set as &#8113; = 1, &#961; = 1/2, c = 10 -4 (or 1/2) <ref type="bibr">(Boyd and</ref><ref type="bibr">Vandenberghe 2004, Nocedal and</ref><ref type="bibr">Wright 2006)</ref>. Note that, in principle, the step size &#945; k,m b could also be set to a sufficiently small constant or a diminishing sequence (e.g., 1/m).</p><p>Algorithm 2 (Backtracking Line Search) repeatedly updates &#945; &#8592; &#961;&#945; until the sufficient-decrease condition is satisfied.</p><p>&#8226; h &gt; 0 (bandwidth of the finite differencing when approximating the gradient): The bandwidth h is set to 10 -8 , which is the square root of machine precision. We note that, in the setting where exact function values can be computed, this choice of bandwidth is optimal up to constants <ref type="bibr">(Mor&#233; and Wild 2012)</ref>. In practice, a sufficiently small value of h works well in most cases (e.g., 10 -4 ).</p><p>&#8226; &#964; 1 &gt; 0 (termination tolerance: norm of the gradient): We suggest setting &#964; 1 to a sufficiently small value (e.g., 10 -4 ).</p><p>&#8226; &#964; 2 &gt; 0 (termination tolerance: relative difference of loss function values): We suggest setting &#964; 2 to a sufficiently small value as well (e.g., 10 -4 ).</p><p>&#8226; &#964; 3 &gt; 0 (termination tolerance: maximum number of iterations): The maximum number of iterations safeguards the algorithm from running forever. If the algorithm reaches this value, it exits the loop. 
We suggest setting &#964; 3 to a sufficiently large value, subject to available computing resources. In our implementation, we use 2000 for the outer loop and 1000 for the inner loops of the algorithm.</p></div>
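The sufficient-decrease loop of Algorithm 2 can be sketched in a few lines of Python; `backtracking` is our own name, and the defaults mirror the suggested initial step 1, shrink factor &#961; = 1/2, and c = 10&#8722;4.

```python
import numpy as np

def backtracking(F, theta, grad, alpha0=1.0, rho=0.5, c=1e-4, max_halvings=50):
    # Armijo backtracking line search (sketch of Algorithm 2): shrink alpha
    # until F(theta - alpha*grad) <= F(theta) - c * alpha * ||grad||^2.
    alpha = alpha0
    f0 = F(theta)
    gg = float(np.dot(grad, grad))
    for _ in range(max_halvings):
        if F(theta - alpha * grad) <= f0 - c * alpha * gg:
            break
        alpha *= rho                 # update alpha <- rho * alpha
    return alpha
```

In M-BC, this would be called once per inner GD step with the finite-difference gradient of the block loss.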
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Numerical Study</head><p>In this section, we evaluate the calibration accuracy of our proposed method in comparison with other alternatives through numerical studies. We also report the CI half-length and coverage rate of each parameter to quantify estimation uncertainties. All experiments, except those for the alternative methods in Section 6.4, are conducted with MATLAB 2014a. For the alternative methods in Section 6.4, R is used. All experiments use the following computing environment: 64-bit Windows OS with the Intel Xeon CPU E5-2697 @ 2.60 GHz processor and 128 GB RAM.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.1.">Problem Settings</head><p>We consider three numerical examples with different settings of the following physical processes and computer models.</p><p>(a) Example I (perfect computer model, block-wise convex loss functions)</p><p>&#8226; Physical process </p><p>(b) Example II (perfect computer model, nonconvex loss functions)</p><p>&#8226; Physical process</p><p>(c) Example III (imperfect computer model, nonconvex loss functions)</p><p>&#8226; Physical process</p><p>In all examples, x &#8764; U (0, 2&#960;) and &#949; &#8764; N (0, 0.01). In Examples I and II, the terms can be divided into three input domains x &lt; 4&#960;/5, 2&#960;/5 &#8804; x &lt; 8&#960;/5, and x &#8805; 6&#960;/5, respectively, to represent different blocks (e.g., seasonality in BEM). Hence, &#952; 1 , &#952; 2 , and &#952; 3 are the block parameters that need to be calibrated with the datasets associated with the first, second, and third blocks of data collected when x &lt; 4&#960;/5, 2&#960;/5 &#8804; x &lt; 8&#960;/5, and x &#8805; 6&#960;/5, respectively. Note that these data blocks are not disjoint. For example, &#952; 1 and &#952; 2 together affect the computer model response for 2&#960;/5 &#8804; x &lt; 4&#960;/5. Thus, one should not calibrate each &#952; i separately. In Example III, the first term is employed globally regardless of the input value. The second and third terms are employed when x &lt; &#960; and x &#8805; &#960;, respectively, which mimics the block dependency. Thus, &#952; 1 should be calibrated with the entire dataset, whereas the block parameters &#952; 2 and &#952; 3 need to be calibrated with the second and third blocks of data collected when x &lt; &#960; and x &#8805; &#960;, respectively. Figure <ref type="figure">3</ref> illustrates the three examples. In implementing the proposed approach, we choose the step size (&#945; k,m b in ( <ref type="formula">12</ref>)) using the backtracking line search in Algorithm 2. 
The termination condition requires that the norm of the gradient of each loss function be less than 10 -4 in both the inner and outer iterations.</p></div>
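To make the overlapping block structure of Examples I and II concrete, the snippet below samples the inputs and forms the three (non-disjoint) data blocks; the physical-process and computer-model formulas themselves are not reproduced here, so this only illustrates how the data are partitioned.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
x = rng.uniform(0.0, 2.0 * np.pi, n)      # x ~ U(0, 2*pi), as in all examples

# Overlapping data blocks of Examples I-II (the blocks are NOT disjoint)
block1 = x < 4.0 * np.pi / 5.0
block2 = (x >= 2.0 * np.pi / 5.0) & (x < 8.0 * np.pi / 5.0)
block3 = x >= 6.0 * np.pi / 5.0

# theta_1 and theta_2 jointly affect the response for 2*pi/5 <= x < 4*pi/5
overlap12 = block1 & block2
```

Each block loss F_b would then be evaluated only on its own masked subset, which is why the blocks cannot be calibrated separately on the overlap regions.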
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.2.">Implementation Results</head><p>We present the implementation results in comparison with the following two alternatives.</p><p>(a) Hybrid-block Calibration (H-BC): In this alternative, the computer model correctly simulates its response according to the underlying block properties. Specifically, similar to M-BC, it considers the block dependency, using the corresponding subset of data for each block rather than using the entire data. However, unlike M-BC, which uses multiple loss functions, H-BC uses a single loss function in (5) and optimizes &#952; = (&#952; 1 , &#952; 2 , &#952; 3 ) altogether, namely,</p><p>In solving (40), H-BC uses the correct block of data to compute the gradient of the loss function, but instead of exploiting a block-wise minimization, it updates the parameters all at once, i.e., it updates &#952; 1 , &#952; 2 , &#952; 3 together at each GD iteration.</p><p>(b) Single-block Calibration (S-BC): This alternative ignores the block structure; for example, it does not restrict each term to its own block only, but also employs the other term with I(x &#8805; &#960;). Then it uses the single loss function, shown in ( <ref type="formula">5</ref>), to calibrate the parameters. We include S-BC because, in the BEM simulation, seasonality can be ignored and one may blindly run the BEM simulator without correct scheduling.</p><p>When loss functions are convex, we can start from an arbitrarily selected starting point to attain a global minimum. In practice, however, we do not know the form of the loss functions, since the computer model is a black box. Thus, we use multiple initial points and select the best estimates that produce the smallest mean squared error (MSE) in a training set consisting of 1000 data samples. 
We then evaluate the performance using a test set with another 1000 data samples.</p><p>Table <ref type="table">3</ref> summarizes the averages (and standard deviations) of the calibrated parameters in the third to fifth columns, obtained from 200 experiments, as well as the MSEs in the test set in the last four columns. Each F i represents the MSE at the i th data block, and F all in the last column is the overall MSE over all test data samples. Clearly, the calibrated values of &#952; 1 , &#952; 2 , and &#952; 3 using M-BC in Examples I and II are closer to the true parameter values &#952; = (&#952; 1 , &#952; 2 , &#952; 3 ) = (-1, 10, 5).</p><p>We summarize some observations from Table <ref type="table">3</ref>.</p><p>&#8226; Since M-BC optimizes each block of parameters with the correct subset of data by minimizing the corresponding loss functions, it provides the most accurate calibration results with the lowest MSEs.</p><p>&#8226; H-BC also uses the correct portion of data, so its MSEs tend to be smaller than those from S-BC. However, its calibrated values deviate from the true values in some cases, e.g., &#952; 2 and &#952; 3 in Example I. This is because its single loss function for all three groups of parameters is quite complicated and highly nonconvex, with interactions between parameters (e.g., (&#952; 1 +1)(&#952; 2 -10)). On the contrary, in the proposed M-BC, each loss function, given fixed parameters in other blocks, is simpler than H-BC's loss function. For instance, in Example I, even though &#951;(&#8226;) is nonconvex over (&#952; 1 , &#952; 2 , &#952; 3 ), the loss function for each parameter becomes block-wise convex. This local convexification helps the M-BC procedure find the solution more effectively.</p><p>&#8226; The different performance of H-BC in Examples I and II is related to the problem structure, where different weights are assigned to each term. 
In Example I, we assigned a weight of 1/2 to the term including I(x &lt; 2&#960;/5) in the computer model, while we assigned 1/10 and 1/20, respectively, to the terms including I(2&#960;/5 &#8804; x &lt; 8&#960;/5) and I(x &#8805; 8&#960;/5). Thus, &#952; 1 is the most influential parameter, and the loss function changes rapidly as &#952; 1 varies. With this problem structure, H-BC weighs &#952; 1 more, leading to the correct calibration of &#952; 1 relative to &#952; 2 and &#952; 3 .</p><p>In Example II, the accuracy advantage of M-BC over H-BC is less pronounced. We believe this is because the weights for each term are not substantially different. These results indicate that the performance of H-BC is sensitive to the problem structure. All things considered, M-BC achieves the best accuracy in all examples.</p><p>&#8226; Obviously, some of the parameter estimates from S-BC largely deviate from their true values in many cases, leading to larger loss values. Recall that S-BC ignores the block property (by ignoring the I(&#8226;) terms in the computer models), while observational data from the physical process are collected in different blocks. This mismatch between the physical process responses and the simulation outputs leads to poor calibration results with large MSEs.</p><p>M-BC also has a computational advantage over the two benchmarks, H-BC and S-BC, in general. This is mainly because M-BC calibrates each block of parameters using its associated data only, unlike H-BC and S-BC, which update all parameters simultaneously. As a result, M-BC is computationally more efficient than H-BC and S-BC. Further, its running time is similar in all three examples, whereas the computation time of H-BC and S-BC varies significantly depending on the problem structure. We additionally note that comparing the computational performance of the three methods in terms of the number of iterations is far from trivial. 
This is because M-BC has a double-loop structure with inner and outer iterations, whereas H-BC and S-BC do not have inner iterations. Moreover, another difficulty is related to the way the data are utilized by the methods. Specifically, as discussed in Section 5.2, assuming that the simulation running time is proportional to the data size n, the computational complexity of M-BC can be expressed as O(n log &#1013; inner (1/&#1013; outer )) in the strongly convex case. The analogue for H-BC and S-BC (assuming a tolerance of &#1013; outer ) can be expressed as O(n log(1/&#1013; outer )). While the terms in the order notation look similar, hidden are condition-number dependencies. For M-BC the dependence is on the per-block condition numbers &#954; b , whereas for H-BC and S-BC the dependence is on &#954;, the condition number of the full problem. Thus, their computational complexity cannot be readily compared. Where true parameter values do not exist, we do not report the coverage rates. Overall, the results suggest that M-BC provides empirical coverage rates close to the nominal rate of 95%. On the contrary, the coverage rates of H-BC deviate from the nominal rates in some cases, e.g., those for &#952; 2 or &#952; 3 in Example I. The low coverage rates of S-BC are related to the inaccurate estimates shown in Table <ref type="table">3</ref>.</p></div>
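The condition-number dependence discussed above can be illustrated on a toy quadratic (our own, not the paper's losses): with strong cross-block coupling, each block's sub-Hessian can be far better conditioned than the full Hessian, which is what favors block-wise updates.

```python
import numpy as np

# Quadratic loss 0.5 * theta' H theta with two one-parameter blocks; the
# cross-block coupling (off-diagonal 0.9) ill-conditions the full problem
# while each block subproblem stays perfectly conditioned.
H = np.array([[1.0, 0.9],
              [0.9, 1.0]])
blocks = [slice(0, 1), slice(1, 2)]

def cond(M):
    w = np.linalg.eigvalsh(M)     # eigenvalues in ascending order
    return w[-1] / w[0]

kappa_full = cond(H)                              # (1 + 0.9) / (1 - 0.9) = 19
kappa_blocks = [cond(H[b, b]) for b in blocks]    # both 1.0
```

Here full-problem GD contracts at a rate governed by kappa = 19, whereas each block subproblem (with the other block fixed) is a perfectly conditioned scalar quadratic.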
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.3.">Uncertainty Quantification</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.4.">Performance Comparison with Other Methods</head><p>We compare the performance of M-BC with two common calibration approaches in the literature: L 2 calibration <ref type="bibr">(Tuo and Wu 2015)</ref> and Bayesian calibration <ref type="bibr">(Kennedy and</ref><ref type="bibr">O'Hagan 2001, Higdon et al. 2004</ref>). We briefly explain the two benchmark approaches; more details are available in <ref type="bibr">Tuo and Wu (2015)</ref>, <ref type="bibr">Kennedy and</ref><ref type="bibr">O'Hagan (2001), and</ref><ref type="bibr">Higdon et al. (2004)</ref>. Given physical observations {y(x j ), x j } n j=1 , L 2 calibration first obtains an estimate of the true physical process response &#950;, where y(x) = &#950;(x) + &#1013; with an observation error &#1013;, using kernel ridge regression <ref type="bibr">(Wahba 1990</ref>) in a reproducing kernel Hilbert space. Similarly, using computer model responses {y(x j &#8242; ), (x j &#8242; , &#952; j &#8242; )} n &#8242; j &#8242; =1 generated at pre-specified design points, it builds an emulator &#951;(&#8226;, &#8226;), often constructed by a GP <ref type="bibr">(Santner et al. 2018)</ref>, for the computer model. Then, &#952; is calibrated by solving the following optimization problem:</p><p>Next, the Bayesian calibration approach formulates the physical observation as a linear combination of the computer model output (or its emulator), a model discrepancy term, and observation noise. In our experiments with larger data sizes, the computation time of L 2 calibration and Bayesian calibration is excessive, and they do not produce results within 5 days for each run. In summary, M-BC demonstrates superior calibration accuracy as well as computational efficiency and scalability, compared with the two existing methods.</p><p>We now comment on the difficulty of theoretically comparing the computational complexity of M-BC and the Bayesian approach. 
First, their data generation mechanisms are different.</p><p>The Bayesian approach generates a fixed set of data from pre-specified design points a priori, whereas M-BC collects computer response data on the fly. Next, in terms of the computation after the data are generated, the main computational issues in Bayesian calibration are the numerical integration with respect to the posterior distribution of the calibration parameters θ and the matrix inversion required for each θ value when GPs are used in this numerical integration. If the numbers of observational data points and computer experiments are small, the matrix inversion can be done in O((n + n′)^3) per iteration, where n denotes the number of observational data points and n′ the number of computer experiments <ref type="bibr">(Kennedy and O'Hagan 2001)</ref>. However, when the data size grows, the matrix inversion becomes challenging. Further, Bayesian calibration uses MCMC methods to explore posterior distributions. The MCMC computation generally depends on the number of parameters θ, the proposal distribution, and the number of simulation runs (or iterations). The chain's mixing time to its stationary distribution also affects the total computational complexity of the Bayesian approach. In contrast, the computational complexity of M-BC is O(n log(1/ϵ_inner) log(1/ϵ_outer)) in the strongly convex case, as discussed in Section 5.2. For these reasons, a direct comparison of the computational complexity of M-BC and the Bayesian approach does not seem adequate. Further, the termination conditions of the two approaches are different. Nevertheless, their empirical performance shows the computational efficiency and scalability of M-BC over the Bayesian approach.</p></div>
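The L_2 calibration benchmark described above can be sketched on a toy one-dimensional problem. Everything below is an illustrative assumption rather than the paper's actual setup: the "physical process" is a sine curve, the "computer model" is a parameterized variant of it, and kernel ridge regression stands in both for the RKHS estimate of ζ and for the GP emulator of η.

```python
# Toy sketch of L2 calibration (estimate zeta, emulate eta, minimize L2 distance).
# All models, kernels, and settings here are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def rbf(A, B, ls=0.3):
    """Gaussian (RBF) kernel matrix between row-stacked inputs A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * ls**2))

# Hypothetical physical process: zeta(x) = sin(2*pi*x), true theta = 0.7.
theta_true = 0.7
n = 40
x = rng.uniform(0, 1, (n, 1))
y = np.sin(2 * np.pi * x[:, 0]) + 0.05 * rng.normal(size=n)

# Step 1: kernel ridge regression estimate of the physical response zeta.
lam = 1e-3
alpha = np.linalg.solve(rbf(x, x) + lam * np.eye(n), y)
zeta_hat = lambda xs: rbf(xs, x) @ alpha

# Step 2: emulator of the computer model eta(x, theta) = sin(2*pi*x) * theta / 0.7,
# fit by kernel ridge on (x, theta) design points (a stand-in for a GP emulator).
n_design = 200
X_design = rng.uniform(0, 1, (n_design, 2))          # columns: x, theta
y_sim = np.sin(2 * np.pi * X_design[:, 0]) * X_design[:, 1] / 0.7
beta = np.linalg.solve(rbf(X_design, X_design) + lam * np.eye(n_design), y_sim)
eta_hat = lambda xs, th: rbf(
    np.column_stack([xs[:, 0], np.full(len(xs), th)]), X_design) @ beta

# Step 3: calibrate theta by minimizing the empirical L2 distance on a grid.
grid_x = np.linspace(0, 1, 200)[:, None]
thetas = np.linspace(0.1, 0.9, 81)
l2 = [np.mean((zeta_hat(grid_x) - eta_hat(grid_x, t)) ** 2) for t in thetas]
theta_hat = thetas[int(np.argmin(l2))]
print(theta_hat)
```

Because the toy computer model matches the physical process exactly at θ = 0.7, the L_2 distance is minimized near the true value; in real applications the emulator and the RKHS estimate each contribute approximation error to this distance.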
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.5.">Performance Evaluation of M-BC Variants</head><p>We consider some variants of the proposed calibration approach. The first variant removes the inner iteration in M-BC; we refer to it as M-BC-V. Even with no inner iterations, it still updates the parameters block by block, but, unlike M-BC, it does not fully minimize each block problem. It may seem similar to H-BC. However, the main difference is that H-BC updates all parameters simultaneously, whereas M-BC-V sequentially updates each parameter block based on the most up-to-date information from the other blocks. We also consider another variant that uses random block selection. This algorithm is the same as M-BC-V except that the block is selected randomly instead of sequentially (or cyclically); it is referred to as M-BC-VS, with "S" standing for "stochastic."</p><p>Table <ref type="table">7</ref> summarizes the calibration results of these variants. In Examples I and II, M-BC-V and M-BC-VS provide results comparable to the original M-BC. However, they yield inaccurate results in terms of MSE and unstable estimates with higher standard deviations in Example III. It appears that the block-wise minimization (i.e., the inner iteration for each block) in M-BC brings substantial improvements in some cases and provides consistently better results.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 7. Ex / Method / Calibrated parameter values / MSE</head><p>
I: M-BC -1.00 (0.01), 9.99 (0.02), 5.00 (0.03); MSE 0.01 (0.00), 0.01 (0.00), 0.01 (0.00), 0.01 (0.00)
I: M-BC-V -1.00 (0.02), 9.99 (0.02), 5.00 (0.03); MSE 0.01 (0.00), 0.01 (0.00), 0.01 (0.00), 0.01 (0.00)
I: M-BC-VS -1.00 (0.02), 10.00 (0.03), 5.00 (0.03); MSE 0.01 (0.00), 0.01 (0.00), 0.01 (0.00), 0.01 (0.00)
II: M-BC -1.00 (0.00), 10.00 (0.01), 5.00 (0.01); MSE 0.01 (0.00), 0.01 (0.00), 0.01 (0.00), 0.01 (0.00)
II: M-BC-V -1.00 (0.00), 10.00 (0.01), 5.00 (0.01); MSE 0.01 (0.00), 0.01 (0.00), 0.01 (0.00), 0.01 (0.00)
II: M-BC-VS -1.00 (0.00), 10.00 (0.02), 5.00 (0.02); MSE 0.01 (0.00), 0.01 (0.00), 0.01 (0.00), 0.01 (0.00)
III: M-BC 0.38 (0.00), -5.75 (0.44), 8.18 (0.05); MSE 6.89 (0.67), 11.35 (1.25), 2.47 (0.16), 6.89 (0.67)
III: M-BC-V 0.36 (0.15), 0.12 (6.76), -0.99 (7.05); MSE 15.03 (12.43), 23.33 (16.15), 6.80 (18.80), 15.03 (12.43)
III: M-BC-VS 0.36 (0.14), -0.92 (6.30), 1.83 (6.89); MSE 13.15 (10.15), 21.31 (13.98), 5.07 (11.96), 13.15 (10.15)
</p><p>Further, we investigate whether the block ordering affects the estimation performance of M-BC. With three blocks, there are six pathways for the cyclic block ordering. We do not notice any significant differences in the calibration results among the six pathways in any of the three examples. In the future, we plan to explore other variants and study their theoretical properties and empirical performance.</p></div>
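The difference between full block-wise minimization (M-BC-style), single cyclic block updates (M-BC-V-style), and random block selection (M-BC-VS-style) can be illustrated on a toy strongly convex quadratic. The loss, the block partition, and the step size below are assumptions for illustration only, not the paper's implementation.

```python
# Three block-update schemes on a toy strongly convex quadratic loss
# 0.5 * th' A th - b' th; all settings are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
A = np.array([[3.0, 0.5, 0.2],
              [0.5, 2.0, 0.3],
              [0.2, 0.3, 1.5]])           # SPD Hessian (diagonally dominant)
b = np.array([1.0, -2.0, 0.5])

def grad(th):
    return A @ th - b

blocks = [np.array([0]), np.array([1, 2])]   # two hypothetical parameter blocks
step = 0.2

def run(scheme, n_outer=200):
    th = np.zeros(3)
    for k in range(n_outer):
        blk = blocks[rng.integers(2)] if scheme == "random" else blocks[k % 2]
        if scheme == "inner":       # M-BC style: (approximately) minimize the block
            for _ in range(50):
                th[blk] -= step * grad(th)[blk]
        else:                       # M-BC-V / M-BC-VS style: one step per block
            th[blk] -= step * grad(th)[blk]
    return th

th_star = np.linalg.solve(A, b)     # exact minimizer for reference
dists = {s: float(np.linalg.norm(run(s) - th_star))
         for s in ["inner", "cyclic", "random"]}
print(dists)
```

On a well-conditioned convex problem like this one all three schemes reach the minimizer, which mirrors the comparable results in Examples I and II; the paper's Example III shows that this equivalence can break down on harder losses.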
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.6.">Scalability Analysis and Block Dominance Case</head><p>Additionally, we examine the scalability of the method in various settings, such as the number of samples, the extent of block overlaps, and the number of blocks/parameters, using a newly designed problem structure. Due to limited space, we omit the details of the problem structure and results, but we discuss them in the online supplementary document. The results show that M-BC yields consistently good results in all settings in terms of calibration accuracy and uncertainty quantification. In particular, they demonstrate that the proposed method is suitable even for problems with significant data overlap. In the online supplementary document, we further examine the performance of M-BC when block domination occurs. Our approach provides superior results in terms of both calibration accuracy and uncertainty quantification across different levels of block dominance.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.">Case Study: Building Energy Simulation</head><p>We evaluate the performance of our proposed approach in the BEM application. We use hourly electrical energy consumption data and the simulation outputs obtained from the simulator EnergyPlus 9.3.0 (U.S. Department of Energy 2019) for a municipal building in Mueller, Austin, Texas. EnergyPlus is a widely used BEM simulator that models both energy consumption (for heating, cooling, ventilation, lighting, and plug and process loads) and water use in buildings. Each year-long simulation run takes about two minutes on a standard desktop computer.</p><p>Among the large number of parameters employed in EnergyPlus, we choose parameters important to building energy use, based on domain knowledge and previous studies <ref type="bibr">(Manfren et al. 2013, Chong et al. 2017)</ref>, including lighting, ventilation, domestic hot water, window material (optical properties), and the heating and cooling systems, as summarized in Table <ref type="table">1</ref>. In this case study, we let θ_g = (θ_g,1, …, θ_g,6)^T denote the vector of year-long global parameters, and θ_c = (θ_c,1, …, θ_c,4)^T and θ_h = (θ_h,1, …, θ_h,3)^T, respectively, represent the cooling- and heating-season parameter vectors.</p></div>
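The block-to-data association in this case study (global, cooling-season, and heating-season parameter blocks, each calibrated only against the hours falling in its season) can be sketched with simple index masks. The exact season boundaries below are assumptions for illustration; the case study states that the cooling season runs March to November and that the heating and cooling seasons overlap for three months, so a heating season of October to March is used here to match that overlap.

```python
# Illustrative seasonal data masks for the three parameter blocks.
# Season boundaries are assumptions (cooling Mar-Nov per the case study;
# heating Oct-Mar chosen so the two seasons overlap for three months).
import numpy as np

COOLING = set(range(3, 12))          # Mar..Nov
HEATING = {10, 11, 12, 1, 2, 3}      # Oct..Mar

# One index per hour of a non-leap year, labeled with its month.
days_per_month = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
month_of_hour = np.repeat(np.arange(1, 13), np.array(days_per_month) * 24)

masks = {
    "global": np.ones(month_of_hour.size, dtype=bool),   # theta_g: all hours
    "cooling": np.isin(month_of_hour, list(COOLING)),    # theta_c
    "heating": np.isin(month_of_hour, list(HEATING)),    # theta_h
}

for name, m in masks.items():
    print(name, int(m.sum()), "hours")

overlap = masks["cooling"] & masks["heating"]
print("cooling/heating overlap:", int(overlap.sum()), "hours")
```

Each block's loss would then be evaluated only on `y[masks[block]]`, which is exactly the kind of possibly overlapping data subsetting the multiblock formulation is built around.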
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.1.">Implementation Results</head><p>We use an implementation setting similar to that in the numerical examples. We terminate the iterations when the relative difference between consecutive loss function values becomes smaller than a termination tolerance of 10^-4. We also use a small perturbation h = 10^-4 in computing the gradient.</p><p>The step size is decided using a backtracking line search. Figure <ref type="figure">4</ref> shows an example of the loss function values for each seasonal block from the proposed M-BC approach. Each dashed vertical line indicates the iteration index where an inner iteration terminates, and each solid line marks the iteration index for an outer iteration. In the inner iterations, we first minimize the year-long loss F_g, followed by F_c and F_h sequentially. Each plot in Figure <ref type="figure">4</ref> illustrates how these loss functions change over the iterations. Interestingly, Figure <ref type="figure">4(a)</ref> shows that F_g decreases mostly when θ_c is updated. This is because the cooling season from March to November (see Table <ref type="table">1</ref>) largely overlaps with the global block (January to December) in the studied Texas region. Similarly, the heating-season loss F_h also decreases when θ_c is calibrated, because the heating and cooling seasons overlap for three months (see Table <ref type="table">1</ref>).</p><p>Overall, all three loss values converge within a few outer iterations, which indicates the important role of the inner iterations that calibrate each block of parameters. We repeat the comparison under five initial settings, LHS 1 through LHS 5. In all cases, M-BC achieves the smallest global (F_g), cooling-season (F_c), and heating-season (F_h) loss values (MSEs). In some cases (e.g., with LHS 3 and 4), the performance of H-BC is comparable to that of M-BC, but M-BC outperforms H-BC under the other initial settings. S-BC performs worse than M-BC and H-BC in most cases.
</p></div>
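The optimization machinery described in Section 7.1 — a finite-difference gradient with perturbation h = 10^-4, a backtracking line search for the step size, and termination on the relative change of the loss — can be sketched as follows. The quadratic test loss is a hypothetical stand-in for the simulator-based losses, and the Armijo constants are assumptions.

```python
# Sketch of the optimization loop: forward-difference gradient, Armijo
# backtracking line search, relative-change termination. The test loss F
# is an illustrative stand-in for a simulator-based loss.
import numpy as np

def fd_grad(F, th, h=1e-4):
    """Forward-difference gradient: one extra F evaluation per coordinate."""
    f0 = F(th)
    g = np.zeros_like(th)
    for i in range(th.size):
        e = np.zeros_like(th)
        e[i] = h
        g[i] = (F(th + e) - f0) / h
    return g

def backtrack(F, th, g, t0=1.0, rho=0.5, c=1e-4, max_halvings=50):
    """Armijo backtracking along the negative gradient direction."""
    t, f0 = t0, F(th)
    for _ in range(max_halvings):
        if F(th - t * g) <= f0 - c * t * (g @ g):   # sufficient decrease
            break
        t *= rho
    return t

def minimize(F, th0, tol=1e-4, max_iter=500):
    """Iterate until the relative loss change falls below tol."""
    th = np.asarray(th0, dtype=float)
    f_prev = F(th)
    for _ in range(max_iter):
        g = fd_grad(F, th)
        th = th - backtrack(F, th, g) * g
        f = F(th)
        if abs(f_prev - f) / max(abs(f_prev), 1.0) < tol:
            break
        f_prev = f
    return th

# Hypothetical stand-in loss with known minimizer (2, -1).
F = lambda th: (th[0] - 2.0) ** 2 + 4.0 * (th[1] + 1.0) ** 2
th_hat = minimize(F, np.zeros(2))
print(th_hat)
```

Note that each forward-difference gradient costs one simulator run per parameter plus one baseline run, which is why keeping the blocks small matters when every evaluation of F requires a year-long simulation.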
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.2.">Comparison with Bayesian Calibration Approach</head><p>We further compare our approach with Bayesian calibration. To obtain the posterior distributions, we exploit the No-U-Turn Sampler (NUTS), which is known to be useful for sampling from high-dimensional posterior distributions, as suggested in <ref type="bibr">Chong and Menberg (2018)</ref>. NUTS builds on Hamiltonian Monte Carlo (HMC), which allows MCMC to converge faster because it explores the high-dimensional posterior distribution while adaptively tuning parameters such as a scaling factor and the number of leapfrog steps. More details are available in <ref type="bibr">Chong and Menberg (2018)</ref>. Even with NUTS, Bayesian calibration is computationally intensive, as discussed in Section 2. With the computational resources available to us, it would take an unrealistically long time to use hourly year-long data. Thus, we use weekly data in the Bayesian analysis, as in <ref type="bibr">Kristensen et al. (2017b)</ref>. Even with weekly data, one experiment requires about five days to calibrate the BEM parameters.</p><p>We consider two prior specifications in the Bayesian approach: non-informative priors with uniform distributions and informative priors with Gaussian distributions, as suggested in <ref type="bibr">Chong and Menberg (2018)</ref>. In both prior settings, we use the settings in LHS 1 as the prior means for the parameters. We set the prior standard deviation to 0.2 in the informative prior setting, as in <ref type="bibr">Chong and Menberg (2018)</ref>.</p><p>Table <ref type="table">10</ref> shows the MSEs on the test sets from 10 independent experiments, where we use the posterior means as the calibrated estimates. Compared with M-BC, Bayesian calibration under both prior settings generates larger MSE values, especially in the global and cooling-season blocks. 
We note that the MSE from M-BC in the heating-season block is slightly larger than that of the Bayesian approach with the Gaussian prior; one possible reason relates to the underlying heating-season loss. To investigate further, Figure <ref type="figure">5</ref> shows the posterior distributions of one of the parameters, the lighting level, when the non-informative uniform and informative Gaussian priors are used. It turns out that the posterior distribution obtained with the non-informative prior remains close to its prior distribution, showing large uncertainties. This implies that the calibrated parameter value from this posterior density does not deliver meaningful information. Meanwhile, when the informative prior is employed, the posterior distribution moves away from its prior. Still, its MSEs are much larger in the global and cooling-season blocks (see Table <ref type="table">10</ref>).</p></div>
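The effect of the two prior specifications on a posterior can be sketched with a minimal sampler. For brevity this sketch uses random-walk Metropolis as a stand-in for NUTS, and the "computer model," data, and prior means are hypothetical; only the informative prior standard deviation of 0.2 is taken from the text above.

```python
# Stand-in sketch for the Bayesian comparison: random-walk Metropolis
# (instead of NUTS) under a non-informative uniform prior and an
# informative Gaussian prior. All models and data are assumptions.
import numpy as np

rng = np.random.default_rng(2)

def eta(x, th):                     # hypothetical linear "computer model"
    return th * x

x = np.linspace(0, 1, 30)
y = 0.7 * x + 0.05 * rng.normal(size=x.size)   # synthetic data, true theta = 0.7
sigma = 0.05

def log_post(th, prior):
    ll = -0.5 * np.sum((y - eta(x, th)) ** 2) / sigma**2
    if prior == "uniform":          # non-informative prior on [0, 1]
        lp = 0.0 if 0.0 <= th <= 1.0 else -np.inf
    else:                           # informative Gaussian prior N(0.5, 0.2^2)
        lp = -0.5 * ((th - 0.5) / 0.2) ** 2
    return ll + lp

def metropolis(prior, n_iter=5000, scale=0.05):
    th, lp = 0.5, log_post(0.5, prior)
    draws = []
    for _ in range(n_iter):
        prop = th + scale * rng.normal()
        lp_prop = log_post(prop, prior)
        if np.log(rng.uniform()) < lp_prop - lp:   # Metropolis accept/reject
            th, lp = prop, lp_prop
        draws.append(th)
    return np.array(draws[1000:])   # discard burn-in

means = {}
for prior in ["uniform", "gaussian"]:
    d = metropolis(prior)
    means[prior] = float(d.mean())
    print(prior, round(means[prior], 3), round(float(d.std()), 3))
```

In this toy setting the likelihood is informative enough that both priors yield similar posterior means; the weakly identified BEM parameters in the case study behave differently, which is why the posterior under the non-informative prior stays close to its prior there.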
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="8.">Conclusion</head><p>In this study, we present a new multi-block parameter calibration methodology for settings where computer model parameters need to be calibrated using different, possibly non-disjoint, subsets of data. We consider multiple loss functions, each associated with a block of parameters and its corresponding data set. Beyond the block structure, our approach differs from the existing literature, which generates the computer model responses at pre-selected parameter settings and builds emulators using the fixed data set. Although widely adopted, this traditional approach decouples the data generation stage from the calibration stage. In contrast, our approach adaptively runs the simulation at new parameter settings recommended by the nonlinear optimization procedure. As such, more instructive data are generated from the computer model, which makes the calibration procedure more effective.</p><p>Our implementation results in both the numerical examples and the BEM case study demonstrate that the discrepancy between the physical process and the computer model outputs can be significantly reduced when the multi-block dependency of the parameters is taken into consideration. Further, in our case study, the performance of the proposed M-BC algorithm is more robust than that of the alternatives. We also construct CIs using asymptotic properties of ML estimators. Compared with other methods, our approach yields more stable performance with narrower CIs while maintaining empirical coverage rates close to the nominal rates. The comparison with the widely used Bayesian calibration approach demonstrates the advantage of our approach in terms of improved calibration accuracy.</p><p>In the future, we would like to extend the approach to more general settings, e.g., competing cases and time-variant processes <ref type="bibr">(Byon et al. 2016)</ref>. 
Developing multi-task learning and Bayesian optimization (BO) approaches that can handle overlapping data would also be a possible direction for future research. Additionally, we plan to use the well-calibrated BEM computer model to control building energy end use, such as demand control <ref type="bibr">(Jang et al. 2020, Li et al. 2020)</ref>.</p></div></body>
		</text>
</TEI>
