<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Using Integrated EOF Analysis for Evaluation of WRF Simulations in Urban Environments</title></titleStmt>
			<publicationStmt>
				<publisher>AMS</publisher>
				<date>01/08/2026</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10665305</idno>
					<idno type="doi">10.1175/JAMC-D-24-0077.1</idno>
					<title level='j'>Journal of Applied Meteorology and Climatology</title>
<idno>1558-8424</idno>
<biblScope unit="volume"></biblScope>
<biblScope unit="issue"></biblScope>					

					<author>Gonzalo Huidobro</author><author>Ashish Sharma</author><author>Alan F Hamlet</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[<title>Abstract</title> <p>Evaluation methods for Regional Climate Models (RCMs) commonly rely on point comparisons with observed meteorological fields, which provide limited understanding of the spatial and temporal representation of important factors affecting urban areas in models. These factors are not only complex but also difficult to differentiate, which complicates their analysis. This study thus develops an innovative approach using Empirical Orthogonal Function (EOF) analysis to compare urban heat island and precipitation patterns in RCM simulations with those from observations, taking advantage of the capacity of the method for data disaggregation. The method was tested on summer daily maximum and minimum temperature (T<sub>max</sub> and T<sub>min</sub>) and precipitation (P) in the Chicago Metro Area (CMA). Using observed data, the EOF analysis on temperature consistently produced coherent patterns that reflect known impacts of urban environments on climate and weather. EOF evaluation of corresponding 4-km WRF simulations against observations confirmed a strong warm bias (~3°C) for simulated T<sub>min</sub> in the urban area, as observed in point comparisons against stations; further analysis, however, suggested that the shape and time behavior of the urban pattern were well represented. EOF analysis on T<sub>max</sub>, which showed no problems in the point comparison, revealed important differences in shape (urban area of influence on temperatures) and time [Principal Components (PC) correlation of −0.5] for the urban pattern between datasets, suggesting the need for model improvements. Results showed no systematic urban effects on summer P for the CMA for observations or simulations, but analysis of winter patterns suggested a possible urban enhancement on P over the city.</p>]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>File generated with AMS Word template 2.0</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>More than 80% of the US population and more than 50% of the world population currently live in urban areas. These numbers are expected to grow in the coming decades (United Nations 2016; CSS 2022).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>a. Urban Climate</head><p>Urban areas have a well-known impact on local and regional climate. Increased impervious surfaces (e.g., pavement and buildings), reduced latent heat flux, increased thermal mass, higher concentration of human heat sources, change in surface roughness due to structures, and increase in aerosols and pollutants all have potentially important effects on meteorological variables in urban environments <ref type="bibr">(Kalnay and Cai 2003;</ref><ref type="bibr">Foley et al. 2005;</ref><ref type="bibr">Hidalgo et al. 2008)</ref>. Furthermore, urbanization and its impacts on climate and extreme weather have been shown to negatively impact human health <ref type="bibr">(McMichael 2000;</ref><ref type="bibr">Tan et al. 2010;</ref><ref type="bibr">Singh et al. 2020)</ref>, energy consumption <ref type="bibr">(Hirano and Fujita 2012;</ref><ref type="bibr">Li et al. 2019;</ref><ref type="bibr">Su et al. 2021)</ref>, and local ecosystems <ref type="bibr">(McKinney 2002;</ref><ref type="bibr">Wilby and Perry 2006;</ref><ref type="bibr">Newbound et al. 2010)</ref>. Over the last several decades, research has sought to better understand the complex interactions between urban land use, climate, and extreme weather <ref type="bibr">(Lovell 2010;</ref><ref type="bibr">Georgescu et al. 2014;</ref><ref type="bibr">Sharma et al. 2016)</ref>. These studies intend to inform urban planning and identify effective climate change adaptation strategies that can reduce vulnerability and increase resilience in response to growing impacts.</p><p>The high complexity and variability of the numerous urban impacts acting on cities can produce conflicting effects on the local climate, resulting in either increases or reductions in temperature, humidity, wind speed, and precipitation at different times and spatial scales <ref type="bibr">(Moriwaki et al. 2013;</ref><ref type="bibr">Theeuwes et al. 
2015;</ref><ref type="bibr">Droste et al. 2018;</ref><ref type="bibr">Karlick&#253; et al. 2020;</ref><ref type="bibr">Qian et al. 2022</ref>). The best-known impact of urbanization on climate is the increase in temperature in cities compared to their surrounding suburban or rural areas. This phenomenon, called urban heat island (UHI), is present in all urban areas to some extent <ref type="bibr">(Oke 1982)</ref>. In cities with hot climates or seasons, UHI effects on temperature can increase discomfort and pose heightened risks of heat stress and mortality in vulnerable communities <ref type="bibr">(Changnon et al. 1996;</ref><ref type="bibr">Stewart and Oke 2012)</ref>, a significant issue for steadily growing urban populations.</p><p>Accepted for publication in Journal of Applied Meteorology and Climatology. DOI 10.1175/JAMC-D-24-0077.1.</p><p>UHI Intensity (UHII) has been studied extensively in cities worldwide <ref type="bibr">(Bornstein 1968;</ref><ref type="bibr">Yag&#252;e et al. 1991;</ref><ref type="bibr">Wong and Yu 2005;</ref><ref type="bibr">Kikon et al. 2016)</ref> and related to city-scale meteorological forcings, with reports of negative correlations with wind speed or cloud cover, positive correlation with population, and variation with wind direction, time of day, or season <ref type="bibr">(Oke 1973;</ref><ref type="bibr">Kim and Baik 2005;</ref><ref type="bibr">Yue et al. 2019)</ref>. Reviews like <ref type="bibr">Mirzaei and Haghighat (2010)</ref> and <ref type="bibr">Rizwan et al. (2008)</ref>, however, have highlighted that the results vary strongly between studies and are highly dependent on the data or methods chosen. For instance, different UHII values are obtained for the same city depending on whether station or satellite data, or air or surface temperature, are used. This sometimes leads to conflicting results, such as UHI being stronger during the day or at night, or the presence or absence of correlation between UHI and wind speed, cloud cover, or population. 
Some of these differences are explained by how UHII is calculated, with conventional methods often misrepresenting the footprint of the UHI (i.e., the specific area within which urban land use affects the temperature field), taking it to be equal to the geographical boundary of the city when studies have shown that it often extends beyond the official city boundaries <ref type="bibr">(Liu et al. 2021;</ref><ref type="bibr">Sharma et al. 2021</ref>). These inconsistencies indicate that further work is still needed to better understand UHI effects on temperature under different circumstances and successfully model them.</p><p>The modification of precipitation induced by urban areas has also been well documented <ref type="bibr">(Huff and Vogel 1978;</ref><ref type="bibr">Shepherd 2005;</ref><ref type="bibr">Niyogi et al. 2011)</ref>. Competing mechanisms such as UHI-induced convergence, storm bifurcation due to building-barrier effect, or aerosol impacts in droplet formation can produce enhanced or reduced precipitation under different climatological conditions, over or around urban areas <ref type="bibr">(Shepherd 2002;</ref><ref type="bibr">Dou et al. 2015;</ref><ref type="bibr">Zhong et al. 2015)</ref>. The meta-analysis by <ref type="bibr">Liu and Niyogi (2019)</ref> concluded that the most common impact of urban areas is enhanced precipitation downwind of a city and, in some cases, enhanced precipitation over the city itself. They also note the importance of studying multiple precipitation events (i.e., a climatology rather than individual events) for the overall effect of urban areas to be representative, and many studies have recommended segmentation by wind direction to discern upwind vs. downwind effects on an event basis <ref type="bibr">(Liu and Niyogi 2019;</ref><ref type="bibr">Moraglia et al. 2024)</ref>. 
Understanding the complex mechanisms and parameters characterizing the urban area with the most significant effect on precipitation modifications within different climatologies and case studies is an ongoing area of research, with further work needed.</p><p>To understand the complex underlying physics involving the interactions between urban areas and climate, the use of Regional Climate Models (RCMs), with their capacity to extend quantitative analysis beyond the scope of observations (such as assessing the impacts of future urbanization on meteorological fields), has proven to be an effective tool for simulating and predicting regional climate patterns <ref type="bibr">(Wang et al. 2004;</ref><ref type="bibr">Giorgi 2019)</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>b. Regional Climate Models (RCMs)</head><p>Constant increases in computational resources have made it possible to use RCMs to perform ever-more complex simulations, enabling a better understanding of the fundamental physics of the urban environment and the interaction of different contributing factors <ref type="bibr">(Hidalgo et al. 2008</ref>). In the context of climate change vulnerability or adaptation studies, the high-resolution prediction capability of an RCM (~1 km grid spacing) and its capacity to resolve urban areas through canopy parametrizations and provide insights about the specific urban processes impacting its weather and climate, often satisfies a crucial need to provide information at appropriate scales for urban planning and policy development (potentially resolving individual components of the urban area; <ref type="bibr">Chen et al. 2011;</ref><ref type="bibr">Sharma et al. 2021)</ref>.</p><p>Given the essential nature of RCMs as tools for understanding how future climate, land use and urbanization settings are likely to affect climate and weather in cities (through their capability to test scenarios different from current and past conditions), it is of critical importance to confirm that the models that provide us with these insights are capable of accurately representing not only the overall climate of a region, but also the basic mechanisms through which the urban areas modify local climate. These issues highlight the need to devote more effort to improving the evaluation of models as we attempt to improve them through ongoing development <ref type="bibr">(Rizwan et al. 2008;</ref><ref type="bibr">Tapiador et al. 2019)</ref>.</p><p>RCM evaluation (sometimes called validation) is usually done by comparing model simulations and observations taken at different temporal and spatial scales <ref type="bibr">(Gleckler et al. 2008</ref>). 
In the simplest form, the simulated meteorological fields are compared in a point-to-point scheme against observed data (e.g., satellite grids or gridded observations based on station data; AghaKouchak and Mehran 2013). In the case of urban performance evaluation, the complexity of these systems, with many different elements interconnecting at varying scales, makes it difficult to evaluate the correct representation of separate urban mechanisms, even more so when most models do not incorporate them all (e.g., atmospheric aerosol chemistry is often omitted from urban modeling studies). Traditional approaches provide useful information on model performance at specific locations, but a more comprehensive comparison of spatial patterns (i.e., spatial covariance) in observations and model simulations is often missing <ref type="bibr">(Gilleland 2013;</ref><ref type="bibr">Li et al. 2019</ref>).</p><p>At larger scales, studies of meteorological fields around urban areas have often found that it is difficult to distinguish between the influence of cities and other factors such as complex terrain, seasonal atmospheric circulation patterns, and the presence of large water bodies, among others. To minimize such effects, researchers often select urban locations in relatively flat terrain away from major water bodies <ref type="bibr">(Oke 1973;</ref><ref type="bibr">Shepherd et al. 2002;</ref><ref type="bibr">Tan and Li 2015)</ref>. This highlights the need for more generalized tools for model evaluation in urban areas, with the capability to discern between urban and other external forcings. 
One statistical tool that can help with this evaluation of underlying patterns is Empirical Orthogonal Function (EOF) analysis.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>c. Empirical Orthogonal Functions (EOFs)</head><p>EOF is a powerful statistical technique that extracts an optimum set of dominant patterns in space-time datasets <ref type="bibr">(North et al. 1982;</ref><ref type="bibr">Neha and Pasari 2022)</ref>. It was first introduced as a tool for studying climate patterns decades ago <ref type="bibr">(Lorenz 1956;</ref><ref type="bibr">Kutzbach 1967</ref>) and has been widely used since then, proving effective in understanding large-scale climate phenomena.</p><p>EOF analysis optimizes the resulting patterns based on the covariance in the data itself and does not include any a priori assumptions about what the spatial patterns should look like (e.g., assumptions about the extent of the urban area over which temperature or precipitation effects will express themselves; Neha and Pasari 2022), while most traditional tools use upfront assumptions about urban characteristics (e.g., urban boundaries) that can restrict the encountered results. The technique is helpful for analyzing and comparing independent patterns instead of only using absolute values of a field, allowing us to focus on underlying phenomena more relevant to our mechanisms of interest (e.g., effects of urban areas).</p><p>Many studies have successfully used EOF analysis to study independent urban-related patterns in meteorological data. EOF analysis has been performed to study the effect of urban areas on temperature <ref type="bibr">(Kim and</ref><ref type="bibr">Baik 2005, Silva et al. 2017, using observed data;</ref><ref type="bibr">Pigeon et al. 2006</ref>, also using model simulations) with EOF patterns showing the area of extent of the influence of the city on temperatures, its relative magnitude, and its variation over time.</p><p>Although less often than with temperature, these studies have also been performed to study urban effects on precipitation <ref type="bibr">(Han et al. 
2014, and</ref><ref type="bibr">Liang and</ref><ref type="bibr">Ding 2017, both</ref></p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>d. Objectives</head><p>Since datasets have constrained domains and limited resolution in space or time, the climate patterns found through EOF analyses are only estimates of the climate variability being analyzed <ref type="bibr">(North et al. 1982)</ref>, and as such, they do not always have a clear physical meaning <ref type="bibr">(Monahan et al. 2009)</ref>. This can happen especially when the data are not affected by physical mechanisms in a consistent manner, such as consistent warming or increased precipitation over a city, or consistent cooling by a mass of water in summer. It is then fair to ask whether EOF analysis can be reliably used as an evaluation tool for urban climate. While many previous studies using EOF to analyze urban climate have gained understanding through the behavior of the patterns found, the current study aims to perform more extensive analyses and tests to verify that the patterns we find in observations have consistent and physically based interpretations.</p><p>Most previous work using EOF analysis on urban areas to study the effect of the city on climate used either observations or model simulations, but the two are not usually analyzed together (as did <ref type="bibr">Pigeon et al. 2006, in Marseille, France)</ref> or compared to each other with the intention of evaluating models (as <ref type="bibr">Koch et al. 2015</ref>, did for hydrology models). 
Our primary objective is to evaluate the capacity of an RCM to reproduce urban climate patterns (in space and time) and to gain confidence in its correct representation of urban mechanisms affecting climate.</p><p>Our objective is not to gain an understanding of the effect of urban areas on local climate for our case study, although we present these results as needed to help evaluate our methodology.</p><p>These new approaches for evaluating models will give us essential insights in future work when testing models under development, comparing models with different configurations, or later using these models to study alternative urban scenarios.</p><p>In this paper, we design a methodology based on EOF analysis to evaluate the ability of an RCM (the Weather Research and Forecasting Model: WRF) to reproduce the underlying spatial and temporal patterns that characterize the urban influence on meteorological fields when comparing against gridded meteorological observations. Our approach incorporates an unbiased calculation of UHII (no specification of urban boundary is needed), uses climatology data to return robust statistics (not only single events), and, potentially, has the natural capability of separating between urban and other forcings when analyzing meteorological fields.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Methods</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>a. EOF Method</head><p>In this application, the EOF analysis disaggregates a 2-D meteorological field (e.g., a gridded temperature or precipitation dataset), arranged with each latitude-longitude position as a row and each time step as a column, into a set of linearly independent spatial patterns (the EOFs) and a set of time-series multipliers associated with each of them [the Principal Components (PCs)]. The analysis also returns the amount of covariance in the original dataset explained by each pattern, which is used to sort them in order of importance <ref type="bibr">(Hannachi et al. 2007</ref>). The steps of the procedure are detailed below and summarized in Fig. <ref type="figure">1</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>1-Select a dataset to analyze:</head><p>Input: Raw gridded meteorological dataset.</p><p>Output: Raw dataset as 2-D matrix.</p><p>The process is done one variable at a time (e.g., total daily precipitation P), forming a matrix with "m" rows (dimension of space) and "n" columns (dimension of time).</p><p>2-Perform data transformation (spatial normalization) to the matrix:</p><p>Input: Raw dataset as 2-D matrix.</p><p>Output: Space anomaly matrix "A".</p><p>EOF analysis usually requires removing the temporal mean of each spatial location to create an anomaly matrix in time, but here we follow a common practice when performing EOF analysis to study UHI effects <ref type="bibr">(Kim and Baik 2005;</ref><ref type="bibr">Pigeon et al. 2006;</ref><ref type="bibr">Silva et al. 2017)</ref> and calculate an anomaly matrix in space instead <ref type="bibr">(Harman 1976)</ref>. That is, at each time step, the mean value of the domain (spatial mean) is subtracted from each cell (i.e., the mean of each column in the raw matrix is subtracted), resulting in the spatial anomaly at that time step rather than the actual value of the variable at that location. This retains spatial information that would otherwise be lost and that is useful for evaluating UHI effects. It also removes the seasonality and temporal trend of the spatial mean of the domain, which otherwise dominate the EOF analysis. 
We call the resulting anomaly matrix "A" (with dimension m x n).</p><p>3-Perform data filtering for P:</p><p>Input: Space anomaly matrix "A".</p><p>Output: Filtered space anomaly matrix "A" (only for precipitation data).</p><p>We considered only days with average precipitation over the domain greater than a certain threshold (1 mm in our study, reducing the number of days by approximately 50%) so that the EOF analysis is not affected by days with trace or zero values.</p></div>
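<div xmlns="http://www.tei-c.org/ns/1.0"><p>As an illustration of Steps 1 to 3, the following minimal sketch builds the spatial anomaly matrix with plain Python lists. The function names (<code>domain_means</code>, <code>spatial_anomaly</code>, <code>filter_wet_days</code>) are hypothetical and are not part of the authors' code. The sketch also assumes that the precipitation threshold is tested on the raw domain mean before the anomalies are computed, since the spatial anomalies of any time step average to zero by construction.</p>
<p><code lang="python"><![CDATA[
```python
# Minimal sketch of Steps 1-3 (hypothetical helper names, not the authors' code).
# The dataset is an m x n matrix: one row per grid cell, one column per day.

def domain_means(raw):
    """Return the spatial (domain) mean of each time step, i.e., of each column."""
    m = len(raw)
    n = len(raw[0])
    return [sum(raw[i][t] for i in range(m)) / m for t in range(n)]


def spatial_anomaly(raw):
    """Step 2: subtract the domain mean of each time step from every cell."""
    means = domain_means(raw)
    return [[row[t] - means[t] for t in range(len(means))] for row in raw]


def filter_wet_days(raw, threshold=1.0):
    """Step 3 (precipitation only): keep days whose domain mean exceeds threshold.

    Assumption: the threshold (1 mm in the study) is tested on the raw field.
    """
    means = domain_means(raw)
    keep = [t for t, mean in enumerate(means) if mean > threshold]
    return [[row[t] for t in keep] for row in raw]


# Toy example: 2 grid cells (rows) and 3 days (columns); the second day is dry.
raw_p = [[1.0, 0.0, 4.0],
         [3.0, 0.0, 2.0]]
A = spatial_anomaly(filter_wet_days(raw_p))
print(A)  # [[-1.0, 1.0], [1.0, -1.0]]
```
]]></code></p>
<p>In the toy example the dry day is dropped, and each remaining column of A sums to zero, which is the property that removes the seasonality and trend of the domain mean.</p></div>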
<div xmlns="http://www.tei-c.org/ns/1.0"><head>4-Add weights to matrix A:</head><p>Inputs: Space anomaly matrix "A" (filtered for precipitation).</p><p>Outputs: Weighted space anomaly matrix "AW".</p><p>This step accounts for differences in cell area where necessary (it can affect the results if the area differences within the domain are large). Matrix A is multiplied entry-wise (also known as the Hadamard product, denoted by the symbol &#8857;) by a matrix "W" (with dimensions m x n) containing the square root of the area of each cell <ref type="bibr">(North et al. 1982;</ref><ref type="bibr">Baldwin et al. 2009)</ref>, creating the weighted matrix AW = A &#8857; W.</p></div>
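<div xmlns="http://www.tei-c.org/ns/1.0"><p>The weighting of Step 4 can be sketched as follows. This is an illustrative sketch rather than the authors' implementation: the function names are hypothetical, and the use of cos(latitude) as a relative cell area is an assumption that holds only for a regular latitude-longitude grid.</p>
<p><code lang="python"><![CDATA[
```python
import math

# Sketch of Step 4 (hypothetical names): weight each grid cell by sqrt(area).

def relative_cell_areas(latitudes_deg):
    """Assumption: on a regular lat-lon grid, cell area scales with cos(latitude)."""
    return [math.cos(math.radians(lat)) for lat in latitudes_deg]


def apply_area_weights(A, areas):
    """Return AW, the entry-wise product of A and W, where W[i][t] = sqrt(areas[i])."""
    weights = [math.sqrt(a) for a in areas]
    return [[w * value for value in row] for w, row in zip(weights, A)]


def mean_weight(areas):
    """Mean of the square-root area weights, used later to rescale the PCs."""
    weights = [math.sqrt(a) for a in areas]
    return sum(weights) / len(weights)


# Toy example with explicit cell areas of 4 and 9 (arbitrary units).
A = [[1.0, 2.0],
     [3.0, 4.0]]
AW = apply_area_weights(A, [4.0, 9.0])
print(AW)                       # [[2.0, 4.0], [9.0, 12.0]]
print(mean_weight([4.0, 9.0]))  # 2.5
```
]]></code></p>
<p>Because the weight depends only on the cell, every column of W is identical, so W can be stored as a vector of length m instead of a full m x n matrix.</p></div>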
<div xmlns="http://www.tei-c.org/ns/1.0"><head>5-Carry out the EOF analysis:</head><p>Inputs: Weighted space anomaly matrix "AW".</p><p>Outputs: Preliminary spatial patterns "E", associated timeseries "Z", and covariance explained "&#923;". We used the Python package "eofs" <ref type="bibr">(Dawson 2016)</ref>, which uses singular value decomposition (SVD) to decompose the "AW" matrix into the following form:</p><p>where &#119880; is an m x m matrix, &#120564; an m x n matrix, &#119881; an n x n matrix, and the superscript ( ) T indicates a transposed matrix. It is straightforward to show that U from the SVD is the m x m matrix of eigenvectors (the EOFs), which we will call "E". The diagonal matrix &#120556; of eigenvalues and the matrix of time series multipliers ZW of dimension m x n (the PCs) can be calculated as follows:</p><p>ZW has values altered by the weighting applied to A, which needs to be removed to return to the unweighted scales for analysis. We divide ZW by the mean value of W (i.e., the mean of the square root of the area of each cell, denoted &#119882;):</p><p>Each column vector of the matrix E (Ek of dimension m x 1) corresponds to a spatial pattern at position "k" (also called mode k). The eigenvalues are scalars associated with each mode k and indicate how much of the total covariance each pattern Ek explains (sorted from high to low, e.g., the first vector E1 explains the most covariance). Each row vector of the matrix Z (Zk of dimension 1 x n) corresponds to the time series multipliers associated with mode k.</p><p>If we multiply a pattern corresponding to any mode "k" (Ek) by its associated time series multiplier (Zk), the result is a matrix Xk of dimension m x n, which can be thought of as the contribution of the kth spatial pattern (Ek) to the original data through time:</p><p>The patterns Ek multiplied by their respective time series Zk and added over all modes exactly reproduce the original anomaly matrix A. 
That is:</p><p>or in matrix form:</p></div>
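<div xmlns="http://www.tei-c.org/ns/1.0"><p>For reference, the relations described in Steps 4 and 5 can be written compactly as below. This is a reconstruction from the definitions given in the text, not the authors' typeset equations. In particular, the normalization of the eigenvalues by n - 1 is an assumption (a common convention for a sample covariance), and the last line restates the reconstruction property as it is described in the text.</p>
<p><formula notation="TeX"><![CDATA[
```latex
\begin{align}
\mathit{AW} &= A \odot W \\
\mathit{AW} &= U \, \Sigma \, V^{T}, \qquad E = U \\
\mathit{ZW} &= E^{T} \, \mathit{AW} = \Sigma \, V^{T}, \qquad
\Lambda = \frac{\Sigma \, \Sigma^{T}}{n - 1} \quad \text{(assumed normalization)} \\
Z &= \frac{\mathit{ZW}}{\overline{W}} \\
X_k &= E_k \, Z_k \\
A &= \sum_{k} E_k \, Z_k = E \, Z
\end{align}
```
]]></formula></p>
<p>Here E_k is the kth column of E (m x 1) and Z_k is the kth row of Z (1 x n), so each X_k is an m x n matrix, matching the dimensions stated in Step 5.</p></div>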
<div xmlns="http://www.tei-c.org/ns/1.0"><head>6-Applying scaling factors:</head><p>Inputs: Preliminary spatial patterns "E" and associated timeseries "Z".</p><p>In the case of a mode whose pattern is associated with the influence of the urban area on temperature, PCk is an estimate of the contribution of the city to the UHII over time.</p><p>After the transformation, the original EOF equation of Step 5 is maintained: </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>b. Data</head><p>For this project we used two datasets, selected because they meet the requirements of our work. The first is based on observations and has long and complete records, making the EOF analysis more reliable. The second is a long-term simulation from a model application still under development, which allows our evaluation of its performance to find possible weaknesses to address. It must be noted that the final purpose of this project is to evaluate a new methodology and not to draw conclusions about the datasets themselves. It is, in fact, convenient for the RCM to be a "work in progress" so that our tools can help identify needed areas of improvement (particularly with respect to urban characterization).  <ref type="bibr">et al. (2023)</ref>. The gridded meteorological dataset (OBS) contains daily minimum and maximum temperature (Tmin and Tmax, respectively) and precipitation (P) values over the US Midwest and Great Lakes Region. The dataset was developed following the procedures and corrections of <ref type="bibr">Maurer et al. (2002)</ref> and <ref type="bibr">Hamlet and Lettenmaier (2005)</ref> to interpolate Global Historical Climatology Network daily (GHCN-daily) stations. It also includes a novel approach to correct P undercatch issues, increasing P by 10 to 15% in summer months (higher in winter). Measurement information and methods for the GHCN-daily stations can vary, and while they aim to represent a 24-hour period ending at local midnight, many observations are taken in the local morning or evening, which might cause some issues when comparing against simulations (especially for the daily timing of storms). The spatial resolution of the interpolated OBS dataset is 1/16th degree latitude/longitude (about 5 x 7 km at 45°N latitude), and the temporal extent is 1915-2021. 
For the comparison against model simulations performed in this paper, only summer months of 1994-2000 (May-August) were used, resulting in 840 days of data.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>1) OBSERVATIONS (OBS):</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>2) WRF DATA:</head><p>To test the diagnostic capabilities of our proposed methodology and compare it against the results from the OBS dataset, we used a 4 km WRF simulation over the North American region (Fig. <ref type="figure">S1</ref>) being performed at the Argonne National Laboratory. The domain selected for our case study is the Chicago Metro Area (CMA; Fig. <ref type="figure">2</ref>). The CMA has a large spatial extent, and Chicago is the third-largest city in the U.S. by population. The city has a complex interaction with the adjacent Lake Michigan, making analyses of urban climate a good challenge to test our methodology.</p><p>Section 2 of the Supplemental Material describes the process used to select the horizontal domain size. Changes in domain size can result in phenomena of different scales dominating the patterns explaining the data, with smaller effects (such as the city influences) being masked if domains are too large. Figure <ref type="figure">S2</ref> shows an example of the domain size selection process for a single variable and dataset. The only output of this process is the chosen domain for the CMA, here selected as a horizontal box of 2.4 degrees per side, centered at 41.88°N latitude and 87.63°W longitude.</p><p>Figure <ref type="figure">2</ref> shows general information about our domain. Figure <ref type="figure">2a</ref> shows the location of the meteorological stations we used as point comparison for our datasets (only stations with data for our time period are shown). Figure <ref type="figure">2b</ref> shows the topography of the area, mostly flat, with higher elevations (around 100 m difference) in the northwest area of the domain.  
The first analysis we do with our datasets is to evaluate them against meteorological station data in a point-to-point fashion, using Root Mean Square Error, <formula notation="TeX">\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i}(obs_i - sim_i)^{2}}</formula> (where n is the common length of the datasets), Pearson's correlation, <formula notation="TeX">r = \frac{\sum_{i}(obs_i - \overline{obs})(sim_i - \overline{sim})}{\sqrt{\sum_{i}(obs_i - \overline{obs})^{2}\,\sum_{i}(sim_i - \overline{sim})^{2}}}</formula> (considered good here when above 0.7), and percentage bias in the mean, <formula notation="TeX">\%\mathrm{bias} = 100\,(\overline{sim} - \overline{obs})/\overline{obs}</formula> (considered good here when lower than 10%). These metrics are calculated for each cell and station pair. When calculating a single value of %bias for the domain, the absolute value of the %bias of each cell and station pair is used, to avoid positive and negative biases cancelling out. Methods and more detailed results are also discussed in the Supplemental Material, Section 3.</p></div>
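<div xmlns="http://www.tei-c.org/ns/1.0"><p>The three point-comparison metrics can be sketched in a few lines of standard-library Python. The function names are hypothetical, and averaging the absolute %bias over all cell and station pairs to obtain the single domain value is an assumption of this sketch; the text only states that absolute values are used.</p>
<p><code lang="python"><![CDATA[
```python
import math

# Sketch of the point-comparison metrics (hypothetical function names).

def rmse(obs, sim):
    """Root mean square error between paired observed and simulated values."""
    n = len(obs)
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / n)


def pearson_r(obs, sim):
    """Pearson correlation coefficient between paired series."""
    n = len(obs)
    obs_mean = sum(obs) / n
    sim_mean = sum(sim) / n
    cov = sum((o - obs_mean) * (s - sim_mean) for o, s in zip(obs, sim))
    var_obs = sum((o - obs_mean) ** 2 for o in obs)
    var_sim = sum((s - sim_mean) ** 2 for s in sim)
    return cov / math.sqrt(var_obs * var_sim)


def percent_bias(obs, sim):
    """Percentage bias in the mean: 100 * (mean(sim) - mean(obs)) / mean(obs)."""
    obs_mean = sum(obs) / len(obs)
    sim_mean = sum(sim) / len(sim)
    return 100.0 * (sim_mean - obs_mean) / obs_mean


def domain_abs_percent_bias(pairs):
    """Assumed aggregation: mean of |%bias| over (obs, sim) pairs."""
    return sum(abs(percent_bias(o, s)) for o, s in pairs) / len(pairs)


# Toy example: the simulation is uniformly 1 unit higher than the station.
obs = [1.0, 2.0, 3.0, 4.0]
sim = [2.0, 3.0, 4.0, 5.0]
print(rmse(obs, sim), pearson_r(obs, sim), percent_bias(obs, sim))
```
]]></code></p>
<p>The toy example shows why the metrics are read together: a constant offset leaves the correlation at 1 while the RMSE and the %bias both flag the error.</p></div>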
<div xmlns="http://www.tei-c.org/ns/1.0"><head>e. EOF Evaluation</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>1) URBAN EFFECTS</head><p>To evaluate our methodology, we first tested its application using the observation-based (OBS) dataset. We want to evaluate whether we can identify EOF patterns related to physical mechanisms (specifically urban effects) and see if these are clear enough to be used as a representation of real underlying patterns. The physical meaning attributed to each EOF is established by examining the spatial patterns (e.g., noting the presence or absence of the urban areas) and by considering the values of the PCs, which determine the sign and magnitude of the patterns through time. This is done first only on the observation-based dataset to gain confidence in the capacity of our tool to differentiate meaningful results from statistical noise before moving on to evaluate the capacity of RCMs to reproduce these patterns. These results are also needed for the methodology sensitivity tests.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>2) METHODOLOGY SENSITIVITY TESTS</head><p>Since the EOF analysis is not based on physical principles but is a purely statistical procedure, some or all of the resulting patterns might not be related to real phenomena and may instead be statistical noise for the particular set of data used as input to the analysis. It is then essential to verify whether the results remain consistent under different input conditions chosen by the user, such as the period of time selected or the domain characteristics. The first sensitivity analysis evaluates sets of data from different time periods with no overlapping days. For this, the EOF methodology is applied to the OBS dataset using groups of seven years, starting in 1980-1986 and ending in 2008-2014. The resulting EOF patterns and PCs are compared to evaluate differences and similarities, especially in the effect of the urban area. This test is critical because, if nonoverlapping data return similar patterns, it suggests that such patterns reflect physical mechanisms rather than sampling noise.</p><p>The remaining sensitivity tests performed on the OBS dataset are described and presented in the Supplemental Material (Section 5.2). They consist of an analysis of data length using input periods of different sizes (7 to 35 years), which informs us about the accuracy of the results when using a reduced number of years (often necessary for WRF simulations); an analysis of domain location where the center location is moved by 0.3 degrees in each cardinal direction; and an analysis of domain size using distances from the center from 0.8 to 1.6 degrees in all cardinal directions. The EOF patterns and PCs are evaluated for all these tests, focusing on urban-related patterns. The goal is to determine if the parameters chosen for the remaining work are a good representation of the general characteristics of the area.</p></div>
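<div xmlns="http://www.tei-c.org/ns/1.0"><p>The nonoverlapping-period test can be set up as in the sketch below. The helper names are hypothetical, and the similarity measure (absolute spatial correlation between two flattened EOF patterns) is an assumption chosen for illustration; the text does not state which quantitative measure, if any, was used to compare patterns between periods.</p>
<p><code lang="python"><![CDATA[
```python
import math

# Sketch of the nonoverlapping-period test (hypothetical helper names).

def nonoverlapping_windows(first_year, last_year, length):
    """Return consecutive (start, end) year windows with no shared years."""
    windows = []
    start = first_year
    while start + length - 1 <= last_year:
        windows.append((start, start + length - 1))
        start += length
    return windows


def pattern_similarity(eof_a, eof_b):
    """Absolute spatial correlation between two EOF patterns (flattened lists).

    Flipping the sign of both an EOF and its PC leaves their product unchanged,
    so the sign of a pattern is arbitrary and the absolute value is used here.
    """
    n = len(eof_a)
    mean_a = sum(eof_a) / n
    mean_b = sum(eof_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(eof_a, eof_b))
    var_a = sum((a - mean_a) ** 2 for a in eof_a)
    var_b = sum((b - mean_b) ** 2 for b in eof_b)
    return abs(cov / math.sqrt(var_a * var_b))


windows = nonoverlapping_windows(1980, 2014, 7)
print(windows)
# [(1980, 1986), (1987, 1993), (1994, 2000), (2001, 2007), (2008, 2014)]
```
]]></code></p>
<p>The five windows span 1980-2014 without sharing any day, and the third window coincides with the 1994-2000 period used for the comparison against WRF.</p></div>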
<div xmlns="http://www.tei-c.org/ns/1.0"><head>f. EOF Application</head><p>The method for model evaluation consists of applying EOF analysis to both the observation-based dataset and the RCM simulations and comparing the resulting patterns. We evaluate how closely the model reproduces the effect that different physical mechanisms have on the meteorological fields by pairing EOF patterns with similar characteristics and doing the following: 1) visual comparison of pattern shapes (e.g., noting the presence or absence of the urban areas), 2) comparison of pattern importance (i.e., the fraction of covariance explained in the original data), and 3) comparison of pattern behavior over time by looking at the consistency, signs, magnitudes, and correlation in time of their PCs.</p><p>To complement the comparison of pattern behavior, we also evaluate whether the correlation between the PCs (r) is statistically significant using t-statistics. Our null hypothesis is that r = 0; the t value is calculated as <formula notation="TeX">t = r\sqrt{n-2}/\sqrt{1-r^{2}}</formula>, where n is the sample size, and the critical value t* is calculated with a significance level of 0.025 in each tail (two-tailed test with 95% confidence). If |t| &gt; t*, we conclude that the correlation is significantly different from zero. Note that, given the little change in t* for different data lengths (e.g. <gap/></p><p>With this, we can evaluate the performance of the model not only in reproducing the overall meteorological fields, but also in reproducing the patterns generated by specific physical mechanisms, focusing on the urban area.</p></div>
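<div xmlns="http://www.tei-c.org/ns/1.0"><p>The significance test on the PC correlation can be sketched as follows. This is an illustration under stated assumptions rather than the code used in the study: the sample sizes (861 daily values and 28 monthly values, i.e., seven May-August seasons) are hypothetical, the function names are introduced here, and when no critical value is supplied the normal quantile is used as a large-sample stand-in for the Student-t critical value. For short series, a tabulated t* should be passed explicitly (2.056 for 26 degrees of freedom at the 0.025 level per tail).</p>
<p xml:space="preserve">
```python
import math
from statistics import NormalDist


def correlation_t_statistic(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r**2) for a correlation r from n samples."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


def is_significant(r, n, t_crit=None, alpha=0.05):
    """Two-tailed test of the null hypothesis r = 0.

    Assumption of this sketch: if t_crit is not given, the standard normal
    quantile replaces the Student-t critical value (reasonable for large n).
    """
    if t_crit is None:
        t_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return abs(correlation_t_statistic(r, n)) > t_crit


# Hypothetical sample sizes: 7 summers x 123 days, and 7 summers x 4 months.
N_DAILY = 861
N_MONTHLY = 28

t_daily = correlation_t_statistic(0.09, N_DAILY)
print(round(t_daily, 2))
print(is_significant(0.09, N_DAILY))                   # daily r = 0.09
print(is_significant(0.49, N_MONTHLY, t_crit=2.056))   # monthly r = 0.49
print(is_significant(0.02, N_DAILY))                   # daily r = 0.02
```
</p>
<p>Because t grows with the square root of n, a small daily correlation can be significant in a long series while the same value would not be in a short one.</p></div>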
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Results</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>a. Temperature and Precipitation Patterns</head><p>Figures <ref type="figure">3a</ref>, <ref type="figure">b</ref>, and <ref type="figure">c</ref> show the mean summer values for the OBS dataset for Tmax, Tmin, and P, respectively, calculated as the temporal mean of the annual values (total annual P or mean annual T) from 1994 to 2000. Tmax shows higher temperatures to the south and lower temperatures to the north, with a high temperature spot in the left portion of the city. The highest temperature of the domain is not located in the city center, likely due to the cooling effect of the lake during the summer months. Tmin also shows higher temperatures to the south and a high temperature spot around the city center. In neither temperature case do we see a UHI effect elevating temperatures across the entire area of the city, with effects of latitude, elevation, or the lake (Fig. <ref type="figure">2</ref>) likely contributing to the final result. P shows decreased precipitation inside and north of the city, as well as in the south-west area of the domain, with a high P spot close to the lower right corner. A more detailed climatology analysis, looking at a larger domain and temporal trends, can be found in the Supplemental Material, Section 4.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>b. Point Evaluation Against Stations</head><p>Following section 2.d, Table <ref type="table">1</ref> shows, for the OBS and WRF datasets, mean domain values as well as mean RMSE, correlation and abs %bias against GHCNd stations. We found the OBS dataset to be a good representation of the observations, with good matching against the station data, as expected. Results showed a mean r around 0.99 and mean abs %bias around 0.5% for Tmin and Tmax, and 9% for P, likely due to the undercatch correction applied on OBS. The WRF simulation showed bad results for the simulated P, with a correlation close to zero and a high RMSE of 13.5 mm. The mean abs %bias was close to 10%, similar to the OBS dataset, likely again due to the simulations not having precipitation undercatch issues present in station measurements. Tmax and Tmin presented moderate agreements with the stations, with RMSE close to 6 &#176;C and mean r around 0.4. Tmax showed a low mean abs %bias of 2.7%, while Tmin presented a high bias inside the urban area and around the lake (%bias &gt; 30%, Fig. <ref type="figure">S3</ref>), indicating a general overestimation of the UHI effect. Our goal now is to see whether the EOF analysis confirms some of these issues related to urban behavior in the model (e.g., showing a high bias in urban heating), or highlights new or different ones.</p><table><row role="label"><cell></cell><cell>OBS Tmax (&#176;C)</cell><cell>OBS Tmin (&#176;C)</cell><cell>OBS Prcp (mm)</cell><cell>WRF Tmax (&#176;C)</cell><cell>WRF Tmin (&#176;C)</cell><cell>WRF Prcp (mm)</cell></row><row><cell role="label">Mean Values</cell><cell>25.66</cell><cell>14.35</cell><cell>3.8</cell><cell>25.49</cell><cell>15.85</cell><cell>3.68</cell></row><row><cell role="label">Mean RMSE</cell><cell>0.47</cell><cell>0.27</cell><cell>1.87</cell><cell>5.9</cell><cell>5.83</cell><cell>13.53</cell></row><row><cell role="label">Mean Correlation</cell><cell>0.99</cell><cell>1</cell><cell>0.96</cell><cell>0.4</cell><cell>0.43</cell><cell>0.01</cell></row><row><cell role="label">Mean Abs %Bias</cell><cell>0.6</cell><cell>0.51</cell><cell>8.91</cell><cell>2.74</cell><cell>11.82</cell><cell>10.89</cell></row></table></div>
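<div xmlns="http://www.tei-c.org/ns/1.0"><p>For reference, the three point-comparison statistics reported in Table 1 can be computed for a single station as in the sketch below. The exact definitions used in this study are not restated here, so the formulas are assumptions: RMSE and the Pearson correlation follow their standard forms, and the absolute percent bias is taken as the absolute difference of the totals relative to the station total. The function names and the four-day series are hypothetical.</p>
<p xml:space="preserve">
```python
import math


def rmse(sim, obs):
    """Root-mean-square error between a gridded series and a station series."""
    return math.sqrt(sum((s - o) ** 2 for s, o in zip(sim, obs)) / len(obs))


def pearson_r(sim, obs):
    """Pearson correlation coefficient between the two series."""
    n = len(obs)
    mean_s = sum(sim) / n
    mean_o = sum(obs) / n
    cov = sum((s - mean_s) * (o - mean_o) for s, o in zip(sim, obs))
    var_s = sum((s - mean_s) ** 2 for s in sim)
    var_o = sum((o - mean_o) ** 2 for o in obs)
    return cov / math.sqrt(var_s * var_o)


def abs_percent_bias(sim, obs):
    """Absolute percent bias (assumed definition: |sum(sim) - sum(obs)| / |sum(obs)|)."""
    return abs(sum(sim) - sum(obs)) / abs(sum(obs)) * 100.0


# Hypothetical daily Tmin (deg C) at one station and at the nearest grid cell,
# where the gridded product is uniformly 1 deg C too warm.
station = [20.0, 22.0, 25.0, 23.0]
gridded = [21.0, 23.0, 26.0, 24.0]

print(rmse(gridded, station))
print(pearson_r(gridded, station))
print(abs_percent_bias(gridded, station))
```
</p>
<p>The example shows why the three statistics are reported together: a constant warm offset leaves the correlation at 1 while still producing a nonzero RMSE and bias.</p></div>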
<div xmlns="http://www.tei-c.org/ns/1.0"><head>c. EOF Evaluation</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>1) URBAN EFFECTS</head><p>The EOF analysis returns a series of EOF-PC pattern pairs, and we evaluate their likelihood of being related to some physical mechanism. For the case of Tmax and Tmin (Fig. <ref type="figure">4</ref>), EOF-1 shows a north-south dipole, indicating mostly warmer temperatures to the south, therefore likely related to the effect of latitude or elevation differences (Fig. <ref type="figure">2b</ref>), with some possible mixing of lake or urban effect for Tmax. In <gap/> the city could be explained by issues with the station data, mixing of the EOF with the effect of the lake, temperature advection consistently shifting the location of the urban effect on T, or other sources of data noise.</p><p>For the case of P, no physical mechanism could be assigned to any case and, more importantly, no clear effect of the urban area could be detected. The sparse nature of summer thunderstorms and the inconsistent nature of urban modification of precipitation due to conflicting impacts (e.g., precipitation inhibition due to building-barrier or precipitation enhancement when high UHII is present), together with some of these effects being dependent on wind direction (which we do not take into account here), make finding consistent patterns challenging. We note, however, that other seasons explored in the Supplemental Material Section 6 show a more evident pattern of urban influence, especially in fall and winter (Fig. <ref type="figure">S14</ref>, EOF-3), suggesting substantial differences in the urban influence on large-scale cyclonic storms in winter and relatively small-scale convective storms in summer. A more detailed description of the results can be found in the Supplemental Material, Section 5.1.</p></div>
<figure xmlns="http://www.tei-c.org/ns/1.0"><figDesc>For each variable, columns show different EOF modes for: (top) EOF patterns, each including the percentage of total covariance explained for each variable, and (bottom) distribution of daily PCs associated with each pattern, shown as boxplots for the daily values (&#176;C) in each year. Boxplot boxes indicate 1st to 3rd quartile, whiskers indicate the farthest data point lying within 1.5 times the interquartile range from the box, orange lines indicate median, and the red line follows the mean values.</figDesc></figure><div xmlns="http://www.tei-c.org/ns/1.0"><head>2) SENSITIVITY TESTS</head><p>Following the methods described in 2.d.2, sensitivity tests were performed on all variables for the first three EOFs for the OBS dataset. The results presented here summarize the most relevant points.</p><p>Figure <ref type="figure">5</ref> shows the resulting EOF-2 for Tmax and Tmin when using data from independent periods of seven years. EOF-2 is shown here because it is the most clearly related to urban effects, but the other modes give similar results. The patterns for different years are similar for both variables, which provides confidence that the shapes are based on robust physical mechanisms present in urban areas, such as heating from anthropogenic sources, building trapping of radiation, lack of latent heat, or higher thermal mass, and are not just random statistical constructions. 
The patterns are not expected to be identical, since the input data differ for each period, and while some changes could be associated with changes in land use or climate through time (e.g., the growing area of influence of the city for Tmin being related to increased urbanization), other shapes for both variables seem to be affected by unclear behavior of particular stations during those periods (e.g., the spots forming south of the lake in the years 1987-1993 and 1994-2000 for Tmax). The PCs show similar positive tendencies for all periods, with some fluctuations in their magnitudes due to climate variability.</p><p>The Supplemental Material Section 5.2 includes the result of the sensitivity analysis on the time length of the input data (Fig. <ref type="figure">S9</ref>), the location of the center of the domain (Fig. <ref type="figure">S10</ref>), and the size of the domain (Fig. <ref type="figure">S11</ref>). The resulting patterns present similar overall results to those found in Fig. <ref type="figure">5</ref>, with minor disagreement between the different time periods. For time length, using longer periods of time results in more precise patterns, with more distinct areas and fewer spots outside the urban region, but the seven summers used are sufficient to reproduce the correct shapes and magnitudes. Domain characteristics result in minor changes in the resulting EOF shapes, especially for changes in center location, so care must be taken when selecting an appropriate one.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>d. WRF Evaluation</head><p>Following the methods described in Section 2.e, we compared the EOF patterns between the OBS and WRF datasets. Since our goal is to evaluate the representation of physical mechanisms by the model and compare them to the observations, the figures in this section show a side-by-side comparison of the EOF patterns associated with similar underlying phenomena rather than being strictly ordered by fraction of covariance explained. For example, for Tmin, EOF-2 from the observations is compared to EOF-1 from the model simulations because both exhibit urban characteristics, even if they don't show up in the same rank position.</p><p><gap/> 34%, but a similar spatial pattern appears as EOF-2 for the WRF dataset with only 17% of the total variance explained. The daily correlations are low, around 0.09, but the monthly correlations have a higher value of 0.49, both of which are statistically significant with 95% confidence. Notably, while the PCs for the OBS dataset tend to be positive, the PCs for the WRF case show no overall tendency, indicating a similar chance of having higher temperatures in the south or north of the domain due to this pattern. This suggests that the dominant spatial signature of the observations and simulations is different. The observations strongly suggest a latitudinal gradient, but the WRF simulations are less consistent with this interpretation. Somewhat higher elevation in the northwest part of the domain (Fig. <ref type="figure">2b</ref>) could also contribute to this pattern.</p><p>We identified the strongest urban-related patterns in EOF-2 for the observations and EOF-1 for the WRF simulations (Fig. <ref type="figure">6d-f</ref>). 
The spatial signature for the urban area is similar for both datasets, with mostly positive PCs indicating higher temperatures inside or close to the city area and lower temperatures in the surroundings (note that negative PC values indicate the possible presence of an Urban Cool Island developing in a few instances). In the case of the OBS dataset, the EOF pattern is close to the geographic boundary of the city (land cover in Fig. <ref type="figure">2c</ref>) but with a larger extension and a clear difference between the center and the perimeter of the urban area. The urban area from the WRF dataset, however, follows the exact geographic boundary of the city. This is expected, since the model simulation identifies those cells as urban and modifies the land surface properties accordingly, but it also highlights a possible lack of T diffusion and advection from the urban area to the surroundings in the model.</p><p>In Section 3.a, we noted that the WRF model had a strong positive bias over the city, resulting here in the city pattern dominating the EOF analysis for WRF, explaining 65% of the total variance and showing the strongest urban pattern as EOF-1. For the OBS dataset, only 17% of the covariance is explained by the strongest urban pattern, which is ranked as EOF-2. But even though there is an important bias issue and a resulting overemphasis on the urban influence on Tmin in the WRF simulations, the comparison of the shape and behavior of the patterns over time shows good results, with a correlation of 0.1 and 0.57 for daily and monthly data, respectively, higher than any other pattern in our study and both statistically significant. EOF-3 for both datasets shows a northeast-southwest pattern, but we do not see a clearly defined tendency in the values of the PCs to help us define a significant physical meaning.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>We can see that the model does a good job of representing this underlying climate pattern from the observations, even without a straightforward interpretation. The patterns show a similar shape, a similar fraction of the variance explained (close to 7% in both cases), and little difference in the PC magnitudes as shown in the boxplots, but with low correlations <gap/></p><p>We see good overall agreement in the resulting EOF patterns for Tmin, with some differences in the exact city boundary in the urban-related EOFs. There is considerable disagreement on the importance of each pattern, with the urban EOF for WRF (EOF-1) explaining most of the covariance in the data (65%), whereas in the observations the urban EOF (EOF-2) explains a much lower fraction of the covariance (17%). The correlation of the PCs is the strongest of the three variables we assessed, and the only one that produced statistically significant correlations at daily scale. The spread of the values for the PCs, however, shows important differences between the two datasets, likely due to the high bias of WRF inside the urban area.</p><p>Figure <ref type="figure">7</ref> shows the same analysis performed on the daily Tmax variable. Both datasets display a north-south dipole as EOF-1 with mostly positive PCs for the main pattern, indicating warmer temperatures in the southern part of the domain. For the WRF simulations, this pattern has greater importance, with more than 57% of the total variance explained vs. only 38% for the OBS dataset. The correlation between the corresponding PCs gives similar values to Tmin, with 0.02 and 0.55 for daily and monthly data, respectively, but here only the monthly value is statistically significant. EOF-1 for the WRF analysis shows a substantial urban effect on Tmax, whereas this pattern is notably absent in EOF-1 for the observations. 
In the same way as seen for Tmin, this pattern is most likely associated with the effect of latitude on temperature, with possibly some influence of elevation differences (Fig. <ref type="figure">2b</ref>). EOF-2 for both datasets is an urban-shaped pattern, with mostly positive PCs (as with Tmin, negative PC values indicate the possible presence of an Urban Cool Island developing in some cases). Unlike the case of Tmin, here the spatial distribution of the patterns has some substantial differences. The WRF pattern follows the geographic boundary of the city (as expected, given the land use pattern specified in the model implementation) and shows areas with warmer temperatures to the north and southwest of the city. EOF-2 for the OBS dataset, however, while exhibiting higher temperatures inside the city, doesn't follow its boundary and shows systematic warming in the southeast areas of the domain. Higher density of measurements close to the city boundary, remote sensing observations (e.g., satellites), or more complex representations of the urban area in model implementations might resolve a smoother transition between urban and rural areas. The magnitudes of the PCs also show substantial differences, with a much smaller spread for the WRF results. The correlation between the PCs in this case is negative for daily and monthly cases, statistically significant with a value of -0.54 for the latter, indicating that the pattern related to the urban effect on Tmax is not well represented by the model, and/or it is not well captured by the EOF analysis on the OBS dataset. EOF-3 for both datasets shows an east-west pattern, which may be related to the effect of the lake. The spatial patterns and fraction of covariance explained are quite similar. 
For the OBS dataset, lake effects are a feasible interpretation since the values of the PCs are mostly positive, indicating lower temperatures closer to the lake, as expected in summer. For the WRF dataset, however, the PCs are equally represented in positive and negative values, indicating little systematic effect. In this case, the correlation between the patterns is <gap/></p><p>We see moderately good overall agreement in the resulting EOF patterns for Tmax, with close agreement on the importance of each pattern, and the main differences happening in the urban-related EOF. There is relatively poor correspondence between the patterns in time, however, with low correlations for EOF-2 and EOF-3, and some important differences in the range of values of the PCs, especially for the urban-related EOFs. For the WRF analysis, significant urban effects are apparent in EOF-1 and EOF-2. For the OBS dataset, EOF-2 and EOF-3 show essentially all the UHI effects on Tmax.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>2) PRECIPITATION</head><p>Figure <ref type="figure">8</ref> shows the results from the EOF analysis applied to P data for the OBS dataset and the WRF simulations, during the simulated summer months. EOF-1 is a north-south dipole for both datasets. The shapes look similar, and the fractions explained by each are close: 24 and 27% for OBS and WRF, respectively. The area to the north has higher elevations (about 100 m difference, Fig. <ref type="figure">2b</ref>), but precipitation enhancement does not seem to be the effect captured by this pattern. The boxplots of the PCs look comparable in magnitude and extent but are not consistent in sign, indicating that north and south anomaly values are typically opposite, but neither has a predominant influence on P. The correlation between them is low, only 0.09 and 0.14 for daily and monthly data, respectively. EOF-2 for both cases shows positive values in a horizontal band in the middle of the domain. The fraction explained is similar again, 16 and 12%. The PCs have similar average values but low correlations of -0.04 and 0.14 for daily and monthly data, respectively. EOF-3 is a northeast-southwest pattern and shows more differences in spatial shapes between the two datasets. The PCs again offer no important characteristics, and the correlations are small and negative, -0.06 and -0.10 for daily and monthly values, respectively.</p><p>Overall, we see good agreement in the resulting pattern shapes and on pattern importance.</p><p>The range of values for the PCs is also similar for the two datasets. The correlations for the associated patterns, however, are the lowest of the three analyzed variables, maybe in part due to misalignment of storm location in WRF simulations. 
None of the comparisons returned statistically significant correlations, at daily or monthly scale, although we should note that precipitation data has a lower sample size due to the removal of days without precipitation, making their t-score values lower. There were no patterns with predominant PC signs that would help identify systematic effects of the urban environment on summer storms.</p><p>As mentioned before, this does not mean that the city is not influencing precipitation; it means only that this influence is not consistent enough to show up as clear patterns in our current analysis due to storm direction, sparse small-scale storms, or conflicting impacts of the urban area.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Conclusion</head><p>We developed a tool based on EOF analysis to evaluate and compare underlying climate patterns from meteorological fields in the Chicago Metro Area (CMA), with the primary objective of aiding in model evaluation efforts, particularly in the correct representation of urban areas. The methodology includes the correct selection of domain for analysis, steps to prepare the data and apply the EOF technique, post-processing of the results to get meaningful values, and a framework to make these pattern comparisons and obtain useful conclusions. We used the methodology on two datasets, one based on interpolation of observed station data (OBS) and the other from WRF simulations at 4 km resolution. We analyzed daily precipitation (P) and minimum and maximum temperatures (Tmin and Tmax, respectively) for the summer months (May-August) of 1994 to 2000.</p><p>Results showed that the OBS dataset had good agreement when evaluated against GHCNd station data in a point-to-point manner, as was expected. The same analysis for the WRF simulations showed poor correlation results for P (r ~ 0), even on a monthly scale, and a high positive bias in some stations over the city (%bias &gt; 20%). WRF temperatures exhibited higher daily correlation values and lower biases in most cases, except for a high positive bias inside the urban area and close to the lake for Tmin (over 30%).</p><p>A sensitivity analysis was performed with the OBS dataset to evaluate the variability of the EOF patterns under different conditions. The first analysis considered different periods of time, finding low variability in spatial patterns when using different periods of identical length, and when considering different numbers of years. 
Variations in domain characteristics gave similar results, with the patterns remaining mostly constant, but with sensitivity to the center location of the domain resulting in more pronounced differences for certain Tmin cases.</p><p>We conclude that the resulting patterns from the analysis are consistent through space and time and are likely related to physical structures and not statistical artifacts.</p><p>An application on RCM evaluation was carried out by performing EOF analysis on WRF simulations and the OBS dataset, and comparing the resulting patterns. None of the EOF pairs reached a PC correlation of more than 0.10 for daily data, but some were close to 0.6 for monthly values. Tmin is the field that shows the best pattern results for the model, with the first three EOFs looking close in shape to the ones from the OBS dataset and showing positive correlations in all cases, with values that are significantly different from zero for the first two EOFs. The daily correlations are small (0.09 for EOF-1, 0.10 for EOF-2, and 0.06 for EOF-3), but the monthly correlations are considerably higher (0.49 for EOF-1, 0.57 for EOF-2, and 0.22 for EOF-3). The importance given to the urban-EOF is much higher in the WRF simulations than in the OBS dataset, 65% against 17%. This shows that the UHI effect on Tmin is dramatically overstated in the WRF simulations.</p><p>For Tmax, EOF-1 and EOF-3 have similar patterns to the observations (north-south and east-west dipoles), and the correlations for monthly values are 0.55 and 0.09, respectively. EOF-2 for both datasets is the one identified with urban influence, although the OBS dataset shape differs noticeably from the city boundaries. The correlation for this EOF is negative and large, -0.54 for monthly values. 
The reason for this discrepancy between model and observations is not clear, but points to an issue that was not evident in the point-to-point evaluation and should be further investigated. For Tmax, only the monthly correlations for the first two EOFs were statistically significant. For P, the spatial patterns for the two datasets were similar, but there was almost no correlation between them at daily or monthly steps (none of them significantly different from zero), and no urban influence could be found with clarity.</p><p>The EOF analysis helped identify areas of improvement in the model that were not evident from the preliminary point-to-point analysis (against GHCNd stations). We knew that the model had a high bias on Tmin inside the urban area, but the EOF analysis indicates that the issue is mostly due to magnitude, while the temporal and spatial behavior of the effect of the city on Tmin agrees with observations. The analysis also showed a lack of temperature spread from the city on the UHI signal, which should be investigated in the model. Tmax did not show major issues in the point-to-point comparison, and the EOF analysis did not find UHI magnitude issues either, but important differences arose in the spatial and temporal behavior of the effect of the city on UHI when compared to the observations, highlighting a need to improve the urban mechanisms represented in the model and their effect on Tmax, which otherwise might not be trustworthy. The results of EOF analysis on P were not conclusive enough to result in model improvement suggestions.</p><p>Regarding the data used in the project, the OBS and WRF datasets used are convenient because they cover our entire domain in a complete set of gridded values, with an extensive period of data. 
The OBS dataset, based on interpolation of stations, showed some limitations in pattern generation due to a lack of information between stations, with the resulting EOF patterns being heavily influenced by the locations of stations and their particular behavior over time. Complementing with satellite or radar data (and derived products), which typically have uncertain values but more detailed spatial information than interpolated station data, is expected to give better results in pattern identification. The RCM version used was in development and was known to have some limitations (it was selected as a case study to demonstrate the diagnostic ability of the developed tool). The WRF simulations were not focused on Chicago, and the model did not include advanced urban schemes. Studies have shown that using more realistic parametrizations of the urban area could improve the transition between the city and the surrounding areas, and might also reduce the high bias for Tmin inside the city encountered in the WRF simulations <ref type="bibr">(Chakraborty et al. 2022;</ref><ref type="bibr">J. Wang et al. 2023)</ref>.</p><p>With the post-processing technique developed here, the value of the PCs is equal to the average of the positive values minus the average of the negative values of the pattern. For the EOF mode associated with the urban effect on temperature, this corresponds to the intensity of the UHI when only the effect of the city is considered. The Supplemental Material Section 7 includes a comparison between this definition of UHII and two other more traditional methods, using raw temperature data and two alternative boundaries to delimit urban and rural areas: the city geographic boundary or the shapes resulting from the EOF analysis (intending to use a more realistic area of influence of the city on the temperature field).</p><p>Results from Fig. 
<ref type="figure">S15</ref> show that UHII directly from EOF is usually higher than the other methods, and UHII directly from the raw data and using the city boundary is usually the lowest. We hypothesize that this happens because both the lake and the latitude effect on temperature decrease the value of UHII in summer, resulting in a smaller value than considering only the effect of the city. Most notably, for Tmax when looking at the OBS dataset, UHII from the EOF analysis has a positive mean value of 0.7 &#176;C, and UHII from the raw data has a negative mean value of -0.3 &#176;C. All three methods have advantages and disadvantages depending on the information required and could potentially be used in a complementary way.</p><p>The direct implementation of EOF analysis on P data showed no clear evidence of urban effects on summer P. This is partly because P events in summer are mostly small-scale convective storms (thunderstorms), which are less prone to be captured as spatially consistent patterns, and because the effect on P (e.g., enhanced areas) often occurs downwind of the city, which depends on the direction of the event. In future work, it would be helpful to consider the direction of the storm as part of the analysis, although such analysis is limited by the sample size of available datasets. The methodology in general is highly flexible and allows for many improvements that could be included in the future, such as the incorporation of wind analysis, better pre-processing of P, and deeper analysis of the PCs to add more confidence to the physical meaning of the patterns (e.g., correlation between the PCs and known UHI variables such as wind speed). This work is intended to contribute to the toolset of evaluation approaches for climate models, but is not intended to replace any traditional or alternative method. 
It should ideally be used to provide additional and complementary diagnostic information to help evaluate the performance of models in representing climate patterns. Specifically, the approach provides more detail on important spatial and temporal patterns in the observations and simulations compared to point evaluation using station data or other gridded sources. It is important to note that EOF analyses are interpretations based on statistical decompositions that maximize covariance explained and should not be taken as exact representations of the physics underlying the processes, being also influenced by the choice of temporal and spatial resolution of data, domain size, and other factors. Care must be taken when using these methods to corroborate key findings with a physically based understanding of various phenomena and, where possible, include other sources of data in the analysis.</p><p>Further analysis of model configurations and analysis in the city of Chicago would allow for a comparison against our results (using them as benchmarks), where specific applications aiming at improved urban schemes (e.g., using different urban canopy models) are expected to produce a closer resemblance to observation-based urban EOF patterns. It would be interesting as well to apply this methodology to studies in other climates, topographic settings, or seasons. Chicago has a big influence from the lake over temperatures and wind patterns, and our tests suggest that our method could also be meaningful to other cities near water bodies, or close to mountainous areas. Our preliminary winter analysis showed urban signals for precipitation, which we did not see for summer, and is worth exploring in more detail. Another interesting research direction would be to extend our EOF analysis to evaluate extreme events such as heatwaves and heavy precipitation, which are of great interest to future climate scenarios.</p></div>
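<div xmlns="http://www.tei-c.org/ns/1.0"><p>The two raw-data UHII estimates mentioned above differ only in how urban and rural cells are delimited. The sketch below illustrates that difference and is not the procedure of the Supplemental Material: UHII is taken as the mean temperature of urban cells minus the mean of rural cells, the EOF-based mask is assumed to be the set of cells where the urban EOF pattern is positive, and the six-cell temperatures, masks, and function names are hypothetical.</p>
<p xml:space="preserve">
```python
def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def uhii_from_mask(temps, urban_mask):
    """UHII as mean urban temperature minus mean rural temperature."""
    urban = [t for t, is_urban in zip(temps, urban_mask) if is_urban]
    rural = [t for t, is_urban in zip(temps, urban_mask) if not is_urban]
    return mean(urban) - mean(rural)


def mask_from_eof(eof_pattern):
    """Assumed rule: cells with a positive urban-EOF value count as urban."""
    return [value > 0 for value in eof_pattern]


# Hypothetical one-day temperature field (deg C) on a 6-cell transect.
temps = [18.0, 19.0, 21.0, 22.0, 20.0, 18.5]

# Boundary 1: administrative city limits (cells 2 and 3).
city_mask = [False, False, True, True, False, False]

# Boundary 2: area of influence suggested by a hypothetical urban EOF,
# which extends one cell beyond the city limits.
urban_eof = [-0.2, -0.1, 0.5, 0.6, 0.1, -0.3]
eof_mask = mask_from_eof(urban_eof)

uhii_city = uhii_from_mask(temps, city_mask)
uhii_eof_area = uhii_from_mask(temps, eof_mask)
print(uhii_city, uhii_eof_area)
```
</p>
<p>Either estimate still contains any lake or latitude signal present in the raw temperatures, which is the motivation for also reporting the UHII taken directly from the urban EOF mode.</p></div>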
<div xmlns="http://www.tei-c.org/ns/1.0"><head>APPENDIX</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Post-Scaling Using Average Of Positive And Negative Areas</head><p>To help with interpretation, we introduce a post-processed scaling factor. We use the labels "pos" and "neg" to indicate the subset of cells where the eigenvector's pattern is positive or negative, respectively, and an overbar indicates the spatial mean taken over all s: <formula notation="TeX">\mathrm{pos}(E_k) = \mathrm{mean}\{E_k \ \mathrm{where}\ E_k &gt; 0\}</formula>, <formula notation="TeX">\mathrm{neg}(E_k) = \mathrm{mean}\{E_k \ \mathrm{where}\ E_k &lt; 0\}</formula>.</p><p>We can similarly define the following expressions for the averages of the <formula notation="TeX">X_k</formula> values.</p></div>
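<div xmlns="http://www.tei-c.org/ns/1.0"><p>One reading of this post-scaling, consistent with the statement in the Conclusion that the scaled PC equals the average of the positive values minus the average of the negative values of the pattern, is sketched below. The scaling factor, taken here as pos(E) minus neg(E), and the function names are assumptions made for illustration; the full expressions of the appendix are not reproduced. The pattern is divided by the factor and the PCs are multiplied by it, so their product (the reconstructed anomaly field) is unchanged.</p>
<p xml:space="preserve">
```python
def pos_mean(pattern):
    """Mean over the cells where the pattern is positive."""
    values = [v for v in pattern if v > 0]
    return sum(values) / len(values)


def neg_mean(pattern):
    """Mean over the cells where the pattern is negative."""
    values = [v for v in pattern if 0 > v]
    return sum(values) / len(values)


def post_scale(eof, pcs):
    """Rescale an EOF/PC pair so that pos_mean(eof) - neg_mean(eof) equals 1.

    Assumed scaling factor: c = pos_mean(eof) - neg_mean(eof).
    The product eof[j] * pcs[t] is preserved for every cell j and time t.
    """
    c = pos_mean(eof) - neg_mean(eof)
    return [v / c for v in eof], [p * c for p in pcs]


# Hypothetical 4-cell EOF and two daily PC values.
eof = [0.6, 0.2, -0.1, -0.3]
pcs = [5.0, -2.0]

scaled_eof, scaled_pcs = post_scale(eof, pcs)

# Contribution of this mode to the anomaly field on the first day.
field_day0 = [v * scaled_pcs[0] for v in scaled_eof]
contrast_day0 = (
    sum(field_day0[:2]) / 2 - sum(field_day0[2:]) / 2
)  # mean over positive-pattern cells minus mean over negative-pattern cells
print(scaled_pcs[0], contrast_day0)
```
</p>
<p>With this normalization, each scaled PC value can be read directly in the units of the variable as the contrast between the positive and negative areas of the pattern on that day.</p></div>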
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" xml:id="foot_1"><p>Accepted for publication in Journal of Applied Meteorology and Climatology. DOI 10.1175/JAMC-D-24-0077.1.</p></note>
		</body>
		</text>
</TEI>
