<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>An Evaluation of Surface Wind and Gust Forecasts from the High-Resolution Rapid Refresh Model</title></titleStmt>
			<publicationStmt>
				<publisher></publisher>
				<date>06/01/2022</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10389149</idno>
					<idno type="doi">10.1175/WAF-D-21-0176.1</idno>
					<title level='j'>Weather and Forecasting</title>
<idno type="issn">0882-8156</idno>
<biblScope unit="volume">37</biblScope>
<biblScope unit="issue">6</biblScope>					

					<author>Robert G. Fovell</author><author>Alex Gallagher</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[Abstract. We utilized high temporal resolution, near-surface observations of sustained winds and gusts from two networks, the primarily airport-based Automated Surface Observing System (ASOS) and the New York State Mesonet (NYSM), to evaluate forecasts from the operational High-Resolution Rapid Refresh (HRRR) model, versions 3 and 4. Consistent with past studies, we showed the model has a high degree of skill in reproducing the diurnal variation of network-averaged wind speed at ASOS stations, but also revealed several areas where improvements could be made. Forecasts were found to be underdispersive, deficient in both temporal and spatial variability, with significant errors occurring during local nighttime hours in all regions and in forested environments at all hours of the day. This explains why the model overpredicted the network-averaged wind in the NYSM: many of that network's stations are in forested areas. A simple gust parameterization was shown not only to have skill in predicting gusts in both networks but also to mitigate systemic biases found in the sustained wind forecasts.

Significance Statement. Many users depend on forecasts from operational models and need to know their strengths, weaknesses, and limitations. We examined generally high-quality near-surface observations of sustained winds and gusts from the nationwide Automated Surface Observing System (ASOS) and the New York State Mesonet (NYSM) and used them to evaluate forecasts from the previous (version 3) and current (version 4) operational High-Resolution Rapid Refresh (HRRR) model for a selected month. Evidence indicated that the wind forecasts are excellent yet imperfect, and areas for further improvement remain.
In particular, we showed there is a high degree of skill in representing the diurnal variation of sustained wind at ASOS stations but insufficient spatial and temporal forecast variability and overprediction at night everywhere, in forested areas at all times of day, and at NYSM sites in particular, which are more likely to be sited in the forest. Gusts are subgrid even at the fine grid spacing of the HRRR (3 km) and thus must be parameterized. Our simple gust algorithm corrected for some of these systemic biases, resulting in very good predictions of the maximum hourly gust.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Accurate wind forecasts are important in a number of areas, including but not limited to wind energy <ref type="bibr">(Piccardo and Solari 1998;</ref><ref type="bibr">Petersen et al. 1998)</ref>, pollution transport <ref type="bibr">(Arya 1999)</ref>, and anticipation and mitigation of damage resulting from strong winds <ref type="bibr">(Holmes et al. 2014)</ref>. An example of the latter is the "Santa Ana" weather event (cf. <ref type="bibr">Rolinski et al. 2019)</ref>, a cool-season pattern of offshore flow in Southern California that is known to dramatically increase the risk of large wildfires <ref type="bibr">(Westerling et al. 2004;</ref><ref type="bibr">Rolinski et al. 2016)</ref>. Numerical modeling of Santa Ana events using the Weather Research and Forecasting (WRF) model's Advanced Research WRF (ARW) core <ref type="bibr">(Skamarock et al. 2019)</ref> for the purposes of model verification and wind reconstruction (e.g., <ref type="bibr">Cao and Fovell 2016;</ref><ref type="bibr">Fovell and Cao 2017;</ref><ref type="bibr">Cao and Fovell 2018;</ref><ref type="bibr">Fovell and Gallagher 2018)</ref> has revealed strengths and weaknesses of both the forecasts and the observations of the sustained wind, which in practice implies averaging over periods of time such as 2 or 10 min. At mesoscale grid spacings, short-period (e.g., 3-s) gusts are a subgrid-scale phenomenon, necessitating parameterization in all operational numerical weather prediction models at this writing. There have been many such parameterizations proposed (cf. Sheridan 2011), some being rather complex <ref type="bibr">(Panofsky et al. 1977;</ref><ref type="bibr">Nakamura et al. 1996;</ref><ref type="bibr">Brasseur 2001;</ref><ref type="bibr">Gray 2003;</ref><ref type="bibr">Stucki et al. 2016;</ref><ref type="bibr">Guti&#233;rrez and Fovell 2018;</ref><ref type="bibr">Benjamin et al. 
2021</ref>, to name a few).</p><p>Many users rely on wind predictions from operational models such as NOAA's High-Resolution Rapid Refresh (HRRR) (cf. <ref type="bibr">Benjamin et al. 2016;</ref><ref type="bibr">Dowell et al. 2022)</ref>.</p><p>HRRR is based on WRF-ARW and has 3 km horizontal grid spacing covering the conterminous United States (CONUS). A number of studies have focused on verification of HRRR forecast fields, including wind speed (cf. <ref type="bibr">Olson et al. 2019b;</ref><ref type="bibr">Pichugina et al. 2019;</ref><ref type="bibr">Wilczak et al. 2019)</ref>. In particular, <ref type="bibr">Fovell and Gallagher (2020)</ref>, hereafter FG20, presented a forecast verification of HRRR version 3's (HRRRV3 or V3) 00 and 12 UTC cycles, which were selected for their relatively long (36-h) forecast periods. (Although new HRRR cycles were launched hourly, only the 00 and 12 UTC model runs ran longer than 18 h in V3.) While other select months were also examined, the primary focus was on April 2019 as a representative time period.</p><p>In addition to the boundary layer analysis that employed high-resolution radiosonde data, an evaluation of 2-m temperature and 10-m wind speed forecasts for &#8776; 800 Automated Surface Observing System (ASOS) sites was conducted. These installations are typically, but not always, found at airports. FG20 demonstrated that the HRRRV3 produced skillful forecasts when averaged over the ASOS network, although temperature biases were robustly related to station elevation and wind biases were negatively correlated with observed speed. The latter means that "sites characterized by slower observed winds were systematically more likely to be overpredicted while windier sites were underestimated" (FG20), consistent with the results of prior studies focusing specifically on Santa Ana events (cf. 
<ref type="bibr">Cao and Fovell 2016;</ref><ref type="bibr">Fovell and Cao 2017;</ref><ref type="bibr">Cao and Fovell 2018;</ref><ref type="bibr">Fovell and Gallagher 2018)</ref>.</p><p>In this work, FG20's evaluation of forecasts for ASOS stations was reconsidered from scratch and considerably extended and improved. As in FG20, we started with April 2019, but the specific emphasis is on hourly mean winds and maximum gusts with the discussion confined to the 00 UTC cycle in order to streamline the presentation. In this effort, data from the New York State Mesonet (NYSM; <ref type="bibr">Brotzge et al. 2020)</ref> were also analyzed and gust forecasts made using a simple parameterization suggested by <ref type="bibr">Cao and Fovell (2018, hereafter CF18)</ref> were considered. As version 4 of the HRRR (HRRRV4 or V4) became operational in December 2020, an analysis of April 2021 is also provided to highlight improvements and identify remaining challenges. This work diagnoses systemic errors and weaknesses of a very skillful operational model for the purposes of highlighting areas for potential future improvements. Another goal was to identify and understand issues with available observational data. This paper is organized as follows. Section 2 describes the data and methods used in this study and Sections 3 and 4 present our analyses of April 2019 (HRRRV3) and April 2021 (HRRRV4), respectively, the latter emphasizing comparisons with the Section 3 findings. Finally, Section 5 presents some conclusions and recommendations.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Data and methods</head><p>Anemometers of different types, including the sonic, cup and vane, and propeller varieties, are used to sample the wind at some period we will term the sampling interval. These samples are then averaged over a certain period, the averaging interval. The World Meteorological Organization (WMO) standard (WMO 2018) specifies averaging intervals of 3 s and 10 min for the gust and sustained (mean) wind, respectively. In a given report consisting of sustained wind (hereafter usually termed simply "wind") and gust readings, the gust is conventionally the highest 3-s value within the averaging interval used for the wind. The standard also specifies an anemometer mounting height of 10 m above ground level (AGL) with adequate clearance around the instrument.</p><p>Ideally, the surrounding environment would consist of open flat terrain with obstacles no taller than 4 m and located more than thirty times their height (i.e., less than about 2° above the horizon) away from the anemometer (WMO wind class 1). Adherence to these guidelines, however, is uncommon in practice.</p><p>NOAA makes HRRR model outputs available hourly and on the hour, providing forecasts of 10-m AGL wind speed representing an instant of time. However, because the winds at any grid point only vary over time periods that are much longer than the model time step (20 s), these are interpreted as sustained winds. As in FG20, 1-min ASOS observations were obtained from the National Centers for Environmental Information (NCEI) archive, which are available for more than 850 sites in the CONUS. The 1-min observations provide measurements of sustained winds and gusts made from sonic anemometers nominally at 10 m AGL. 
Although the internal processing is complicated, the sustained wind readings we used effectively represent an average of samples taken over the 2-min period prior to the report, with the highest 3-s average during the 1-min interval provided as the gust (see documentation at <ref type="url">https://www.weather.gov/asos/</ref>). The consequences of the relatively coarse (1 kt or 0.5144 m s-1) precision of ASOS wind and gust reports will be noted in the analyses to come.</p><p>In the United States, a significant exception to this reporting convention is the Remote Automated Weather Station (RAWS) network, for which hourly reports consist of the past hour's highest speed sample (peak wind) along with the mean wind of the last 10 min prior to the report (National Wildfire Coordinating Group 2019). Thus, there is no guarantee the peak came from the samples used to compute the sustained wind.</p><p>The lowest horizontal wind model level is close to 10 m AGL and the 10-m wind speed value is obtained via vertical interpolation; see <ref type="bibr">Benjamin et al. (2021)</ref>.</p><p>The FG20 analysis used top-of-the-hour ASOS reports and model fields were interpolated to station locations in the usual fashion. However, owing to the model's horizontal resolution, which does not resolve small turbulent eddies, there is very likely less temporal and spatial variability in the forecasts than in the observations. To assess whether this unduly influenced the results, we elected to pursue an alternative strategy in this new effort, using the observed hourly mean wind speed and hourly maximum gust. Sustained wind observations from each site were averaged through a 60-min window centered at the top of each hour and the largest gust report within that window was identified. For each station, only hours without missing or invalid data were retained. Thus, we used hourly-averaged winds instead of 2-min averages in the sustained wind verifications. 
Following <ref type="bibr">Harper et al. (2010)</ref>, who argued that different averaging intervals represent "equivalent measures of the true mean wind but with differing variance", we expected that the results for the sustained wind would be nearly unchanged, and this proved to be true.</p><p>In contrast, the altered handling of the gusts did make a difference. In prior work using 1-min ASOS observations (including <ref type="bibr">Cao and Fovell 2016, 2018;</ref><ref type="bibr">Fovell and Gallagher 2018)</ref>, the gust in each station record represented the largest speed sample during the 1-min interval at the top of each hour. Because this covers only 1.7% of the hour, we believe the hourly maximum gust is a better measure of the wind threat. This caused a reasonable and anticipated change in the gust factor (GF), defined as the gust divided by the sustained wind. Averaged over the CONUS, the 1-min ASOS GF was about 1.29 and this increased to 1.86 with the new strategy. Further discussion may be found in the Appendix.</p><p>Although most ASOS stations are at airports, there are some significant exceptions, such as the consistently windiest site (KDGP, Guadalupe Pass, TX), a non-airport installation sited near a steep cliff. There are some very low wind speed stations, including non-airport sites such as KMEH (Meacham, OR), KP69 (Lowell, ID), and KMHS (Mt. Shasta, CA), and small airports possessing significant along-runway obstructions, examples being KVPC (Cartersville, GA) and <ref type="bibr">K1JO (Bonifay, FL)</ref>. A fraction of installations reportedly have anemometers mounted below 10 m AGL (e.g., KMTP, Montauk, NY). These problem stations were not excluded from our analyses because they were found not to alter our results or conclusions.</p><p>The New York State Mesonet <ref type="bibr">(Brotzge et al. 2020)</ref> stations are equipped with both propeller and sonic anemometers. 
Quality controlled, three-second observations from both sensors were obtained directly from the Mesonet. This would seem to represent an opportunity to evaluate the influence of hardware on the wind measurements, but there are some unfortunate complications. The NYSM propeller instrument provided a 3-s average wind every 3 s, consistent with the WMO gust standard and matching the gust averaging interval employed by the ASOS sonic anemometers. In contrast, the NYSM's sonic instrument sampled once per second but only every third reading was recorded, meaning its gusts are actually 1-s and not 3-s averages.</p><p>As with the ASOS data, we used the NYSM readings to construct hourly average winds and hourly maximum gusts centered on the hour for both instruments, but retained only hours with valid data from both instruments. Over April 2019 and 2021, mean propeller winds were about 0.25 m s-1 (10.7%) lower than for the sonic, and gusts were 0.6 m s-1 (12%) slower, these differences being large enough to be relevant to our analyses. The propeller anemometer reported relatively more readings close to calm. The network-averaged GFs for April 2019 were 2.21 and 2.24 from the propeller and sonic instruments, respectively. The shorter interval used with the sonic gust data could be expected to increase the GF slightly (cf. <ref type="bibr">Durst 1960)</ref>.</p><p>FG20 did not consider gust forecasts. Herein we verified forecasts made using the simple CF18 parameterization for 10-m gusts, which consisted of multiplying the (sustained) wind forecast by the network-averaged GF after correcting for the mean network-averaged bias. We note the HRRR model also provides "gust potential" forecasts created using boundary layer depths and winds <ref type="bibr">(Benjamin et al. 2021)</ref>.</p><p>The locations of 807 ASOS and 126 NYSM sites are also shown on Fig. <ref type="figure">1</ref>, with marker size reflecting mean wind speed for April 2019. 
Sites with fewer than 500 observations in the month were excluded from the analysis and are not shown. Owing to finite resolution, a few stations were misclassified as being over water (including having roughness length z₀ &lt; 0.01 m), and these were also removed.</p><p>WRF-ARW and the HRRR's Rapid Update Cycle (RUC) land surface model utilize fractional landuse assignments, and more than half (53%) of the ASOS stations were associated with more than one class (Fig. <ref type="figure">2b</ref>). This can and does influence surface characteristics (including roughness) used in a given grid cell. That being said, the class representing the primary assignment had an average landuse fraction of 0.84 over the 807 ASOS sites, this ranging from 0.74 among the forested lands to 0.88 for the cropland and urban classes.</p></div>
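<p><![CDATA[The hourly aggregation described above can be sketched as follows. This is a minimal illustration under our own naming assumptions (the function and argument names are hypothetical), not the actual processing code used in the study: sustained winds are averaged over a 60-min window centered on each top of hour, the largest gust report in that window is retained, and hours with any missing reports are dropped.

```python
# Sketch of the hourly aggregation described in Section 2 (hypothetical names):
# 1-min sustained winds are averaged over a 60-min window centered on each top
# of hour, the largest gust report in the window is kept, and hours with any
# missing data are rejected.
def hourly_mean_and_max_gust(minutes, winds, gusts):
    """minutes: minute-of-record indices; winds/gusts: 1-min reports (None = missing)."""
    out = {}
    for h in range(1, max(minutes) // 60 + 1):
        lo, hi = h * 60 - 30, h * 60 + 30          # window centered on hour h
        window = [(w, g) for m, w, g in zip(minutes, winds, gusts) if lo <= m < hi]
        if len(window) < 60 or any(v is None for pair in window for v in pair):
            continue                                # retain only fully valid hours
        mean_wind = sum(w for w, _ in window) / len(window)
        out[h] = (mean_wind, max(g for _, g in window))
    return out
```

With two hours of synthetic 1-min reports, the function returns one entry per fully valid top-of-hour window, holding the hourly mean wind and hourly maximum gust.]]></p>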
<div xmlns="http://www.tei-c.org/ns/1.0"><head>a. Analysis by forecast hour and local time</head><p>As in FG20, we first considered ASOS network-averaged winds expressed in terms of forecast hour, which extended out to 36 h for the 00 UTC cycle. The present result (Fig. <ref type="figure">3a</ref>) is nearly identical to that shown in FG20 (their Fig. <ref type="figure">7a</ref>), illustrating that the adoption of hourly mean observations made essentially no difference. Again, the model started with a small negative bias (defined as forecast minus observation) of about -0.5 m s-1 that became smaller in magnitude with time over the first 24 forecast hours. This bias is small compared to the spatial variation of the winds themselves.</p><p>In WRF-ARW, roughness lengths reported in the 0-h model output have not yet been updated, and thus may not be correct.</p><p>New to this evaluation are examinations of forecast and observation spatial and temporal variability and an analysis by local time (LT). Figure <ref type="figure">4a</ref> reveals that the spatial variation of the forecasts valid at ASOS sites (henceforth, "ASOS forecasts"), expressed as the standard deviation, was smaller than that of the observations at all forecast hours. There is a diurnal cycle in both, again smeared by averaging across time zones. This may be in part a consequence of local landscape features (valleys, hills, obstacles, and/or land surface variations) that cannot be resolved in the model. Since the mean forecast and observed winds were quite similar, it can be anticipated that the model would fail to represent the frequency of both lower and higher wind speeds. This will be examined presently. Additionally, Fig. <ref type="figure">4b</ref> presents time series of the difference between forecast and observation spatial standard deviation and the forecast wind bias. 
They are similar in that both were negative but became less so with time.</p><p>Expressed in terms of LT, the network-averaged forecasts retained a negative bias through the day (Fig. <ref type="figure">3b</ref>), with the model apparently ramping up the late morning winds too slowly and diminishing them too quickly into the evening. The HRRR model employs the Mellor-Yamada-Nakanishi-Niino Level 2.5 (MYNN2) planetary boundary layer and surface layer parameterizations (Nakanishi and Niino 2004), which have been refined in recent years (cf. <ref type="bibr">Olson et al. 2019a</ref>). This finding may hold clues for further parameterization improvements. There was a diurnal cycle in both forecast and observation spatial variation (Fig. <ref type="figure">5a</ref>), but again the forecast variability was slightly smaller and the diurnal variation in spatial standard deviation difference and forecast bias was very small (Fig. <ref type="figure">5b</ref>). It is emphasized that this is an excellent, if not completely perfect, forecast, at least with respect to the network average.</p><p>The analysis time, forecast hour 0, was removed from this analysis owing to the shift in bias behavior seen between the analysis and forecast hour 1 in Figs. <ref type="figure">4a</ref> and <ref type="figure">4b</ref>.</p><p>In pointed contrast, the HRRRV3 overpredicted wind speeds averaged over the 126 NYSM sites by more than 1 m s-1 (Fig. <ref type="figure">6a</ref>). Part of this gap is due to the propeller instrument that, as noted above, reports lower sustained wind speeds than its sonic counterpart. However, the forecast bias with respect to the sonic observations was 0.77 m s-1, which is still sizable. Another difference is that the spatial variability of the forecasts (Fig. <ref type="figure">6b</ref>) was larger than that of the observations at every forecast hour, with the biases and spatial standard deviation differences being relatively constant with forecast hour (Fig. 
<ref type="figure">6c</ref>). We need to emphasize at this point that the ASOS and NYSM networks differ in their typical siting environments, with NYSM stations far more likely to be located in forested areas.</p></div>
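<p><![CDATA[The network-average diagnostics used in this subsection can be sketched as follows; the function and variable names are our own, and this is a simplified illustration rather than the study's verification code. At each forecast hour, the bias is the mean of forecast minus observation across stations, and the variability deficit is the difference of the spatial standard deviations of forecasts and observations.

```python
# Sketch of the per-forecast-hour network diagnostics discussed above
# (illustrative names): bias = mean(forecast - observation) over stations;
# spatial variability difference = std(forecasts) - std(observations).
from statistics import mean, pstdev

def network_diagnostics(fc_by_hour, ob_by_hour):
    """dicts mapping forecast hour -> list of station values (aligned order)."""
    out = {}
    for h, fc in fc_by_hour.items():
        ob = ob_by_hour[h]
        bias = mean(f - o for f, o in zip(fc, ob))
        out[h] = (bias, pstdev(fc) - pstdev(ob))
    return out
```

A negative second element at most hours would correspond to the underdispersive behavior described for the ASOS forecasts; a positive one to the NYSM situation.]]></p>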
<div xmlns="http://www.tei-c.org/ns/1.0"><head>b. Analysis by station</head><p>The present study also enhanced the station-based analysis of FG20 and the previously cited work on Santa Ana winds. We started by comparing forecast and observed sustained winds averaged over all available pairs for each station (Fig. <ref type="figure">7a</ref>). Each dot is an ASOS (black) or NYSM (orange) station. Regarding the ASOS sites, while there are a few, non-impactful outliers, the squared linear correlation coefficient between the series is moderately high (R² = 0.56), with the pairs largely arrayed along the 1:1 line.</p><p>The relationship between forecast wind bias and various variables is examined in Fig. <ref type="figure">8</ref>. Similar to previous studies already cited, the forecasts were not correlated with the bias (Fig. <ref type="figure">8a</ref>), even for NYSM stations (orange circles). However, the observations were significantly and negatively correlated with bias (Fig. <ref type="figure">8b</ref>), indicating overprediction of calmer sites and underprediction at windier locations. The NYSM stations do not appear to be exceptional, apart from the fact that as a relatively low wind speed network their sites are more likely to be associated with positive biases. A comparable analysis using the NYSM's sonic observations was only subtly different (not shown).</p><p>CF18 demonstrated (their Fig. <ref type="figure">11d</ref>) that the forecast wind bias was also positively correlated with the station gust factor, which could be expected because GF incorporates the observed wind. GF may also serve as a proxy for site exposure. Locations with significant obstructions would be expected to have relatively lower wind speeds than similar but unobstructed sites, but short-period gusts might be anticipated to be less impacted, leading to higher GF values. Wind speeds at these stations would be expected to be overforecast because the model cannot "see" and account for these obstructions. 
In contrast, sites with lower GFs may have local features, such as hills, that help speed up the wind relative to a more average setting. These stations would likely be underpredicted.</p><p>In Fig. <ref type="figure">8c</ref>, we see a sizable negative correlation between bias and the inverse gust factor (iGF), the reciprocal of GF, which we elected to employ because it improves the linear relationship with bias and is bounded between 0 and 1. GF and iGF are functions of the observational data only, and we see the model tended to overpredict when the sustained wind speeds were particularly small relative to the gust and underpredict when they were more comparable.</p><p>CF18 also considered a simple gust parameterization that was inspired by the association between bias and GF (and thus iGF). That strategy partially compensated for the biases in the sustained wind forecasts by applying the network-average gust factor to all wind forecasts, yielding less biased gust predictions. Underpredicted stations also tended to have smaller GF (larger iGF) values than average, so multiplying the too-low speed forecasts by the network average at least partially mitigated the model's negative sustained wind bias. Similarly, overpredicted sites often had larger than average GFs (smaller iGFs), so multiplying the positively biased forecasts by the smaller network-average GF compensated for some of the overprediction.</p><p>This idea was applied to the April 2019 HRRR forecasts and is shown in Fig. <ref type="figure">7b</ref>. In this case, ASOS wind forecasts were multiplied by 1.86, roughly the network's average GF for the hourly maximum gust. This GF was applied to forecasts made for the top of the hour because we have insufficient information to determine the hourly mean forecast wind speed. 
With that caveat, we note this very simple gust parameterization performed quite well, with an even higher R² (0.62) than the forecast/observed wind relationship. Again there is a tendency for forecast/observation pairs to spread along the 1:1 line.</p><p>The CF18 parameterization implicitly presumed the network-averaged forecast wind bias was negligible, so application of a single GF value could mitigate errors relative to the average. That is not the case for the NYSM. Finally, Fig. <ref type="figure">8d</ref> demonstrates that the difference between forecast and observation temporal standard deviation was also well correlated with forecast bias. Note that here the standard deviations represent the temporal variability of the forecasts and observations at each station. Stations at which the forecasts had more variability than the observations tended to be overpredicted with respect to wind speed, and underprediction often resulted at stations where the observations had more variation. However, as with GF and iGF, this variable is not independent of the observed wind. The standard deviation of a variable like wind speed, which has the hard constraint of being non-negative, can (and, although not shown, generally does) increase with the variable's magnitude.</p><p>Spatial plots (Fig. <ref type="figure">9</ref>) were examined to look for patterns. While the average forecast wind bias, computed over all stations and forecast hours, was only -0.2 m s-1 (cf. Fig. <ref type="figure">3a</ref>), it remains that 507 of the 807 stations (63%) were underpredicted in the mean. Figure <ref type="figure">9a</ref> shows that the positively biased stations were concentrated in the Southeast, the Appalachians generally, and into the Northeast where forested land is more common (Fig. <ref type="figure">2a</ref>). In Fig. 
<ref type="figure">9b</ref>, marker size reflects the squared linear correlation between the forecast and observed winds, based on an average of 1000+ forecast/observation pairs from each site. R² values ranged between 0.03 (KP69) and 0.77 (KARR, Aurora, IL) with a mean of 0.57 and median of 0.59. Correlations were high throughout most of the country, even in the Southeast where mean winds were relatively light, and lowest in the mountainous West. Like the correlation coefficient, R² is not sensitive to means or mean differences between series and is most likely low where the predictions are somewhat out of phase with the measurements. The concentration of low correlations in the western CONUS may reflect the influence of local features on diurnal winds that the model fails to properly represent.</p><p>Figure <ref type="figure">9c</ref> reveals how the temporal standard deviation difference between the forecasts and observations varied spatially. Figure <ref type="figure">8d</ref> showed that the former tended to be larger when observed wind speeds were low and forecasts were positively biased. The mean and median differences were -0.15 and -0.17 m s-1, respectively, with 581 (72%) of the sites having less variability among the forecasts than the observations. Note that the large red dots (representing larger forecast than observation variability) are few in number and widely scattered. These are stations having significant local obstructions near the ASOS installations. For those sites, observation variability was likely suppressed by limited anemometer exposure. This measure could be used to identify problem sites for potential removal from analyses and data assimilation.</p><p>Taken together, this analysis suggests that the small negative forecast bias seen in the network-averaged winds (Fig. <ref type="figure">3</ref>) is more significant than it might appear at first glance. 
At the majority of locations, forecast variability is insufficient, and this deficiency is strongly correlated with negative biases.</p><p>This suggests the model is not capturing something that is important to determining the real winds measured in the field. However, this is partly compensated for by the inclusion of stations that are not at airports and/or have obvious siting issues. Had those sites been removed from the analysis, the underprediction would have been more pronounced. The model is still very skillful, but steps could be taken to address its tendency to understate the mean winds at better exposed locations.</p></div>
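<p><![CDATA[The CF18-style gust algorithm described above reduces to a one-line computation: multiply each sustained-wind forecast by a single network-averaged gust factor (1.86 for the ASOS hourly-max gusts in April 2019), after removing any non-negligible network-mean wind bias (relevant for the NYSM). The sketch below is our own minimal rendering; the function name and the exact order of the bias correction are assumptions.

```python
# Sketch of the simple CF18-style gust parameterization described above:
# gust = (wind forecast - network-mean bias) * network-averaged gust factor.
# The default GF of 1.86 is the April 2019 ASOS hourly-max value from the text;
# mean_bias defaults to zero, as the ASOS network-mean bias was near zero.
def cf18_gust(wind_forecasts, network_gf=1.86, mean_bias=0.0):
    return [(w - mean_bias) * network_gf for w in wind_forecasts]
```

Because a single multiplier is applied everywhere, underpredicted (low-GF) sites are pulled up and overpredicted (high-GF) sites are pulled down relative to their own gust factors, which is the compensation mechanism discussed in the text.]]></p>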
<div xmlns="http://www.tei-c.org/ns/1.0"><head>c. Analysis of forecast/observation pairs</head><p>In their analysis, FG20 examined scatterplots involving all individual ASOS forecast and observation pairs over a full month and this provided insight into the source of forecast biases. Here, we improve and extend that analysis, examining all 827,230 April 2019 pairs. This represents the concatenation of forecasts and observations from 807 ASOS stations and all forecast hours from the daily 36-h HRRRV3 00 UTC cycle forecasts. Note that many observations were paired with more than one forecast.</p><p>All ASOS forecast/observation pairs are presented as a heatmap, color coded by point density, in Fig. <ref type="figure">10a</ref>. Although there is scatter about the 1:1 correspondence line, there is a reasonably good relationship (R² = 0.56) between these variables, comparable to that seen in the station-averaged analysis (Fig. <ref type="figure">7a</ref>). The majority of observations and forecasts represented speeds less than 5 m s-1, and this fact drives the relationship. For higher observed winds, however, the forecasts still largely spread along the 1:1 line, indicating some usable skill. Similarly, all forecast gusts, created via the constant GF of 1.86, are plotted against observed gusts in Fig. <ref type="figure">10b</ref>. As was the case with the station-averaged analysis, the correlation is higher for the gust forecasts than for their sustained wind counterparts.</p><p>However, these same data viewed as histograms (Fig. <ref type="figure">11</ref>) demonstrate that the forecast and observed wind and gust distributions had distinctly different shapes. The forecasts possessed a narrower peak such that the occurrence of both lower and higher observed winds was relatively more frequent. This result was anticipated in the discussion of Fig. <ref type="figure">4a</ref> above. Motivated by Fig. 
<ref type="figure">8c</ref>, we also examined histograms of winds and gusts partitioned into lower and higher GF segments (Fig. <ref type="figure">12</ref>). Forecast and observation pairs were separated into two groups based on the GF associated with the observation relative to the median value (about 1.81). With respect to winds (panels a and b), there is a much larger shift between the segments in the shapes of the observed wind distributions than for the forecasts. When the GF is lower, the observed distribution is shifted rightward, resulting in more observations than forecasts of values exceeding 3.5 m s-1. In contrast, observations in the high GF half are skewed towards lower speeds, resulting in a mean positive bias.</p><p>To reiterate, the network mean bias of ASOS forecasts was nearly zero (Fig. <ref type="figure">3a,</ref><ref type="figure">b</ref>), but the forecasts were biased such that stations having lower average wind speeds were overpredicted while windier ones were underforecast <ref type="figure">(Figs. 8c, 11a)</ref>. The constant GF algorithm exploits this systemic tendency to underpredict at sites where GFs are lower than the network average and overpredict at the others by multiplying these biased wind forecasts by a single number (the network average GF), the result being less biased gust forecasts (Fig. <ref type="figure">12c,</ref><ref type="figure">d</ref>). For locations in space and/or instances in time where the observed GF was lower than the network average, multiplying by the larger average value helped shift the forecast gusts more into alignment with the observations (Fig. <ref type="figure">12c</ref>). Similarly, multiplying forecasts of high GF instances or locations by the smaller network average helped correct for the deficiencies seen among the sustained winds. The result is not perfect and we have already seen that when the two segments are recombined (i.e., Fig. 
<ref type="figure">11a</ref>), the forecast range is too narrow relative to the observations. In the next section, we explore reasons for the excessive sharpness of the forecast distributions.</p></div>
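The constant-GF gust algorithm discussed above amounts to a single multiplication. The following is a minimal Python sketch, not the authors' code; the function name and sample wind values are illustrative, while the 1.86 factor is the April 2019 ASOS network-average GF quoted in the text.

```python
import numpy as np

# April 2019 ASOS network-average gust factor (from the text).
NETWORK_GF = 1.86

def forecast_gusts(sustained_wind_forecast):
    """Constant-GF gust parameterization: multiply each forecast
    sustained wind (m/s) by the single network-average gust factor.
    Because low-GF (typically windier, well-exposed) sites tend to be
    underforecast and high-GF (sheltered) sites overforecast, this one
    constant partially compensates both biases in the resulting gusts."""
    return NETWORK_GF * np.asarray(sustained_wind_forecast, dtype=float)

# Illustrative (made-up) forecast winds at a windy and a sheltered site.
winds = np.array([6.0, 2.5])
gusts = forecast_gusts(winds)   # 1.86 * wind at each site
```

Applied to every forecast/observation pair, this is the procedure that produced the gust comparisons in Figs. 10b and 12c,d.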
<div xmlns="http://www.tei-c.org/ns/1.0"><head>d. The roles of landuse and local time</head><p>The potential roles of landuse type and local time were investigated to understand the differences between the observations and forecasts, especially with respect to their distributional shapes as seen in Fig. <ref type="figure">11</ref>. As noted earlier, WRF-ARW uses fractional landuse allocations (cf. Fig. <ref type="figure">2b</ref>), and the focus here is on the largest, or primary, assignment. For HRRRV3 and April 2019, 41% of the ASOS stations had a primary classification of cropland, 24% were urban, 14% were grassland, and 6% were given open shrubland assignments. The various forested land classes, including deciduous, evergreen, and mixed forests, accounted for about 11% of the ASOS sites. While unsurprising, it is clear that the urban landuse type is substantially overrepresented in the ASOS network relative to the CONUS landscape (see, for example, the bright red areas in Figs. <ref type="figure">2a,</ref><ref type="figure">c</ref>). Figure <ref type="figure">13</ref> reveals the existence of a robust association between primary assignment and forecast wind bias. Each class possesses two horizontal bars, representing the average bias (blue, units m s⁻¹) among stations with that classification and their weighted contribution (red, units dm s⁻¹ for convenience), reflecting station count, towards the network-average bias of -0.2 m s⁻¹. The most negative bias (-0.6 m s⁻¹) was associated with the open shrublands stations, but the urban and grassland sites had larger weighted shares owing to their larger station counts. Similarly, although cropland stations had a small class-average bias (-0.08 m s⁻¹), their aggregate effect was not minor owing to their ubiquity (41% of stations). In contrast, the roughly 11% of installations residing in forested grid cells were positively biased, by as much as +0.52 m s⁻¹ in the evergreen needleleaf cells. 
If these overpredictions were resolved in isolation, the network-averaged skill would actually decrease. Importantly, the model has clearly failed to properly represent the general slowness of the winds in the forested areas (Fig. <ref type="figure">14d</ref>). This elucidates why the network-averaged sustained winds from the NYSM were so overpredicted. Note that the Mesonet's sustained wind histograms (Fig. <ref type="figure">15</ref>) bear a strong resemblance to those of the ASOS forested class, independent of anemometer type. While only 11% of the ASOS sites were classified as forested in the HRRRV3, that category represented 43% of the Mesonet stations, and thus it exerts a powerful influence on this network's average. Landuse type can affect wind forecasts through the roughness length, z₀. Although this would require testing, it is not clear that simply raising z₀ would improve these predictions because the more serious issue is site exposure.</p><p>When the day is subdivided into four 6-hour segments as in Fig. <ref type="figure">16</ref>, we clearly see that the underprediction of observed ASOS winds exceeding 4 m s⁻¹ seen in Fig. <ref type="figure">11</ref> is largely confined to the nocturnal period between 6 PM and 6 AM local time (LT), when the boundary layer is likely to be stable. This period is also largely responsible for the distributional differences between the forecasts and observations noted above. The frequency of relatively larger observed wind speeds at night was sufficient to make the mean bias of the forecast/observation pairs negative, even though the model generated too few low-speed predictions. This may represent a problem with how the model handles the stable boundary layer and its intermittent, localized turbulence (cf. <ref type="bibr">Medeiros and</ref><ref type="bibr">Fitzjarrald 2014, 2015)</ref>. 
In contrast, the daytime period of 6 AM to 6 PM LT (panels b and c) seems to be rather well represented in the HRRRV3 forecasts, albeit with a small underrepresentation at higher wind speeds (≥ 8 m s⁻¹) that also led to small negative net biases.</p><p>Precise percentages vary slightly between the station and forecast/observation pair analyses owing to minor data dropouts. The number of forecast/observation pairs varies among the segments because we are only using the 00 UTC cycle and its 36-h simulations, which means some times have more forecasts than others.</p><p>Those histograms aggregated all landuse classes. Figure <ref type="figure">17</ref> focuses on the 6 PM to midnight LT period, differentiated by the landuse groupings examined in Fig. <ref type="figure">14</ref>. Only the forested lands (panel d) did not exhibit the characteristic underprediction of relatively faster winds, again reflecting the less than optimal handling of those areas in the model. For the afternoon (noon to 6 PM LT) period (Fig. <ref type="figure">18</ref>), however, only the urban classification (panel a) failed to capture the frequency of stronger winds. Thus, except in the vicinity of cities, the model's inability to capture the frequency of stronger winds appears to be a nocturnal issue and one that might be addressed by reconsidering assumptions employed in the stable boundary layer regime. It is surmised that the urban issue may also stem from overly high specifications of surface roughness in those areas. While many airports are located in grids designated as urban, that does not mean that the local environment of the airport is truly city-like. Finally, we reiterate that resolving the issue with forested land, or removing those stations from the analysis, would tend to make the nocturnal underprediction issue appear worse. A reduced set of sites remained after removal of misclassified stations and those with 500 or fewer observations; in our judgment, this does not negatively affect the evaluation.</p></div>
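The local-time stratification used in this section can be sketched as follows. This is an illustrative Python fragment, not the analysis code; the array names and sample values are hypothetical, and bias is defined as forecast minus observation, as in the figures.

```python
import numpy as np

# Four 6-h local-time segments: [0,6), [6,12), [12,18), [18,24) LT.
SEGMENTS = ["00-06 LT", "06-12 LT", "12-18 LT", "18-24 LT"]

def bias_by_local_time(local_hour, forecast, observed):
    """Mean bias (forecast - observed, m/s) of wind forecast/observation
    pairs, computed separately for each 6-h local-time segment."""
    local_hour = np.asarray(local_hour)
    forecast = np.asarray(forecast, dtype=float)
    observed = np.asarray(observed, dtype=float)
    seg = local_hour // 6  # segment index 0..3
    return {SEGMENTS[k]: float(np.mean(forecast[seg == k] - observed[seg == k]))
            for k in range(4) if np.any(seg == k)}

# Hypothetical pairs illustrating the nocturnal underprediction pattern.
hours = np.array([2, 3, 8, 14, 20, 22])
fcst  = np.array([2.0, 2.5, 4.0, 5.0, 2.0, 2.5])
obs   = np.array([3.0, 3.5, 4.0, 5.0, 3.0, 3.5])
biases = bias_by_local_time(hours, fcst, obs)
```

In this toy example, the two daytime segments verify perfectly while both nocturnal segments show a negative (underforecast) bias, mirroring the pattern found in Figs. 16 and 17.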
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">HRRRV4 wind and gust evaluation for April 2021</head><p>There are more differences between these two MODIS-derived databases than just the resolution enhancement. In HRRRV4 (Fig. <ref type="figure">2c</ref>), a large fraction of the original croplands class (#12, gold), especially in the eastern CONUS, has been transferred into the previously existing but unused "cropland/vegetation mosaic" group (#14, cyan). The croplands category presently accounts for only 18.3% of ASOS station primary assignments, while the mosaic claims 14.9%. In the west, a portion of the open shrublands (#7, maroon) primary assignments have been reassigned as grasslands (#10, light green), constituting 4.3% and 21.0% of ASOS sites in the newer MODIS database, respectively. We have continued combining those landuse types owing to their similarity with respect to model performance. The HRRRV4 grassland area has also spread eastward into the former croplands, so the grassland and open shrubland combination now represented 25% of the April 2021 ASOS primary assignments, an increase of 5 percentage points. Some areas that had been assigned to one of the forest classes (categories 1-5) have been reclassified as woody savannas (#8), increasing its share of the network from 2.6% to 7.2%. Owing to their similarity, the forest and woody savanna classes have been combined in the analysis below. As in Section 3, above, these are primary landuse assignments. The fractional landuse apportionments represent another difference with HRRRV3. In HRRRV4, 87% of ASOS stations reside in grid cells assigned more than one landuse class, up from 53% in V3 (compare panels d and b of Fig. 2), a consequence of V4's higher landuse resolution. The average fraction claimed by the primary class was 0.7, a decrease from 0.83 for V3. Again, this was relatively smaller for the forested group and also the new cropland/vegetation mosaic classes (both about 0.6) than for the urban and croplands (both ≈ 0.7) and grasslands (0.8). 
The HRRRV4 landscape is more finely divided, and this makes analyzing by primary landuse assignment less precise, but again we find some value in this effort.</p><p>Figures <ref type="figure">3c</ref> and <ref type="figure">4c</ref>,d present the April 2021 forecast hour analysis. The small negative forecast bias that was previously seen in V3 has vanished (indeed, the mean bias is now essentially zero), although the spatial standard deviation of the forecasts was still smaller than that of the observations at all forecast hours. The local time versions of these figures also revealed some improvements <ref type="bibr">(Figs. 3d and 5c,</ref><ref type="bibr">d)</ref>. Despite involving fewer sites, the station analysis results and conclusions were little changed. R² values for the sustained wind and gust fits were higher for both station-average (Figs. <ref type="figure">7c,</ref><ref type="figure">d</ref>) and pairwise (Fig. <ref type="figure">10c,</ref><ref type="figure">d</ref>) comparisons, and (although not shown, see <ref type="bibr">Gallagher 2021</ref>) the average forecast wind was again uncorrelated with bias, but the higher wind stations were still underpredicted and lower sites overforecast in a manner that is predictable from iGF or GF. In addition, the association between bias and the difference between forecast and observed temporal standard deviation remained (also not shown, cf. Gallagher 2021). Viewed spatially (Fig. <ref type="figure">9d</ref>), forecast bias was still concentrated in the eastern CONUS in general and the southeast in particular, although errors were somewhat smaller in magnitude.</p><p>The wind and gust histograms (Fig. <ref type="figure">11c,</ref><ref type="figure">d</ref>) also suggest improvements relative to April 2019. However, the compensating errors between more densely treed areas (the forest and woody savannas categories) and the urban and grassland areas persisted (Fig. <ref type="figure">13b</ref>). 
The now more spatially confined croplands class was still the best modeled, and the newly separate mosaic group had a positive bias, which is unsurprising because many of this group's stations are in the southeast, the site of lower wind observations (not shown, but similar to Fig. <ref type="figure">1</ref>) and positive biases (Fig. <ref type="figure">9d</ref>). Still, the histograms representing the urban and combined grassland and open shrubland categories (Fig. <ref type="figure">19</ref>, top row) also reveal better model behavior at relatively higher wind speeds compared to HRRRV3 (Fig. <ref type="figure">14</ref>). For convenience, we have combined the cropland and mosaic classes in Fig. <ref type="figure">19c</ref>, despite their differences, and note that the forested and woody savanna grouping remained the most poorly handled (Fig. <ref type="figure">19d</ref>).</p><p>In the end, and despite the improvements in model performance, we see that the glaringly different distributional shapes noted previously are still present and that this is still driven by the 6 PM to 6 AM period (Fig. <ref type="figure">20</ref>). Clearly, more work on the stable boundary layer remains to be done.</p><p>Although 10-m wind speeds during this period are typically not strong, sizable wind errors may have implications for boundary layer pollution transport, wind energy, etc.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Summary and recommendations</head><p>Our previous study, <ref type="bibr">Fovell and Gallagher (2020, FG20)</ref>, presented a detailed verification of HRRR near-surface wind forecasts and was motivated by prior findings of systemic biases in forecast wind speeds at individual locations even when network-average bias was insignificant <ref type="bibr">(Cao and Fovell 2016;</ref><ref type="bibr">Fovell and Cao 2017;</ref><ref type="bibr">Cao and Fovell 2018;</ref><ref type="bibr">Fovell and Gallagher 2018)</ref>. The present study extended that work through closer examination of the temporal and spatial variability of forecast and observed winds and biases, and the incorporation of additional surface observations from the New York State Mesonet (NYSM). Additionally, hourly maximum gusts were assessed and verified using the network-average gust factor (GF) approach proposed in <ref type="bibr">Cao and Fovell (2018, CF18)</ref>. Since GF was also correlated with bias, with smaller and larger factors associated with under- and overprediction, respectively, multiplying the biased wind forecasts by a fixed value (the network average) was found to reduce the bias in the gust predictions compared to those of the sustained winds.</p><p>For two spring months in 2019 and 2021, we showed the network average sustained wind forecasts for ASOS stations were excellent in Version 3 and even better in the current configuration. That said, the negative correlation between bias and mean observed wind speed persisted in Version 4, and we also demonstrated that the forecast and observed wind distributions were distinctly different overall, with ASOS forecasts in both versions having less spread about their modal value of about 2.5 m s⁻¹ than in reality. Furthermore, observations associated with below-median GFs skewed towards higher speeds and those with above-median values skewed sharply leftward, characteristics not captured in the forecasts. 
The inclusion of stations classified as forested land in the model actually worked to obscure the model's tendency to underpredict winds across the bulk of the ASOS network. A large fraction of the NYSM sites are situated in forested areas, and that explained why the wind speeds at those stations were substantially overpredicted by the model.</p><p>Regarding local time, forecast wind distributions during the daytime looked quite good, but less so at night, when the boundary and surface layers are usually stable. This demonstrates that further work needs to be done in the nocturnal regime. Even that systemic bias was landscape-dependent, however. Especially in urban and grassland areas, stronger winds at night were more common in the observations than in the model forecasts.</p><p>Taken together, we see evidence of further improvement in the HRRRV4 relative to its already skillful predecessor, at least in the spring month selected for close analysis. The gust parameterization inspired by <ref type="bibr">Cao and Fovell (2018)</ref> continued to work well, despite its simplicity. Because it helped mitigate systemic biases, the CF18 gust approach can supply a starting point for a more sophisticated one that might also factor in boundary layer depth, winds, and stability for even better-verifying predictions, especially in particularly challenging or dangerous situations (e.g., downslope windstorms, tropical cyclones, convective storms). Challenges with respect to the stable boundary layer and the treatment of some landuse classes (especially forested areas) remain. Other important variables, such as temperature, moisture, and the HRRR's own gust potential, have not yet been assessed. These should be foci of future work.</p><p>Defining the GF using the hourly maximum gust in this way is equally valid but typically results in higher gust factors because wind and gust distributions have long tails <ref type="bibr">(cf. Fig. 
11 and Gallagher 2021)</ref>.</p><p>In previous work (e.g., Fovell and Gallagher 2018), we used ASOS reports from the 1-min database, each of which consisted of a 2-min running average wind (i.e., the sustained wind) and the peak 3-s average (the gust) during that one-minute interval. Over the ASOS network, the gust factor for the 1-min reports averaged to about 1.29. For this study, we adopted the hourly maximum gust as a better measure of the wind threat. This GF is an hour's fastest 3-s gust report divided by that hour's mean sustained wind, so both the numerator and denominator of the gust factor have been redefined. However, consistent with <ref type="bibr">Harper et al. (2010)</ref>, the mean wind is nearly the same whether averaged over 2- or 60-min periods. Yet, the largest gust discoverable within a given interval logically increases with interval length.</p><p>Figure <ref type="figure">A2</ref> presents the ratio-of-means GFs obtained from about 840 ASOS sites vs. the time interval for which the maximum 3-s gust was identified. For each station, for each of four months considered, the station's entire record of length T was subdivided into nonoverlapping segments of length τ in minutes, where 1 ≤ τ ≤ 60. Then, for each segment without missing data, the maximum gust report was identified and the mean sustained wind was computed. These were first averaged over all available segments of length τ and then over all stations and the four months, yielding the ratio-of-means network-averaged GF representing time interval τ. Because the average sustained wind for each interval represented the same information, only the numerator of the GF varied among the time intervals. Figure <ref type="figure">A2</ref> demonstrates that the 1-min GF is about 1.29 (red star) while the 60-min value is about 1.84 (green star), about 1.4 times larger. 
This curve varies somewhat among seasons and more prominently among networks owing to differences in mean wind speeds, mounting heights, anemometer hardware, characteristic exposures, and possibly other factors, but the shape of the curve is typically logarithmic in time. </p></div><note xmlns="http://www.tei-c.org/ns/1.0" place="foot" xml:id="foot_0"><p>There are fewer pairs in the present analysis than in FG20 (851,550) owing to the more stringent restrictions employed in the construction of hourly-averaged observations.</p></note>
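The ratio-of-means GF computation described in the Appendix can be sketched in Python. This is an illustrative reconstruction under stated assumptions, not the authors' code: the 1-min winds and gusts below are synthetic, and real records would additionally require skipping segments with missing data, as described above.

```python
import numpy as np

def ratio_of_means_gf(wind_1min, gust_1min, tau):
    """Ratio-of-means gust factor for interval length tau (minutes):
    split the 1-min record into nonoverlapping tau-min segments, take
    each segment's maximum gust and mean sustained wind, average both
    quantities over all segments, and return the ratio of the means."""
    wind = np.asarray(wind_1min, dtype=float)
    gust = np.asarray(gust_1min, dtype=float)
    n = (len(wind) // tau) * tau            # drop any incomplete tail
    wind = wind[:n].reshape(-1, tau)        # one row per tau-min segment
    gust = gust[:n].reshape(-1, tau)
    return gust.max(axis=1).mean() / wind.mean(axis=1).mean()

# Synthetic (made-up) day of 1-min sustained winds and gusts: because
# the maximum gust discoverable in a segment grows with segment length
# while the mean wind does not, the GF increases with tau.
rng = np.random.default_rng(0)
wind = rng.gamma(shape=4.0, scale=1.0, size=60 * 24)
gust = wind * rng.uniform(1.1, 1.6, size=wind.size)
gf_1 = ratio_of_means_gf(wind, gust, 1)    # 1-min GF
gf_60 = ratio_of_means_gf(wind, gust, 60)  # 60-min GF, necessarily larger
```

Averaging such per-station curves over all sites and months yields the network-averaged GF-versus-τ relationship shown in Fig. A2, where the 60-min GF (about 1.84) substantially exceeds the 1-min value (about 1.29).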
		</body>
		</text>
</TEI>
