<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Incorporating physically-based water temperature predictions into the National water model framework</title></titleStmt>
			<publicationStmt>
				<publisher>Environmental Modelling &amp; Software</publisher>
				<date>01/01/2024</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10554087</idno>
					<idno type="doi">10.1016/j.envsoft.2023.105866</idno>
					<title level='j'>Environmental Modelling &amp; Software</title>
<idno>1364-8152</idno>
<biblScope unit="volume">171</biblScope>
<biblScope unit="issue">C</biblScope>					

					<author>Jeffrey Wade</author><author>Christa Kelleher</author><author>Barret L Kurylyk</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[While river water temperatures are a strong control on instream processes and aquatic ecosystems, monitoring networks for river water temperatures are often sparse. Despite recent advancements in water temperature modeling strategies, current models struggle to provide real-time and reach-specific predictions across broad spatial domains. We developed a physically-based water temperature model coupled to the National Water Model (NWM) to assess the potential for water temperature prediction to be incorporated into the NWM at the continental scale. Using model forcings and outputs from the NWM v2.1 retrospective, we evaluated the ability of four model configurations of increasing complexity to simulate hourly water temperatures in the forested headwaters of H.J. Andrews Experimental Forest, Oregon, USA during a six-week summer period. Our modeling framework, representing a first effort at pairing water temperature simulation with predictions from the NWM, suggests that the NWM can be leveraged to give insight into other water quality variables.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>River water temperature is often referred to as a 'master' water quality variable, as a wide range of chemical and biological processes are closely linked to in-stream thermal regimes <ref type="bibr">(Caissie, 2006;</ref><ref type="bibr">Hannah and Garner, 2015;</ref><ref type="bibr">Ouellet et al., 2020)</ref>. The temperature of rivers controls algal and bacterial growth rates, dissolved oxygen content, solute processing, and the integrity of ecosystems <ref type="bibr">(Havens and Paerl, 2015;</ref><ref type="bibr">Isaak et al., 2012)</ref>. Water temperatures are of particular economic interest to management agencies, as the viability of salmonid fisheries and the efficiency of river-side power plants are both threatened by warmer rivers <ref type="bibr">(Ficke et al., 2007;</ref><ref type="bibr">F&#246;rster and Lilliestam, 2010)</ref>. With future climate change expected to give rise to heightened river water temperatures <ref type="bibr">(Caldwell et al., 2015;</ref><ref type="bibr">van Vliet et al., 2013;</ref><ref type="bibr">Wanders et al., 2019)</ref>, it is critical to better understand, observe, and predict reach-scale water temperate dynamics at continental to global scales.</p><p>In comparison to records of discharge, observations of river water temperature are sparse, particularly outside of the world's major river basins <ref type="bibr">(Wanders et al., 2019)</ref>. Without knowledge of river thermal regimes in unmonitored basins, it is exceedingly challenging to manage the threats warmer rivers pose to aquatic fauna and the productivity of fisheries <ref type="bibr">(Ficke et al., 2007)</ref>. Models of water temperature offer unique insight into the spatial and temporal dynamics of river thermal regimes, both within individual basins and at a global-scale, helping to bridge gaps between gages in a sparse temperature monitoring network. A wide range of modeling strategies are commonly applied to water temperature prediction and can generally be grouped into statistical and physically-based models <ref type="bibr">(Caissie, 2006;</ref><ref type="bibr">Dugdale et al., 2017)</ref>. The applications of statistical and physically-based water temperature models are extensively reviewed in <ref type="bibr">Benyahya et al. (2007)</ref> and <ref type="bibr">Dugdale et al. (2017)</ref>, respectively.</p><p>Physically-based (or 'mechanistic') models are particularly well suited to water temperature prediction <ref type="bibr">(Dugdale et al., 2017)</ref> and function by calculating energy fluxes at the air-water and water-streambed interfaces and transporting mass and stored thermal energy downstream <ref type="bibr">(Caissie, 2006)</ref>. Despite these advantages, physically-based models require site-specific data (e.g., discharge, groundwater inflow, radiation flux) in order to resolve land surface and hydrologic processes <ref type="bibr">(Dugdale et al., 2017)</ref>. The need for high-quality input data can make physically-based models difficult to apply to unmonitored catchments, where knowledge of hydrologic behavior is often uncertain.</p><p>Coupled temperature-hydrological models <ref type="bibr">(van Beek et al., 2012;</ref><ref type="bibr">van Vliet et al., 2012;</ref><ref type="bibr">Wanders et al., 2019)</ref> (hereby referred to as 'coupled models'), which concurrently simulate both hydrological and thermal river processes, are an effective tool for overcoming limitations related to a lack of observational data <ref type="bibr">(Dugdale et al., 2017)</ref>. By leveraging calibrated hydrologic predictions, coupled models can accurately simulate water temperatures in unmonitored basins at adaptable spatial and temporal resolutions <ref type="bibr">(Sun et al., 2015)</ref>. Coupled models are well-suited to simulating thermal dynamics in settings where advective heat fluxes are influential, such as headwater reaches. Along these reaches, the ability of coupled models to divide inflows into multiple source water components (e.g., surface runoff, groundwater inflow) is crucial for properly estimating hydrologic heat fluxes <ref type="bibr">(Dugdale et al., 2017)</ref>.</p><p>Recent advances in continental-scale hydrologic modeling have introduced newfound prediction capabilities at unprecedented spatial and temporal scales <ref type="bibr">(Lin et al., 2019;</ref><ref type="bibr">Salas et al., 2018)</ref>. This progress presents new opportunities for expanding the extent and accuracy of river temperature predictions through the development of coupled hydrologic-temperature models. One such broad-scale hydrologic model with potential for coupling to a water temperature model is the National Water Model (NWM). The NWM is a hydrologic model developed by the National Oceanic and Atmospheric Administration (NOAA), the National Weather Service (NWS), and the Office of Water Prediction (OWP) that forecasts hourly streamflow at 2.7 million river reaches in the conterminous US (CONUS) <ref type="bibr">(Lahmers et al., 2021;</ref><ref type="bibr">NOAA, 2016)</ref>. The NWM simulates components of the terrestrial water cycle, including land surface water and energy fluxes, soil moisture, subsurface flow, and channel routing, using a particular configuration of the NCAR-supported Weather Research and Forecasting Model hydrological modeling system (WRF-Hydro; <ref type="bibr">Gochis et al., 2021)</ref> and the Noah Multi-Parameterization land surface model (Noah-MP; <ref type="bibr">Niu et al., 2011)</ref>. As the model provides high resolution (1 km) predictions of land surface and hydrologic states over a range of forecast lead times <ref type="bibr">(NOAA, 2023)</ref>, the framework of the NWM is well suited to be coupled to a continental-scale water temperature model.</p><p>Despite the ability of the NWM to accurately represent hydrological processes in catchments across the US, application of the modeling framework to river water temperatures remains unexplored. A coupled NWM-water temperature model could resolve thermal dynamics at reach scales relevant to watershed management along all conterminous US catchments and allow for forward-looking temperature forecasts. Using data derived from the NWM and other publicly available sources, we sought to develop a proof-of-concept water temperature model in a single test basin over several weeks of summer baseflow conditions to determine if the NWM framework is suitable for temperature prediction. Though we only considered temperature modeling in a single basin, our intention is that the strategies we have developed for the modeling framework are transferable to broader spatial scales and potentially, with modifications, to other water quality variables. In this study, we aimed to (1) assess if forcings and outputs from the National Water Model can be leveraged to accurately simulate hourly river water temperatures in a forested headwater catchment during summer, when water temperatures are of greatest concern for aquatic species, and (2) evaluate how model configurations of increasing complexity represent thermal processes influencing water temperatures.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Methods</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">Study site: H.J. Andrews Experimental Forest</head><p>We selected the H.J. Andrews Experimental Forest (H.J. Andrews), a 64 km 2 forested headwater catchment located in the western Cascade Mountains, Oregon, USA, to serve as a test basin for this study. H.J. Andrews has been subject to continuous and extensive hydrologic monitoring since 1948 <ref type="bibr">(Johnson et al., 2021)</ref>, providing insight into hyporheic exchange processes <ref type="bibr">(Becker et al., 2023;</ref><ref type="bibr">Ward et al., 2012</ref><ref type="bibr">, Ward et al., 2019a</ref><ref type="bibr">, 2019b;</ref><ref type="bibr">Wondzell et al., 2009)</ref>, river corridor connectivity <ref type="bibr">(McGuire and McDonnell, 2010;</ref><ref type="bibr">Ward et al., 2018a)</ref>, and water temperature dynamics <ref type="bibr">(Johnson, 2004)</ref>. The breadth of past hydrologic research at the site, coupled with the availability of historical observations, make H.J. Andrews an ideal catchment to explore the performance of a water temperature model.</p><p>The H.J. Andrews watershed is drained by several streams, including McRae Creek, Mack Creek, and Lookout Creek, the latter of which drains downstream to the Blue River Reservoir (Fig. <ref type="figure">1</ref>). While considered a fifth-order catchment by most field studies <ref type="bibr">(Ward et al., 2019a,b)</ref>, H.J. Andrews is represented as a third-order basin by the NWM (and NHD), which often does not resolve small headwater reaches. H.J. Andrews is characterized by high relief topography, with elevations ranging from 410 to 1630 m above sea level, and is primarily forested by Douglas fir trees <ref type="bibr">(Ward et al., 2019a,b)</ref>. Annual precipitation at this site is strongly seasonal and varies between 1900 and 2900 mm, with most falling in winter months between November and April <ref type="bibr">(Jennings and Jones, 2015)</ref>. Flows at the basin outlet (Lookout Creek) typically reach a maximum in December or January, and a minimum in September <ref type="bibr">(Jennings and Jones, 2015)</ref>. H.J. Andrews' streams are home to a diversity of aquatic species, including cutthroat trout and coastal giant salamanders <ref type="bibr">(Kaylor et al., 2019)</ref>. The watershed is generally unimpacted by anthropogenic disturbances, with the exception of experimental logging in select catchments.</p><p>While H.J. Andrews contains a number of water temperature gaging stations, only two gages coincided with reaches represented by the National Water Model. These gages, providing records of water temperature and discharge <ref type="bibr">(Gregory and Johnson, 2019)</ref>, are located on the upper reaches of Mack Creek (GSMACK; drainage area: 580 ha) and the lower reaches of Lookout Creek (GSLOOK; USGS 14161500; drainage area: 6242) near the basin outlet (Fig. <ref type="figure">1</ref>). We took GSMACK to represent headwater behavior (hereby referred to as 'headwater') and GSLOOK to represent higher order stream behavior (hereby referred to as 'outlet') in the basin.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">Model data</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.1.">National water model retrospective v2.1</head><p>The NWM retrospective is a backwards-looking long-duration model run forced with observational meteorological data. While the NWM retrospective analyses are typically used to evaluate model performance <ref type="bibr">(Dyer et al., 2022;</ref><ref type="bibr">Salas et al., 2018;</ref><ref type="bibr">Wan et al., 2022)</ref>, the historical continuity of their predictions makes them useful in the testing and development of models coupled to the NWM. In this study, we used data from the 42-yr NWM retrospective version 2.1 (v2.1) (February 1979 to December 2020), a run of NWM version 2.1 forced by near-surface meteorological conditions from the Analysis of Record for Calibration (AORC) dataset (NOAA, 2021a,b). The v2.1 retrospective configuration uses version 5.2.0 of WRF-Hydro (Gochis et al., 2021) and does not assimilate observed discharge data from stream gages.</p><p>The AORC supplies gridded atmospheric forcing data to WRF-Hydro. This forcing data includes hourly records of precipitation, air temperature, specific humidity, air pressure, downward shortwave radiation flux, downward longwave radiation flux, and u-and v-components of wind speed (NOAA, 2021a). These near surface conditions are used by the Noah-MP land surface model to simulate vertical energy and water fluxes at a 1-km spatial resolution <ref type="bibr">(Gochis et al., 2021)</ref>. Vertical moisture fluxes through the land surface are then passed to subsurface routing modules, which influence the lateral flow of water across the model's surface, soil, and saturated domains. Using a 250-m grid, WRF-Hydro routes subsurface flow through a 2-m thick soil column and an unconfined groundwater aquifer, approximating hydraulic gradients using a D8 steepest descent method <ref type="bibr">(Gochis et al., 2021;</ref><ref type="bibr">Lahmers et al., 2019)</ref>. When the subsurface storage of a grid cell is exceeded, excess water is routed to channels as overland surface runoff using a diffusive wave approach <ref type="bibr">(Julien et al., 1995;</ref><ref type="bibr">Lahmers et al., 2019;</ref><ref type="bibr">Rojas et al., 1997)</ref>. The location and extent of NWM channels are derived from National Hydrography Dataset (NHD) Plus Version 2 (NHDPlusV2) river reaches <ref type="bibr">(McKay et al., 2012;</ref><ref type="bibr">Salas et al., 2018)</ref>. These channels can receive inflows either from surface runoff or from groundwater recharge, represented by empirically-tuned discharge from a conceptual exponential groundwater bucket <ref type="bibr">(Gochis et al., 2021)</ref>. Downstream flow is transported through trapezoidal NWM channels using Muskingum-Cunge routing <ref type="bibr">(Gochis et al., 2021;</ref><ref type="bibr">Lahmers et al., 2019)</ref>. Parameters related to channel geometry are empirically derived using relationships to each reach segment's drainage area <ref type="bibr">(Gochis et al., 2021)</ref>. These computations are then integrated to deliver hourly values of streamflow, stream velocity, surface water runoff, and groundwater bucket inflow at each reach segment.</p><p>We retrieved NWM retrospective v2.1 forcing and output data from a publicly available AWS repository (accessible at: <ref type="url">https://registry.  opendata.aws/nwm-archive/</ref>) and extracted hourly values of relevant variables at stream segments within the H.J. Andrews study basin. The sources and respective applications of NWM data used in this study are presented in Fig. <ref type="figure">2</ref>. These inputs can be divided into three categories: meteorological forcing data, hydrological model outputs, and channel geometry parameters (Fig. <ref type="figure">2</ref>). From the meteorological forcing data, we extracted incoming shortwave radiation, incoming longwave radiation, air temperature, specific humidity, air pressure, and wind speed. Gridded meteorological forcings were assigned to vector stream reaches based on the centroid location of each reach. From the hydrological model outputs, we retrieved discharge, stream velocity, flux from the groundwater bucket, and runoff from terrain routing (surface runoff) corresponding to each model reach segment. We also retrieved channel geometry parameters, including location, reach length, width, side slope angle, and stream order, from the NWM Routelink dataset (accessible at: <ref type="url">https://www.nohrsc.noaa.gov/pub/staff/keicher/NWM_live/NWM_par  ameters/NWM_parameter_files.tar.gz</ref>).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3.">Modeling approach 2.3.1. Model resolution</head><p>We simulated hourly water temperatures throughout the H.J. Andrews stream network during a six-week period of low flow from July 1 to August 15, 2019. We selected this time period because it was the most recent period where observed water temperatures in the basin and predictions from the NWM Retrospective v2.1 overlapped. We subdivided channels identified by the NWM into a series of 1-km long reach segments, beginning at the channel head of each tributary. Water temperature predictions were made at 46 model nodes, located at the beginning and end of each of these segments. Additional model nodes were added at the location of the two observed water temperature gages so as to not introduce error via spatial interpolation when assessing model performance against observations.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3.2.">Computation of water temperatures</head><p>We adapted a semi-Lagrangian model formulation following Yearsley ( <ref type="formula">2009</ref>), also implemented in the DHSVM-RBM water temperature model, to develop a Python script that simulates water temperatures using primarily NWM forcings. Semi-Lagrangian approaches are widely used in the field of numerical weather prediction <ref type="bibr">(Husain and Girard, 2017)</ref> and have also been applied extensively to water temperature modeling <ref type="bibr">(Lee et al., 2020;</ref><ref type="bibr">Yan et al., 2021;</ref><ref type="bibr">Yearsley, 2009</ref><ref type="bibr">Yearsley, , 2012))</ref>. This frame of reference combines aspects of Eulerian and Lagrangian approaches, coupling a fixed model grid with longitudinal particle tracking to gain efficiency over a strictly Eulerian method <ref type="bibr">(Yearsley, 2009)</ref>. Semi-Lagrangian models are numerically stable across broad ranges of space and time steps, facilitating simulations at time steps considerably longer than possible under models limited by the Courant condition <ref type="bibr">(Yearsley, 2009)</ref>.</p><p>In this semi-Lagrangian approach, unknown temperatures at a future time step were determined by applying reverse particle tracking to simulate the longitudinal paths of water parcels originating from model nodes where water temperatures are simulated <ref type="bibr">(Yearsley, 2009)</ref>. From a given node at time t + &#916;t, where &#916;t is equal to the computational time step, the upstream Lagrangian coordinate (&#958;) at time t was equal to <ref type="bibr">(Yearsley, 2009)</ref>:</p><p>where. t = model time step, s &#916;t = computational time step of the model, s x 0 = starting position of the water parcel along the reach, m</p><p>As the origin location of each water parcel did not always coincide with a model node, we used second-order Lagrangian polynomials to interpolate the temperature at the origin point at time t using known water temperatures from surrounding nodes <ref type="bibr">(Yearsley, 2009)</ref>.</p><p>Once the starting water temperature at the origin point was known, the particle was tracked as it traveled back downstream to the starting model node. As the water parcel passed downstream from time t to t + &#916;t, the location of a water parcel along its trajectory (x j ) was tracked by <ref type="bibr">(Yearsley, 2009)</ref>:</p><p>where: &#958; = upstream location of the water parcel at time t, m u(j') = flow velocity in the jth model segment at time t, m s -1 &#916;(j') = time taken to traverse the jth model segment, s j = index of model segment, unitless</p><p>The length of time &#916;(j') to traverse the jth model segment was equal to <ref type="bibr">(Yearsley, 2009)</ref>:</p><p>where.</p><p>x j' = locatliion of the downstream boundary of the j'th node, m x j = location of the water parcel along its trajectory, m u(t,j') = flow velocity along the j'th model segment at time t, m s -1</p><p>As water parcels traversed model segments downstream, radiative and hydrologic heat inputs were calculated and integrated over time to update the water temperature at each model node until the water parcel reached its final location at time t + &#916;dt. The calculated water temperature was then inserted at the unknown node and the cycle repeated, either at the next time step or for the next node in the sequence. Following an approach based on <ref type="bibr">Yearsley (2009)</ref>, water temperatures were updated at the downstream end of the jth model segment at time t + &#916;(j') by integrating radiative and hydrologic energy fluxes along each model segment:</p><p>where.</p><p>T(t,x j' ) = known water temperature at the current time step, &#8226; C T (t + &#916;(j'),x j' ) = unknown water temperature at the future time step,</p><p>&#8226; C &#916;(j') = time taken for the water parcel to traverse the jth model segment, s H(t,x j' ) = thermal energy flux across the air-water interface, W m -2 &#961; = density of water, kg m -3</p><p>C p = specific heat capacity of water, J kg -1 &#8226; C -1 D(t,x j' ) = water depth, m &#934;(t,x j' ) = effective advected heat flux from hydrologic inflows, including groundwater and tributaries,</p><p>If a water parcel traversed more than one model segment during a computational time step, the above formula was computed at the end of each segment crossed.</p><p>To calculate how radiative and hydrologic forcings result in changes in water temperatures, the cross-sectional area, depth, width, and the volume of each reach must be known. For each reach, the NWM Routelink file supplied the reach length, channel side slope, bottom width, and top width to define a trapezoidal channel geometry <ref type="bibr">(Gochis et al., 2021)</ref>. As our model simulated water temperatures during summer low flow conditions, we only considered flows through the primary channel and disregarded overbank flow into the floodplain. A further description of our derivation of cross-sectional area, water depth, and reach volume is presented in Supporting Information.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3.3.">Heat transfer equations</head><p>The total thermal energy flux across the air-water interface (H) summarizes the radiative and atmospheric forcings to water parcels as they traverse model reaches. These energy fluxes include incoming shortwave radiation, net longwave radiation, sensible heat exchange, and latent heat exchange. The total thermal energy flux across the airwater interface was calculated by:</p><p>where.</p><p>Full equations for the calculation of each of these heat balance components, including the integration of riparian vegetative shading, is described in Supporting Information. We did not include bed conduction in the net energy balance, as streambed temperatures would be difficult to quantify when expanding the model to broader scales. The bed conduction flux is generally small compared to other heat fluxes, though it can an influential process along headwater reaches <ref type="bibr">(Benyahya et al., 2012;</ref><ref type="bibr">Caissie et al., 2014;</ref><ref type="bibr">Johnson, 2004)</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3.4.">Hydrologic heat fluxes</head><p>In addition to the radiative and atmospheric heat fluxes to the water column, hydrologic inflows, including groundwater inflow, surface water runoff, and tributary inflow, contribute heat to the stream based on the relative temperature difference between the stream and inflows. We aggregated the relative effects of these three inflows to generate a single advective heat flux to each model reach over time. The total hydrologic inflow rate was calculated by:</p><p>where.</p><p>Once the total inflow rate and individual inflow components were known, the effective temperature of the inflows was calculated by a flow-weighted arithmetic mean <ref type="bibr">(Glose et al., 2017)</ref>:</p><p>where.</p><p>The advective heat flux to the stream was then derived by computing the difference between the aggregate inflow temperature and current water column temperature, scaled by the proportion of lateral inflow to total channel volume. The effective advective flux was calculated by <ref type="bibr">(Glose, 2013)</ref>:</p><p>where.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3.5.">Estimation of unknown inflow temperatures</head><p>In Equation ( <ref type="formula">7</ref>), the water temperature of each inflow component (groundwater inflow, surface water runoff, and tributary inflow) was unknown. Following <ref type="bibr">Wanders et al. (2019)</ref>, we set the temperature of surface water runoff as 1.5 &#8226; C less than the current air temperature (though we note that this parameter was not expected to be influential given the lack of surface runoff that occurred during our simulation period). Simulated tributary temperatures were implicitly treated as lateral inflow to the model reach where the tributary joined the mainstem river.</p><p>Groundwater temperatures are particularly influential to modeled water temperatures, but are often unknown for the purposes of water temperature modeling. Our proposed approach relied on the assumption that the net water temperature of groundwater inflow must be bounded by the temperature of deep groundwater (approximated by annual mean air temperature) and the ground surface temperature (approximated by continuous air temperature). The ground surface temperature can be coarsely estimated by a smoothed mean air temperature, reflecting correlative links between patterns in solar radiation, air temperature, and ground surface temperature. By scaling the magnitude of variability of a smoothed daily air temperature signal between the bounds of deep groundwater and the ground surface temperature, we estimated the effective inflow temperature of groundwater inflow. At each model time step and model reach segment, we calculated the groundwater inflow temperature (T GW ) by:</p><p>where.</p><p>C AT-GW = Air temperature-groundwater temperature coefficient, varying from 0 to 1, unitless AT D = Mean daily air temperature smoothed over a variable duration moving window,</p><p>The smoothed daily air temperature, AT D , for a given time t was calculated by:</p><p>AT n <ref type="bibr">(10)</ref> where: W = air temperature moving window duration, days.</p><p>AT n = daily mean air temperature on day n, &#8226; C.</p><p>By tuning C AT-GW between 0 and 1, we simulated effective sourcing of inflows from temporally-invariant deep groundwater (value closer to 0) or from more variable shallow groundwater (value closer to 1) (Fig. <ref type="figure">3</ref>). We derived mean annual air temperatures along the network using 4 km gridded PRISM means over a 4-year period from 2016 to 2019 (PRISM Climate Group, Oregon State University, 2022). Daily air temperatures were retrieved from NWM forcings at each model reach and smoothed to a mean value using a backward-looking moving window, tuned to vary between a duration of 2 and 14 days. This moving window simulated the unknown response time of shallow groundwater to radiative forcings (reflected by air temperature). We used the calculated groundwater inflow temperature time series at each model node to supply boundary conditions to the model, both to set the first time-step temperature along the entire network and the time-varying temperature of streamflow initiation at all reach heads during the study period.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3.6.">Approximating the thermal effects of hyporheic exchange</head><p>To conceptually represent the thermal effect of hyporheic exchange in our model, we used a simplified approach that stores water temperatures from previous time steps and returns them at a later, lagged time at a rate proportional to a tuned fraction of discharge. The hyporheic return flow temperature (T hyp ) at time t along a given reach calculated by:</p><p>where.</p><p>T hyp,t = hyporheic return flow temperature at time t, &#8226; C H lag = hyporheic lag duration, hours T n = simulated water temperature at previous time n, &#8226; C</p><p>In a similar manner to the computation of advective heat transfer due to groundwater inflow, the effective hyporheic heat flux was calculated by:</p><p>Where.</p><p>&#934; hyp = effective advective heat flux, &#8226; C s -1 H frac = fraction of streamflow returned to channel as hyporheic flow, varying between 0 and 1, unitless</p><p>We tuned the hyporheic lag duration parameter between 2 and 24 h, simulating a range of hyporheic flow path velocities. Although hyporheic flow paths often have residence times longer than 24 h, the variability in the mean temperature of simulated streamflow (approximating hyporheic return temperature) over periods longer than 24 h is negligible. As such, we limited the lag duration to a maximum of 24 h to conserve computational runtime. The hyporheic flow fraction coefficient represented the amount of water returned to the stream at a given point in time and space as a proportion of discharge (e.g., an H frac value of 0.4 equates to 40% of discharge returned to the stream as hyporheic flow). We allowed this fraction to vary independently by stream order (first, second, and third order reaches), as we generally expect stream order and hyporheic flow to demonstrate a negative relationship. As stream order increases and stream slope decreases down-valley, the effects of hyporheic flow relative to other channel processes tends to decrease <ref type="bibr">(Boano et al., 2014;</ref><ref type="bibr">Wondzell, 2011)</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3.7.">Estimating riparian shading in the absence of on-site observations</head><p>Riparian shading is a crucial variable in water temperature modeling, as it controls the proportion of radiation that reaches the water's surface. However, the NWM does not constrain shading of channels by streambank vegetations. In the absence of model data and on-site observations, we derived riparian shading values along the river network using an empirical formula (Vegetation-shading index; VSI) presented by <ref type="bibr">Kalny et al. (2017)</ref> that relates vegetation height, vegetation buffer width, and vegetations density to riparian shading. VSI has been shown to accurately characterize riparian shading in the absence of on-site observations, with estimated values displaying correlations of up to 0.9 with shading values derived from hemispherical photos <ref type="bibr">(Kalny et al., 2017)</ref>. VSI is calculated by:</p><p>where.</p><p>VSI = vegetation-shading index, varying between 0 and 1, unitless h r = relative vegetation height, % h max = maximum vegetation height, equal to 100% w = vegetation buffer width, m w max = maximum vegetation buffer width affecting water temperature, m, assumed to equal 50 m d = vegetation density, % d max = maximum vegetation density, equal to 100%</p><p>Given the dense and contiguous forest cover adjacent to river reaches in the H.J. Andrew's watershed, we assumed that the vegetation buffer width was equal to the maximum 50 m width value for all reach segments. Relative vegetation height (h r ) was calculated by scaling vegetation height by river width using the equation <ref type="bibr">(Kalny et al., 2017)</ref>:</p><p>where.</p><p>We modified the original relative vegetation height formula presented by <ref type="bibr">Kalny et al. (2017)</ref> by multiplying the river width term by 1.62 to account for differences in latitude between our study site and the site where the above formula was derived. The 1.62 scalar value indicates that due to the mean solar angle between 10 a.m. and 2 p.m. (maximum solar incidence) at H.J. Andrews during our study period (July 2019), a tree would cast a shadow roughly 1.62 times its length <ref type="bibr">(Kalny et al., 2017)</ref>.</p><p>In the absence of in-situ observations of canopy cover, we retrieved values of existing vegetation height and forest canopy cover from 30-m resolution gridded US LANDFIRE datasets (LANDFIRE, 2020a,b). We then calculated mean values of canopy variables along reach segments using 50 m buffers perpendicular to the centerline of stream, excluding water pixels from calculated means. To account for differences in shading due to the geographic aspect of each reach segment, we calculated canopy values using buffers on only the right bank for eastward-flowing segments ( <ref type="formula">45</ref>  <ref type="bibr">(Kalny et al., 2017)</ref>. In tuning scenarios where riparian shading exceeded 100% along a reach, its value was set to equal 100%.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3.8.">Assessing model performance</head><p>Model performance was evaluated during distinct calibration and validation periods. Model calibration was performed during the first four weeks of the selected time period (July 1 to July 31). The model was then reinitialized and run during a two-week model validation period (August 1 to August 15). We assessed the error of model simulations by comparing predictions to observed temperatures at two gages ('headwater': Mack Creek, 'outlet': Lookout Creek) within the basin during the calibration and validation periods. In the calculation of error metrics, we removed the first 48 h of simulated temperatures in both the calibration and validation periods to account for model spin-up. While this spin-up time is shorter than that of other hydrologic models, we found it sufficient, as the boundary condition temperatures rapidly equilibrated with radiative forcings downstream after a single diel cycle.</p><p>At each gauge, we calculated a suite of error metrics that capture a range of modes of variability, including RMSE, daily maxima error (D Max ), daily minima error (D Min ). RMSE was calculated using the full hourly time series of prediction. D Max and D Min were calculated as the mean difference between predicted and observed daily maxima and minima during each 24-h period. By using multiple error metrics in tandem to evaluate model performance, we gained additional insight into how the model resolved radiative and hydrologic processes. Daily maxima error, which described how the model represents peak temperatures, is an indicator of the model's ability to accurately simulate radiative heat fluxes that typically dominate net heat transfer during daytime hours. Daily minima error, which quantified how the model captures nighttime and early morning temperatures, is closely linked to hydrologic heat fluxes that become more influential in the absence of solar radiation. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.4.">Sequential evaluation of model configurations</head><p>Using a flexible model development framework <ref type="bibr">(Fenicia et al., 2011;</ref><ref type="bibr">Hrachowitz et al., 2014)</ref>, we tested the ability of 4 model formulations of increasing complexity and representation of physical processes to simulate water temperatures in our test basin. In general, we sought to develop model configurations that were parsimonious, representing physical behavior using the simplest formulation (or degrees of freedom) possible to avoid overparameterization and retain computational efficiency <ref type="bibr">(Hrachowitz et al., 2014;</ref><ref type="bibr">Jakeman et al., 2006)</ref>. With this in mind, we attempted to design modeling strategies such that they had sufficient complexity to produce accurate predictions, while matching the availability (or uncertainty) of model inputs <ref type="bibr">(Wagener et al., 2001)</ref>.</p><p>Models M1, M2, M3, and M4 each progressively incorporated additional degrees of freedom, tuning a broader suite of parameters that reflect uncertainty in hydrologic and thermal processes (Table <ref type="table">1</ref>). M1, the simplest configuration, only tuned parameters related to groundwater inflow temperatures and riparian shading. This formulation excluded hyporheic flow and used NWM estimates for rates of groundwater flow. M2 built on the M1 configuration, tuning the NWM estimates for the rate of groundwater inflow along the network and again excluding hyporheic flow. M3 built on the M1 configuration, adding a conceptual representation of hyporheic exchange (see Section 2.3.6). M4 combined the complexity of M2 and M3, tuning parameters related to groundwater inflow temperatures, groundwater inflow rate, riparian shading, and hyporheic exchange. Our model configurations (M1, M2, M3, M4) were not intended to resolve every physical process controlling water temperatures and instead sought to balance gains in performance against computational cost and uncertainty.</p><p>We calibrated each model configuration using 5000 uniform Monte Carlo samples of parameters (Table <ref type="table">2</ref>), totaling 20,000 model runs across all configurations. Parameters descriptions and their sampled plausible ranges are shown in Table <ref type="table">2</ref>. We intentionally defined wide parameter ranges to more fully explore all possible model outcomes. These parameters can be grouped into two categories: those that are tuned for the full network, and those that are tuned independently by stream order. We assumed that the full network parameters (air temperature moving window duration, riparian shading coefficient, and hyporheic lag duration) represent processes or sources of model error that are likely uniform throughout the basin. Parameters tuned by stream order (C 1 AT-GW , C 2 AT-GW , C 3 AT-GW , GW 1 , GW 2 , GW 3 , H 1 frac , H 2 frac , and H 3 frac ) were assumed to represent processes that scale in relation to relative stream size. As the basin contains reaches up to third order, each of these variables was tuned independently across three degrees of freedom (first-, second-, and third-order reaches). The riparian shading coefficient (R shade ) and groundwater inflow rate coefficients (GW 1 , GW 2 , GW 3 ) were unitless coefficients used to tune existing estimates of riparian shading and groundwater inflow, reflecting our uncertainty in the characterization of these processes. The coefficients used in the tuning of groundwater inflow temperatures (C 1 AT-GW , C 2 AT-GW , C 3 AT-GW , W) and hyporheic flow (H lag , H 1 frac , H 2 frac , H 3 frac ) were used in equations described in sections 2.3.5 and 2.3.6, respectively.</p><p>From 5000 model calibration runs of each configuration (M1, M2, M3, M4), we selected the top 1% of runs (50 runs) sorted by RMSE w to represent peak potential model performance. RMSE w is a weighted error metric, calculated as the weighted average of headwater RMSE (25% weight) and outlet RMSE (75% weight). These runs are highlighted amongst all calibration model runs in Fig. <ref type="figure">4</ref>. We prioritized runs with low RMSE values at the outlet because we expect that model error will decrease down-network as radiative forcings, which tend to be bettercharacterized than hydrologic forcings in water temperature models, become more influential. Therefore, we assumed that prediction quality at the outlet is relatively more valuable than at headwater reaches.</p><p>The parameter sets of the top 1% of calibration runs ranked by RMSE w were further evaluated during a two-week validation period (Fig. <ref type="figure">4</ref>). By assessing model performance using only the top 1% of calibrated parameter combinations ranked by RMSE w (inferred to represent feasible solutions), we aimed to compare the potential performance of each model formulation when well-calibrated, discarding model runs where randomly sampled parameters did not reflect the physical reality of the basin.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Results</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Calibrated and validated models</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.1.">M1: variable groundwater inflow temperatures</head><p>M1 was the simplest of all model configurations tested, allowing variability only in parameters related to the temperature of groundwater inflow (C 1 AT-GW , C 2 AT-GW , C 3 AT-GW , W) and riparian shading (R shade ) (Table <ref type="table">1</ref>). The model configuration struggled to reproduce the magnitude and variability of observed temperature time series at both the headwater and outlet gages (Figs. <ref type="figure">5</ref><ref type="figure">6</ref><ref type="figure">7</ref>). The 1st percentile of calibration runs of M1 ranked by RMSE w had a mean RMSE of 1.41 &#8226; C at the headwater gage and a mean RMSE of 1.20 &#8226; C at the outlet, the worst of any model configuration (Fig. <ref type="figure">5</ref>). The performance of M1 decreased for validation runs, where the top calibrated parameter sets produced a mean headwater RMSE of 2.70 &#8226; C and a mean outlet RMSE of 2.47 &#8226; C. This set of best calibrated runs overestimated peak temperatures in the headwater reach, with a daily maxima error of 3.70 &#8226; C during the validation period (Figs. <ref type="figure">5</ref> and <ref type="figure">6</ref>). Despite this strong positive bias in headwater reaches, M1 outlet predictions had a negative bias during validation, underestimating daily minima by -1.04 &#8226; C (Figs. <ref type="figure">5</ref> and <ref type="figure">7</ref>). The best M1 validation run ranked by RMSE w had a headwater RMSE of</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 1</head><p>Water temperature model formulations, tuned parameters, and number of parameters.  and <ref type="figure">7</ref>). These temperature spikes coincided with periods when relatively humidity values approached 100%.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Model Tuned Parameters Number of Parameters</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.2.">M2: variable groundwater inflow rate</head><p>In addition to the variables tuned in M1 (C AT-GW , W), M2 added three further degrees of freedom, tuning the rate of groundwater inflow in first-, second-, and third-order streams (GW 1 , GW 2 , GW 3 ) (Table <ref type="table">1</ref>). M2 showed an improvement over validation M1 runs, with mean RMSE values of 2.12 &#8226; C and 1.84 &#8226; C at the headwater and outlet gages, respectively (Fig. <ref type="figure">5</ref>). However, much like M1, M2 struggled to reproduce diurnal temperature cycles throughout the stream network (Figs. <ref type="figure">6</ref> and <ref type="figure">7</ref>). In the headwater reach, M2's lower mean error was largely driven by a narrowing of the diurnal cycle and a shift of predicted daily minima to cooler temperatures (Figs. <ref type="figure">5</ref> and <ref type="figure">6</ref>). Headwater maxima error during validation was reduced to 2.78 &#8226; C while daily minima error was reduced to 0.20 &#8226; C. Improvements in model performance at the outlet were linked to a more accurate simulation of minima temperature magnitude, though M2 showed little improvement over M1 in predicting daily maxima (Figs. <ref type="figure">5</ref> and <ref type="figure">7</ref>). The negative bias in outlet predictions observed for M1 also persisted for M2 (Fig. <ref type="figure">7</ref>). The best validation run for M2 had a headwater RMSE of 1.00 &#8226; C and an outlet RMSE of 0.85 &#8226; C. M2 also experienced similar implausible temperature spikes during validation as observed in M1, again coinciding with periods of high relative humidity.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.3.">M3: conceptual hyporheic zone</head><p>M3 introduced considerable complexity to the M1 configuration, adding a conceptual hyporheic zone tuned by hyporheic lag time (H lag ) and hyporheic flow fraction (H 1 frac , H 2 frac , H 3 frac ) parameters (Table <ref type="table">1</ref>). M3 resulted in a considerable improvement in performance in comparison to both M1 and M2, with the top 1% of calibrated parameter sets producing mean validation RMSE values of 1.29 &#8226; C and 0.97 &#8226; C in the headwaters and outlet, respectively (Figs. <ref type="figure">5</ref><ref type="figure">6</ref><ref type="figure">7</ref>). The M3 configuration greatly reduced the positive headwater bias observed in previous model configurations, reducing daily headwater maxima error to 1.44 &#8226; C during validation and more accurately representing the observed magnitude of diurnal variability (Figs. <ref type="figure">5</ref> and <ref type="figure">6</ref>). We also observed improvements in predicting daily minimum temperatures at the outlet, where validation minima error improved to 0.28 &#8226; C (Fig. <ref type="figure">5</ref>). Across 5000 model runs, M3's best run ranked by validation RMSE w had a headwater RMSE of 0.80 &#8226; C and an outlet RMSE of 0.76 &#8226; C.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.4.">M4: variable groundwater inflow rate and conceptual hyporheic zone</head><p>M4 was the most complex configuration tested, combining aspects from both M2 and M3 to tune hyporheic flow parameters (H lag , H 1 frac , H 2 frac , H 3 frac ) and groundwater inflow parameters (GW 1 , GW 2 , GW 3 ) (Table <ref type="table">1</ref>). Despite increased complexity and additional degrees of freedom, M4 did not show a marked improvement in performance over M3, providing only marginal decreases in RMSE (Fig. <ref type="figure">5</ref>). Predicted water temperature envelopes from M3 and M4 at the headwater and outlet were difficult to distinguish visually (Figs. <ref type="figure">6</ref> and <ref type="figure">7</ref>). M4 had the lowest mean validation RMSE value amongst all model configurations at the headwater gage (1.10 &#8226; C). The configuration's validation RMSE at the outlet gage (1.04 &#8226; C) was comparable to that of M3 (Fig. <ref type="figure">5</ref>). Headwater daily maxima and minima prediction error for M4 were low compared to other model configurations during validation, with values of 1.12 &#8226; C and -0.29 &#8226; C respectively (Fig. <ref type="figure">5</ref>). The best performing M4 run had a headwater RMSE of 0.70 &#8226; C and an outlet RMSE of 0.59 &#8226; C over the validation period.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Optimal calibrated parameters</head><p>Mean optimal parameter values across the top 1% of calibrated model runs gave additional insight into differences in performance between model configurations (Table <ref type="table">3</ref>). Coefficients controlling the temperature of groundwater inflow for first-, second-, and third-order streams (C 1 AT-GW , C 2 AT-GW , C 3 AT-GW ) were tuned for all model configurations. First-order coefficients (C 1 AT-GW ) were calibrated to lower values than second-and third-order coefficients for all configurations, reflecting cooler inflow temperatures in upland areas of the catchment. M1 and M2, models without hyporheic flow, had optimal C AT-GW values that were considerably higher than M3 and M4, models that did represent hyporheic flow. W, representing the number of days of mean air temperatures that were incorporated into estimates of groundwater temperatures, was consistently tuned to a value between 7 and 8 days for all model configurations. We also tuned R shade , a coefficient used to adjust the degree riparian shading along the network, for all model versions, reflecting uncertainty in our estimates of riparian cover derived from gridded datasets. In M1 and M2, R shade was tuned to 0.999 and 1.042 respectively, suggesting little bias in estimated riparian shading values. However, M3 and M4 had notably lower optimal R shade values of 0.690 and 0.699, respectively.</p><p>We tuned coefficients used to adjust the rate of groundwater inflow for first-, second-, and third-order streams (GW 1 , GW 2 , GW 3 ) for configurations M2 and M4. In both model configurations, optimal coefficients ranged from 1.141 to 1.433, representing increased groundwater inflow along all reaches in the stream network relative to NWM values. There was no clear relationship between optimal GW values and stream order for M2 and M4. Coefficients describing processes governing flow through a conceptual hyporheic zone, including hyporheic lag duration (H lag ) and hyporheic flow fraction (H 1 frac , H 2 frac , H 3 frac ), were calibrated for M3 and M4. Hyporheic zone parameters were tuned to relatively similar values between the two configurations. In M3 and M4, the mean optimal hyporheic lag duration, controlling the time delay before hyporheic flow is returned to the channel, was equal to 11.380 and 11.740 h, respectively. Hyporheic flow fraction, describing the proportion of streamflow that is routed into the conceptual hyporheic zone, had a strong negative relationship with stream order for both tested model configurations. For both M3 and M4, first-order reaches had the highest proportion of hyporheic flow, with coefficients of 0.666 and 0.682, respectively. Optimal hyporheic flow fractions then sequentially decreased for second-and third-order reaches.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Discussion</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Evaluating performance of water temperature model configurations</head><p>The quality of predictions made by certain configurations of our model confirm that water temperatures can be successfully simulated using inputs derived from a continental-scale hydrologic model (in this case, the NWM). Of the four configurations tested, two (M3 and M4) produced calibrated simulations with validation RMSEs near or below 1.0 &#8226; C at both the headwater and outlet reaches. These prediction errors are well below the 2.0 &#8226; C RMSE threshold estimated by <ref type="bibr">Yearsley (2012)</ref> as an acceptable measure of performance for water temperature modeling and compare well to other studies using similar modeling strategies <ref type="bibr">(Sun et al., 2015;</ref><ref type="bibr">Yan et al., 2021;</ref><ref type="bibr">Yearsley, 2012;</ref><ref type="bibr">Yearsley et al., 2019)</ref>. The ability of select configurations of our model to adequately predict water temperatures in H.J. Andrews, a complex forested headwater catchment, is promising for the incorporation of water temperature modeling into the NWM framework.</p><p>Each of the model configurations we explored in this study represents a unique hypothesis for our understanding of how radiative and hydrologic processes combine to influence river thermal regimes. As expected, the addition of degrees of freedom to configurations progressively improved model performance amongst most error metrics, though this relationship was not strictly linear (Figs. <ref type="figure">5</ref><ref type="figure">6</ref><ref type="figure">7</ref>). The strongest contrast in model performance existed between configurations that represented the thermal effects of hyporheic exchange (M3, M4) and those that only tuned parameters related to groundwater inflow rate and temperature (M1, M2). M3 and M4 demonstrated clear advantages over M1 and M2 in all error metrics excluding headwater daily maxima, suggesting that the influence of hyporheic flow on temperatures in this basin is too large to disregard. As a high-relief mountain headwater catchment, it is unsurprising that hyporheic exchange is an influential thermal process in H.J. Andrews, and its role in hydrologic function in the region has been thoroughly documented <ref type="bibr">(Becker et al., 2023;</ref><ref type="bibr">Herzog et al., 2019;</ref><ref type="bibr">Schmadel et al., 2017;</ref><ref type="bibr">Ward et al., 2017)</ref>.</p><p>Notably, the addition of parameters controlling the rate of groundwater inflow to configurations M2 and M4 resulted in improvements in model error, but to differing degrees (Table <ref type="table">1</ref>). When we added inflow tuning parameters to M2, we observed a reduction in error across most metrics over the previous model version (M1) (Fig. <ref type="figure">5</ref>). By contrast, the addition of these calibrated parameters to M4 did not result in considerable improvement over M3 (Fig. <ref type="figure">5</ref>). We hypothesize that the difference in the marginal reductions in error between M2 and M4 is likely attributable to the presence of other tuned hydrologic parameters (hyporheic exchange) in the M4 configuration. In M2, tuning to groundwater inflow represented the only pathway for the model to account for uncertain hydrologic processes, including hyporheic exchange. This flexibility gave M2 a greater advantage over M1. When the model included hyporheic exchange, as it did in M3 and M4, it appeared less crucial to tune groundwater inflow rate. Although the groundwater inflow parameters were tuned to route additional inflow into the channel (Table <ref type="table">3</ref>), the magnitude of these increases did not exceed 144% of NWM inflows. This suggests that the NWM's estimated groundwater contributions, at least in this basin, are roughly of the correct magnitude to accurately simulate thermal processes.</p><p>All configurations of the model struggled to simultaneously generate accurate predictions at both the headwater and outlet gages to varying degrees. While model predictions were often capable of providing accurate predictions at the headwater gage (Fig. <ref type="figure">4</ref>), many of these runs translated into poor outcomes at the outlet. For example, despite overestimating temperatures at the headwater, both M1 and M2 predicted outlet temperatures that were colder than observed (Figs. <ref type="figure">5</ref><ref type="figure">6</ref><ref type="figure">7</ref>). We highlight two possible explanations for the model's inability to fit both the headwater and the outlet concurrently. First, because we tuned several parameters independently by stream order (Table <ref type="table">2</ref>), random variations in parameter values for second-and third-order reaches only influenced predictions at the outlet and not at the headwater. This could be alleviated by narrowing calibrated parameter ranges or by enforcing a constrained sampling strategy informed by process-based knowledge (e.g., hyporheic flow fraction in second-and third-order reaches must be tuned to be less than that of first-order reaches), as has been implemented in hydrological modeling studies (e.g., <ref type="bibr">Hrachowitz et al., 2014)</ref>. Tradeoffs in fitting the headwater and outlet could also be caused by a mischaracterization of heat fluxes along the network. This was most evident in configurations M1 and M2, where unrealistically warm headwaters were required to achieve the reasonable predictions at the outlet (Figs. <ref type="figure">6</ref> and <ref type="figure">7</ref>). This effect was partially -though not entirelyalleviated by the inclusion of a conceptual hyporheic zone in M3 and M4 (Figs. <ref type="figure">6</ref> and <ref type="figure">7</ref>), indicating that model configurations presented here may not fully capture all relevant heat fluxes in the system.</p><p>As we note in Section 3.1.1 and 3.1.2, a large proportion of M1 and M2 runs generated temperature spikes that exceeded plausible stream temperatures ranges typically observed at this site. These temperature spikes consistently coincided with periods of high relative humidity. Through further investigation of modeled heat fluxes, we found that the spikes were linked to high rates of sensible heat transfer into the stream, likely driven by our calculation of sensible heat using the Bowen Ratio (see Supporting Information Eq. 17 and 18). As Bowen's ratio is scaled Fig. <ref type="figure">6</ref>. Observed headwater temperatures (black) and 5/95th confidence envelope of water temperature predictions at the headwater gage across model configurations M1, M2, M3, and M4 for the top 50 calibration runs (1st percentile), ranked by weighted headwater and outlet RMSE (RMSE w ). Predictions displayed during a four-week calibration period and a two-week validation period, separated by a 48-h spin-up period for validation. by a relative humidity-dependent term <ref type="bibr">(Bowen, 1926;</ref><ref type="bibr">Boyd and Kasper, 2003)</ref>, high relative humidity values approaching 100% could lead to erroneous sensible heat fluxes, which we observe here. We attempted to alleviate these sensible heat errors by capping relative humidity at 97% in our model, but this only eliminated temperature spikes in some runs of M1 and M2. Alternative approaches to estimate sensible heat exchange during periods of high relative humidity are likely needed in catchments experiencing high humidity to remedy these errors.</p><p>Of the four configurations tested, the M3 and M4 configurations best approximated water temperature behavior in the H.J. Andrews catchment during this specific time period. However, this does not necessarily give insight into the efficacy of our modeling frameworks in other locations or at broader scales. Thermal regimes and their controlling processes are remarkably diverse, both within single catchments and across the North American continent <ref type="bibr">(Fullerton et al., 2015;</ref><ref type="bibr">Maheu et al., 2016)</ref>. As such, the optimal model configuration in one basin may not translate to a neighboring catchment or to a different geographic region. This potential heterogeneity in model performance presents challenges in extending water temperature modeling from individual catchments to the continental US.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Strategies for constraining uncertain inputs</head><p>Despite the wealth of hydrologic data provided by NWM runs, several key inputs required to force our water temperature model were uncertain or altogether unknown. These included but are not limited to parameters governing the water temperature of groundwater inflows, headwater initiation water temperatures, riparian shading of channels, and hyporheic exchange. If water temperatures are to be accurately predicted, particularly in a physically-based modeling framework, Fig. <ref type="figure">7</ref>. Observed outlet temperatures (black) and 5/95th confidence envelope of water temperature predictions at the outlet gage across model configurations M1, M2, M3, and M4 for the top 50 calibration runs (1st percentile), ranked by weighted headwater and outlet RMSE (RMSE w ). Predictions displayed during a four-week calibration period and a two-week validation period, separated by a 48-h spin-up period for validation. approaches must be developed to estimate these parameters at a high spatial resolution (1 km) and at nationwide scales. Leveraging publicly available data external to the NWM and Monte Carlo calibration, we designed strategies to overcome data limitations that both enabled us to fit temperature behavior in the study catchment and that we envisioned could be easily scalable to broader modeling domains. We explore the viability of our proposed strategies to estimate groundwater temperatures and hyporheic exchange.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.1.">Estimating groundwater temperatures key to accurate water temperature predictions</head><p>The heat flux associated with groundwater inflow, although often smaller in magnitude than fluxes at the air-water interface, can be a strong control on the water temperature of streams <ref type="bibr">(Caissie, 2006;</ref><ref type="bibr">Caissie and Luce, 2017;</ref><ref type="bibr">Kurylyk et al., 2016)</ref>. Groundwater inflow is particularly influential to water temperatures in forested headwater streams, as the magnitude of other radiative and turbulent heat fluxes are diminished <ref type="bibr">(Caissie and Luce, 2017;</ref><ref type="bibr">Ouellet et al., 2020)</ref>. In reaches where flows are primarily sourced from relatively cold groundwater, water temperatures are cooler and typically have narrower diel ranges <ref type="bibr">(Hannah and Garner, 2015)</ref>. Rigorous on-site monitoring is required to determine the rate and temperature of groundwater inflow to channels <ref type="bibr">(Caissie and Luce, 2017)</ref>, making advective fluxes challenging to quantify at a broader spatial extent. As such, the temperature of groundwater fluxes to streams represents a substantial source of uncertainty in physically-based models.</p><p>In water temperature models, the temperature of groundwater inflow is generally set to the mean annual air temperature, mimicking the temperature of deep groundwater <ref type="bibr">(Kurylyk et al., 2016;</ref><ref type="bibr">MacDonald et al., 2014)</ref>. Perhaps counterintuitively, the temperatures of groundwater and inflows to streams are not always equivalent. The temperature of subsurface inflow when it enters a stream, whether sourced by shallow flow paths, warmed through bed conduction, or mixed with hyporheic waters, is often warmer than that of deep groundwater in summer and cooler in winter <ref type="bibr">(Kurylyk et al., 2016;</ref><ref type="bibr">Leach and Moore, 2014)</ref>. Advected inflow temperatures are also more temporally variable than that of deep groundwater and are loosely coupled to daily mean air temperatures <ref type="bibr">(Leach and Moore, 2014)</ref>. Past modeling studies have attempted to account for the time-varying nature of inflow temperatures using non-linear regression <ref type="bibr">(Mohseni et al., 1998)</ref> to predict inflow temperatures from smoothed air temperatures <ref type="bibr">(van Vliet et al., 2012;</ref><ref type="bibr">Yearsley, 2012)</ref>.</p><p>As groundwater temperature in our modeling approach was critical not only for forcing subsurface fluxes but also for setting the upstream boundary condition, our approach to estimating groundwater temperatures (Section 2.3.5) was intended to incorporate both variable sourcing depth and a lagged relationship between inflow temperature and air temperature (Fig. <ref type="figure">3</ref>). Under the assumption that the sourcing depth of inflow may vary down-network, we allowed the coefficient governing the effective source depth of inflows (C AT-GW ) to independently vary by stream order. The performance of our calibrated approach can be roughly assessed by evaluating the error in headwater daily minima temperatures (D Min ), as diurnal minima are closely coupled to the temperature of inflows. By this metric, our approach was successful when well-calibrated, with all configurations producing headwater D Min values under 0.75 &#8226; C during the validation period (Fig. <ref type="figure">5</ref>). However, it remains challenging to disentangle the true effectiveness of our groundwater temperature approach from other potentially mischaracterized or absent streambed processes, including bed conduction and hyporheic flow. For example, groundwater inflow temperatures were consistently tuned warmer for models without hyporheic flow processes to fit behavior at the outlet (M1/M2) (Table <ref type="table">3</ref>). This suggests that for the simplified model configurations, groundwater inflow temperatures may be tuned to compensate for missing thermal processes, resulting in poor performance in certain regions of the stream network.</p><p>Given the degree of heterogeneity across all US catchments, our calibration for groundwater inflow temperature parameters may be cumbersome to apply to a continental-scale domain. The successes we observed in reproducing water temperatures using a tuned groundwater temperature approach are specific only to the study catchment during a period of low flow, and do not necessarily indicate transferability to other basins or time periods. By randomly tuning inflow temperature parameters, we sought to demonstrate that our model was capable of simulating water temperature behavior given well-calibrated parameters. This contrasts with a typical approach to modeling physical processes across broad spatial domains, where focus is instead placed on achieving acceptable mean model performance with uncertain parameter estimates. Expanding our water temperature model beyond the test basin would require a more complex approach to accurately simulate the broad diversity of subsurface flow dynamics across catchments and climates. Simulated groundwater temperatures at the continental scale would need to be temporally and spatially variable, reflecting site, basin, and regional controls on water temperature processes <ref type="bibr">(Benz et al., 2022;</ref><ref type="bibr">Hannah and Garner, 2015)</ref>. This approach would also need to incorporate the influence of seasonal snowpack on groundwater temperatures. Spatial statistical models or machine learning techniques could be an efficient and effective tool to predict variability in groundwater temperatures across the US, generating both upstream boundary conditions and inflow temperatures to drive a physically-based model <ref type="bibr">(Dugdale et al., 2017)</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.2.">Is a conceptual hyporheic zone needed?</head><p>Hyporheic flow, characterized by flow paths that originate in the stream, travel through the subsurface, and eventually return to the stream, is an important process controlling the magnitude and timing of water temperature variability <ref type="bibr">(Arrigoni et al., 2008;</ref><ref type="bibr">Boano et al., 2014;</ref><ref type="bibr">Hannah et al., 2009)</ref>. Particularly in high relief headwaters like those of H.J. Andrews, a considerable portion of streamflow can pass through the hyporheic zone, returning flows with temperatures that are lagged and buffered compared to instream waters <ref type="bibr">(Arrigoni et al., 2008;</ref><ref type="bibr">Schmadel et al., 2017;</ref><ref type="bibr">Ward et al., 2016;</ref><ref type="bibr">Wondzell, 2011)</ref>. In some cases, hyporheic advective heat fluxes may comprise 25% of total net radiation in headwater streams <ref type="bibr">(Moore et al., 2005)</ref>. Although hyporheic flow can be an influential thermal process, many broad-scale hydrologic models, including the NWM, do not include hyporheic processes in their representation of river networks. Similarly, water temperature models also often neglect heat fluxes related to hyporheic exchange <ref type="bibr">(Kurylyk et al., 2016)</ref>.</p><p>Hyporheic flow and its associated effects on water temperatures are remarkably difficult to characterize, even when employing field observations and flow tracing techniques. In the absence of field measurements, our approach (outlined in Section 2.3.6) was intended to be conceptual rather than to give insight into true hyporheic behavior at this or any other study site. Our strategy for approximating hyporheic exchange did not represent physical mass transfer within the model and treated flow paths as point features, returning flow to the stream at the same point it originated. This simplification ignores the complex, 3D nature of hyporheic flow cells that can travel a considerable distances down-valley <ref type="bibr">(Tonina and Buffington, 2009)</ref>. For this reason, the calibrated hyporheic flow fraction values and time lags used in our model (Table <ref type="table">3</ref>) should not be taken as explicit estimations of hyporheic flow processes at H.J. Andrews. We also note that while our approach did tune hyporheic lag time, we opted not to include parameters that would damp the temperatures of hyporheic return flows so as to minimize the risk of overparameterization. The exclusion of hyporheic damping, a recognized thermal process related to groundwater-surface water exchange <ref type="bibr">(Briggs et al., 2018;</ref><ref type="bibr">Caissie et al., 2014;</ref><ref type="bibr">Caissie and Luce, 2017)</ref>, could limit the peak potential performance of tested model configurations.</p><p>The contrast in model performance between configurations that included hyporheic processes (M3 and M4) and those that did not (M1 and M2) suggests that incorporating, or at least mimicking, hyporheic exchange is critical to simulating water temperatures in the study basin . This finding was expected, given the multitude of studies describing the influence of hyporheic exchange on hydrological processes in H.J. Andrews <ref type="bibr">(Becker et al., 2023;</ref><ref type="bibr">Kasahara and Wondzell, 2003;</ref><ref type="bibr">Schmadel et al., 2017;</ref><ref type="bibr">Ward et al., 2018b</ref><ref type="bibr">Ward et al., , 2019))</ref>. The gains in performance we observed when including hyporheic processes were primarily linked to an improved estimation of daily minimum water temperatures (M3 and M4; Fig. <ref type="figure">5</ref>). This likely indicates that our representation of hyporheic flow, which returned warmer daytime waters roughly 12 h later (Table <ref type="table">3</ref>), served to address missing nighttime streambed fluxes. The influence of hyporheic flow in our model was tuned to decrease with increasing stream order (Table <ref type="table">3</ref>), matching our understanding of how hyporheic exchange evolves down-network <ref type="bibr">(Ward et al., 2019a,b)</ref>. We note that because our model configurations did not resolve bed conduction heat fluxes, parameters associated with hyporheic exchange may be simultaneously accounting for the effects of both hyporheic flow and bed conduction. If bed conduction remains absent from future model configurations, it may be beneficial to instead tune a single time-varying conceptual term that integrates all lagged streambed heat fluxes.</p><p>Our results indicate that in H.J. Andrews, and likely other basins in similar settings, the inclusion of hyporheic flow processes can improve predictions of hourly water temperatures throughout the stream network. However, incorporating the thermal effects of hyporheic flow into a water temperature model at the continental scale of the NWM may pose challenges. Hyporheic flow dynamics are patchy and site-specific, varying considerably within stream networks, between physiographic regions, and across different flow conditions <ref type="bibr">(Wondzell, 2011)</ref>. Clearly, it is not feasible to simulate hyporheic flow paths for all US river reaches, particularly given the absence of field observations along many rivers. Nevertheless, flexible modeling strategies could be designed to incorporate hyporheic processes only where they are the most influential to water temperatures. In such a framework, hyporheic flow could be tuned to improve water temperature predictions only in select regions and along low-order streams where hyporheic advective fluxes have the strongest influence on hourly water temperatures. Higher-order streams could then be represented by conceptually simpler modeling frameworks, improving computational efficiency.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3.">Challenges and opportunities in expanding from the catchment to continental scale</head><p>Our study details the development and implementation of a water temperature model for a single catchment during a summer period. Expanding this model to the continental scale will require further model calibration and validation for catchments in different climates and hydrologic settings. At present, the processes included in the model are most relevant for summer periods. It is therefore unlikely that the model will perform well in catchments at high latitudes during winter periods. Further work would be needed to incorporate additional hydrologic and heat exchange processes that will likely vary across catchments and throughout the year. At present, we also caution that the model includes processes that have not yet been tested, such as the incorporation of surface water inflow and associated water temperatures of this inflow, and reserve such testing for future work.</p><p>Ultimately, we found that our ability to test this model in different catchments was limited by the availability of water temperature observations concentrated within a single catchment. This need is what led us to select the location of the case study presented here. Watershedscale water temperature monitoring occurs infrequently, and such data is often shared in disparate repositories, if shared at all. In contrast, monitoring by the USGS aims to cover a breadth of watersheds and rivers, though expanded river temperature monitoring networks in the Delaware River Basin show promise for assessing water temperature models. Our hope in making this model publicly available is that other river temperature modelers will apply it in their watersheds, incorporating their process-based understanding into their assessment of model fidelity, leading to future improvements in the model framework. While such benchmarking and community building is common in the watershed modeling space, it is rarer in water quality modeling. Future improvements in water quality modeling would certainly benefit from a community-based approach to assessing the ability of models to simulate water temperature (as well as other constituents), to evaluate to what extent our understanding of the hydrology informs this modeling, and to shape improvements in the model framework.</p><p>Though our study focused on developing modeling capabilities in a single catchment, our primary motivation was to evaluate the capacity for water temperature prediction to be coupled to the NWM at broader scales. In many ways, the NWM offers a framework for river networks and detailed information on hydrologic fluxes and meteorological inputs that make it an ideal foundation for a temperature model. For comparison, other gridded water temperature models (e.g., DHSVM-RBM) require the modeler to gather meteorological observations at weather stations (that must be interpolated in some way across the study domain), define a river network (based on a digital elevation model), and estimate or tune key hydrologic fluxes. While there are certainly benefits to using the NWM as a framework for building a water temperature model, we see several major challenges facing the application NWM to the simulation of water temperatures. These generally stem from the NWM's simplified representation of hydrological processes and represent areas of needed future study.</p><p>Foremost, the predictions made by our NWM-based water temperature model are ultimately limited by the accuracy and uncertainty of NWM simulations. Beyond discharge, the target variable for NWM calibration <ref type="bibr">(Gochis et al., 2019)</ref>, our physically-based temperature model is also reliant on several NWM states and parameters, including channel dimensions, groundwater inflow, surface runoff, and stream velocity. Despite the NWM's demonstrated ability to produce reasonable predictions of discharge, particularly in large river basins <ref type="bibr">(Boyd and Kasper, 2003;</ref><ref type="bibr">Hansen et al., 2019;</ref><ref type="bibr">Salas et al., 2018)</ref>, it can in some cases struggle to reproduce variability in other model states (e.g., soil moisture, snowpack; Garousi-Nejad and Tarboton, 2022; <ref type="bibr">Wan et al., 2022)</ref>. As these model states are not explicit targets for calibration in the NWM, their mischaracterization could propagate error into predicted water temperatures.</p><p>The NWM's representation of river network extent, derived from NHD Plus flow paths <ref type="bibr">(McKay et al., 2012;</ref><ref type="bibr">Salas et al., 2018)</ref>, is another potential source of uncertainty to water temperature modeling. Although the NHD provides exceptional spatial coverage of river networks across the US, it has been shown to systematically underestimate the true extent of river density <ref type="bibr">(Elmore et al., 2013)</ref>. In the H.J. Andrews catchment, the NWM models streamflow along only 34.5 km of river length. By contrast, <ref type="bibr">Ward et al. (2019a,b)</ref> estimated the total river length in H.J. Andrews was 242 km using on-site lidar assessments and flow accumulation modeling. The omission of numerous headwater reaches by the NWM could have implications for the prediction of water temperatures in low stream order catchments. The amount of time water is exposed to radiation at the surface, which is influenced by the location of channel initiation, is a strong control on water temperature magnitude and variability <ref type="bibr">(Yearsley, 2012)</ref>. Due to this uncertainty, headwater temperatures may be difficult to accurately simulate in catchments where the true location of streamflow initiation is mischaracterized.</p><p>Despite these limitations, the unique framework of the NWM presents promising opportunities for the prediction of water temperatures at broad scales. While our model is coupled with NWM version 2.1, NOAA is set to introduce the Next Generation Water Resources Modeling Framework (NextGen) in the coming years, with exciting implications for water temperature prediction. Based on the understanding that certain model configurations may perform better in specific catchments, the flexible and interoperable NextGen framework will enable domains to be simulated using model conceptualizations that best match the dominant hydrologic controls in a particular region <ref type="bibr">(NOAA, 2021b)</ref>. By leveraging the NextGen framework, the same principle could be applied to tailor water temperature model configurations to specific catchments or regions. For example, in forested headwater catchments, a more parameterized model configuration could be used to resolve complex hyporheic heat fluxes. In contrast, water temperature predictions in large high order streams could be made using comparatively simpler and more efficient models. The potential flexibility offered by the NextGen framework could bring about profound advances in prediction quality, resolution, and extent across the US.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Conclusion</head><p>In this study, we developed and evaluated the capabilities of physically-based high resolution water temperature model driven by forcings and outputs from the National Water Model (NWM). Through the sequential calibration and validation of four model configurations of increasing complexity, we demonstrated that the inclusion of heat fluxes at the streambed interface (e.g., hyporheic flow) is critical for simulating hourly water temperatures in a forested headwater catchment. The performance of the best-fitting model configuration was comparable to or better than other physically-based water temperature models, suggesting that the NWM can be an effective foundation for water temperature prediction. It is equally worth noting that our simulations were performed for a low flow period during summer, given the relevance of such periods to ecological management. Model performance during other periods of the year should be investigated to fine-tune process representation and to further generalize the model framework.</p><p>While this work focuses on model development in a single catchment, the expansion of NWM-based water temperature modeling to broader spatial domains would improve understanding and management of the complex mosaic of US river thermal regimes. Hourly water temperature forecasts along all US river reaches could provide actionable information that would inform the management of fisheries and other sensitive aquatic ecosystems. Such a model would present a clear improvement over the patchwork of water temperature monitoring stations currently active across the continent. With the introduction of the NextGen NWM framework on the horizon, we recommend the continued development, exploration, and evaluation of NWM-coupled water temperature models to expand predictions from single catchments to all US watersheds.</p></div></body>
		</text>
</TEI>
