<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>A cluster-based temporal attention approach for predicting cyclone-induced compound flood dynamics</title></titleStmt>
			<publicationStmt>
				<publisher>Elsevier</publisher>
				<date>06/25/2025</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10640506</idno>
					<idno type="doi"></idno>
					<title level='j'>Environmental Modelling &amp; Software</title>
<idno>1873-6726</idno>
<biblScope unit="volume">191</biblScope>
<biblScope unit="issue"></biblScope>					

					<author>Samuel Daramola</author><author>David F Muñoz</author><author>Hamed Moftakhari</author><author>Hamid Moradkhani</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[Deep learning (DL) models have been used for rapid assessments of environmental phenomena like mapping compound flood hazards from cyclones. However, predicting compound flood dynamics (e.g., flood extent and inundation depth over time) is often done with physically-based models because they capture physical drivers, nonlinear interactions, and hysteresis in system behavior. Here, we show that a customized DL model can efficiently learn spatiotemporal dependencies of multiple flood events in Galveston, TX. The proposed model combines the spatial feature extraction of CNN, temporal regression of LSTM, and a novel cluster-based temporal attention approach to assimilate multimodal inputs; thus, accurately replicating compound flood dynamics of physically-based models. The DL model achieves satisfactory flood timing (±1 h), critical success index above 60 %, RMSE below 0.10 m, and nearly perfect error bias of 1. These results demonstrate the model's potential to assist in flood preparation and response efforts in vulnerable coastal regions.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Coastal areas are increasingly vulnerable to compound flood events, which arise when storm surges, river overflow, and heavy rainfall occur simultaneously or in close succession <ref type="bibr">(Eilander et al., 2023;</ref><ref type="bibr">Muñoz et al., 2021;</ref><ref type="bibr">Xu et al., 2024)</ref>. Although these events have historically been rare, their frequency and intensity have risen significantly in recent years <ref type="bibr">(Taherkhani et al., 2020;</ref><ref type="bibr">Wing et al., 2022)</ref>. This increase is primarily attributed to climate change, which drives sea level rise, marine heatwaves, torrential rainfall, intense winds from tropical cyclones, glacier melting, land subsidence, and altered ocean currents <ref type="bibr">(Bloomfield et al., 2023;</ref><ref type="bibr">Levy et al., 2024;</ref><ref type="bibr">Ohenhen et al., 2023;</ref><ref type="bibr">Radfar et al., 2024;</ref><ref type="bibr">Santiago-Collazo et al., 2019;</ref><ref type="bibr">Thi&#233;blemont et al., 2024;</ref><ref type="bibr">Volkov et al., 2023)</ref>. Currently, approximately 700 million people and an estimated $13 trillion in assets located in areas less than 10 m above mean sea level worldwide are at risk of flooding <ref type="bibr">(Kirezci et al., 2023)</ref>. In the United States alone, tropical cyclones intensify rainfall and river overflow, contributing to over $13.8 billion in flood damages and 86 flood fatalities out of a total $280 billion in cyclone-related damages and 683 deaths over the past five years (NOAA-NCEI, 2024). 
Hence, the imperative to understand flood dynamics caused by compound events is driving the exploration of advanced modeling approaches that can enhance current prediction methods and support flood management practices.</p><p>Methods for predicting compound and coastal flood dynamics rely on both physically-based and data-driven approaches <ref type="bibr">(Chiang et al., 2024;</ref><ref type="bibr">Fraehr et al., 2022;</ref><ref type="bibr">Hu et al., 2019;</ref><ref type="bibr">L&#246;we et al., 2021;</ref><ref type="bibr">Marsooli and Wang, 2020;</ref><ref type="bibr">Sampurno et al., 2022;</ref><ref type="bibr">Shahabi and Tahvildari, 2024)</ref>. Physically-based approaches, such as hydrodynamic and hydraulic models, are particularly effective at estimating the interaction of multiple flood drivers such as rainfall, river discharge, and storm surges <ref type="bibr">(Bates, 2022,</ref><ref type="bibr">2023;</ref><ref type="bibr">Leijnse et al., 2021;</ref><ref type="bibr">Santiago-Collazo et al., 2024)</ref>. These approaches demonstrate strong predictive skills across multiple flood scenarios by complying with physical constraints such as the conservation of mass and momentum over time and space <ref type="bibr">(Alipour et al., 2022;</ref><ref type="bibr">Camus et al., 2021;</ref><ref type="bibr">Jibhakate et al., 2023;</ref><ref type="bibr">Peña et al., 2022;</ref><ref type="bibr">Zhong et al., 2024)</ref>. However, they require sufficient computational resources and observational (forcing) data, even for short-term (hourly to daily) predictions over large-scale domains <ref type="bibr">(Bilskie et al., 2021;</ref><ref type="bibr">Nezhad et al., 2023)</ref>. 
Data-driven models are generally efficient in this regard since they leverage statistical and machine learning techniques to learn from nonlinear associations and infer data patterns, thus rapidly generating flood predictions without relying on complex physical simulations <ref type="bibr">(Gomez et al., 2024;</ref><ref type="bibr">Lewis et al., 2024;</ref><ref type="bibr">Moftakhari et al., 2017)</ref>.</p><p>Statistical approaches typically require explicit assumptions about data distribution to predict flood data associated with a given return period (e.g., peak storm surge, river flow, and rainfall) <ref type="bibr">(Boumis et al., 2023;</ref><ref type="bibr">Maduwantha et al., 2024;</ref><ref type="bibr">Moftakhari et al., 2021;</ref><ref type="bibr">Zhong et al., 2024)</ref>.</p><p>Machine learning techniques, particularly deep learning (DL) models, can capture complex nonlinear associations and hidden patterns from input data features, thus enhancing prediction accuracy <ref type="bibr">(Foroumandi et al., 2024;</ref><ref type="bibr">Fu et al., 2022;</ref><ref type="bibr">Sattari et al., 2025)</ref>. Recently, transfer learning techniques have been incorporated into DL model architectures to further enhance the predictive capability in areas with limited or scarce data <ref type="bibr">(Obara and Nakamura, 2022;</ref><ref type="bibr">Seleem et al., 2023)</ref>. 
Previous studies have reported accurate predictions of storm surges, extreme water levels, and river flow, among other flood drivers <ref type="bibr">(Daramola et al., 2025;</ref><ref type="bibr">Green et al., 2025;</ref><ref type="bibr">Hussain and Khan, 2020;</ref><ref type="bibr">Mahakur et al., 2025;</ref><ref type="bibr">McKeon and Piecuch, 2025;</ref><ref type="bibr">Muñoz et al., 2021;</ref><ref type="bibr">Samantaray et al., 2025;</ref><ref type="bibr">Tang et al., 2025;</ref><ref type="bibr">Tiggeloven et al., 2021)</ref>. Nevertheless, DL models are mostly used for station-based flood predictions, with hazard and risk map generation focused on the peak flood extent <ref type="bibr">(Ayyad et al., 2022;</ref><ref type="bibr">Tedesco et al., 2024)</ref>. This is because many conventional DL models excel at either spatial (e.g., Graph Convolution Networks, Convolutional Neural Networks) or temporal analysis of flood events (e.g., Long Short-Term Memory networks, Gated Recurrent Units). As a result, they often lack the ability to process flood dynamics in both space and time. To overcome this limitation, studies tend to use DL frameworks that combine spatial and temporal architectures, preferably employing static datasets for spatial analyses and dynamic features for temporal contribution <ref type="bibr">(Farahmand et al., 2023;</ref><ref type="bibr">Fathi et al., 2025)</ref>. Additionally, DL models designed for spatiotemporal analysis like Convolutional Long Short-Term Memory (ConvLSTM) may effectively capture general spatiotemporal patterns but struggle to accurately represent the varying conditions across the spatial domain. The latter results from the lack of inherent physical constraints in DL models to guide flood dynamics, in contrast to those present in hydrodynamic models. 
Furthermore, regional variations in spatial flood dependencies are ignored when conducting current and future flood risk assessments <ref type="bibr">(Brunner et al., 2020)</ref>.</p><p>Given the challenges described, a key question arises: which DL techniques can effectively capture hysteresis in system behavior across spatiotemporal domains? In flood dynamics, hysteresis occurs when water levels depend not only on current conditions but also on prior states, producing lagged effects that vary in space and time <ref type="bibr">(Wu et al., 2023)</ref>. Spatially, water level variations at one location can induce changes in water levels of adjacent areas due to factors like bottom friction (roughness) effects on the propagation of flood waves as well as morphological features that either attenuate or amplify water levels in coastal and estuarine systems <ref type="bibr">(Hoitink and Jay, 2016;</ref><ref type="bibr">Prandle, 1985;</ref><ref type="bibr">Talke and Jay, 2020)</ref>. Temporally, peak water level timing might differ across inland, transition, and coastal regions, resulting in peak water levels occurring earlier or later depending on distance from the flood source, local topography and bathymetry, or flood connectivity. This complex interplay challenges traditional DL models, which often fail to capture these lagged dependencies effectively <ref type="bibr">(Brunner et al., 2020)</ref>. As a result, such models may erroneously assume a single peak water level across a large domain <ref type="bibr">(Yu et al., 2024)</ref>. This oversight can miss critical timing differences in peak water levels, such as those along tidally-influenced rivers or estuaries driven by storm surges, river flow, and backwater effects <ref type="bibr">(Hoitink and Jay, 2016;</ref><ref type="bibr">Sandbach et al., 2018)</ref>. 
To model hysteresis accurately, advanced DL frameworks must integrate spatiotemporal dependencies across the model domain, including the interconnected nature of flooding processes and different peak water level timings.</p><p>To address these limitations, we propose architectural modifications that enable DL models to better capture the spatiotemporal dynamics of compound flood events, as simulated in hydrodynamic models. Specifically, our approach involves three key modifications: (i) implement mechanisms that enable the model to selectively focus on and weigh the importance of features, enhancing its ability to understand and replicate intricate spatiotemporal dependencies; (ii) cluster the spatial domain regionally and modulate it with temporally varying contributions based on water level dynamics at observational stations to reflect water level variability over time and space; and (iii) leverage dynamic input data from all flood drivers (just like in hydrodynamic models) to facilitate the learning process of nonlinear interactions and improve the model's predictive accuracy. Our proposed DL model is applied to Galveston, TX, on the Gulf Coast of the United States, a region that has been significantly affected by torrential rainfall and cyclone-induced flood events.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Study area</head><p>The Galveston Bay (G-Bay) is the seventh largest estuary in the US and serves as a vital hydrological link between Houston, Texas, and the Gulf of Mexico through a complex network of bayous, interior bays, and rivers <ref type="bibr">(Muñoz et al., 2024)</ref>. Extending 56 km in length and averaging 31 km in width, G-Bay is shallow (approximately 2 m deep) and covers a surface area of about 1600 km² (Fig. <ref type="figure">1a</ref>). The combination of substantial freshwater discharge and dynamic tidal forces creates a sensitive hydrodynamic environment susceptible to rapid changes in water levels. Additionally, G-Bay's complex morphology resembles a bottleneck that connects the Buffalo Bayou River with the Bay. Such a morphologic feature exacerbates the impacts from coastal and inland flood drivers, including storm surges and rainfall-runoff <ref type="bibr">(Muñoz et al., 2022;</ref><ref type="bibr">Valle-Levinson et al., 2020)</ref>, and makes it particularly susceptible to compound flood events affecting both Galveston and Harris Counties.</p><p>For this study, we selected six of the most recent flooding events that have affected G-Bay, including Hurricanes Ike (2008), Harvey (2017), Nicholas (2021), and Beryl (2024), as well as the torrential rainfall events of Memorial Day (2015) and Tax Day (2016). Making landfall near Galveston on September 13, 2008, as a Category 2 hurricane, Hurricane Ike brought maximum sustained winds of approximately 175 km/h and a storm surge exceeding 4.5 m <ref type="bibr">(National Weather Service, 2008)</ref>. The hurricane caused extensive flooding in Galveston and Harris Counties. Torrential rainfall on May 23-24, 2015, around Memorial Day resulted in widespread flooding across South-Central Texas. 
Southern Blanco County received a record 0.25-0.33 m of rain, causing the Blanco River at Wimberley to rise from nearly 1.5 m to over 12.5 m within hours and leading to severe flooding <ref type="bibr">(National Weather Service, 2015)</ref>. Similarly, about 0.61 m of rain fell in Houston and in Waller County on April 17-18, 2016, filling local reservoirs to full capacity and producing record high water levels <ref type="bibr">(ABC News, 2019)</ref>.</p><p>Harvey made landfall near Rockport, Texas, on August 25, 2017, as a Category 4 hurricane with winds of 215 km/h. The storm stalled over Southeast Texas, dropping over 1.27 m of rain in some areas, resulting in catastrophic flooding, particularly in the Houston metropolitan area (National Environmental Satellite, Data, and Information Service, 2024). Nicholas made landfall on September 14, 2021, near Sargent Beach, Texas, as a Category 1 hurricane with maximum sustained winds of 120 km/h, causing significant rainfall and flooding in Southeast Texas and impacting communities around Galveston Bay <ref type="bibr">(National Hurricane Center, 2022)</ref>. Beryl made landfall near Matagorda, Texas, on July 8, 2024, as a Category 1 hurricane with winds of 129 km/h, leading to extensive flooding in Galveston and Harris Counties (NASA Earth Observatory, 2024). The six events resulted in over 210 fatalities and approximately $200 billion in economic damages. This high susceptibility to compound flooding renders Galveston Bay an ideal site for applying the proposed DL model.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Methodology</head><p>We developed a DL framework that incorporates information from a physically-based model, namely Delft3D-FM, to estimate compound flood dynamics during extreme events. Both the DL and the Delft3D-FM models utilize the same input features: observed water levels and river discharge at specific stations, along with spatially distributed variables, such as precipitation, digital elevation model (DEM), atmospheric pressure, and wind speed (Fig. <ref type="figure">2</ref>). These input features capture key spatiotemporal patterns from variables driving compound flood dynamics, enabling both models to predict flood depth and inundation extent. Since the DL model aims to replicate the model predictions of Delft3D-FM, the simulated water depth data of the latter model is the target feature of the proposed framework during training. The framework employs the LSTM model to analyze water level data from multiple tide-gauge stations (Fig. <ref type="figure">2a</ref> and <ref type="figure">b</ref>). Through an attention mechanism, which is a means to solve the problem of information overload <ref type="bibr">(Niu et al., 2021)</ref>, the model identifies critical time steps that reflect complex flood dynamics across the area. The attention vectors generated by this step are then integrated into the ConvLSTM model, which captures spatially distributed flood patterns, allowing for nuanced time-dependent interactions across the domain. Enhanced by its own attention mechanism, the ConvLSTM model learns both the unique contributions of individual spatial features and their interactions over time and space (Fig. <ref type="figure">2c</ref> and <ref type="figure">d</ref>). 
The attention vectors from the stations are then combined with the ConvLSTM output to improve the framework's flood predictions (Fig. <ref type="figure">2b</ref> and <ref type="figure">d</ref>). The framework's final output is the spatiotemporal evolution of flood depth and inundation extent (Fig. <ref type="figure">2e</ref>).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Physically-based model</head><p>We leverage a previously developed Delft3D-FM model of G-Bay that simulates compound flood dynamics associated with hurricane events <ref type="bibr">(Muñoz et al., 2024)</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.1.">Model setup</head><p>The model employs an unstructured finite-volume mesh grid composed of triangular cells that vary in size. The grid resolution ranges from 3 km at the ocean boundary to 5 m within Harris County, enabling high-resolution simulation of the complex interactions between G-Bay's morphological and hydrodynamic processes. The model is driven by multiple external forcing conditions, including tidal harmonic constituents at the open ocean boundary, derived from the TPXO 8.0 global inverse tide model; hourly riverine inflows from USGS river gauges; and hourly wind speed and atmospheric pressure data from the ERA5 reanalysis dataset with a spatial resolution of ~31 km. Since rainfall is a key flood driver in inland areas, we interpolated rainfall data from a dense network of rain gauges in Harris County. Following <ref type="bibr">Sebastian et al. (2021)</ref> and <ref type="bibr">Muñoz et al. (2024)</ref>, we set the "inverse distance weight" as the interpolation method in ArcGIS with an output cell size of 1 km (e.g., shortest Euclidean distance between existing rain gauges), a search radius of 5 points, and a power of 2. Moreover, we supplemented rain gauge data with "total precipitation" from ERA5 to estimate rainfall patterns in coastal areas beyond Harris County and over the Gulf of Mexico.</p></div>
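<div xmlns="http://www.tei-c.org/ns/1.0"><p>The inverse-distance interpolation of rain gauge data described above can be illustrated with a minimal NumPy sketch. It is not the ArcGIS tool used in the study: the function name idw_interpolate and its arguments are hypothetical, and the sketch only assumes a power of 2 and a search neighborhood of the 5 nearest gauges, as stated in the text.</p>
<p><![CDATA[
```python
import numpy as np

def idw_interpolate(gauge_xy, gauge_values, grid_xy, power=2.0, n_neighbors=5):
    """Inverse-distance-weighted estimate at each grid point (illustrative sketch)."""
    gauge_xy = np.asarray(gauge_xy, dtype=float)          # (n_gauges, 2)
    gauge_values = np.asarray(gauge_values, dtype=float)  # (n_gauges,)
    grid_xy = np.asarray(grid_xy, dtype=float)            # (n_grid, 2)
    k = min(n_neighbors, len(gauge_xy))

    # Pairwise Euclidean distances between grid points and gauges
    dist = np.linalg.norm(grid_xy[:, None, :] - gauge_xy[None, :, :], axis=2)

    # Keep only the k nearest gauges for each grid point (sorted by distance)
    nearest = np.argsort(dist, axis=1)[:, :k]
    d = np.take_along_axis(dist, nearest, axis=1)
    v = gauge_values[nearest]

    # Weights decay with distance raised to the chosen power
    weights = 1.0 / np.maximum(d, 1e-12) ** power
    estimate = (weights * v).sum(axis=1) / weights.sum(axis=1)

    # A grid point that coincides with a gauge takes the gauge value directly
    exact = d[:, 0] == 0.0
    estimate[exact] = v[exact, 0]
    return estimate
```
]]></p>
<p>With two gauges reporting 10 and 20 mm, a grid point halfway between them receives the average of the two values, while a grid point located on a gauge reproduces that gauge's reading.</p></div>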
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.2.">Model calibration and validation</head><p>The G-Bay model has been rigorously calibrated and validated using all the above-mentioned compound flood events as shown in <ref type="bibr">Muñoz et al. (2024)</ref> as well as Figs. S3-S6 in the supplementary material. Legacy DEMs, including the 2006 Galveston, Texas Coastal Digital Elevation Model and the Continuously Updated Digital Elevation Model (CUDEM), are used to simulate the conditions during past and recent hurricane and storm events. Additionally, the National Land Cover Database (NLCD) provides annual land cover maps at a 30-m resolution, which are crucial for inferring spatially distributed roughness values during the calibration process. Model calibration involves adjusting the Manning's roughness coefficient for different land cover types to ensure that simulated water levels closely align with observed data. To further optimize the model's performance, 50 ensemble simulations were conducted using high-performance computing resources to find the optimal combination of roughness coefficients that jointly minimizes errors of simulated water level and inundation depth in terms of RMSE, KGE, and NSE. Observed water levels and ground-truth inundation depths were obtained from NOAA's Tides &amp; Currents and high-water marks from the USGS's Flood Event Viewer. Each ensemble member consisted of a unique combination of plausible roughness values according to the NLCD land cover class. Such combinations were obtained from the Latin Hypercube Sampling technique and considered for the compound flood events <ref type="bibr">(Helton and Davis, 2003;</ref><ref type="bibr">Muñoz et al., 2022)</ref>. 
This comprehensive calibration ensures that the Delft3D-FM model accurately reproduces water levels, inundation depth, and flood extent in G-Bay under a variety of scenarios.</p></div>
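<div xmlns="http://www.tei-c.org/ns/1.0"><p>To make the ensemble design above more concrete, the following sketch draws a Latin Hypercube sample of roughness coefficients with plain NumPy. The function name latin_hypercube and the bounds in EXAMPLE_BOUNDS are hypothetical placeholders, not the ranges used in the calibration; only the ensemble size of 50 comes from the text.</p>
<p><![CDATA[
```python
import numpy as np

# Hypothetical Manning's n ranges for two land cover classes (illustration only)
EXAMPLE_BOUNDS = [[0.02, 0.05], [0.08, 0.20]]

def latin_hypercube(n_samples, bounds, seed=0):
    """Return an (n_samples, n_params) Latin Hypercube sample within the bounds."""
    rng = np.random.default_rng(seed)
    bounds = np.asarray(bounds, dtype=float)  # (n_params, 2): lower and upper limits
    n_params = bounds.shape[0]

    # One random point inside each of n_samples equal-width strata of [0, 1)
    strata = np.arange(n_samples)[:, None]
    u = (strata + rng.random((n_samples, n_params))) / n_samples

    # Shuffle the strata independently for every parameter
    for j in range(n_params):
        u[:, j] = rng.permutation(u[:, j])

    # Rescale the unit hypercube to the physical parameter ranges
    return bounds[:, 0] + u * (bounds[:, 1] - bounds[:, 0])

ensemble = latin_hypercube(50, EXAMPLE_BOUNDS)  # 50 members, as in the text
```
]]></p>
<p>Each row of ensemble would correspond to one candidate simulation. Stratifying every parameter range guarantees that all 50 intervals of each roughness coefficient are visited exactly once, which plain random sampling does not.</p></div>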
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Deep learning model architecture</head><p>The proposed DL framework integrates LSTM and ConvLSTM to process point-based and spatially distributed data, respectively. These different data types are initially processed separately to allow the combined model to exert varying localized influences across broader spatial features. The proposed model is trained and validated on sequential and representative flooding events including Hurricane Ike (2008), Memorial Day (2015), and Tax Day (2016). Both point and spatial data of these events are processed in a 6-h sequence to effectively learn patterns over meaningful and practical time intervals without adding excessive complexity to the model. In this regard, the U.S. National Hurricane Center provides forecasts every 6 h that are eventually used to run physically-based models. We first use Bayesian optimization to select the best combination of hyperparameters after 50 tuning trials, while the model training for the best model is configured for up to 300 epochs. We utilize the Adam optimizer with mean squared error as the loss function. The model validation is monitored with two critical callbacks (model checkpoint and early stopping). The model checkpoint saves the best model whenever the validation loss improves, ensuring that the final model retains the best weights achieved during training. Early stopping halts the training process if the validation loss does not improve after 10 consecutive epochs, preventing overfitting and ensuring that the best-performing model is saved. L2 regularization is also incorporated to prevent overfitting by adding a penalty to the loss function based on the magnitude of the model's weights <ref type="bibr">(Xie et al., 2022)</ref>.</p></div>
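<div xmlns="http://www.tei-c.org/ns/1.0"><p>The interplay between the checkpoint and early-stopping callbacks can be sketched without any DL library. The function below is a hypothetical, framework-free illustration that replays a list of validation losses; it assumes the patience of 10 epochs and the 300-epoch cap given in the text and is not the training code of the study.</p>
<p><![CDATA[
```python
def run_with_early_stopping(val_losses, patience=10, max_epochs=300):
    """Replay validation losses and report the checkpointed and stopping epochs.

    Returns (best_epoch, best_loss, stopped_epoch); stopped_epoch is None when
    training runs through every available epoch without triggering the stop.
    """
    best_loss = float("inf")
    best_epoch = None
    stopped_epoch = None
    wait = 0  # consecutive epochs without improvement

    for epoch, loss in enumerate(val_losses[:max_epochs]):
        if loss < best_loss:
            # Checkpoint: remember the epoch whose weights would be saved
            best_loss = loss
            best_epoch = epoch
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                # Early stopping: no improvement for `patience` epochs
                stopped_epoch = epoch
                break

    return best_epoch, best_loss, stopped_epoch
```
]]></p>
<p>For a loss history that improves at epochs 0 and 1 and then stagnates, the sketch keeps the epoch-1 checkpoint and stops ten epochs later, so the saved model is the best one seen and not the last one trained.</p></div>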
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.1.">Point feature processing</head><p>Hourly water level data from 21 observation stations provide localized temporal dynamics of the extreme flood events. The time-indexed water level data are loaded into structured arrays, with each array corresponding to a different observation station. For each station i, the time series data is processed through two sequential LSTM layers, with an attention mechanism that captures unique temporal patterns and computes the relative importance of each timestep at the observation stations <ref type="bibr">(Chaudhari et al., 2021)</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>i. LSTM and hidden states:</head><p>The first LSTM layer processes the input sequence, generating hidden states for each timestep (Equation ( <ref type="formula">1</ref>)). These hidden states serve as input to the second LSTM layer, which produces a final hidden state that encapsulates the model's comprehensive understanding of the entire sequence (Equation ( <ref type="formula">2</ref>)) <ref type="bibr">(Chaudhari et al., 2021)</ref>. The hidden states at each timestep for both layers are computed as follows: </p><p>where <formula notation="TeX">h_{t-1,i}</formula> is the hidden state from the previous timestep, <formula notation="TeX">x_{t,i}</formula> is the input (water level data) at timestep t, <formula notation="TeX">h_{t,i}</formula> is the hidden state at the current timestep, and <formula notation="TeX">h^{(2)}_{T,i}</formula> is the final hidden state of the second LSTM layer for station i.</p><p>ii. Temporal attention mechanism: A custom attention mechanism is applied to each observation station. For each station i, the attention score <formula notation="TeX">e_{t,i}</formula> at timestep t is computed using Equation (3):</p><p>where W and b are learnable parameters within the attention mechanism.</p><p>Rather than performing a simple dot product between the hidden states of LSTM layers, the attention mechanism increases the attention score of the highest attention weights to enhance the model's focus on the most critical timesteps (e.g., peak water level). The attention weights <formula notation="TeX">aw_{t,i}</formula> are calculated by applying a softmax function to the attention scores. This is then followed by amplifying the influence of the top 10 % of the attention scores using an emphasis factor <ref type="bibr">(Daramola et al., 2025)</ref> (Equation ( <ref type="formula">4</ref>)):</p><p>Both the percentage and emphasis factor are iteratively explored until the best model performance is achieved. 
The process improves the model's ability to accurately capture the magnitude and timing of extreme water levels <ref type="bibr">(Daramola et al., 2025)</ref>. The attention weights <formula notation="TeX">aw_{t,i}</formula> are utilized to compute the attention vector <formula notation="TeX">av_i</formula> in Equation (5). This vector represents a weighted sum of the LSTM hidden states across all timesteps <ref type="bibr">(Chaudhari et al., 2021)</ref>:</p><p><formula notation="TeX">av_i = \sum_{t=1}^{T} aw_{t,i}\, h_{t,i} \qquad (5)</formula></p><p>Since each hidden state is a vector, the attention vector will have the same dimensionality as the individual hidden states at all stations. Finally, the attention vectors are extracted and used to modulate the output of the spatial features processed by the ConvLSTM. It is important to note that each attention vector is not a single, fixed value applied across the entire modeling period. Instead, it dynamically represents the importance of each timestep within a sequence of data, capturing the evolving conditions of the event.</p></div>
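<div xmlns="http://www.tei-c.org/ns/1.0"><p>The temporal attention steps for a single station can be sketched in NumPy as follows. Several details are assumptions made for illustration and may differ from the published equations: the score is taken as tanh(W h + b), the emphasis is applied by multiplying the largest 10 % of the softmax weights by a factor, and the weights are renormalized afterwards. The function name emphasized_attention and the default emphasis of 2.0 are hypothetical.</p>
<p><![CDATA[
```python
import numpy as np

def emphasized_attention(hidden_states, W, b, top_fraction=0.10, emphasis=2.0):
    """Attention weights with top-fraction emphasis and the resulting attention vector."""
    h = np.asarray(hidden_states, dtype=float)  # (T, d): one hidden state per timestep

    # Assumed scoring function: one scalar score per timestep
    scores = np.tanh(h @ W + b)                 # (T,)

    # Softmax over timesteps (shifted by the maximum for numerical stability)
    exp_scores = np.exp(scores - scores.max())
    weights = exp_scores / exp_scores.sum()

    # Amplify the largest `top_fraction` of the weights, then renormalize (assumption)
    k = max(1, int(np.ceil(top_fraction * len(weights))))
    top = np.argsort(weights)[-k:]
    boosted = weights.copy()
    boosted[top] *= emphasis
    boosted /= boosted.sum()

    # Attention vector: weighted sum of hidden states across all timesteps
    attention_vector = boosted @ h              # (d,)
    return boosted, attention_vector
```
]]></p>
<p>Because the output is a weighted sum of d-dimensional hidden states, the attention vector keeps the hidden-state dimensionality, and the emphasis step shifts weight toward the timesteps that already score highest, such as the peak water level.</p></div>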
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.2.">Spatial feature processing</head><p>Spatial features, including atmospheric pressure, wind speed, DEM, precipitation, and water depth, are processed as 50-m resolution GeoTIFF images at hourly timesteps. The DEM, which remains static for the duration of each event (typically one week), is replicated to match the temporal resolution of the other features. These spatial data are loaded into structured arrays while maintaining the spatial coordinate reference system (CRS) and transformation metadata, which are critical for conducting a geospatial analysis. To address the varying scale among features, min-max (MinMaxScaler) normalization is applied. Subsequently, the input data for the ConvLSTM is stacked along the channel dimension, where each channel corresponds to a spatial feature (i.e., atmospheric pressure, wind speed, DEM, precipitation), with water depth designated as the target or predicted variable. A mask is used to track invalid cells containing NaN values within the spatial feature map. This mask is utilized in a custom loss function and during the spatial attention mechanism to ensure that missing data do not adversely affect the model's training (Fig. <ref type="figure">S2</ref> in supplementary material). After the min-max normalization, the values of valid cells across all features are scaled between 0.1 and 1.0, with all NaN cells set to 0. In the denormalization process, the predicted output is similarly masked, restoring the NaN values to ensure consistency with the original data structure.</p><p>i. Convolutional and recurrent operation: At each timestep t, the ConvLSTM applies convolutional filters to the input spatial data to extract local spatial features (Equation ( <ref type="formula">6</ref>)), using a kernel size of 3 &#215; 3 pixels (or 150 &#215; 150 m). 
The ConvLSTM updates its hidden states over time, like those of the LSTM (Equation ( <ref type="formula">7</ref>)), but it also processes spatial information through convolutions <ref type="bibr">(Gavahi et al., 2021,</ref><ref type="bibr">2023)</ref>. The convolution operation involves sliding a kernel (or filter) over the input feature maps to produce a new output <ref type="bibr">(Muñoz et al., 2021)</ref>. Specifically, we use a ConvLSTM consisting of three layers to balance complexity and efficiency (Table <ref type="table">1</ref>). Note that adding too many layers can lead to overfitting or excessive computational cost.</p><p>Layer normalization is applied after the first two ConvLSTM layers to stabilize the training process (Equation ( <ref type="formula">8</ref>)). Unlike batch normalization, it operates independently of batch size by normalizing across the features of a single timestep, making it more robust where memory constraints result in a smaller batch size. By treating each timestep independently, layer normalization ensures that the model effectively learns from the unique characteristics of each timestep.</p><p>where <formula notation="TeX">X^{(l)}_t</formula> is the output feature map of the l-th layer at timestep t, <formula notation="TeX">h^{(l)}_t</formula> and <formula notation="TeX">\hat{h}^{(l)}_t</formula> are the unnormalized and normalized hidden states at timestep t in layer l, <formula notation="TeX">W^{(l)}</formula> and <formula notation="TeX">b^{(l)}</formula> are the filter weights and biases of the l-th convolutional layer, &#956; and &#963;² are the mean and variance of the activations in layer l, &#947; and &#946; are learnable scaling and shifting parameters, and &#1013; is a small constant for numerical stability.</p><p>ii. Spatial attention mechanism: To enhance the model's focus on flooded regions, a Convolutional Block Attention Module (CBAM) <ref type="bibr">(Woo et al., 2018)</ref> is integrated after each of the first two ConvLSTM layers. 
The CBAM applies both channel (feature) and spatial attention modules. First, the channel attention module enables the model to emphasize relevant features effectively (see supplementary material). Then, the spatial attention module emphasizes relevant spatial locations by computing attention weights <formula notation="TeX">\alpha_t(x, y)</formula> for each spatial location at timestep t (Equation ( <ref type="formula">9</ref>)), which are used to modulate the output feature map (Equation ( <ref type="formula">10</ref>)). The attention-modulated feature maps from both CBAM modules are averaged to produce the final modulated feature map (Equation ( <ref type="formula">11</ref>)).</p><p>where <formula notation="TeX">X^{\mathrm{channel\,att}}_t</formula> is the channel-attended feature map at location (x, y), * denotes convolution, and <formula notation="TeX">X^{(1),\mathrm{att}}_t(x, y)</formula> and <formula notation="TeX">X^{(2),\mathrm{att}}_t(x, y)</formula> are the outputs from the first and second CBAM modules, respectively.</p><p>The ConvLSTM model outputs these spatiotemporal feature maps that encode both the spatial patterns via convolutions as well as temporal dependencies via the recurrent updates. The output from the final ConvLSTM layer (<formula notation="TeX">X_T</formula>) is flattened into a vector (s) that contains all the spatial-temporal information learned by the ConvLSTM up to the final timestep T.</p></div>
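<div xmlns="http://www.tei-c.org/ns/1.0"><p>The masked normalization applied to the spatial features at the start of this section (valid cells scaled to 0.1-1.0, NaN cells set to 0, and NaN values restored on denormalization) can be sketched as below. The function names are hypothetical and the sketch handles a single feature array; it is an illustration under those assumptions, not the preprocessing code of the study.</p>
<p><![CDATA[
```python
import numpy as np

def masked_minmax(feature, low=0.1, high=1.0):
    """Scale valid cells to [low, high]; NaN cells become 0 and are flagged in a mask."""
    feature = np.asarray(feature, dtype=float)
    valid = ~np.isnan(feature)                 # True where the cell holds data
    fmin = feature[valid].min()
    fmax = feature[valid].max()
    span = fmax - fmin if fmax > fmin else 1.0  # avoid division by zero for flat fields

    scaled = np.zeros_like(feature)            # invalid cells stay at 0
    scaled[valid] = low + (feature[valid] - fmin) / span * (high - low)
    return scaled, valid, (fmin, fmax)

def masked_denormalize(scaled, valid, stats, low=0.1, high=1.0):
    """Invert masked_minmax and restore NaN in the originally invalid cells."""
    fmin, fmax = stats
    restored = np.full(scaled.shape, np.nan)
    restored[valid] = fmin + (scaled[valid] - low) / (high - low) * (fmax - fmin)
    return restored
```
]]></p>
<p>Keeping the lower bound at 0.1 rather than 0 means that a value of exactly 0 in the normalized array can only denote a masked cell, so valid minima and missing data remain distinguishable.</p></div>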
<div xmlns="http://www.tei-c.org/ns/1.0"><head>iii. Cluster-based temporal attention modulation:</head><p>The study domain is partitioned into clusters using Voronoi tessellation <ref type="bibr">(Fukami et al., 2021)</ref>, where the number of clusters (N) corresponds to the number of observation stations (<formula notation="tex">i_1, i_2, \ldots, i_N</formula>) (Table <ref type="table">S1</ref> in the supplementary material). This segmentation enables the model to effectively capture the varying flood dynamics associated with each station and its immediate surrounding region (Fig. <ref type="figure">3</ref>). The segmentation assumes that cells surrounding the station within each cluster are likely to experience water level peaks at nearly the same time. For each observation station i, the Voronoi cluster <formula notation="tex">V_i</formula> is defined as:</p><p><formula notation="tex">V_i = \{ p \in \mathbb{R}^2 : \lVert p - s_i \rVert \le \lVert p - s_j \rVert \ \ \forall j \ne i \}</formula> (12)</p><p>where p is a point in the spatial domain, <formula notation="tex">\mathbb{R}^2</formula> refers to the 2-dimensional Euclidean space, ‖·‖ is the Euclidean distance, <formula notation="tex">s_i</formula> is the location of station i, and <formula notation="tex">s_j</formula> are the locations of other stations. Binary cluster masks are created to identify the cells belonging to each Voronoi cluster (Equation (<ref type="formula">13</ref>)). Using a mutual exclusivity technique, we ensure that spatial cells are associated with at most one cluster, preventing overlapping influences. These masks are crucial for applying the temporal attention vectors to the spatial feature map in a localized manner during model training.</p><p>iv. Applying localized temporal attention through selective assignment: After the temporal attention vectors are extracted for each station, they are transformed and applied to modulate the spatial feature map (Table <ref type="table">S2</ref> in the supplementary material). This approach selectively assigns each attention vector to modulate its corresponding cluster, which could lead to spatially varying localized influences. 
Each station provides an attention vector, denoted as <formula notation="tex">av_i</formula>, which captures key temporal patterns from water level dynamics (Equation (<ref type="formula">5</ref>)). Recall that each individual attention vector corresponds to only a single cell in the spatial feature map. Hence, each attention vector is passed through a fully connected (dense) layer that projects it into a 1D tensor, <formula notation="tex">\text{flattened vector}_i</formula>, with a length equal to the total number of cells in the spatial feature map (H × W) (Equation (<ref type="formula">14</ref>)). The <formula notation="tex">\text{flattened vector}_i</formula> is then reshaped into a 2D array, <formula notation="tex">\text{reshaped vector}_i</formula>, which aligns the attention information with the grid-like shape of the spatial map (Equation (<ref type="formula">15</ref>)).</p><p>The Voronoi cluster mask, <formula notation="tex">\text{mask}_i(x, y)</formula>, corresponding to the spatial region of each station, is then used to confine the influence of each <formula notation="tex">\text{reshaped vector}_i</formula>, yielding a <formula notation="tex">\text{localized vector}_i</formula> (Equation (<ref type="formula">16</ref>)). This ensures that the <formula notation="tex">\text{localized vector}_i</formula> carries attention values only within the cluster, effectively masking out areas outside the cluster. As a result, only the values at points (x, y) within each cluster <formula notation="tex">V_i</formula> are retained. Lastly, a combined vector(x, y) is initialized with zero values to match the shape of the spatial feature map. For each station i, we assign its <formula notation="tex">\text{localized vector}_i(x, y)</formula> to the corresponding cluster locations in combined vector(x, y) (Equation (<ref type="formula">17</ref>)). Since the Voronoi clusters are mutually exclusive, each cell in the spatial feature map is influenced by exactly one attention vector.</p></div>
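<div xmlns="http://www.tei-c.org/ns/1.0"><p>The cluster masking and selective assignment steps described above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' code: the station coordinates, grid size, and attention vectors are made up, the learned dense layer is replaced by a fixed random weight matrix (dense_weights), and the helper names voronoi_masks and combine_attention are hypothetical. Cells are assigned to the nearest station by Euclidean distance, which yields mutually exclusive Voronoi masks.</p>

```python
import numpy as np

def voronoi_masks(stations, height, width):
    """Return one binary mask per station; each cell goes to its nearest station."""
    ys, xs = np.mgrid[0:height, 0:width]
    cells = np.stack([ys, xs], axis=-1).astype(float)           # (H, W, 2)
    diff = cells[None, :, :, :] - stations[:, None, None, :]    # (N, H, W, 2)
    dist = np.linalg.norm(diff, axis=-1)                         # (N, H, W)
    nearest = dist.argmin(axis=0)   # ties go to the lowest station index
    return np.stack([(nearest == i).astype(float) for i in range(len(stations))])

def combine_attention(attn_vectors, dense_weights, masks):
    """Project, reshape, mask, and merge the per-station attention vectors."""
    n, height, width = masks.shape
    combined = np.zeros((height, width))                 # zero-initialized map
    for i in range(n):
        flattened = attn_vectors[i] @ dense_weights[i]   # stand-in for the dense layer, length H*W
        reshaped = flattened.reshape(height, width)      # align with the spatial grid
        localized = reshaped * masks[i]                  # keep values inside cluster i only
        combined += localized                            # masks are disjoint, so this is an assignment
    return combined

# Hypothetical setup: 3 stations on a 6 x 8 grid, attention vectors of length 5.
rng = np.random.default_rng(0)
H, W, T = 6, 8, 5
stations = np.array([[1.0, 1.0], [4.0, 6.0], [5.0, 2.0]])   # (row, col) positions
masks = voronoi_masks(stations, H, W)
attn = rng.random((3, T))
weights = rng.random((3, T, H * W))
combined = combine_attention(attn, weights, masks)
```

<p>In this sketch every cell belongs to exactly one mask, so each entry of the combined map comes from a single station's projected attention vector.</p></div>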
<div xmlns="http://www.tei-c.org/ns/1.0"><p><formula notation="tex">\text{combined vector}(x, y) = \begin{cases} \text{localized vector}_i(x, y), &amp; (x, y) \in V_i \\ 0, &amp; \text{otherwise} \end{cases}</formula> (17)</p><p>Modulating the ConvLSTM output: The combined vector(x, y) is used to modulate the ConvLSTM output s through element-wise multiplication (Equation (<ref type="formula">18</ref>)). This modulation emphasizes or deemphasizes the ConvLSTM output where the learned attention weights <ref type="bibr">(Chenmin et al., 2024)</ref> are high or low, respectively, across the spatial regions.</p><p><formula notation="tex">\text{modulated output}(x, y) = s(x, y) \times \text{combined vector}(x, y)</formula> (18)</p><p>The modulated output is flattened and passed through fully connected layers z to generate the final output y (Equation (<ref type="formula">19</ref>)), which is then reshaped back to the spatial feature map's dimensions (Equation (<ref type="formula">20</ref>)).</p><p>The proposed framework offers a significant advancement in modeling the spatiotemporal complexities of cyclone-induced flood dynamics, surpassing the capabilities of conventional ConvLSTM models and existing hybrid approaches (Fig. <ref type="figure">4</ref>). While conventional ConvLSTM models effectively capture general spatiotemporal patterns, they are not well suited to account for the region-specific temporal variability and spatial heterogeneity inherent in flood events. This limitation becomes particularly evident in large and/or diverse regions, such as Galveston Bay, where flood propagation and peak water level timing vary substantially between coastal, transition, and inland zones. To overcome these shortcomings, our approach incorporates a novel cluster-based temporal attention mechanism. This framework partitions the spatial domain into distinct clusters, assigning localized water level dynamics to each cluster. 
By aligning temporal observations with region-specific spatial predictions, the model accurately captures the varying timing of peak water levels across extensive areas. Moreover, the proposed framework leverages dynamic inputs that evolve over time to provide comprehensive spatial and temporal information. In contrast, many hybrid models rely heavily on static features, such as elevation, floodplain, and land use, to ensure spatial consistency <ref type="bibr">(Farahmand et al., 2023;</ref><ref type="bibr">Muñoz et al., 2024;</ref><ref type="bibr">Valle-Levinson et al., 2020)</ref>, often at the expense of capturing temporally varying flood behaviors. This innovation enables the model to precisely replicate the varied timing of peak water levels across extensive spatial domains, potentially improving its robustness and predictive accuracy over conventional ConvLSTM and hybrid methods.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.3.">Model prediction and analysis</head><p>We evaluate the trained model's performance on unseen data to assess its ability to generalize and make accurate predictions of water depth and extent during flooding events. Testing is conducted using process-based model simulations of Hurricane Harvey (2017), Hurricane Nicholas (2021), and Hurricane Beryl (2024). The input and output feature processing follows the same steps as in model training, ensuring consistency in data handling. The cluster-based training approach and the application of localized attention through selective assignment are already built into the model. Nevertheless, the accuracy of the prediction relies on using the attention model to extract attention vectors from the test data (Equations (<ref type="formula">21</ref>) and (<ref type="formula">22</ref>)). This provides insight into how the model weighs different timesteps from the test events. This cluster-based attention mechanism allows the model to focus on different spatial regions during prediction, providing more refined, localized predictions of inundation depth and flood extent.</p><p><formula notation="tex">\text{vectors} = \text{attention model}(\text{water level test sequences})</formula> (21)</p><p><formula notation="tex">y_{\text{pred}} = \text{best model}(X_{\text{test sequences}}, \text{water level test sequences})</formula> (22)</p><p>Recommended performance metrics for predicting inundation depth are computed for each flood event, including the Root Mean Square Error (RMSE), coefficient of determination (R²), Kling-Gupta Efficiency (KGE), and Nash-Sutcliffe Efficiency (NSE) <ref type="bibr">(Gupta et al., 2009;</ref><ref type="bibr">Mahakur et al., 2025;</ref><ref type="bibr">Nash and Sutcliffe, 1970;</ref><ref type="bibr">Samantaray et al., 2025)</ref>. 
In addition, hit rate (H), false alarm ratio (F), critical success index (C), and error bias (E) are used to compute the accuracy of flood map replication <ref type="bibr">(Wing et al., 2022)</ref>. H measures the ratio of correctly predicted flood instances (true positives) to the total number of actual flood instances, indicating the model's ability to identify flooding when it occurs. F indicates the proportion of false positives among all instances where the model predicted a flood, measuring how often the model incorrectly predicts flooding when there is none. C assesses the proportion of correctly predicted flood events by considering all hits, false alarms, and misses, offering a comprehensive measure of the model's accuracy in predicting floods. E reflects the ratio of predicted flood instances (both true positives and false positives) to actual flood instances (both true positives and false negatives), revealing whether the model tends to overpredict or underpredict flooding. Best results are closer to 1 except for F, which should be closer to 0. We calculate the residuals as the direct difference between the actual and predicted water depth maps to evaluate prediction errors.</p></div>
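<div xmlns="http://www.tei-c.org/ns/1.0"><p>The metrics described above can be sketched in NumPy as follows. H, F, C, and E follow the verbal definitions in the text (hits, false alarms, and misses on wet/dry maps), while NSE and KGE use their standard forms from the cited hydrology literature. The function names, the wet/dry threshold of 0 m, and the small example grids are assumptions for illustration only; the sketch also assumes that at least one cell is wet in both maps, otherwise the ratios are undefined.</p>

```python
import numpy as np

def flood_extent_scores(actual_depth, predicted_depth, wet_threshold=0.0):
    """Hit rate, false alarm ratio, critical success index, and error bias."""
    obs = actual_depth > wet_threshold        # actual wet cells
    sim = predicted_depth > wet_threshold     # predicted wet cells
    hits = np.logical_and(sim, obs).sum()
    false_alarms = np.logical_and(sim, ~obs).sum()
    misses = np.logical_and(~sim, obs).sum()
    return {
        "H": hits / (hits + misses),
        "F": false_alarms / (hits + false_alarms),
        "C": hits / (hits + misses + false_alarms),
        "E": (hits + false_alarms) / (hits + misses),
    }

def rmse(actual, predicted):
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

def nse(obs, sim):
    """Nash-Sutcliffe Efficiency of a simulated series against a reference series."""
    return float(1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2))

def kge(obs, sim):
    """Kling-Gupta Efficiency from correlation, variability ratio, and bias ratio."""
    r = np.corrcoef(obs, sim)[0, 1]
    alpha = sim.std() / obs.std()
    beta = sim.mean() / obs.mean()
    return float(1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2))

# Hypothetical 2 x 3 depth maps in metres.
actual = np.array([[0.0, 0.5, 1.0], [0.0, 0.0, 0.2]])
predicted = np.array([[0.0, 0.4, 1.1], [0.3, 0.0, 0.0]])
scores = flood_extent_scores(actual, predicted)
depth_rmse = rmse(actual, predicted)

# Hypothetical depth series at one cell; a perfect prediction scores 1 for both.
series = np.array([0.1, 0.4, 0.9, 0.5, 0.2])
perfect_nse = nse(series, series)
perfect_kge = kge(series, series)
```

<p>In this toy example there are two hits, one false alarm, and one miss, so C is 0.5 and E is 1: the predicted and actual wet areas are equal in size even though they do not fully overlap, which is why E is read together with H, F, and C.</p></div>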
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Results and discussion</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Hyperparameter tuning process for the best model</head><p>The Bayesian optimization process systematically searches the hyperparameter space across 50 trials, successfully identifying configurations that minimize the validation loss. This approach iteratively tunes the hyperparameter values based on prior trial outcomes. For each trial, the optimal model is derived using the Adam optimizer with mean squared error as the loss function, achieving convergence within the 300-epoch limit. The model checkpointing callback saves the best model weights based on the lowest validation loss achieved during training, and the early stopping callback terminates training after 10 consecutive epochs without improvement in validation loss. This approach resulted in models that not only achieved the lowest validation loss per trial but also demonstrated strong generalization to unseen data, effectively capturing the underlying patterns in the training set. The consistent improvement in validation loss across trials demonstrates the effectiveness of this method in identifying a robust model configuration. Fig. <ref type="figure">5</ref> illustrates the convergence of the optimization process, showcasing the performance of the optimal hyperparameters (i.e., models) obtained across all 50 trials.</p><p>Most trials exhibit satisfactory training and validation loss curves, with rapid decreases during the initial epochs that stabilize near zero around 50 epochs (Fig. <ref type="figure">5a</ref> and <ref type="figure">b</ref>). Generally, trials with steeper initial declines and lower asymptotic losses tend to converge better and may have less risk of overfitting <ref type="bibr">(Singh et al., 2024)</ref>. Most trials also stop early, around 40 to 50 epochs (Fig. 
<ref type="figure">5c</ref>), suggesting they reach optimal validation loss relatively quickly, though a few continue up to 200 and 300 epochs, indicating more complex convergence patterns. The validation losses of most trials fall within a narrow range, with a significant concentration between 5E-5 and 1E-4 m² (Fig. <ref type="figure">5d</ref>). The lowest validation loss (Trial 11) is achieved after 50 epochs (Fig. <ref type="figure">5e</ref>). In this trial, the training-to-validation loss ratio initially oscillates above 1 in the early epochs (Fig. <ref type="figure">5f</ref>), signaling some expected early divergence between the losses. However, this ratio stabilizes around epoch 10, trending closer to 1, which suggests a balanced performance between training and validation. Overall, trial ratios trending towards 1 underscore the model's consistency in generalizing across datasets (Fig. <ref type="figure">5f</ref>), while ratios below 1 indicate overfitting. Finally, we apply all models and evaluate their ability to accurately capture flood events.</p></div>
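<div xmlns="http://www.tei-c.org/ns/1.0"><p>The interplay between checkpointing and early stopping described in this section can be illustrated with a small framework-agnostic sketch. It tracks the lowest validation loss (the epoch whose weights a checkpoint callback would keep) and stops after 10 consecutive epochs without improvement, within a 300-epoch limit. The function name and the synthetic loss curve are assumptions for illustration; the study's actual callbacks and loss values are not reproduced here.</p>

```python
def run_early_stopping(val_losses, patience=10, max_epochs=300):
    """Return (best_epoch, best_loss, stop_epoch) for a sequence of validation losses."""
    best_loss = float("inf")
    best_epoch = 0
    wait = 0          # consecutive epochs without improvement
    stop_epoch = 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        stop_epoch = epoch
        if best_loss > loss:
            # Improvement: a checkpoint callback would save the weights here.
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                break  # early stopping triggers
    return best_epoch, best_loss, stop_epoch

# Synthetic curve: the loss improves for 40 epochs, then plateaus at a worse value.
losses = [1.0 / e for e in range(1, 41)] + [0.05] * 30
best_epoch, best_loss, stop_epoch = run_early_stopping(losses)
```

<p>With this synthetic curve the best epoch is 40 and training stops at epoch 50, ten epochs after the last improvement, while the retained weights are those of epoch 40.</p></div>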
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Attention modulation</head><p>The attention vectors extracted by the LSTM inform how well the model integrates the influence of water level variability in predicting flood dynamics across the study area. Timesteps of high water level variability (or abrupt changes) receive higher weights, indicating potential flooding around the observation stations, while lower magnitudes suggest reduced flood risk (Fig. <ref type="figure">6</ref>). Based on the modulation approach, attention weights from individual stations are projected over the clusters containing those stations, allowing distinct and varying influences that correlate with the station's water levels. For instance, at timestep 130 h, a higher attention weight is applied to the cluster associated with station 4 (Galveston Bay Entrance, TX - NOAA 8771341) since the peak water level for Hurricane Ike occurs approximately 5 h earlier than at station 18 (Manchester, TX - NOAA 8770777) (Fig. <ref type="figure">6a</ref>). However, attention weights for both station clusters at timestep 188 h during the Memorial Day flooding are quite similar, reflecting the relatively low water levels at that time (Fig. <ref type="figure">6b</ref>). This approach effectively accounts for lag-times in flood dynamics across the larger domain, enhancing the model's ability to predict regional flood variations accurately. It is important to note that the spatiotemporal analysis is conducted at the cell-size scale using the ConvLSTM model. Furthermore, the attention vector does not imply uniform flooding across the entire cluster; rather, its influence is applied to the spatially varied values per timestep already estimated at individual cells by the ConvLSTM model. This means that each individual cell in the spatial map has a calculated value that varies across cells and at each timestep. The attention vector, although uniform over its cluster, emphasizes these varying values per cell with information about the events' evolution.</p><p>In addition, we compare the proposed framework with a model applying a single attention vector to the entire domain to highlight its ability to accurately account for the timing differences in flood dynamics over a large domain. The single attention vector reflects the combined influence of all observation stations. The cell with the highest actual peak water depth across all timesteps within each cluster is selected and compared with its predicted data to observe any lag- or lead-times. Results of this analysis show a significant lag-time of about 5 h for the majority of clusters and a lead-time of about 2 h for some other clusters when a single attention vector is used. However, correct peak timing is observed in two clusters without a color shade (Fig. <ref type="figure">7a</ref>), indicating where the maximum attention weight aligns with the peak flood timing. This suggests that averaging attention across stations may only capture flood dynamics correctly in clusters where timing coincides by chance. In contrast, the cluster-based application of attention vectors demonstrates a significantly improved match with the actual flood timing, with only 5 clusters showing lead- or lag-times, each within 1 h (Fig. <ref type="figure">7b</ref>).</p><p>In a previous study, we demonstrated that deep learning models can be significantly enhanced by incorporating an attention mechanism <ref type="bibr">(Daramola et al., 2025)</ref>. 
This approach helps emphasize indicators that signal the presence and severity of flooding events, rather than relying solely on repetitive patterns in training datasets or overlapping characteristics among various flood drivers. Consequently, attention mechanisms were integrated into the architecture of the proposed model in this study to ensure its effectiveness during both the training and testing phases. Given the model's satisfactory performance, this technique is shown to bolster its ability to effectively identify flood patterns across space and time. This improvement enables the proposed model to generalize effectively, suggesting that the framework is robust and could be applied to other regions with varying distributions of observation stations or diverse hydrodynamic conditions.</p></div>
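<div xmlns="http://www.tei-c.org/ns/1.0"><p>The peak-timing comparison described in this section (selecting, within each cluster, the cell with the highest actual peak water depth and comparing the timing of its actual and predicted peaks) can be sketched as follows. This is an illustrative NumPy sketch, not the authors' analysis code: the function name, the array layout (time, rows, columns), the hourly timestep, and the toy data are assumptions. Positive offsets denote a lag (the predicted peak arrives late) and negative offsets a lead.</p>

```python
import numpy as np

def peak_timing_offsets(actual, predicted, masks, dt_hours=1.0):
    """Per-cluster offset (hours) between predicted and actual peak timing.

    actual, predicted: arrays of shape (T, H, W); masks: binary array (N, H, W).
    """
    offsets = []
    for mask in masks:
        # Highest actual depth over time at each cell, restricted to this cluster.
        peak_per_cell = np.where(mask > 0, actual.max(axis=0), -np.inf)
        row, col = np.unravel_index(np.argmax(peak_per_cell), peak_per_cell.shape)
        t_actual = int(np.argmax(actual[:, row, col]))
        t_pred = int(np.argmax(predicted[:, row, col]))
        offsets.append((t_pred - t_actual) * dt_hours)
    return offsets

# Toy example: 6 hourly timesteps, a 1 x 2 grid, and one cell per cluster.
actual = np.zeros((6, 1, 2))
predicted = np.zeros((6, 1, 2))
actual[2, 0, 0], predicted[3, 0, 0] = 1.0, 0.9   # cluster 0: predicted peak is 1 h late
actual[4, 0, 1], predicted[4, 0, 1] = 2.0, 1.8   # cluster 1: peaks coincide
masks = np.array([[[1, 0]], [[0, 1]]])
offsets = peak_timing_offsets(actual, predicted, masks)
```

<p>In the toy example the first cluster shows a 1 h lag and the second shows no offset.</p></div>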
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3.">Deep learning predictions and process-based model simulations</head><p>Most of the DL models indicate that the proposed framework achieves satisfactory accuracy in predicting flood dynamics in terms of flood extent and inundation depth (Fig. <ref type="figure">S4</ref> in the supplementary material). To showcase the model's ability, we present its performance in predicting the peak flood depth maps of the three events. For Hurricane Beryl (Fig. <ref type="figure">8</ref>), the peak flood depth occurred on July 8, 2024, corresponding to the timestep with the maximum RMSE value. The best model slightly overpredicts flood depth in coastal areas and underpredicts it in inland areas, resulting in an overall slight underprediction with an error bias (E) less than 1. The false alarm ratio (F) indicates that only 14 % of non-flooded areas are falsely classified as flooded, with a critical success index (C) over 70 % for the flood maps (percentage of accurately predicted depth and extent). The model also achieved high accuracy in identifying flooded areas, with a hit rate (H) of 80 %. For the evolution of flood depth in selected areas, the model demonstrates satisfactory performance across all timesteps, with KGE, NSE, and R² metrics over 0.70.</p><p>At the peak flooding on August 29, 2017, for Hurricane Harvey, the best model tends towards slight overprediction of the flood map, with an error bias (E) greater than 1 (Fig. <ref type="figure">S10a</ref> and <ref type="figure">b</ref>, supplementary material), especially along the coast. However, flood depth is underestimated in the inland areas (Fig. <ref type="figure">S10c</ref>). As with Hurricane Beryl, the peak flood map corresponds to the timestep with the maximum RMSE value. 
Nevertheless, C is over 60 %, indicating that most of the flood map's depth and extent are accurately predicted. Additionally, over 80 % (H) of the flooded areas are correctly identified. F is below 30 %, which means that the model is unlikely to classify non-flooded areas as flooded. Similarly, the model demonstrates satisfactory performance across all timesteps, with KGE and NSE metrics over 0.70 for the evolution of flood depth at selected areas. As in the first two events, the time of peak flooding (on September 14, 2021) corresponds to the timestep with the maximum RMSE for Hurricane Nicholas. The best model tends to slightly underestimate flooding in the inland region while overestimating flood depth along the coast, especially in the eastern area of the model domain (Fig. <ref type="figure">S11a</ref> and <ref type="figure">b</ref>, supplementary material). The model accurately predicts most of the flood map's depth and extent with a C of 64 %, while correctly identifying flooded areas with 80 % accuracy (H). F is below 25 %, i.e., a low tendency to incorrectly classify non-flooded areas as flooded.</p><p>Regarding the peak flood maps in Fig. <ref type="figure">8</ref>, S10 and S11, the underestimation in inland areas is due to the omission of many interconnected rivers that could contribute to flooding during these events. In contrast, the overestimation in coastal areas is caused by high attention vectors projected onto the clusters, as the associated stations are exposed to the ocean. Nevertheless, the model performance metrics are satisfactory for all flood events and across all clusters (Fig. <ref type="figure">9</ref>, S12 and S13).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.4.">Limitations and future work</head><p>The model utilizes a physically-based model simulation as the "ground truth" for flood extent and inundation depth during the compound flood events. However, these model simulations differ slightly from actual observations, introducing errors despite thorough model calibration <ref type="bibr">(Muñoz et al., 2024)</ref>. The Delft3D-FM model is subject to several sources of uncertainty, including the initial conditions, the forcing (or boundary) conditions, model parameters, and model structure <ref type="bibr">(Abbaszadeh et al., 2022;</ref><ref type="bibr">Muñoz et al., 2022)</ref>. Additionally, interpolation errors may arise when resampling the spatially varying cell sizes of Delft3D-FM to the uniform 50-m cell size of the DL model input. The resampling procedure was needed to enable correct model training and validation of the CNN architecture. Next, river discharges recorded at upstream river-gauge stations are applied along the river branches and used as an input feature in the training process. Ignoring the lag-time in peak flow that may occur along the river branches might contribute to prediction errors, especially in large river deltas, estuaries, and bays. These uncertainties in the Delft3D-FM outputs can propagate to the DL model and affect its performance evaluation due to inherited biases, as evidenced by the slight over- and underestimation in the comparison plots (Figs. S3-S6 in the supplementary material).</p><p>While the cluster-based attention vector for water level variability can account for the lag-time, some clusters may encompass areas with slight time variations, potentially affecting the model's accuracy. 
Nevertheless, all cells contained in each cluster still vary spatially in magnitude, even though they follow the same timing pattern for the attention vector modulation during an event's evolution. When the average timing of all cells within the cluster is estimated, all the clusters have lead- and lag-times between 0 and approximately 2 h with respect to the actual peak timing. In other words, the more water level observation points are available, the more spatial timing variation will be captured, leading to a substantial improvement in the model's accuracy. Although incorporating attention mechanisms emphasizes certain weights and enhances generalization, some models produce straight-line predictions across all timesteps due to L2 regularization effects in the architecture that penalize large weights (Figs. <ref type="figure">S7-S9</ref>). This occurs because a higher regularization coefficient can overly smooth the attention emphasis during predictions, preventing the model from capturing rapid fluctuations or noise in the data. Therefore, it is crucial to balance the strength of the regularization technique within the architecture to enhance generalization without inducing underfitting. To further reduce the error between model predictions and actual observations, future work will focus on introducing residual learning techniques <ref type="bibr">(Tedesco et al., 2024;</ref><ref type="bibr">Zou et al., 2023)</ref>. Residual learning enables the model to concentrate on learning the differences or residuals between its predictions and the true observations rather than attempting to model the entire mapping directly.</p><p>The performance metrics used in this study also have limitations. For example, RMSE is highly sensitive to large errors, so extreme flood events can disproportionately inflate its value, potentially skewing perceptions of the model's overall performance. 
R² indicates explained variance but does not account for prediction bias, which is crucial for accurate flood depth estimation. KGE and NSE, which are standard metrics in hydrology, help capture these biases during extreme events. Metrics like H, F, and C effectively assess flood extent but may overlook the magnitude of depth predictions. E, however, is particularly significant for extreme events because it captures even small systematic over- and underpredictions. Despite their limitations, these metrics are used in a complementary manner and remain the most valid metrics for flood analysis. Future studies could consider comparison with other state-of-the-art DL architectures and explore additional metrics or weighting schemes that prioritize performance during extreme conditions. The latter will ensure a balanced evaluation across all flood scenarios.</p><p>[Figure caption fragment: ... and E are 0 m, 1, 1, 0, 1, and 1, respectively.]</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Conclusion</head><p>In this study, we developed a DL framework to address the complexities of cyclone-induced compound flood dynamics in Galveston Bay, TX. The framework combined LSTM and ConvLSTM model architectures, integrating spatial and temporal attention mechanisms to enhance the model's ability to capture nonlinear associations among flood drivers such as precipitation, river discharge, and storm surge. By leveraging output data from a coastal hydrodynamic model (Delft3D-FM) and incorporating cluster-based attention vectors, the framework achieved high accuracy in replicating flood dynamics in terms of inundation depth and flood extent. Importantly, it accounts for the hysteresis in system behavior, leading to accurate timing of flood dynamics, particularly the propagation of floods across a large domain caused by extreme cyclonic events. We also show the capability of the DL framework to handle the spatial variability of dominant flood drivers in coastal (storm surge) and inland areas (rainfall-runoff). This in turn demonstrates the potential of coupling DL architectures to enhance the prediction accuracy of cyclone-induced compound flood dynamics, specifically in complex coastal systems.</p><p>The model demonstrates satisfactory performance in identifying flooded areas, achieving an average hit rate of over 80 % while maintaining an average false alarm ratio below 25 %, even with the large domain size and resolution. The average KGE and NSE for the predicted evolution of all events are above 0.7. The study demonstrates that applying cluster-based temporal attention allows the model to focus selectively on different spatial regions, becoming more effective at capturing localized timing variations in flood dynamics across a large domain. 
Despite this performance, incorporating more water level observation points and discharge observations along river channels could enhance spatial variability and improve the model's accuracy. The proposed coupled DL model holds promise for disaster preparedness and flood mitigation in vulnerable coastal regions, offering a scalable approach adaptable to other cyclone-prone areas.</p></div></body>
		</text>
</TEI>
