<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Reconstructing Digital Terrain Models from ArcticDEM and WorldView-2 Imagery in Livengood, Alaska</title></titleStmt>
			<publicationStmt>
				<publisher></publisher>
				<date>04/01/2023</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10421190</idno>
					<idno type="doi">10.3390/rs15082061</idno>
					<title level='j'>Remote Sensing</title>
<idno>2072-4292</idno>
<biblScope unit="volume">15</biblScope>
<biblScope unit="issue">8</biblScope>					

					<author>Tianqi Zhang</author><author>Desheng Liu</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[ArcticDEM provides the public with an unprecedented opportunity to access very high-spatial resolution digital elevation models (DEMs) covering the pan-Arctic surfaces. As it is generated from stereo-pairs of optical satellite imagery, ArcticDEM represents a mixture of a digital surface model (DSM) over a non-ground areas and digital terrain model (DTM) at bare grounds. Reconstructing DTM from ArcticDEM is thus needed in studies requiring bare ground elevation, such as modeling hydrological processes, tracking surface change dynamics, and estimating vegetation canopy height and associated forest attributes. Here we proposed an automated approach for estimating DTM from ArcticDEM in two steps: (1) identifying ground pixels from WorldView-2 imagery using a Gaussian mixture model (GMM) with local refinement by morphological operation, and (2) generating a continuous DTM surface using ArcticDEMs at ground locations and spatial interpolation methods (ordinary kriging (OK) and natural neighbor (NN)). We evaluated our method at three forested study sites characterized by different canopy cover and topographic conditions in Livengood, Alaska, where airborne lidar data is available for validation. Our results demonstrate that (1) the proposed ground identification method can effectively identify ground pixels with much lower root mean square errors (RMSEs) (<0.35 m) to the reference data than the comparative state-of-the-art approaches; (2) NN performs more robustly in DTM interpolation than OK; (3) the DTMs generated from NN interpolation with GMM-based ground masks decrease the RMSEs of ArcticDEM to 0.648 m, 1.677 m, and 0.521 m for Site-1, Site-2, and Site-3, respectively. This study provides a viable means of deriving high-resolution DTM from ArcticDEM that will be of great value to studies focusing on the Arctic ecosystems, forest change dynamics, and earth surface processes.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Digital elevation models (DEMs) are 3D representations of the Earth's surface that are fundamental to many scientific studies, including topographic analysis <ref type="bibr">[1,</ref><ref type="bibr">2]</ref>, tracking surface deformation <ref type="bibr">[3,</ref><ref type="bibr">4]</ref>, and detecting soil volume changes triggered by geohazards <ref type="bibr">[5,</ref><ref type="bibr">6]</ref>. DEM can be further divided into digital terrain model (DTM) and digital surface model (DSM), with the former excluding any above-ground objects and reflecting only bare ground elevation information <ref type="bibr">[7]</ref>. The differences between DSMs and DTMs deliver critical information for monitoring changes in canopy height (canopy height model, CHM) <ref type="bibr">[8,</ref><ref type="bibr">9]</ref>, glaciers/snow cover depth <ref type="bibr">[10,</ref><ref type="bibr">11]</ref> or urban building damages caused by earthquakes or floods <ref type="bibr">[12,</ref><ref type="bibr">13]</ref>.</p><p>ArcticDEM is a time-dependent collection of high-resolution DEMs generated from very-high-resolution (meter-to-submeter) optical stereoscopy acquired from WorldView (WV)-1/2/3 and GeoEye-1 satellites over all land areas north of 60 &#8226; N, i.e., the pan-Arctic region <ref type="bibr">[14]</ref>. ArcticDEM has been used in scientific investigations about surface change dynamics of glaciers <ref type="bibr">[15]</ref>, deposit thickness of volcanic eruptions <ref type="bibr">[16,</ref><ref type="bibr">17]</ref>, vegetation biomass <ref type="bibr">[18]</ref>, and river surface heights <ref type="bibr">[19]</ref>. However, as a DEM product generated from optical stereoscopy, ArcticDEM mainly reflects elevation information at the surface top of non-ground objects (i.e., DSMs) rather than bare ground elevation (i.e., DTMs) in vegetated areas. For this reason, its current applications in DTM-or CHM-based studies, such as ecological analysis and hydrological modeling, remain under-explored. Recently, <ref type="bibr">Meddens et al. (2018)</ref> developed a 5 m resolution CHM called local ArcticCHM from ArcticDEM using a random forest (RF) regression model calibrated by airborne lidar-derived canopy height for three study sites in Alaska <ref type="bibr">[8]</ref>. The estimated canopy height was then subtracted from the Arctic-DEM to generate a 5 m resolution DTM called local ArcticDTM. While the local ArcticDTM achieves significant improvement over the ArcticDEM compared to lidar-derived DTM, their supervised modeling scheme is dependent on the availability of airborne lidar-derived canopy height. However, airborne lidar collections are not available in most Arctic regions. Therefore, it is necessary to develop a new approach to estimate DTM from ArcticDEM with no reliance on lidar data.</p><p>This study primarily focuses on optical stereoscopy based DTM estimation. In this context, numerous attempts have been made to estimate DTMs or normalized DSMs (i.e., DSM-DTM) directly from DSMs generated from optical stereoscopy. Techniques in this regard generally follow two steps: (1) ground/non-ground separation, and (2) DTM interpolation of non-ground areas based on identified ground observations. Previous studies have investigated several approaches for identifying ground/non-ground points. For instance, Xiao et al. (2019) located terrain points from WV-2/3 stereo images by adopting a cloth simulation filtering (CSF) algorithm <ref type="bibr">[20]</ref>, which was initially proposed by <ref type="bibr">Zhang et al. (2016)</ref> in reconstructing DTMs from lidar point clouds <ref type="bibr">[21]</ref>. A recent study <ref type="bibr">[22]</ref> shows that CSF achieves higher accuracies in identifying ground areas from unmanned aerial vehicle-based point clouds than several other lidar-based point cloud filtering approaches, e.g., curvaturebased multiscale curvature classification (MCC) <ref type="bibr">[23]</ref>, surface-based filtering (FUSION software, Version 4.40), and progressive triangulated irregular network (TIN)-based (Las-Tool software, Version 1.4) <ref type="bibr">[24]</ref> methods. <ref type="bibr">Perko et al. (2019)</ref> extended the multi-directional ground filtering method by <ref type="bibr">Meng et al. (2009)</ref>  <ref type="bibr">[25]</ref> to a slope-dependent version (hereinafter referred to as MSD) for searching ground candidates from tri-stereo Pl&#233;iades images <ref type="bibr">[26]</ref>. To outline the non-ground objects from WV-2 stereo images, <ref type="bibr">&#214;zcan et al. (2018)</ref> employed a morphologically based ground filtering method (hereinafter referred to as MBG) by segmenting small non-ground patches based on 'seeds' identified by a Canny edge detector, then progressively expanding these small segments until reaching clear boundaries <ref type="bibr">[27]</ref>. MBG is found to outperform several commonly applied morphological filters, including simple morphological filters (SMRF) <ref type="bibr">[28]</ref> and the filtering method embedded in gLiDAR software (<ref type="url">https://github.com/translunar/glidar</ref>) <ref type="bibr">[29]</ref>, in separating non-ground objects from lidar-based DSMs <ref type="bibr">[27]</ref>. <ref type="bibr">Tian et al. (2014)</ref> used RF to classify the study region into different ground and non-ground classes based on multi-level texture features extracted from Cartosat-1 stereo images <ref type="bibr">[30]</ref>. Noticeably, though indicating good capabilities in separating ground and non-ground observations, these ground filtering methods were mainly applied in urban scenarios. In addition, CSF <ref type="bibr">[21]</ref>, MSD <ref type="bibr">[26]</ref>, and MBG <ref type="bibr">[27]</ref> were initially designed for lidar-based DTM generation. However, it is worth mentioning that in optical imagery, geometric occlusion <ref type="bibr">[31]</ref> or low-contrast surfaces <ref type="bibr">[14]</ref> largely prevent effective key-point matching and ground detection, especially under dense canopies. As a result, optical stereoscopy-derived DSMs only reflect elevation information about the top of canopies and fail to capture fine-scale details, in contrast to the lidar-derived counterparts <ref type="bibr">[31]</ref>. Whether the above-mentioned lidar-based ground filtering techniques could achieve similar performance in extracting reliable ground points from optical imagery in forested regions remains to be examined. After ground identification, previous studies often adopted one or several spatial interpolation techniques on regional DTM generation based on the identified ground points, such as inverse distance weighting <ref type="bibr">[32]</ref><ref type="bibr">[33]</ref><ref type="bibr">[34]</ref><ref type="bibr">[35]</ref>, kriging <ref type="bibr">[32,</ref><ref type="bibr">36]</ref>, spline <ref type="bibr">[36]</ref><ref type="bibr">[37]</ref><ref type="bibr">[38]</ref>, TIN-based methods <ref type="bibr">[32,</ref><ref type="bibr">36]</ref>, and image inpainting <ref type="bibr">[27]</ref>. Although the performances of spatial interpolation methods on DTM interpolation have been examined in many studies <ref type="bibr">[34,</ref><ref type="bibr">36,</ref><ref type="bibr">37,</ref><ref type="bibr">39,</ref><ref type="bibr">40]</ref>, most of the investigations were primarily based on lidar data. How they perform in optical stereoscopy based DTM interpolation remains unclear.</p><p>The objectives of this study are twofold. Firstly, we develop an automated ground identification approach by integrating an unsupervised and probabilistic clustering method, namely a Gaussian mixture model, with a locally adjusted morphological operation to pinpoint high-confidence ground pixels from ArcticDEMs at three vegetated study sites characterized by different canopy cover and topographic conditions in Livengood, Alaska. By doing so, we aim to evaluate and compare the capability of the proposed algorithm in correctly identifying ground observations under different scenarios with the three abovementioned filtering-based methods, i.e., CSF, MSD, and MBG. Secondly, we also fully assess the robustness and consistency of two commonly adopted spatial interpolation techniques on optical stereoscopy based DTM interpolation. The novelty of this study lies in that it is the first time to (1) estimate DTMs from ArcticDEM in forested regions with no reliance on lidar observations, and (2) introduce a ground identification model framework that combines a clustering technique (GMM) with uncertainty information encoded with a locally adjusted morphological operation for improving the quality of the extracted ground masks. Though designed for ArcticDEM, the proposed model framework relies on optical stereoscopy and its generated DSMs and hence could be transferred to any other forested regions besides the Arctic region.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Methodology and Materials</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">Study Sites and Data</head><p>Our study sites (Site-1, Site-2, Site-3) are located in the forested landscapes of Livengood, Alaska (Figure <ref type="figure">1</ref>), covering an area of 360,000 m 2 individually. Vegetation compositions of these sites are characterized by a similar mixture of white (Picea glauca), black spruce (Betula neoalaskana), and deciduous forests such as birch (Betula neoalaskana) and aspen (Populus tremuloides) <ref type="bibr">[41]</ref> (Alaska Vegetation and Wetland Composite map, accessed on 1 February 2023). The three study sites have varying levels of canopy cover densities: low at Site-3, medium at Site-1, and high at Site-2 with a large forest patch, with elevations ranging from 296-364 m (Site-1) and 245-327 m (Site-2) to 260-317 m (Site-3) above sea level, respectively. Noticeably, there exists a drastic elevation change in the midst of Site-2 caused by river incision (Figures <ref type="figure">1</ref> and<ref type="figure">2</ref>). These site distinctions provide a good testbed for understanding the performance of DTM prediction algorithms under different topographic and canopy cover conditions.</p><p>Remote Sens. 2023, 15, x FOR PEER REVIEW 3 of 2 <ref type="bibr">[32,</ref><ref type="bibr">36]</ref>, and image inpainting <ref type="bibr">[27]</ref>. Although the performances of spatial interpolatio methods on DTM interpolation have been examined in many studies <ref type="bibr">[34,</ref><ref type="bibr">36,</ref><ref type="bibr">37,</ref><ref type="bibr">39,</ref><ref type="bibr">40</ref> most of the investigations were primarily based on lidar data. How they perform in optica stereoscopy based DTM interpolation remains unclear.</p><p>The objectives of this study are twofold. Firstly, we develop an automated ground identification approach by integrating an unsupervised and probabilistic clusterin method, namely a Gaussian mixture model, with a locally adjusted morphological opera tion to pinpoint high-confidence ground pixels from ArcticDEMs at three vegetated stud sites characterized by different canopy cover and topographic conditions in Livengood Alaska. By doing so, we aim to evaluate and compare the capability of the proposed algo rithm in correctly identifying ground observations under different scenarios with th three above-mentioned filtering-based methods, i.e., CSF, MSD, and MBG. Secondly, w also fully assess the robustness and consistency of two commonly adopted spatial inter polation techniques on optical stereoscopy based DTM interpolation. The novelty of thi study lies in that it is the first time to (1) estimate DTMs from ArcticDEM in forested re gions with no reliance on lidar observations, and (2) introduce a ground identificatio model framework that combines a clustering technique (GMM) with uncertainty infor mation encoded with a locally adjusted morphological operation for improving the qual ity of the extracted ground masks. Though designed for ArcticDEM, the proposed mode framework relies on optical stereoscopy and its generated DSMs and hence could be trans ferred to any other forested regions besides the Arctic region.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Methodology and Materials</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">Study Sites and Data</head><p>Our study sites (Site-1, Site-2, Site-3) are located in the forested landscapes of Liven good, Alaska (Figure <ref type="figure">1</ref>), covering an area of 360,000 m 2 individually. Vegetation composi tions of these sites are characterized by a similar mixture of white (Picea glauca), blac spruce (Betula neoalaskana), and deciduous forests such as birch (Betula neoalaskana) and aspen (Populus tremuloides) <ref type="bibr">[41]</ref> (Alaska Vegetation and Wetland Composite map, accessed on 1 February 2023). The three study sites have varying levels of canopy cover densities low at Site-3, medium at Site-1, and high at Site-2 with a large forest patch, with elevation ranging from 296-364 m (Site-1) and 245-327 m (Site-2) to 260-317 m (Site-3) above se level, respectively. Noticeably, there exists a drastic elevation change in the midst of Site 2 caused by river incision (Figures <ref type="figure">1</ref> and<ref type="figure">2</ref>). These site distinctions provide a good testbed for understanding the performance of DTM prediction algorithms under different topo graphic and canopy cover conditions.  ArcticDEM products include 2 m resolution ArcticDEM strips and 5 m resolution ArcticDEM tiles mosaicked from the best-quality pixels in ArcticDEM strip files <ref type="bibr">[14,</ref><ref type="bibr">42]</ref>. This study used a 2 m ArcticDEM strip generated from a stereo-pair of cloud-free WV-2 satellite imagery acquired in June 2016. In addition to ArcticDEM, we also acquired one WV-2 image from the original stereo-pair to utilize the multispectral information in the following ground identification step. Moreover, given that the comparative method CSF is a point cloud-based filter <ref type="bibr">[21]</ref>, we requested the mask file encoding the information of points matched between the source optical stereo-images <ref type="bibr">[14]</ref> for generating the point clouds for CSF filtering. The reference DTM (1 m) used for model evaluation was obtained from airborne laser scanning (ALS) point clouds collected in 2011 by a commercial lidar data vendor commissioned by the Alaska Division of Geological and Geophysical Surveys (Delivery 6 <ref type="bibr">[43]</ref>). The lidar data have an average point density &gt;8 points per m 2 with a vertical accuracy of 0.05 m root mean squared error (RMSE).</p><p>To ensure spatial consistency across different data sources, we applied a bilinear resampling approach to upscale lidar-derived DTM (1 m) and WV-2 multispectral bands (~1.92 m) to 2 m matching the ArcticDEM's resolution and reprojected all data sources (via ArcticDEM products include 2 m resolution ArcticDEM strips and 5 m resolution ArcticDEM tiles mosaicked from the best-quality pixels in ArcticDEM strip files <ref type="bibr">[14,</ref><ref type="bibr">42]</ref>. This study used a 2 m ArcticDEM strip generated from a stereo-pair of cloud-free WV-2 satellite imagery acquired in June 2016. In addition to ArcticDEM, we also acquired one WV-2 image from the original stereo-pair to utilize the multispectral information in the following ground identification step. Moreover, given that the comparative method CSF is a point cloud-based filter <ref type="bibr">[21]</ref>, we requested the mask file encoding the information of points matched between the source optical stereo-images <ref type="bibr">[14]</ref> for generating the point clouds for CSF filtering. The reference DTM (1 m) used for model evaluation was obtained from airborne laser scanning (ALS) point clouds collected in 2011 by a commercial lidar data vendor commissioned by the Alaska Division of Geological and Geophysical Surveys (Delivery 6 <ref type="bibr">[43]</ref>). The lidar data have an average point density &gt; 8 points per m 2 with a vertical accuracy of 0.05 m root mean squared error (RMSE).</p><p>To ensure spatial consistency across different data sources, we applied a bilinear resampling approach to upscale lidar-derived DTM (1 m) and WV-2 multispectral bands (~1.92 m) to 2 m matching the ArcticDEM's resolution and reprojected all data sources (via QGIS) to EPSG: 32606 (WGS84/UTM zone 6N). It could be observed that the topography of the ArcticDEM has been greatly affected by vegetation canopies at all study sites (Figure <ref type="figure">2</ref>). We also noticed a clearly systematic mismatch (&gt;8 m vertical difference) between ArcticDEMs and the lidar-derived reference DTMs. To offset this mismatch, we used the extracted high-confidence pseudo-ground pixels (pseudo-ground cluster refined by both cluster membership probability &gt; 0.9 and morphological erosion, Section 2.2.1) as a base to co-register ArcticDEMs and lidar-derived DTMs via a point cloud alignment algorithm named iterative closest point <ref type="bibr">[44]</ref>. This operation reduces the original RMSEs <ref type="bibr">(8.73, 8.93, 9</ref>.04 m) of the pseudo-ground pixels to 0.13, 0.18, and 0.44 m at Site-1, -2, -3, respectively.</p><p>To evaluate the ground identification results, we determined the reference ground masks to be lidar-derived CHM &#8804; 0.2 m for all study sites, where lidar-derived CHM was calculated as the difference between lidar-derived DSM and lidar-derived DTM. This 0.2 m threshold is very close to our DEM co-registration errors, which would make a negligible difference to our results, particularly given our focus on removing tall vegetation. It should be noted that we used the lidar-derived CHM &#8804; 0.2 m to derive ground masks instead of referring to all lidar returns &#8804; 0.2 m of the ground surface (i.e., z &#8804; 0.2 m) for three reasons. First, CHM &#8804; 0.2 m represents ground and low vegetation in the source lidar classification (Delivery 6 <ref type="bibr">[43]</ref>) while z &#8804; 0.2 m could include ground returns under tall vegetation canopy (as lidar points can penetrate canopies). Second, the reference ground masks were used to validate ground identification algorithms in locating ground points from ArcticDEM that are derived from optical (stereo) imagery but not lidar data. Unlike lidar, points matched between optical stereo images are usually limited to upper canopy surfaces in vegetated areas with no bare ground information revealed at these locations. This means that if there is tall vegetation at the same location, the lower points with z &#8804; 0.2 m would not be captured by optical imagery. Third, the official guide of the lidar data (Delivery 6 <ref type="bibr">[43]</ref>) states that lidar returns with z &#8804; 0.2 m have a higher potential to be artifacts and could be misclassified as non-ground by the lidar filtering method, making them hardly distinguishable from the low vegetation.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">Proposed Method</head><p>The proposed method consists of two main steps: (1) ground identification and (2) DTM interpolation. A flowchart of the proposed method is illustrated in Figure <ref type="figure">3</ref>. First, we employed an unsupervised clustering model, i.e., Gaussian mixture model (GMM) <ref type="bibr">[45]</ref>, to detect the pseudo-ground cluster (ground candidates initially separated by GMM with no refinement) based on WV-2 multispectral imagery. The GMM-derived pseudo-ground pixels were then refined by clustering uncertainties and a locally adjusted morphological erosion operation. Second, we applied two spatial interpolation techniques to generate the regional DTM (i.e., ArcticDTM) based on the ground masks extracted from the ground identification step and the ArcticDEM data. The details of the methodology are elaborated on in the following subsections.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.1.">Ground Identification</head><p>The entire ground identification process is comprised of two steps: (1) pseudo-ground cluster identification and (2) ground mask refinement.</p><p>Step 1: pseudo-ground cluster identification: The pseudo-ground cluster is mainly identified from the clustering result by an unsupervised and probabilistic clustering algorithm, GMM <ref type="bibr">[45]</ref>, which groups pixels into different clusters characterized by similar within-cluster and distinctive between-cluster statistical patterns (i.e., mean and covariance matrix) in the feature space. GMM clustering can be considered a soft version of K-means with probabilistic meaning encoded <ref type="bibr">[46]</ref>, thereby enabling uncertainty quantification of the clustering results <ref type="bibr">[47]</ref>. Compared to K-means, GMM is more flexible in modeling a full covariance matrix that determines the shape of the feature distribution associated with a specific cluster. Specifically, GMM models the likelihood of the observations as a mixture of Gaussian distributions <ref type="bibr">[45,</ref><ref type="bibr">48]</ref>, as shown in Equation (1):</p><p>where p(x|&#952;) denotes the joint conditional probability (likelihood) of all data observations x given parameter &#952;. Here, n corresponds to the number of pixels in the imagery, m is the number of mixture components, &#960; k represents the proportion of the k-th Gaussian component, x i is a feature vector of the i-th pixel with dimension 1 &#215; f where f stands for the number of input features, and The entire ground identification process is comprised of two steps: ( ground cluster identification and ( <ref type="formula">2</ref>) ground mask refinement.</p><p>Step 1: pseudo-ground cluster identification: The pseudo-ground cluster identified from the clustering result by an unsupervised and probabilistic clust rithm, GMM <ref type="bibr">[45]</ref>, which groups pixels into different clusters characterized within-cluster and distinctive between-cluster statistical patterns (i.e., mean a ance matrix) in the feature space. GMM clustering can be considered a soft ve means with probabilistic meaning encoded <ref type="bibr">[46]</ref>, thereby enabling uncertainty GMM clustering (using the GaussianMixture function in Python sklearn.mixture package <ref type="bibr">[49]</ref>) was initialized by the K-means clustering, and the fitting of the Gaussian mixture or the parameter optimization step was then realized by the expectation-maximization algorithm <ref type="bibr">[50]</ref>. Once the parameters (&#960; k , &#181; k , &#931; k , Equation ( <ref type="formula">1</ref>)) are determined, the cluster membership probability of the i-th observation (x i ) corresponding to the k-th Gaussian component can be obtained by</p><p>, which quantifies the clustering uncertainty. The clustering uncertainty encodes critical information for evaluating the confidence of each pixel's clustering result. This information was then used for refining the initially identified pseudo-ground cluster in Step 2: ground mask refinement.</p><p>The entire process of GMM clustering is fully automatic for a given number of mixture components (i.e., # of clusters). The optimal number of clusters generally can be determined empirically by a visual inspection or automatically by employing a Bayesian information criterion that seeks a trade-off between maximizing the log-likelihood and reducing the model complexity <ref type="bibr">[51]</ref>.</p><p>To extract reliable ground points from the ArcticDEM, we needed features for GMM clustering that can differentiate ground from non-ground areas (e.g., vegetation). Given this, we then input the eight original multispectral bands (coastal, blue, green, yellow, red, red edge, and two near-infrared bands) of the WV-2 imagery and additionally calculated three vegetation-related indices based on the spectral bands, including two vegetation indices and one water index. Vegetation indices include the normalized difference vegetation index (NDVI) and modified soil-adjusted vegetation index (MSAVI) <ref type="bibr">[52]</ref>. NDVI reflects healthy vegetation conditions and positively correlates with leaf density and plant biomass. MSAVI corrects the soil brightness impacts on the vegetation response. The normalized difference water index (NDWI) <ref type="bibr">[53]</ref> captures the water components of the vegetation canopy. Before conducting GMM clustering, we standardized all input features to remove scale differences and applied principal component analysis to reduce feature dimensions to several dominant directions that explain over 95% of the total variation.</p><p>Based on the GMM clustering results, we then identified the pseudo-ground cluster with the lowest median NDVI and negative NDWI (i.e., NDWI &lt; -0.1), given that bare soils reflect more strongly in the near-infrared band than visible bands. The adopted NDVI and NDWI criteria were intended to provide an automatic pseudo-ground cluster identification and were evaluated only at vegetated study sites similar to ours. For regions with different characteristics, the pseudo-ground cluster identification step may need some minor adjustments or is simply conducted by visual inspection.</p><p>Step 2: ground mask refinement: Due to the inherent spectral confusion between the ground and non-ground pixels (mainly vegetation), commission errors resulting from vegetation pixels can be found in the GMM-based pseudo-ground cluster. To reduce the number of remaining vegetation pixels in the ground masks, we refined the pseudoground cluster by removing pixels with low confidence or that were sparsely distributed. Low confidence pixels were discarded if their cluster membership probability was &lt;&#955; * , where higher &#955; * removes more uncertain pseudo-ground pixels. Here we set &#955; * to be 0.8. Sparsely distributed pixels were removed through a morphological erosion operation with a 3 &#215; 3 square-shaped structure element. On the other hand, considering that eliminating sparse pseudo-ground points may cause large errors in DTM estimation under high topographic variations, we skipped the morphological refinement at locations meeting two criteria: (1) low sampling density: the percentage of the remaining pseudo-ground pixels (after uncertainty refinement) within the local window is &lt;&#947; * , and (2) high topographic variation: the local standard deviation of ArcticDEM is &gt;&#963; * . Generally, higher &#947; * or lower &#963; * preserves more sparsely distributed pseudo-ground pixels. In this study, we used a 5 &#215; 5 local window (i.e., 100 m 2 ) and set &#947; * , &#963; * to be 10%, 4 m, respectively, for all study sites.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.2.">DTM Interpolation</head><p>The ground masks derived from the ground identification methods provide pixel locations where ArcticDEM values can be regarded as DTM. Based on these pseudoground points, spatial interpolation techniques were employed to estimate the terrain information for the remaining non-ground areas and reconstruct regional DTMs. In this paper, we implemented and compared two spatial interpolation techniques for DTM generation, including ordinary kriging (OK) <ref type="bibr">[54,</ref><ref type="bibr">55]</ref> and triangulation irregular network (TIN)-based natural neighbor (NN) <ref type="bibr">[56]</ref>. Specifically, the two spatial interpolation methods are summarized as follows:</p><p>OK interpolates the unknowns based on the weighted average of known points. Specifically, the weight determination of OK considers both spatial dependence <ref type="bibr">(variogram)</ref> and spatial closeness (location). Under the stationarity assumption of OK, the DEM residuals at ground pixels were generated first by subtracting the best-fit second-order trend surfaces (quadratic surfaces fitted on ArcticDEMs with the least residuals, using the lm function in the R stats package <ref type="bibr">[57]</ref>) from the original ArcticDEMs. We then predicted the DEM residuals at non-ground locations based on 16 nearest pseudo-ground neighbors and the estimated semi-variogram (fitted by a spherical model, using the variofit function in the R geoR package <ref type="bibr">[58]</ref>). Here we used 16 nearest neighbors to achieve a balance between interpolation accuracy and computational efficiency. The predicted DEM residual maps were then added back to the trend surfaces to produce the final DTMs (using the krige function in the R gstat package <ref type="bibr">[57]</ref>).</p><p>TIN-based NN, known as "area-stealing", also starts with building a Delaunay triangulation upon the known points, then separately constructing the first-and second-order Voronoi cells for each unknown location based on the triangle it locates and nearby triangles whose circumcircles enclose it. Here, the first-order and second-order Voronoi structures are formed globally by known points (using their middle lines) and locally by any unknown and its known neighbors, respectively. With the two layers of Voronoi networks, NN then interpolates the unknowns by a weighted average of these nearby triangle vertices whose weights are determined according to the area ratio of the second-order to the first-order Voronoi cells they are located at. A more detailed discussion on NN could be found in <ref type="bibr">Tily and Brace (2006)</ref>  <ref type="bibr">[56]</ref>. NN ensures a first order of continuity (c 1 , the first derivatives of the interpolated surfaces are also continuous) everywhere except at the known locations.</p><p>Benefiting from this higher level of continuity, NN usually provides more favorable and smoother predictions than linear (c 0 ) and nearest-neighbor (discontinuous) interpolation. NN was implemented via the griddata function (MATLAB R2020b). Note that NN cannot estimate unknown points outside the TIN network. The predictions of these unknowns were then made by a linear extrapolation.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3.">Evaluation and Comparison</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3.1.">Ground Identification</head><p>To evaluate the performance of the proposed method in extracting reliable ground pixels, we calculated several quantitative metrics between the predicted and reference ground masks (i.e., lidar-derived CHM &#8804; 0.2 m) for the entire study region, including overall accuracy (OA), type I error (TI, or commission error), type II error (TII, or omission error), and F1 score <ref type="bibr">[59]</ref>. True positive rate (TP) was additionally provided to supplement the explanation of the F1 score. Among all reported metrics, OA calculates the relative percentage of correctly identified ground (TP) and non-ground pixels (true negative rate, TN). TI (or commission errors) and TII (or omission errors) account for non-ground pixels misclassified as ground (false positive, FP) and ground pixels misclassified as non-ground (false negative, FN), respectively. F1 score denotes the harmonic mean of the user's accuracy (i.e., 1-TI) and producer's accuracy (i.e., 1-TII), which conveys a balance between TI and TII and weighs more on FN and FP in contrast to OA. Higher F1 scores are usually associated with lower classification errors. Considering that the elevation values of ground pixels are crucial for DTM estimation, we additionally assessed the vertical accuracy of the identified ground pixels by comparing their ArcticDEMs with the reference lidar-derived DTMs using error metrics listed in Section 2.3.2.</p><p>As a comparison to our GMM-based method, we also derived ground masks from K-means and three filtering-based ground identification algorithms, including CSF <ref type="bibr">[21]</ref>, MBG <ref type="bibr">[27]</ref>, and MSD <ref type="bibr">[26]</ref>. It is worth noting that all these filtering-based methods work directly on optical-stereo-imagery-generated DSMs or point clouds. The three filtering-based ground identification algorithms were implemented using the source code available on the GitHub repositories (CSF: <ref type="url">https://github.com/jianboqi/CSF</ref> (accessed on 1 October 2021); MSD: <ref type="url">https://github.com/rolandperko/dsm2dtm</ref> (accessed on 1 October 2021); MBG: <ref type="url">https://github.com/himmetozcan/Lidar_DTM_Segmentation</ref> (accessed on 1 October 2021)) with the parameter settings optimized by using the lidar-derived reference ground masks and visual inspection. More detailed descriptions of these methods and their parameter settings are provided in Appendix A.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3.2.">DTM Interpolation</head><p>The evaluation of DTM interpolation was conducted by calculating accuracy metrics between the predicted ( &#375;) and reference lidar-derived DTMs (y) in the entire study regions, including RMSE (Equation ( <ref type="formula">2</ref>)), relative RMSE (rRMSE), mean absolute error (MAE, Equation ( <ref type="formula">3</ref>)), and mean error (ME, Equation ( <ref type="formula">4</ref>)). RMSE has been frequently adopted in evaluating the average model prediction errors. rRMSE is calculated as the ratio of RMSE to the range (difference between maximum and minimum DTMs) of the reference DTMs, i.e., 53.8, 56, 55.4 m for Site-1, -2, -3, respectively, which provides normalized RMSE to account for the scale difference of data. MAE and ME convey additional information regarding the average absolute errors and bias in the prediction. It is worth mentioning that since ME only sums the prediction errors regardless of its sign (positive or negative), it is generally smaller than MAE, and lower ME may not indicate better prediction. Nonetheless, large positive MEs could imply a highly positive deviation (overestimation) on average.</p><p>where n is the number of observations, &#375;i , y i denote the estimated and observed values of the i-th pixel, respectively.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Results</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Ground Mask Extraction</head><p>The GMM-based clusters along with the clustering uncertainties (i.e., cluster membership probability) are shown in Figure <ref type="figure">4</ref> Though having no significant difference in spatial patterns, the generated clusters appear slightly more separable with additional spectra-derived features (NDVI, MSAVI, NDWI) than using multispectral band only (blackoutlined areas in Figure <ref type="figure">A1</ref>). Overall, the cluster maps exhibit spatial patterns consistent with the satellite images at all study sites, indicating good performances of the GMM-based clustering in grouping similar pixels (Figure <ref type="figure">4</ref>). Based on the NDVI and NDWI rules, cluster 1 was determined as the pseudo-ground class for all study sites. The rest three clusters (2-4) are found to be darker regions (e.g., shadow, dark green vegetation, cluster 2), green vegetation (cluster 3), and an additional concrete road class (cluster 4) after visually inspecting the satellite imagery (Figures <ref type="figure">2</ref> and<ref type="figure">4</ref>). Considering that concrete roads are commonly processed as the bare ground in lidar-derived DTMs, we therefore appended cluster 4 to the pseudo-ground class.</p><p>It appears that pixels with lower cluster membership probability, meaning more uncertainty about the dominant cluster, are generally located at areas with higher levels of spatial variability, e.g., canopy gaps, vegetation-ground boundaries (Figures <ref type="figure">2</ref> and<ref type="figure">4</ref>). In contrast, more spatially homogenous regions such as the top of dense canopy or ground surface are characterized by much purer clustering results (i.e., higher cluster membership probability, Figure <ref type="figure">4</ref>). To demonstrate the value of uncertainty measures, we show ground masks generated from GMM clustering only and refined by cluster membership probability of &gt;0.8 in Figure <ref type="figure">5</ref>. Instead of displaying a binary map of each ground mask, DEM differences between ArcticDEM and lidar-derived DTM overlaid with different ground masks are presented for better assessment and comparison. Clearly, positive DEM differences indicate vegetated areas. Overall, it could be observed that vegetated pixels &gt; 3 m are mostly filtered out in GMM-derived ground masks. Refinement based on clustering uncertainty further removes some tall vegetation (with DEM differences &gt; 3 m) located at the vegetated areas or vegetation-to-ground boundaries (Figure <ref type="figure">5</ref>).  It appears that pixels with lower cluster membership probability, meaning more uncertainty about the dominant cluster, are generally located at areas with higher levels of spatial variability, e.g., canopy gaps, vegetation-ground boundaries (Figures <ref type="figure">2</ref> and<ref type="figure">4</ref>). In contrast, more spatially homogenous regions such as the top of dense canopy or ground surface are characterized by much purer clustering results (i.e., higher cluster membership probability, Figure <ref type="figure">4</ref>). To demonstrate the value of uncertainty measures, we show ground masks generated from GMM clustering only and refined by cluster membership probability of &gt; 0.8 in Figure <ref type="figure">5</ref>. Instead of displaying a binary map of each ground mask, DEM differences between ArcticDEM and lidar-derived DTM overlaid with different ground masks are presented for better assessment and comparison. Clearly, positive DEM  <ref type="figure">(a-c</ref>) correspond to Site-1, Site-2, and Site-3, respectively. The four clusters 1, 2, 3, 4 were identified as pseudo-ground, darker regions (e.g., shadow, dark green vegetation), green vegetation, and concrete road, respectively. Clustering results at locations outlined by black boxes were magnified for better comparison with satellite RGB images. Uncertainty maps display the cluster membership probabilities. Higher cluster membership probabilities imply purer clustering results with lower confidences being other clusters, whereas lower probabilities suggest more ambiguities in discriminating one cluster from others.</p><p>differences indicate vegetated areas. Overall, it could be observed that vegetated pixels &gt; 3 m are mostly filtered out in GMM-derived ground masks. Refinement based on clustering uncertainty further removes some tall vegetation (with DEM differences &gt; 3 m) located at the vegetated areas or vegetation-to-ground boundaries (Figure <ref type="figure">5</ref>). The final GMM-based ground masks derived from refining pseudo-ground clusters with clustering uncertainty and morphological erosion are shown in Figure <ref type="figure">6</ref>. Compared to the ground masks in Figure <ref type="figure">5</ref>, sparsely distributed vegetation pixels are effectively removed by morphological operation. Among the three study sites, Site-2 has larger void areas with no ground observations due to denser canopy cover (Figure <ref type="figure">2</ref>), creating more complicated scenarios for DTM interpolation (see results in Section 3.2). Figure <ref type="figure">6</ref> also shows ground masks corresponding to K-means and three comparative ground filteringbased methods. Compared to K-means and the ground filtering-based methods, the GMM-based method clears most of the vegetated pixels (particularly tall vegetation with DEM differences &gt; 3 m). In contrast, K-means-derived ground masks exhibit very similar spatial patterns to those generated by GMM with no uncertainty refinement (Figure <ref type="figure">5</ref>). MSD tends to misidentify ground pixels as non-ground at positions where the ground-tovegetation transitions are less drastic than topographic variations (outlined by red ellipses). MBG generally does a better job than CSF and MSD in discriminating ground from non-ground pixels. However, it has a similar issue to MSD, especially at the boundaries of the study sites (Site-2, Site-3). Moreover, all filtering-based methods confuse some low areas at large forest patches with ground pixels and fail to detect some tall vegetation at the vegetation-to-ground boundaries. Overall, the spatial patterns of ground pixels are better preserved in GMM-based ground masks with much less tall vegetation present (  The final GMM-based ground masks derived from refining pseudo-ground clusters with clustering uncertainty and morphological erosion are shown in Figure <ref type="figure">6</ref>. Compared to the ground masks in Figure <ref type="figure">5</ref>, sparsely distributed vegetation pixels are effectively removed by morphological operation. Among the three study sites, Site-2 has larger void areas with no ground observations due to denser canopy cover (Figure <ref type="figure">2</ref>), creating more complicated scenarios for DTM interpolation (see results in Section 3.2). Figure <ref type="figure">6</ref> also shows ground masks corresponding to K-means and three comparative ground filtering-based methods. Compared to K-means and the ground filtering-based methods, the GMM-based method clears most of the vegetated pixels (particularly tall vegetation with DEM differences &gt; 3 m). In contrast, K-means-derived ground masks exhibit very similar spatial patterns to those generated by GMM with no uncertainty refinement (Figure <ref type="figure">5</ref>). MSD tends to misidentify ground pixels as non-ground at positions where the ground-to-vegetation transitions are less drastic than topographic variations (outlined by red ellipses). MBG generally does a better job than CSF and MSD in discriminating ground from non-ground pixels. However, it has a similar issue to MSD, especially at the boundaries of the study sites (Site-2, Site-3). Moreover, all filtering-based methods confuse some low areas at large forest patches with ground pixels and fail to detect some tall vegetation at the vegetation-to-ground boundaries. Overall, the spatial patterns of ground pixels are better preserved in GMM-based ground masks with much less tall vegetation present (Figures <ref type="figure">5</ref> and<ref type="figure">6</ref>). Tables <ref type="table">1</ref> and<ref type="table">2</ref> list detailed accuracy metrics of ground masks derived from GMM, Kmeans, and three ground filtering methods. Table <ref type="table">1</ref> provides categorical accuracy assessment on ground identification by evaluating the classification accuracy of discriminating ground pixels from non-ground pixels. GMM achieves significantly lower TIs (i.e., Tables <ref type="table">1</ref> and<ref type="table">2</ref> list detailed accuracy metrics of ground masks derived from GMM, K-means, and three ground filtering methods. Table <ref type="table">1</ref> provides categorical accuracy assessment on ground identification by evaluating the classification accuracy of discriminating ground pixels from non-ground pixels. GMM achieves significantly lower TIs (i.e., commission errors, 0.288-0.413) and higher OAs (0.634-0.722) than the filtering-based methods at all study sites (Table <ref type="table">1</ref>). We also notice that GMM-based ground masks have larger TII errors (i.e., omission errors) than the others, especially at Site-1 (0.682) and Site-3 (0.306). This can be attributed to the uncertainty refinement and morphological erosion used for removing low-confidence and sparsely distributed pseudo-ground pixels (Figures <ref type="figure">5</ref> and<ref type="figure">6</ref>). Despite relatively low TP rates (0.129-0.389), GMM-derived ground masks achieve comparable F1 scores to other methods at Site-2 (0.654) at Site-3 (0.690). CSF-and MBG-derived ground masks score higher F1s at Site-1 (0.629, 0.628). K-means receives the highest F1s at both Site-2 (0.683) and Site-3 (0.742). However, their large TI errors may cause severe overpredictions in spatial interpolation. Likewise, despite high TPs (0.332-0.538), MSD may also suffer from overpredictions given its rather high TI errors (0.434-0.555). Table <ref type="table">2</ref> provides a quantitative accuracy assessment of ground identification by calculating vertical accuracy metrics of the ArcticDEM values at identified ground pixels compared to the lidar-derived reference DTM. Because ArcticDEM values at identified ground pixels will be used as DTM samples for subsequent DTM interpolation, the vertical accuracy of identified ground pixels is an important quality metric that matters more to spatial interpolation results. The results show GMM's superior capability in identifying high-quality ground pixels with much lower errors (0.328-0.348 m RMSEs, 0.005-0.006 rRMSE, 0.233-0.256 m MAEs, and 0.039-0.058 m MEs) than the other methods. Comparatively, CSF and MSD produce less desirable ground pixels that have much larger elevation errors (&gt;2.5 m RMSEs at Site-1). MBG performs satisfactorily in identifying reliable ground pixels at Site-1 and Site-3 (&lt;1 m RMSEs), yet the extracted ground masks fail to remove tall vegetation, resulting in 2.411 m RMSE and 0.812 m MAE at Site-2. The remaining vegetation also brings larger errors (0.522-0.635 m RMSEs) in K-means derived ground masks than GMM-derived. Given the results in Tables <ref type="table">1</ref> and<ref type="table">2</ref>, GMM appears to be the best ground identification method considering its lowest TI errors, highest overall accuracies, and best quality (i.e., smallest elevation differences to the reference DTMs) of ground pixels.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">DTM Interpolation</head><p>DTM interpolation results based on GMM-based ground masks were evaluated quantitatively by calculating error metrics in comparison to the lidar-derived reference DTMs (Tables <ref type="table">3</ref><ref type="table">4</ref><ref type="table">5</ref>for Site-1, -2, -3, respectively). In addition to error metrics, extreme values at non-ground areas were also reported in Tables <ref type="table">3</ref><ref type="table">4</ref><ref type="table">5</ref>where upper (UE) and lower (LE) extreme values were calculated as the percentage of pixels with &lt;-3 m and &gt;3 m vertical errors, respectively. Qualitatively, we mapped the interpolated DTMs and their differences to the reference DTMs to visualize the spatial patterns of the predicted DTMs (Figures <ref type="figure">7</ref><ref type="figure">8</ref><ref type="figure">9</ref>for Site-1, -2, -3 respectively). Boxplots were additionally plotted to illustrate the distribution of DEM differences for both ArcticDEM and the predicted DTMs (Figure <ref type="figure">10</ref>). In general, positive deviations from the reference DTMs are found in ArcticDEMs at all study sites (positive MEs in Tables <ref type="table">3</ref><ref type="table">4</ref><ref type="table">5</ref>, right-skewed boxplots, Figure <ref type="figure">10</ref>), suggesting positive biases in ArcticDEMs compared to the lidar-derived DTMs due to the vegetation effects. Comparatively, Site-1 and Site-2 have taller vegetation (&gt;10 m, Figure <ref type="figure">10a,</ref><ref type="figure">b</ref>) and larger portions of UE (~20%, Tables <ref type="table">3</ref> and<ref type="table">4</ref>) than Site-3 where vegetation heights are mostly &lt;5 m (Figure <ref type="figure">10c</ref>). Overall, spatial interpolation techniques, together with GMM-based ground masks, effectively shorten the differences between ArcticDEMs and the lidar-derived reference DTMs at all study sites. ues at non-ground areas were also reported in Tables <ref type="table">3</ref><ref type="table">4</ref><ref type="table">5</ref>where upper (UE) and lower (LE) extreme values were calculated as the percentage of pixels with &lt;-3 m and &gt;3 m vertical errors, respectively. Qualitatively, we mapped the interpolated DTMs and their differences to the reference DTMs to visualize the spatial patterns of the predicted DTMs (Figures <ref type="figure">7</ref><ref type="figure">8</ref><ref type="figure">9</ref>for Site-1, -2, -3 respectively). Boxplots were additionally plotted to illustrate the distribution of DEM differences for both ArcticDEM and the predicted DTMs (Figure <ref type="figure">10</ref>). In general, positive deviations from the reference DTMs are found in ArcticDEMs at all study sites (positive MEs in Tables <ref type="table">3</ref><ref type="table">4</ref><ref type="table">5</ref>, right-skewed boxplots, Figure <ref type="figure">10</ref>), suggesting positive biases in ArcticDEMs compared to the lidar-derived DTMs due to the vegetation effects. Comparatively, Site-1 and Site-2 have taller vegetation (&gt;10 m, Figure <ref type="figure">10a,</ref><ref type="figure">b</ref>) and larger portions of UE (~20%, Tables <ref type="table">3</ref> and<ref type="table">4</ref>) than Site-3 where vegetation heights are mostly &lt;5 m (Figure <ref type="figure">10c</ref>). Overall, spatial interpolation techniques, together with GMMbased ground masks, effectively shorten the differences between ArcticDEMs and the lidar-derived reference DTMs at all study sites.         Specifically, at Site-1, natural neighbor (NN) predicts better DTMs than ordinary kriging (OK), which reduces elevation differences of ArcticDEM at entire regions from 4.722, 2.136, 2.015 to 0.648, 0.449, and 0.113 m for RMSE, MAE, ME, respectively, with rRMSE being reduced from 0.088 to 0.012 (Table <ref type="table">3</ref>). Furthermore, the original UE in ArcticDEM (19.282%) has been substantially lowered to &lt;0.3% by all interpolations. Qualitatively, both OK and NN effectively remove vegetation taller than 5 m from ArcticDEM (Figure <ref type="figure">7b</ref>), shifting the original positively skewed distribution of elevation difference towards a normal one (Figure <ref type="figure">10a</ref>). However, it should be noted that all interpolation methods tend to flatten the valleys previously covered by dense canopies (Figure <ref type="figure">7a,</ref><ref type="figure">b</ref>). Furthermore, a general underestimation of DTM can be found at the northern-elevated areas in all methods (Figure <ref type="figure">7b</ref>), resulting in 0.101-0.154% of LE (Table <ref type="table">3</ref>).</p><p>The DTM interpolation results at Site-2 show larger errors (0.026-0.047 rRMSEs in Table <ref type="table">4</ref>) than those at Site-1 (Table <ref type="table">3</ref>), particularly on the west side of the region where the land surface is elevated and covered by dense canopies (Figure <ref type="figure">8</ref>). Like Site-1, NN also predicts more accurate DTM than OK interpolation, which effectively decreases the RMSE of ArcticDEM by &gt;65% (from 4.841 to 1.677 m, Table <ref type="table">4</ref>). Compared with NN (Table <ref type="table">4</ref> and Figure <ref type="figure">10b</ref>), OK-interpolated DTMs suffer more from both overestimation (&gt;15% UE) and underestimation (8.277% LE), resulting in 0.354 m ME. Like Site-1, none of the presented methods could well recover the terrain information under dense canopies with large interpolation uncertainties present (Figure <ref type="figure">8</ref>) due to the lack of exposed ground areas on WV-2 imagery that enable sufficient ground identification. Consequently, all interpolated DTMs demonstrate significant artifacts on the west side of the region (Figure <ref type="figure">8a</ref>) in comparison with the reference DTM (Figure <ref type="figure">2</ref>). This indicates a major limitation of using optical stereo imagery in reconstructing bare ground topography in densely forested areas (Figure <ref type="figure">2</ref>).</p><p>Site-3 has much lower vertical errors in ArcticDEM than the other two sites due to its sparser canopy cover and lower topographic variation (Figure <ref type="figure">9</ref>, Table <ref type="table">5</ref>). All spatial interpolation techniques are effective at further reducing the elevation errors in ArcticDEM (Table <ref type="table">5</ref>). DEM error maps (Figure <ref type="figure">9b</ref>) and boxplots (Figure <ref type="figure">10c</ref>) also suggest the good performances of all interpolation techniques in removing tall vegetation from ArcticDEM, shifting the mean and median of DEM differences to ~0 m, and recovering the underlying terrain surface (Figure <ref type="figure">9a</ref>). Consistent with the results at Site-1 and Site-2, NN produces more accurate DTM than OK, which reduces ArcticDEM error metrics by more than half (e.g., 0.521 m RMSE, 0.009 rRMSE, Table <ref type="table">5</ref>).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Discussion</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Ground Identification</head><p>Extracting DTMs from optical-stereo-imagery generated DSMs requires reliable ground observations as input to spatial interpolation methods. Errors in the extracted ground pixels can affect the accuracy of DTM interpolation. Existing filtering-based ground identification methods primarily employ the point cloud/DSM product in deriving DTMs, such as CSF <ref type="bibr">[21]</ref>, MSD <ref type="bibr">[26]</ref>, and MBG <ref type="bibr">[27]</ref>. In essence, all these techniques identify ground points based on DEM distinctions among neighboring pixels. They may perform well in lidarbased point cloud/DSM as lidar receives ground returns even under vegetation canopies. Comparatively, point clouds generated from optical stereoscopy are much sparser and spatially non-overlapped <ref type="bibr">[60]</ref> due to geometric occlusion <ref type="bibr">[31]</ref> or low-contrast surfaces <ref type="bibr">[14]</ref>. Without sufficient ground exposure, filtering-based ground identification methods may misclassify low areas between large forest patches as ground or misidentify steep topography as non-ground objects when topographic changes are more substantial than the vegetation-to-ground transitions in optical stereoscopy-derived DSMs.</p><p>To address the above-mentioned misidentification issues in filtering-based techniques, we developed a GMM-based ground identification method by taking advantage of the spectral information of optical imagery, which also enables the quantification of the cluster membership probability to identify high-confidence ground pixels. Our results demonstrate the superior performance of the GMM-based method over K-means and filtering-based ground identification methods by locating the highest-quality ground samples. While more advanced supervised classification algorithms (e.g., support vector machine <ref type="bibr">[61]</ref>, random forest <ref type="bibr">[30]</ref>, and deep convolutional neural networks <ref type="bibr">[62]</ref>) can be used to classify ground and non-ground from optical imagery, they are not suitable for forested sites in high-latitude regions where training samples are rarely available. Our GMM-based method is unsupervised and has great potential for high latitude regions covered by ArcticDEM.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Spatial Interpolation</head><p>Though ground identification is often considered the most critical step in estimating DTM from DEM products, our results suggest varying performance and inconsistencies of spatial interpolation techniques in DTM prediction across three study sites due to uncertainties in the ground masks and topographic variations. Specifically, for scenarios with mild topography (e.g., Site-3), both OK and NN work well even with high omission errors in the ground masks. However, for scenarios with high topographic variations (e.g., Site-2), lacking ground samples at elevated locations could lead to significant underestimation in both NN and OK. Overall, TIN-based NN appears more robust to uncertainties in the ground masks than OK, resulting in lower errors in DTM interpolation. Our finding of NN's better performance has concurred with Bater and Coops (2009) <ref type="bibr">[33]</ref> and <ref type="bibr">Bandara et al. (2011)</ref>  <ref type="bibr">[63]</ref>, who also found NN more robust than other interpolation techniques on DTM interpolation.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3.">Future Improvement</head><p>This study only explored the use of optical imagery for ground identification. As optical data cannot see through vegetation canopy, the effectiveness of GMM and filteringbased methods depends on how much ground exposure can be observed from optical imagery. Our results show that forest patches lead to large void areas in the derived ground masks where none of the spatial interpolation techniques can fully recover the terrain information under dense canopies. To reveal more topographic details and improve overall terrain prediction, future studies can explore additional data for better ground identification and DTM generation. First, optical stereoscopy acquired with different solar geometry conditions may capture surfaces related to different canopy layers <ref type="bibr">[64]</ref>. Moreover, multi-temporal optical stereoscopy-generated DEMs tied to different seasons can be used to uncover more ground pixels, especially for deciduous species, e.g., leaf-off stages <ref type="bibr">[65]</ref>. In these regards, combining optical stereoscopy under different acquisition conditions (e.g., leaf-on and leaf-off, high and low sun elevation angles) would undoubtedly benefit the ground identification or canopy height estimation from one stereo-pair solely. Second, active sensors such as lidar and synthetic aperture radar (SAR) have better penetrating capability than optical sensors, thus providing supplementary vertical information for estimating bare ground or canopy height. With increasingly free access to spaceborne lidar, e.g., ICESat-2 <ref type="bibr">[66]</ref>, and SAR, e.g., ALOS-PALSAR <ref type="bibr">[67,</ref><ref type="bibr">68]</ref>, Sentinel-1 <ref type="bibr">[69]</ref>, and the upcoming mission ESA-BIOMASS <ref type="bibr">[70]</ref>, it is anticipated that our DTM estimation approach could be further improved by integrating optical data with active sensor data.</p><p>On the other hand, though our study demonstrates the robustness of NN in comparison with OK in DTM interpolation regarding the uncertainties of ground samples, Guo et al. (2010) reported more accurate results from kriging-based methods than NNinterpolated <ref type="bibr">[36]</ref>. Moreover, inverse distance weighting (IDW) interpolation was considered preferable to kriging-and TIN-based methods at a micro-scale <ref type="bibr">[32]</ref>. To comprehend the performance of spatial interpolation, previous studies assessed the influences of topographic variations or sampling densities on DEM interpolation by simulating ground samples with different densities <ref type="bibr">[32,</ref><ref type="bibr">36,</ref><ref type="bibr">71,</ref><ref type="bibr">72]</ref>. However, there are key distinctions between their experiments and ours, making their findings unsuitable for our problem. First, these studies were not designed for DTM derived from optical stereoscopy (e.g., ArcticDEM). The findings of <ref type="bibr">Guo et al. (2010)</ref>  <ref type="bibr">[36]</ref> and Ag&#252;era-Vega et al. (2020) <ref type="bibr">[32]</ref> were built on lidar and unmanned aerial vehicle-derived point clouds, different from those derived from satellite-based stereoscopy (way much sparser). In <ref type="bibr">Aguilar et al. (2005)</ref>  <ref type="bibr">[71]</ref> and <ref type="bibr">&#352;iljeg et al. (2019)</ref>  <ref type="bibr">[72]</ref>, the spatial interpolation techniques were directly performed on photogrammetry-derived point clouds/DEMs without discriminating ground from nonground areas (e.g., vegetation). Second, the random ground sampling strategies adopted in studies working on photogrammetry-based datasets <ref type="bibr">[32,</ref><ref type="bibr">71]</ref> did not consider the presence of vegetation, which cannot reflect the real vegetation distribution in the Arctic and forested regions (e.g., clustered). Lastly, there generally lacks a discussion on DTM inter-polation regarding the influences of uncertainties (omission or commission errors) in the ground masks. Given these, we will conduct a comprehensive analysis to evaluate the robustness and consistency of spatial interpolation techniques on optical stereoscopy based terrain reconstruction under different scenarios that involve both real data analysis and simulation study.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Conclusions</head><p>ArcticDEM provides fundamental elevation data to scientific studies in the Arctic region, yet its ecological/hydrological applications are limited by the fact that the height information of surface objects is coupled with the underlying topography in the DEM products. To address this, we proposed a GMM-based ground identification algorithm and compared it with three state-of-the-art filtering-based methods for ArcticDTM prediction. Our results demonstrate that the proposed DTM generation method greatly reduces the differences between ArcticDEMs and lidar-derived DTMs. Compared with filtering-based ground identification, our GMM-based method identifies more reliable ground observations with smaller vertical errors to lidar data. Moreover, natural neighbor performs more robustly in DTM interpolation than ordinary kriging at all study sites. Though specifically designed for DTM extraction from ArcticDEM, our proposed method could be transferred to any other forested regions besides the Arctic region. This study also highlighted the limitation of optical stereoscopy in ground identification over densely forested areas. In future work, optical stereoscopy with different acquisition conditions and other active remote sensing data from spaceborne lidar and SAR will be explored to supplement the ground observations to improve the overall DTM prediction. On the other hand, a simulation study will be designed to fully comprehend the performance of spatial interpolation techniques on optical stereoscopy based DTM estimation under various scenarios.  The extracted non-ground objects from the original DEM in MBG include small objects detected by morphological opening and non-ground segments outlined by segmentation and morphological region growing. The entire process of non-ground segmentation is comprised of four steps: (1) non-ground "seed points" identification by Canny edge detectors and Gaussian probabilistic voting; (2) initial segmentation by morphologically dilating the "seed points"; (3) segment expansion by appending the outer boundary pixels if their relative heights to the mean elevation of these segments are within the given tolerance values; (4) labeling the segment as a non-ground object if its mean elevation is greater than that of its outer perimeter. The user-defined parameters on MBG implementation include:</p><p>(1) size of structure element (ws, morphological opening) and elevation difference (t h ) for small object extraction, (2) two parameters (cannysigma, removeedgesvariance) to obtain clean edges from the Canny edge detector, (3) size of local filter (votemaxwinsize), height difference (voteheightdifference) and sigma of Gaussian kernel for "seed points" voting, (4) elevation difference (t h ) for initial segmentation, and (5) size of structure element (thickstep, morphological dilation), maximum iteration number (segiteration), upper (t u ), lower (t l ) and final threshold (t s ) of elevation difference for regional growing. For our study sites, we changed ws to 21, all other parameters remained the default.</p><p>The algorithm of MSD can be outlined in three steps: (1) correcting local topography by removing slope-induced topographic changes (local slope is estimated from the difference between the original and Gaussian blurred DEMs); (2) scanning all pixels from 8 directions using a local filter and identifying the lowest points as ground pixels; (3) classifying the remaining pixels into ground/non-ground based upon their height differences to the lowest points or slope changes to previous pixels. The user-defined parameters in MSD include the size of the local filter (iDistance, step 2), the threshold of height difference (dThrDeltaMin, step 3) and threshold of slope changes (dThrDeltaSlope, step 3). For parameter setting, here we changed dThrDeltaMin to 3.5 m and dThrDeltaSlope to 35 degrees. We additionally noticed that the ground sample distance (Spacing) in the source code was fixed to 1 m in computing SlopeLocal (Algorithm 1) <ref type="bibr">[26]</ref>. Given the spatial resolution of our data, we changed Spacing to 2 m.</p></div></body>
		</text>
</TEI>
