skip to main content

Title: Computing water flow through complex landscapes – Part 3: Fill–Spill–Merge: flow routing in depression hierarchies
Abstract. Depressions – inwardly draining regions – are common to many landscapes. When there is sufficient moisture, depressions take the form of lakes and wetlands; otherwise, they may be dry. Hydrological flow models used in geomorphology, hydrology, planetary science, soil and water conservation, and other fields often eliminate depressions through filling or breaching; however, this can produce unrealistic results. Models that retain depressions, on the other hand, are often undesirably expensive to run. In previous work we began to address this by developing a depression hierarchy data structure to capture the full topographic complexity of depressions in a region. Here, we extend this work by presenting the Fill–Spill–Merge algorithm that utilizes our depression hierarchy data structure to rapidly process and distribute runoff. Runoff fills depressions, which then overflow and spill into their neighbors. If both a depression and its neighbor fill, they merge. We provide a detailed explanation of the algorithm and results from two sample study areas. In these case studies, the algorithm runs 90–2600 times faster (with a reduction in compute time of 2000–63 000 times) than the commonly used Jacobi iteration and produces a more accurate output. Complete, well-commented, open-source code with 97 % test coverage is available on GitHub and more » Zenodo. « less
Authors:
; ;
Award ID(s):
1903606
Publication Date:
NSF-PAR ID:
10263903
Journal Name:
Earth Surface Dynamics
Volume:
9
Issue:
1
Page Range or eLocation-ID:
105 to 121
ISSN:
2196-632X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract. Depressions – inwardly draining regions of digital elevation models – present difficulties for terrain analysis and hydrological modeling. Analogous “depressions” also arise in image processing and morphological segmentation, where they may represent noise, features of interest, or both. Here we provide a new data structure – the depression hierarchy – that captures the full topologic and topographic complexity of depressions in a region. We treat depressions as networks in a way that is analogous to surface-water flow paths, in which individual sub-depressions merge together to form meta-depressions in a process that continues until they begin to drain externally. Thismore »hierarchy can be used to selectively fill or breach depressions or to accelerate dynamic models of hydrological flow. Complete, well-commented, open-source code and correctness tests are available on GitHub and Zenodo.« less
  2. Abstract. Calculating flow routing across a landscape is a routine process in geomorphology, hydrology, planetary science, and soil and water conservation. Flow-routing calculations often require a preprocessing step to remove depressions from a DEM to create a “flow-routing surface” that can host a continuous, integrated drainage network. However, real landscapes contain natural depressions that trap water. These are an important part of the hydrologic system and should be represented in flow-routing surfaces. Historically, depressions (or “pits”) in DEMs have been viewed as data errors, but the rapid expansion of high-resolution, high-precision DEM coverage increases the likelihood that depressions are real-worldmore »features. To address this long-standing problem of emerging significance, we developed FlowFill, an algorithm that routes a prescribed amount of runoff across the surface in order to flood depressions if enough water is available. This mass-conserving approach typically floods smaller depressions and those in wet areas, integrating drainage across them, while permitting internal drainage and disruptions to hydrologic connectivity. We present results from two sample study areas to which we apply a range of uniform initial runoff depths and report the resulting filled and unfilled depressions, the drainage network structure, and the required compute time. For the reach- to watershed-scale examples that we ran, FlowFill compute times ranged from approximately 1 to 30 min, with compute times per cell of 0.0001 to 0.006 s.« less
  3. The prevalence of mobile phones and wearable devices enables the passive capturing and modeling of human behavior at an unprecedented resolution and scale. Past research has demonstrated the capability of mobile sensing to model aspects of physical health, mental health, education, and work performance, etc. However, most of the algorithms and models proposed in previous work follow a one-size-fits-all (i.e., population modeling) approach that looks for common behaviors amongst all users, disregarding the fact that individuals can behave very differently, resulting in reduced model performance. Further, black-box models are often used that do not allow for interpretability and human behaviormore »understanding. We present a new method to address the problems of personalized behavior classification and interpretability, and apply it to depression detection among college students. Inspired by the idea of collaborative-filtering, our method is a type of memory-based learning algorithm. It leverages the relevance of mobile-sensed behavior features among individuals to calculate personalized relevance weights, which are used to impute missing data and select features according to a specific modeling goal (e.g., whether the student has depressive symptoms) in different time epochs, i.e., times of the day and days of the week. It then compiles features from epochs using majority voting to obtain the final prediction. We apply our algorithm on a depression detection dataset collected from first-year college students with low data-missing rates and show that our method outperforms the state-of-the-art machine learning model by 5.1% in accuracy and 5.5% in F1 score. We further verify the pipeline-level generalizability of our approach by achieving similar results on a second dataset, with an average improvement of 3.4% across performance metrics. Beyond achieving better classification performance, our novel approach is further able to generate personalized interpretations of the models for each individual. These interpretations are supported by existing depression-related literature and can potentially inspire automated and personalized depression intervention design in the future« less
  4. Storage devices have complex performance profiles, including costs to initiate IOs (e.g., seek times in hard 15 drives), parallelism and bank conflicts (in SSDs), costs to transfer data, and firmware-internal operations. The Disk-access Machine (DAM) model simplifies reality by assuming that storage devices transfer data in blocks of size B and that all transfers have unit cost. Despite its simplifications, the DAM model is reasonably accurate. In fact, if B is set to the half-bandwidth point, where the latency and bandwidth of the hardware are equal, then the DAM approximates the IO cost on any hardware to within a factormore »of 2. Furthermore, the DAM model explains the popularity of B-trees in the 1970s and the current popularity of Bε -trees and log-structured merge trees. But it fails to explain why some B-trees use small nodes, whereas all Bε -trees use large nodes. In a DAM, all IOs, and hence all nodes, are the same size. In this article, we show that the affine and PDAM models, which are small refinements of the DAM model, yield a surprisingly large improvement in predictability without sacrificing ease of use. We present benchmarks on a large collection of storage devices showing that the affine and PDAM models give good approximations of the performance characteristics of hard drives and SSDs, respectively. We show that the affine model explains node-size choices in B-trees and Bε -trees. Furthermore, the models predict that B-trees are highly sensitive to variations in the node size, whereas Bε -trees are much less sensitive. These predictions are born out empirically. Finally, we show that in both the affine and PDAM models, it pays to organize data structures to exploit varying IO size. In the affine model, Bε -trees can be optimized so that all operations are simultaneously optimal, even up to lower-order terms. In the PDAM model, Bε -trees (or B-trees) can be organized so that both sequential and concurrent workloads are handled efficiently. We conclude that the DAM model is useful as a first cut when designing or analyzing an algorithm or data structure but the affine and PDAM models enable the algorithm designer to optimize parameter choices and fill in design details.« less
  5. null (Ed.)
    Storage devices have complex performance profiles, including costs to initiate IOs (e.g., seek times in hard drives), parallelism and bank conflicts (in SSDs), costs to transfer data, and firmware-internal operations. The Disk-access Machine (DAM) model simplifies reality by assuming that storage devices transfer data in blocks of size B and that all transfers have unit cost. Despite its simplifications, the DAM model is reasonably accurate. In fact, if B is set to the half-bandwidth point, where the latency and bandwidth of the hardware are equal, then the DAM approximates the IO cost on any hardware to within a factor of 2.more »Furthermore, the DAM model explains the popularity of B-trees in the 1970s and the current popularity of B ɛ -trees and log-structured merge trees. But it fails to explain why some B-trees use small nodes, whereas all B ɛ -trees use large nodes. In a DAM, all IOs, and hence all nodes, are the same size. In this article, we show that the affine and PDAM models, which are small refinements of the DAM model, yield a surprisingly large improvement in predictability without sacrificing ease of use. We present benchmarks on a large collection of storage devices showing that the affine and PDAM models give good approximations of the performance characteristics of hard drives and SSDs, respectively. We show that the affine model explains node-size choices in B-trees and B ɛ -trees. Furthermore, the models predict that B-trees are highly sensitive to variations in the node size, whereas B ɛ -trees are much less sensitive. These predictions are born out empirically. Finally, we show that in both the affine and PDAM models, it pays to organize data structures to exploit varying IO size. In the affine model, B ɛ -trees can be optimized so that all operations are simultaneously optimal, even up to lower-order terms. In the PDAM model, B ɛ -trees (or B-trees) can be organized so that both sequential and concurrent workloads are handled efficiently. We conclude that the DAM model is useful as a first cut when designing or analyzing an algorithm or data structure but the affine and PDAM models enable the algorithm designer to optimize parameter choices and fill in design details.« less