
Title: Short communication: Landlab v2.0: A software package for Earth surface dynamics
Numerical simulation of the form and characteristics of Earth’s surface provides insight into its evolution. Landlab is an open-source Python package that contains modularized elements of numerical models for Earth’s surface, thus reducing the time required for researchers to create new or reimplement existing models. Landlab contains a gridding engine that represents the model domain as a dual graph of structured quadrilaterals (e.g., raster) or an irregular Voronoi polygon–Delaunay triangle mesh (e.g., regular hexagons, radially symmetric meshes, and fully irregular meshes). Landlab also contains components – modular implementations of single physical processes – and a suite of utilities that support numerical methods, input/output, and visualization. This contribution describes package development since version 1.0 and backward-compatibility-breaking changes that necessitate the new major release, version 2.0. Substantial changes include refactoring the grid, improving the component standard interface, dropping Python 2 support, and creating 30 new components – for a total of 57 components in the Landlab package. We describe reasons why many changes were made in order to provide insight for designers of future packages. We conclude by discussing lessons about the dynamics of scientific software development gained from the experience of using, developing, maintaining, and teaching with Landlab.
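To make the grid-plus-component workflow described above concrete, the sketch below builds a raster grid, attaches an elevation field, and advances a single process component through time. It is a minimal illustration assuming Landlab v2.x; the grid size, diffusivity, and time step are arbitrary example values, not anything specified in the paper.

```python
# Minimal sketch of the Landlab v2 workflow: create a grid, attach a field,
# and step a process component forward in time.
import numpy as np

from landlab import RasterModelGrid
from landlab.components import LinearDiffuser

# Structured quadrilateral (raster) grid; Voronoi/hex grids follow the same pattern.
grid = RasterModelGrid((20, 30), xy_spacing=10.0)

# Fields live on grid elements; components read and write them by name.
z = grid.add_zeros("topographic__elevation", at="node")
z += np.random.rand(grid.number_of_nodes)  # small random initial relief

# A component wraps one physical process behind the standard interface.
diffuser = LinearDiffuser(grid, linear_diffusivity=0.01)  # arbitrary diffusivity

dt = 1000.0  # arbitrary time step
for _ in range(100):
    diffuser.run_one_step(dt)  # the standard v2 component time-stepping call

print("mean elevation after run:", z.mean())
```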
Authors:
Award ID(s): 1831623
Publication Date:
NSF-PAR ID: 10171085
Journal Name: Earth Surface Dynamics Discussions
ISSN: 2196-6338
Sponsoring Org: National Science Foundation
More Like this
  1. Abstract. Numerical simulation of the form and characteristics of Earth's surface provides insight into its evolution. Landlab is an open-source Python package that contains modularized elements of numerical models for Earth's surface, thus reducing time required for researchers to create new or reimplement existing models. Landlab contains a gridding engine which represents the model domain as a dual graph of structured quadrilaterals (e.g., raster) or irregular Voronoi polygon–Delaunay triangle mesh (e.g., regular hexagons, radially symmetric meshes, and fully irregular meshes). Landlab also contains components – modular implementations of single physical processes – and a suite of utilities that support numerical methods, input/output, and visualization. This contribution describes package development since version 1.0 and backward-compatibility-breaking changes that necessitate the new major release, version 2.0. Substantial changes include refactoring the grid, improving the component standard interface, dropping Python 2 support, and creating 31 new components – for a total of 58 components in the Landlab package. We describe reasons why many changes were made in order to provide insight for designers of future packages. We conclude by discussing lessons about the dynamics of scientific software development gained from the experience of using, developing, maintaining, and teaching with Landlab.
  2. Abstract. Models of landscape evolution provide insight into the geomorphic history of specific field areas, create testable predictions of landform development, demonstrate the consequences of current geomorphic process theory, and spark imagination through hypothetical scenarios. While the last 4 decades have brought the proliferation of many alternative formulations for the redistribution of mass by Earth surface processes, relatively few studies have systematically compared and tested these alternative equations. We present a new Python package, terrainbento 1.0, that enables multi-model comparison, sensitivity analysis, and calibration of Earth surface process models. Terrainbento provides a set of 28 model programs that implement alternative transport laws related to four process elements: hillslope processes, surface-water hydrology, erosion by flowing water, and material properties. The 28 model programs are a systematic subset of the 2048 possible numerical models associated with 11 binary choices. Each binary choice is related to one of these four elements – for example, the use of linear or nonlinear hillslope diffusion. Terrainbento is an extensible framework: base classes that treat the elements common to all numerical models (such as input/output and boundary conditions) make it possible to create a new numerical model without reinventing these common methods. Terrainbento is built on top of the Landlab framework such that new Landlab components directly support the creation of new terrainbento model programs. Terrainbento is fully documented, has 100 % unit test coverage including numerical comparison with analytical solutions for process models, and continuous integration testing. We support future users and developers with introductory Jupyter notebooks and a template for creating new terrainbento model programs. In this paper, we describe the package structure, process theory, and software implementation of terrainbento. Finally, we illustrate the utility of terrainbento with a benchmark example highlighting the differences in steady-state topography between five different numerical models.
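As a rough illustration of how one of these model programs is driven, the sketch below instantiates the simplest program (Basic) on a Landlab raster grid and runs it for a fixed model time. The Clock/Basic constructor arguments and parameter names are written from memory of the terrainbento 1.x documentation and may differ between releases; all values are arbitrary.

```python
# Hedged sketch of driving one terrainbento model program (Basic): linear
# hillslope diffusion plus stream-power incision on a Landlab grid.
import numpy as np

from landlab import RasterModelGrid
from terrainbento import Basic, Clock  # assumed package-level exports

grid = RasterModelGrid((25, 40), xy_spacing=100.0)
z = grid.add_zeros("topographic__elevation", at="node")
z += np.random.rand(grid.number_of_nodes)  # small initial roughness

# Model time control: start, step, and stop (arbitrary values, in years).
clock = Clock(start=0.0, step=500.0, stop=1.0e5)

# Parameter names follow the terrainbento 1.x docs as I recall them.
model = Basic(
    clock,
    grid,
    water_erodibility=1e-4,
    m_sp=0.5,
    n_sp=1.0,
    regolith_transport_parameter=0.01,
)
model.run()  # advance the model from clock.start to clock.stop
```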

  3. Obeid, I. (Ed.)
    The Neural Engineering Data Consortium (NEDC) is developing the Temple University Digital Pathology Corpus (TUDP), an open-source database of high-resolution images from scanned pathology samples [1], as part of its National Science Foundation-funded Major Research Instrumentation grant titled “MRI: High Performance Digital Pathology Using Big Data and Machine Learning” [2]. The long-term goal of this project is to release one million images. We have currently scanned over 100,000 images and are in the process of annotating breast tissue data for our first official corpus release, v1.0.0. This release contains 3,505 annotated images of breast tissue including 74 patients with cancerous diagnoses (out of a total of 296 patients). In this poster, we will present an analysis of this corpus and discuss the challenges we have faced in efficiently producing high quality annotations of breast tissue. It is well known that state-of-the-art algorithms in machine learning require vast amounts of data. Fields such as speech recognition [3], image recognition [4] and text processing [5] are able to deliver impressive performance with complex deep learning models because they have developed large corpora to support training of extremely high-dimensional models (e.g., billions of parameters). Other fields that do not have access to such data resources must rely on techniques in which existing models can be adapted to new datasets [6]. A preliminary version of this breast corpus release was tested in a pilot study using a baseline machine learning system, ResNet18 [7], that leverages several open-source Python tools. The pilot corpus was divided into three sets: train, development, and evaluation. Portions of these slides were manually annotated [1] using the nine labels in Table 1 [8] to identify five to ten examples of pathological features on each slide. Not every pathological feature is annotated, meaning excluded areas can include focuses particular to these labels that are not used for training. A summary of the number of patches within each label is given in Table 2. To maintain a balanced training set, 1,000 patches of each label were used to train the machine learning model. Throughout all sets, only annotated patches were involved in model development. The performance of this model in identifying all the patches in the evaluation set can be seen in the confusion matrix of classification accuracy in Table 3. The highest performing labels were background, 97% correct identification, and artifact, 76% correct identification. A correlation exists between labels with more than 6,000 development patches and accurate performance on the evaluation set. Additionally, these results indicated a need to further refine the annotation of invasive ductal carcinoma (“indc”), inflammation (“infl”), nonneoplastic features (“nneo”), normal (“norm”) and suspicious (“susp”). This pilot experiment motivated changes to the corpus that will be discussed in detail in this poster presentation. To increase the accuracy of the machine learning model, we modified how we addressed underperforming labels. One common source of error arose with how non-background labels were converted into patches. Large areas of background within other labels were isolated within a patch resulting in connective tissue misrepresenting a non-background label. In response, the annotation overlay margins were revised to exclude benign connective tissue in non-background labels.
Corresponding patient reports and supporting immunohistochemical stains further guided annotation reviews. The microscopic diagnoses given by the primary pathologist in these reports detail the pathological findings within each tissue site, but not within each specific slide. The microscopic diagnoses informed revisions specifically targeting annotated regions classified as cancerous, ensuring that the labels “indc” and “dcis” were used only in situations where a micropathologist diagnosed it as such. Further differentiation of cancerous and precancerous labels, as well as the location of their focus on a slide, could be accomplished with supplemental immunohistochemically (IHC) stained slides. When distinguishing whether a focus is a nonneoplastic feature versus a cancerous growth, pathologists employ antigen targeting stains to the tissue in question to confirm the diagnosis. For example, a nonneoplastic feature of usual ductal hyperplasia will display diffuse staining for cytokeratin 5 (CK5) and no diffuse staining for estrogen receptor (ER), while a cancerous growth of ductal carcinoma in situ will have negative or focally positive staining for CK5 and diffuse staining for ER [9]. Many tissue samples contain cancerous and non-cancerous features with morphological overlaps that cause variability between annotators. The informative fields IHC slides provide could play an integral role in machine model pathology diagnostics. Following the revisions made on all the annotations, a second experiment was run using ResNet18. Compared to the pilot study, an increase of model prediction accuracy was seen for the labels indc, infl, nneo, norm, and null. This increase is correlated with an increase in annotated area and annotation accuracy. Model performance in identifying the suspicious label decreased by 25% due to the decrease of 57% in the total annotated area described by this label. A summary of the model performance is given in Table 4, which shows the new prediction accuracy and the absolute change in error rate compared to Table 3. The breast tissue subset we are developing includes 3,505 annotated breast pathology slides from 296 patients. The average size of a scanned SVS file is 363 MB. The annotations are stored in an XML format. A CSV version of the annotation file is also available which provides a flat, or simple, annotation that is easy for machine learning researchers to access and interface to their systems. Each patient is identified by an anonymized medical reference number. Within each patient’s directory, one or more sessions are identified, also anonymized to the first of the month in which the sample was taken. These sessions are broken into groupings of tissue taken on that date (in this case, breast tissue). A deidentified patient report stored as a flat text file is also available. Within these slides there are a total of 16,971 annotated regions with an average of 4.84 annotations per slide. Among those annotations, 8,035 are non-cancerous (normal, background, null, and artifact), 6,222 are carcinogenic signs (inflammation, nonneoplastic, and suspicious), and 2,714 are cancerous labels (ductal carcinoma in situ and invasive ductal carcinoma in situ). The individual patients are split up into three sets: train, development, and evaluation. Of the 74 cancerous patients, 20 were allotted for both the development and evaluation sets, while the remaining 34 were allotted for train.
The remaining 222 patients were split up to preserve the overall distribution of labels within the corpus. This was done in hope of creating control sets for comparable studies. Overall, the development and evaluation sets each have 80 patients, while the training set has 136 patients. In a related component of this project, slides from the Fox Chase Cancer Center (FCCC) Biosample Repository (https://www.foxchase.org/research/facilities/genetic-research-facilities/biosample-repository-facility) are being digitized in addition to slides provided by Temple University Hospital. This data includes 18 different types of tissue including approximately 38.5% urinary tissue and 16.5% gynecological tissue. These slides and the metadata provided with them are already anonymized and include diagnoses in a spreadsheet with sample and patient ID. We plan to release over 13,000 unannotated slides from the FCCC Corpus simultaneously with v1.0.0 of TUDP. Details of this release will also be discussed in this poster. Few digitally annotated databases of pathology samples like TUDP exist due to the extensive data collection and processing required. The breast corpus subset should be released by November 2021. By December 2021 we should also release the unannotated FCCC data. We are currently annotating urinary tract data as well. We expect to release about 5,600 processed TUH slides in this subset. We have an additional 53,000 unprocessed TUH slides digitized. Corpora of this size will stimulate the development of a new generation of deep learning technology. In clinical settings where resources are limited, an assistive diagnoses model could support pathologists’ workload and even help prioritize suspected cancerous cases. ACKNOWLEDGMENTS This material is supported by the National Science Foundation under grants nos. CNS-1726188 and 1925494. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. REFERENCES [1] N. Shawki et al., “The Temple University Digital Pathology Corpus,” in Signal Processing in Medicine and Biology: Emerging Trends in Research and Applications, 1st ed., I. Obeid, I. Selesnick, and J. Picone, Eds. New York City, New York, USA: Springer, 2020, pp. 67–104. https://www.springer.com/gp/book/9783030368432. [2] J. Picone, T. Farkas, I. Obeid, and Y. Persidsky, “MRI: High Performance Digital Pathology Using Big Data and Machine Learning.” Major Research Instrumentation (MRI), Division of Computer and Network Systems, Award No. 1726188, January 1, 2018 – December 31, 2021. https://www.isip.piconepress.com/projects/nsf_dpath/. [3] A. Gulati et al., “Conformer: Convolution-augmented Transformer for Speech Recognition,” in Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), 2020, pp. 5036–5040. https://doi.org/10.21437/interspeech.2020-3015. [4] C.-J. Wu et al., “Machine Learning at Facebook: Understanding Inference at the Edge,” in Proceedings of the IEEE International Symposium on High Performance Computer Architecture (HPCA), 2019, pp. 331–344. https://ieeexplore.ieee.org/document/8675201. [5] I. Caswell and B. Liang, “Recent Advances in Google Translate,” Google AI Blog: The latest from Google Research, 2020. [Online]. Available: https://ai.googleblog.com/2020/06/recent-advances-in-google-translate.html. [Accessed: 01-Aug-2021]. [6] V. Khalkhali, N. Shawki, V. Shah, M. Golmohammadi, I. Obeid, and J.
Picone, “Low Latency Real-Time Seizure Detection Using Transfer Deep Learning,” in Proceedings of the IEEE Signal Processing in Medicine and Biology Symposium (SPMB), 2021, pp. 1–7. https://www.isip.piconepress.com/publications/conference_proceedings/2021/ieee_spmb/eeg_transfer_learning/. [7] J. Picone, T. Farkas, I. Obeid, and Y. Persidsky, “MRI: High Performance Digital Pathology Using Big Data and Machine Learning,” Philadelphia, Pennsylvania, USA, 2020. https://www.isip.piconepress.com/publications/reports/2020/nsf/mri_dpath/. [8] I. Hunt, S. Husain, J. Simons, I. Obeid, and J. Picone, “Recent Advances in the Temple University Digital Pathology Corpus,” in Proceedings of the IEEE Signal Processing in Medicine and Biology Symposium (SPMB), 2019, pp. 1–4. https://ieeexplore.ieee.org/document/9037859. [9] A. P. Martinez, C. Cohen, K. Z. Hanley, and X. (Bill) Li, “Estrogen Receptor and Cytokeratin 5 Are Reliable Markers to Separate Usual Ductal Hyperplasia From Atypical Ductal Hyperplasia and Low-Grade Ductal Carcinoma In Situ,” Arch. Pathol. Lab. Med., vol. 140, no. 7, pp. 686–689, Apr. 2016. https://doi.org/10.5858/arpa.2015-0238-OA.
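For readers unfamiliar with the kind of patch-classification baseline described in this item, the following is a generic sketch of fine-tuning ResNet18 on labeled patches using torchvision. It is not the authors' pipeline; the directory layout, path names, label count, and hyperparameters are assumptions for illustration only, and the pretrained-weights argument follows torchvision 0.13+ (older versions use pretrained=True).

```python
# Generic ResNet18 fine-tuning sketch for patch classification (illustrative only).
# Assumes patches exported to patches/train/<label>/ with one subfolder per label.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

tf = transforms.Compose([
    transforms.Resize(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

train_ds = datasets.ImageFolder("patches/train", transform=tf)  # hypothetical path
train_dl = DataLoader(train_ds, batch_size=64, shuffle=True, num_workers=4)

# Replace the final layer so the output size matches the number of patch labels.
model = models.resnet18(weights="IMAGENET1K_V1")
model.fc = nn.Linear(model.fc.in_features, len(train_ds.classes))

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(5):  # arbitrary number of epochs
    for x, y in train_dl:
        x, y = x.to(device), y.to(device)
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
```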
  4. Obeid, I.; Selesnick, I.; Picone, J. (Eds.)
    The Neuronix high-performance computing cluster allows us to conduct extensive machine learning experiments on big data [1]. This heterogeneous cluster uses innovative scheduling technology, Slurm [2], that manages a network of CPUs and graphics processing units (GPUs). The GPU farm consists of a variety of processors ranging from low-end consumer grade devices such as the Nvidia GTX 970 to higher-end devices such as the GeForce RTX 2080. These GPUs are essential to our research since they allow extremely compute-intensive deep learning tasks to be executed on massive data resources such as the TUH EEG Corpus [2]. We use TensorFlow [3] as the core machine learning library for our deep learning systems, and routinely employ multiple GPUs to accelerate the training process. Reproducible results are essential to machine learning research. Reproducibility in this context means the ability to replicate an existing experiment – performance metrics such as error rates should be identical and floating-point calculations should match closely. Three examples of ways we typically expect an experiment to be replicable are: (1) The same job run on the same processor should produce the same results each time it is run. (2) A job run on a CPU and GPU should produce identical results. (3) A job should produce comparable results if the data is presented in a different order. System optimization requires an ability to directly compare error rates for algorithms evaluated under comparable operating conditions. However, it is a difficult task to exactly reproduce the results for large, complex deep learning systems that often require more than a trillion calculations per experiment [5]. This is a fairly well-known issue and one we will explore in this poster. Researchers must be able to replicate results on a specific data set to establish the integrity of an implementation. They can then use that implementation as a baseline for comparison purposes. A lack of reproducibility makes it very difficult to debug algorithms and validate changes to the system. Equally important, since many results in deep learning research are dependent on the order in which the system is exposed to the data, the specific processors used, and even the order in which those processors are accessed, it becomes a challenging problem to compare two algorithms since each system must be individually optimized for a specific data set or processor. This is extremely time-consuming for algorithm research in which a single run often taxes a computing environment to its limits. Well-known techniques such as cross-validation [5,6] can be used to mitigate these effects, but this is also computationally expensive. These issues are further compounded by the fact that most deep learning algorithms are susceptible to the way computational noise propagates through the system. GPUs are particularly notorious for this because, in a clustered environment, it becomes more difficult to control which processors are used at various points in time. Another equally frustrating issue is that upgrades to the deep learning package, such as the transition from TensorFlow v1.9 to v1.13, can also result in large fluctuations in error rates when re-running the same experiment. Since TensorFlow is constantly updating functions to support GPU use, maintaining an historical archive of experimental results that can be used to calibrate algorithm research is quite a challenge. This makes it very difficult to optimize the system or select the best configurations.
The overall impact of all of these issues described above is significant as error rates can fluctuate by as much as 25% due to these types of computational issues. Cross-validation is one technique used to mitigate this, but that is expensive since you need to do multiple runs over the data, which further taxes a computing infrastructure already running at max capacity. GPUs are preferred when training a large network since these systems train at least two orders of magnitude faster than CPUs [7]. Large-scale experiments are simply not feasible without using GPUs. However, there is a tradeoff to gain this performance. Since all our GPUs use the NVIDIA CUDA® Deep Neural Network library (cuDNN) [8], a GPU-accelerated library of primitives for deep neural networks, it adds an element of randomness into the experiment. When a GPU is used to train a network in TensorFlow, it automatically searches for a cuDNN implementation. NVIDIA’s cuDNN implementation provides algorithms that increase the performance and help the model train quicker, but they are non-deterministic algorithms [9,10]. Since our networks have many complex layers, there is no easy way to avoid this randomness. Instead of comparing each epoch, we compare the average performance of the experiment because it gives us a hint of how our model is performing per experiment, and if the changes we make are efficient. In this poster, we will discuss a variety of issues related to reproducibility and introduce ways we mitigate these effects. For example, TensorFlow uses a random number generator (RNG) which is not seeded by default. TensorFlow determines the initialization point and how certain functions execute using the RNG. The solution for this is seeding all the necessary components before training the model. This forces TensorFlow to use the same initialization point and sets how certain layers work (e.g., dropout layers). However, seeding all the RNGs will not guarantee a controlled experiment. Other variables can affect the outcome of the experiment such as training using GPUs, allowing multi-threading on CPUs, using certain layers, etc. To mitigate our problems with reproducibility, we first make sure that the data is processed in the same order during training. Therefore, we save the data from the last experiment to make sure the newer experiment follows the same order. If we allow the data to be shuffled, it can affect the performance due to how the model was exposed to the data. We also specify the float data type to be 32-bit since Python defaults to 64-bit. We try to avoid using 64-bit precision because the numbers produced by a GPU can vary significantly depending on the GPU architecture [11-13]. Controlling precision somewhat reduces differences due to computational noise even though technically it increases the amount of computational noise. We are currently developing more advanced techniques for preserving the efficiency of our training process while also maintaining the ability to reproduce models. In our poster presentation we will demonstrate these issues using some novel visualization tools, present several examples of the extent to which these issues influence research results on electroencephalography (EEG) and digital pathology experiments, and introduce new ways to manage such computational issues.
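The seeding and precision controls discussed in this item can be summarized in a short sketch. It is written against the TensorFlow 2.x API (the experiments described used TF 1.9–1.13, where the equivalent call is tf.set_random_seed), and, as the abstract notes, seeding alone does not remove cuDNN nondeterminism; it only pins initialization, data order, and precision.

```python
# Hedged sketch of basic reproducibility controls for a TensorFlow 2.x run.
import os
import random

import numpy as np
import tensorflow as tf

SEED = 1337
os.environ["PYTHONHASHSEED"] = str(SEED)
os.environ["TF_DETERMINISTIC_OPS"] = "1"   # request deterministic GPU kernels where available
random.seed(SEED)                          # Python RNG
np.random.seed(SEED)                       # NumPy RNG (shuffling, initialization)
tf.random.set_seed(SEED)                   # TensorFlow global RNG

tf.keras.backend.set_floatx("float32")     # pin 32-bit floats rather than Python's 64-bit default

# Fix the presentation order of the data instead of reshuffling on every run.
dataset = tf.data.Dataset.from_tensor_slices(np.arange(1000))
dataset = dataset.shuffle(1000, seed=SEED, reshuffle_each_iteration=False).batch(32)
```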
  5. The marine-based West Antarctic Ice Sheet (WAIS) is currently retreating due to shifting wind-driven oceanic currents that transport warm waters toward the ice margin, resulting in ice shelf thinning and accelerated mass loss of the WAIS. Previous results from geologic drilling on Antarctica’s continental margins show significant variability in marine-based ice sheet extent during the late Neogene and Quaternary. Numerical models indicate a fundamental role for oceanic heat in controlling this variability over at least the past 20 My. Although evidence for past ice sheet variability has been collected in marginal settings, sedimentologic sequences from the outer continental shelf are required to evaluate the extent of past ice sheet variability and the associated oceanic forcings and feedbacks. International Ocean Discovery Program Expedition 374 drilled a latitudinal and depth transect of five drill sites from the outer continental shelf to rise in the eastern Ross Sea to resolve the relationship between climatic and oceanic change and WAIS evolution through the Neogene and Quaternary. This location was selected because numerical ice sheet models indicate that this sector of Antarctica is highly sensitive to changes in ocean heat flux. The expedition was designed for optimal data-model integration and will enable an improved understanding of the sensitivity of Antarctic Ice Sheet (AIS) mass balance during warmer-than-present climates (e.g., the Pleistocene “super interglacials,” the mid-Pliocene, and the late early to middle Miocene). The principal goals of Expedition 374 were to • Evaluate the contribution of West Antarctica to far-field ice volume and sea level estimates; • Reconstruct ice-proximal atmospheric and oceanic temperatures to identify past polar amplification and assess its forcings and feedbacks; • Assess the role of oceanic forcing (e.g., sea level and temperature) on AIS stability/instability; • Identify the sensitivity of the AIS to Earth’s orbital configuration under a variety of climate boundary conditions; and • Reconstruct eastern Ross Sea paleobathymetry to examine relationships between seafloor geometry, ice sheet stability/instability, and global climate. To achieve these objectives, we will • Use data and models to reconcile intervals of maximum Neogene and Quaternary Antarctic ice advance with far-field records of eustatic sea level change; • Reconstruct past changes in oceanic and atmospheric temperatures using a multiproxy approach; • Reconstruct Neogene and Quaternary sea ice margin fluctuations in datable marine continental slope and rise records and correlate these records to existing inner continental shelf records; • Examine relationships among WAIS stability/instability, Earth’s orbital configuration, oceanic temperature and circulation, and atmospheric pCO2; and • Constrain the timing of Ross Sea continental shelf overdeepening and assess its impact on Neogene and Quaternary ice dynamics. Expedition 374 was carried out from January to March 2018, departing from Lyttelton, New Zealand. We recovered 1292.70 m of high-quality cores from five sites spanning the early Miocene to late Quaternary. Three sites were cored on the continental shelf (Sites U1521, U1522, and U1523). At Site U1521, we cored a 650 m thick sequence of interbedded diamictite, mudstone, and diatomite, penetrating the Ross Sea seismic Unconformity RSU4.
The depositional reconstructions of past glacial and open-marine conditions at this site will provide unprecedented insight into environmental change on the Antarctic continental shelf during the early and middle Miocene. At Site U1522, we cored a discontinuous upper Miocene to Pleistocene sequence of glacial and glaciomarine strata from the outer shelf, with the primary objective to penetrate and date seismic Unconformity RSU3, which is interpreted to represent the first major continental shelf–wide expansion and coalescing of marine-based ice streams from both East and West Antarctica. At Site U1523, we cored a sediment drift located beneath the westerly flowing Antarctic Slope Current (ASC). Cores from this site will provide a record of the changing vigor of the ASC through time. Such a reconstruction will enable testing of the hypothesis that changes in the vigor of the ASC represent a key control on regulating heat flux onto the continental shelf, resulting in the ASC playing a fundamental role in ice sheet mass balance. We also cored two sites on the continental slope and rise. At Site U1524, we cored a Plio–Pleistocene sedimentary sequence on the continental rise on the levee of the Hillary Canyon, which is one of the largest conduits of Antarctic Bottom Water delivery from the Antarctic continental shelf into the abyssal ocean. Drilling at Site U1524 was intended to penetrate into middle Miocene and older strata but was initially interrupted by drifting sea ice that forced us to abandon coring in Hole U1524A at 399.5 m drilling depth below seafloor (DSF). We moved to a nearby alternate site on the continental slope (U1525) to core a single hole with a record complementary to the upper part of the section recovered at Site U1524. We returned to Site U1524 3 days later, after the sea ice cleared. We then cored Hole U1524C with the rotary core barrel with the intention of reaching the target depth of 1000 m DSF. However, we were forced to terminate Hole U1524C at 441.9 m DSF due to a mechanical failure with the vessel that resulted in termination of all drilling operations and a return to Lyttelton 16 days earlier than scheduled. The loss of 39% of our operational days significantly impacted our ability to achieve all Expedition 374 objectives as originally planned. In particular, we were not able to obtain the deeper time record of the middle Miocene on the continental rise or abyssal sequences that would have provided a continuous and contemporaneous archive to the high-quality (but discontinuous) record from Site U1521 on the continental shelf. The mechanical failure also meant we could not recover sediment cores from proposed Site RSCR-19A, which was targeted to obtain a high-fidelity, continuous record of upper Neogene and Quaternary pelagic/hemipelagic sedimentation. Despite our failure to recover a shelf-to-rise transect for the Miocene, a continental shelf-to-rise transect for the Pliocene to Pleistocene interval is possible through comparison of the high-quality records from Site U1522 with those from Site U1525 and legacy cores from the Antarctic Geological Drilling Project (ANDRILL).