Abstract. Processing Earth observation data modelled as time series of rasters is critical to solving some of the most complex problems in geospatial science, ranging from climate change to public health. Researchers increasingly work with large raster datasets that are often terabytes in size. At this scale, traditional GIS methods may fail to handle the processing, and new approaches are needed to analyse these datasets. The objective of this work is to develop methods to interactively analyse big raster datasets, with the goal of efficiently extracting vector data over specific time periods from any set of raster data. In this paper, we describe RINX (Raster INformation eXtraction), an end-to-end solution for automatic extraction of information from large raster datasets. RINX relies heavily on open source geospatial techniques for information extraction and complements traditional approaches with state-of-the-art high-performance computing techniques. This paper discusses the details of achieving big temporal data extraction with RINX, implemented for the use case of air quality and climate data extraction for long-term health studies, including the methods used, code developed, processing time statistics, project conclusions, and next steps.
Updated TreeMap2016 raster with SDI, SDImax and RD estimates
This is an updated version of the original TREEMAP 2016 raster and the associated files for CONUS. Additions to the TREEMAP 2016 raster attribute table are the SDI, SDImax and RD estimates.
- Award ID(s): 1915078
- PAR ID: 10651075
- Publisher / Repository: figshare
- Date Published:
- Subject(s) / Keyword(s): Forestry fire management; Forest ecosystems; Forestry management and environment; Forestry biomass and bioproducts
- Format(s): Medium: X Size: 5215437762 Bytes
- Size(s): 5215437762 Bytes
- Sponsoring Org: National Science Foundation
More Like this
- This dataset consists of raster files predicting spatial patterns in soils for the entire Hubbard Brook Experimental Forest. Eight soil units are used, following a hydropedologic approach, based on relationships between soil genetic horizon presence and thickness, and the frequency and depth of groundwater fluctuations. Nine raster files on a five-meter grid are presented, including one raster each showing the probability of presence of each of the eight soil units; the ninth raster represents the soil unit most likely to be present at each grid cell. The methods section of the metadata includes descriptions of the eight soil units and guidance for users of the model outputs. These data were gathered as part of the Hubbard Brook Ecosystem Study (HBES). The HBES is a collaborative effort at the Hubbard Brook Experimental Forest, which is operated and maintained by the USDA Forest Service, Northern Research Station.
- Advancements in remote sensing technology have allowed the collection of vast amounts of satellite and aerial imagery with pixel resolutions as fine as 1 cm, stored in a raster format that is crucial for various research fields. However, processing this data poses challenges, including resolving data dependencies when location, resolution, and coordinate systems do not align, and managing large datasets within memory constraints. This paper introduces RDPro, a novel Spark-based system that efficiently processes and analyzes large raster datasets. RDPro features a new data model tailored for data dependencies in a distributed, shared-nothing environment, complete with tools for loading and writing raster data. It also optimizes core raster operations within Spark, allowing users to integrate complex data science workflows. Comparative analysis shows RDPro outperforms existing systems by up to two orders of magnitude.
- Most molecular diagram parsers recover chemical structure from raster images (e.g., PNGs). However, many PDFs include commands giving explicit locations and shapes for characters, lines, and polygons. We present a new parser that uses these born-digital PDF primitives as input. The parsing model is fast and accurate, and does not require GPUs, Optical Character Recognition (OCR), or vectorization. We use the parser to annotate raster images and then train a new multi-task neural network for recognizing molecules in raster images. We evaluate our parsers using SMILES and standard benchmarks, along with a novel evaluation protocol comparing molecular graphs directly that supports automatic error compilation and reveals errors missed by SMILES-based evaluation. On the synthetic USPTO benchmark, our born-digital parser obtains a recognition rate of 98.4% (1% higher than previous models) and our relatively simple neural parser for raster images obtains a rate of 85% using less training data than existing neural approaches (thousands vs. millions of molecules).
- The National Agricultural Statistics Service, the statistical arm of the US Department of Agriculture, and the Multi-Resolution Land Characteristics Consortium, a group of US federal agencies, collect and publish several land-use and land-cover data sets. The aim of this study is to analyze the consistency of forestland estimates based on two widely used, publicly available products: the National Land-Cover Database (NLCD) and the Cropland Data Layer (CDL). Both remote-sensing-based products provide raster-formatted land-cover categorization at a spatial resolution of 30 m. Although the processing of the yearly published CDL non-agricultural land-cover data is based on the less frequently updated NLCD, the consistency of large-area forestland mapping between these two datasets has not been assessed. To assess the similarities and differences between CDL- and NLCD-based forestland mappings for the state of North Carolina, we overlay the two data products for the years 2011 and 2016 in ArcMap 10.5.1 and analyze the location and attributes of the matched and mismatched forestland. We find that the mismatch is relatively smaller in areas of the state where forests occupy larger shares of the total land, and that the relative mismatch is smaller in 2011 than in 2016. We also find that a large portion of the forestland mismatch is attributable to the dynamics of re-growth of periodically harvested and otherwise disturbed forests. Our results underscore the need for a holistic approach to data preparation, data attribution, and data accuracy when performing high-scale map-based analyses using each of these products.
