
Title: Scaling of classification systems—effects of class precision on detection accuracy from medium resolution multispectral data
Abstract Context

Land-cover class definitions are scale-dependent. Up-scaling categorical data must account for that dependence, but most decision rules aggregating categorical data do not produce scale-specific class definitions. However, non-hierarchical, empirically derived classification systems common in phytosociology define scale-specific classes using species co-occurrence patterns.


Objectives

Evaluate tradeoffs in class precision and representativeness when up-scaling categorical data across natural landscapes using the multi-dimensional grid-point (MDGP)-scaling algorithm, which generates scale-specific class definitions; and compare spectral detection accuracy of MDGP-scaled classes to ‘majority-rule’ aggregated classes.


Methods

Vegetation maps created from 2-m resolution WorldView-2 data for two Everglades wetland areas were scaled to the 30-m Landsat grid with the MDGP-scaling algorithm. A full-factorial analysis evaluated the effects of scaled class-label precision and class representativeness on compositional information loss and detection accuracy of scaled classes from multispectral Landsat data.


Results

MDGP-scaling retained between 3.8 and 27.9% more compositional information than the majority rule as class-label precision increased. Increasing class-label precision and information retention also increased spectral class detection accuracy from Landsat data by between 1 and 8.6%. Rare class removal and increase in class-label similarity were controlled by the class representativeness threshold, leading to higher detection accuracy than the majority rule as class representativeness increased.


Conclusions

When up-scaling categorical data across natural landscapes, negotiating trade-offs in thematic precision, landscape-scale class representativeness and increased information retention in the scaled map results in greater class-detection accuracy from lower-resolution, multispectral, remotely sensed data. MDGP-scaling provides a framework to weigh tradeoffs and to make informed decisions on parameter selection.
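
The abstract compares MDGP-scaled classes against 'majority-rule' aggregation. As a rough illustration of that baseline only, the sketch below aggregates a fine categorical grid into coarse cells, computes the relative class abundance in each coarse cell, and then assigns the most abundant class. It is not the MDGP-scaling algorithm; the function names, the synthetic map, and the four-class legend are assumptions made for the example. The factor of 15 follows from the 2-m and 30-m resolutions stated above.

```python
import numpy as np

def block_class_abundance(fine, factor, n_classes):
    """Relative abundance of each class inside every factor x factor block."""
    rows, cols = fine.shape
    if rows % factor or cols % factor:
        raise ValueError("grid dimensions must be multiples of factor")
    # Rearrange so each coarse cell holds its factor * factor fine labels.
    blocks = (fine.reshape(rows // factor, factor, cols // factor, factor)
                  .swapaxes(1, 2)
                  .reshape(rows // factor, cols // factor, factor * factor))
    # Count labels per class, then normalise counts to proportions.
    counts = np.stack([(blocks == k).sum(axis=-1) for k in range(n_classes)],
                      axis=-1)
    return counts / (factor * factor)

def majority_rule(abundance):
    """Label each coarse cell with its most abundant class (ties go to the lowest id)."""
    return abundance.argmax(axis=-1)

# Synthetic 2-m label map (hypothetical data); 30 m / 2 m = 15 cells per side.
rng = np.random.default_rng(0)
fine_map = rng.integers(0, 4, size=(30, 45))
abundance = block_class_abundance(fine_map, factor=15, n_classes=4)
coarse = majority_rule(abundance)
print(abundance.shape, coarse.shape)
```

The abundance vector is what the majority rule discards: two coarse cells with very different mixtures receive the same label whenever they share a dominant class, which is the kind of compositional information loss the abstract refers to.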

Journal Name:
Landscape Ecology
Subject(s) / Keyword(s):
["Categorical data","Classification systems","MDGP","Multi-dimensional grid-point scaling","Remote sensing","Phytosociology","Relative class abundance","Scale dependence"]
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Machine learning allows “the machine” to deduce the complex and sometimes unrecognized rules governing spatial systems, particularly topographic mapping, by exposing it to the end product. Often, the obstacle to this approach is the acquisition of many good and labeled training examples of the desired result. Such is the case with most types of natural features. To address such limitations, this research introduces GeoNat v1.0, a natural feature dataset, used to support artificial intelligence‐based mapping and automated detection of natural features under a supervised learning paradigm. The dataset was created by randomly selecting points from the U.S. Geological Survey’s Geographic Names Information System and includes approximately 200 examples each of 10 classes of natural features. Resulting data were tested in an object‐detection problem using a region‐based convolutional neural network. The object‐detection tests resulted in a 62% mean average precision as baseline results. Major challenges in developing training data in the geospatial domain, such as scale and geographical representativeness, are addressed in this article. We hope that the resulting dataset will be useful for a variety of applications and shed light on training data collection and labeling in the geospatial artificial intelligence domain.

  2. Matthews, MB (Ed.)
    The generalization power of deep-learning models depends on richly labelled data. Supervision with large-scale annotated data is restrictive in most real-world scenarios, where data collection and annotation are costly. Various domain adaptation techniques in the literature bridge the distribution discrepancy between source and target domains. However, most of these models require the label sets of both domains to be identical. To tackle a more practical and challenging scenario, we formulate the problem from a partial domain adaptation perspective, where the source label set is a superset of the target label set. Motivated by the observation that image styles are private to each domain, we develop a method that identifies outlier classes exclusively from image content information and trains a label classifier exclusively on class content from source images. Additionally, negative transfer of samples from classes private to the source domain is eliminated by transforming the soft class-level weights into two clusters, 0 (outlier source classes) and 1 (shared classes), by maximizing the between-cluster variance.
  3. Abstract Aim

    We examined body size scaling relationships for two developmental life stages of parasitic helminths (egg and adult) separately in relationship to latitude (i.e. Bergmann's rule), temperature and temperature seasonality. Given that helminth eggs experience environmental conditions more directly, whereas adults live inside infected host individuals, we predict stronger environmentally driven gradients of body size for eggs than for adults.



    Time period

    Present day.

    Major taxa studied

    Parasitic helminths.


    We compiled egg size and adult body size data (both minimum and maximum) for 265 parasitic helminth species from the literature, along with species latitudinal distribution information using an extensive global helminth occurrence database. We then examined how the average helminth egg and adult body size of all helminth species present (minimum and maximum separately) scaled with latitude, temperature and temperature variability, using generalized linear models.


    Both the egg size and the adult body size of helminths tended to decrease towards higher latitudes, although we found the opposite body size scaling pattern for their host species. Helminth sizes were also positively correlated with temperature and negatively, but more weakly, with temperature seasonality.

    Main conclusions

    Instead of following the body size patterns of their hosts, helminth parasites are more similar to other ectotherms in that they follow the converse Bergmann's rule. This pattern did not differ between helminth developmental stages, suggesting that mean annual temperature and seasonality are unlikely to be related mechanistically to body size variation in this case.

  4. The Dynamic World product is a discrete land cover classification of Sentinel 2 reflectance imagery that is global in extent, retrospective to 2015, and updated continuously in near real time. The classifier is trained on a stratified random sample of 20,000 hand-labeled 5 × 5 km Sentinel 2 tiles spanning 14 biomes globally. Since the training data are based on visual interpretation of image composites by both expert and non-expert annotators, without explicit spectral properties specified in the class definitions, the spectral characteristics of the classes are not obvious. The objective of this study is to quantify the physical distinctions among the land cover classes by characterizing the spectral properties of the range of reflectance present within each of the Dynamic World classes over a variety of landscapes. This is achieved by comparing both the eight-class probability feature space (excluding snow) and the maximum probability class assignment (label) distributions to continuous land cover fraction estimates derived from a globally standardized spectral mixture model. Standardized substrate, vegetation, and dark (SVD) endmembers are used to unmix nine Sentinel 2 reflectance tiles from nine spectral diversity hotspots for comparison between the SVD land cover fraction continua and the Dynamic World class probability continua and class assignments. The variance partition for the class probability feature spaces indicates that eight of these nine hotspots are effectively five-dimensional to 95% of variance. Class probability feature spaces of the hotspots all show a tetrahedral form with probability continua spanning multiple classes. 
    Comparison of SVD land cover fraction distributions with maximum probability class assignments (labels) and probability feature space distributions reveals a clear distinction between (1) physically and spectrally heterogeneous biomes characterized by continuous gradations in vegetation density, substrate albedo, and structural shadow fractions, and (2) more homogeneous biomes characterized by closed canopy vegetation (forest) or negligible vegetation cover (e.g., desert, water). Due to the ubiquity of spectrally heterogeneous biomes worldwide, the class probability feature space adds considerable value to the Dynamic World maximum probability class labels by offering users the opportunity to depict inherently gradational heterogeneous landscapes otherwise not generally offered with other discrete thematic classifications.

  5. Abstract Background

    Stable isotope probing (SIP) approaches are a critical tool in microbiome research to determine associations between species and substrates, as well as the activity of species. The application of these approaches ranges from studying microbial communities important for global biogeochemical cycling to host-microbiota interactions in the intestinal tract. Current SIP approaches, such as DNA-SIP or nanoSIMS, allow analysis of the incorporation of stable isotopes with high coverage of taxa in a community and at the single-cell level, respectively; however, they are limited in terms of sensitivity, resolution or throughput.


    Here, we present an ultra-sensitive, high-throughput protein-based stable isotope probing approach (Protein-SIP), which cuts the cost for labeled substrates by 50–99% as compared to other SIP and Protein-SIP approaches and thus enables isotope labeling experiments on much larger scales and with higher replication. The approach allows for the determination of isotope incorporation into microbiome members with species-level resolution using standard metaproteomics liquid chromatography-tandem mass spectrometry (LC–MS/MS) measurements. At the core of the approach are new algorithms to analyze the data, which have been implemented in an open-source software. We demonstrate sensitivity, precision and accuracy using bacterial cultures and mock communities with different labeling schemes. Furthermore, we benchmark our approach against two existing Protein-SIP approaches and show that, in the low labeling range used, our approach is the most sensitive and accurate. Finally, we measure translational activity using 18O heavy water labeling in a 63-species community derived from human fecal samples grown on media simulating two different diets. Activity could be quantified on average for 27 species per sample, with 9 species showing significantly higher activity on a high protein diet, as compared to a high fiber diet. Surprisingly, among the species with increased activity on high protein were several Bacteroides species known as fiber consumers. Apparently, protein supply is a critical consideration when assessing growth of intestinal microbes on fiber, including fiber-based prebiotics.


    We demonstrate that our Protein-SIP approach allows for the ultra-sensitive (0.01 to 10% label) detection of stable isotopes of elements found in proteins, using standard metaproteomics data.

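
The second related abstract above describes splitting soft class-level weights into two clusters, 0 and 1, by maximizing the between-cluster variance. One common way to do this for one-dimensional values is an Otsu-style exhaustive search over split points, sketched below. This is an assumption about how such a split could be computed, not the authors' implementation; the function name and the example weights are hypothetical.

```python
import numpy as np

def binarize_class_weights(weights):
    """Split 1-D weights into clusters 0/1 by maximising between-cluster variance."""
    w = np.asarray(weights, dtype=float)
    if w.size < 2:
        raise ValueError("need at least two class weights")
    order = np.sort(w)
    best_var, best_thr = -1.0, order[0]
    # Try every split between consecutive sorted values.
    for i in range(1, order.size):
        low, high = order[:i], order[i:]
        p_low = low.size / order.size
        p_high = 1.0 - p_low
        # Between-cluster variance for a two-cluster split.
        var_between = p_low * p_high * (low.mean() - high.mean()) ** 2
        if var_between > best_var:
            best_var = var_between
            best_thr = (order[i - 1] + order[i]) / 2.0
    # Weights above the threshold go to cluster 1, the rest to cluster 0.
    return (w > best_thr).astype(int), best_thr

# Hypothetical soft weights for six source classes.
labels, threshold = binarize_class_weights([0.02, 0.05, 0.9, 0.85, 0.1, 0.8])
print(labels, threshold)
```

With weights like these, classes with small weights fall in cluster 0 and classes with large weights in cluster 1, matching the outlier/shared distinction the abstract describes.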
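
The fourth related abstract above relies on a spectral mixture model that unmixes reflectance into substrate, vegetation, and dark (SVD) fractions. A minimal sketch of linear unmixing by ordinary least squares follows, assuming each pixel spectrum is a linear combination of endmember spectra. The endmember values are invented for illustration and are not the standardized endmembers used in that study; a practical model would usually also constrain the fractions (for example non-negativity or sum-to-one), which this sketch omits.

```python
import numpy as np

def unmix(pixels, endmembers):
    """Least-squares endmember fractions.

    pixels:     (n_pixels, n_bands) reflectance spectra
    endmembers: (n_endmembers, n_bands) endmember spectra
    returns:    (n_pixels, n_endmembers) estimated fractions
    """
    # Solve endmembers.T @ fractions.T = pixels.T for all pixels at once.
    fractions, *_ = np.linalg.lstsq(endmembers.T, pixels.T, rcond=None)
    return fractions.T

# Hypothetical 4-band endmember spectra: substrate, vegetation, dark.
endmembers = np.array([
    [0.30, 0.35, 0.40, 0.45],
    [0.05, 0.08, 0.04, 0.50],
    [0.01, 0.01, 0.01, 0.01],
])

# Build two synthetic mixed pixels from known fractions, then recover them.
true_fractions = np.array([[0.6, 0.3, 0.1],
                           [0.1, 0.7, 0.2]])
pixels = true_fractions @ endmembers
estimated = unmix(pixels, endmembers)
print(np.round(estimated, 3))
```

Because the fractions are continuous, they can represent gradational landscapes that a single discrete label cannot, which is the contrast the abstract draws with maximum probability class labels.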