Title: Machine learning techniques to characterize functional traits of plankton from image data
Abstract

Plankton imaging systems supported by automated classification and analysis have improved ecologists' ability to observe aquatic ecosystems. Today, we are on the cusp of reliably tracking plankton populations with a suite of lab‐based and in situ tools, collecting imaging data at unprecedentedly fine spatial and temporal scales. But these data have potential well beyond examining the abundances of different taxa; the individual images themselves contain a wealth of information on functional traits. Here, we outline traits that could be measured from image data, suggest machine learning and computer vision approaches to extract functional trait information from the images, and discuss promising avenues for novel studies. The approaches we discuss are data agnostic and are broadly applicable to imagery of other aquatic or terrestrial organisms.

 
Award ID(s):
1655686 1637632
NSF-PAR ID:
10369897
Publisher / Repository:
Wiley Blackwell (John Wiley & Sons)
Journal Name:
Limnology and Oceanography
Volume:
67
Issue:
8
ISSN:
0024-3590
Page Range / eLocation ID:
p. 1647-1669
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Recent advances in high‐frequency environmental sensing and statistical approaches have greatly expanded the breadth of knowledge regarding aquatic ecosystem metabolism—the measurement and interpretation of gross primary productivity (GPP) and ecosystem respiration (ER). Aquatic scientists are poised to take advantage of widely available datasets and freely‐available modeling tools to apply functional information gained through ecosystem metabolism to help inform environmental management. Historically, several logistical and conceptual factors have limited the widespread application of metabolism in management settings. Benefitting from new instrumental and modeling tools, it is now relatively straightforward to extend routine monitoring of dissolved oxygen (DO) to dynamic measures of aquatic ecosystem function (GPP and ER) and key physical processes such as gas exchange with the atmosphere (G). We review the current approaches for using DO data in environmental management with a focus on the United States, but briefly describe management frameworks in Europe and Canada. We highlight new applications of diel DO data and metabolism in regulatory settings and explore how they can be applied to managing and monitoring ecosystems. We then review existing data types and provide a short guide for implementing field measurements and modeling of ecosystem metabolic processes using currently available tools. Finally, we discuss research needed to overcome current conceptual limitations of applying metabolism in management settings. Despite challenges associated with modeling metabolism in rivers and lakes, rapid developments in this field have moved us closer to utilizing real‐time estimates of GPP, ER, and G to improve the assessment and management of environmental change.

    This article is categorized under:

    Water and Life > Nature of Freshwater Ecosystems

    Water and Life > Conservation, Management, and Awareness

     
  2. As the basis of oceanic food webs and a key component of the biological carbon pump, planktonic organisms play major roles in the oceans. Their study benefited from the development of in situ imaging instruments, which provide higher spatio-temporal resolution than previous tools. But these instruments collect huge quantities of images, the vast majority of which are of marine snow particles or imaging artifacts. Among them, the In Situ Ichthyoplankton Imaging System (ISIIS) samples the largest water volumes (> 100 L s⁻¹) and thus produces particularly large datasets. To extract manageable amounts of ecological information from in situ images, we propose to focus on planktonic organisms early in the data processing pipeline: at the segmentation stage. We compared three segmentation methods, particularly for smaller targets, in which plankton represents less than 1% of the objects: (i) a traditional thresholding over the background, (ii) an object detector based on maximally stable extremal regions (MSER), and (iii) a content-aware object detector, based on a Convolutional Neural Network (CNN). These methods were assessed on a subset of ISIIS data collected in the Mediterranean Sea, from which a ground truth dataset of > 3,000 manually delineated organisms is extracted. The naive thresholding method captured 97.3% of those but produced ~340,000 segments, 99.1% of which were therefore not plankton (i.e. recall = 97.3%, precision = 0.9%). Combining thresholding with a CNN missed a few more planktonic organisms (recall = 91.8%) but the number of segments decreased 18-fold (precision increased to 16.3%). The MSER detector produced four times fewer segments than thresholding (precision = 3.5%), missed more organisms (recall = 85.4%), but was considerably faster. 
Because naive thresholding produces ~525,000 objects from 1 minute of ISIIS deployment, the more advanced segmentation methods significantly improve ISIIS data handling and ease the subsequent taxonomic classification of segmented objects. The cost in terms of recall is limited, particularly for the CNN object detector. These approaches are now standard in computer vision and could be applicable to other plankton imaging devices, the majority of which pose a data management problem. 
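    The recall and precision figures quoted for naive thresholding can be roughly checked from the counts in the abstract. The following is a minimal sketch, not the authors' evaluation code: it assumes exactly 3,000 ground-truth organisms and exactly 340,000 segments (the abstract gives only "> 3,000" and "~340,000"), and the function name `precision_recall` is hypothetical.

```python
def precision_recall(true_positives, n_segments, n_ground_truth):
    """Return (precision, recall) for one segmentation run.

    precision = fraction of produced segments that are real organisms
    recall    = fraction of ground-truth organisms that were captured
    """
    precision = true_positives / n_segments
    recall = true_positives / n_ground_truth
    return precision, recall


# Assumed round numbers; the abstract states only "> 3,000" and "~340,000".
n_ground_truth = 3000
n_segments = 340_000

# Thresholding captured 97.3% of the ground-truth organisms.
true_positives = round(0.973 * n_ground_truth)

precision, recall = precision_recall(true_positives, n_segments, n_ground_truth)
print(f"precision = {precision:.1%}, recall = {recall:.1%}")
```

    Under these assumptions the result is consistent with the reported values: nearly every organism is captured, but fewer than 1 in 100 segments is an organism, which is why filtering at the segmentation stage reduces the downstream classification load.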
  3. Abstract

    Microbial communities are essential components of aquatic ecosystems through their contribution to food web dynamics and biogeochemical processes. Aquatic microbial diversity is immense and a general challenge is to understand how metabolism and interactions of single organisms shape microbial community dynamics and ecosystem‐scale biogeochemical transformations. Metagenomic approaches have developed rapidly, and proven to be powerful in linking microbial community dynamics to biogeochemical processes. In this review, we provide an overview of metagenomic approaches, followed by a discussion on some recent insights they have provided, including those in this special issue. These include the discovery of new taxa and metabolisms in aquatic microbiomes, insights into community assembly and functional ecology as well as evolutionary processes shaping microbial genomes and microbiomes, and the influence of human activities on aquatic microbiomes. Given that metagenomics can now be considered a mature technology where data generation and descriptive analyses are relatively routine and informative, we then discuss metagenomic‐enabled research avenues to further link microbial dynamics to biogeochemical processes. These include the integration of metagenomics into well‐designed ecological experiments, the use of metagenomics to inform and validate metabolic and biogeochemical models, and the pressing need for ecologically relevant model organisms and simple microbial systems to better interpret the taxonomic and functional information integrated in metagenomes. These research avenues will contribute to a more mechanistic and predictive understanding of links between microbial dynamics and biogeochemical cycles. Owing to rapid climate change and human impacts on aquatic ecosystems, the urgency of such an understanding has never been greater.

     
  4. Abstract

    Mercury (Hg) methylation genes (hgcAB) mediate the formation of the toxic methylmercury and have been identified from diverse environments, including freshwater and marine ecosystems, Arctic permafrost, forest and paddy soils, coal‐ash amended sediments, chlor‐alkali plant discharges and geothermal springs. Here we present the first attempt at a standardized protocol for the detection, identification and quantification of hgc genes from metagenomes. Our Hg‐cycling microorganisms in aquatic and terrestrial ecosystems (Hg‐MATE) database, a catalogue of hgc genes, provides the most accurate information to date on the taxonomic identity and functional/metabolic attributes of microorganisms responsible for Hg methylation in the environment. Furthermore, we introduce “marky‐coco”, a ready‐to‐use bioinformatic pipeline based on de novo single‐metagenome assembly, for easy and accurate characterization of hgc genes from environmental samples. We compared the recovery of hgc genes from environmental metagenomes using the marky‐coco pipeline with an approach based on coassembly of multiple metagenomes. Our data show similar efficiency in both approaches for most environments except those with high diversity (i.e., paddy soils) for which a coassembly approach was preferred. Finally, we discuss the definition of true hgc genes and methods to normalize hgc gene counts from metagenomes.

     
  5. The research data repository of the Environmental Data Initiative (EDI) is building on over 30 years of data curation research and experience in the National Science Foundation-funded US Long-Term Ecological Research (LTER) Network. It provides mature functionalities, well established workflows, and now publishes all ‘long-tail’ environmental data. High quality scientific metadata are enforced through automatic checks against community developed rules and the Ecological Metadata Language (EML) standard. Although the EDI repository is far along in making its data findable, accessible, interoperable, and reusable (FAIR), representatives from EDI and the LTER are developing best practices for the edge cases in environmental data publishing. One of these is the vast amount of imagery taken in the context of ecological research, ranging from wildlife camera traps to plankton imaging systems to aerial photography. Many images are used in biodiversity research for community analyses (e.g., individual counts, species cover, biovolume, productivity), while others are taken to study animal behavior and landscape-level change. Some examples from the LTER Network include: using photos of a heron colony to measure provisioning rates for chicks (Clarkson and Erwin 2018) or identifying changes in plant cover and functional type through time (Peters et al. 2020). Multi-spectral images are employed to identify prairie species. Underwater photo quads are used to monitor changes in benthic biodiversity (Edmunds 2015). Sosik et al. (2020) used a continuous Imaging FlowCytobot to identify and measure phyto- and microzooplankton. Cameras at McMurdo Dry Valleys assess snow and ice cover on Antarctic lakes allowing estimation of primary production (Myers 2019). It has been standard practice to publish numerical data extracted from images in EDI; however, the supporting imagery generally has not been made publicly available. 
Our goal in developing best practices for documenting and archiving these images is for them to be discovered and re-used. Our examples demonstrate several issues. The research questions, and hence, the image subjects are variable. Images frequently come in logical sets of time series. The size of such sets can be large and only some images may be contributed to a dedicated specialized repository. Finally, these images are taken in a larger monitoring context where many other environmental data are collected at the same time and location. Currently, a typical approach to publishing image data in EDI is a package containing compressed (ZIP or tar) files with the images, a directory manifest with additional image-specific metadata, and a package-level EML metadata file. Images in the compressed archive may be organized within directories with filenames corresponding to treatments, locations, time periods, individuals, or other grouping attributes. Additionally, the directory manifest table has columns for each attribute. Package-level metadata include standard coverage elements (e.g., date, time, location) and sampling methods. This approach of archiving logical ‘sets’ of images reduces the effort of providing metadata for each image when most information would be repeated, but at the expense of not making every image individually searchable. The latter may be overcome if the provided manifest contains standard metadata that would allow searching and automatic integration with other images. 