Title: Automatic plankton quantification using deep features
Abstract: The study of marine plankton data is vital to monitor the health of the world’s oceans. In recent decades, automatic plankton recognition systems have proved useful for handling the vast amount of data collected by specially engineered in situ digital imaging systems. Initially, these systems were developed and put into operation using traditional automatic classification techniques, which were fed with hand-designed local image descriptors (such as Fourier features), obtaining quite successful results. In the past few years, there have been many advances in the computer vision community with the rebirth of neural networks. In this paper, we examine how descriptors computed using convolutional neural networks trained with out-of-domain data can replace hand-designed descriptors in the task of estimating the prevalence of each plankton class in a water sample. To achieve this goal, we have designed a broad set of experiments that show how effective these deep features are when working in combination with state-of-the-art quantification algorithms.
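The quantification task named in the abstract (estimating class prevalence in a sample rather than labelling each image) can be illustrated with a minimal sketch. It compares naive classify-and-count with the adjusted-count correction for a single class. This is an illustration under stated assumptions, not necessarily one of the algorithms evaluated in the paper: the classifier is simulated from assumed true-positive and false-positive rates instead of being trained on deep features, and all function names and numbers are hypothetical.

```python
import numpy as np

def classify_and_count(predictions):
    """Naive prevalence estimate: fraction of items predicted positive."""
    return float(np.mean(predictions))

def adjusted_count(predictions, tpr, fpr):
    """Correct the naive estimate with the classifier's error rates.

    tpr and fpr are assumed to have been estimated beforehand,
    for example by cross-validation on labelled training images.
    """
    cc = classify_and_count(predictions)
    if tpr <= fpr:
        # The correction is undefined for such a classifier; fall back.
        return cc
    return float(np.clip((cc - fpr) / (tpr - fpr), 0.0, 1.0))

def simulate_predictions(n, prevalence, tpr, fpr, seed=0):
    """Hypothetical stand-in for a classifier applied to one water sample."""
    rng = np.random.default_rng(seed)
    labels = rng.random(n) < prevalence      # true class membership
    u = rng.random(n)
    # Positives are detected with probability tpr, negatives
    # are mislabelled as positive with probability fpr.
    return np.where(labels, u < tpr, u < fpr)

preds = simulate_predictions(n=20000, prevalence=0.10, tpr=0.80, fpr=0.15)
naive = classify_and_count(preds)
adjusted = adjusted_count(preds, tpr=0.80, fpr=0.15)
print(f"naive estimate: {naive:.3f}, adjusted estimate: {adjusted:.3f}")
```

With an imperfect classifier the naive count is biased by false positives and missed detections, while the adjusted estimate should land much closer to the simulated prevalence of 0.10.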
Award ID(s):
1655686
PAR ID:
10172325
Author(s) / Creator(s):
; ; ; ; ;
Date Published:
Journal Name:
Journal of Plankton Research
Volume:
41
Issue:
4
ISSN:
0142-7873
Page Range / eLocation ID:
449 to 463
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. An efficient feature selection method can significantly boost results in classification problems. Despite ongoing improvement, hand-designed methods often fail to extract features that capture high- and mid-level representations effectively. In deep learning, recent developments have improved upon these hand-designed methods by extracting features automatically. Specifically, Convolutional Neural Networks (CNNs) are a highly successful technique for image classification which can automatically extract features, with ongoing learning and classification of these features. The purpose of this study is to detect hydraulic structures (i.e., bridges and culverts) that are important to overland flow modeling and environmental applications. The dataset used in this work is a relatively small dataset derived from 1-m LiDAR-derived Digital Elevation Models (DEMs) and National Agriculture Imagery Program (NAIP) aerial imagery. The classes for our experiment consist of two groups: samples in which a bridge/culvert is present are considered "True", and those without a bridge/culvert are considered "False". In this paper, we use advanced CNN techniques, including Siamese Neural Networks (SNNs), Capsule Networks (CapsNets), and Graph Convolutional Networks (GCNs), to classify samples with similar topographic and spectral characteristics, an objective which is challenging with traditional machine learning techniques, such as Support Vector Machine (SVM), Gaussian Classifier (GC), and Gaussian Mixture Model (GMM). The advanced CNN-based approaches combined with data pre-processing techniques (e.g., data augmentation) produced superior results. These approaches provide efficient, cost-effective, and innovative solutions to the identification of hydraulic structures.
  2. Abstract In the last several years, there has been a surge in the development of machine learning potential (MLP) models for describing molecular systems. We are interested in a particular area of this field — the training of system‐specific MLPs for reactive systems — with the goal of using these MLPs to accelerate free energy simulations of chemical and enzyme reactions. To help new members in our labs become familiar with the basic techniques, we have put together a self‐guided Colab tutorial (https://cc-ats.github.io/mlp_tutorial/), which we expect to be also useful to other young researchers in the community. Our tutorial begins with the introduction of simple feedforward neural network (FNN) and kernel‐based (using Gaussian process regression, GPR) models by fitting the two‐dimensional Müller‐Brown potential. Subsequently, two simple descriptors are presented for extracting features of molecular systems: symmetry functions (including the ANI variant) and embedding neural networks (such as DeepPot‐SE). Lastly, these features will be fed into FNN and GPR models to reproduce the energies and forces for the molecular configurations in a Claisen rearrangement reaction. 
  3. Abstract. This paper presents the quantitative imaging datasets collected during the Tara Pacific expedition (2016–2018) carried out on the schooner Tara. The datasets cover a wide range of plankton sizes, from microphytoplankton (> 20 µm in size) to mesozooplankton (a few centimetres in size), and non-living particles such as plastic and detrital particles. They consist of surface samples collected across the North Atlantic and the North and South Pacific Ocean from open-ocean stations (a total of 357 samples) and from stations located in coastal waters, lagoons or reefs of 32 Pacific islands (a total of 228 samples). As this expedition involved long distances and long sailing times, we designed two sampling systems to collect plankton while sailing at speeds of up to 9 knots. To sample microplankton, surface water was pumped aboard using a customised pumping system and filtered through a 20 µm mesh size plankton net (hereafter referred to as the deck net – DN). A high-speed net (HSN; 330 µm mesh size) was developed to sample the mesoplankton. In addition, a manta net (330 µm) was also used, when possible, to collect mesoplankton and plastics simultaneously. We could not deploy these nets at the reef and lagoon stations of islands. Instead, two bongo nets (20 µm) attached to an underwater scooter were used to sample microplankton. In addition to describing and presenting the datasets, the complementary aim of this paper is to investigate and quantify the potential sampling biases associated with these two high-speed sampling systems and the different net types, in order to improve further ecological interpretations. Regarding the imaging techniques, microplankton (20–200 µm) from the DN and bongo net were imaged directly aboard Tara using a FlowCam instrument (Fluid Imaging Technologies), whereas mesoplankton (>200 µm) from the HSN and manta net were analysed back on land in the laboratory with a ZooScan system.
Organisms and other particles were taxonomically and morphologically classified using the automatic sorting tools of the EcoTaxa web application; following this, validation or correction was carried out by taxonomic experts. For microplankton smaller than 45 µm, a subsample of 30 % of the annotations was 100 % visually validated by experts. More than 300 different taxonomic and morphological groups were identified. The datasets include the metadata and the raw data from which morphological traits such as size (equivalent spherical diameter) and biovolume were calculated for each particle as well as a number of quantitative descriptors of the surface plankton communities. These descriptors include abundance, biovolumes, the Shannon diversity index and normalised biovolume size spectrum, allowing the study of their structures (e.g. taxonomic, functional, size and trophic structures) according to a wide range of environmental parameters at the basin scale (https://doi.org/10.5281/zenodo.6445609, Lombard et al., 2023). 
  4. Steric molecular descriptors designed for machine learning (ML) applications are critical for connecting structure-function relationships to mechanistic insight. However, many of these descriptors are not suitable for application to complex systems, such as catalyst reactive site pockets. In this context, we recently disclosed a new set of 3D steric molecular descriptors that were originally designed for dirhodium(II) tetracarboxylate catalysts. Herein, we expand the Spatial Molding for Rigid Targets (SMART) descriptor toolkit by releasing SMARTpy, an automated, open-source Python API package for computational workflow integration of SMART descriptors. The impact of the structure of the molecular probe for generation of SMART descriptors was analyzed. Resultant SMART descriptors and pocket features were found to be highly dependent upon probe selection, and do not scale linearly. Flexible probes with smaller substituents can explore narrow pocket regions resulting in a higher resolution pocket imprint. Macrocyclic probes with larger substituents are more applicable to larger cavities with smooth boundaries, such as dirhodium paddlewheel complexes. In these cases, SMARTpy provides comparable descriptors to the original calculation method using UCSF Chimera. Finally, we analyzed a series of case studies demonstrating how SMART descriptors can impact other areas of catalysis, such as organocatalysis, biocatalysis, and protein pocket analysis.
  5. Teachable object recognizers provide a solution for a very practical need for blind people: instance-level object recognition. They assume one can visually inspect the photos they provide for training, a critical and inaccessible step for those who are blind. In this work, we engineer data descriptors that address this challenge. They indicate in real time whether the object in the photo is cropped or too small, a hand is included, the photo is blurred, and how much photos vary from each other. Our descriptors are built into an open-source testbed iOS app called MYCam. In a remote user study in (N = 12) blind participants’ homes, we show how descriptors, even when error-prone, support experimentation and have a positive impact on the quality of the training set that can translate to model performance, though this gain is not uniform. Participants found the app simple to use, indicating that they could effectively train it and that the descriptors were useful. However, many found the training tedious, opening discussions around the need for balance between information, time, and cognitive load.