
Title: Semi‐ and fully supervised quantification techniques to improve population estimates from machine classifiers

Modern in situ digital imaging systems collect vast numbers of images of marine organisms and suspended particles. Automated methods to classify objects in these images – largely supervised machine learning techniques – are now used to deal with this onslaught of biological data. Though such techniques can minimize the human cost of analyzing the data, they also have important limitations. In training automated classifiers, we implicitly program them with an inflexible understanding of the environment they are observing. When the relationship between the classifier and the population changes, the computer's performance degrades, potentially decreasing the accuracy of the estimate of community composition. This limitation of automated classifiers is known as “dataset shift.” Here, we describe techniques for addressing dataset shift. We then apply them to the output of a binary deep neural network searching for diatom chains in data generated by the Scripps Plankton Camera System (SPCS) on the Scripps Pier. In particular, we describe a supervised quantification approach to adjust a classifier's output using a small number of human-corrected images to estimate the system error in a time frame of interest. This method yielded an 80% improvement in mean absolute error over the raw classifier output on a set of 41 independent samples from the SPCS. The technique can be extended to adjust the output of multi‐category classifiers and other in situ observing systems.
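The core of the adjusted-count idea can be stated compactly: if the classifier's true and false positive rates can be estimated from a small human-verified subsample, the raw positive rate can be inverted to recover prevalence. The sketch below shows the standard adjusted classify-and-count correction; function names and numbers are illustrative, not taken from the paper's code.

```python
# Sketch of the adjusted classify-and-count correction behind supervised
# quantification: error rates estimated from a small human-verified
# subsample are used to correct the classifier's raw positive rate.

def adjusted_prevalence(raw_positive_rate, tpr, fpr):
    """Invert the classifier's error rates to estimate true prevalence.

    raw_positive_rate -- fraction of images the classifier labels positive
    tpr -- true positive rate from the human-corrected subsample
    fpr -- false positive rate from the human-corrected subsample
    """
    if tpr <= fpr:
        raise ValueError("classifier must beat chance (tpr > fpr)")
    p = (raw_positive_rate - fpr) / (tpr - fpr)
    return min(max(p, 0.0), 1.0)  # clamp to a valid proportion

# Classifier flags 30% of images as diatom chains; human review of a
# subsample suggests tpr = 0.9 and fpr = 0.1.
estimate = adjusted_prevalence(0.30, tpr=0.9, fpr=0.1)  # 0.25
```

The same correction extends to multi-category classifiers by inverting the full confusion matrix rather than a single pair of rates.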

Publisher / Repository:
Wiley Blackwell (John Wiley & Sons)
Journal Name:
Limnology and Oceanography: Methods
Page Range / eLocation ID:
p. 739-753
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Advances in both hardware and software are enabling rapid proliferation of in situ plankton imaging methods, requiring more effective machine learning approaches to image classification. Deep Learning methods, such as convolutional neural networks (CNNs), show marked improvement over traditional feature‐based supervised machine learning algorithms, but require careful optimization of hyperparameters and adequate training sets. Here, we document some best practices in applying CNNs to zooplankton and marine snow images and note where our results differ from contemporary Deep Learning findings in other domains. We boost the performance of CNN classifiers by incorporating metadata of different types and illustrate how to assimilate metadata beyond simple concatenation. We utilize both geotemporal (e.g., sample depth, location, time of day) and hydrographic (e.g., temperature, salinity, chlorophyll a) metadata and show that either type by itself, or both combined, can substantially reduce error rates. Incorporation of context metadata also boosts performance of the feature‐based classifiers we evaluated: Random Forest, Extremely Randomized Trees, Gradient Boosted Classifier, Support Vector Machines, and Multilayer Perceptron. For our assessments, we use an original data set of 350,000 in situ images (roughly 50% marine snow and 50% non‐snow sorted into 26 categories) from a novel in situ Zooglider. We document asymptotically increasing performance with more computationally intensive techniques, such as substantially deeper networks and artificially augmented data sets. Our best model achieves 92.3% accuracy with our 27‐class data set. We provide guidance for further refinements that are likely to provide additional gains in classifier accuracy.
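    The simplest fusion strategy is to standardize each metadata value and concatenate it onto the image feature vector before the classification layers. The abstract notes that assimilation can go beyond simple concatenation; the sketch below shows only that concatenation baseline, with illustrative feature sizes and statistics.

```python
import numpy as np

def fuse_features(image_features, metadata, meta_mean, meta_std):
    """Concatenate CNN image features with standardized context metadata
    (e.g., sample depth, time of day, temperature) into one vector."""
    meta_z = (np.asarray(metadata, dtype=float) - meta_mean) / meta_std
    return np.concatenate([image_features, meta_z])

img_feat = np.random.rand(128)             # e.g., penultimate CNN layer
meta = [50.0, 14.5, 12.3]                  # depth (m), hour of day, temp (C)
meta_mean = np.array([100.0, 12.0, 10.0])  # training-set statistics
meta_std = np.array([50.0, 6.0, 3.0])
x = fuse_features(img_feat, meta, meta_mean, meta_std)  # shape (131,)
```

    The fused vector can feed either a CNN's dense layers or any of the feature-based classifiers listed above.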

  2. Abstract

    Introduction

    Between 5% and 20% of all combat-related casualties are attributed to burn wounds. A decrease in the mortality rate of burns by about 36% can be achieved with early treatment, but this is contingent upon accurate characterization of the burn. Precise burn injury classification is recognized as a crucial aspect of the medical artificial intelligence (AI) field. An autonomous AI system designed to analyze multiple characteristics of burns using modalities including ultrasound and RGB images is described.

    Materials and Methods

    A two-part dataset is created for the training and validation of the AI: in vivo B-mode ultrasound scans collected from porcine subjects (10,085 frames), and RGB images manually collected from web sources (338 images). The framework in use leverages an explanation system to corroborate and integrate a burn expert’s knowledge, suggesting new features and ensuring the validity of the model. Through the utilization of this framework, it is discovered that B-mode ultrasound classifiers can be enhanced by supplying textural features. More specifically, it is confirmed that statistical texture features extracted from ultrasound frames can increase the accuracy of the burn depth classifier.
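    As an illustration of the kind of statistical texture feature involved, the sketch below computes the contrast of a gray-level co-occurrence matrix (GLCM), a standard texture descriptor; the exact feature set used for the burn depth classifier is not reproduced here.

```python
import numpy as np

def glcm_contrast(quantized, levels, dx=1, dy=0):
    """Contrast of the gray-level co-occurrence matrix at offset (dx, dy).

    `quantized` is a 2-D integer array with gray levels in [0, levels).
    High contrast means neighboring pixels often differ sharply, a cue
    that texture-based classifiers can exploit.
    """
    g = np.zeros((levels, levels))
    h, w = quantized.shape
    for y in range(h - dy):
        for x in range(w - dx):
            g[quantized[y, x], quantized[y + dy, x + dx]] += 1
    g /= g.sum()                          # joint probability of level pairs
    i, j = np.indices((levels, levels))
    return float(np.sum(g * (i - j) ** 2))

# A checkerboard has maximal horizontal contrast for 2 gray levels.
checker = (np.indices((8, 8)).sum(axis=0) % 2)
contrast = glcm_contrast(checker, levels=2)  # 1.0
```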


    Results

    The system, with all included features selected using explainable AI, is capable of classifying burn depth with accuracy and F1 average above 80%. Additionally, the segmentation module has been found capable of segmenting with a mean global accuracy greater than 84%, and a mean intersection-over-union score over 0.74.


    Conclusions

    This work demonstrates the feasibility of accurate and automated burn characterization for AI and indicates that these systems can be improved with additional features when a human expert is combined with explainable AI. This is demonstrated on real data (human for segmentation and porcine for depth classification) and establishes the groundwork for further deep-learning thrusts in the area of burn analysis.

  3. Barany, A. ; Damsa, C. (Ed.)
    In quantitative ethnography (QE) studies, which often involve large datasets that cannot be entirely hand-coded by human raters, researchers have used supervised machine learning approaches to develop automated classifiers. However, QE researchers are rightly concerned with the amount of human coding that may be required to develop classifiers that achieve the high levels of accuracy that QE studies typically require. In this study, we compare a neural network, a powerful traditional supervised learning approach, with nCoder, an active learning technique commonly used in QE studies, to determine which technique requires the least human coding to produce a sufficiently accurate classifier. To do this, we constructed multiple training sets from a large dataset used in prior QE studies and designed a Monte Carlo simulation to test the performance of the two techniques systematically. Our results show that nCoder can achieve high predictive accuracy with significantly less human-coded data than a neural network.
  4.
    In recent years, enterprises have been targeted by advanced adversaries who leverage creative ways to infiltrate their systems and move laterally to gain access to critical data. One increasingly common evasive method is to hide the malicious activity behind a benign program by using tools that are already installed on user computers. These programs are usually part of the operating system distribution or another user-installed binary, therefore this type of attack is called “Living-Off-The-Land”. Detecting these attacks is challenging, as adversaries may not create malicious files on the victim computers and anti-virus scans fail to detect them. We propose the design of an Active Learning framework called LOLAL for detecting Living-Off-the-Land attacks that iteratively selects a set of uncertain and anomalous samples for labeling by a human analyst. LOLAL is specifically designed to work well when a limited number of labeled samples are available for training machine learning models to detect attacks. We investigate methods to represent command-line text using word-embedding techniques, and design ensemble boosting classifiers to distinguish malicious and benign samples based on the embedding representation. We leverage a large, anonymized dataset collected by an endpoint security product and demonstrate that our ensemble classifiers achieve an average F1 score of 96% at classifying different attack classes. We show that our active learning method consistently improves the classifier performance, as more training data is labeled, and converges in less than 30 iterations when starting with a small number of labeled instances. 
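    A minimal sketch of the embedding idea: each command line is represented as the mean of its token vectors. The tiny vocabulary below is invented for illustration; a real system would learn embeddings (e.g., word2vec-style) from a large command-line corpus.

```python
import numpy as np

# Toy 2-D embeddings, hand-made for illustration only.
EMBED = {
    "powershell": np.array([0.9, 0.1]),
    "-enc":       np.array([0.8, 0.3]),
    "whoami":     np.array([0.2, 0.7]),
}
UNK = np.zeros(2)  # out-of-vocabulary tokens map to the zero vector

def embed_command_line(cmd):
    """Represent a command line as the mean of its token embeddings."""
    tokens = cmd.lower().split()
    vecs = [EMBED.get(t, UNK) for t in tokens]
    return np.mean(vecs, axis=0)

vec = embed_command_line("powershell -enc whoami")
```

    The resulting fixed-length vector can then feed an ensemble boosting classifier, with active learning selecting uncertain or anomalous vectors for the analyst to label.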
  5. Parallel-laser photogrammetry is growing in popularity as a way to collect non-invasive body size data from wild mammals. Despite its many appeals, this method requires researchers to hand-measure (i) the pixel distance between the parallel laser spots (inter-laser distance) to produce a scale within the image, and (ii) the pixel distance between the study subject’s body landmarks (inter-landmark distance). This manual effort is time-consuming and introduces human error: a researcher measuring the same image twice will rarely return the same values both times (resulting in within-observer error), as is also the case when two researchers measure the same image (resulting in between-observer error). Here, we present two independent methods that automate the inter-laser distance measurement of parallel-laser photogrammetry images. One method uses machine learning and image processing techniques in Python, and the other uses image processing techniques in ImageJ. Both of these methods reduce labor and increase precision without sacrificing accuracy. We first introduce the workflow of the two methods. Then, using two parallel-laser datasets of wild mountain gorilla and wild savannah baboon images, we validate the precision of these two automated methods relative to manual measurements and to each other. We also estimate the reduction of variation in final body size estimates in centimeters when adopting these automated methods, as these methods have no human error. Finally, we highlight the strengths of each method, suggest best practices for adopting either of them, and propose future directions for the automation of parallel-laser photogrammetry data. 
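    The scale arithmetic being automated is simple; a sketch, assuming an illustrative 4 cm spacing between the parallel lasers (the automated methods replace the hand-measured inter-laser pixel distance):

```python
def body_size_cm(inter_laser_px, inter_landmark_px, laser_spacing_cm=4.0):
    """Convert an inter-landmark pixel distance to centimeters.

    The two laser spots are a known real-world distance apart
    (laser_spacing_cm, illustrative here), so their pixel separation
    fixes the image scale.
    """
    cm_per_px = laser_spacing_cm / inter_laser_px
    return inter_landmark_px * cm_per_px

# Lasers 4 cm apart appear 80 px apart, so a 600 px inter-landmark
# distance corresponds to 30 cm.
size = body_size_cm(80, 600)  # 30.0
```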