skip to main content


Title: Benchmark Tests of Atom Segmentation Deep Learning Models with a Consistent Dataset
Abstract The information content of atomic-resolution scanning transmission electron microscopy (STEM) images can often be reduced to a handful of parameters describing each atomic column, chief among which is the column position. Neural networks (NNs) are high performance, computationally efficient methods to automatically locate atomic columns in images, which has led to a profusion of NN models and associated training datasets. We have developed a benchmark dataset of simulated and experimental STEM images and used it to evaluate the performance of two sets of recent NN models for atom location in STEM images. Both models exhibit high performance for images of varying quality from several different crystal lattices. However, there are important differences in performance as a function of image quality, and both models perform poorly for images outside the training data, such as interfaces with large difference in background intensity. Both the benchmark dataset and the models are available using the Foundry service for dissemination, discovery, and reuse of machine learning models.  more » « less
Award ID(s):
1931298
NSF-PAR ID:
10465237
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
Microscopy and Microanalysis
Volume:
29
Issue:
2
ISSN:
1431-9276
Page Range / eLocation ID:
552 to 562
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Advances in visual perceptual tasks have been mainly driven by the amount, and types, of annotations of large-scale datasets. Researchers have focused on fully-supervised settings to train models using offline epoch-based schemes. Despite the evident advancements, limitations and cost of manually annotated datasets have hindered further development for event perceptual tasks, such as detection and localization of objects and events in videos. The problem is more apparent in zoological applications due to the scarcity of annotations and length of videos-most videos are at most ten minutes long. Inspired by cognitive theories, we present a self-supervised perceptual prediction framework to tackle the problem of temporal event segmentation by building a stable representation of event-related objects. The approach is simple but effective. We rely on LSTM predictions of high-level features computed by a standard deep learning backbone. For spatial segmentation, the stable representation of the object is used by an attention mechanism to filter the input features before the prediction step. The self-learned attention maps effectively localize the object as a side effect of perceptual prediction. We demonstrate our approach on long videos from continuous wildlife video monitoring, spanning multiple days at 25 FPS. We aim to facilitate automated ethogramming by detecting and localizing events without the need for labels. Our approach is trained in an online manner on streaming input and requires only a single pass through the video, with no separate training set. Given the lack of long and realistic (includes real-world challenges) datasets, we introduce a new wildlife video dataset–nest monitoring of the Kagu (a flightless bird from New Caledonia)–to benchmark our approach. Our dataset features a video from 10 days (over 23 million frames) of continuous monitoring of the Kagu in its natural habitat. We annotate every frame with bounding boxes and event labels. Additionally, each frame is annotated with time-of-day and illumination conditions. We will make the dataset, which is the first of its kind, and the code available to the research community. We find that the approach significantly outperforms other self-supervised, traditional (e.g., Optical Flow, Background Subtraction) and NN-based (e.g., PA-DPC, DINO, iBOT), baselines and performs on par with supervised boundary detection approaches (i.e., PC). At a recall rate of 80%, our best performing model detects one false positive activity every 50 min of training. On average, we at least double the performance of self-supervised approaches for spatial segmentation. Additionally, we show that our approach is robust to various environmental conditions (e.g., moving shadows). We also benchmark the framework on other datasets (i.e., Kinetics-GEBD, TAPOS) from different domains to demonstrate its generalizability. The data and code are available on our project page:https://aix.eng.usf.edu/research_automated_ethogramming.html

     
    more » « less
  2. This work presents a novel deep learning architecture called BNU-Net for the purpose of cardiac segmentation based on short-axis MRI images. Its name is derived from the Batch Normalized (BN) U-Net architecture for medical image segmentation. New generations of deep neural networks (NN) are called convolutional NN (CNN). CNNs like U-Net have been widely used for image classification tasks. CNNs are supervised training models which are trained to learn hierarchies of features automatically and robustly perform classification. Our architecture consists of an encoding path for feature extraction and a decoding path that enables precise localization. We compare this approach with a parallel approach named U-Net. Both BNU-Net and U-Net are cardiac segmentation approaches: while BNU-Net employs batch normalization to the results of each convolutional layer and applies an exponential linear unit (ELU) approach that operates as activation function, U-Net does not apply batch normalization and is based on Rectified Linear Units (ReLU). The presented work (i) facilitates various image preprocessing techniques, which includes affine transformations and elastic deformations, and (ii) segments the preprocessed images using the new deep learning architecture. We evaluate our approach on a dataset containing 805 MRI images from 45 patients. The experimental results reveal that our approach accomplishes comparable or better performance than other state-of-the-art approaches in terms of the Dice coefficient and the average perpendicular distance. 
    more » « less
  3. This work presents a novel deep learning architecture called BNU-Net for the purpose of cardiac segmentation based on short-axis MRI images. Its name is derived from the Batch Normalized (BN) U-Net architecture for medical image segmentation. New generations of deep neural networks (NN) are called convolutional NN (CNN). CNNs like U-Net have been widely used for image classification tasks. CNNs are supervised training models which are trained to learn hierarchies of features automatically and robustly perform classification. Our architecture consists of an encoding path for feature extraction and a decoding path that enables precise localization. We compare this approach with a parallel approach named U-Net. Both BNU-Net and U-Net are cardiac segmentation approaches: while BNU-Net employs batch normalization to the results of each convolutional layer and applies an exponential linear unit (ELU) approach that operates as activation function, U-Net does not apply batch normalization and is based on Rectified Linear Units (ReLU). The presented work (i) facilitates various image preprocessing techniques, which includes affine transformations and elastic deformations, and (ii) segments the preprocessed images using the new deep learning architecture. We evaluate our approach on a dataset containing 805 MRI images from 45 patients. The experimental results reveal that our approach accomplishes comparable or better performance than other state-of-the-art approaches in terms of the Dice coefficient and the average perpendicular distance. Index Terms—Magnetic Resonance Imaging; Batch Normalization; Exponential Linear Units 
    more » « less
  4. null (Ed.)
    Artificial neural networks (NNs) in deep learning systems are critical drivers of emerging technologies such as computer vision, text classification, and natural language processing. Fundamental to their success is the development of accurate and efficient NN models. In this article, we report our work on Deep-n-Cheap—an open-source automated machine learning (AutoML) search framework for deep learning models. The search includes both architecture and training hyperparameters and supports convolutional neural networks and multi-layer perceptrons, applicable to multiple domains. Our framework is targeted for deployment on both benchmark and custom datasets, and as a result, offers a greater degree of search space customizability as compared to a more limited search over only pre-existing models from literature. We also introduce the technique of ‘search transfer’, which demonstrates the generalization capabilities of the models found by our framework to multiple datasets. Deep-n-Cheap includes a user-customizable complexity penalty which trades off performance with training time or number of parameters. Specifically, our framework can find models with performance comparable to state-of-the- art while taking 1–2 orders of magnitude less time to train than models from other AutoML and model search frameworks. Additionally, we investigate and develop insight into the search process that should aid future development of deep learning models. 
    more » « less
  5. For decades, atomistic modeling has played a crucial role in predicting the behavior of materials in numerous fields ranging from nanotechnology to drug discovery. The most accurate methods in this domain are rooted in first-principles quantum mechanical calculations such as density functional theory (DFT). Because these methods have remained computationally prohibitive, practitioners have traditionally focused on defining physically motivated closed-form expressions known as empirical interatomic potentials (EIPs) that approximately model the interactions between atoms in materials. In recent years, neural network (NN)-based potentials trained on quantum mechanical (DFT-labeled) data have emerged as a more accurate alternative to conventional EIPs. However, the generalizability of these models relies heavily on the amount of labeled training data, which is often still insufficient to generate models suitable for general-purpose applications. In this paper, we propose two generic strategies that take advantage of unlabeled training instances to inject domain knowledge from conventional EIPs to NNs in order to increase their generalizability. The first strategy, based on weakly supervised learning, trains an auxiliary classifier on EIPs and selects the best-performing EIP to generate energies to supplement the ground-truth DFT energies in training the NN. The second strategy, based on transfer learning, first pretrains the NN on a large set of easily obtainable EIP energies, and then fine-tunes it on ground-truth DFT energies. Experimental results on three benchmark datasets demonstrate that the first strategy improves baseline NN performance by 5% to 51% while the second improves baseline performance by up to 55%. Combining them further boosts performance. 
    more » « less