In this work, we address the problem of detecting anomalies in a laboratory automation setting. We first collect video images of liquid transfer in automated laboratory experiments. We mimic the real-world challenges of developing an anomaly detection model in two respects. First, the collected dataset is relatively small compared to large-scale video datasets. Second, the dataset suffers from class imbalance: the majority of the collected videos depict abnormal events. Consequently, existing learning-based video anomaly detection methods do not perform well. To address this, we develop a practical, human-engineered feature extraction method to detect anomalies in the liquid transfer videos. Our simple yet effective method outperforms state-of-the-art anomaly detection methods by a notable margin; in particular, it yields average improvements of 19% in AUC and 76% in Equal Error Rate. Our method also quantifies the anomalies, offering significant benefits for deployment in real-world experimental settings.
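As a reference for the reported metrics, the sketch below shows one standard way to compute AUC and Equal Error Rate (EER) from per-video anomaly scores using scikit-learn. The label and score arrays are illustrative placeholders, not the paper's data.

```python
# Minimal sketch: AUC and EER from anomaly scores (placeholder data).
import numpy as np
from sklearn.metrics import roc_curve, auc

def auc_and_eer(labels, scores):
    """labels: 1 = anomalous, 0 = normal; scores: higher = more anomalous."""
    fpr, tpr, _ = roc_curve(labels, scores)
    roc_auc = auc(fpr, tpr)
    # EER is the operating point where the false positive rate
    # equals the false negative rate (1 - TPR).
    idx = np.nanargmin(np.abs(fpr - (1.0 - tpr)))
    eer = (fpr[idx] + (1.0 - tpr[idx])) / 2.0
    return roc_auc, eer

labels = np.array([0, 0, 1, 1, 0, 1])
scores = np.array([0.1, 0.4, 0.8, 0.7, 0.2, 0.9])
print(auc_and_eer(labels, scores))  # perfectly separable toy scores -> (1.0, 0.0)
```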
Talking Detection in Collaborative Learning Environments.
We study the problem of detecting talking activities in collaborative learning videos. Our approach uses head detection and projections of the log-magnitude of optical flow vectors to reduce the problem to a simple classification of small projection images, without the need to train complex 3-D activity classification systems. The small projection images are then easily classified using a simple majority vote of standard classifiers. For talking detection, our proposed approach significantly outperforms single-activity systems, with an overall accuracy of 59% compared to 42% for the Temporal Segment Network (TSN) and 45% for Convolutional 3D (C3D). In addition, our method can detect multiple talking instances from multiple speakers, while also identifying the speakers themselves.
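A hedged sketch of the core idea follows: compute dense optical flow between consecutive frames, take the log-magnitude, project it onto the image axes to form a small feature vector, and classify with a majority vote of standard classifiers. The head-detection step and the paper's exact projection geometry are omitted; the Farneback parameters and classifier set are assumptions, with whole-frame grayscale crops standing in for detected head regions.

```python
# Sketch: log-magnitude optical-flow projections + majority-vote classification.
import cv2
import numpy as np
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

def flow_projection_features(prev_gray, curr_gray):
    """prev_gray, curr_gray: 8-bit single-channel frames of the same size."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    log_mag = np.log1p(np.linalg.norm(flow, axis=2))
    # Project the log-magnitude onto the horizontal and vertical axes.
    return np.concatenate([log_mag.mean(axis=0), log_mag.mean(axis=1)])

# Majority vote over standard classifiers (the paper's exact set may differ).
clf = VotingClassifier([
    ("lr", LogisticRegression(max_iter=1000)),
    ("knn", KNeighborsClassifier()),
    ("dt", DecisionTreeClassifier()),
], voting="hard")
# clf.fit(X, y) on features from labeled talking / not-talking clips.
```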
- PAR ID: 10310070
- Date Published:
- Journal Name: Computer Analysis of Images and Patterns. CAIP 2021. Lecture Notes in Computer Science.
- Volume: 13053
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Deep learning (DL) models have demonstrated state-of-the-art performance in the classification of diagnostic imaging in oncology. However, DL models for medical images can be compromised by adversarial images, in which input pixel values are manipulated to deceive the model. To address this limitation, our study investigates the detectability of adversarial images in oncology using multiple detection schemes. Experiments were conducted on thoracic computed tomography (CT) scans, mammography, and brain magnetic resonance imaging (MRI). For each dataset, we trained a convolutional neural network to classify the presence or absence of malignancy. We then trained five DL- and machine learning (ML)-based detection models and tested their performance in detecting adversarial images. Adversarial images generated using projected gradient descent (PGD) with a perturbation size of 0.004 were detected by the ResNet detection model with an accuracy of 100% for CT, 100% for mammography, and 90.0% for MRI. Overall, adversarial images were detected with high accuracy in settings where the adversarial perturbation was above set thresholds. Adversarial detection should be considered alongside adversarial training as a defense technique to protect DL models for cancer imaging classification from the threat of adversarial images.
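PGD itself is a standard attack; below is a minimal PyTorch sketch using the L-infinity budget quoted above (eps = 0.004). The step size `alpha`, the step count, and the [0, 1] pixel range are our assumptions; `model` stands for any trained classifier returning logits, and training the ResNet detector on clean versus attacked images is not shown.

```python
# Sketch: PGD attack within an L-inf ball of radius eps around the input.
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=0.004, alpha=0.001, steps=10):
    """x: batch of images in [0, 1]; y: true labels; returns adversarial images."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Ascend the loss, then project back into the eps-ball and valid range.
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.clamp(x_adv, x - eps, x + eps).clamp(0.0, 1.0)
    return x_adv.detach()
```

A detector in this setting is then a binary classifier (e.g., a ResNet) trained to distinguish clean inputs from `pgd_attack` outputs.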
- Disassembly is an essential process for the recovery of end-of-life (EOL) electronics in remanufacturing sites. Nevertheless, the process remains labor-intensive due to the high degree of uncertainty and complexity of EOL electronics. Robotic technology can assist in improving disassembly efficiency; however, the characteristics of EOL electronics pose difficulties for robot operation, such as removing small components. For such tasks, detecting small objects is critical for robotic disassembly systems. Screws are widely used as fasteners in ordinary electronic products, yet they are small and vary in shape within a scene. To enable robotic systems to disassemble screws, both the location information and the required tools need to be predicted. This paper proposes a computer vision framework for detecting screws and recommending the related tools for disassembly. First, a YOLOv4 algorithm detects screw targets in EOL electronic devices, and a screw image extraction mechanism is executed based on the position coordinates predicted by YOLOv4. Second, after obtaining the screw images, the EfficientNetv2 algorithm is applied for screw shape classification. In addition to proposing a framework for automatic small-object detection, we explore how to modify the object detection algorithm to improve its performance and discuss the sensitivity of tool recommendations to the detection predictions. A case study of three different types of screws in EOL electronics is used to evaluate the performance of the proposed framework.
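A hedged outline of such a two-stage detect-then-classify pipeline is sketched below. The YOLOv4 detector is abstracted behind a `detect_screws` callable (any detector returning boxes works); torchvision's EfficientNetV2-S stands in for the paper's EfficientNetv2 and would need fine-tuning on screw crops; the tool lookup table, input size, and class names are hypothetical.

```python
# Sketch: detect screws, classify each crop's shape, map shape -> tool.
import torch
import torch.nn.functional as F
from torchvision.models import efficientnet_v2_s

TOOL_FOR_SHAPE = {0: "Phillips #1", 1: "slotted", 2: "Torx T5"}  # hypothetical table

classifier = efficientnet_v2_s(num_classes=len(TOOL_FOR_SHAPE)).eval()  # fine-tune first

@torch.no_grad()
def recommend_tools(image, detect_screws):
    """image: float CHW tensor in [0, 1]; detect_screws(image) -> [(x1, y1, x2, y2), ...]."""
    recommendations = []
    for (x1, y1, x2, y2) in detect_screws(image):
        crop = image[:, y1:y2, x1:x2].unsqueeze(0)            # crop predicted box
        crop = F.interpolate(crop, size=(384, 384), mode="bilinear")
        shape_id = classifier(crop).argmax(dim=1).item()      # screw-shape class
        recommendations.append(((x1, y1, x2, y2), TOOL_FOR_SHAPE[shape_id]))
    return recommendations
```

Note how the tool recommendation depends entirely on the detector's boxes, which is exactly the sensitivity the paper discusses.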
- We introduce the problem of detecting a group of students from classroom videos. The problem requires detecting students from different angles and separating the group from other groups in long videos (one to one and a half hours). We use multiple image representations to solve the problem: FM components to separate each group from background groups, AM-FM components to detect the back of the head, and YOLO for face detection. We use classroom videos from four different groups to validate our approach. Our use of multiple representations is shown to be significantly more accurate than the use of YOLO alone.
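To make the AM-FM representation concrete, here is a rough sketch of one common way to extract a dominant-component amplitude (AM) image with a small complex Gabor filterbank in OpenCV. The filterbank parameters are assumptions, and the paper's actual filterbank, FM estimation, and group-separation steps are not reproduced; this only illustrates the kind of feature involved.

```python
# Sketch: dominant-component AM image via an 8-orientation Gabor filterbank.
import cv2
import numpy as np

def dominant_am(gray, ksize=31, sigma=4.0, lambd=8.0):
    """gray: single-channel image; returns per-pixel dominant amplitude."""
    img = gray.astype(np.float64)
    best_amp = np.zeros_like(img)
    for theta in np.linspace(0, np.pi, 8, endpoint=False):
        # Real and imaginary parts of one complex Gabor channel.
        k_re = cv2.getGaborKernel((ksize, ksize), sigma, theta, lambd, 0.5, 0)
        k_im = cv2.getGaborKernel((ksize, ksize), sigma, theta, lambd, 0.5, np.pi / 2)
        re = cv2.filter2D(img, -1, k_re)
        im = cv2.filter2D(img, -1, k_im)
        amp = np.sqrt(re**2 + im**2)           # instantaneous amplitude (AM)
        best_amp = np.maximum(best_amp, amp)   # keep dominant channel per pixel
    return best_amp
```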
- Aliannejadi, M.; Faggioli, G.; Ferro, N.; Vlachos, M. (Eds.) This work discusses the participation of CS_Morgan in the Concept Detection and Caption Prediction tasks of the ImageCLEFmedical 2023 Caption benchmark evaluation campaign. The goal of this task is to automatically identify relevant concepts and their locations in images, as well as to generate coherent captions for the images. The dataset used for this task is a subset of the extended Radiology Objects in Context (ROCO) dataset. Our implementation approach employed pre-trained Convolutional Neural Networks (CNNs), Vision Transformer (ViT), and Text-to-Text Transfer Transformer (T5) architectures. These models were leveraged to handle the different aspects of the tasks, such as concept detection and caption generation. In the Concept Detection task, the objective was to classify multiple concepts associated with each image. We utilized several deep learning architectures with sigmoid activation to enable multilabel classification using the Keras framework. We submitted five runs for this task, and the best run achieved an F1 score of 0.4834, indicating its effectiveness in detecting relevant concepts in the images. For the Caption Prediction task, we successfully submitted eight runs. Our approach combined the ViT and T5 models to generate captions for the images. Caption Prediction is ranked by BERTScore; our best run achieved a score of 0.5819, generating captions with the fine-tuned T5 model from keywords produced by the pretrained ViT encoder.
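A minimal Keras sketch of the multilabel concept-detection setup described above: a pre-trained CNN backbone topped with a sigmoid layer and trained with binary cross-entropy, so each concept becomes an independent yes/no prediction. The backbone choice, input size, and `NUM_CONCEPTS` are illustrative assumptions, not the team's exact configuration.

```python
# Sketch: multilabel concept classification with a sigmoid head in Keras.
import tensorflow as tf

NUM_CONCEPTS = 100  # placeholder for the concept vocabulary size

backbone = tf.keras.applications.DenseNet121(include_top=False, pooling="avg",
                                             input_shape=(224, 224, 3))
model = tf.keras.Sequential([
    backbone,
    tf.keras.layers.Dense(NUM_CONCEPTS, activation="sigmoid"),  # multilabel head
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[tf.keras.metrics.AUC(multi_label=True)])
# model.fit(images, labels, ...) where labels are multi-hot concept vectors.
```

Sigmoid with binary cross-entropy, rather than softmax, is what allows several concepts to be predicted for the same image.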