

This content will become publicly available on March 1, 2026

Title: Point-cloud based machine learning for classifying rare events in the Active-Target Time Projection Chamber
Abstract: In this work, we assess the use of machine learning to classify fission events in the Active Target Time Projection Chamber (AT-TPC) using data from an experiment performed at the National Superconducting Cyclotron Laboratory at Michigan State University. The experiment produces an extremely large quantity of data, less than 3% of which are fission events. Therefore, separating fission events from the background beam events is critical to an efficient analysis. A heuristic method was developed to classify events as Fission or Non-Fission based on hand-tuned parameters. However, this heuristic method places 5% of all events into an Unlabeled category, including 15% of all fission events. We present a PointNet model trained on the data labeled by the heuristic method. This model is then used to generate labels for the events in the Unlabeled category. Using the heuristic and machine learning methods together, we can successfully identify 99% of fission events.
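As an illustration of the approach the abstract describes, the sketch below shows a minimal PointNet-style classifier for point-cloud events in PyTorch. The four per-point features (x, y, z, charge), the layer widths, and the two output classes are assumptions for illustration, not the paper's published architecture or hyperparameters.

```python
# Minimal PointNet-style binary classifier for AT-TPC-like point clouds.
# Sketch only: the 4 input features (x, y, z, charge), layer widths, and the
# two classes (Fission / Non-Fission) are assumptions, not the paper's setup.
import torch
import torch.nn as nn

class PointNetClassifier(nn.Module):
    def __init__(self, in_features=4, num_classes=2):
        super().__init__()
        # Shared per-point MLP, implemented as 1x1 convolutions over the point axis.
        self.point_mlp = nn.Sequential(
            nn.Conv1d(in_features, 64, 1), nn.BatchNorm1d(64), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.BatchNorm1d(128), nn.ReLU(),
            nn.Conv1d(128, 1024, 1), nn.BatchNorm1d(1024), nn.ReLU(),
        )
        # Classifier head applied to the pooled global feature.
        self.head = nn.Sequential(
            nn.Linear(1024, 256), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(256, num_classes),
        )

    def forward(self, points):
        # points: (batch, num_points, in_features) -> (batch, in_features, num_points)
        x = self.point_mlp(points.transpose(1, 2))
        # Max pooling over points makes the model invariant to point ordering.
        global_feature = x.max(dim=2).values
        return self.head(global_feature)

# Example: a batch of 8 events, each with 512 pad hits.
model = PointNetClassifier()
logits = model(torch.randn(8, 512, 4))  # -> (8, 2) class scores
```

The symmetric max pooling over points is what makes a PointNet-style model insensitive to the ordering of hits within an event, which is why it suits detector point clouds.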
Award ID(s): 2012865, 2209145
PAR ID: 10584914
Author(s) / Creator(s):
Publisher / Repository: Elsevier
Date Published:
Journal Name: Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment
Volume: 1072
Issue: C
ISSN: 0168-9002
Page Range / eLocation ID: 170002
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Current text classification methods typically require a large number of human-labeled documents as training data, which can be costly and difficult to obtain in real applications. Humans can perform classification without seeing any labeled examples, based only on a small set of words describing the categories to be classified. In this paper, we explore the potential of using only the label name of each class to train classification models on unlabeled data, without using any labeled documents. We use pre-trained neural language models both as general linguistic knowledge sources for category understanding and as representation learning models for document classification. Our method (1) associates semantically related words with the label names, (2) finds category-indicative words and trains the model to predict their implied categories, and (3) generalizes the model via self-training. We show that our model achieves around 90% accuracy on four benchmark datasets including topic and sentiment classification, without using any labeled documents, learning from unlabeled data supervised by at most 3 words (1 in most cases) per class as the label name.
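A rough sketch of the label-name-only idea, assuming the Hugging Face transformers library: a pretrained masked language model expands each label name into related words, which then pseudo-label unlabeled documents by simple keyword voting. The model name, prompt template, and voting rule are illustrative assumptions, not the paper's full method (which also trains and self-trains a classifier).

```python
# Expand each class's label name into related words with a pretrained masked
# LM, then pseudo-label unlabeled documents by keyword voting. Illustrative
# sketch only; label names, template, and voting rule are assumptions.
from transformers import pipeline

label_names = {"sports": "sports", "business": "business", "technology": "technology"}

unmasker = pipeline("fill-mask", model="bert-base-uncased", top_k=20)

def related_words(label_word):
    # Ask the masked LM which words fit where the label name would appear.
    preds = unmasker(f"This article is about {label_word}, also known as [MASK].")
    return {p["token_str"].strip().lower() for p in preds} | {label_word}

vocab = {cls: related_words(w) for cls, w in label_names.items()}

def pseudo_label(document):
    # Vote by counting category-indicative words; return None if no class matches.
    tokens = document.lower().split()
    counts = {cls: sum(t in words for t in tokens) for cls, words in vocab.items()}
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else None

print(pseudo_label("The striker scored twice as the team won the league match."))
```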
  2. Atmospheric gravity waves are produced when gravity attempts to restore disturbances through stable layers in the atmosphere. They have a visible effect on many atmospheric phenomena such as global circulation and air turbulence. Despite their importance, however, little research has been conducted on how to detect gravity waves using machine learning algorithms. We faced two major challenges in our research: our raw data had a lot of noise and the labeled dataset was extremely small. In this study, we explored various methods of preprocessing and transfer learning in order to address those challenges. We pre-trained an autoencoder on unlabeled data before training it to classify labeled data. We also created a custom CNN by combining certain pre-trained layers from the InceptionV3 Model trained on ImageNet with custom layers and a custom learning rate scheduler. Experiments show that our best model outperformed the best performing baseline model by 6.36% in terms of test accuracy. 
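The transfer-learning setup described above can be approximated with the Keras sketch below: pre-trained InceptionV3 layers act as a frozen feature extractor, a small custom head performs the binary classification, and a custom learning-rate schedule is attached as a callback. The input size, head layers, and schedule are assumptions, not the study's exact configuration.

```python
# Frozen InceptionV3 backbone + custom head + custom LR schedule.
# Sketch under assumed input size and layer widths, not the study's model.
import tensorflow as tf

base = tf.keras.applications.InceptionV3(
    weights="imagenet", include_top=False, input_shape=(299, 299, 3))
base.trainable = False  # keep the ImageNet features fixed

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # gravity wave vs. background
])

def lr_schedule(epoch, lr):
    # Illustrative schedule: hold the rate for 5 epochs, then halve it each epoch.
    return lr if epoch < 5 else lr * 0.5

model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
              loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=20,
#           callbacks=[tf.keras.callbacks.LearningRateScheduler(lr_schedule)])
```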
  3. Machine learning systems deployed in the wild are often trained on a source distribution but deployed on a different target distribution. Unlabeled data can be a powerful point of leverage for mitigating these distribution shifts, as it is frequently much more available than labeled data and can often be obtained from distributions beyond the source distribution as well. However, existing distribution shift benchmarks with unlabeled data do not reflect the breadth of scenarios that arise in real-world applications. In this work, we present the WILDS 2.0 update, which extends 8 of the 10 datasets in the WILDS benchmark of distribution shifts to include curated unlabeled data that would be realistically obtainable in deployment. These datasets span a wide range of applications (from histology to wildlife conservation), tasks (classification, regression, and detection), and modalities (photos, satellite images, microscope slides, text, molecular graphs). The update maintains consistency with the original WILDS benchmark by using identical labeled training, validation, and test sets, as well as the evaluation metrics. On these datasets, we systematically benchmark state-of-the-art methods that leverage unlabeled data, including domain-invariant, self-training, and self-supervised methods, and show that their success on WILDS is limited. To facilitate method development and evaluation, we provide an open-source package that automates data loading and contains all of the model architectures and methods used in this paper. Code and leaderboards are available at https://wilds.stanford.edu.
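Of the unlabeled-data strategies the benchmark evaluates, self-training is perhaps the simplest to sketch. The PyTorch snippet below shows a minimal pseudo-labeling step with placeholder tensors and an assumed confidence threshold; it is not the WILDS package's own API.

```python
# Minimal self-training (pseudo-labeling) step: label confident predictions on
# unlabeled target data and mix them into the supervised loss. The threshold,
# model, and tensors are placeholders for illustration.
import torch
import torch.nn.functional as F

def pseudo_label(model, unlabeled_x, threshold=0.9):
    """Keep only unlabeled examples the current model predicts confidently."""
    model.eval()
    with torch.no_grad():
        probs = F.softmax(model(unlabeled_x), dim=1)
        conf, labels = probs.max(dim=1)
    keep = conf >= threshold
    return unlabeled_x[keep], labels[keep]

def self_train_step(model, optimizer, labeled, unlabeled_x):
    x, y = labeled
    px, py = pseudo_label(model, unlabeled_x)
    model.train()
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x), y)
    if len(px) > 0:  # add the pseudo-labeled target-domain examples
        loss = loss + F.cross_entropy(model(px), py)
    loss.backward()
    optimizer.step()
    return loss.item()
```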
  4. A data falsification attack in Vehicular Ad hoc Networks (VANET) for the Internet of Vehicles (IoV) corrupts the data exchanged between nodes with false information. Data is a highly valuable asset from which many analyses and results can be drawn, but the privacy concerns raised by users have become the greatest hindrance to performing data analysis. In IoV, misbehavior detection can be performed by building a machine learning model from the basic safety message (BSM) dataset of vehicles. We propose a privacy-preserving misbehavior detection system for IoV using federated machine learning. Vehicles in the VANET are given an initial model to train locally using their own data. In doing so, we obtain a collective smart model that can classify position falsification attacks in VANET using the data generated by each vehicle, without actually sharing that data with any third party for analysis. In this paper, we compare the performance of the attack detection model trained with the federated approach against the centralized approach. The federated training method trains the model on different kinds of position falsification attacks using the local BSM data generated on each vehicle.
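A simplified federated averaging (FedAvg) round in PyTorch illustrates the training scheme described above: each vehicle trains a copy of the shared model on its own BSM data, and only the weights are averaged on the server. The model, feature layout, and loop details are assumptions, not the paper's implementation.

```python
# Simplified FedAvg round: local training on each vehicle's data, then weight
# averaging on the server. Model and data loaders are assumed placeholders.
import copy
import torch
import torch.nn as nn

def local_update(global_model, local_data, epochs=1, lr=0.01):
    """Train a local copy on one vehicle's data; the raw data never leaves it."""
    model = copy.deepcopy(global_model)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for x, y in local_data:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    return model.state_dict()

def federated_average(state_dicts):
    """Average the locally trained weights into the next global model."""
    avg = copy.deepcopy(state_dicts[0])
    for key in avg:
        for sd in state_dicts[1:]:
            avg[key] = avg[key] + sd[key]
        avg[key] = avg[key] / len(state_dicts)
    return avg

# One communication round over all participating vehicles (hypothetical loaders):
# global_model.load_state_dict(
#     federated_average([local_update(global_model, d) for d in vehicle_loaders]))
```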