This content will become publicly available on July 1, 2026

Title: Early-Stage Sensor Data Fusion Pipeline Exploration Framework for Agriculture and Animal Welfare
Internet-of-Things (IoT) approaches are continually introducing new sensors into the fields of agriculture and animal welfare. The application of multi-sensor data fusion in these domains remains a complex, open-ended challenge that defies straightforward optimization and often requires iterative testing and refinement. To address this need, we created a new open-source framework and a corresponding Python tool, which we call the “Data Fusion Explorer (DFE)”. We demonstrated and evaluated the effectiveness of the proposed framework using four early-stage datasets from diverse disciplines, including animal/environmental tracking, agrarian monitoring, and food quality assessment. These datasets span several common formats, including single values, arrays, and images, and cover both classification and regression tasks as well as temporal and spatial distributions. We compared various pipeline schemes, such as low-level against mid-level fusion and the placement of dimensionality reduction, and, based on their space and time complexities, highlighted how different pipelines may suit different problems. For example, we observed that early feature extraction reduced time and space complexity in the agrarian data, and that independent component analysis slightly outperformed principal component analysis on a sweet potato imaging dataset. Lastly, we benchmarked the DFE tool against vanilla Python 3 packages on our four datasets’ pipelines and observed a significant reduction, usually more than 50%, in the amount of code users must write for almost every dataset, suggesting the usefulness of this package for interdisciplinary researchers in the field.
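The pipeline comparisons described in the abstract (low-level versus mid-level fusion, and where dimensionality reduction sits) follow a common pattern that can be sketched with standard scikit-learn primitives. The sketch below is illustrative only and does not use the DFE API; the synthetic sensor arrays, the choice of PCA/ICA components, and the classifier are all assumptions.

```python
# Illustrative sketch (not the DFE API): low-level vs. mid-level fusion
# of two synthetic sensor streams, with dimensionality reduction placed
# either after fusion (low-level) or before it (mid-level).
import numpy as np
from sklearn.decomposition import PCA, FastICA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 200
sensor_a = rng.normal(size=(n, 32))   # e.g., an environmental array sensor
sensor_b = rng.normal(size=(n, 16))   # e.g., an animal-borne sensor
y = rng.integers(0, 2, size=n)        # binary labels (classification case)

# Low-level fusion: concatenate raw signals, then reduce dimensionality.
low = PCA(n_components=8).fit_transform(np.hstack([sensor_a, sensor_b]))

# Mid-level fusion: extract features per sensor first, then concatenate.
feat_a = PCA(n_components=4).fit_transform(sensor_a)
feat_b = FastICA(n_components=4, random_state=0).fit_transform(sensor_b)
mid = np.hstack([feat_a, feat_b])

for name, X in [("low-level", low), ("mid-level", mid)]:
    score = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=5).mean()
    print(f"{name} fusion accuracy: {score:.2f}")
```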
Award ID(s):
1915599, 2037328, 2319389, 2344423, 1160483
PAR ID:
10618060
Author(s) / Creator(s):
; ;
Publisher / Repository:
MDPI
Date Published:
Journal Name:
AgriEngineering
Volume:
7
Issue:
7
ISSN:
2624-7402
Page Range / eLocation ID:
215
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract: High-throughput cell proliferation assays to quantify drug-response are becoming increasingly common and powerful with the emergence of improved automation and multi-time point analysis methods. However, pipelines for analysis of these datasets that provide reproducible, efficient, and interactive visualization and interpretation are sorely lacking. To address this need, we introduce Thunor, an open-source software platform to manage, analyze, and visualize large, dose-dependent cell proliferation datasets. Thunor supports both end-point and time-based proliferation assays as input. It provides a simple, user-friendly interface with interactive plots and publication-quality images of cell proliferation time courses, dose–response curves, and derived dose–response metrics, e.g. IC50, including across datasets or grouped by tags. Tags are categorical labels for cell lines and drugs, used for aggregation, visualization and statistical analysis, e.g. cell line mutation or drug class/target pathway. A graphical plate map tool is included to facilitate plate annotation with cell lines, drugs and concentrations upon data upload. Datasets can be shared with other users via point-and-click access control. We demonstrate the utility of Thunor to examine and gain insight from two large drug response datasets: a large, publicly available cell viability database and an in-house, high-throughput proliferation rate dataset. Thunor is available from www.thunor.net.
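Derived metrics such as IC50 come from standard dose–response curve fitting. A minimal sketch of that idea, assuming a three-parameter Hill model and synthetic viability data rather than anything from Thunor's own codebase:

```python
# Minimal dose-response sketch (not Thunor code): fit a Hill curve to
# synthetic viability data and read off the fitted half-effect
# concentration (the IC50 parameter of this model).
import numpy as np
from scipy.optimize import curve_fit

def hill(dose, e_max, ic50, h):
    """Fraction-of-control viability at a given dose."""
    return e_max + (1.0 - e_max) / (1.0 + (dose / ic50) ** h)

doses = np.logspace(-3, 2, 12)                    # uM, assumed range
true = hill(doses, e_max=0.1, ic50=0.5, h=1.2)
viability = true + np.random.default_rng(1).normal(0, 0.02, doses.size)

popt, _ = curve_fit(hill, doses, viability, p0=[0.0, 1.0, 1.0],
                    bounds=([0, 1e-4, 0.1], [1, 1e3, 5]))
print(f"Estimated IC50: {popt[1]:.2f} uM")
```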
  2. Modality fusion is a cornerstone of multimodal learning, enabling information integration from diverse data sources. However, vanilla fusion methods are limited by (1) inability to account for heterogeneous interactions between modalities and (2) lack of interpretability in uncovering the multimodal interactions inherent in the data. To this end, we propose I2MoE (Interpretable Multimodal Interaction-aware Mixture of Experts), an end-to-end MoE framework designed to enhance modality fusion by explicitly modeling diverse multimodal interactions, as well as providing interpretation on a local and global level. First, I2MoE utilizes different interaction experts with weakly supervised interaction losses to learn multimodal interactions in a data-driven way. Second, I2MoE deploys a reweighting model that assigns importance scores for the output of each interaction expert, which offers sample-level and dataset-level interpretation. Extensive evaluation of medical and general multimodal datasets shows that I2MoE is flexible enough to be combined with different fusion techniques, consistently improves task performance, and provides interpretation across various real-world scenarios. 
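The reweighting idea in I2MoE, several interaction experts whose outputs are combined by learned, sample-level importance scores, can be sketched generically. The module below is an assumption-laden illustration, not the authors' implementation; the expert count, dimensions, and softmax gate are all placeholders.

```python
# Generic mixture-of-experts reweighting sketch (not the I2MoE code).
import torch
import torch.nn as nn

class ReweightedFusion(nn.Module):
    def __init__(self, in_dim: int, hidden: int, n_experts: int):
        super().__init__()
        # Each "interaction expert" is a small MLP over the fused input.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                          nn.Linear(hidden, hidden))
            for _ in range(n_experts))
        # Gate produces per-sample importance scores for each expert;
        # these scores double as a sample-level interpretation signal.
        self.gate = nn.Linear(in_dim, n_experts)

    def forward(self, x):
        weights = torch.softmax(self.gate(x), dim=-1)            # (B, E)
        outs = torch.stack([e(x) for e in self.experts], dim=1)  # (B, E, H)
        fused = (weights.unsqueeze(-1) * outs).sum(dim=1)        # (B, H)
        return fused, weights

x = torch.randn(4, 24)   # e.g., concatenated modality embeddings
fused, w = ReweightedFusion(24, 32, n_experts=3)(x)
print(w)                 # per-sample expert importance scores
```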
  3. In this paper we present Sniffer Faster R-CNN++, an efficient camera-LiDAR late-fusion network for low-complexity, accurate object detection in autonomous driving scenarios. The proposed detection architecture operates on the output candidates of any 3D detector and on proposals from the region proposal network of any 2D detector to generate final prediction results. Compared with single-modality object detection approaches, fusion-based methods often struggle to integrate dissimilar data: the network models are complicated in nature and demand large computational overhead and resources, particularly the processing pipelines for training and inference in early-fusion and deep-fusion approaches. We therefore devise a late-fusion network that incorporates pre-trained, single-modality detectors without modification, performing association only at the detection level. In addition, because LiDAR-based methods fail to detect distant objects due to the sparsity of point clouds, we devise a proposal refinement algorithm that jointly optimizes detection candidates and assists detection of distant objects. Extensive experiments on both the 3D and 2D detection benchmarks of the challenging KITTI dataset illustrate that our proposed architecture significantly improves detection accuracy while accelerating detection speed.
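Detection-level association of this kind typically reduces to matching projected 3D boxes against 2D boxes by overlap. A minimal IoU-matching sketch, assuming the 3D detections have already been projected into the image frame (the projection step, greedy strategy, and threshold are placeholders, not the paper's algorithm):

```python
# Minimal detection-level association sketch (not the paper's algorithm):
# greedily match projected 3D detections to 2D detections by IoU.
import numpy as np

def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) image coordinates."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def associate(boxes_3d, boxes_2d, thresh=0.5):
    """Return (i, j) pairs where projected 3D box i matches 2D box j."""
    pairs, used = [], set()
    for i, a in enumerate(boxes_3d):
        j_best = max(range(len(boxes_2d)),
                     key=lambda j: -1 if j in used else iou(a, boxes_2d[j]),
                     default=None)
        if (j_best is not None and j_best not in used
                and iou(a, boxes_2d[j_best]) >= thresh):
            pairs.append((i, j_best))
            used.add(j_best)
    return pairs

lidar_proj = [(10, 10, 50, 80), (200, 40, 260, 120)]   # projected 3D boxes
camera_det = [(12, 12, 48, 78), (400, 50, 440, 100)]   # 2D detector output
print(associate(lidar_proj, camera_det))               # -> [(0, 0)]
```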
  4. As phenomics data volume and dimensionality increase due to advancements in sensor technology, there is an urgent need to develop and implement scalable data processing pipelines. Current phenomics data processing pipelines lack modularity, extensibility, and processing distribution across sensor modalities and phenotyping platforms. To address these challenges, we developed PhytoOracle (PO), a suite of modular, scalable pipelines for processing large volumes of field phenomics RGB, thermal, PSII chlorophyll fluorescence 2D images, and 3D point clouds. PhytoOracle aims to (i) improve data processing efficiency; (ii) provide an extensible, reproducible computing framework; and (iii) enable data fusion of multi-modal phenomics data. PhytoOracle integrates open-source distributed computing frameworks for parallel processing on high-performance computing, cloud, and local computing environments. Each pipeline component is available as a standalone container, providing transferability, extensibility, and reproducibility. The PO pipeline extracts and associates individual plant traits across sensor modalities and collection time points, representing a unique multi-system approach to addressing the genotype-phenotype gap. To date, PO supports lettuce and sorghum phenotypic trait extraction, with a goal of widening the range of supported species in the future. At the maximum number of cores tested in this study (1,024 cores), PO processing times were: 235 minutes for 9,270 RGB images (140.7 GB), 235 minutes for 9,270 thermal images (5.4 GB), and 13 minutes for 39,678 PSII images (86.2 GB). These processing times represent end-to-end processing, from raw data to fully processed numerical phenotypic trait data. Repeatability values of 0.39-0.95 (bounding area), 0.81-0.95 (axis-aligned bounding volume), 0.79-0.94 (oriented bounding volume), 0.83-0.95 (plant height), and 0.81-0.95 (number of points) were observed in Field Scanalyzer data. We also show the ability of PO to process drone data with a repeatability of 0.55-0.95 (bounding area).
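Repeatability values like those reported above are typically computed as an intraclass correlation: the share of trait variance attributable to genotype rather than measurement noise. A small sketch of that calculation from a one-way ANOVA on synthetic trait data (the balanced design and the estimator below are assumptions; the PhytoOracle study may use a different, e.g. mixed-model, estimator):

```python
# Repeatability sketch (assumed one-way ANOVA estimator, balanced design):
# R = (MS_between - MS_within) / (MS_between + (k - 1) * MS_within),
# where k is the number of replicate measurements per genotype.
import numpy as np

rng = np.random.default_rng(2)
g, k = 30, 4                                   # genotypes x replicates
geno_effect = rng.normal(0, 2.0, size=(g, 1))  # between-genotype variation
trait = 50 + geno_effect + rng.normal(0, 1.0, size=(g, k))  # e.g., height

grand = trait.mean()
ms_between = k * ((trait.mean(axis=1) - grand) ** 2).sum() / (g - 1)
ms_within = ((trait - trait.mean(axis=1, keepdims=True)) ** 2).sum() / (g * (k - 1))
repeatability = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
print(f"repeatability: {repeatability:.2f}")   # ~ 4 / (4 + 1) = 0.8 here
```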
  5. Abstract: The world’s coastlines are spatially highly variable, coupled human-natural systems that comprise a nested hierarchy of component landforms, ecosystems, and human interventions, each interacting over a range of space and time scales. Understanding and predicting coastline dynamics necessitates frequent observation from imaging sensors on remote sensing platforms. Machine Learning models that carry out supervised (i.e., human-guided) pixel-based classification, or image segmentation, have transformative applications in spatio-temporal mapping of dynamic environments, including transient coastal landforms, sediments, habitats, waterbodies, and water flows. However, these models require large and well-documented training and testing datasets consisting of labeled imagery. We describe “Coast Train,” a multi-labeler dataset of orthomosaic and satellite images of coastal environments and corresponding labels. These data include imagery that are diverse in space and time, and contain 1.2 billion labeled pixels, representing over 3.6 million hectares. We use a human-in-the-loop tool especially designed for rapid and reproducible Earth surface image segmentation. Our approach permits image labeling by multiple labelers, in turn enabling quantification of pixel-level agreement over individual and collections of images.
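Pixel-level agreement across multiple labelers, as quantified for Coast Train, can be illustrated with simple per-pixel statistics. The sketch below computes a per-pixel modal (consensus) label and the fraction of pixels on which all labelers agree; it is a generic illustration with synthetic label maps, not the project's actual agreement metric or tooling:

```python
# Generic multi-labeler agreement sketch (not Coast Train's tooling):
# per-pixel modal label and the fraction of pixels with full agreement.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
h, w, n_labelers, n_classes = 64, 64, 3, 5
# Stack of label images, one per labeler, mostly agreeing with a base map.
base = rng.integers(0, n_classes, size=(h, w))
labels = np.stack([np.where(rng.random((h, w)) < 0.9, base,
                            rng.integers(0, n_classes, size=(h, w)))
                   for _ in range(n_labelers)])

modal = stats.mode(labels, axis=0, keepdims=False).mode    # consensus map
full_agreement = (labels == labels[0]).all(axis=0).mean()  # all agree
print("consensus map shape:", modal.shape)
print(f"fraction of pixels with full agreement: {full_agreement:.2f}")
```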