Title: A Novel Spatial Data Pipeline for Orchestrating Apache NiFi/MiNiFi
In many smart city projects, lidar data is a common choice for capturing spatial information, but this decision often causes severe growing pains within the existing infrastructure. In this article, the authors introduce a data pipeline that orchestrates Apache NiFi (NiFi), Apache MiNiFi (MiNiFi), and several other tools as an automated solution to relay and archive lidar data captured by deployed edge devices. The lidar sensors used in this workflow are Velodyne Ultra Puck sensors, which produce 6-7 GB of packet capture (PCAP) files per hour. Two approaches were tested: compressing the file after capture and compressing it in real time. In both cases, GZIP and XZ reduced file size considerably (by 2-5 GB), cut transmission time by 5 minutes, and saved considerable CPU time. To evaluate the capabilities of the system design, the features of this data pipeline were compared against existing third-party services, Globus and RSync.
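The two compression modes mentioned in the abstract can be illustrated with a minimal sketch using only the Python standard library. This is not the authors' NiFi/MiNiFi implementation; the function names, chunk size, and file paths are hypothetical, and the sketch only shows how GZIP and XZ could be applied to a finished capture file versus a stream of incoming packet chunks.

```python
import gzip
import lzma
import os
import tempfile

CHUNK_SIZE = 1 << 20  # 1 MiB read size; an arbitrary choice for this sketch


def compress_after_capture(src_path, dst_path, opener):
    """Compress a finished capture file (post-capture mode).

    `opener` is gzip.open or lzma.open; both accept (path, "wb").
    Returns the size of the compressed file in bytes.
    """
    with open(src_path, "rb") as src, opener(dst_path, "wb") as dst:
        while chunk := src.read(CHUNK_SIZE):
            dst.write(chunk)
    return os.path.getsize(dst_path)


def compress_stream(chunks, dst_path, opener):
    """Compress chunks as they arrive (real-time mode).

    `chunks` is any iterable of bytes objects, standing in for
    packets read from a sensor. Returns the compressed size in bytes.
    """
    with opener(dst_path, "wb") as dst:
        for chunk in chunks:
            dst.write(chunk)
    return os.path.getsize(dst_path)


if __name__ == "__main__":
    # Synthetic, highly repetitive payload standing in for a PCAP file.
    payload = b"\x00\x01\x02\x03" * 250_000
    workdir = tempfile.mkdtemp()
    src = os.path.join(workdir, "capture.pcap")
    with open(src, "wb") as f:
        f.write(payload)

    for name, opener, ext in (("GZIP", gzip.open, ".gz"), ("XZ", lzma.open, ".xz")):
        size = compress_after_capture(src, src + ext, opener)
        print(f"{name}: {len(payload)} bytes -> {size} bytes")
```

Real lidar captures compress far less readily than this synthetic payload, so the ratios printed here say nothing about the savings reported in the article.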
Award ID(s): 2209806
PAR ID: 10525284
Publisher / Repository: IGI Global Open Access Collection
Journal Name: International Journal of Software Innovation
Volume: 12
Issue: 1
ISSN: 2166-7160
Page Range / eLocation ID: 1 to 14
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. This paper introduces a novel LiDAR point cloud data encoding solution that is compact, flexible, and fully supports distributed data storage within the Hadoop distributed computing environment. The proposed data encoding solution is developed based on Sequence File and Google Protocol Buffers. Sequence File is a generic splittable binary file format built into the Hadoop framework for storage of arbitrary binary data. The key challenge in adopting the Sequence File format for LiDAR data lies in encoding the data as binary sequences so that it is represented compactly while still allowing necessary mutation. For that purpose, a data encoding solution based on Google Protocol Buffers (a language-neutral, cross-platform, extensible data serialisation framework) was developed and evaluated. Since neither of the underlying technologies is sufficient on its own to completely and efficiently represent all necessary point formats for distributed computing, an innovative fusion of them was required to provide a viable data storage solution. This paper presents the details of such a data encoding implementation and rigorously evaluates its efficiency. Benchmarking was done against a straightforward, naive text encoding implementation using a high-density aerial LiDAR scan of a portion of Dublin, Ireland. The results demonstrated a 6-fold reduction in data volume, a 4-fold reduction in database ingestion time, and up to a 5-fold reduction in querying time.
  2. Image sensors capable of capturing individual photons have made tremendous progress in recent years. However, this technology faces a major limitation. Because they capture scene information at the individual photon level, the raw data is sparse and noisy. Here we propose CASPI: Collaborative Photon Processing for Active Single-Photon Imaging, a technology-agnostic, application-agnostic, and training-free photon processing pipeline for emerging high-resolution single-photon cameras. By collaboratively exploiting both local and non-local correlations in the spatio-temporal photon data cubes, CASPI estimates scene properties reliably even under very challenging lighting conditions. We demonstrate the versatility of CASPI with two applications: LiDAR imaging over a wide range of photon flux levels, from a sub-photon to high ambient regimes, and live-cell autofluorescence FLIM in low photon count regimes. We envision CASPI as a basic building block of general-purpose photon processing units that will be implemented on-chip in future single-photon cameras.
  3. IoT devices influence many different spheres of society and are predicted to have a huge impact on our future. Extracting real-time insights from diverse sensor data and dealing with the underlying uncertainty of sensor data are two main challenges of the IoT ecosystem. In this paper, we propose a data processing architecture, M-DB, to effectively integrate and continuously monitor uncertain and diverse IoT data. M-DB consists of three components: (1) model-based operators (MBO), data management abstractions that let IoT application developers integrate data from diverse sensors and that support event-detection and statistical aggregation operators; (2) M-Stream, a dataflow pipeline that combines model-based operators to perform computations reflecting the uncertainty of the underlying data; and (3) M-Store, a storage layer that separates the computation of application logic from physical sensor data management in order to deal effectively with missing or delayed sensor data. M-DB is designed and implemented over Apache Storm and Apache Kafka, two open-source distributed event processing systems. The application examples throughout the paper and the evaluation results show that M-DB provides a real-time data-processing architecture that can cater to the diverse needs of IoT applications.
  4. The spatio-temporal variability of temperatures in cities impacts human well-being, particularly in a large metropolis. Low-cost sensors now allow the observation of urban temperatures at a much finer resolution, and, in recent years, there has been a proliferation of fixed and mobile monitoring networks. However, how to design such networks to maximize the information content of collected data remains an open challenge. In this study, we investigate the performance of different measurement networks and strategies by deploying virtual sensors to sample the temperature data set in high-resolution weather simulations in four American cities. Results show that, with proper designs and a sufficient number of sensors, fixed networks can capture the spatio-temporal variations of temperatures within the cities reasonably well. Based on the simulation study, the key to optimizing fixed sensor location is to capture the whole range of impervious fractions. Randomly moving mobile systems consistently outperform optimized fixed systems in measuring the trend of monthly mean temperatures, but they underperform in detecting mean daily maximum temperatures with errors up to 5 °C. For both networks, the grand challenge is to capture anomalous temperatures under extreme events of short duration, such as heat waves. Here, we show that hybrid networks are more robust systems under extreme events, reducing errors by more than 50%, because the time span of extreme events detected by fixed sensors and the spatial information measured by mobile sensors can complement each other. The main conclusion of this study concerns the importance of optimizing network design for enhancing the effectiveness of urban measurements.
  5. Wireless Body Area Networks (WBANs) are pivotal in health care and wearable technologies, enabling seamless communication between miniature sensors and devices on or within the human body. These biosensors capture critical physiological parameters, ranging from body temperature and blood oxygen levels to real-time electrocardiogram readings. However, WBANs face significant challenges during and after deployment, including energy conservation, security, reliability, and failure vulnerability. Sensor nodes, which are often battery-operated, expend considerable energy during sensing and transmission due to inherent spatiotemporal patterns in biomedical data streams. This paper provides a comprehensive survey of data-driven approaches that address these challenges, focusing on device placement and routing, sampling rate calibration, and the application of machine learning (ML) and statistical learning techniques to enhance network performance. Additionally, we validate three existing models (statistical, ML, and coding-based models) using two real datasets, namely the MIMIC clinical database and biomarkers collected from six subjects with a prototype biosensing device developed by our team. Our findings offer insights into strategies for optimizing energy efficiency while ensuring security and reliability in WBANs. We conclude by outlining future directions to leverage approaches to meet the evolving demands of healthcare applications. 