Title: Local Temporal Compression for (Globally) Evolving Spatial Surfaces
Advances in the Internet of Things (IoT) paradigm have enabled the generation of large volumes of data from multiple domains, capturing the evolution of various physical and social phenomena of interest. One consequence of such enormous data generation is that the data must be stored, processed, and queried, with the answers presented in an intuitive manner. A number of techniques have been proposed to alleviate the impact of the sheer volume of data on storage and processing overheads, as well as on bandwidth consumption; among them, the most dominant is compression. In this paper, we consider a setting in which multiple geographically dispersed data sources generate data streams, and the values from the discrete locations are used to construct a representation of a continuous (time-evolving) surface. We used different compression techniques to reduce the size of the raw measurements at each location, and we analyzed the impact of the compression on the quality of approximating the evolution of the shapes corresponding to a particular phenomenon. Specifically, we use the data from discrete locations to construct a TIN (triangulated irregular network), which evolves over time as the measurements at each location change. To analyze the global impact of the different compression techniques that are applied locally, we used different surface distance functions between raw-data TINs and compressed-data TINs. We provide detailed discussions based on our experimental observations regarding the corresponding (compression method, distance function) pairs.
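The pipeline described in the abstract (compress each location's time series locally, rebuild the surface at each time instant, then measure how far the compressed surface is from the raw one) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the paper's method: piecewise aggregate approximation stands in for the unnamed compression techniques, the triangulation is written out by hand (in practice a Delaunay routine such as `scipy.spatial.Delaunay` could produce it), and the two distance functions are simple examples of surface distances for TINs that share the same vertices and triangles.

```python
from math import sqrt

def paa_compress(series, window):
    """Piecewise aggregate approximation: keep one mean per window of samples."""
    chunks = [series[i:i + window] for i in range(0, len(series), window)]
    return [sum(chunk) / len(chunk) for chunk in chunks]

def paa_reconstruct(means, window, length):
    """Expand each stored mean back over the samples it summarized."""
    return [means[i // window] for i in range(length)]

def triangle_area(p, q, r):
    """Area of a triangle in the xy-plane."""
    return abs((q[0] - p[0]) * (r[1] - p[1]) - (r[0] - p[0]) * (q[1] - p[1])) / 2.0

def max_vertex_distance(z_raw, z_cmp):
    """Largest vertical gap between two TINs on the same triangulation.

    The difference of two piecewise-linear surfaces is piecewise linear,
    so its largest magnitude is attained at a vertex.
    """
    return max(abs(a - b) for a, b in zip(z_raw, z_cmp))

def l2_surface_distance(points, triangles, z_raw, z_cmp):
    """Exact L2 norm of the height difference, integrated triangle by triangle."""
    total = 0.0
    for i, j, k in triangles:
        a, b, c = (z_raw[n] - z_cmp[n] for n in (i, j, k))
        area = triangle_area(points[i], points[j], points[k])
        # Integral of a squared linear function over a triangle.
        total += area / 6.0 * (a * a + b * b + c * c + a * b + b * c + c * a)
    return sqrt(total)

# Hypothetical sensor locations (unit square) and a hand-written triangulation.
points = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
triangles = [(0, 1, 2), (0, 2, 3)]

# Hypothetical raw time series, one per location.
raw = [
    [1.0, 3.0, 2.0, 4.0],
    [0.0, 0.0, 2.0, 2.0],
    [5.0, 5.0, 5.0, 5.0],
    [2.0, 4.0, 6.0, 8.0],
]
window = 2
approx = [paa_reconstruct(paa_compress(s, window), window, len(s)) for s in raw]

# Compare the raw-data TIN and the compressed-data TIN at every time instant.
for t in range(len(raw[0])):
    z_raw = [series[t] for series in raw]
    z_cmp = [series[t] for series in approx]
    print(t, max_vertex_distance(z_raw, z_cmp),
          l2_surface_distance(points, triangles, z_raw, z_cmp))
```

Because compression is applied per location while the distance is evaluated on the whole surface, the same local error can weigh differently depending on the areas of the triangles that touch the affected vertex, which is one reason the choice of (compression method, distance function) pair matters.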
Award ID(s):
1823279
PAR ID:
10211112
Journal Name:
Big Data Analytics - 7th International Conference, BDA 2019, Ahmedabad, India, December 17-20, 2019, Proceedings
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. In this paper, we present the CET-LATS (Compressing Evolution of TINs from Location Aware Time Series) system, which enables testing the impacts of various compression approaches on evolving Triangulated Irregular Networks (TINs). Specifically, we consider settings in which values measured in distinct locations and at different time instants are represented as time series of the corresponding measurements, generating a sequence of TINs. Different compression techniques applied to location-specific time series may have different impacts on the representation of the global evolution of TINs, depending on the distance functions used to evaluate the distortion. CET-LATS users can view and analyze compression vs. (im)precision trade-offs over multiple compression methods and distance functions, and decide which method works best for their application. We also provide an option to investigate the impact of the choice of a compression method on the quality of prediction. Our prototype is a web-based system using Flask, a lightweight Python framework, relying on Apache Spark for data management and JSON files to communicate with the front-end, enabling extensibility in terms of adding new data sources as well as compression techniques, distance functions, and prediction methods.
  2. Active-source data acquisition included 66 vibroseis and 209 instrumented sledgehammer source locations. Multiple source impacts were recorded at each source location to enable stacking of the recorded signal. The source impacts at each source location have been aligned using cross-correlation but, to provide the most flexibility, are provided unstacked (i.e., the signals from each source impact are provided separately). The active-source recordings are provided in both raw, uncorrected units of counts and corrected engineering units of meters per second. For each source impact, the force output from the vibroseis or instrumented sledgehammer was recorded and is provided in both raw counts and engineering units of kilonewtons. The passive-wavefield data include 28 hours of ambient noise recorded over two nighttime deployments. The passive-wavefield data are provided in raw counts; however, the instrument response files are provided should instrument correction be required in the future. The dataset can be used for active-source and passive-wavefield three-dimensional imaging, as well as other subsurface characterization techniques, which include horizontal-to-vertical spectral ratios, multichannel analysis of surface waves, and microtremor array measurements.
  3. Valencia, Alfonso (Ed.)
    Motivation: Nanopore sequencing provides a real-time and portable solution to genomic sequencing, enabling better assembly, structural variant discovery and modified base detection than second generation technologies. The sequencing process generates a huge amount of data in the form of raw signal contained in fast5 files, which must be compressed to enable efficient storage and transfer. Since the raw data is inherently noisy, lossy compression has potential to significantly reduce space requirements without adversely impacting performance of downstream applications. Results: We explore the use of lossy compression for nanopore raw data using two state-of-the-art lossy time-series compressors, and evaluate the tradeoff between compressed size and basecalling/consensus accuracy. We test several basecallers and consensus tools on a variety of datasets at varying depths of coverage, and conclude that lossy compression can provide 35–50% further reduction in compressed size of raw data over the state-of-the-art lossless compressor with negligible impact on basecalling accuracy (≲0.2% reduction) and consensus accuracy (≲0.002% reduction). In addition, we evaluate the impact of lossy compression on methylation calling accuracy and observe that this impact is minimal for similar reductions in compressed size, although further evaluation with improved benchmark datasets is required for reaching a definite conclusion. The results suggest the possibility of using lossy compression, potentially on the nanopore sequencing device itself, to achieve significant reductions in storage and transmission costs while preserving the accuracy of downstream applications. Availability and implementation: The code is available at https://github.com/shubhamchandak94/lossy_compression_evaluation. Supplementary information: Supplementary data are available at Bioinformatics online.
  4. There is a growing need to characterize the engineering material properties of the shallow subsurface in three dimensions for advanced engineering analyses. However, imaging the near-surface in three dimensions at the spatial resolutions required for such purposes remains in its infancy and requires further study before it can be adopted into practice. To enable and accelerate research in this area, we present a large subsurface imaging data set acquired in 2019 using a dense network of three-component (3C) nodal stations at the Garner Valley Downhole Array (GVDA) site. Acquisition of this data set involved the deployment of 196 stations positioned on a 14 × 14 grid with a 5 m spacing. The array was used to acquire active-source data generated by a vibroseis truck and an instrumented sledgehammer, and passive-wavefield data containing ambient noise. The active-source acquisition included 66 vibroseis and 209 instrumented sledgehammer source locations. Multiple source impacts were recorded at each source location to enable stacking of the recorded signals. The active-source recordings are provided in both raw, uncorrected units of counts and corrected engineering units of meters per second. For each source impact, the force output from the vibroseis or instrumented sledgehammer was recorded and is provided in both raw counts and engineering units of kilonewtons. The passive-wavefield data include 28 h of ambient noise recorded over two nighttime deployments. The data set is shown to be useful for active-source and passive-wavefield three-dimensional imaging and other subsurface characterization techniques, which include horizontal-to-vertical spectral ratios (HVSRs), multichannel analysis of surface waves (MASW), and microtremor array measurements (MAM).
  5. New technologies such as low-cost nodes and distributed acoustic sensing (DAS) are making it easier to continuously collect broadband, high-density seismic monitoring data. To reduce the time to move data from the field to computing centers, reduce archival requirements, and speed up interactive data analysis and visualization, we are motivated to investigate the use of lossy compression on passive seismic array data. In particular, there is a need to quantify not only the errors in the raw data but also the spectral characteristics of these errors and the extent to which they propagate into results such as detectability and arrival-time picks of microseismic events. We compare three types of lossy compression: sparse thresholded wavelet compression, zfp compression, and low-rank singular value decomposition compression. We apply these techniques to two publicly available datasets: an urban dark fiber DAS experiment and a surface DAS array above a geothermal field. We find that, depending on the level of compression needed and the importance of preserving large versus small seismic events, different compression schemes are preferable.
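The first scheme named in the last entry above, sparse thresholded wavelet compression, can be illustrated with a minimal sketch. It assumes an orthonormal Haar wavelet, a signal whose length is a power of two, and a "keep the k largest coefficients" rule; none of these choices is stated in the abstract, and the function names are hypothetical.

```python
from math import sqrt

def haar_forward(signal):
    """Orthonormal Haar transform of a list whose length is a power of two."""
    coeffs = list(signal)
    n = len(coeffs)
    while n > 1:
        half = n // 2
        # Pairwise scaled sums (coarse part) and differences (detail part).
        avg = [(coeffs[2 * i] + coeffs[2 * i + 1]) / sqrt(2) for i in range(half)]
        det = [(coeffs[2 * i] - coeffs[2 * i + 1]) / sqrt(2) for i in range(half)]
        coeffs[:n] = avg + det
        n = half
    return coeffs

def haar_inverse(coeffs):
    """Invert haar_forward, rebuilding the signal level by level."""
    out = list(coeffs)
    n = 1
    while n < len(out):
        avg, det = out[:n], out[n:2 * n]
        merged = []
        for a, d in zip(avg, det):
            merged += [(a + d) / sqrt(2), (a - d) / sqrt(2)]
        out[:2 * n] = merged
        n *= 2
    return out

def keep_largest(coeffs, keep):
    """Zero all but the `keep` largest-magnitude coefficients (the lossy step)."""
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    kept = set(order[:keep])
    return [c if i in kept else 0.0 for i, c in enumerate(coeffs)]

# Hypothetical trace: a step, which the Haar basis represents very sparsely.
trace = [1.0, 1.0, 1.0, 1.0, 5.0, 5.0, 5.0, 5.0]
sparse = keep_largest(haar_forward(trace), keep=2)
restored = haar_inverse(sparse)
print(max(abs(a - b) for a, b in zip(trace, restored)))
```

Only the retained coefficients and their positions need to be stored, so the compression level is set by `keep`. A rule based on magnitude favors large-amplitude features, which fits the entry's remark that the preferred scheme depends on whether large or small events must be preserved.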