skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: SyncSignature: a simple, efficient, parallelizable framework for tree similarity joins
This paper introduces SyncSignature, the first fully parallelizable algorithmic framework for tree similarity joins under edit distance. SyncSignature makes use of implicit-synchronized signature generation schemes, which allow for an efficient and parallelizable candidate-generation procedure via hash join. Our experiments on large real-world datasets show that the proposed algorithms under the SyncSignature framework significantly outperform the state-of-the-art algorithm in the parallel computation environment. For datasets with big trees, they also exceed the state-of-the-art algorithms by a notable margin in the centralized/single-thread computation environment. To complement and guide the experimental study, we also provide a thorough theoretical analysis for all proposed signature generation schemes.  more » « less
Award ID(s):
1844234
PAR ID:
10400142
Author(s) / Creator(s):
;
Publisher / Repository:
VLDB Endowment
Date Published:
Journal Name:
Proceedings of the VLDB Endowment
Volume:
16
Issue:
2
ISSN:
2150-8097
Page Range / eLocation ID:
330 to 342
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Clustering algorithms are often evaluated using metrics which compare with ground-truth cluster assignments, such as Rand index and NMI. Algorithm performance may vary widely for different hyperparameters, however, and thus model selection based on optimal performance for these metrics is discordant with how these algorithms are applied in practice, where labels are unavailable and tuning is often more art than science. It is therefore desirable to compare clustering algorithms not only on their optimally tuned performance, but also some notion of how realistic it would be to obtain this performance in practice. We propose an evaluation of clustering methods capturing this ease-of-tuning by modeling the expected best clustering score under a given computation budget. To encourage the adoption of the proposed metric alongside classic clustering evaluations, we provide an extensible benchmarking framework. We perform an extensive empirical evaluation of our proposed metric on popular clustering algorithms over a large collection of datasets from different domains, and observe that our new metric leads to several noteworthy observations. 
    more » « less
  2. Identifying cause-effect relations among variables is a key step in the decision-making process. Whereas causal inference requires randomized experiments, researchers and policy makers are increasingly using observational studies to test causal hypotheses due to the wide availability of data and the infeasibility of experiments. The matching method is the most used technique to make causal inference from observational data. However, the pair assignment process in one-to-one matching creates uncertainty in the inference because of different choices made by the experimenter. Recently, discrete optimization models have been proposed to tackle such uncertainty; however, they produce 0-1 nonlinear problems and lack scalability. In this work, we investigate this emerging data science problem and develop a unique computational framework to solve the robust causal inference test instances from observational data with continuous outcomes. In the proposed framework, we first reformulate the nonlinear binary optimization problems as feasibility problems. By leveraging the structure of the feasibility formulation, we develop greedy schemes that are efficient in solving robust test problems. In many cases, the proposed algorithms achieve a globally optimal solution. We perform experiments on real-world data sets to demonstrate the effectiveness of the proposed algorithms and compare our results with the state-of-the-art solver. Our experiments show that the proposed algorithms significantly outperform the exact method in terms of computation time while achieving the same conclusion for causal tests. Both numerical experiments and complexity analysis demonstrate that the proposed algorithms ensure the scalability required for harnessing the power of big data in the decision-making process. Finally, the proposed framework not only facilitates robust decision making through big-data causal inference, but it can also be utilized in developing efficient algorithms for other nonlinear optimization problems such as quadratic assignment problems. History: Accepted by Ram Ramesh, Area Editor for Data Science and Machine Learning. Funding: This work was supported by the Division of Civil, Mechanical and Manufacturing Innovation of the National Science Foundation [Grant 2047094]. Supplemental Material: The online supplements are available at https://doi.org/10.1287/ijoc.2022.1226 . 
    more » « less
  3. El Mrabet, N.; De Feo, L.; Duquesne, S. (Ed.)
    We present our speed records for Falcon signature generation and verification on ARMv8-A architecture. Our implementations are benchmarked on Apple M1 ‘Firestorm’, Raspberry Pi 4 Cortex-A72, and Jetson AGX Xavier. Our optimized signature generation is 2x slower, but signature verification is 3–3.9x faster than the state-of-the-art CRYSTALS-Dilithium implementation on the same platforms. Faster signature verification may be particularly useful for the client side on con-strained devices. Our Falcon implementation outperforms the previous work targeting Jetson AGX Xavier by the factors 1.48x for signing in falcon512 and falcon1024, 1.52x for verifying in falcon512, and 1.70x for verifying in falcon1024. We achieve improvement in Falcon signature generation by supporting a larger subset of possible parameter values for FFT-related functions and applying our compressed twiddle-factor table to reduce memory usage. We also demonstrate that the recently proposed signature scheme Hawk, sharing optimized functionality with Falcon, has 3.3x faster signature generation and 1.6–1.9x slower signature verification when implemented on the same ARMv8 processors as Falcon. 
    more » « less
  4. The Internet of Things (IoT) harbors a large number of resource-limited devices (e.g., sensors) that continuously generate and offload sensitive information (e.g., financial, health, personal). It is imperative the ensure the trustworthiness of this data with efficient cryptographic mechanisms. Digital signatures can offer scalable authentication with public verifiability and nonrepudiation. However, the state-of-the-art digital signatures do not offer the desired efficiency and are not scalable for the connected resource-limited IoT devices. This is without considering long term security features such as post-quantum security and forward security. In this paper, we summarize the main challenges to an energy-aware and efficient signature scheme. Then, we propose new scheme design improvements that uniquely embed different emerging technologies such as Mutli-Party Computation (MPC) and secure enclaves (e.g., Intel SGX) in order to secret-share confidential keys of low-end IoT devices across multiple cloud servers. We also envision building signature schemes with Fully Homomorphic Encryption (FHE) to enable verifiers to compute expensive commitments under encryption. We provide evaluation metrics that showcase the feasibility and efficiency of our designs for potential deployment on embedded devices in IoT. 
    more » « less
  5. This paper presents a new paradigm for abnormality detection using a novel power signature that characterizes the rising and descending patterns of energy consumption. The proposed methodology includes a low-overhead power signature generation circuit, computation-light analysis methods, and optimal generation of the golden signature used in the analysis. The proposed power signature generation circuit is designed using 90 nm CMOS technology, and its operation is validated via circuit simulations. The effectiveness of the proposed method in detecting the insertion of potentially malicious code is demonstrated with data obtained from hardware experiments and circuit simulations. 
    more » « less