Sharing genomic databases is critical to the collaborative research in computational biology. A shared database is more informative than specific genome-wide association studies (GWAS) statistics as it enables do-it-yourself calculations. Genomic databases involve intellectual efforts from the curator and sensitive information of participants, thus in the course of data sharing, the curator (database owner) should be able to prevent unauthorized redistributions and protect genomic data privacy. As it becomes increasingly common for a single database be shared with multiple recipients, the shared genomic database should also be robust against collusion attack, where multiple malicious recipients combine their individual copies to forge a pirated one with the hope that none of them can be traced back. The strong correlation among genomic entries also make the shared database vulnerable to attacks that leverage the public correlation models. In this paper, we assess the robustness of shared genomic database under both collusion and correlation threats. To this end, we first develop a novel genomic database fingerprinting scheme, called Gen-Scope. It achieves both copyright protection (by enabling traceability) and privacy preservation (via local differential privacy) for the shared genomic databases. To defend against collusion attacks, we augment Gen-Scope with a powerful traitor tracing technique, i.e., the Tardos codes. Via experiments using a real-world genomic database, we show that Gen-Scope achieves strong fingerprint robustness, e.g., the fingerprint cannot be compromised even if the attacker changes 45% of the entries in its received fingerprinted copy and colluders will be detected with high probability. Additionally, Gen-Scope outperforms the considered baseline methods. Under the same privacy and copyright guarantees, the accuracy of the fingerprinted genomic database obtained by Gen-Scope is around 10% higher than that achieved by the baseline, and in terms of preservations of GWAS statistics, the consistency of variant-phenotype associations can be about 20% higher. Notably, we also empirically show that Gen-Scope can identify at least one of the colluders even if malicious receipts collude after independent correlation attacks.
more »
« less
Robust fingerprinting of genomic databases
Abstract MotivationDatabase fingerprinting has been widely used to discourage unauthorized redistribution of data by providing means to identify the source of data leakages. However, there is no fingerprinting scheme aiming at achieving liability guarantees when sharing genomic databases. Thus, we are motivated to fill in this gap by devising a vanilla fingerprinting scheme specifically for genomic databases. Moreover, since malicious genomic database recipients may compromise the embedded fingerprint (distort the steganographic marks, i.e. the embedded fingerprint bit-string) by launching effective correlation attacks, which leverage the intrinsic correlations among genomic data (e.g. Mendel’s law and linkage disequilibrium), we also augment the vanilla scheme by developing mitigation techniques to achieve robust fingerprinting of genomic databases against correlation attacks. ResultsVia experiments using a real-world genomic database, we first show that correlation attacks against fingerprinting schemes for genomic databases are very powerful. In particular, the correlation attacks can distort more than half of the fingerprint bits by causing a small utility loss (e.g. database accuracy and consistency of SNP–phenotype associations measured via P-values). Next, we experimentally show that the correlation attacks can be effectively mitigated by our proposed mitigation techniques. We validate that the attacker can hardly compromise a large portion of the fingerprint bits even if it pays a higher cost in terms of degradation of the database utility. For example, with around 24% loss in accuracy and 20% loss in the consistency of SNP–phenotype associations, the attacker can only distort about 30% fingerprint bits, which is insufficient for it to avoid being accused. We also show that the proposed mitigation techniques also preserve the utility of the shared genomic databases, e.g. the mitigation techniques only lead to around 3% loss in accuracy. Availability and implementationhttps://github.com/xiutianxi/robust-genomic-fp-github.
more »
« less
- Award ID(s):
- 2141622
- PAR ID:
- 10368286
- Publisher / Repository:
- Oxford University Press
- Date Published:
- Journal Name:
- Bioinformatics
- Volume:
- 38
- Issue:
- Supplement_1
- ISSN:
- 1367-4803
- Format(s):
- Medium: X Size: p. i143-i152
- Size(s):
- p. i143-i152
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Polymorphic gates are reconfigurable devices that deliver multiple functionalities at different temperature, supply voltage or external inputs. Capable of working in different modes, polymorphic gate is a promising candidate for embedding secret information such as fingerprints. In this paper, we report five polymorphic gates whose functionality varies in response to specific control input and propose a circuit fingerprinting scheme based on these gates. The scheme selectively replaces standard logic cells by polymorphic gates whose functionality differs with the standard cells only on Satisfiability Don’t Care conditions. Additional dummy fingerprint bits are also introduced to enhance the fingerprint’s robustness against attacks such as fingerprint removal and modification. Experimental results on ISCAS and MCNC benchmark circuits demonstrate that our scheme introduces low overhead. More specifically, the average overhead in area, speed and power are 4.04%, 6.97% and 4.15% respectively when we embed 64-bit fingerprint that consists of 32 real fingerprint bits and 32 dummy bits. This is only half of the overhead of the other known approach when they create 32-bit fingerprints.more » « less
-
The website fingerprinting attack allows a low-resource attacker to compromise the privacy guarantees provided by privacy enhancing tools such as Tor. In response, researchers have proposed defenses aimed at confusing the classification tools used by attackers. As new, more powerful attacks are frequently developed, raw attack accuracy has proven inadequate as the sole metric used to evaluate these defenses. In response, two security metrics have been proposed that allow for evaluating defenses based on hand-crafted features often used in attacks. Recent state-of-the-art attacks, however, use deep learning models capable of automatically learning abstract feature representations, and thus the proposed metrics fall short once again. In this study we examine two security metrics and (1) show how these methods can be extended to evaluate deep learning-based website fingerprinting attacks, and (2) compare the security metrics and identify their shortcomings.more » « less
-
Nikolski, Macha (Ed.)Abstract MotivationGenome-wide association studies (GWAS) benefit from the increasing availability of genomic data and cross-institution collaborations. However, sharing data across institutional boundaries jeopardizes medical data confidentiality and patient privacy. While modern cryptographic techniques provide formal secure guarantees, the substantial communication and computational overheads hinder the practical application of large-scale collaborative GWAS. ResultsThis work introduces an efficient framework for conducting collaborative GWAS on distributed datasets, maintaining data privacy without compromising the accuracy of the results. We propose a novel two-step strategy aimed at reducing communication and computational overheads, and we employ iterative and sampling techniques to ensure accurate results. We instantiate our approach using logistic regression, a commonly used statistical method for identifying associations between genetic markers and the phenotype of interest. We evaluate our proposed methods using two real genomic datasets and demonstrate their robustness in the presence of between-study heterogeneity and skewed phenotype distributions using a variety of experimental settings. The empirical results show the efficiency and applicability of the proposed method and the promise for its application for large-scale collaborative GWAS. Availability and implementationThe source code and data are available at https://github.com/amioamo/TDS.more » « less
-
The Schnorr signature scheme is an efficient digital signature scheme with short signature lengths, i.e., $4k$-bit signatures for $$k$$ bits of security. A Schnorr signature $$\sigma$$ over a group of size $$p\approx 2^{2k}$$ consists of a tuple $(s,e)$, where $$e \in \{0,1\}^{2k}$$ is a hash output and $$s\in \mathbb{Z}_p$$ must be computed using the secret key. While the hash output $$e$$ requires $2k$$ bits to encode, Schnorr proposed that it might be possible to truncate the hash value without adversely impacting security. In this paper, we prove that \emph{short} Schnorr signatures of length $$3k$ bits provide $$k$$ bits of multi-user security in the (Shoup's) generic group model and the programmable random oracle model. We further analyze the multi-user security of key-prefixed short Schnorr signatures against preprocessing attacks, showing that it is possible to obtain secure signatures of length $$3k + \log S + \log N$$ bits. Here, $$N$$ denotes the number of users and $$S$$ denotes the size of the hint generated by our preprocessing attacker, e.g., if $$S=2^{k/2}$$, then we would obtain secure $3.75k$-bit signatures for groups of up to $$N \leq 2^{k/4}$$ users. Our techniques easily generalize to several other Fiat-Shamir-based signature schemes, allowing us to establish analogous results for Chaum-Pedersen signatures and Katz-Wang signatures. As a building block, we also analyze the $$1$$-out-of-$$N$$ discrete-log problem in the generic group model, with and without preprocessing.more » « less