Title: Revisiting the Effects of Leakage on Dependency Parsing
Recent work by Søgaard (2020) showed that, treebank size aside, overlap between training and test graphs (termed leakage) explains more of the observed variation in dependency parsing performance than other explanations. In this work we revisit this claim, testing it on more models and languages. We find that it only holds for zero-shot cross-lingual settings. We then propose a more fine-grained measure of such leakage which, unlike the original measure, not only explains but also correlates with observed performance variation.
Award ID(s):
1757064
PAR ID:
10327898
Journal Name:
Transactions of the Association for Computational Linguistics
Volume:
ACL 2022
ISSN:
2307-387X
Page Range / eLocation ID:
2925-2934
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Recent work by Søgaard (2020) showed that, treebank size aside, overlap between training and test graphs (termed leakage) explains more of the observed variation in dependency parsing performance than other explanations. In this work we revisit this claim, testing it on more models and languages. We find that it only holds for zero-shot cross-lingual settings. We then propose a more fine-grained measure of such leakage which, unlike the original measure, not only explains but also correlates with observed performance variation. Code and data are available online, 
  2. Life history theory suggests that maximum size and growth evolve to maximize fitness. In contrast, the Gill Oxygen Limitation Theory (GOLT) suggests that growth and maximum size in fishes and other aquatic, water‐breathing organisms are constrained by the body mass‐scaling of gill surface area. Here, we use new data and a novel phylogenetic Bayesian multilevel modelling framework to test this idea by asking the three questions posed by the GOLT regarding maximum size, growth and gills. Across fishes, we ask whether the body mass‐scaling of gill surface area explains (1) variation in the von Bertalanffy growth coefficient (k) above and beyond that explained by asymptotic size (W), (2) variation in growth performance (a trait that integrates the tradeoff between k and W) and (3) more variation in growth performance compared to activity (as approximated by caudal fin aspect ratio). Overall, we find that there is only a weak relationship among maximum size, growth and gill surface area across species. Indeed, the body mass‐scaling of gill surface area does not explain much variation in k (especially for those species that reach the same W) or growth performance. Activity explained three to five times more variation in growth performance compared to gill surface area. Our results suggest that in fishes, gill surface area is not the only factor that explains variation in maximum size and growth, and that other covariates (e.g. activity) are likely important in understanding how growth, maximum size and other life history traits vary across species. 
  3. We investigate a simple but overlooked folklore approach for searching encrypted documents held at an untrusted service: Just stash an index (with unstructured encryption) at the service and download it for updating and searching. This approach is simple to deploy, enables rich search support beyond unsorted keyword lookup, requires no persistent client state, and (intuitively at least) provides excellent security compared with approaches like dynamic searchable symmetric encryption (DSSE). This work first shows that implementing this construct securely is more subtle than it appears, and that naive implementations with commodity indexes are insecure due to the leakage of the byte-length of the encoded index. We then develop a set of techniques for encoding indexes, called size-locking, that eliminates this leakage. Our key idea is to fix the size of indexes to depend only on features that are safe to leak. We further develop techniques for securely partitioning indexes into smaller pieces that are downloaded, trading leakage for large increases in performance in a measured way. We implement our systems and evaluate that they provide search quality matching plaintext systems, support for stateless clients, and resistance to damaging injection attacks. 
  4. We investigate a simple but overlooked folklore approach for searching encrypted documents held at an untrusted service: Just stash an index (with unstructured encryption) at the service and download it for updating and searching. This approach is simple to deploy, enables rich search support beyond unsorted keyword lookup, requires no persistent client state, and (intuitively at least) provides excellent security compared with approaches like dynamic searchable symmetric encryption (DSSE). This work first shows that implementing this construct securely is more subtle than it appears, and that naive implementations with commodity indexes are insecure due to the leakage of the byte-length of the encoded index. We then develop a set of techniques for encoding indexes, called size-locking, that eliminates this leakage. Our key idea is to fix the size of indexes to depend only on features that are safe to leak. We further develop techniques for securely partitioning indexes into smaller pieces that are downloaded, trading leakage for large increases in performance in a measured way. We implement our systems and evaluate that they provide search quality matching plaintext systems, support for stateless clients, and resistance to damaging injection attacks. 
  5. Information leaks in software can unintentionally reveal private data, yet they are hard to detect and fix. Although several methods have been proposed to detect leakage, such as static verification-based approaches, they require specialist knowledge and are time-consuming. Recently, we introduced HyperGI, a dynamic, hypertest-based approach that can detect and produce potential fixes for hyperproperty violations. In particular, we focused on violations of the noninterference property, as they result in information flow leakage. Our instantiation of HyperGI was able to detect and reduce leakage in three small programs. Its fitness function tried to balance information leakage and program correctness but, as we pointed out, there may be tradeoffs between keeping program semantics and reducing information leakage that require developer decisions. In this work we ask if it is possible to automatically detect and repair information leakage in more realistic programs without requiring specialist knowledge. We instantiate a multi-objective version of HyperGI in a tool, called LeakReducer, which explicitly encodes the tradeoff between program correctness and information leakage. We apply LeakReducer to six leaky programs, including the well-known Heartbleed bug. LeakReducer is able to detect leakage in all six, in contrast to state-of-the-art fuzzers, which detect leakage in only two programs. Moreover, LeakReducer is able to reduce leakage in all subjects, with comparable results to previous work, while scaling to much larger software. 