skip to main content


Title: UDM: Private User Discovery with Minimal Information Disclosure
We present and analyze UDM, a new protocol for user discovery in anonymous communication systems that minimizes the information disclosed to the system and users. Unlike existing systems, including those based on private set intersection, UDM learns nothing about the contact lists and social graphs of the users, is not vulnerable to off-line dictionary attacks that expose contact lists, does not reveal platform identifiers to users without the owner’s explicit permission, and enjoys low computation and communication complexity. UDM solves the following user-discovery problem. User Alice wishes to communicate with Bob over an anonymous communication system, such as cMix or Tor. Initially, each party knows each other’s public contact identifier (e.g., email address or phone number), but neither knows the other’s private platform identifier in the communication system. If both parties wish to communicate with each other, UDM enables them to establish a shared key and learn each other’s private platform identifier. UDM uses an untrusted user-discovery system, which processes and stores only public information, hashed values, or values encrypted with keys it does not know. Therefore, UDM cannot learn any information about the social graphs of its users. Using the anonymous communication system, each pair of users who wish to communicate with each other uploads to the user-discovery system their private platform identifier, encrypted with their shared key. Indexing their request by a truncated cryptographic hash of their shared key, each user can then download each other’s encrypted private platform identifier.  more » « less
Award ID(s):
1753681
NSF-PAR ID:
10160119
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
Cryptologia
ISSN:
1558-1586
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Development of a comprehensive legal privacy framework in the United States should be based on identification of the common deficiencies of privacy policies. We attempt to delineate deficiencies by critically analyzing the privacy policies of mobile apps, application suites, social networks, Internet Service Providers, and Internet-of-Things devices. Whereas many studies have examined readability of privacy policies, few have specifically identified the information that should be provided in privacy policies but is not. Privacy legislation invariably starts a definition of personally identifiable information. We find that privacy policies’ definitions of personally identifiable information are far too restrictive, excluding information that does not itself identify a person but which can be used to reasonably identify a person, and excluding information paired with a device identifier which can be reasonably linked to a person. Legislation should define personally identifiable information to include such information, and should differentiate between information paired with a name versus information paired with a device identifier. Privacy legislation often excludes anonymous and de-identified information from notice and choice requirements. We find that privacy policies’ descriptions of anonymous and de-identified information are far too broad, including information paired with advertising identifiers. Computer science has repeatedly demonstrated that such information is reasonably linkable. Legislation should define these categories of information to align with technological abilities. Legislation should also not exempt de-identified information from notice requirements, to increase transparency. Privacy legislation relies heavily on notice requirements. We find that, because privacy policies’ disclosures of the uses of personal information are disconnected from their disclosures about the types of personal information collected, we are often unable to determine which types of information are used for which purposes. Often, we cannot determine whether location or web browsing history is used solely for functional purposes or also for advertising. Legislation should require the disclosure of the purposes for each type of personal information collected. We also find that, because privacy policies disclosures of sharing of personal information are disconnected from their disclosures about the types of personal information collected, we are often unable to determine which types of information are shared. Legislation should require the disclosure of the types of personal information shared. Finally, privacy legislation relies heavily on user choice. We find that free services often require the collection and sharing of personal information. As a result, users often have no choices. We find that whereas some paid services afford users a wide variety of choices, paid services in less competitive sectors often afford users few choices over use and sharing of personal information for purposes unrelated to the service. As a result, users are often unable to dictate which types of information they wish to allow to be shared, and which types they wish to allow to be used for advertising. Legislation should differentiate between take-it-or-leave it, opt-out, and opt-in approaches based on the type of use and on whether the information is shared. Congress should consider whether user choices should be affected by the presence of market power. 
    more » « less
  2. Vincent Poor and Zhu Han (Ed.)
    Recently, blockchain has received much attention from the mobility-centric Internet of Things (IoT). It is deemed the key to ensuring the built-in integrity of information and security of immutability by design in the peer-to-peer network (P2P) of mobile devices. In a permissioned blockchain, the authority of the system has control over the identities of its users. Such information can allow an ill-intentioned authority to map identities with their spatiotemporal data, which undermines the location privacy of a mobile user. In this paper, we study the location privacy preservation problem in the context of permissioned blockchain-based IoT systems under three conditions. First, the authority of the blockchain holds the public and private key distribution task in the system. Second, there exists a spatiotemporal correlation between consecutive location-based transactions. Third, users communicate with each other through short-range communication technologies such that it constitutes a proof of location (PoL) on their actual locations. We show that, in a permissioned blockchain with an authority and a presence of a PoL, existing approaches cannot be applied using a plug-and-play approach to protect location privacy. In this context, we propose BlockPriv, an obfuscation technique that quantifies, both theoretically and experimentally, the relationship between privacy and utility in order to dynamically protect the privacy of sensitive locations in the permissioned blockchain. 
    more » « less
  3. It takes great effort to manually or semi-automatically convert free-text phenotype narratives (e.g., morphological descriptions in taxonomic works) to a computable format before they can be used in large-scale analyses. We argue that neither a manual curation approach nor an information extraction approach based on machine learning is a sustainable solution to produce computable phenotypic data that are FAIR (Findable, Accessible, Interoperable, Reusable) (Wilkinson et al. 2016). This is because these approaches do not scale to all biodiversity, and they do not stop the publication of free-text phenotypes that would need post-publication curation. In addition, both manual and machine learning approaches face great challenges: the problem of inter-curator variation (curators interpret/convert a phenotype differently from each other) in manual curation, and keywords to ontology concept translation in automated information extraction, make it difficult for either approach to produce data that are truly FAIR. Our empirical studies show that inter-curator variation in translating phenotype characters to Entity-Quality statements (Mabee et al. 2007) is as high as 40% even within a single project. With this level of variation, curated data integrated from multiple curation projects may still not be FAIR. The key causes of this variation have been identified as semantic vagueness in original phenotype descriptions and difficulties in using standardized vocabularies (ontologies). We argue that the authors describing characters are the key to the solution. Given the right tools and appropriate attribution, the authors should be in charge of developing a project's semantics and ontology. This will speed up ontology development and improve the semantic clarity of the descriptions from the moment of publication. In this presentation, we will introduce the Platform for Author-Driven Computable Data and Ontology Production for Taxonomists, which consists of three components: a web-based, ontology-aware software application called 'Character Recorder,' which features a spreadsheet as the data entry platform and provides authors with the flexibility of using their preferred terminology in recording characters for a set of specimens (this application also facilitates semantic clarity and consistency across species descriptions); a set of services that produce RDF graph data, collects terms added by authors, detects potential conflicts between terms, dispatches conflicts to the third component and updates the ontology with resolutions; and an Android mobile application, 'Conflict Resolver,' which displays ontological conflicts and accepts solutions proposed by multiple experts. a web-based, ontology-aware software application called 'Character Recorder,' which features a spreadsheet as the data entry platform and provides authors with the flexibility of using their preferred terminology in recording characters for a set of specimens (this application also facilitates semantic clarity and consistency across species descriptions); a set of services that produce RDF graph data, collects terms added by authors, detects potential conflicts between terms, dispatches conflicts to the third component and updates the ontology with resolutions; and an Android mobile application, 'Conflict Resolver,' which displays ontological conflicts and accepts solutions proposed by multiple experts. Fig. 1 shows the system diagram of the platform. The presentation will consist of: a report on the findings from a recent survey of 90+ participants on the need for a tool like Character Recorder; a methods section that describes how we provide semantics to an existing vocabulary of quantitative characters through a set of properties that explain where and how a measurement (e.g., length of perigynium beak) is taken. We also report on how a custom color palette of RGB values obtained from real specimens or high-quality specimen images, can be used to help authors choose standardized color descriptions for plant specimens; and a software demonstration, where we show how Character Recorder and Conflict Resolver can work together to construct both human-readable descriptions and RDF graphs using morphological data derived from species in the plant genus Carex (sedges). The key difference of this system from other ontology-aware systems is that authors can directly add needed terms to the ontology as they wish and can update their data according to ontology updates. a report on the findings from a recent survey of 90+ participants on the need for a tool like Character Recorder; a methods section that describes how we provide semantics to an existing vocabulary of quantitative characters through a set of properties that explain where and how a measurement (e.g., length of perigynium beak) is taken. We also report on how a custom color palette of RGB values obtained from real specimens or high-quality specimen images, can be used to help authors choose standardized color descriptions for plant specimens; and a software demonstration, where we show how Character Recorder and Conflict Resolver can work together to construct both human-readable descriptions and RDF graphs using morphological data derived from species in the plant genus Carex (sedges). The key difference of this system from other ontology-aware systems is that authors can directly add needed terms to the ontology as they wish and can update their data according to ontology updates. The software modules currently incorporated in Character Recorder and Conflict Resolver have undergone formal usability studies. We are actively recruiting Carex experts to participate in a 3-day usability study of the entire system of the Platform for Author-Driven Computable Data and Ontology Production for Taxonomists. Participants will use the platform to record 100 characters about one Carex species. In addition to usability data, we will collect the terms that participants submit to the underlying ontology and the data related to conflict resolution. Such data allow us to examine the types and the quantities of logical conflicts that may result from the terms added by the users and to use Discrete Event Simulation models to understand if and how term additions and conflict resolutions converge. We look forward to a discussion on how the tools (Character Recorder is online at http://shark.sbs.arizona.edu/chrecorder/public) described in our presentation can contribute to producing and publishing FAIR data in taxonomic studies. 
    more » « less
  4. We solve a long-standing challenge to the integrity of votes cast without the supervision of a voting booth: ``improper influence,'' which we define as any combination of vote buying and voter coercion. In comparison with previous proposals, our system is the first in the literature to protect against a strong adversary who learns all of the voter's keys---we call this property ``extreme coercion resistance.'' Our approach allows each voter, or their trusted agents (which we call ``hedgehogs''), to ``nullify'' (effectively cancel) their vote in a way that is unstoppable and irrevocable, and such that the nullification action is forever unattributable to that voter or their hedgehog(s). We demonstrate the security of VoteXX in the {universal composability} model. Additionally we provide concrete implementations of sub-protocols---including inalienable authentication, decentralized bulletin boards, and anonymous communication channels---that are usually left as abstract assumptions in the literature. As in many other coercion-resistant systems, voters are authorized to vote with public-private keys. Each voter registers their public keys with the Election Authority (EA) in a way that convinces the EA that the voter has complete knowledge of their private keys. Voters concerned about losing their private keys can themselves, or by delegating to one or more hedgehog(s), monitor the bulletin board for malicious ballots cast with their keys, and can act to nullify these ballots in a privacy-preserving manner with zero-knowledge proofs. In comparison with previous proposals, our system makes fewer assumptions and protects against a stronger adversary. For example, votexx makes none of the following assumptions made by previous systems: the voter must complete registration before being coerced; the election will not close before the voter can cast a ballot after coercion; the voter needs to generate a fake password to evade coercion; and the voter knows an honest Election Authority official. 
    more » « less
  5. A fundamental question that has been studied in cryptography and in information theory is whether two parties can communicate confidentially using exclusively an open channel. We consider the model in which the two parties hold inputs that are correlated in a certain sense. This model has been studied extensively in information theory, and communication protocols have been designed which exploit the correlation to extract from the inputs a shared secret key. However, all the existing protocols are not universal in the sense that they require that the two parties also know some attributes of the correlation. In other words, they require that each party knows something about the other party’s input. We present a protocol that does not require any prior additional information. It uses space-bounded Kolmogorov complexity to measure correlation and it allows the two legal parties to obtain a common key that looks random to an eavesdropper that observes the communication and is restricted to use a bounded amount of space for the attack. Thus the protocol achieves complexity-theoretical security, but it does not use any unproven result from computational complexity. On the negative side, the protocol is not efficient in the sense that the computation of the two legal parties uses more space than the space allowed to the adversary. 
    more » « less