

Title: The complexity of human computation via a concrete model with an application to passwords

What can humans compute in their heads? We are thinking of a variety of cryptographic protocols, games like sudoku, crossword puzzles, speed chess, and so on. For example, can a person compute a function in his or her head so that an eavesdropper with a powerful computer—who sees the responses to random inputs—still cannot infer responses to new inputs? To address such questions, we propose a rigorous model of human computation and associated measures of complexity. We apply the model and measures first and foremost to the problem of 1) humanly computable password generation, and then consider related problems of 2) humanly computable “one-way functions” and 3) humanly computable “pseudorandom generators.” The theory of human computability developed here plays by different rules than standard computability; the polynomial vs. exponential time divide of modern computability theory is irrelevant to human computation. In human computability, the step counts for both humans and computers must be more concrete. As an application and running example, password generation schemas are humanly computable algorithms based on private keys. Humanly computable and/or humanly usable mean, roughly speaking, that any human needing—and capable of using—passwords can, if sufficiently motivated, generate and memorize a secret key in less than 1 h (including all rehearsals) and can subsequently use schema plus key to transform website names (challenges) into passwords (responses) in less than 1 min. Moreover, the schemas have precisely defined measures of security against all adversaries, human and/or machine.
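To make the challenge-response setup concrete, here is a minimal sketch of a toy humanly computable schema: the user memorizes a random letter-to-digit map as the secret key and applies a simple running-sum rule in their head to turn a website name into a password. The key format and the response rule below are illustrative assumptions, not the schema analyzed in the paper.

```python
# Illustrative sketch of a challenge-response password schema in the spirit of
# the abstract: a memorized secret key plus a simple mental rule turns a website
# name (challenge) into a password (response). The key format and rule are
# assumptions for illustration only.

import random
import string

def generate_key(seed=None):
    """Secret key: a random digit (0-9) assigned to each lowercase letter."""
    rng = random.Random(seed)
    return {c: rng.randrange(10) for c in string.ascii_lowercase}

def respond(key, challenge):
    """Toy response rule: running sum (mod 10) of the key digits for the
    challenge's letters, emitted as a digit string."""
    digits = []
    total = 0
    for c in challenge.lower():
        if c in key:
            total = (total + key[c]) % 10
            digits.append(str(total))
    return "".join(digits)

if __name__ == "__main__":
    key = generate_key(seed=42)       # memorized once, e.g., in under an hour
    print(respond(key, "amazon"))     # password for one site
    print(respond(key, "wikipedia"))  # a different password for another site
```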

 
Award ID(s): 1717349
NSF-PAR ID: 10144514
Author(s) / Creator(s): Blum, Manuel; Vempala, Santosh
Publisher / Repository: Proceedings of the National Academy of Sciences
Date Published: 2020
Journal Name: Proceedings of the National Academy of Sciences
Volume: 117
Issue: 17
ISSN: 0027-8424
Page Range / eLocation ID: p. 9208-9215
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. The tower number $\mathfrak{t}$ and the ultrafilter number $\mathfrak{u}$ are cardinal characteristics from set theory. They are based on combinatorial properties of classes of subsets of $\omega$ and the almost inclusion relation $\subseteq^*$ between such subsets. We consider analogs of these cardinal characteristics in computability theory. We say that a sequence $(G_n)_{n \in \mathbb{N}}$ of computable sets is a tower if $G_0 = \mathbb{N}$, $G_{n+1} \subseteq^* G_n$, and $G_n \smallsetminus G_{n+1}$ is infinite for each $n$. A tower is maximal if there is no infinite computable set contained in all $G_n$. A tower $\langle G_n \rangle_{n \in \omega}$ is an ultrafilter base if for each computable $R$, there is $n$ such that $G_n \subseteq^* R$ or $G_n \subseteq^* \overline{R}$; this property implies maximality of the tower. A sequence $(G_n)_{n \in \mathbb{N}}$ of sets can be encoded as the “columns” of a set $G \subseteq \mathbb{N}$. Our analogs of $\mathfrak{t}$ and $\mathfrak{u}$ are the mass problems of sets encoding maximal towers, and of sets encoding towers that are ultrafilter bases, respectively. The relative position of a cardinal characteristic broadly corresponds to the relative computational complexity of the mass problem. We use Medvedev reducibility to formalize relative computational complexity, and thus to compare such mass problems to known ones. We show that the mass problem of ultrafilter bases is equivalent to the mass problem of computing a function that dominates all computable functions, and hence, by Martin’s characterization, it captures highness. On the other hand, the mass problem for maximal towers is below the mass problem of computing a non-low set. We also show that some, but not all, noncomputable low sets compute maximal towers: every noncomputable (low) c.e. set computes a maximal tower, but no 1-generic $\Delta^0_2$-set does so. We finally consider the mass problems of maximal almost disjoint families and of maximal independent families. We show that they are Medvedev equivalent to maximal towers and to ultrafilter bases, respectively.
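For readability, the three definitions scattered through this abstract can be collected in display form (restated only, nothing added):

```latex
% Tower, maximality, and ultrafilter base, as defined in the abstract.
\begin{align*}
&\text{Tower: } G_0 = \mathbb{N}, \quad G_{n+1} \subseteq^* G_n, \quad G_n \smallsetminus G_{n+1} \text{ infinite for all } n.\\
&\text{Maximal: no infinite computable } R \text{ satisfies } R \subseteq^* G_n \text{ for all } n.\\
&\text{Ultrafilter base: } (\forall \text{ computable } R)(\exists n)\ \big[\, G_n \subseteq^* R \ \text{ or } \ G_n \subseteq^* \overline{R} \,\big].
\end{align*}
```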
  2. State-of-the-art password guessing tools, such as HashCat and John the Ripper, enable users to check billions of passwords per second against password hashes. In addition to performing straightforward dictionary attacks, these tools can expand password dictionaries using password generation rules, such as concatenation of words (e.g., “password123456”) and leet speak (e.g., “password” becomes “p4s5w0rd”). Although these rules work well in practice, creating and expanding them to model further passwords is a labor-intensive task that requires specialized expertise. To address this issue, in this paper we introduce PassGAN, a novel approach that replaces human-generated password rules with theory-grounded machine learning algorithms. Instead of relying on manual password analysis, PassGAN uses a Generative Adversarial Network (GAN) to autonomously learn the distribution of real passwords from actual password leaks and to generate high-quality password guesses. Our experiments show that this approach is very promising. When we evaluated PassGAN on two large password datasets, we were able to surpass rule-based and state-of-the-art machine learning password guessing tools. However, in contrast with the other tools, PassGAN achieved this result without any a priori knowledge of passwords or common password structures. Additionally, when we combined the output of PassGAN with the output of HashCat, we were able to match 51%–73% more passwords than with HashCat alone. This is remarkable because it shows that PassGAN can autonomously extract a considerable number of password properties that current state-of-the-art rules do not encode.
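As a point of reference for what hand-written "password generation rules" look like, here is a minimal sketch of dictionary expansion with mangling rules of the kind described above (capitalization, leet speak, suffix concatenation). The rules are toy examples for illustration and do not reproduce the actual rule syntax of HashCat or John the Ripper.

```python
# Illustrative sketch of hand-written mangling rules applied to a base dictionary,
# i.e., the manual approach that PassGAN aims to replace with a learned model.
# The specific rules below are toy examples, not either tool's rule language.

LEET = str.maketrans({"a": "4", "e": "3", "i": "1", "o": "0", "s": "5"})

def expand(word):
    """Expand one dictionary word into a set of candidate password guesses."""
    guesses = {word, word.capitalize(), word.translate(LEET)}
    # Concatenate common suffixes, e.g. "password" -> "password123456".
    for suffix in ("123", "123456", "!", "2024"):
        guesses.add(word + suffix)
    return guesses

if __name__ == "__main__":
    for guess in sorted(expand("password")):
        print(guess)
```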
  3. We develop an economic model of an offline password cracker which allows us to make quantitative predictions about the fraction of accounts that a rational password attacker would crack in the event of an authentication server breach. We apply our economic model to analyze recent massive password breaches at Yahoo!, Dropbox, LastPass, and AshleyMadison. All four organizations were using key-stretching to protect user passwords. In fact, LastPass's use of PBKDF2-SHA256 with $10^5$ hash iterations exceeds the 2017 NIST minimum recommendation by an order of magnitude. Nevertheless, our analysis paints a bleak picture: the adopted key-stretching levels provide insufficient protection for user passwords. In particular, we present strong evidence that most user passwords follow a Zipf's law distribution, and we characterize the behavior of a rational attacker when user passwords are selected from a Zipf's law distribution. We show that there is a finite threshold, depending on the Zipf's law parameters, that characterizes the behavior of a rational attacker: if the value of a cracked password (normalized by the cost of computing the password hash function) exceeds this threshold, then the adversary's optimal strategy is always to continue attacking until each user password has been cracked. In all cases (Yahoo!, Dropbox, LastPass, and AshleyMadison) we find that the value of a cracked password almost certainly exceeds this threshold, meaning that a rational attacker would crack all passwords that are selected from the Zipf's law distribution (i.e., most user passwords). This prediction holds even if we incorporate an aggressive model of diminishing returns for the attacker (e.g., the total value of 500 million cracked passwords is less than 100 times the total value of 5 million passwords). On a positive note, our analysis demonstrates that memory-hard functions (MHFs) such as SCRYPT or Argon2i can significantly reduce the damage of an offline attack. In particular, we find that, because MHFs substantially increase guessing costs, a rational attacker will give up well before he cracks most user passwords, and this prediction holds even if the attacker does not encounter diminishing returns for additional cracked passwords. Based on our analysis, we advocate that password hashing standards should be updated to require the use of memory-hard functions for password hashing and disallow the use of non-memory-hard functions such as BCRYPT or PBKDF2.
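A much-simplified sketch of the stopping-rule intuition: if passwords follow a Zipf's law distribution and the attacker guesses in decreasing order of popularity, a rational attacker keeps guessing while the expected marginal gain of the next guess covers the per-guess hashing cost. The model below omits the paper's refinements (e.g., diminishing returns), and all parameter values are illustrative assumptions.

```python
# Simplified sketch of a rational offline attacker under a Zipf's law password
# distribution. Not the paper's full economic model; parameters are assumptions.

def zipf_distribution(n, s):
    """Probabilities p_i proportional to i^(-s) over the n most popular passwords."""
    weights = [i ** (-s) for i in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def fraction_cracked(v, k, n=10**6, s=0.8):
    """Attacker keeps guessing while the expected marginal gain v*p_i covers the
    per-guess hashing cost k; returns the fraction of accounts cracked."""
    cracked = 0.0
    for p in zipf_distribution(n, s):   # guesses in decreasing popularity order
        if v * p < k:
            break
        cracked += p
    return cracked

if __name__ == "__main__":
    # Key-stretching and memory-hard functions raise the per-guess cost k,
    # which is what pushes a rational attacker to stop earlier.
    for cost in (1e-7, 1e-5, 1e-3):
        print(f"per-guess cost {cost:g}: fraction cracked ~ {fraction_cracked(v=1.0, k=cost):.2f}")
```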
  4. We introduce password strength signaling as a potential defense against password cracking. Recent breaches have exposed billions of user passwords to the dangerous threat of offline password cracking attacks. An offline attacker can quickly check millions (or sometimes billions/trillions) of password guesses by comparing a candidate password's hash value with a stolen hash from a breached authentication server. The attacker is limited only by the resources he is willing to invest. We explore the feasibility of applying ideas from Bayesian Persuasion to password authentication. Our key idea is to have the authentication server store a (noisy) signal about the strength of each user password for an offline attacker to find. Surprisingly, we show that the noise distribution for the signal can often be tuned so that a rational (profit-maximizing) attacker will crack fewer passwords. The signaling scheme exploits the fact that password cracking is not a zero-sum game, i.e., it is possible for an attacker to increase their profit in a way that also reduces the number of cracked passwords. Thus, a well-defined signaling strategy will encourage the attacker to reduce his guessing costs by cracking fewer passwords. We use an evolutionary algorithm to compute the optimal signaling scheme for the defender. We evaluate our mechanism on several password datasets and show that it can reduce the total number of cracked passwords by up to 12% (resp. 5%) of all users in defending against offline (resp. online) attacks. While the results of our empirical analysis are positive, we stress that we view the current solution as a proof of concept, as there are important societal concerns that would need to be considered before adopting our password strength signaling solution.
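To see why a noisy strength signal can help the defender, consider a toy two-type sketch: accounts are either weak (profitable to crack) or strong (not), the server labels every strong account "strong", and it also mislabels some weak accounts as "strong". A rational attacker who sees "strong" and computes a negative expected profit gives up, so the mislabeled weak accounts escape cracking. This is only an illustration of the incentive effect; the paper's mechanism tunes a full noise distribution with an evolutionary algorithm, and every parameter below is an assumption.

```python
# Toy two-type illustration of password strength signaling. Not the paper's
# scheme; parameter values and the two-type setting are assumptions.

def attacker_attacks(p_weak, value, cost_weak, cost_giveup):
    """Rational attacker attacks iff expected profit is positive, given the
    probability p_weak that the account is weak (cheap to crack)."""
    expected_profit = p_weak * (value - cost_weak) - (1 - p_weak) * cost_giveup
    return expected_profit > 0

def cracked_fraction(q_weak, value, cost_weak, cost_giveup, x_mislabel):
    """Fraction of all accounts cracked when every strong account is labeled
    'strong' and a fraction x_mislabel of weak accounts is also labeled 'strong'."""
    cracked = 0.0
    # Signal 'weak': only weak accounts; attacking them is always profitable here.
    if attacker_attacks(1.0, value, cost_weak, cost_giveup):
        cracked += q_weak * (1 - x_mislabel)
    # Signal 'strong': mixture of mislabeled weak accounts and all strong accounts.
    mass = q_weak * x_mislabel + (1 - q_weak)
    p_weak_given_strong = q_weak * x_mislabel / mass
    if attacker_attacks(p_weak_given_strong, value, cost_weak, cost_giveup):
        cracked += q_weak * x_mislabel
    return cracked

if __name__ == "__main__":
    params = dict(q_weak=0.5, value=10.0, cost_weak=1.0, cost_giveup=4.0)
    for x in (0.0, 0.2, 0.3):   # 0.0 = fully informative signal
        print(f"mislabel fraction {x:.1f}: cracked {cracked_fraction(x_mislabel=x, **params):.2f}")
```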
  5. Many problems in programming language theory and formal methods are undecidable, so they cannot be solved precisely. Practical techniques for dealing with undecidable problems are often based on decidable approximations. Undecidability implies that those approximations are always imprecise. Typically, practitioners use heuristics and ad hoc reasoning to identify imprecision issues and improve approximations, but there is a lack of computability-theoretic foundations about whether those efforts can succeed.

    This paper shows a surprising interplay between undecidability and decidable approximations: there exists a class of undecidable problems, such that it is computable to transform any decidable approximation to a witness input demonstrating its imprecision. We call those undecidable problems witnessable problems. For example, if a program property $P$ is witnessable, then there exists a computable function $f_P$, such that $f_P$ takes as input the code of any program analyzer targeting $P$ and produces an input program $w$ on which the program analyzer is imprecise. An even more surprising fact is that the class of witnessable problems includes almost all undecidable problems in programming language theory and formal methods. Specifically, we prove the diagonal halting problem $K$ is witnessable, and the class of witnessable problems is closed under complements and many-one reductions. In particular, all “non-trivial semantic properties of programs” mentioned in Rice’s theorem are witnessable. We also explicitly construct a problem in the non-witnessable (and undecidable) class and show that both classes have cardinality $2^{\aleph_0}$.

    Our results offer a new perspective on the understanding of undecidability: for witnessable problems, although it is impossible to solve them precisely, it is always possible to improve any decidable approximation to make it closer to the precise solution. This fact formally demonstrates that research efforts on such approximations are promising and shows there exist universal ways to identify precision issues of program analyzers, program verifiers, SMT solvers, etc., because their essences are decidable approximations of witnessable problems.
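For intuition about how a witness can be computed from an analyzer, here is an illustrative diagonalization sketch for the halting problem, assuming the analyzer is supplied as Python source defining a total function analyze(source) -> bool meaning "this program halts". The constructed witness program reconstructs its own source (a quine trick) and does the opposite of the analyzer's verdict, so any terminating analyzer must be imprecise on it. This is a toy rendering of the idea, not the paper's formal construction.

```python
# Toy witness construction: given the source of a terminating halting-problem
# analyzer, build a program on which that analyzer is necessarily imprecise.
# (Illustrative sketch only; not the construction from the paper.)

def make_witness(analyzer_source: str) -> str:
    """analyzer_source must define analyze(source: str) -> bool, meaning 'halts'."""
    template = (
        "ANALYZER = {a!r}\n"
        "TEMPLATE = {t!r}\n"
        "exec(ANALYZER)                                # defines analyze()\n"
        "me = TEMPLATE.format(a=ANALYZER, t=TEMPLATE)  # this program's own source\n"
        "if analyze(me):\n"
        "    while True:                               # analyzer said 'halts': loop forever\n"
        "        pass\n"
        "# otherwise halt immediately, although the analyzer did not certify halting\n"
    )
    return template.format(a=analyzer_source, t=template)

if __name__ == "__main__":
    # A maximally cautious (and maximally imprecise) analyzer: never certifies halting.
    naive_analyzer = "def analyze(source):\n    return False\n"
    witness = make_witness(naive_analyzer)
    exec(naive_analyzer)
    print("analyzer's verdict on the witness:", analyze(witness))  # False
    # Running exec(witness) would terminate immediately, exposing the imprecision.
```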
