<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>On Finding Quantum Multi-collisions</title></titleStmt>
			<publicationStmt>
				<publisher></publisher>
				<date>2019 Spring</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10095285</idno>
					<idno type="doi"></idno>
					<title level='j'>EUROCRYPT 2019</title>
<idno></idno>
<biblScope unit="volume"></biblScope>
<biblScope unit="issue"></biblScope>					

					<author>Qipeng Liu</author><author>Mark Zhandry</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[A k-collision for a compressing hash function H is a set of k distinct inputs that all map to the same output. In this work, we show that for any constant k, \Theta(N^(1/2(1-1/(2^k-1)))) quantum queries are both necessary and sufficient to achieve a k-collision with constant probability. This improves on both the best prior upper bound (Hosoyamada et al., ASIACRYPT 2017) and provides the first non-trivial lower bound, completely resolving the problem.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Collision resistance is one of the central concepts in cryptography. A collision for a hash function H : {0, 1} m &#8594; {0, 1} n is a pair of distinct inputs x 1 = x 2 that map to the same output: H(x 1 ) = H(x 2 ).</p><p>Multi-collisions. Though receiving comparatively less attention in the literature, multi-collision resistance is nonetheless an important problem. A k-collision for H is a set of k distinct inputs {x 1 , . . . , x k } such that x i = x j for i = j where H(x i ) = H(x j ) for all i, j.</p><p>Multi-collisions frequently surface in the analysis of hash functions and other primitives. Examples include MicroMint <ref type="bibr">[RS97]</ref>, RMAC <ref type="bibr">[JJV02]</ref>, chopMD <ref type="bibr">[CN08]</ref>, Leamnta-LW <ref type="bibr">[HIK+11]</ref>, PHOTON and Parazoa <ref type="bibr">[NO14]</ref>, Keyed-Sponge <ref type="bibr">[JLM14]</ref>, all of which assume the multi-collision resistance of a certain function. Multi-collisions algorithms have also been used in attacks, such as MDC-2 <ref type="bibr">[KMRT09]</ref>, HMAC <ref type="bibr">[NSWY13]</ref>, Even-Mansour <ref type="bibr">[DDKS14]</ref>, and LED <ref type="bibr">[NWW14]</ref>. Multi-collision resistance for polynomial k has also recently emerged as a theoretical way to avoid keyed hash functions <ref type="bibr">[BKP18,</ref><ref type="bibr">BDRV18]</ref>, or as a useful cryptographic primitives, for example, to build statistically hiding commitment schemes with succinct interaction <ref type="bibr">[KNY18]</ref>.</p><p>Quantum. Quantum computing stands to fundamentally change the field of cryptography. Importantly for our work, Grover's algorithm <ref type="bibr">[Gro96]</ref> can speed up brute force searching by a quadratic factor, greatly increasing the speed of preimage attacks on hash functions. In turn, Grover's algorithm can be used to find ordinary collisions (k = 2) in time O(2 n/3 ), speeding up the classical "birthday" attack which requires O(2 n/2 ) time. It is also known that, in some sense (discussed below), these speedups are optimal <ref type="bibr">[AS04,</ref><ref type="bibr">Zha15a]</ref>. These attacks require updated symmetric primitives with longer keys in order to make such attacks intractable.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.1">This Work: Quantum Query Complexity of Multi-collision Resistance</head><p>In this work, we consider quantum multi-collision resistance. Unfortunately, little is known of the difficulty of finding multi -collisions for k &#8805; 3 in the quantum setting. The only prior work on this topic is that of Hosoyamada et al. <ref type="bibr">[HSX17]</ref>, who give a O(2 4n/9 ) algorithm for 3-collisions, as well as algorithms for general constant k. On the lower bounds side, the &#937;(2 n/3 ) from the k = 2 case applies as well for higher k, and this is all that is known. We completely resolve this question, giving tight upper and lower bounds for any constant k. In particular, we consider the quantum query complexity of multi-collisions. We will model the hash function H as a random oracle. This means, rather than getting concrete code for a hash function H, the adversary is given black box access to a function H chosen uniformly at random from the set of all functions from {0, 1} m into {0, 1} n . Since we are in the quantum setting, black box access means the adversary can make quantum queries to H. Each query will cost the adversary 1 time step. The adversary's goal is to solve some problem-in our case find a k-collision-with the minimal cost. Our results are summarized in Table <ref type="table">1</ref>. Both our upper bounds and lower bounds improve upon the prior work for k &#8805; 3; for example, for k = 3, we show that the quantum query complexity is &#920;(2 3n/7 ).</p><p>Table <ref type="table">1</ref>. Quantum query complexity results for k-collisions. k is taken to be a constant, and all Big O and &#937; notations hide constants that depend on k. In parenthesis are the main restrictions for the lower bounds provided. We note that in the case of 2-to-1 functions, m &#8804; n + 1, so implicitly these bounds only apply in this regime. In these cases, m characterizes the query complexity. On the other hand, for random or arbitrary functions, n is the more appropriate way to measure query complexity. We also note that for arbitrary functions, when m &#8804; n + log(k -1), it is possible that H contains no k-collisions, so the problem becomes impossible. Hence, m &#8805; n + log k is essentially tight. For random functions, there will be no collisions w.h.p unless m (1 -1 k )n, so algorithms on random functions must always operate in this regime.</p><p>Upper Bound (Algorithm)</p><p>Lower Bound <ref type="bibr">[BHT98]</ref> O(2 m/3 ) for k = 2 (2-to-1)</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>[AS04]</head><p>&#937;(2 m/3 ) for k = 2 (2-to-1) <ref type="bibr">[Zha15a]</ref> O(2 n/3 ) for k = 2 (Random, m &#8805; n/2) &#937;(2 n/3 ) for k = 2 (Random) <ref type="bibr">[HSX17]</ref> O 2</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.2">Motivation</head><p>Typically, the parameters of a hash function are set to make finding collisions intractable. One particularly important parameter is the output length of the hash function, since the output length in turn affects storage requirements and the efficiency of other parts of a cryptographic protocol. Certain attacks, called generic attacks, apply regardless of the implementation details of the hash function H, and simply work by evaluating H on several inputs. For example, the birthday attack shows that it is possible to find a collision in time approximately 2 n/2 by a classical computer. Generalizations show that k-collisions can be found in time &#920;(2 (<ref type="foot">foot_0</ref>-1/k)n ) 1 .</p><p>These are also known to be optimal among classical generic attacks. This is demonstrated by modeling H as an oracle, and counting the number of queries needed to find (k-)collisions in an arbitrary hash function H. In cryptographic settings, it is common to model H as a random function, giving stronger average case lower bounds.</p><p>Understanding the effect of generic attacks is critical. First, they cannot be avoided, since they apply no matter how H is designed. Second, other parameters of the function, such as the number of iterations of an internal round function, can often be tuned so that the best known attacks are in fact generic. Therefore, for many hash functions, the complexity of generic attacks accurately represents the actual cost of breaking them.</p><p>Therefore, for "good" hash functions where generic attacks are optimal, in order to achieve security against classical adversaries n must be chosen so that t = 2 n/2 time steps are intractable. This often means setting t = 2 128 , so n = 256. In contrast, generic classical attacks can find k-collisions in time &#920;(2 (1-1/k)n ). For example, this means that n must be set to 192 to avoid 3-collisions, or 171 to avoid 4-collisions.</p><p>Once quantum computers enter the picture, we need to consider quantum queries to H in order to model actual attacks that evaluate H in superposition. This changes the query complexity, and makes proving bounds much more difficult. Just as understanding query complexity in the classical setting was crucial to guide parameter choices, it will be critical in the quantum world as well.</p><p>We also believe that quantum query complexity is an important study in its own right, as it helps illuminate the effects quantum computing will have on various areas of computer science. It is especially important to cryptography, as many of the questions have direct implications to the postquantum security of cryptosystems. Even more, the techniques involved are often closely related to proof techniques in post-quantum cryptography. For example, bounds for the quantum query complexity of finding collisions in random functions <ref type="bibr">[Zha15a]</ref>, as well as more general functions [EU17,BES17], were developed from techniques for proving security in the quantum random oracle model <ref type="bibr">[BDF+11,</ref><ref type="bibr">Zha12,</ref><ref type="bibr">TU16]</ref>. Similarly, the lower bounds in this work build on techniques for proving quantum indifferentiability <ref type="bibr">[Zha18]</ref>. On the other hand, proving the security of MACs against superposition queries <ref type="bibr">[BZ13]</ref> resulted in new lower bounds for the quantum oracle interrogation problem <ref type="bibr">[van98]</ref> and generalizations <ref type="bibr">[Zha15b]</ref>.</p><p>Lastly, multi-collision finding can be seen as a variant of k-distinctness, which is essentially the problem of finding a k-collision in a function H : {0, 1} n &#8594; {0, 1} n , where the k-collision may be unique and all other points are distinct. The quantum query complexity of k-distinctness is currently one of the main open problems in quantum query complexity. An upper bound of (2 n ) 3 4 -1 4(2 k -1) was shown by Belovs <ref type="bibr">[Bel12]</ref>. The best known lower bound is &#937;((</p><p>Interestingly, the dependence of the exponent on k is exponential for the upper bound, but polynomial for the lower bound, suggesting a fundamental gap our understanding of the problem.</p><p>Note that our results do not immediately apply in this setting, as our algorithm operates only in a regime where there are many (&#8804; k-)collisions, whereas k-distinctness applies even if the k-collision is unique and all other points are distinct (in particular, no (k -1)-collisions). On the other hand, our lower bound is always lower than 2 n/2 , which is trivial for this problem. Nonetheless, both problems are searching for the same thing-namely a k-collisions-just in different settings. We hope that future work may be able to extend our techniques to solve the problem of k-distinctness.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.3">The Reciprocal Plus 1 Rule</head><p>For many search problems over random functions, such as pre-image search, collision finding, k-sum, quantum oracle interrogation, and more, a very simple folklore rule of thumb translates the classical query complexity into quantum query complexity.</p><p>In particular, let N = 2 n , all of these problems have a classical query complexity &#920;(N 1/&#945; ) for some rational number &#945;. Curiously, the quantum query complexity of all these problems is always &#920;(N 1 &#945;+1 ). In slightly more detail, for all of these problems the best classical q-query algorithm solves the problem with probability &#920;(q c /N d ) for some constants c, d. Then the classical query complexity is &#920;(N d/c ). For this class of problems, the success probability of the best q query quantum algorithm is obtained simply by increasing the power of q by d. This results in a quantum query complexity of &#920;(N d/(c+d) ). Examples: -Grover's pre-image search <ref type="bibr">[Gro96]</ref> improves success probability from q/N to q 2 /N , which is known to be optimal <ref type="bibr">[BBBV97]</ref>. The result is a query complexity improvement from N = N 1/1 to N 1/2 . Similarly, finding, say, 2 pre-images has classical success probability q 2 /N 2 ; it is straightforward to adapt known techniques to prove that the best quantum success probability is q 4 /N 2 . Again, the query complexity goes from N to N 1/2 . Analogous statements hold for any constant number of pre-images. -The BHT collision finding algorithm <ref type="bibr">[BHT98]</ref> finds a collision with probability q 3 /N , improving on the classical birthday attack q 2 /N . Both of these are known to be optimal <ref type="bibr">[AS04,</ref><ref type="bibr">Zha15a]</ref>. Thus quantum algorithms improve the query complexity from N 1/2 to N 1/3 . Similarly, finding, say, 2 distinct collisions has classical success probability q 4 /N 2 , whereas we show that the quantum success probability is q 6 /N 2 . More generally, any constant number of distinct collisions conforms to the Reciprocal Plus 1 Rule. -k-sum asks to find a set of k inputs such that the sum of the outputs is 0. This is a different generalization of collision finding than what we study in this work. Classically, the best algorithm succeeds with probability q k /N . Quantumly, the best algorithm succeeds with probability q k+1 /N <ref type="bibr">[BS13,</ref><ref type="bibr">Zha18]</ref>.</p><p>Hence the query complexity goes from N 1/k to N 1/(k+1) . Again, solving for any constant number of distinct k-sum solutions also conforms to the Reciprocal Plus 1 Rule. -In the oracle interrogation problem, the goal is to compute q +1 input/output pairs, using only q queries. Classically, the best success probability is clearly 1/N. Meanwhile, Boneh and Zhandry <ref type="bibr">[BZ13]</ref> give a quantum algorithm with success probability roughly q/N, which is optimal.</p><p>Some readers may have noticed that Reciprocal Plus 1 (RP1) rule does not immediately appear to apply the Element Distinctness. The Element Distinctness problem asks to find a collision in H : [M ] &#8594; [N ] where the collision is unique. Classically, the best algorithm succeeds with probability &#920;(q 2 /M 2 ). On the other hand, quantum algorithms can succeed with probability &#920;(q 3 /M 2 ), which is optimal <ref type="bibr">[Amb04,</ref><ref type="bibr">Zha15a]</ref>. This does not seem to follow the prediction of the RP1 rule, which would have predicted q 4 /M 2 . However, we note that unlike the settings above which make sense when N M , and where the complexity is characterized by N , the Element Distinctness problem requires M &#8804; N and the complexity is really characterized by the domain size M . Interestingly, we note that for a random expanding function, when N &#8776; M 2 , there will with constant probability be exactly one collision in H. Thus, in this regime the collision problem matches the Element Distinctness problem, and the RP1 rule gives the right query complexity! Similarly, the quantum complexity for k-sum is usually written as M k/(k+1) , not N 1/(k+1) . But again, this is because most of the literature considers H for which there is a unique k-sum and H is non-compressing, in which case the complexity is better measured in terms of M . Notice that a random function will contain a unique k collision when N &#8776; M k , in which case the bound we state (which follows the RP1 rule) exactly matches the statement usually given.</p><p>On the other hand, the RP1 rule does not give the right answer for kdistinctness for k &#8805; 3, since the RP1 rule would predict the exponent to approach 1/2 for large k, whereas prior work shows that it approaches 3/4 for large k. That RP1 does not apply perhaps makes sense, since there is no setting of N, M where a random function will become an instance of k-distinctness: for any setting of parameters where a random function has a k-collision, it will also most likely have many (k -1)-collisions.</p><p>The takeaway is that the RP1 Rule seems to apply for natural search problems that make sense on random functions when N M . Even for problems that do not immediately fit this setting such as Element Distinctness, the rule often still gives the right query complexity by choosing M, N so that a random function is likely to give an instance of the desired problem.</p><p>Enter k-collisions. In the case of k-collisions, the classical best success probability is q k /N (k-1) , giving a query complexity of N (k-1)/k = N 1-1/k . Since the kcollision problem is a generalization of collision finding, is similar in spirit to the problems above, and applies to compressing random functions, one may expect that the Reciprocal Plus 1 Rule applies. If true, this would give a quantum success probability of q 2k-1 /N k-1 , and a query complexity of</p><p>Even more, for small enough q, it is straightforward to find a k-collision with probability O(q 2k-1 /N k-1 ) as desired. In particular, divide the q queries into k -1 blocks. Using the first q/(k -1) queries, find a 2-collision with probability (q/(k -1)) 3 /N = O(q 3 /N ). Let y be the image of the collision. Then, for each of the remaining (k -2) blocks of queries, find a pre-image of y with probability (q/(k -1))<ref type="foot">foot_1</ref> /N = O(q 2 /N ) using Grover search. The result is k colliding inputs with probability O(q 3+2(k-2) /N k-1 ) = O(q 2k-1 /N k-1 ). It is also possible to prove that this is a lower bound on the success probability (see lower bound discussion below). Now, this algorithm works as long q &#8804; N 1/3 , since beyond this range the 2-collision success probability is bounded by 1 &lt; q 3 /N . Nonetheless, it is asymptotically tight in the regime for which it applies. This seems to suggest that the limitation to small q might be an artifact of the algorithm, and that a more clever algorithm could operate beyond the N 1/3 barrier. In particular, this strongly suggests k-collisions conforms to the Reciprocal Plus 1 Rule.</p><p>Note that the RP1 prediction gives an exponent that depends polynomially on k, asymptotically approaching 1/2. In contrast, the prior work of <ref type="bibr">[HSX17]</ref> approaches 1/2 exponentially fast in k. Thus, prior to our work we see an exponential vs polynomial gap for k-collisions, similar to the case of k-distinctness.</p><p>Perhaps surprisingly given the above discussion 2 , our work demonstrates that the right answer is in fact exponential, refuting the RP1 rule for k-collisions.</p><p>As mentioned above, our results do not immediately give any indication for the query complexity of k-distinctness. However, our results may hint that kdistinctness also exhibits an exponential dependence on k. We hope that future work, perhaps building on our techniques, will be able to resolve this question.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.4">Technical Details</head><p>The Algorithm. At their heart, the algorithms for pre-image search, collision finding, k-sum, and the recent algorithm for k-collision, all rely on Grover's algorithm. Let f : {0, 1} n &#8594; {0, 1} be a function with a fraction &#948; of accepting inputs. Grover's algorithm finds the input with probability O(&#948;q 2 ) using q quantum queries to f . Grover's algorithm finds a pre-image of a point y in H by setting f (x) to be 1 if and only if H(x) = y.</p><p>The BHT algorithm [BHT98] uses Grover's to find a collision in H. First, it queries H on q/2 = O(q) random points, assembling a database D. As long as q N 1/2 , all the images in D will be distinct. Now, it lets f (x) be the function that equals 1 if and only if H(x) is found amongst the images in D, and x is not among the pre-images. By finding an accepting input to f , one immediately finds a collision. Notice that the fraction of accepting inputs is approximately q/N.</p><p>By running Grover's for q/2 = O(q) steps, one obtains a such a pre-image, and hence a collision, with probability O((q/N )q 2 ) = O(q 3 /N ).</p><p>Hosoyamada et al. show how this idea can be recursively applied to find multi-collisions. For k = 3, the first step is to find a database D 2 consisting of r distinct 2-collisions. By recursively applying the BHT algorithm, each 2-collision takes time N 1/3 . Then, to find a 3 collision, set up f as before: f (x) = 1 if and only if H(x) is amongst the images in D and x is not among the pre-images. The fraction of accepting inputs is approximately r /N, so Grover's algorithm will find a 3-collision in time (N/r) 1/2 . Setting r to be N 1/9 optimizes the total query count as N 4/9 . For k = 4, recursively build a table D 3 of 3-collisions, and set up f to find a collision with the database.</p><p>The result is an algorithm for k-collisions for any constant k, using</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Our algorithm improves on Hosoyamada et al.'s, yielding a query complexity of O(N</head><p>) ). Note that for Hosoyamada et al.'s algorithm, when constructing D k-1 , many different D k-2 databases are being constructed, one for each entry in D k-1 . Our key observation is that a single database can be re-used for the different entries of D k-1 . This allows us to save on some of the queries being made. These extra queries can then be used in other parts of the algorithm to speed up the computation. By balancing the effort correctly, we obtain our algorithm. Put another way, the cost of finding many (k-)collisions can be amortized over many instances, and then recursively used for finding collisions with higher k. Since the recursive steps involve solving many instances, this leads to an improved computational cost.</p><p>In more detail, we iteratively construct databases D 1 , D 2 , . . . , D k . Each D i will have r i i-collisions. We set r k = 1, indicating that we only need a single k-collision. To construct database D 1 , simply query on r 1 arbitrary points. To construct database D i , i &#8805; 2, define the function f i that accepts inputs that collide with D i-1 but are not contained in D i-1 . The fraction of points accepted by f i is approximately r i-1 /N . Therefore, Grover's algorithm returns an accepting input in time (N/r i-1 ) 1/2 . We simply run Grover's algorithm r i times using the same database D i-1 to construct D i in time r i (N/r i-1 ) 1/2 . Now we just optimize r 1 , . . . , r k-1 by setting the number of queries to construct each database to be identical. Notice that r 1 = O(q), so solving for r i gives us</p><p>Setting r k = 1 and solving for q gives the desired result. In particular, in the case k = 3, our algorithm finds a collision in time O(N 3/7 ).</p><p>The Lower Bound. Notice that our algorithm fails to match the result one would get by applying the "Reciprocal Plus 1 Rule". Given the discussion above, one may expect that our iterative algorithm could potentially be improved on even more. To the contrary we prove that, in fact, our algorithm is asymptotically optimal for any constant k.</p><p>Toward that end, we employ a recent technique developed by Zhandry <ref type="bibr">[Zha18]</ref> for analyzing quantum queries to random functions. We use this technique to show that our algorithm is tight for random functions, giving an average-case lower bound.</p><p>Zhandry's "Compressed Oracles." Zhandry demonstrates that the information an adversary knows about a random oracle H can be summarized by a database D * of input/output pairs, which is updated according to some special rules. In Zhandry's terminology, D * is the "compressed standard/phase oracle". This D * is not a classical database, but technically a superposition of all databases, meaning certain amplitudes are assigned to each possible database. D * can be measured, obtaining an actual classical database D with probability equal to its amplitude squared. In the following discussion, we will sometimes pretend that D * is actually a classical database. While inaccurate, this will give the intuition for the lower bound techniques we employ. In the Sect. 4 we take care to correctly analyze D * as a superposition of databases.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Zhandry shows roughly the following:</head><p>-Consider any "pre-image problem", whose goal is to find a set of pre-images such that the images satisfy some property. For example, k-collision is the problem of finding k pre-images such that the corresponding images are all the same. Then after q queries, consider measuring D * . The adversary can only solve the pre-image problem after q queries if the measured D * has a solution to the pre-image problem. Thus, we can always upper bound the adversary's success probability by upper bounding the probability D * contains a solution. -D * starts off empty, and each query can only add one point to the database.</p><p>-For any image point y, consider the amplitude on databases containing y as a function of q (remember that amplitude is the square root of the probability). Zhandry shows that this amplitude can only increase by O( 1/N ) from one query to the next. More generally, for a set S of r different images, the amplitude on databases containing any point in S can only increase by O( |S|/N ).</p><p>The two results above immediately imply the optimality of Grover's search. In particular, the amplitude on databases containing y is at most O(q 1/N ) after q queries, so the probability of obtaining a solution is the square of this amplitude, or O(q 2 /N ). This also readily gives a lower bound for the collision problem. Namely, in order to introduce a collision to D * , the adversary must add a point that collides with one of the existing points in D * . Since there are at most q such points, the amplitude on such D * can only increase by O( q/N ). This means the overall amplitude after q queries is at most O(q 3/2 /N 1/2 ). Squaring to get a probability gives the correct lower bound.</p><p>A First Attempt. Our core idea is to attempt a lower bound for k-collision by applying these ideas recursively. The idea is that, in order to add, say, a 3collision to D * , there must be an existing 2-collision in the database. We can then use the 2-collision lower bound to bound the increase in amplitude that results from each query.</p><p>More precisely, for very small q, we can bound the amplitude on databases containing distinct 2-collisions as O( (q 3/2 /N 1/2 ) ). If q N 1/3 , must be a constant else this term is negligible. So we can assume for q &lt; N 1/3 that is a constant.</p><p>Then, we note that in order to introduce a 3-collision, the adversary's new point must collide with one of the existing 2-collisions. Since there are at most , we know that the amplitude increases by at most O(</p><p>) since is a constant. This shows that the amplitude on databases with 3-collisions is at most q/N 1/2 . We can bound the amplitude increase even smaller by using not only the fact that the database contains at most 2-collisions, but the fact that the amplitude on databases containing even a single 2-collision is much less than 1. In particular, it is O(q 3/2 /N 1/2 ) as demonstrated above. Intuitively, it turns out we can actually just multiply the 1/N 1/2 amplitude increase in the case where the database contains a 2-collision by the q 3/2 /N 1/2 amplitude on databases containing any 2-collision to get an overall amplitude increase of q 3/2 /N .</p><p>Overall then, we upper bound the amplitude after q &lt; N 1/3 queries by O(q 5/2 /N ), given an upper bound of O(q 5 /N 2 ) on the probability of finding a 3-collision. This lower bound can be extended recursively to any constant k-collisions, resulting in a bound that exactly matches the Reciprocal Plus 1 Rule, as well as the algorithm for small q! This again seems to suggest that our algorithm is not optimal.</p><p>Our Full Proof. There are two problems with the argument above that, when resolved, actually do show our algorithm is optimal. First, when q &#8805; N 1/3 , the O(q 3/2 /N 1/2 ) part of the amplitude bound becomes vacuous, as amplitudes can never be more than 1. Second, the argument fails to consider algorithms that find many 2-collisions, which is possible when q &gt; N 1/3 . Finding many 2-collisions of course takes more queries, but then it makes extending to 3-collisions easier, as there are more collisions in the database to match in each iteration.</p><p>In our full proof, we examine the amplitude on the databases containing a 3-collision as well as r 2-collisions, after q queries. We call this amplitude g q,r . We show a careful recursive formula for bounding g using Zhandry's techniques, which we then solve.</p><p>More generally, for any constant k, we let g</p><p>q,r,s be the amplitude on databases containing exactly r distinct (k -1)-collisions and at least s distinct k-collisions after q queries. We develop a multiply-recursive formula for the g (k) in terms of the g (k) and g (k-1) . We then recursively plug in our solution to g (k-1) so that the recursion is just in terms of g (k) , which we then solve using delicate arguments.</p><p>Interestingly, this recursive structure for our lower bound actually closely matches our algorithm. Namely, our proof lower bounds the difficulty of adding an i-collision to a database D * containing many i -1 collisions, exactly the problem our algorithm needs to solve. Our techniques essentially show that every step of our algorithm is tight, resulting in a lower bound of &#937; N</p><p>) , exactly matching our algorithm. Thus, we solve the quantum query complexity of k-collisions.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.5">Other Related Work</head><p>Most of the related work has been mentioned earlier. Recently, in <ref type="bibr">[HSTX18]</ref>, Hosoyamada, Sasaki, Tani and Xagawa gave the same improvement. And they also showed that, their algorithm can also find a multi-collision for a more general setting where</p><p>) and find a multiclaw for random functions with the same query complexity. They also noted that our improved collision finding algorithm for the case |X| &#8805; l &#8226; |Y | was reported in the Rump Session of AsiaCrypt 2017. They did not give an accompanying lower bound.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Preliminaries</head><p>Here, we recall some basic facts about quantum computation, and review the relevant literature on quantum search problems.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">Quantum Computation</head><p>A quantum system Q is defined over a finite set B of classical states. In this work we will consider B = {0, 1} n . A pure state over Q is a unit vector in C |B| , which assigns a complex number to each element in B. In other words, let |&#966; be a pure state in Q, we can write |&#966; as:</p><p>where x&#8712;B |&#945; x | 2 = 1 and {|x } x&#8712;B is called the "computational basis" of C |B| . The computational basis forms an orthonormal basis of C |B| .</p><p>Given two quantum systems Q 1 over B 1 and Q 2 over B 2 , we can define a</p><p>is entangled. Otherwise, we say |&#966; is unentangled.</p><p>A pure state |&#966; &#8712; Q can be manipulated by a unitary transformation U . The resulting state |&#966; = U |&#966; .</p><p>We can extract information from a state |&#966; by performing a measurement. A measurement specifies an orthonormal basis, typically the computational basis, and the probability of getting result x is | x|&#966; | 2 . After the measurement, |&#966; "collapses" to the state |x if the result is x.</p><p>For example, given the pure state |&#966; = 3 5 |0 + 4 5 |1 measured under {|0 , |1 }, with probability 9/25 the result is 0 and |&#966; collapses to |0 ; with probability 16/25 the result is 1 and |&#966; collapses to |1 .</p><p>We finally assume a quantum computer can implement any unitary transformation (by using these basic gates, Hadamard, phase, CNOT and &#960; 8 gates), especially the following two unitary transformations:</p><p>Here, &#8853; is a commutative group operation defined over Y . -Quantum Fourier Transform: Let N = 2 n . Given a quantum state |&#966; = 2 n -1 i=0 x i |i , by applying only O(n 2 ) basic gates, one can compute |&#968; = 2 n -1 i=0 y i |i where the sequence {y i } 2 n -1 i=0 is the sequence achieved by applying the classical Fourier transform QFT N to the sequence {x i } 2 n -1 i=0 :</p><p>where &#969; n = e 2&#960;i/N , i is the imaginary unit. One interesting property of QFT is that by preparing |0 n and applying QFT 2 to each qubit, (QFT</p><p>For convenience, we sometimes ignore the normalization of a pure state which can be calculated from the context.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">Grover's Algorithm and BHT Algorithm</head><p>Definition 1 (Database Search Problem). Suppose there is a function/database encoded as F : X &#8594; {0, 1} and F -1 (1) is non-empty. The problem is to find x * &#8712; X such that F (x * ) = 1.</p><p>We will consider adversaries with quantum access to F , meaning they submit queries as x&#8712;X,y&#8712;{0,1} &#945; x,y |x, y and receive in return x&#8712;X,y&#8712;{0,1} &#945; x,y |x, y &#8853; F (x) . Grover's algorithm <ref type="bibr">[Gro96]</ref> finds a pre-image using an optimal number of queries:</p><p>There is a quantum algorithm that finds x * &#8712; X such that F (x * ) = 1 with an expected number of quantum</p><p>even without knowing t in advance.</p><p>We will normally think of the number of queries as being fixed, and consider the probability of success given the number of queries. The algorithm from Theorem 1, when runs for q queries, can be shown to have a success probability min(1, O(q 2 /(|X|/t))). For the rest of the paper, "Grover's algorithm" will refer to this algorithm. Now let us look at another important problem: 2-collision finding problem on 2-to-1 functions. Brassard, H&#248;yer and Tapp proposed a quantum algorithm <ref type="bibr">[BHT98]</ref> that solved the problem using only O(N 1/3 ) quantum queries. The idea is the following:</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Definition 2 (Collision Finding on 2-to-1 Functions). Assume</head><p>-Prepare a list of input and output pairs, L = {(x i , y i = F (x i )} t i=1 where x i is drawn uniformly at random and t = N 1/3 ; -If there is a 2-collision in L, output that pair. Otherwise, -Run Grover's algorithm on the following function F : F (x) = 1 if and only if there exists i &#8712; {1, 2, &#8226; &#8226; &#8226; , t}, F (x) = y i = F (x i ) and x = x i . Output the solution x, as well as whatever x i it collides with.</p><p>This algorithm takes O(t + N/t) quantum queries and when t = &#920;(N 1/3 ), the algorithm finds a 2-collision with O(N 1/3 ) quantum queries.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3">Multi-collision Finding and [HSX17]</head><p>Hosoyamada, Sasaki and Xagawa proposed an algorithm for k-collision finding on any function F : X &#8594; Y where |X| &#8805; k|Y | (k is a constant). They generalized the idea of <ref type="bibr">[BHT98]</ref> and gave the proof for even arbitrary functions. We now briefly talk about their idea. For simplicity in this discussion, we assume F is a k-to-1 function.</p><p>The algorithm prepares t pairs of 2-collisions (x 1 , x 1 ), &#8226; &#8226; &#8226; , (x t , x t ) by running the BHT algorithm t times. If two pairs of 2-collisions collide, there is at least a 3-collision (possibly a 4-collision). Otherwise, it uses Grover's algorithm to find a x = x i , x = x i and f (x ) = f (x i ) = f (x i ). The number of queries is O(tN 1/3 + N/t). When t = &#920;(N 1/9 ), the query complexity is O(N 4/9 ).</p><p>By induction, finding a (k -1)-collision requires O(N (3 k-1 -1)/(2&#8226;3 k-1 ) ) quantum queries. By preparing t (k -1)-collisions and applying Grover's algorithm to it, it takes O(tN</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.4">Compressed Fourier Oracles and Compressed Phase Oracles</head><p>In <ref type="bibr">[Zha18]</ref>, Zhandry showed a new technique for analyzing cryptosystems in the random oracle model. He also showed that his technique can be used to re-prove several known quantum query lower bounds. In this work, we will extend his technique in order to prove a new optimal lower bound for multi-collisions.</p><p>The basic idea of Zhandry's technique is the following: assume A is making a query to a random oracle H and the query is x,u,z a x,u,z |x, u, z where x is the query register, u is the response register and z is its private register. Instead of only considering the adversary's state x,u,z a x,u,z |x, u + H(x), z for a random oracle H, we can actually treat the whole system as</p><p>where |H is the truth table of H. By looking at random oracles that way, Zhandry showed that these five random oracle models/simulators are equivalent:</p><p>1. Standard Oracles:</p><p>where &#969; n = e 2&#960;i/N and PhO = (I &#8855; QFT &#8224; &#8855; I) &#8226; StO &#8226; (I &#8855; QFT &#8855; I). In other words, it first applies the QFT to the u register, applies the standard query, and then applies QFT &#8224; one more time.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Fourier Oracles:</head><p>We can view H |H as QFT|0 N . In other words, if we perform Fourier transform on a function that always outputs 0, we will get a uniform superposition over all the possible functions H |H . Moreover, H &#969; H(x)&#8226;u |H is equivalent to QFT|0 N &#8853; (x, u) . Here &#8853; means updating (xor) the x-th entry in the database with u. So in this model, we start with x,u,z a 0 x,u,z |x, u, z &#8855; QFT|D 0 where D 0 is an all-zero function. By making the i-th query, we have</p><p>The Fourier oracle incorporates QFT and operates directly on the D registers:</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Compressed Fourier Oracles:</head><p>The idea is basically the same as Fourier oracles. But when the algorithm only makes q queries, the database D with non-zero weight contains at most q non-zero entries. So to describe D, we only need at most q different (x i , u i ) pairs (u i = 0) which says the database outputs u i on x i and 0 everywhere else. And D &#8853; (x, u) is doing the following: (1) if x is not in the list D and u = 0, put (x, u) in D;</p><p>(2</p><p>In the model, we start with x,u,z a 0 x,u,z |x, u, z &#8855; |D 0 where D 0 is an empty list. After making the i-th query, we have</p><p>5. Compressed Standard/Phase Oracles: These two models are essentially equivalent up to an application of QFT applied to the query response register.</p><p>From now on we only consider compressed phase oracles. By applying QFT on the u entries of the database registers of a compressed Fourier oracle, we get a compressed phase oracle. In this model, D contains all the pair (x i , u i ) which means the oracle outputs u i on x i and uniformly at random on other inputs. When making a query on |x, u, z, D , -if (x, u ) is in the database D for some u , a phase &#969; uu n will be added to the state; it corresponds to update w to w + u in the compressed Fourier oracle model where w = D(x) in the compressed Fourier database.</p><p>-otherwise a superposition is appended to the state |x &#8855; u &#969; uu n |u ; it corresponds to put a new pair (x, u) in the list of the compressed Fourier oracle model; -also make sure that the list will never have an (x, 0) pair in the compressed Fourier oracle model (in other words, it is |x &#8855; y |y in the compressed phase oracle model); if there is one, delete that pair; -All the 'append' and 'delete' operations above mean applying QFT.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Algorithm for Multi-collision Finding</head><p>In this section, we give an improved algorithm for k-collision finding. We use the same idea from <ref type="bibr">[HSX17]</ref> but carefully reorganize the algorithm to reduce the number of queries.</p><p>As a warm-up, let us consider the case k = 3 and the case where F : X &#8594; Y is a 3-to-1 function, |X| = 3|Y | = 3N . They gives an algorithm with O(N 4/9 ) quantum queries. Here is our algorithm with only O(N 3/7 ) quantum queries:</p><p>i=1 where x i are distinct and t 1 = N 3/7 . This requires O(N 3/7 ) classical queries on random points.</p><p>-Define the following function F on X:</p><p>} and F (x) = y j for some j 0, otherwise Run Grover's algorithm on function F . Wlog (by reordering L), we find x 1 such that x 1 = x 1 and F (x 1 ) = F (x 1 ) using O( N/N 3/7 ) = O(N 2/7 ) quantum queries. -Repeat the last step t 2 = N 1/7 times, we will have</p><p>Grover's on function G:</p><p>} and F (x) = y j for some j 0, otherwise A 3-collision will be found when Grover's algorithm finds a pre-image of 1 on G. It takes O( N/N 1/7 ) = O(N 3/7 ) quantum queries.</p><p>Overall, the algorithm finds a 3-collision using O(N 3/7 ) quantum queries.</p><p>The similar algorithm and analysis works for any constant k and any k-to- 1) . The algorithm works as follows:</p><p>-Prepare a list L 1 of input-output pairs of size t 1 . With overwhelming probability (1 -N -1/2 k ), L 1 does not contain a collision. By letting t 0 = N , this step makes t 1 N/t 0 quantum queries. -Define a function F 2 (x) that returns 1 if the input x is not in L 1 but the image F (x) collides with one of the images in L 1 , otherwise it returns 0. Run Grover's on F 2 t 2 times. Every time Grover's algorithm outputs x , it gives a 2-collision. With probability 1 -O(N -1/2 k ) (explained below), all these t 2 collisions do not collide. So we have a list L 2 of t 2 different 2-collisions. This step makes t 2 N/t 1 quantum queries.</p><p>-For 2 &#8804; i &#8804; k -1, define a function F i (x) that returns 1 if the input x is not in L i-1 but the image F (x) collides with one of the images of (i -1)-collisions in L i-1 , otherwise it returns 0. Run Grover's algorithm on F i t i times. Every time Grover's algorithm outputs x , it gives an i-collision. With probability 1 -O(t 2 i /t i-1 ) = 1 -O(N -1/2 k ), all these t i collisions do not collide. So we have a list L i of t i different i-collisions. This step makes t i N/t i-1 quantum queries.</p><p>-Finally given t k-1 (k -1)-collisions, using Grover's to find a single x that makes a k-collision with one of the (k -1)-collision in L k-1 . This step makes t k N/t k-1 quantum queries by letting</p><p>The number of quantum queries made by the algorithm is simply:</p><p>So we have the following theorem:</p><p>), the algorithm above finds a k-collision using O(N (2 k-1 -1)/(2 k -1) ) quantum queries.</p><p>We now show the above conclusion holds for an arbitrary function F : X &#8594; Y as long as |X| &#8805; k|Y | = kN . To prove this, we use the following lemma:</p><p>be the probability that if we choose x uniformly at random and y = F (x), the number of pre-images of y is at least k. We have &#956; F &#8805; 1 k .</p><p>Proof. We say an input or a collision is good if its image has at least k pre-images.</p><p>To make the probability as small as possible, we want that if y has less than k pre-images, y should have exactly k -1 pre-images. So the probability is at least As what we did in the previous algorithm, in the list L 1 , with overwhelming probability, there are 0.999&#956; F &#8226; t 1 good inputs by Chernoff bound because every input is good with probability &#956; F . Then every 2-collision in L 2 has probability 0.999&#956; F to be good. So by Chernoff bound, L 2 contains at least 0.999 2 &#956; F t 2 good 2-collisions with overwhelming probability. By induction, in the final list L k-1 , with overwhelming probability, there are 0.999 k-1 &#956; F &#8226;t k-1 good (k-1)-collisions. Finally, the algorithm outputs a k-collision with probability 1, by making at most O( N/(0.99 k-1 &#956; F t k-1 )) quantum queries.</p><p>As long as k is a constant, the coefficients before t i are all constants. The number of quantum queries is scaled by a constant and is still O(N (2 k-1 -1)/(2 k -1) ) and the algorithm succeeds with overwhelming probability.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Lower Bound for Multi-collision Finding</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Idea in [Zha18]</head><p>We will first show how Zhandry re-proved the lower bound of 2-collision finding using compressed oracle technique. The idea is that when we are working under compressed phase/standard oracle model, a query made by the adversary (x, u) can be recorded in the compressed oracle database.</p><p>Suppose before making the next quantum query, the current joint state is the following |&#966; =</p><p>x,u,z,D</p><p>where x is the query register, u is the response register, z is the private storage of the adversary and D is the database in the compressed phase oracle model. Consider measuring D after running the algorithm. Because the algorithm only has information about the points in the database D, the only way to have a nontrivial probability of finding a collision is for the D that results from measurement to have a collision. More formally, here is a lemma from <ref type="bibr">[Zha18]</ref>.</p><p>Lemma 2 (Lemma 5 from <ref type="bibr">[Zha18]</ref>). Consider a quantum algorithm A making queries to a random oracle H and outputting tuples</p><p>Let R be a collection of such tuples. Suppose with probability p, A outputs a tuple such that (1) the tuple is in R and (2) H(x i ) = y i for all i. Now consider running A with compressed standard/phase oracle, and suppose the database D is measured after A produces its output. Let p be the probability that (1) the tuple is in R, and (2) D(x i ) = y i for all i (and in particular D(x i ) = &#8869;). Then</p><p>As long as k is small, the difference is negligible. So we can focus on bounding the probability p .</p><p>Let P1 be a projection spanned by all the states with z, D containing at least one collision in the compressed phase oracles. In other words, z contains</p><p>We care about the amplitude (square root of the probability) P1 |&#966; . As in the above lemma, P1 |&#966; = &#8730; p and k = 2. Moreover, we can bound the amplitude of the following measurement. For every |x, u, z, D , after making one quantum query, the size of D will increase by at most 1. Let |&#966; i be the state before making the (i+ 1)-th quantum query and |&#966; i be the state after it. Let O be the unitary over the joint system corresponding to an oracle query, in other words, |&#966; i = O|&#966; i . By making q queries, the computation looks like the following:</p><p>-At the beginning, it has |&#966; 0 ; -For 1 &#8804; i &#8804; q, it makes a quantum query; the state |&#966; i-1 becomes |&#966; i-1 ;</p><p>and it applies a unitary on its registers U i &#8855; id to get |&#966; i where U i is some unitary defined over the registers x, u, z. -Finally measure it using P 1 , the probability of finding a collision (in the compressed phase oracle) is at most</p><p>We have the following two lemmas:</p><p>Lemma 3. For any unitary U i ,</p><p>Intuitively, P 1 is a measurement on the oracle's register and U i is a unitary on the adversary's registers, applying the unitary does not affect the measurement P 1 . Because U i is a unitary defined over the registers x, u, z and P 1 is a projective measurement defined over the database register D, we have</p><p>Proof. We have </p><p>By combining Lemmas 3 and 4, we have that |P</p><p>). So we re-prove the following theorem: Theorem 4. For any quantum algorithm, given a random function f : X &#8594; Y where |Y | = N , it needs to make &#937;(N 1/3 ) quantum queries to find a 2-collision with constant probability.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">Intuition for Generalizations</head><p>Here is the intuition for k = 3: as we have seen in the proof for k = 2, after T 1 = O(N 1/3 ) quantum queries, the database has high probability to contain a 2-collision. Following the same formula, after making T 2 queries, the amplitude that it contains two 2-collisions is about</p><p>And similarly after T i = O(i 2/3 N 1/3 ), the database will contain i 2-collisions. Now we just assume between the (T i-1 + 1)-th and T i -th query, the database contains exactly (i -1) 2-collisions. Every time a quantum query is made to a database with i 2-collisions, with probability at most i /N, the new database will contain a 3-collision. Similar to the Lemma 4, when we make queries until the database contains m 2-collisions, the amplitude that it contains a 3-collision in the database is at most</p><p>which gives us that the number of 2-collisions is m = N 1/7 . And the total number of quantum queries is</p><p>which is what we expected.</p><p>In the following sections, we will show how to bound the probability/amplitude of finding a k = 2, 3, 4-collision and any constant k-collision with constant probability. All the proof ideas are explained step by step through the proof for k = 2, 3, 4. The proof for any constant k is identical to the proof for k = 4 but every parameter is replaced with functions of k.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3">Lower Bound for 2-Collisions</head><p>Let P 2,j be a projection spanned by all the states with D containing at least j distinct 2-collisions in the compressed phase oracle model.</p><p>Let the current joint state be |&#966; (after making i quantum queries but before the (i + 1)-th query), and |&#966; be the state after making the (i + 1)-th quantum query.</p><p>|&#966; =</p><p>x,u,z,D</p><p>We have the relation following from Lemma 4:</p><p>Let |&#966; 0 , |&#966; 1 , &#8226; &#8226; &#8226; , |&#966; i be the state after making 0, 1, &#8226; &#8226; &#8226; , i quantum queries respectively. Let f i,j = |P 2,j |&#966; i |. We rewrite the relations using f i,j :</p><p>We observe that when i = o(j 2/3 N 1/3 ), f i,j = o(1).</p><p>Corollary 1. For any quantum algorithm, given a random function f : X &#8594; Y where |Y | = N , by making i queries, the probability of finding constant j 2collisions is at most O ( i 3 N ) j .</p><p>Theorem 5. For any quantum algorithm, given a random function f : X &#8594; Y where |Y | = N , it needs to make &#937;(j 2/3 N 1/3 ) quantum queries to find j 2collisions with constant probability.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.4">Lower Bound for 3-Collisions</head><p>Let P 3,k be a projection spanned by all the states with D containing at least k distinct 3-collisions in the compressed phase model. And let P 3,j,k be a projection spanned by all the states with D containing exactly j distinct 2-collisions and at least k 3-collisions. Let the current joint state be |&#966; (after making i quantum queries but before the (i + 1)-th query), and |&#966; be the state after making the (i + 1)-th quantum query. We have the following relation similar to Lemma 4:</p><p>where the first term means D already contains at least k 3-collisions before the query; and the second term is the case where a new 3-collision is added into the database. Similar to Lemma 4, only l out of N u will make D &#8853; (x, u ) contain k 3-collisions. So we have,</p><p>It is easy to see g i,0 &#8804; 1 for any i &#8805; 0 since it is an amplitude. We have the following:</p><p>We have the following lemma: Lemma 5.</p><p>Proof.</p><p>Here, in the last line, we used the fact that l&#8805;0 g 2 i-1,l,k-1 represents the total probability of the database having k -1 distinct 3-collisions, and so is equal to g 2 i-1,k-1 . Similarly, we used that l&gt;h3(i-1) g 2 i-1,l,k-1 represents the total probability of having at least k -1 distinct 3-collisions and at least h 3 (i -1) distinct 2-collisions. This probability is bounded above by the probability of just having at least h 3 (i -1) distinct 2-collisions, which is f 2 i-1,h3(i-1) .</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Lemma 6. Define</head><p>N . Then g i,k can be bounded as the following:</p><p>Proof. If we expand Lemma 5, we have</p><p>. . .</p><p>where if k &#8805; 1, g 0,k = 0. Next,</p><p>By recursively expanding the inequality, let C = 2 -9.5N 1/8 , we will get</p><p>We then bound A i for all i &#8804; N 1/2 (we can always assume i = o(N 1/2 ), because finding any constant-collision using O(N 1/2 ) quantum queries is easy by a quantum computer, just repeatedly applying Grover's algorithm):</p><p>Which implies A i &lt; 2e &#8226; N 1/8 (by letting i = &#8730; N ). So we complete the proof:</p><p>Theorem 6. For any quantum algorithm, given a random function f : X &#8594; Y where |Y | = N , it needs to make &#937;(j 4/7 N 3/7 ) quantum queries to find j 3collisions for any j &#8804; N 1/8 with constant probability.</p><p>Proof. We have two cases:</p><p>-When j is a constant: If i * = o(N 3/7 ), we have g i * ,j &#8804; o(1) + O(N -1/48 ).</p><p>-When j is not a constant: For any j, let i * be the largest integer such that A i * &lt; 1 2e &#8226; j. In this case, i * = O j 4/7 N 3/7 . So the probability of having at least j 3-collisions is bounded by g 2 i * ,j where g i * ,j &#8804; (eA i * /j) j + 2 -N 1/8 &#8804; 2 -j+1 + 2 -N 1/8 = o(1).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.5">Lower Bound for 4-Collisions</head><p>Here we show the proof for lower bound of finding 4-collisions. The proof for arbitrary constant has the same structure but different parameters which is shown in the next section. We prove the case of 4-collisions here to give the idea before generalizing.</p><p>Let f i,j be the amplitude of the database containing at least j 3-collisions after making i quantum queries, g i,j,k be the amplitude of the database containing exactly j 3-collisions and at least k 4-collisions after i quantum queries, g i,k be the amplitude of containing at least k 4-collisions after i quantum queries.</p><p>As we have seen in the last proof, we have</p><p>Define h 4 (i) = max{(2e) 3/2 &#8226; i 7/4 N 3/4 , 10N 1/16 }. Again, we can bound g i,k by dividing the summation into two parts:</p><p>. . .</p><p>The second term can be bounded as the following (and we can safely assume</p><p>&#8804; 2 -9.5N The proof follows from the last proof for k = 3. A generalized version (for any constant) can be found in the next section. And B i is bounded by B &#8730; N which is at most 2e &#8226; N 1/16 . Finally we have the following closed form:</p><p>So we can conclude the following theorem:</p><p>Theorem 7. For any quantum algorithm, given a random function f : X &#8594; Y where N = |Y |, it needs to make &#937;(j 8/15 N 7/15 ) quantum queries to find j 4collisions for any j &#8804; N 1/16 with constant probability.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.6">Lower Bound for Finding a Constant-Collision</head><p>In this section, we are going to show that the theorem can be generalized to any constant-collision. Let f i,j be the amplitude of the database containing at least j distinct s-collisions after i quantum queries, g i,j,k be the amplitude of the database containing exactly j distinct s-collisions and at least k distinct (s + 1)collisions after i quantum queries. Also let g i,k be the amplitude of the database with at least k distinct (s + 1)-collisions after i quantum queries. We assume f i,j is only defined for i &#8804; &#8730; N, 1 &#8804; j &#8804; N 1/2 s and g i,k is only defined for i &#8804; &#8730; N, 1 &#8804; k &#8804; N 1/2 s+1 . It holds for the base cases s = 4. Define h s (i) (for any s &#8805; 3) as the following:</p><p>2 s-3 i (2 s-1 -1)/2 s-2 N (2 s-2 -1)/2 s-2 , 10 &#8226; N 1/2 s It holds for s = 3, 4 where h 3 (i) = max{(2e) &#8226; i 3/2 /N 1/2 , 10N 1/8 } and h 4 (i) = max{(2e) 3/2 &#8226; i 7/4 /N 3/4 , 10N 1/16 }.</p><p>Define A i,s = i-1 l=0 hs(l) N . It is easy to see A i and B i in the last proof are A i,3 and A i,4 . And we have A i,s &#8804; (2e) 2 s-2 -1 2 s-2 i (2 s -1)/2 s-1 N (2 s-1 -1)/2 s-1 + O(N -1/(2 s (2 s -2)) ).</p><p>Lemma 7. A i,s &#8804; (2e)</p><p>2 s-2 -1 2 s-2 i (2 s -1)/2 s-1 N (2 s-1 -1)/2 s-1 + O(N -1/(2 s (2 s -2)) ) holds for all constant s &#8805; 3.</p><p>The lemma is consistent with the cases where s = 3, 4. </p><p>where the second summation is at most (2e)</p><p>2 s-2 -1 2 s-2 i (2 s -1)/2 s-1 N (2 s-1 -1)/2 s-1 and the first summation is at most</p><p>which completes the proof. Finally, we assume f i,j &#8804; A j i,s j! +O(2 -N 1/2 s ) which holds for both s = 3, 4. We are going to show it holds for (s+1), in other words, g i,k &#8804;</p><p>k! +O(2 -N 1/2 s+1 ). And by induction, it holds for all constant s.</p><p>As we have seen in the last proof, we have the following inequality:</p><p>where as i &#8804; N 1/2 , for sufficient large N , the last term f i-1,hs+1(i-1) can be bounded as:</p><p>2 s-2 -1 2 s-2 (i-1) (2 s -1)/2 s-1</p><p>(2 s-1 -1)/2 s-1 + O(N -1/(2 s (2 s -2)) ) max (2e)</p><p>2 s-1 -1 2 s-2 (i-1) (2 s -1)/2 s-1</p><p>By expanding the inequality, we get</p><p>k! + e Ai,s+1 &#8226; 2 -9.5N 1/2 s+1</p><p>Because i &#8804; &#8730; N , A i,s+1 &lt; 2eN 1/2 s+1 . Finally, we have</p><p>which completes the induction. So we have the following theorem:</p><p>Corollary 2. For any constant s &#8805; 2, let f i,j be the amplitude of the database containing at least j s-collisions after i quantum queries. For all 1 &#8804; j &#8804; N 1/2 s , we have</p><p>where</p><p>2 s-2 -1 2 s-2 i (2 s -1)/2 s-1 N (2 s-1 -1)/2 + O(N -1/(2 s (2 s -2)) ) Theorem 8. For any quantum algorithm, given a random function f : X &#8594; Y where N = |Y |, it needs to make &#937;(j 2 s-1 /(2 s -1) N s-1 -1)/(2 s -1) ) quantum queries to find j s-collisions for any j &#8804; N 1/2 s .</p><p>Moreover, for any quantum algorithm, given a random function f : X &#8594; Y where N = |Y |, it needs to make &#937;(N (2 s-1 -1)/(2 s -1) ) quantum queries to find one s-collision.</p></div><note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0"><p>Here, the Big Theta notation hides a constant that depends on k.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1"><p>At least, the authors found it surprising!.</p></note>
		</body>
		</text>
</TEI>
