Hello everyone, thanks for joining this presentation. I am Akshima, a PhD student at the University of Chicago advised by David Cash. I will be presenting joint work with David Cash, Andrew Drucker and Hoeteck Wee, where we study time-space trade-offs in finding short collisions. In this talk, we present our work on time-space trade-offs for finding short collisions in Merkle-Damgård hash functions. In particular, we show gaps in the complexity of finding one-block, two-block and B-block collisions. This is the talk outline.

Let's start with some definitions. Cryptographic hash functions are functions that satisfy properties such as collision resistance. A hash function takes an input string of arbitrary size and outputs a string of fixed size, so it maps a larger, possibly infinite, domain to a smaller domain of fixed size. Because it maps strings from a larger domain to a smaller one, there must exist two different inputs with the same output. However, we want finding such a collision to be computationally hard. This is called the collision resistance property. For instance, for an output domain of size 2^512, we would want collision finding to require about 2^256 time.

When we want to study the security of an application that uses these functions, we often model the hash function as a random function, since it is difficult to prove that finding collisions is hard otherwise. This is called the random oracle model. A small note: random functions are not practical, but they are nevertheless useful in proving security properties.

Next, very quickly, an adversary is any malicious entity that attempts to prevent a cryptosystem from achieving its security property. Generally, T denotes the time taken by an adversary, but we will use T to bound the number of queries the adversary makes to the oracle, not the time it takes for computation. Its computation time can be unbounded.
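The random oracle model and the query-counting convention described above can be sketched in a few lines. This is only an illustration: the class name `RandomOracle`, the lazy-sampling approach, and the helper `birthday_adversary` are assumptions made for this sketch, not something taken from the talk or the paper.

```python
import random


class RandomOracle:
    """Toy random function into [n] = {1, ..., n}, sampled lazily.

    Illustrative only: each fresh input gets an independent uniform output,
    and every call is counted as one query.
    """

    def __init__(self, n, seed=None):
        self.n = n
        self._rng = random.Random(seed)
        self._table = {}
        self.queries = 0

    def query(self, x):
        self.queries += 1
        if x not in self._table:
            self._table[x] = self._rng.randrange(1, self.n + 1)
        return self._table[x]


def birthday_adversary(oracle, t):
    """Query t distinct inputs and return a colliding pair if one shows up."""
    seen = {}  # output -> first input that produced it
    for x in range(t):
        y = oracle.query(x)
        if y in seen:
            return seen[y], x
        seen[y] = x
    return None


if __name__ == "__main__":
    oracle = RandomOracle(n=1 << 16, seed=1)
    print(birthday_adversary(oracle, t=1 << 9), oracle.queries)
```

The adversary here is charged only for calls to `query`; any bookkeeping it does between queries is free, which mirrors the convention that T bounds queries rather than computation.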
This adversary can be thought of as deterministic, and the adversary's winning chances are taken over the hash function. For random functions, there is an unconditional proof that they are collision resistant. From here on, we will always talk about hash functions with output domain [N], which is notation for the set of integers 1 through N. For attackers making T queries, the best any adversary can do is find a collision among its T queries, which happens with probability at most T^2/N for an output domain of size N.

Next, we define a class of stronger adversaries. These adversaries work in two phases. In the first phase, the adversary gets unbounded computation time on H, but gets to output only a bounded amount of information from that computation, say S bits. In the second phase, which we refer to as the online phase, the adversary gets the S-bit advice as input and gets to query the function H a bounded number of times, T.

The motivation behind studying these precomputing adversaries is that they capture a class of strong adversaries that can afford to perform a large one-time computation if it helps reduce the amortized time for solving many instances of the same problem in the online phase. Such adversaries exist in practice: rainbow tables and the Logjam attack are some popular examples. They can also model non-uniform adversaries.

Consider what a precomputing collision-finding adversary could do. It can simply output a collision in the hash function H as advice. Then in the online phase, the adversary wins with probability almost one. To make the problem meaningful, we will study collision finding with precomputation on salted hash functions. Salted hash functions take an additional input of fixed size called the salt. The attacker is given H in the precomputation phase and learns the salt only in the online phase.
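The two-phase game on a salted hash function can be sketched as follows. Everything here is a toy assumption for illustration: the salted function `H` is truncated SHA-256 rather than a true random function, the salt space is taken to be the same size as the output space, the advice is measured in table entries rather than bits, and the particular strategy (store collisions for a few salts, otherwise search within the query budget) is just one simple adversary that fits the model.

```python
import hashlib
import random

N = 1 << 12  # toy output (and salt) space size; an assumption of this sketch


def H(salt, x):
    """Toy salted hash into range(N), built from truncated SHA-256."""
    data = salt.to_bytes(4, "big") + x.to_bytes(8, "big")
    return int.from_bytes(hashlib.sha256(data).digest()[:4], "big") % N


def collide(salt, budget=None):
    """Search inputs 0, 1, 2, ... for a collision under the given salt.

    budget=None means unbounded work (offline phase); otherwise at most
    `budget` queries are made (online phase).
    """
    seen = {}
    x = 0
    while budget is None or x < budget:
        y = H(salt, x)
        if y in seen:
            return seen[y], x
        seen[y] = x
        x += 1
    return None


def offline(S):
    """Precomputation: unbounded time, but only a small table is kept."""
    return {salt: collide(salt) for salt in range(S)}


def online(advice, salt, T):
    """Online phase: use the advice if it covers the salt, else search."""
    if salt in advice:
        return advice[salt]
    return collide(salt, budget=T)


def play(S, T, trials=200, seed=0):
    """Estimate the winning fraction over random challenge salts."""
    rng = random.Random(seed)
    advice = offline(S)
    wins = 0
    for _ in range(trials):
        salt = rng.randrange(N)
        result = online(advice, salt, T)
        if result is not None:
            x1, x2 = result
            if x1 != x2 and H(salt, x1) == H(salt, x2):
                wins += 1
    return wins / trials


if __name__ == "__main__":
    print(play(S=8, T=64))
```

Because the salt is revealed only in `online`, a single stored collision no longer wins outright; the table helps only on the salts it happens to cover.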
Dodis, Guo and Katz showed that the probability, taken over H and the chosen salt, of finding collisions for such an attacker is at most (S + T^2)/N. For small enough S and T, this advantage is less than one. Thus, salting makes finding collisions harder with precomputation. The trivial attack is one where the adversary stores collisions for the order of S salts in the precomputation phase. Then in the online phase, either the advice contains a collision for the challenge salt or the adversary succeeds through a birthday attack. This is indeed the optimal attack.

Next, I want to quickly remind everyone of the Merkle-Damgård structure, a popular construction for realizing hash functions in practice. It breaks the input into blocks of fixed size and processes them one at a time with the compression function, each time combining a block of the input with the output from the previous round. MD5, SHA-1 and SHA-2 are all widely used MD-based hash functions, especially SHA-2, which is found in computers everywhere today. Please note that MD-based hash functions are salted hash functions. The actual construction of MD-based hash functions can no longer be modeled by a monolithic random oracle. Instead, we model the compression function as a random oracle here. Attacks against hash functions modeled as monolithic random oracles are equivalent to finding one-block collisions in MD-based hash functions where the compression function is modeled as the random oracle.

The adversary can again be thought of as a two-stage algorithm: the adversary precomputes with queries to the compression function H and, given the challenge salt in the online phase, outputs the strings that should collide under the MD hash function for that salt. Intuitively, when hash functions have a known structure, as MD constructions do, adversaries can possibly leverage this knowledge of the structure to mount better attacks. Coretti et al.
proved this intuition to be true and proved a non-trivial bound of ST^2/N on collision finding with precomputation in MD-based hash functions. This bound implies that the adversary achieves constant advantage when S and T are of the order of the cube root of N, instead of the square root of N as for collision finding against monolithic hash functions with precomputation.

Next I will list our results. We are the first to initiate the study of short collision finding with precomputation in MD-based hash functions. We model our problem essentially as before, but the adversary wins only when it outputs messages with B or fewer blocks. We show that it is easier to find two-block collisions than one-block collisions, but harder than finding unbounded-length collisions. We do so by combining concentration and compression techniques to bound the advantage of any two-block collision-finding adversary, which I will elaborate on later in the presentation. Our bound for two-block collisions has a matching attack.

We wanted to find bounds for three-block collisions, four-block collisions and so on, for arbitrary B-block collisions, that hopefully match the STB/N advantage of the current best known attack. However, due to restrictions of our compression technique, this was hard. Proving the bound for all adversaries remains an open problem. We could show optimality only for a class of adversaries that follow the template of the optimal adversaries from prior works. We refer to these adversaries as zero-walk adversaries.

To state our results more concretely: for B-block collision finding, the best attack known to us is an extension of Coretti et al.'s optimal attack for unbounded collisions. This attack achieves an advantage of at least STB/N, which is our conjectured bound for B-block collision finding, as we could not find a better attack.
Also, we know from the prior bound that there can't be an attack achieving better than STB/N advantage that works for all B less than or equal to T. And it would be quite surprising to see a better-than-STB/N attack that works for small B and then collapses to STB/N when B equals T. So our conjectured bound is STB/N. However, we could prove this upper bound only for a restricted class of zero-walk adversaries. For two-block collisions, we could prove an upper bound that matches the best known attack. Observe the qualitative jumps between the advantage for one-block, two-block and unbounded-length collisions.

The optimal attack from the work of Coretti et al. finds colliding messages whose length is of the order of T. For SHA-2 with a 256-bit hash output and an input block size of 512 bits, when S equals 2^70, the attack finds collisions of length 2^93 blocks, making about 2^93 queries, which is several yottabytes long. Messages this long do not exist in the practical world. Think of the applications where hash functions are used. With digital signatures or MACs, no one sends messages that long that need to be authenticated. Passwords that are hashed and stored are generally about 8 to 16 characters long. This is our motivation to study short collisions with precomputation in MD structures.

So in this work we study bounded-length collisions. The best known attack, which is an extension of the attack given by Coretti et al., achieves an advantage of STB/N. So for B equal to 2^20, the best known attack requires 2^166 queries to achieve a constant probability of success, as opposed to 2^93 queries for finding unbounded collisions. Next we show how the techniques from prior works do not extend to the model of bounded-length collisions, and this is what makes our work interesting.
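The query counts quoted above follow from setting each advantage expression to a constant and solving for T. A small sketch of that arithmetic, working in base-2 logarithms and ignoring constants and polylog factors (the function names are made up for this illustration):

```python
from fractions import Fraction


def log2_queries_unbounded(log2_n, log2_s):
    """Solve S * T^2 / N = 1 for T, in log2: T = sqrt(N / S)."""
    return Fraction(log2_n - log2_s, 2)


def log2_queries_bounded(log2_n, log2_s, log2_b):
    """Solve S * T * B / N = 1 for T, in log2: T = N / (S * B)."""
    return log2_n - log2_s - log2_b


def log2_message_bytes(log2_blocks, block_bits=512):
    """log2 of the message size in bytes; block_bits must be a power of two."""
    block_bytes = block_bits // 8
    return log2_blocks + block_bytes.bit_length() - 1


if __name__ == "__main__":
    # Parameters from the talk: N = 2^256, S = 2^70, B = 2^20.
    print(log2_queries_unbounded(256, 70))    # exponent of T, unbounded length
    print(log2_queries_bounded(256, 70, 20))  # exponent of T, B-block collisions
    print(log2_message_bytes(93))             # exponent of the message size in bytes
```

With N = 2^256 and S = 2^70 this gives the exponents 93 and 166 mentioned in the talk.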
Let's look at the method that Coretti et al. used for upper bounding the advantage of finding unbounded-length collisions and see if we can extend it to bound the advantage of a bounded-length collision-finding adversary with precomputation. They used a technique called presampling that was given by Unruh. Let's see what it is. In the presampling phase, the adversary sends the oracle a list mapping at most P input points to output points. Then, when it queries the oracle in the online phase, the oracle returns the output from the mapping for points in the mapping, and for other points it returns the output of the function H.

Coretti et al. improved the result by Unruh that gives a relation between the advantage of an adversary doing precomputation and an adversary doing presampling. For our setting, their result translates as follows: any collision-finding precomputing adversary with S-bit advice and making T queries can be replaced by some presampling adversary with S times T prefixed points and making T queries, such that the advantage of the precomputing adversary is at most a constant times the advantage of the presampling adversary. Thus, bounding the presampling adversary is sufficient.

Coretti et al. bounded the presampling adversary's advantage as follows. A presampling adversary wins either if it hits one of the prefixed points in T queries or if it succeeds in a birthday attack. Thus its advantage is of the order of ST^2/N, which translates to an advantage of ST^2/N for the adversary performing precomputation.

However, the problem is that the advantage in presampling is insensitive to the length of collisions. The advantage for finding unbounded-length collisions is ST^2/N, and we show that the lower bound on the advantage for finding even two-block collisions with presampling is also of the order of ST^2/N, via an attack.
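The presampling oracle described above can be sketched as a small class. This is a toy under stated assumptions: the name `PresamplingOracle`, the `(salt, block)` input shape, and the `max_points` parameter (standing in for P) are choices made for this illustration, and the underlying function is sampled lazily rather than fixed in advance.

```python
import random


class PresamplingOracle:
    """Toy oracle: adversary-chosen answers on a few points, random elsewhere."""

    def __init__(self, n, prefixed, max_points, seed=None):
        # The adversary may fix the outputs of at most max_points inputs.
        if len(prefixed) > max_points:
            raise ValueError("too many prefixed points")
        self.n = n
        self._prefixed = dict(prefixed)
        self._rng = random.Random(seed)
        self._lazy = {}
        self.queries = 0

    def query(self, salt, block):
        self.queries += 1
        point = (salt, block)
        if point in self._prefixed:
            # Answer from the adversary's own mapping.
            return self._prefixed[point]
        if point not in self._lazy:
            # Fresh point: sample a uniform output in [n] and remember it.
            self._lazy[point] = self._rng.randrange(1, self.n + 1)
        return self._lazy[point]


if __name__ == "__main__":
    # Prefix a one-block collision at salt 7: blocks 0 and 1 both map to 1.
    oracle = PresamplingOracle(
        n=1024, prefixed={(7, 0): 1, (7, 1): 1}, max_points=4, seed=0
    )
    print(oracle.query(7, 0), oracle.query(7, 1), oracle.query(3, 5))
```

Note that the adversary here chooses the outputs themselves, which is why a prefixed collision is free once a prefixed salt is reached.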
Our attack prefixes collisions for ST/2 salts and, in the online phase, tries to reach one of these salts with a one-block message from the challenge salt a in T tries. This attack shows that short collisions are as easy as long collisions for presampling, but that is not the case for precomputation, as we will show later.

So next we tried to use the compression technique Dodis et al. had used for bounding the advantage of a one-block collision-finding adversary. We first describe the compression technique, which is based on the Shannon bound at its core. By the Shannon bound, the expected size of the compressor's output should be at least as large as the entropy of H. Dodis et al. gave a compressor that compresses H using an adversary that finds a collision on some salt using S-bit advice. For a winning salt, there will be two equal responses sent to the adversary. The compressor stores the salt and two log T-bit pointers to the entries with the same output, and deletes the second of these entries to save log N bits. The compressor also stores the S-bit advice. For an adversary winning on an epsilon fraction of salts, the compressor on average compresses by epsilon N times the savings from each winning salt, which would contradict the Shannon bound if epsilon were too large. This gives the bound of (S + T^2)/N.

However, this technique does not directly work for collisions in salted MD-based hash functions. Say some two-block collision-finding adversary wins on an epsilon fraction of salts on H. Then we want to give a compressor that deletes epsilon N entries with repeated outputs. However, for two-block collisions, there may not be epsilon N such unique entries. Queries with the same output can be part of collisions for more than one salt. So we can no longer be sure of compressing as many query outputs as the number of salts on which the adversary succeeds. In other words, the issue is that finding a collision for a salt is not independent of finding collisions for other salts.
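The one-block compression step described above (store two pointers, delete the second colliding response) can be sketched for a single salt. This is a heavily simplified toy: it treats the adversary's T responses as a plain list, ignores the advice and the salt bookkeeping, and the names `compress`, `decompress` and `bits_saved` are assumptions of this sketch rather than the construction from the cited work.

```python
import math


def compress(responses):
    """Encode a response list that contains a repeated value.

    Returns (a, b, rest): pointers a < b to two equal responses, plus the
    list with entry b deleted. Returns None if all responses are distinct.
    """
    first_seen = {}
    for b, y in enumerate(responses):
        if y in first_seen:
            a = first_seen[y]
            return a, b, responses[:b] + responses[b + 1:]
        first_seen[y] = b
    return None


def decompress(encoding):
    """Rebuild the original list by copying entry a back into position b."""
    a, b, rest = encoding
    return rest[:b] + [rest[a]] + rest[b:]


def bits_saved(n, t):
    """Savings per winning salt: one log N entry minus two log T pointers."""
    return math.log2(n) - 2 * math.log2(t)


if __name__ == "__main__":
    responses = [5, 9, 3, 9, 7]
    enc = compress(responses)
    print(enc, decompress(enc) == responses)
    print(bits_saved(2 ** 256, 2 ** 20))
```

The saving is positive only while 2 log T is below log N, which is one way to see where the T^2/N term in the bound comes from.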
Next we give our technique for bounding the advantage of two-block collision finding. To get a handle on the compression proof, we looked into Chernoff results for dependent indicators and found a result of interest by Impagliazzo and Kabanets. A slight variation of the result allows us to analyze the adversaries in our setting without advice, making it a useful technique that can be of independent interest.

To explain their result, we start with Chernoff. The traditional Chernoff bound requires the random variables to be independent and identically distributed in order to show that their sum is concentrated around its mean. When each random variable is distributed as a delta-biased Bernoulli, the probability that their sum exceeds a constant times delta N is exponentially small in delta N. The result by Impagliazzo and Kabanets relaxes the i.i.d. requirement on the variables to some extent. It states that, given any N binary random variables such that the probability that any u of these indicator variables are simultaneously one is bounded by delta^u, their sum is tightly concentrated. The result for such variables says that the probability that their sum exceeds some constant times delta N is exponentially small in u.

In our application, an adversary could find collisions on every salt of some u-size set with a larger probability. In other words, the required bound of delta^u may not hold for every u-size subset. So we relax this requirement. Our variation relaxes the quasi-independence requirement on the u-th moments of the variables. Our result only assumes an upper bound of delta^u on succeeding on a randomly chosen u-size subset, instead of on every u-size subset, and again bounds the probability of the sum of the variables exceeding 6 delta N by 2^-u, as in Impagliazzo and Kabanets's result. The proof of our variation is a simple extension of their proof.
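The two concentration statements just described can be written side by side. This is a paraphrase of what was said in the talk, not the exact theorem statements from the papers; the symbols X_i (indicator that the adversary wins on salt i), U and the unspecified constant c are notation introduced here for illustration.

```latex
% Hypothesis on every u-subset (as described for Impagliazzo and Kabanets):
\forall\, U \subseteq [N],\ |U| = u:\quad
  \Pr\Big[\bigwedge_{i \in U} X_i = 1\Big] \le \delta^{u}
\;\Longrightarrow\;
  \Pr\Big[\sum_{i=1}^{N} X_i \ge c\,\delta N\Big] \le 2^{-\Omega(u)}.

% Relaxed hypothesis on a uniformly random u-subset (the variation in this talk):
\Pr_{U,\,X}\Big[\bigwedge_{i \in U} X_i = 1\Big] \le \delta^{u}
\;\Longrightarrow\;
  \Pr\Big[\sum_{i=1}^{N} X_i \ge 6\,\delta N\Big] \le 2^{-u}.
```

In the second line the probability in the hypothesis is taken over both the variables and the random choice of U, which is what lets the bound tolerate a few "bad" subsets.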
Next we describe Impagliazzo's technique for handling the dependence of collisions for a salt on the precomputed advice. Step one is to analyze the adversary without advice on a fixed set U of u salts and bound its advantage by delta^u. Step two is to apply the dependent Chernoff result and obtain a bound on the probability of winning on at least 6 delta N salts that is exponentially small in u. And since it is exponentially small in u, we can take a union bound over the 2^S advice strings. This gives the bound of 6 delta + 2^S times 2^-u on adversaries with advice. When u is at least S + log N, we obtain a bound of the order of delta.

Our modification is in step one. Instead of analyzing the adversaries on a fixed set, we analyze the adversaries on a random set U of salts. Then in step two, we use our variation of the dependent Chernoff result, and the rest of the technique goes through as before.

For the analysis of adversaries in step one, we had to use compression techniques in our setting. Impagliazzo used this technique to bound the advantage of one-to-one function inverting adversaries, where the dependence was easy to bound. However, in our setting, it is hard to understand the correlation in collision finding among salts, especially for adaptive adversaries. Thus we use compression techniques. We gave a compressor that uses an adversary winning on some fixed (H, U) to compress (H, U) by u times log(1/delta) bits. By the Shannon bound, this gives the probability of any adversary winning on a random (H, U) to be at most delta^u.

For bounding the two-block collision-finding adversary's advantage, we gave a compressor that compresses both H and U at a total of u spots, for some fixed (H, U), using the two-block collision-finding adversary winning on all salts in U. Compressing U allowed us to handle the requirement of the dependent Chernoff bound.
At each of these u spots, our compressor stores at most log(ST) bits and saves log N bits. This gave us a bound of (ST/N)^u on the probability of any adversary finding two-block collisions on all salts in U for a random (H, U). This compressor is very complicated, and we refer you to our paper for more details.

We failed to prove a bound of STB/N for arbitrary fixed-length collisions because our compression technique is quite tedious. For two-block collisions, we can have six types of colliding chains. Each of these cases needs to be compressed differently by our compression technique. As the number of blocks increases, the number of types increases exponentially in the number of blocks, so extending our compression technique becomes difficult. We would like to point out that our approach still holds for arbitrary-length collisions.

Next, we define a restricted class of precomputing adversaries for arbitrary B-block collisions and show our technique for bounding their advantage. First, we define what we refer to as a zero-walk adversary. These are adversaries that store the order of S collisions during the precomputation phase. Then, in the online phase, given a challenge salt, the adversary tries T/B times to walk to one of the salts with stored collisions. The walk is made only with zero messages after the first query of the trial.

The best attack known to us for finding B-block collisions is an extension of the attack given by Coretti et al. for unbounded collisions. This adversary picks the order of S salts, walks B/2 steps from each with zero messages, finds collisions on the salts reached, and stores them. Then, in the online phase, given a challenge salt, the adversary tries T/B times to walk to one of these salts with stored collisions. Again, the walk is made only with zero messages after the first query of the trial. The adversary achieves an advantage of at least STB/N, and it belongs to the class of zero-walk adversaries.
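The zero-walk attack just described can be sketched on a toy compression function. All specifics are assumptions of this sketch: `h` is truncated SHA-256 on a tiny state space rather than a random oracle, the starting salts are simply 0 to S-1, the stored collisions are one-block collisions found by brute force, and the advice is a Python dictionary rather than an S-bit string.

```python
import hashlib

N = 1 << 10  # toy state/salt space size; an assumption of this sketch


def h(salt, block):
    """Toy compression function: (salt, block) -> next salt in range(N)."""
    data = salt.to_bytes(4, "big") + block.to_bytes(8, "big")
    return int.from_bytes(hashlib.sha256(data).digest()[:4], "big") % N


def md(salt, blocks):
    """Merkle-Damgard iteration of h, starting from the given salt."""
    state = salt
    for block in blocks:
        state = h(state, block)
    return state


def one_block_collision(salt):
    """Offline brute force: two distinct blocks colliding under this salt."""
    seen = {}
    m = 0
    while True:
        out = h(salt, m)
        if out in seen:
            return seen[out], m
        seen[out] = m
        m += 1


def precompute(S, B):
    """Walk B // 2 zero-steps from S salts; store a collision at each endpoint."""
    table = {}
    for start in range(S):
        a = start
        for _ in range(B // 2):
            a = h(a, 0)
        if a not in table:
            table[a] = one_block_collision(a)
    return table


def online(table, salt, T, B):
    """Make T // B trials; each is one fresh block followed by zero blocks."""
    for trial in range(1, T // B + 1):
        blocks = [trial]
        a = h(salt, trial)
        while True:
            if a in table:
                m1, m2 = table[a]
                return blocks + [m1], blocks + [m2]
            if len(blocks) == B - 1:  # the final block is reserved for m1 / m2
                break
            a = h(a, 0)
            blocks.append(0)
    return None


if __name__ == "__main__":
    table = precompute(S=32, B=8)
    result = online(table, salt=123, T=512, B=8)
    if result is not None:
        msg1, msg2 = result
        print(msg1, msg2, md(123, msg1) == md(123, msg2))
```

Each trial spends at most B - 1 queries and yields messages of at most B blocks, so the online phase stays within T queries and within the length limit.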
So there could be some zero-walk adversary that does better than the adversary we defined on the previous slide. There could be a zero-walk adversary that, during its precomputation, finds and stores collisions for salts with large B-depth trees. So the maximum advantage such an adversary can achieve is T/B times S times the maximum size of a B-depth tree leading to some salt, divided by N. We prove that the existence of such trees is unlikely by bounding the size of the largest B-depth trees in random functional graphs to the order of B^2 with high probability. This shows that, among zero-walk adversaries, the attack that we described on the previous slide is optimal.

Consider a random function F from [N] to [N]. We prove a result that bounds the number of nodes in B-depth trees in the graph of F to the order of B^2 with probability at least 1 - 1/N. There is a naive approach that bounds the number of nodes at depth i in the tree via a Chernoff bound and then takes a union bound over all depths 1 through B. But this approach gives a loose bound of B^3, and we obtain a tighter bound of B^2 in the paper. This theorem implies that with probability at least 1 - 1/N, the largest B-depth tree will have at most the order of B^2 nodes.

Finally, to conclude: in this work we present new techniques to prove our two main results. The first result is that for any two-block collision-finding adversary, its advantage is tightly bounded, up to polylog factors, to the order of ST/N. The second is that for any B-block collision-finding zero-walk adversary, its advantage is tightly bounded, up to polylog factors, to the order of STB/N. We could not prove a bound for arbitrary-length collisions for all adversaries, which remains an open problem. Thank you, and for more details please see our paper, available on ePrint.