Hi everyone. In this talk we are going to look at a very classical topic in cryptography, namely the security of password hashing algorithms in the presence of preprocessing attacks. My name is Pooya Farshim, and this is joint work with Stefano Tessaro from the University of Washington.

Passwords are one of the most commonly used authentication mechanisms in cryptography and security, and typically we use a hash of a password in place of the password itself for authentication. This provides some resilience against, for example, the leakage of a password database, where the attacker gets to see the hashes of the passwords but not the passwords themselves. In fact, sometimes we go beyond this and use the hash of a password as a secret key in some application: for example, we might derive a secret key from a password and then use it to encrypt data.

The setting we consider here is one where an attacker is interested in attacking multiple users. It is not targeting any particular user, but is trying to compromise as many users as it can. In doing this, it may use preprocessing techniques, such as rainbow tables, to speed up its attack. Typically in such settings the hash function itself is assumed to be secure, in the sense that it behaves like a random oracle, and what the adversary exploits is the low entropy of human-generated passwords.

The conventional wisdom here is that instead of hashes of passwords, we should store salted hashes of passwords. That is, we pick some public randomness, the salt, hash the salt together with the password, and store the result in our database. The general idea is that salting will defeat preprocessing: if, for example, the salts are distinct, they effectively provide domain separation for the hash function, which means that a separate effort is needed for cracking each password.
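The salted-hashing scheme just described can be sketched in a few lines (a minimal sketch: SHA-256 stands in for the random oracle, and all function names are my own):

```python
import hashlib
import os

def hash_password(password: bytes, salt: bytes) -> bytes:
    """Hash the salt together with the password. Distinct salts act as
    domain separation: H(salt1, .) and H(salt2, .) behave independently."""
    return hashlib.sha256(salt + password).digest()

def store_password(password: bytes) -> tuple[bytes, bytes]:
    """Pick a fresh public random salt and store (salt, H(salt, pw))."""
    salt = os.urandom(16)
    return salt, hash_password(password, salt)

def verify(password: bytes, salt: bytes, stored: bytes) -> bool:
    """Recompute the salted hash and compare against the stored value."""
    return hash_password(password, salt) == stored
```

The point of the sketch is only the structure: the salt is stored in the clear next to the hash, and any precomputed table for one salt is useless for another.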
In fact, if these salts are also unpredictable, we defeat preprocessing altogether, because the hash function used to hash the password is essentially unpredictable: the salt part of its input is not known at preprocessing time. So the goal of this talk is to formalize this conventional wisdom.

There are two lines of related work here. One is a long series of works on preprocessing, starting with the work of Gennaro and Trevisan in 2000, followed by the work of Unruh from 2007, which looked at random oracles with auxiliary input, continuing with the work of De, Trevisan, and Tulsiani, and then a number of works by Dodis et al., which developed techniques for proving security in the presence of preprocessing. In this talk we will particularly rely on the work of Coretti, Dodis, Guo, and Steinberger from 2018, which introduced the bit-fixing random oracle technique for proving results in the preprocessing setting. Another important work is that of Bellare, Ristenpart, and Tessaro from Crypto 2012, which highlighted the need for multi-instance security in password-based cryptography.

Let's look at these two settings in a little more detail. Here is the preprocessing setting: the preprocessing adversary A0 gets the full table of the random oracle and outputs some auxiliary information sigma. Then, in the online phase of the attack, A1 runs with sigma and oracle access to the hash function, and tries to win some security game. This is a general template for upgrading notions of security from a setting without preprocessing to one with preprocessing in idealized models.

What about multi-instance security? In the multi-instance setting, the adversary gets, for example, a bunch of hashed passwords, the hash of password 1 through the hash of password m, and its goal is to recover all of these passwords.
Note that this setting is somewhat different from multi-user security, where you would again get a bunch of hashed passwords, but your goal would be to recover only one of them. Here we are interested in recovering all of them, because we care about adversaries that try to recover the passwords of as many users as they can.

The perspective here is that in such a multi-instance setting, ideally the effort the adversary needs to compromise m passwords should scale linearly with m. In the unsalted setting, however, one can see that the adversarial advantage is lower bounded by (T/N)^m, where N is the size of the domain the passwords are drawn from and T is the number of queries the adversary makes to the random oracle. Indeed, an adversary making T queries to the hash function recovers the i-th password with probability T/N for each single instance, and these instances are essentially independent, so the probability that it recovers all of them is (T/N)^m. If you think of an adversary that queries the entire domain, so T equals N, you can see that it recovers all of the passwords. Put differently, setting (T/N)^m roughly equal to one gives T roughly equal to N, and as you see, the time required by the adversary does not scale at all with m, the number of users.

Now, if you come to the salted setting, where we use salts to hash passwords, we do get a bound that scales with m, at least for certain adversaries. Consider an adversary with a budget of T queries that distributes them over the m keys hashed with the m different salts, and tries to guess each password using a budget of T/m queries per salt. Then we get an advantage of the form (T/(mN))^m.
Now if you do a calculation like before and set this roughly equal to one, you get that T is roughly equal to m times N, and here you see that the time required by the adversary scales linearly with m. One of the contributions of this paper is to actually prove matching upper bounds on the adversarial advantage.

As I mentioned, the goal of this paper is to understand password hashing in the presence of multiple instances and preprocessing, and in this work we combine these two settings. We consider three different settings. Two are the settings without salts and with distinct salts; these are interesting from a historical point of view and also from a more theoretical point of view related to amplification. The third is the setting with random salts, which is the most relevant to practice, but is of course also interesting theoretically.

To gain some intuition for this problem, let's look at the multi-instance extension of Hellman's time-space trade-off algorithm. Here we have a permutation pi, and we create a functional graph: we start from some point and repeatedly apply the permutation. Let's assume for now that this permutation has a single cycle, so that the functional graph covers the whole domain of the permutation. This is the graph that we build. What we do next is take S points on this graph, spaced N/S apart, and for each of them we store a pointer to the point T/m steps behind it.
This means that if I have a challenge point, say this red point here, whose preimage I want to recover, and the red point happens to lie in one of these segments, all I need to do is start applying the permutation until I reach the stored point, chase the pointer back, apply the permutation further, and stop right before I return to the red point; then I have recovered the preimage. So what is the probability that the red point lies in a segment? There are S segments, each of length T/m, out of a domain of size N, so a single point lands in a segment with probability ST/(mN), and the probability that all m points do is (ST/(mN))^m. Keep that bound in mind when we come to our security bounds later on.

Now, to get any form of security in password-based cryptography, we need passwords that are unguessable, and for this we need some measure of unguessability for passwords. BRT formulated such a notion. They consider a setting where a vector of passwords is generated, perhaps together with some leakage information z. The adversary is run with a test oracle and a corrupt oracle: the corrupt oracle on query i simply returns the i-th password, and the test oracle gets a guessed password and an index i from the adversary and checks whether the guess matches the i-th password. The goal of the adversary is to win all of these instances, that is, to set all of the win flags to true. We measure unguessability by the maximum advantage over all adversaries of a given query budget.

Now, how does this relate to known notions of unguessability for distributions? Consider first the setting where the adversary makes m guesses and no corruptions, m being the number of passwords in the system.
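The checkpoint-and-pointer structure just described can be sketched in code (a toy sketch for a single-cycle permutation; all names and parameters are mine, and a simple shift permutation stands in for pi):

```python
def build_table(pi, N, S, t):
    """Walk the single cycle of pi, store S checkpoints spaced N//S apart;
    each checkpoint maps to the point t steps behind it on the cycle.
    Only the S (checkpoint -> pointer) pairs are kept as the table."""
    spacing = N // S
    walk = [0]
    for _ in range(N - 1):
        walk.append(pi(walk[-1]))
    return {walk[i]: walk[(i - t) % N] for i in range(0, N, spacing)}

def invert(pi, table, y, t):
    """Recover x with pi(x) == y, if y lies within t steps of a checkpoint;
    otherwise return None."""
    # Walk forward from y until we hit a stored checkpoint.
    cur, steps = y, 0
    while cur not in table:
        cur = pi(cur)
        steps += 1
        if steps > t:          # y was not inside any covered segment
            return None
    # Chase the pointer t steps back, then walk forward to just before y.
    cur = table[cur]
    for _ in range(t):
        if pi(cur) == y:
            return cur
        cur = pi(cur)
    return None
```

With t chosen at least as large as the checkpoint spacing, every point falls in a segment and inversion always succeeds; shrinking t trades success probability for table-building effort, which is the trade-off the (ST/(mN))^m bound captures.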
It is not too hard to see that this measure relates to the min-entropy of the passwords, or in this case the average min-entropy of the passwords. The case with corruptions is more complicated, and there does not seem to be a notion in information theory that captures it nicely. For that purpose, we take as our basic measure the case where the adversary makes c corruption queries and m minus c test queries: c corruptions, m minus c guesses.

Our first theorem relates the unguessability of passwords against an adversary making T test queries to this basic measure, and the proof is not too complicated. Suppose I have an adversary making T test queries, and I want to convert it into an adversary that makes only m minus c test queries. What I do is guess which m minus c of the T test queries the adversary makes will return true, that is, which will be the correct guesses; I guess that set of indices. Those queries are then relayed to the test oracle that the adversary gets in the game with m minus c test queries. The rest of the test queries are answered with false, and the corruption queries are relayed to the adversary's corrupt oracle in that game. If I have guessed the set of m minus c indices out of T correctly, then this reduction is perfect, and whenever the original adversary wins, the adversary making only m minus c test queries also wins. This rather simple argument resolves an open question left in BRT, where the authors considered the setting with an a-priori bound T_i on the number of queries for each instance i.
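The multi-instance guessing game with test and corrupt oracles described above can be sketched as follows (a sketch of the style of notion, with my own interface; here a corrupted instance counts as compromised):

```python
def guessing_game(passwords, adversary):
    """Multi-instance guessing game. The adversary interacts with a test
    oracle and a corrupt oracle, and wins only if every instance is
    either corrupted or correctly guessed."""
    m = len(passwords)
    win = [False] * m

    def corrupt(i):
        win[i] = True              # corrupted instances count as won
        return passwords[i]

    def test(i, guess):
        if passwords[i] == guess:  # set the win flag on a correct guess
            win[i] = True
        return win[i]

    adversary(m, test, corrupt)
    return all(win)                # "all" is what makes this multi-instance
```

The `all(win)` condition is the multi-instance requirement; replacing it with `any(win)` would recover the weaker multi-user-style goal mentioned earlier.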
And this is a somewhat important restriction, in the sense that an adversary in practice could adapt the number of queries it makes to each instance as it makes progress across instances.

So far we have looked at the unguessability of passwords and how it relates to our basic measures of unguessability. Next we look at hashed and salted passwords. For this, let me start with a notion of unrecoverability of passwords in the auxiliary-input random oracle model. The game goes like this. A random oracle is chosen from some domain and range. The adversary is given the full table of the random oracle and outputs some preprocessing information sigma, as in the preprocessing template. Then the passwords are picked, and then we generate salts. There can be multiple salts per password: m is the instance count and l is the number of, let's say, sessions per user, so you have l salts per user. Each password is then hashed together with its salts, and the keys are computed. The adversary is now provided with the keys, the salts, sigma (the auxiliary information about the random oracle), and z, which is some leakage on the passwords, and its goal is to guess the whole vector of passwords. While doing this, it of course has access to the random oracle H, as well as a corrupt oracle through which it can corrupt users and get their passwords. So this is a very natural notion of unrecoverability of passwords in the auxiliary-input model.

To analyze such games in the auxiliary-input random oracle model there are a number of techniques. One of them is the compression technique, which has been used successfully in various works before.
Here we are going to use a different technique, known as presampling, or the bit-fixing random oracle technique. Instead of preprocessing information that depends on the random oracle, we consider a preprocessing adversary that outputs information independent of the random oracle, but which additionally gets to output a list L of presampled assignments, containing entries of the form x_i maps to y_i. In the online phase of the game, we then run the game with respect to a random oracle that is random everywhere except on the points in L, where it must be compatible with the assignments in L. That is the bit-fixing random oracle model. Coretti, Dodis, Guo, and Steinberger proved in 2018 that bounds in the bit-fixing random oracle model can be translated into bounds in the auxiliary-input random oracle model.

In the paper we apply this technique and derive an unrecoverability bound in the bit-fixing random oracle model for the case where the KDF is the random oracle itself, salts are uniform over a salt space of size K, there are possibly l salts per password, and we have a password sampler that outputs m passwords. I won't go through the bound here, but in a second I will come to what it actually means. In the paper we also derive bounds for other cases, where we have a general salt-generation algorithm rather than a uniform one; I refer you to the paper for those theorems.

So what are our main bounds? We have six main bounds, corresponding to the three cases of no salts, known distinct salts, and uniform salts, each either with no preprocessing, where S, the bit length of the preprocessing information sigma, is zero, or with a large amount of preprocessing. Let's go through these bounds one by one.
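The bit-fixing random oracle just described can be sketched directly (a minimal sketch with my own class name; the oracle is lazily sampled rather than drawn as a full table up front, which is equivalent):

```python
import random

class BitFixingRO:
    """Random oracle that is fixed on a presampled list of (x -> y)
    assignments chosen by the preprocessing adversary, and uniformly
    random everywhere else."""

    def __init__(self, fixed, range_size, seed=None):
        self.table = dict(fixed)        # the presampled list L
        self.range_size = range_size
        self.rng = random.Random(seed)

    def query(self, x):
        # Points outside L are sampled uniformly on first use and cached,
        # so repeated queries are consistent, as for a real random oracle.
        if x not in self.table:
            self.table[x] = self.rng.randrange(self.range_size)
        return self.table[x]
```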
For the case with no salts and no preprocessing, we get a bound of the form (T/N)^m. Recall that this matches the lower bound on the adversarial advantage we saw earlier; this is the matching upper bound. For the case with distinct salts, we get an upper bound of the form (T/(mN))^m, with an m in the denominator, showing that the adversarial effort scales linearly with m as long as the salts are distinct. For the uniform case we get a similar bound: for example, if mL/K is small, because K, the size of the salt space, is large, we get essentially the same bound.

Now let's look at the case with preprocessing, which is the focus of this work. For the case with no salts, we get an upper bound of the form (ST/(mN))^m, and note that this matches the multi-instance Hellman algorithm we saw. For the known distinct salt case, we get a bound of this form with an m squared in the denominator. What this is saying is that in order to match the multi-instance Hellman bound, the adversary would, for example, have to do m times as much preprocessing, one run per salt: if S is replaced with mS, because we increase the preprocessing effort, then the m's cancel out and we recover that bound.

The most relevant bound for practice is the one for the case with preprocessing and uniform salts. In this bound we have a term which is independent of S and a term which depends on S, and the S-dependent term is multiplied by a value of the form mL/K. If the salt space is big, this term is basically negligible, and we fall back to a bound of the form (T/(mN))^m, meaning that, first, the adversarial effort scales linearly with m, and second, preprocessing is defeated: there is no dependence on S.
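Under this reading of the bounds (a sketch; the exact forms are my reconstruction from the discussion), the claim that scaling preprocessing by m closes the gap with distinct salts can be checked numerically:

```python
def hellman_bound(S, T, N, m):
    """Multi-instance Hellman-style advantage (S*T/(m*N))^m, the
    no-salt bound with preprocessing."""
    return (S * T / (m * N)) ** m

def distinct_salt_bound(S, T, N, m):
    """Preprocessing upper bound with known distinct salts, with an
    extra m in the denominator: (S*T/(m**2*N))^m."""
    return (S * T / (m * m * N)) ** m
```

With S replaced by m*S in the distinct-salt bound, one factor of m cancels and the two expressions coincide, which is the "m times as much preprocessing" observation above.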
In the paper, let me just briefly mention, we also look at a composable notion of KDF security with auxiliary information, following the work of Bellare, Ristenpart, and Tessaro, and we prove an appropriate composition theorem for this notion: security in a multi-instance application with auxiliary input is upper bounded by the advantage in this KDF game with auxiliary input, plus the advantage in the multi-instance guessing game, plus the advantage in a single instance of the application game with random keys. Note that here we do not get a multi-instance term, only a single-instance term, but this is of no concern because the keys in that term are random, so the term is already small.

I should also mention that if one looks at this from a theoretical perspective, one may want to derive even sharper bounds. In the paper we look at the security of an iterated hash function, where we take a hash function and iterate it r times; we derive a KDF bound for this in the bit-fixing random oracle model, and then apply the CDGS machinery and optimize over the presampling size P to get a bound in the auxiliary-input model. And that concludes my talk.