Thank you all for sticking around for the last session of Crypto. I'm excited to talk to you today about data-independent memory hard functions. This is joint work with Joël Alwen at IST Austria.

As a basic motivating example, consider the problem of password storage. Oftentimes, due to programmer errors or other security flaws, adversaries are able to break into authentication servers and steal the cryptographic hashes of user passwords. Such an adversary can execute an offline attack, in which he compares the hashes of likely password guesses with the hash of the user's true password, and he is limited only by the resources he is willing to invest in cracking the password. Offline attacks are an increasingly common problem: in the past years we've seen breaches at many large organizations which have affected millions of users, and over the years I've had to update this slide several times, so we can see that this problem is not going away. Offline attacks are also increasingly dangerous, for two reasons: first, password cracking hardware continues to improve, and second, users continue to select low-entropy passwords.

This motivates the goal of developing moderately expensive hash functions. The basic idea is to create a function which is moderately hard to evaluate once, so that an adversary who is trying millions or billions of password guesses simply runs out of resources. The key question is: after we've done all this key stretching on the legitimate authentication server, is it really going to cost the adversary extra each time he evaluates this function? In particular, the adversary doesn't have to use standard computing hardware; he could use a GPU or build an ASIC, for example, to compute this function many times. So the key question really is whether the costs are equitable: if we do a lot of work to make this function harder to compute on your desktop, is it also going to be harder for the adversary to compute it many times on customized hardware? And in fact, obtaining a function with equitable cost is non-trivial. By looking at Bitcoin mining, we can see that the SHA-256 hash function is far from equitable: the cost of evaluating SHA-256 varies by a factor of about a million when you compare one evaluation on a CPU with one evaluation on an ASIC.

This hopefully motivates the need for data-independent memory hard functions, the object I'll introduce in the next section. After introducing data-independent memory hard functions, I'll go over our attacks, and as a bonus, if time permits, I'll describe some exciting subsequent results that we've obtained in the last few months.

The basic goal of a memory hard function is to develop a moderately hard function with equitable cost, and the key observation is that memory costs tend to be equitable across different computer architectures. So the intuition behind a memory hard function is, basically, a function for which computation costs are dominated by memory costs. scrypt is one example of a candidate memory hard function, but the data access pattern of scrypt is input dependent: the access pattern to memory depends on the sensitive user input, which makes the function potentially vulnerable to side-channel attacks.
In this paper, we're going to consider a special subclass of memory hard functions called data-independent memory hard functions (iMHFs). As the name suggests, the memory access pattern does not depend at all on the input, which makes these functions resistant to side-channel attacks.

So what is a data-independent memory hard function? For our purposes, we can think of an iMHF as being defined by a compression function H, which we'll model as a random oracle, and a directed acyclic graph G, which encodes the data dependencies during computation. As an example, with this graph the input to the function is the password and the salt, and we compute the label of the first node by hashing the password and the salt. The label of an internal node, for example node 3, is computed by hashing the labels of its parents, in this case nodes 1 and 2. Finally, the output of the data-independent memory hard function is just the label of the final node, in this case node 4.
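To make the labeling rule concrete, here is a minimal sketch of how the honest evaluation might look. This is not the code of any real iMHF: the SHA-256 stand-in for the random oracle, the function names, and the little four-node example graph are illustrative assumptions on my part.

```python
import hashlib

def H(*parts: bytes) -> bytes:
    # Stand-in for the compression function, modeled as a random oracle.
    h = hashlib.sha256()
    for p in parts:
        h.update(p)
    return h.digest()

def evaluate_imhf(parents, password: bytes, salt: bytes) -> bytes:
    """Evaluate the iMHF defined by a DAG, where parents[v] lists the
    parents of node v and nodes 0..n-1 are in topological order.
    This naive evaluation keeps every label in memory."""
    n = len(parents)
    labels = [None] * n
    for v in range(n):
        if not parents[v]:
            # Source node: its label is the hash of the password and salt.
            labels[v] = H(password, salt, v.to_bytes(4, "big"))
        else:
            # Internal node: its label is the hash of its parents' labels.
            labels[v] = H(v.to_bytes(4, "big"), *(labels[u] for u in parents[v]))
    return labels[n - 1]  # the output is the label of the final node

# A graph roughly like the slide example (node 1 feeds nodes 2 and 3, node 2
# feeds node 3, node 3 feeds node 4), written 0-indexed; the exact edges on
# the slide are not fully spelled out in the talk.
parents = [[], [0], [0, 1], [2]]
tag = evaluate_imhf(parents, b"hunter2", b"NaCl")
```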
We can describe the algorithms an adversary might use to compute an iMHF using the language of graph pebbling. Placing a pebble on a node corresponds to computing that node's label, and keeping a pebble on the graph corresponds to storing that label in memory. Of course, a pebbling is only valid if it satisfies certain rules. We can only place a pebble on a node in round i + 1 if we had pebbles on that node's parents in round i, because we need all of the parent labels before we can compute the new label. And the final rule is that by the end of the pebbling, we have to place a pebble on the final node; after all, that's the output we're trying to compute. The adversary is allowed to place multiple pebbles on the graph during a given round (we assume the adversary is parallel), and the adversary is also allowed to discard pebbles at any point in time, potentially to save space.

Here's a simple example of a pebbling. To start off, we place a pebble on node 1. Now that we have this pebble, we can place a pebble on node 2. Next we can pebble node 3, because we had pebbles on its parents, and at the same time we can remove the pebbles from nodes 1 and 2, since they're no longer needed. Then we pebble node 4, and finally node 5.

So how do we measure the cost of a pebbling algorithm? The first, classical approach is space-time complexity. The space-time complexity of a graph is the minimum, over all legal pebblings, of the time the pebbling takes (the number of rounds) times the maximum space usage (the maximum number of pebbles on the graph at any point in time). This is a nice notion, and there's a rich theory with lots of space-time tradeoff theorems proved in this model, but I claim it is not the appropriate metric for password hashing. Why? The problem is amortization. For parallel computation, ST complexity can scale badly in the number of evaluations of the function. Remember that the adversary is trying to compute this function on many different password guesses; he's evaluating many instances of the function. So consider a function which requires lots of space at the beginning and then runs for a long time with minimal space usage. If we can execute several instances in parallel, staggered so that at most one instance is in its high-space phase at any time, then the cost of computing, say, three instances of this function has space-time complexity just about equal to the space-time complexity of computing one instance. And Alwen and Serbinenko showed that you can construct example functions where this scaling is even much worse: you can compute √n instances at essentially the same ST cost as one.

So here's an improved cost metric called cumulative complexity. Instead of taking the time multiplied by the maximum space usage, you sum the space usage over every pebbling step. The nice thing about this metric is that it actually does amortize: the cost of pebbling two independent instances of a graph is exactly twice the cost of pebbling one instance. So in our previous example, the cost of the pebbling is seven, because we just sum the number of pebbles on the graph at each point in time.

The metric we use in this work is actually a little more refined. It's called energy complexity, and it's very similar to cumulative complexity, except we add a term to model the cost of querying H: every query costs a little bit of extra energy, and we want to charge ourselves for that cost, so every time we query H we charge a cost R. In an asymptotic sense this is equivalent to cumulative complexity, but in a physical sense it seems to more closely model what's going on in hardware.

A data-independent memory hard function is given not only by a graph, but also by a naive pebbling algorithm, the algorithm that the honest party is supposed to use. The naive pebbling algorithm should be sequential: it places only one new pebble on the graph during each round. Many iMHFs are in fact specified by giving the naive pebbling algorithm, and in the naive algorithm you just pebble the graph in topological order, never removing pebbles. So the time is n, since it takes n rounds to pebble the last node, and on average you have about n/2 pebbles on the graph during each step, so the energy complexity scales as n².

Finally, we define attack quality for an algorithm A that the adversary might use to evaluate the iMHF: attack quality is the number of instances the adversary computes, times the energy cost of the naive algorithm, divided by the energy cost of the adversary's algorithm. So if the adversary is computing 100 instances of the iMHF, we scale the naive cost by 100.
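Here is a tiny sketch of these cost metrics, just to pin down the definitions. The representation of a pebbling as a list of sets, the function names, and the parameter names are my own illustrative choices, not notation from the paper.

```python
def cumulative_complexity(pebbling):
    """pebbling[i] is the set of nodes holding a pebble in round i.
    Cumulative complexity sums the space usage over all rounds."""
    return sum(len(round_pebbles) for round_pebbles in pebbling)

def energy_complexity(pebbling, r):
    """Like cumulative complexity, but every newly placed pebble also
    costs r, modeling one query to the compression function H.
    (Legality of the pebbling is assumed here, not checked.)"""
    cost, previous = 0, set()
    for round_pebbles in pebbling:
        cost += len(round_pebbles)                  # memory held this round
        cost += r * len(round_pebbles - previous)   # new pebbles = queries to H
        previous = round_pebbles
    return cost

def attack_quality(num_instances, naive_cost, adversary_cost):
    """num_instances runs of the naive algorithm, versus the adversary's
    total cost for computing those same instances."""
    return num_instances * naive_cost / adversary_cost
```

For the naive pebbling, which places one new pebble per round and never discards, the pebbling is {1}, {1, 2}, ..., {1, ..., n}, so the cumulative complexity is n(n + 1)/2 and the energy complexity is n(n + 1)/2 + R·n, which is the n² scaling just mentioned.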
So what are the desirable goals for a data-independent memory hard function? We want a graph G, which ideally should have constant in-degree so that we can actually apply the compression function. We want to guarantee that for any adversary A, the attack quality is small, at most c for some hopefully small constant c. And finally, we also want to guarantee that the cost of the naive algorithm is fairly large, roughly n²/τ for some hopefully small value τ. Now, this last criterion might seem a little confusing: why do we want the naive algorithm to be expensive? Well, for one, it rules out the kind of graph you can pebble with very low space. That graph is bad because the cost per step is low, and users are impatient, so the maximum cost you can incur in a reasonable time frame is fairly low. And secondly, memory costs would then be an insignificant portion of the total cost of computing the function, which means that when you implement this attack on an ASIC, for example, you might expect to see a dramatic cost reduction. So this criterion is really saying that the memory cost, this n²/τ term, should dominate. We'll say that an iMHF is c-ideal if the attack quality is at most c and the cost of the naive algorithm is sufficiently large, for some small constant τ.

All right, so let me quickly describe our attacks. The key takeaway is that depth robustness is a necessary condition for a secure iMHF. In particular, we give an attack on any graph that does not satisfy a combinatorial property called depth robustness, which I'll define next. What does it mean to be depth robust? First, we say that a graph G is (e, d)-reducible if there exists a small subset S of at most e nodes such that, after removing the nodes of S from G, the length of any path in the remaining graph is at most d. For example, this graph here is (1, 2)-reducible: if we delete node 3, then any remaining path has length at most 2. Of course, if a graph is not (e, d)-reducible, then we call it (e, d)-depth robust.

Here's a general attack that works on any (e, d)-reducible graph. As input, we start with a set S of at most e nodes such that the depth of G minus S is small, and we also fix an attack parameter g, which has to be larger than the depth of the remaining graph. The attack is divided into two kinds of phases: light phases, which are cheap, and balloon phases, which are expensive. Each light phase lasts g rounds, and its goal is just to make progress: pebble the next g nodes in g sequential steps. This phase uses low memory, because we discard all pebbles from the graph except for the pebbles on the set S and the pebbles on the parents of the nodes that we need to pebble in this light phase. The key point is that these light phases last a long time. Of course, at the end of each light phase, we need to recover the pebbles on the parents of the nodes in the next light phase before we can continue to make progress, so we execute a balloon phase, in which we greedily re-pebble all the missing nodes. This is expensive, and it uses parallelism, but the key point is that because the remaining graph has depth at most d, we can complete each balloon phase quickly. And the attack is relatively simple: I don't expect you to read all these lines of code, but my point is that you can describe the algorithm completely in just 13 lines of pseudocode.

Our main theorem is that depth robustness is necessary. In particular, if G is (e, d)-reducible, then there is an efficient attack A with the following energy complexity. The first term upper bounds the pebbles stored on nodes in the set S; here e is an upper bound on the size of S, and n is the number of pebbling rounds. The next term upper bounds the pebbles kept on the parents of the next g nodes to be pebbled in each light phase; here δ is the maximum in-degree. And the final term upper bounds the cost of the balloon phases: n/g is the total number of balloon phases, n is the maximum number of pebbles on the graph during a balloon phase, and d is the maximum length of a balloon phase. So if we set our parameters appropriately, we get the following energy complexity.
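The expression on the slide isn't reproduced in this transcript, but assembling the three terms just described, and ignoring constants and the per-query cost R, the bound has roughly the following shape; here δ is the maximum in-degree and g is the attack parameter, and this is my reconstruction rather than the exact statement from the paper:

\[
\mathrm{cost}(\mathcal{A}) \;\lesssim\; \underbrace{e \cdot n}_{\text{pebbles kept on } S} \;+\; \underbrace{\delta \cdot g \cdot n}_{\text{parents of the next } g \text{ nodes}} \;+\; \underbrace{\frac{n}{g} \cdot d \cdot n}_{\text{balloon phases}} .
\]

Choosing g proportional to \(\sqrt{nd/\delta}\) balances the last two terms at roughly \(n\sqrt{\delta d n}\) each.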
And in particular, observe that if e and d are sublinear, then the energy complexity is sub-quadratic, which is bad.

So this motivates the natural question: are existing iMHF candidates based on depth robust graphs? We consider three candidates. Catena, which received special recognition during the password hashing competition; it has two variants. Balloon hashing, which is a newer proposal with three variants; two of the variants are similar to Catena in terms of depth robustness, and the last variant is similar to Argon2. And last but not least, we analyze Argon2, the winner of the password hashing competition, and in particular Argon2i, the data-independent mode, which is the one recommended for password hashing. I should note that in the paper we analyzed Argon2i-A, the version from the password hashing competition. Argon2 has been updated several times, and I'll just mention that our attack ideas do extend to Argon2i-B, the newest version.

Here's a brief outline of the attack. First, we show that any layered graph is reducible (I'll define layered graphs on the next slide), and this lets us attack Catena, because Catena DAGs are layered DAGs. Then we show that an Argon2i DAG is almost a layered DAG, in the sense that we can remove just a few nodes and the resulting graph is a layered DAG. Hence, Argon2i DAGs are also reducible.

So let's work our way up and start with layered graphs. A layered graph is just n nodes, connected in a chain, arranged into λ layers of equal size; the chain edge at each layer boundary connects the last node of one layer to the first node of the next, and any additional edge must go from a lower layer to a higher layer. In particular, the only thing you need to know is that edges which stay inside a single layer, other than the chain edges, are disallowed. Now, I claim that these graphs are reducible. Why is that? We break each layer into segments of size n^(1/3), and we add the last node of each segment to our set S. After we delete these red nodes, any path can spend at most n^(1/3) steps in layer i, and there are at most λ layers, which is a small constant, so the total depth of the remaining graph is λ·n^(1/3). In particular, this gives us attacks with energy complexity n^(5/3), or, if you prefer attack quality, scaling as n^(1/3).

So what does the Argon2i DAG look like? You start off with a chain of n nodes as before, and every node i gets one extra predecessor chosen uniformly at random from all previous nodes. I should note that this randomness doesn't depend on the user input; it's fixed once and for all. Now, if we squint a little and arrange the graph into layers, it almost looks like a layered graph, except that some nodes have a random edge that stays within the same layer. So the first thing we do is add a node to our set S whenever its random parent stays in the same layer. I claim that by doing so, we increase the size of the set S by only about n^(3/4)·log n. Why is this? Consider a particular node on layer i. The probability that its random parent lands in the same layer is at most 1/i, so the expected number of nodes we add to our set from layer i is n^(3/4), the size of the layer, divided by i. Summing over all layers, we get a harmonic sum, which comes out to about n^(3/4)·log n nodes in total.
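As a quick sanity check on that count, here is a small simulation sketch. The uniform random predecessor and the layer size n^(3/4) follow the description above, but the code, its function names, and the concrete choice of n are illustrative assumptions of mine rather than anything from the paper.

```python
import random

def argon2i_style_random_parents(n, seed=0):
    """Each node i > 0 has the chain parent i-1 plus one extra parent chosen
    uniformly at random from {0, ..., i-1} (data-independent)."""
    rng = random.Random(seed)
    random_parent = [None] * n
    for i in range(1, n):
        random_parent[i] = rng.randrange(i)
    return random_parent

def same_layer_count(n, layer_size, seed=0):
    """Count nodes whose random parent lies in their own layer; these are
    exactly the nodes we would add to the set S."""
    random_parent = argon2i_style_random_parents(n, seed)
    return sum(1 for i in range(1, n)
               if random_parent[i] // layer_size == i // layer_size)

n = 2 ** 16
layer_size = round(n ** 0.75)   # n^(3/4) nodes per layer, so n^(1/4) layers
# Should come out on the order of n^(3/4) * log n, per the harmonic-sum argument.
print(same_layer_count(n, layer_size))
```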
Now, as before, we split each layer into segments, this time of size n^(1/4), so any path can spend at most n^(1/4) steps in each layer. There are n^(1/4) layers, so the total depth of the remaining graph is at most √n. This gives us an attack with energy complexity scaling as n^(7/4), or quality n^(1/4), which is quite high in an asymptotic sense. Remember, we want energy complexity n².

So this shows that existing iMHFs can be attacked, which motivates the question: do ideal memory hard functions even exist? An answer that might make the adversary happy is: no, they don't. In particular, any graph with constant in-degree is sufficiently depth-reducible that it admits at least some attack; the energy complexity can be at most roughly n²/log n. In practice, against Argon2i we start to get positive attacks around n = 2^18, and attack quality skyrockets after that. I should note that this plot only uses our theoretical upper bounds on the attack's cost, so it's very possible that attack quality is even better in practice. But these plots also show that, in practice, it may still be possible to obtain a secure iMHF: if you look at the ideal MHF, the n²/log n curve, attack quality doesn't exceed one until at least n = 2^50, so that still might be a feasible goal in practice. We also have some new results from simulating our attack, and these show that the attack is much more efficient in practice. In particular, we still get high-quality attacks against the new version, Argon2i-B, with pessimistic parameters, and for non-pessimistic parameters, which could easily be chosen by the parameter selection process, attack quality is through the roof.

Alright, let me take 30 seconds to highlight some very recent results. In this talk we show that depth robustness is necessary; it's also sufficient. If the graph is (e, d)-depth robust, then the energy complexity is at least e·d, and this actually leads to essentially optimal iMHF constructions; you can actually achieve this n²/log n bound. Some more results: we can improve the upper bounds on Argon2i by applying our attack recursively during a balloon phase, and in fact you can also prove lower bounds: Argon2i has energy complexity at least about n^1.66. And finally, scrypt, the original data-dependent memory hard function, actually does have optimal energy complexity, so there's a clear gap between data-independent and data-dependent memory hard functions.

In conclusion, depth robustness is a necessary condition for provably secure iMHFs, and the major open challenge in the area right now is to improve the constructions of depth robust graphs. We have constructions in theory going back to Erdős et al. in the 70s, but as you can imagine, they didn't particularly care about the constants in their constructions, and in practice these constants do matter. So thank you for listening, and I'll take questions.