This talk is about Balloon hashing, a memory-hard function providing provable protection against sequential attacks, by Dan Boneh, Henry Corrigan-Gibbs, and Stuart Schechter, so Henry is going to give it. Can you hear me? Yes. Awesome. So I'm going to be talking about Balloon hashing, which is joint work with Dan and Stuart here. So, just in summary, Balloon hashing is a new password hashing function that has three desirable properties. It's proven memory-hard in a certain model that I'll describe. It uses a password-independent data access pattern, and I'll describe why these first two things are interesting in a second. And it's practically good, so it matches the performance of the best heuristically secure memory-hard functions. So the idea is we're getting provable security without having to pay much in terms of performance cost. So we actually use password hashing all over the place in our computer systems. You might use it without knowing it, right? You use it when you log in to Gmail, or you unlock your phone, or you encrypt your hard disk, or encrypt a file, or log into your computer system. There's always some password hashing going on in the background here. So the attack of interest that we're considering when we're designing password hashing functions is one where an attacker gets into your machine, maybe it's your server, and extracts the password file that looks like this. So it has a list of all the users on the system, it has a list of salts, which are just public random strings, and then a list of password hashes, which are some function of the password and the salt. And the attacker's job is basically to run through a list of dictionary words, sort of popular passwords, maybe a billion of them. And for each user, try hashing the dictionary word with the salt and seeing if it matches what's in the password file. So the attacker is basically trying to compute this password hashing function a bunch of times at the least cost possible.
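To make the attacker's job concrete, here is a minimal sketch of that offline dictionary attack in Python. Everything in it is illustrative: the stolen rows, the dictionary, and the use of plain salted SHA-256 as a stand-in for whatever password hashing function the server actually deployed.

```python
import hashlib

def hash_password(password: str, salt: str) -> str:
    # Stand-in for the server's password hashing function; a real
    # deployment would use a memory-hard function, not bare SHA-256.
    return hashlib.sha256((salt + password).encode()).hexdigest()

# Hypothetical stolen password file: (user, salt, hash) rows.
stolen = [("alice", "s9f3", hash_password("letmein", "s9f3"))]
dictionary = ["password", "123456", "letmein"]  # popular passwords

def crack(entries, words):
    # For each user, hash every dictionary word with that user's salt
    # and check it against the stolen hash.
    found = {}
    for user, salt, digest in entries:
        for word in words:
            if hash_password(word, salt) == digest:
                found[user] = word
                break
    return found

print(crack(stolen, dictionary))  # → {'alice': 'letmein'}
```

The attacker's total cost is one hash evaluation per (user, word) pair, which is why making each evaluation as expensive as possible is the whole game.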
So if this is what the attacker is trying to do, what we're trying to do as designers of password hashing functions is make the attacker work as hard as we possibly can. So a good password hashing function basically makes the attacker's job as costly as possible, subject to the constraint that you still need to be able to log into your system in a reasonable amount of time. So one way to begin to formalize this property is to say that if your authentication server, the server that's checking the password, can compute some number of hashes per dollar of computation, then an attacker, even one who's using special-purpose password-cracking hardware, should only be able to compute something like an epsilon fraction more hashes per dollar of energy. So the property you want is that the attacker is running at the same level of efficiency as your commodity x86 server is running for this password hashing function. So if you haven't seen it before, it turns out that conventional cryptographic hash functions like SHA-2 are very, very bad by this metric. So to try to convince you of that, let me show you one chart. So on the y-axis, this is the number of SHA-2 hashes you can compute, in billions, per dollar of energy. And this is my server at Stanford. It computes like 100 billion per dollar. And it turns out with custom hardware that's designed for Bitcoin mining, you can actually compute about six orders of magnitude more SHA-2 hashes per dollar of energy than you can with an x86 server. So if you're using SHA-256 for password hashing, you're in deep trouble, because with a device that you can buy for like 500 bucks on Amazon, you can compute a huge number of password hashes per dollar of energy. So this is why SHA-2 is bad. And to see why there is this big gap between what an x86 computer can do and what special-purpose hardware can do, you just have to think about what's going on in your machine when you're computing a cryptographic hash function.
So this is an x86 die shot. It's the sort of thing you have in your laptop or a server. It has some cores. It has a memory controller. It has some IO stuff. And then it has a big L3 cache in the middle of the chip. And when you're computing a conventional cryptographic hash function, it turns out you're using almost none of this hardware, right? You're using basically only one tiny piece of one of the cores on your machine. So an attacker who's building custom hardware to compute this function can basically throw away all of this other stuff and just put this little tiny piece of logic on a chip that he's building. So the special-purpose hardware for password cracking looks like this. It's a tiny piece of logic tiled across the chip. And since the cost to power this stuff is roughly proportional to its area on the chip, this is where the attacker gets this million-x efficiency savings. So this is why special-purpose hardware is way better at computing SHA-2 than your conventional server is. And if you look at what people do for Bitcoin mining, this is exactly what they do. So this is special-purpose hardware for Bitcoin mining. It's just SHA-2 circuitry tiled across the chip as tightly as it can be fit. So this is where memory hardness comes in. So if you haven't seen it before, a memory-hard function is a function that uses a large amount of working space during its computation. And the idea is that if the internal state size of your hashing function is big, then an attacker who's building special-purpose hardware to compute it is going to have to put a lot of cache on whatever hardware they're building. And this basically decreases the advantage of special-purpose hardware. So the idea is the attacker is running at the same cost as your x86 computer is running. And the typical technique for building these functions is actually quite simple.
You basically take your input, which is like a password, and use it to fill up a big buffer with pseudorandom bytes. And then you mix up the state of the buffer in some special way. And then you extract the output of the function by maybe hashing together everything in the buffer. So this is a generic description of how these things work. So the idea is that without memory hardness, your hash function doesn't take very much logic to compute. If you use a memory-hard password hashing function, it actually takes a lot of space to compute on your chip. You might need an entire memory hierarchy just to store the internal state of the function. Okay, so that was background on password hashing. I'm going to spend the next couple of minutes talking about the goals of this work before I discuss the algorithm. And then at the end I'll talk about where this area is going. So for the first goal, I mentioned memory hardness in an informal way, but I didn't really make it concrete. So this is what I mean when I say a memory-hard function. It's a function that has some hardness parameter n, which you can think of as the amount of space this function is going to take up. And we'd like this function to require space s and time t such that the space-time product grows like n squared. So the idea is the function takes n blocks of space to compute, and you basically need n space for n time. And everything here is in the random oracle model. So the intuition is that an adversary that's going to try to save space, to compute your function in a very small amount of space, is going to have to take a lot of time to do it. So anyone who tries to cheat on the space is going to pay in time cost. The second goal of this work is that the memory access pattern of the function shouldn't leak any information about the password being hashed.
So remember the way these functions work is you fill up a big buffer and then you index into it in a pseudorandom way. And the idea of this property is you don't want your indexing pattern to depend on the password, because then you can be leaking bits of information about your password through side channels. So we're just going to stipulate that. And then the third goal is, of course, this thing should be practical. So it should be basically as performant as the best hash functions out there for password hashing. So let me just mention some existing schemes for context, if you've heard of some password hashing functions, and try to explain why what we're doing is slightly different. So bcrypt and PBKDF2 are based on iterated hashing, and they're by far the most widely deployed password hashing functions in industry today, but they're not memory hard. So they make no attempt to force the attacker to use a lot of space. scrypt was the first password hashing function to popularize this notion of memory hardness, but it uses a password-dependent memory access pattern. So you can leak bits of information about your password through side-channel information with scrypt. There's some new theoretical work on password hashing functions that are secure against more powerful adversaries using parallel computers; I'll talk a little bit about this at the end of the talk, but those functions are basically asymptotically good and practically not so good. And then there are these two functions, one of which appeared at AsiaCrypt last year, called Argon2i and Catena, which are very practical hash functions, but both of them lacked formal security analysis in one way or another. So Argon2i had no security analysis, at least no formal proofs of security, and Catena had a flawed security proof, or at least part of the proof was flawed.
And it turned out that this lack of formal security analysis for Argon led to an attack that we found that's described in our paper, and I'll mention it briefly at the end of the talk. Okay, so now having set up the problem, let me talk about the algorithm a little bit. So I'm going to show the pseudocode for the entire algorithm in a second, but just to remind you what we're trying to do, we're trying to get these three properties. So proven memory hardness, forcing the adversary to use a lot of space, at least in the random oracle model; a password-independent data access pattern; and performance that's basically as good as you could ask for. Okay, so it turns out the first property is really the hard one to get. The second two you can just get by construction. So all of the work went into devising the proof techniques and basically making sure that we could simultaneously have the performance and the proven security properties. So let me show the pseudocode, and if you don't get the pseudocode, I'll show a picture, a schematic, but I'm gonna put the whole pseudocode up just to show you that this is not a very complicated function to implement. So this is the entire Balloon hashing algorithm. As I mentioned, it takes in a password and a salt, and then there's this space cost parameter, which is how much space the state of the algorithm is gonna take up, and then there's some number of rounds, which is essentially how many iterations of hashing you're gonna do, and those are the inputs to the function. And the way it works is you have this internal state of the function, which is n blocks of memory. And as I mentioned, the way it works is you first fill up the buffer with pseudorandom bytes derived from the password and the salt just by hashing over and over. And then you mix up the state of the buffer.
So for each round of mixing, for each block in the buffer, you pick three blocks pseudorandomly based on the salt, and then you hash together the current block, the previous block, and these three other blocks. So you hash together a bunch of stuff, and that's what you write into the block that you're pointing at. And you just do this mixing a bunch of times until you get tired, and then you output the last block in the buffer as the output of the function. So as you can see, it's not terribly complicated to implement this. It's like a couple of hours of C programming. Oh yeah, and I should mention that this hash function here is just a conventional cryptographic hash function. You can think of it as, you know, SHA-256 or SHA-3, or whichever hash function you like the best. Okay, so to show you a picture of the same thing, the way Balloon hashing works is you take your salt and password, and then you hash them together and use them to fill up a buffer of memory. So you fill up n blocks of memory here, and then you mix up the buffer by taking the prior block, the current block, and three pseudorandom blocks and hashing them together, and then moving down the buffer and doing the same thing. So you just update the state of the buffer by hashing together every block with some other things chosen in a special way. So you can think of this, again, as like a mode of operation for a cryptographic hash function. So you're taking a non-memory-hard hash function like SHA-3 and leveraging it into a memory-hard hash function using this construction. And so you can sort of see through this picture how, if you start trying to cheat and use less space, you're gonna run into problems, because if you start throwing away these blocks in the buffer, you might need them later on, and then you'll have to go back and recompute them, and you'll spend a lot of time recomputing stuff and not making progress.
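The construction just described can be sketched in a few dozen lines of Python. This is only an illustration of the scheme from the slides, not the published Balloon specification: the block encoding, the counter handling, and the parameter names (`space_cost`, `time_cost`, `delta`) are my own choices here.

```python
import hashlib

def H(*parts: bytes) -> bytes:
    # The underlying cryptographic hash; SHA-256 here, but as the talk
    # says, any conventional hash function would do.
    h = hashlib.sha256()
    for p in parts:
        h.update(p)
    return h.digest()

def balloon(password: bytes, salt: bytes,
            space_cost: int, time_cost: int, delta: int = 3) -> bytes:
    def ctr(i: int) -> bytes:
        return i.to_bytes(8, "big")

    # Step 1: expand -- fill n blocks with pseudorandom bytes derived
    # from the password and the salt by hashing over and over.
    buf = [H(ctr(0), password, salt)]
    for m in range(1, space_cost):
        buf.append(H(ctr(m), buf[m - 1]))
    cnt = space_cost  # a counter folded into every call to break symmetry
    # Step 2: mix -- rehash each block with the previous block and
    # delta other blocks chosen pseudorandomly from the salt alone,
    # so the access pattern is independent of the password.
    for r in range(time_cost):
        for m in range(space_cost):
            buf[m] = H(ctr(cnt), buf[m - 1], buf[m])  # index -1 wraps
            cnt += 1
            for i in range(delta):
                seed = H(ctr(cnt), salt, ctr(r), ctr(m), ctr(i))
                other = int.from_bytes(seed, "big") % space_cost
                buf[m] = H(ctr(cnt), buf[m], buf[other])
                cnt += 1
    # Step 3: extract -- the last block is the output.
    return buf[-1]
```

Note that the pseudorandom indices are derived only from the salt and the loop counters, which is exactly the password-independence property from the goals slide.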
So that's intuitively why you would get a memory hardness property out of this. So I promised that this algorithm I just described would satisfy these three properties. So let's see how we did. The second and third you can get just by construction, just by inspecting the pseudocode of the algorithm, and then by implementing it and seeing how fast it runs. So again, the challenge is to actually prove something about what an adversary can do in terms of trying to cheat on the space usage. So what we prove are a bunch of theorems of this flavor. So this is an informal theorem, but the basic idea is that if you wanna compute this function, the n-block Balloon function iterated for r rounds of hashing, with high probability when delta equals seven, so this is just one of the constant parameters in the scheme, if the adversary, instead of using n blocks of space, tries to use n over eight blocks of space, then the adversary's time is gonna be such that the space-time product looks something like this. So let me pull this out so you can see what's going on here. Basically the idea is that saving a factor of eight in space, so a constant factor in space savings, causes a slowdown that's exponential in the number of rounds of hashing. So if you try to cheat by building some special-purpose hardware that only has an eighth of the space that you're supposed to use, then you get this massive blowup in the computation time. So it's not a very good trade-off, because you're only saving a factor of eight in space, but you may be paying something very large in time. So for example, if you run 20 rounds of hashing, using a factor of eight less space causes the time to blow up by 60,000x. So it goes from being very fast to compute to being very slow to compute. And for the idea of the proof, I'm not gonna have time to go into it here, but please check out the full version of the paper on ePrint if you're interested.
The idea is we basically write out a directed acyclic graph where each vertex in the graph represents a value that you compute at some point during the hash computation, and edges in the graph represent dependencies between the different values that you need to compute. And once you draw out that graph, you just basically have to argue about structural properties of the graph, and from those structural properties you can deduce these time-space lower bounds. And the basic technique is called a pebbling argument, and it goes back to the beginning of theoretical computer science, and it's seen a bunch of really cool applications in crypto in the last couple of years. So even if you don't care about password hashing, you should check out pebbling arguments, because they're a very nice tool and set of ideas. Okay, so I promised that this thing would be competitive with the best heuristically secure algorithms, so this is just a chart to show this. So on the y-axis, this is how many hashes per second you can compute on one CPU core, and the x-axis is showing how much memory you need to compute this function with no slowdown. And for comparison I've put two non-memory-hard, sort of standard password hashing functions using the standard parameters that people use in practice, and on this graph up and to the right is better, because it means you're using more memory with better performance. So Argon2, again, is sort of the best-in-class password hashing function that's memory hard. So this is Argon2, the version that was around when we submitted the paper, instantiated with SHA-512 as its underlying cryptographic hash function. So that's Argon2, and Balloon is slightly better in terms of performance. There's various reasons for this, but basically it's just a little bit better than Argon2 on this metric when both are instantiated with the same underlying cryptographic hash function.
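To get a feel for why throwing away blocks hurts, here is a toy pebbling-style experiment. The dependency graph is made up for illustration (block i depends on block i−1 and block i//2; it is not Balloon's actual graph), and the attacker model is deliberately crude: an adversary who stores nothing recomputes every missing block from scratch.

```python
def recompute_cost(i: int, back: list, stored: set) -> int:
    # Hash evaluations needed to produce block i when only the blocks
    # in `stored` are kept; everything else is recomputed on demand.
    if i in stored:
        return 0
    if i == 0:
        return 1  # the first block comes straight from the password
    return (1 + recompute_cost(i - 1, back, stored)
              + recompute_cost(back[i], back, stored))

n = 20
back = [i // 2 for i in range(n)]  # each block's extra dependency

honest = n  # storing every block: each of the n blocks is hashed once
cheater = recompute_cost(n - 1, back, stored=set())  # storing nothing
print(honest, cheater)  # → 20 659
```

The pebbling proofs in the paper make this rigorous for Balloon's actual dependency graph: any adversary below the space threshold pays a time penalty, no matter how cleverly it chooses which blocks to keep.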
So the way to read this chart is to say, at like 10 hashes per second per core, how much memory can you fill? So you draw a line over from here and then you look, and you see that if you're using Balloon you can force the adversary to use slightly more memory given the same hashing rate. So the Argon spec actually defines a non-standard cryptographic hash function that they recommend, and if you use that one, the performance of both password hashing functions is slightly better, but still Balloon has a slight edge over Argon2, even though Balloon has these very rigorous security properties that you can prove about it. So just to wrap up, I'd like to mention a couple pieces of recent work that are interesting and relevant. So one is parallel attacks on memory-hard functions. So in a paper at Crypto this year, Joël Alwen and Jeremiah Blocki show that in a parallel setting, when you have a massively multi-core computer and you're trying to compute many instances of a password hashing function in parallel, it's possible to execute a space-saving attack against any memory-hard function that uses a password-independent data access pattern. So these include Argon2i and Balloon and Catena; basically any function that falls into this very large class of functions, they have an attack against it on a certain type of machine. So the thing that's important for understanding the practical implications of this is that the attack really is an asymptotic attack, so it only applies when the amount of memory that you're using is quite large, say a gigabyte or two or three or four gigabytes of memory. Whereas in practice for password hashing you're probably only gonna be using on the order of tens of megabytes of memory. So it's not clear whether this attack is gonna be relevant for practice, but it's still very interesting from a theoretical perspective.
And the other thing to remember is that it basically requires very special-purpose hardware that doesn't yet exist. So it's not clear whether these attacks are a practical concern, but it's very interesting that in certain settings you can basically attack any function of this type. The other thing that people often ask about is a comparison with Argon2i, because this is the winner of a recent password hashing competition, and it has a really nice simple design, and it's probably gonna see widespread adoption in the applied industry community. And as I mentioned, it came out without any proof of memory hardness in any model. So it was basically heuristically secure. And Argon2i is the variant that is comparable with Balloon, in that it uses a data-independent memory access pattern. So we discuss Argon2 in the paper in some depth. As I mentioned, we demonstrate a practical attack against Argon2 that invalidates the original security properties that the designers claimed. And they've since changed the construction a little bit to try to defend against the attack. But we also prove, using the same techniques we use to analyze Balloon, that much better attacks are not possible against Argon2. So we attack it a little bit and then show that the attacks are not gonna get much better than that. I should say, though, that as far as the memory hardness properties go, Balloon has, at least as far as we can prove, slightly stronger provable security properties than Argon2i does. But, you know, this is sort of a question of religion, whether provable security matters to you. If it does, you would want to use something like Balloon. If you don't care, then there's lots of other options out there. Okay, so just to wrap up, I've tried to argue that memory-hard password hashing functions are a good way to increase the cost of offline dictionary attacks against stolen password files.
And Balloon is this password hashing function that has these three nice properties. It has proven memory hardness properties in the random oracle model against certain types of attacks. It uses a password-independent data access pattern, so you're not leaking secrets through cache attacks. And it's fast enough for real-world use, so it's competitive with the best password hashing functions out there. And for people who are using password hashing functions in industry, if you know anyone out there, most people are using PBKDF2, which is just iterated hashing. And if you take away anything from this talk, it's that there are better hash functions out there. It doesn't even matter which one you pick, whether it's Balloon or something else. But basically all of these modern password hashing functions are so much better than what people are using in practice. So try to convince your friends to switch to a memory-hard password hashing function. All right, that's it. Thanks. Any questions? Thanks. A couple of slides ago you mentioned something about these parallel architectures with many cores and shared memory. You said it doesn't quite exist, but isn't that somehow what a GPU does right now? Like you can have thousands of cores, and rather big memory that is shared across all of them. Well, no. So it's not SIMD computation, which is what GPUs are good at. Conceivably you could coerce the GPU into doing something like this. But for the attack to work, basically the key thing is you're shuffling very large buffers around the chip, and as far as I know, GPUs are not very good at this. But it's totally conceivable that you could make such a thing work. Okay. Thanks. Any other questions? So your general construction had the parameter delta, but your theorem only dealt with delta equals seven. And earlier you also mentioned in the pseudocode that delta equals three. So what happens if you use a delta which is smaller than seven?
Does it become provably insecure, or do you not know how to prove it, or what? So the smallest delta you can pick for which the proof still works is three. For any delta greater than that, the proof still works, and it just changes the constant, basically: you get S times T is greater than or equal to a constant times n squared. And the delta changes, so let me actually go back and see if I can go back. So there's two magic numbers here. The first magic number is this one in front of the n squared, and the second magic number is this eight here. And as you change delta, this eight and this constant change. So the bigger the delta is, the slower the thing runs, but the more favorable these numbers are. And for delta equals two? Delta equals two, is it an open problem, or do you have a proof that it doesn't work? To say it's an open problem suggests that there was a lot of effort spent already, but yeah, we don't know how it works. The most interesting case would actually be delta equals one. Sure. Would be nice. But basically the proof relies on a combinatorial lemma, and that lemma doesn't seem to apply when delta equals one. Thanks. I was wondering if you're familiar with the paper I had on halting puzzles, which from memory is 10 years ago, but I think it had a very similar structure, though of a bit different type. The point there was an argument to show that hiding the hardness factor actually further increases security. But it happened to also be memory hard, though the access was in fact data dependent, so it would also not be subject to the attack that you just mentioned on the last slide. I don't know if you're aware of that. So this is the one where you're hiding the number of iterations, that's also a secret. I don't remember that the construction there was memory hard. It was, as it's in the paper; it's definitely memory hard. The execution time was proportional to the memory.
The memory filling was proportional, with a fixed constant, to the running time, which was a secret. Okay, so I only remember the first half of that, so I'll have to go back and look. Thanks for the pointer to that. Okay, are there any other questions? Okay, let's thank the speaker again. Thanks.