Please welcome our next speaker. Adam is going to be talking to us today about revolutionizing authentication with oblivious cryptography. Take it away, Adam.

Thanks, Rihanna. Good evening, everyone. Thank you for coming to the Crypto Village. Thank you for coming to see my talk. I'm going to talk about a technique that we call oblivious password hardening, or oblivious password protection.

All right, first some introductions. I'm Adam Everspaugh. I'm a cryptography expert at a software technology company called Uptake in Chicago. But what I'm presenting today is some work I did when I was a Ph.D. student at the University of Wisconsin. And that's joint work with these gentlemen. Some of these folks are here. Sam Scott is here in the audience. And all these folks are now at Cornell Tech. And lastly, I'm going to give a live demo of an implementation of this technique later in the talk. That live demo was developed by Virgil Security, which has an open source version of this. And the founder of Virgil is here. Dmitry Dain is here. So if you have any questions about the software or the technology, come talk to me afterwards. I'll introduce you to Dmitry.

So we use passwords every day to authenticate to our devices and to online services. But the way we store those passwords is fundamentally broken, because databases are compromised. And when they are, there's no way to stop offline dictionary attacks. Today I'm going to present to you a new direction: a solution that not only eliminates these attacks, but also enables compromise recovery. Let's start with a concrete example.

All right. So maybe you've gotten an email like this. Scumbag polar bear wants to connect with you on LinkedIn. He sends you an email. You navigate to a website that looks like this. Now in order to log into LinkedIn, you must provide an email address and a password.
And of course LinkedIn has to store that password in some way so they can tell if it's the same password every time.

Here are the typical ways that websites store passwords. They either store passwords in plain text. No joke. This happened at Twitter. They perform a hash of that password and store the hash. They perform a hash with a salt, and then they store the salt and the hash in the database. Or they do something like an iterated hash function. You may apply the hash function, say, 10,000 times in a row and then store the salt and the hash.

Okay. So pop quiz. There's a little bit of trivia. Who here knows how LinkedIn was storing passwords in 2012? Two people. Somebody I don't know right there. You want to guess? Unsalted hashes. That's true. Now the reason we know this is not because LinkedIn had a press release. It's because attackers broke into LinkedIn in 2012. They stole something like 100 million passwords. They leaked some of those hashes online. And when they did, security researchers had a field day. They were able to crack most of those passwords in just two weeks with off-the-shelf hardware.

Now the reason those attacks were so devastating and so fast was because LinkedIn was using unsalted hashes. But what I want to point out to you is those attackers have 100 million passwords. Even if LinkedIn was using what's considered the industry standard today, the best protection, which is some kind of iterated hash function or hard hash function, this doesn't actually stop offline attacks. It just slows them down. Which means attackers just throw more iron at the problem. So all these techniques, short of storing plaintext passwords, we basically call hash and hope. You're hoping the attacker is not going to spend enough time and energy to crack those passwords, because if they do, they will crack all those passwords.

All right, so I'm picking on LinkedIn.
But let's just say password database compromises are really not that uncommon, right? And what I want to point out is these aren't just fly-by-night startups. These are sophisticated technology companies. They're repeatedly breached. These passwords are stolen. They have the technology. They have the motivation. They have the skill set to stop these database breaches, and yet they are often unable to do it. Okay, so of course we want to limit these breaches whenever possible. But even when they do occur, we want some kind of protection, right? We want to move beyond hash and hope.

So let's look at a different way to protect passwords. In January 2015, Facebook presented this slide at Real World Crypto Conference in London. This slide is pseudocode of how Facebook processes a new or changed password when you log into Facebook. All right, so I'll give you a minute to take a look at it and then we'll talk through it.

Okay, so this is what Facebook does. They take your password. They run it through a deprecated hash function called MD5. They choose a random 20-byte salt. They take the output of MD5 plus the salt. They run it through a deprecated hash function called HMAC-SHA-1. They take that result and send it to a remote machine. It applies HMAC-SHA-256 with a secret and sends the result back. They run that through a memory-hard function called scrypt. Take the output of scrypt, run it through HMAC-SHA-256. Got that? All right. So it's a little confusing.

You might be wondering, why would Facebook, who presumably knows something about security, do something like this? Well, the reason they do this is historical. Once upon a time, Facebook had 800 million passwords sitting around in a database protected by MD5. Security said, holy F, we have 800 million passwords sitting around in a database protected by MD5. How are we going to fix this? Well, Facebook doesn't want to make you change your password.
They don't want to wait for all 800 million users to log in so they can rehash those with, say, HMAC-SHA-2. So instead, the security team says, let's do this. Take all the existing passwords, choose a random 20-byte salt, run them through HMAC-SHA-1, write that back to the database. Cool, we're done, right? Two years later, they've got 1.1 billion passwords. They say, holy F, we've got 1.1 billion passwords stored with HMAC-SHA-1 in the database. What are we going to do? But look, we solved this problem before. Add HMAC with a secret. Doing good, right? All right, so sometime later, they want to do something better. They add scrypt. Now the output of scrypt is really long. So finally, they run it through HMAC-SHA-2, literally just to compress it down so it fits in the database better. Okay, so what we're left with is this pseudocode, which is now an archaeological record of Facebook struggling to protect passwords.

All right, but I want to point out something interesting here. Something most technology companies do. Most of this processing happens on the same server. But at some point during this processing, Facebook reaches out to a second server. That server applies an HMAC-SHA-256 with a secret held only on that server and sends the result back. And that's actually really neat. That's neat because it leads to a different kind of architecture.

At a high level, the architecture looks like this. User submits a password. Web server does some local processing. Connects to the crypto server. Crypto server applies its keyed function and sends the result back. Now the reason we like this architecture is that critical information is now split between at least two different machines. So if I break into a web server, I get passwords but no key. If I break into a crypto server, I get a key but no passwords. Okay, this is already better.

But there's still some problems with Facebook's implementation. First off, there's a cryptographic key, right?
Security people might say, how do I rotate that key? Well, the dirty little secret inside Facebook is they can never rotate that key, right? It's in the middle of a bunch of one-way functions. If you change that key, you effectively delete all two billion passwords in that database. So if you want to change the key, you have to keep adding these HMAC-SHA-256s to the end of the process. Okay, so that's not good.

Secondly, because of this particular API, the way requests come in, it's really difficult, it's actually impossible in some cases, for a crypto server to tell the difference between a large volume of legitimate requests, like 10,000 different people logging in with different passwords, versus one attacker sending 10,000 guesses for a single sysadmin's account.

Okay, so for the rest of this talk, I'm going to leave behind the state of the art. I'm going to talk about our contribution to this problem. So we like this architecture. We're going to start with this architecture. I'm going to rename the crypto server to the Pythia server and we're going to fix some of those problems. We're going to change the API. We're going to change it in a way that allows the crypto server to detect online attacks. We're going to build in support for key rotation, and key rotation is going to be really powerful. This is going to allow us to recover from compromises and also proactively rotate keys. And when we rotate keys, it's actually going to allow us to cryptographically erase information that's been stolen. All right, finally, we want to design a service that's not just for Facebook or people who have these huge security teams and can run their own service. We want to build a modern multi-tenant web service that allows even small companies to take advantage of the system.

All right, so let's see how it works. I'm going to talk through how a Pythia query works when a new user logs onto a website and the website's using Pythia. So the user logs in the normal way.
User name, password, sent via HTTP POST protected by TLS. Web server chooses a random value. We'll call that value t. This is going to be the user ID. He then takes that password and passes it through what's called a cryptographic blinding function. Think about this as encryption. It actually is. It's key-homomorphic encryption. We'll get into the details later. But just know that this protects the password.

Web server sends a query. The query has these components. It has the web server's ID. It has the user's ID. And it has the blinded password, the encrypted password. The Pythia server pulls a key out of a database; that key is associated with only this web server. It then applies a cryptographic PRF, a keyed PRF, with that key and uses those inputs, the user ID and the blinded password. We'll call that result y. That result gets sent back to the web server. The web server then applies the unblinding function and stores the result. We call the unblinded value, the final value, the protected password. Think about this as the same output from the Facebook algorithm when you've applied scrypt and a bunch of other deprecated hash functions. And then of course when the user logs in, the web server does the same thing again. Produces a new value z prime. If the new value matches the value in the database, the password is the same. If it doesn't match, it's the wrong password.

All right. Let's talk about how compromise recovery works. So let's say an attacker breaks in, steals a copy of the database. Well, this web server is in a better position, because the attacker can't do offline dictionary attacks without that key. Now he does have enough information to do online attacks. But notice that the API has changed. The API requires the requester to specify the user ID. If you specify the wrong user ID, you get the wrong result and your guesses aren't going to work.
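
Because each query names the user ID in the clear, a service in this position can keep a count per account. The following is a minimal sketch of that bookkeeping, not the Pythia implementation: the class name, the sliding-window policy, and the default limit of 10 queries per 60 seconds are all assumptions chosen for illustration.

```python
import time
from collections import defaultdict, deque


class PerUserThrottle:
    """Hypothetical sliding-window counter keyed by (web server ID, user ID)."""

    def __init__(self, max_queries=10, window_seconds=60.0):
        # Both limits are illustrative assumptions, not values from the talk's design.
        self.max_queries = max_queries
        self.window_seconds = window_seconds
        self._seen = defaultdict(deque)  # (server_id, user_id) -> query timestamps

    def allow(self, server_id, user_id, now=None):
        """Return True if this query may proceed, False if the account is throttled."""
        now = time.monotonic() if now is None else now
        stamps = self._seen[(server_id, user_id)]
        # Forget queries that have aged out of the window.
        while stamps and now - stamps[0] >= self.window_seconds:
            stamps.popleft()
        if len(stamps) >= self.max_queries:
            return False  # too many recent queries for this one account
        stamps.append(now)
        return True
```

With this shape, 10,000 users logging in once each never trip the limit, while 10,000 guesses against one user ID are cut off after the first few. A real service would presumably also raise an alert when `allow` starts returning False.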
This means the Pythia server can now tell if it's getting 10,000 requests for a single user account, and it can do something, right? It can throttle requests, it can lock accounts. It can send out alerts to the sysadmin to say, hey, someone else has your password database.

All right. So the admin goes in, he cleans up his SQL injection. And instead of the normal process of emailing all your users, telling them they're screwed, they have to go change their Instagram passwords even though you're not Instagram, he does something different. He contacts the Pythia server and says, hey, I need to do a key rotation. I need a new key. The Pythia server generates a new key and sends something to the web server. We'll call it an update token. And this update token allows the web server to update the existing protected passwords from the old key to the new key. And the important thing here is, the web server doesn't have to know the original password. I don't have to wait for the user to log in. I can do this whenever I want. And also, the underlying password hasn't changed. We're just changing the key that's being used to protect that password.

All right. The web server finishes updating its database, makes sure it works, contacts the Pythia server, and tells it: I'm done, erase the old key. If that key is gone and there's no copy of that key anywhere else, in the HSM or anywhere else in memory, that stolen password database is useless, right? It's based on encrypted values for which there is no key. So even if you break into the Pythia server, if the key is gone, you don't have enough information to recover those passwords. We say that password database is cryptographically erased. You now have a big blob of useless data and some user names and user IDs.

Okay. So there may be some cryptographers in the audience. Even if you're not, we're going to talk about the properties that we need to build this kind of a service. So we need a scheme that's going to be deterministic.
Given the same key, same user ID, same password, it needs to produce the same result every time. That's how we know if we got the password correct. We need a scheme that's pseudorandom. And really by that, I mean protected passwords should look like random numbers. They shouldn't leak anything about the password. We need this weird property. It's like partial message privacy. I want to hide the password, but I want to require you to specify in the clear the correct user ID. And finally, we need key rotation. Hopefully it's obvious how very powerful that is to operators.

Okay, so when we started doing this work in 2014, we looked around at existing schemes. There's things like pseudorandom functions, like HMAC. There's oblivious pseudorandom functions. Partially blind signatures, which give you this kind of partial message privacy. There's a bunch of techniques for key-updatable encryption, proxy re-encryption. But in 2014, at least, when we were working on this, actually 2015 we were working on this, there was no scheme that provided all four of these properties simultaneously. So we, the authors of Pythia, had to invent a new scheme. We called it a partially oblivious PRF. It's a sexy name, right? Forgive me, I'm a cryptographer, I'm not a marketer.

All right, so let me give you a high-level intuition about how we're going to build such a scheme. So, how do you build an oblivious PRF? Well, the first thing we need is a secure reversible blinding function. So I can apply some function, some blinding function, usually with some random input. Think of that as like a key. And of course, if I apply the unblinding function, that should give me the original value back. Now I need a secure reversible blinding function that commutes mathematically with a pseudorandom function.
And by that I mean, if I blind the password and apply the pseudorandom function, it's the same as if I were to do it in the opposite order: I apply the pseudorandom function first and then blind it. If I have these three things, I can build an oblivious PRF. So the client blinds the password, sends it to the server. Server has the key, applies the keyed pseudorandom function to the blinded input. Sends it back to the client, the client applies the unblinding function, right? And then it should be clear that we can rewrite it like this, and hopefully then it's still clear that the blinding and unblinding cancel out. At the end of the protocol, the client who initiated the protocol learns the output of a keyed pseudorandom function of a password, but he doesn't learn the key. And on the server side, the server can apply the key, but never sees the password. Okay, so that's an oblivious PRF.

Math warning, I have one slide of math. So for those of you who need to check your email and don't like math, go to it. For those of you who like math, sorry, it's only one slide. All right, so here's our partially oblivious PRF construction. There have actually been a number of similar constructions in the academic literature since, but this is my talk, so screw you guys, I'm presenting mine.

All right, so we need to use something called a bilinear pairing. And by that I mean three elliptic curve groups, we'll call them G1, G2, and GT, and a function that takes inputs from two of them and maps them into the third. And this function has this property. If I take A to the power x, B to the power y, and run them through the pairing function, it should be the same as if I were to send A and B through the pairing function and raise that to the power xy.

Okay, with this we can now build our Pythia protocol. It works like this. The web server hashes the password, turns it into a group element, an elliptic curve element, and raises it to the power of r, where r is just some random integer.
This is the blinding function, and this is really strong. It's got information-theoretic security. Okay, so here's our blinding function. Send those values to the Pythia server. The Pythia server then hashes the user ID, passes the hashed user ID and the value x through the pairing function, and raises the whole thing to the power of k. That's our keyed pseudorandom function. The unblinding function is then: I compute the inverse of r and I raise the whole thing to the power of that inverse. If I write it out, it looks like this. And then, hopefully it's clear that the blinding factor and the unblinding factor cancel out, and what I'm left with is hash of the user ID, hash of the password, passed through the pairing function, raised to the power of the key k. And this thing is deterministic, right? Given the same inputs, it always produces the same outputs. Okay, the rest of you guys who don't like math and are busy checking email, you can wake up.

All right, so that was a deep dive. Let's pop up to the high level and let's talk about what this construction gives us. On the Pythia side, using this construction means the service has enough information to detect online attacks. You have to specify the user ID and you have to be right about it. You can't play shenanigans with it, because if you specify the wrong user ID you get the wrong output every time. But importantly, the server can apply the pseudorandom function and never learns anything about the password. On the web server side, he can compute through this protocol this keyed pseudorandom function, but he never sees the key. This is key, because if you break into the web server, you scrape memory, you steal the disk, you know everything the web server knows, and if the web server has the key anywhere then the attacker probably has the key too.

All right, so as an academic my goal was to publish this paper, but now our goal is to get this into the world and get people to use it.
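
For readers following the math slide, the cancellation described above can be written out as below. The names H_1 and H_2 for the two hash-to-group functions, the choice of which group each hashes into, and q for the group order are notational assumptions; the exponents in the bilinearity line are renamed a and b so they do not clash with the blinded value x.

```latex
% Assumed notation: t = user ID, pw = password, r = random blinding exponent,
% k = the Pythia server's key for this web server, e = the pairing.
\begin{align*}
e(A^{a}, B^{b}) &= e(A, B)^{ab} && \text{(bilinearity)} \\
x &= H_2(pw)^{r} && \text{(web server blinds the password)} \\
y &= e\bigl(H_1(t),\, x\bigr)^{k} && \text{(Pythia server applies its key)} \\
z &= y^{\,r^{-1} \bmod q}
   = e\bigl(H_1(t),\, H_2(pw)\bigr)^{r \cdot k \cdot r^{-1}}
   = e\bigl(H_1(t),\, H_2(pw)\bigr)^{k} && \text{(web server unblinds)}
\end{align*}
```

The final value z depends only on the user ID, the password, and the key, which is why repeating the protocol with a fresh r still yields the same protected password.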
And so a company called Virgil Security has built a production-grade version of Pythia that's way better than my research prototype. You can find out more about it at this website, Pythia.VirgilSecurity.com. They have a full open source implementation, you can find it on GitHub, the code is very good.

And I'm going to give you a demo of this. If anyone is brave enough and has Golang on your computer, you can follow along in this demo. And I promise that there's no shady stuff in this code. I've never looked at the code, but I promise that there's no shady stuff in it. Okay, what I'm doing is downloading, via Golang, the package for this demo version of the Pythia client, but it hits a real live Pythia server and it's running the Pythia protocol. And now I'm going to set up a client ID. So all I'm doing here is choosing a random ID, and this is going to be my web server ID. This is my random web server ID. So there it is, very exciting.

So here's the tool. It's designed to be very simple, obviously, right? You can get help, you can protect a password, and you can check a password, right? This is really all we need to do. Okay, so what I've done is said I want to protect a password. Here's the username, the password is of course "my voice is my passport", and then it's hard to see, but I've redirected the output to a file. And what the client is doing is writing to standard error a little bit of demo information so we get some idea of what's happening, right? So the first value is that blinded password that we talked about. This is the response we get back from the server, at least the first part of it, and finally this is the deblinded value, the protected password. Okay, so here's my protected password. It's not obvious, but this first part here should match. It's possible I used the wrong file here.
But the first part is the password and the second part is a zero-knowledge proof and some other fancy information, which I totally skipped over because we don't really have time to cover that stuff. But essentially this is the protected password. Big surprise, the password matches. All right, so now let's insert an error. Let's say "your voice is my passport". Okay, the password doesn't match, right? Big surprise. All right, so let's do this a bunch of times. Right, if the Pythia server has some policy on it, we should only get a certain number of these queries before we get some kind of response. Okay, so there we go. So the Pythia server is monitoring these inbound user IDs, and after you get, say, 10 or so checks, they put a throttle on and they say, okay great, you have to wait 60 seconds or so, right? This is a very simple policy that allows us to quickly slow down any kind of online attack. All right, so that's the totality of the demo. Let's go back and talk a little bit more.

Okay, so that's how it works. There's some deep theory going on in there. There's a very cool, very simple interface to use it. But let's talk about performance, right? So if both the Pythia server and the web server are on the same local area network, and that's important because then network latency doesn't factor into these numbers, then one can execute a Pythia query in about five milliseconds, right? So you're probably thinking, what does that mean, right? So let me give you some context. If you're following the latest guidelines from NIST, which is to use SHA-256 hashing with at least 10,000 iterations, on the same hardware that takes nine milliseconds. If you're using an even fancier function, bcrypt is one, scrypt is another. Let me give you an example: bcrypt set with a work factor of 11, which is kind of a popular setting for obvious reasons, takes about seven milliseconds, right?
So Pythia today is already faster than the current versions, and you get things like protection from online attacks and key rotation. And I want to point out that as computers get faster, we actually make bcrypt slower, right? Because as GPUs get faster, we're processing these things on CPUs, so we keep turning up the work factor, so seven milliseconds is kind of like the least amount of processing time. But as CPUs get faster, Pythia just gets faster, right? So we're already ahead of these faster protection mechanisms and Pythia is going to continue to outpace them.

Some other notes about performance. On the 8-core machine that we tested in EC2, a single Pythia server can handle about 1,300 queries per second. If you need more queries per second, you throw more iron at the problem, right? You put a load balancer here and a couple more servers. Storage is excellent for a Pythia server, right? A Pythia server doesn't store information about users, it stores information about web servers, right? So let me give you an example. If a Pythia server is serving 100 million web servers, each of those web servers with an arbitrary number of clients, say 100 million clients, the Pythia server needs about 20 gigabytes, right? That's enough to store keys for individual web servers and a little bit of extra information for rate limiting. That information is ephemeral. All of this is to say we can deploy Pythia today on standard hardware.

All right, so most of this talk has been about web servers. That's a very compelling use for Pythia, but really Pythia is good whenever you have a password and a network connection and also, like, Pythia. So let me give you two more examples, just at a high level. First is file encryption, right? Commonly today you log into a MacBook or a Windows machine, you type in a password, it grinds away on your password, uses that for a key.
That's good unless you lose your device and you're worried that someone's going to crack it. Now, if it only has, say, your Instagram photos on there, that's not a big deal. But if it's got client data, health information, financial records, you might want to use Pythia to generate a hardened password and then use that for an encryption key. And then if you lose your device, or say the FBI takes it, you can go to the Pythia server and say, hey, I lost my device, will you please delete that key? That means you can actually cryptographically erase that device, even if that device is offline or powered off, right? Because it needs a key and that key is not on that device.

All right, so the last compelling use is something called a Bitcoin brain wallet, right? So Bitcoin wallets tend to be very valuable because of the skyrocketing price of these things. And then there's a question of how do you protect this digital asset, right? So you can imagine I could generate a very complex password, run it through scrypt, make it grind away for a long time, and just hope that somebody else out there doesn't grind away harder and find my password in the blockchain. You can imagine using Pythia for this, right? You type in a password, possibly combine it with a local secret, connect to Pythia, have it apply another key, get the result back, and use that to either protect your brain wallet or use it directly as your Bitcoin wallet.

Okay, so let's wrap this up. Password storage is broken because databases are routinely compromised, and when they are, using current techniques, there's no way to prevent offline dictionary attacks. But what I presented today is a new solution, an architecture for protecting passwords built around the Pythia PRF. And Pythia is designed as a modern web service.
It lets clients of this service inherit the security of the service provider without just handing over sensitive information to the service provider and just hoping for the best. And with that, thank you and I'd love to hear any questions.
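
The blind, evaluate, unblind flow and the update token from the talk can be tried out with a toy oblivious PRF. This is a sketch under loud assumptions, not the Pythia construction: it works in the multiplicative group modulo the Mersenne prime 2^127 - 1 instead of a pairing group, so it has no user-ID input (it is oblivious, not partially oblivious) and offers no real security. All function names are hypothetical.

```python
import hashlib
import math
import secrets

P = 2**127 - 1  # toy modulus (a Mersenne prime); NOT a secure parameter choice
N = P - 1       # order of the multiplicative group modulo P


def hash_to_group(password: str) -> int:
    """Map a password to a nonzero residue modulo P (stand-in for hash-to-curve)."""
    digest = hashlib.sha256(password.encode()).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1


def random_exponent() -> int:
    """Pick an exponent that is invertible modulo the group order."""
    while True:
        e = secrets.randbelow(N - 2) + 2
        if math.gcd(e, N) == 1:
            return e


def blind(password: str):
    """Client side: return (blinded value x, blinding exponent r)."""
    r = random_exponent()
    return pow(hash_to_group(password), r, P), r


def server_eval(key: int, blinded: int) -> int:
    """Server side: apply the key to the blinded value without seeing the password."""
    return pow(blinded, key, P)


def unblind(y: int, r: int) -> int:
    """Client side: strip the blinding, leaving hash(password)^key mod P."""
    return pow(y, pow(r, -1, N), P)


def protect(password: str, key: int) -> int:
    """Run the whole exchange in one process, for demonstration only."""
    x, r = blind(password)
    return unblind(server_eval(key, x), r)


def update_token(old_key: int, new_key: int) -> int:
    """Server side: the token is new_key / old_key modulo the group order."""
    return (new_key * pow(old_key, -1, N)) % N


def apply_update(protected: int, token: int) -> int:
    """Web server side: move a stored value to the new key without the password."""
    return pow(protected, token, P)


if __name__ == "__main__":
    k_old, k_new = random_exponent(), random_exponent()
    z = protect("my voice is my passport", k_old)
    print(z == protect("my voice is my passport", k_old))   # fresh r, same result
    z_new = apply_update(z, update_token(k_old, k_new))
    print(z_new == protect("my voice is my passport", k_new))
```

Once the stored values have been updated and the old key is deleted, values computed under the old key can no longer be checked against any guess, which is the cryptographic erasure described in the talk. A real deployment would use a vetted library over elliptic curve groups rather than this arithmetic.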