Okay. If you checked earlier in your program, you know that this is an open-source love story in three acts, and like any good three-act play, we're going to start with the dramatis personae, a listing of the characters.

First, this is Peppercorn. Peppercorn is one of your users' dogs, and like any good dog owner, Peppercorn's user immediately went to all of their web services and changed all of their passwords to "peppercorn".

This is Mallory. Mallory is an attacker. We will come back to her shortly.

And this is me. My name is T.J. Schuck. I'm on the internet everywhere as my name without the dots and spaces: Twitter, GitHub, etc. I work at Harvest. We are the makers of the world's best time tracking software. You can check us out at getharvest.com. If you do anything where you charge money for your time, if you work for an agency or consultancy or do freelance, you should check it out. I'm also here with the European contingent of Harvesters: Vlad up top, from Macedonia, and Yoshka on the bottom, from Germany. So if you see any of us wearing a Harvest t-shirt, you can ask us about it. And if any of this is interesting to you, or you just want to check it out, we're hiring. We are hiring all across the world. We're headquartered in New York, where I am from, but we have people around the globe, and we're hiring in technical and non-technical positions. So if you know anyone that's looking, you can do that.

But notable for this talk: I am not a security expert. There are people who are real security experts, who get paid a lot of money to know a lot about many more topics than I know about, and if you have true, fundamental security issues, you should hire one of them. But by virtue of the fact that at Harvest we have customers and we have users, we have to be security experts, because ignorance is not an excuse. If there is a breach, we can't just say we didn't know any better. That's inexcusable. So I am not a security expert, but I have to be, and you kind of do too.
So let's learn a little bit.

Back to Mallory. Her attack is an interesting one, because good security is about layers. You should have application-level security: you should protect against things like SQL injection, XSS, and CSRF. You should have infrastructure-level security: you should have a secure data center, you should have physical firewalls, you should update your Bash installation. But to truly assess any individual layer of the security stack, we have to assume that all the other ones have failed. So we have to assume that this works. We have to assume that Mallory can just run her script and get a database dump of our users table. So knowing that, and knowing that we still have to authenticate our users, what can we do?

The easiest option is to just store passwords in plaintext. We have our users table, we have their email addresses and their passwords, and when they come in, we can check if they match and let them in. This is obviously bad, and no one here is doing it, correct? Correct? Anyone want to admit? All right. Someone here is doing it, and they're doing it because they have reasons. They work on an internal app, so it doesn't matter. Or their website is just a place where you can rank animated GIFs, so if their password got leaked, no big deal, people can just rank GIFs on their behalf.

But it's bad because users reuse passwords. Even though it's only your GIF-ranking site that has "peppercorn" as one of your users' passwords, an attacker will get that and immediately go to Gmail and try it, and go to Facebook and try it, and go to all the banking websites and try it. And every individual service that they get into becomes another, deeper avenue for a more elaborate attack against other services, because now they can use those services to authenticate with other ones. So it is bad to store these plaintext passwords.
We now know that. So we need some way to obfuscate the data in this dump, so that if Mallory runs her script and gets these passwords, she cannot use them.

The easiest way is just to encrypt them, using something to obfuscate the data. This is using a very secure encryption scheme called ROT13, a Caesar cipher with a key of 13. You take all of the letters and you shift them by 13. So an A becomes an N, a B becomes an O, a C becomes a P, and so on. It doesn't have to be something as simple as this. It could be 3DES, AES-256, any known encryption scheme. But the important detail is that they are reversible if you have the key. Here, the key is 13. If you know 13, you can decrypt it and get the plaintext value. With the other encryption schemes there is similarly a key, and that key lives somewhere. It's either in your application code or on your server, and since the attacker already has access to your infrastructure, they presumably already have access to your key as well.

And it's important to remember that an attacker could be an employee. If a dump of your database was leaked, and you have a large enough organization, it's possible that you have a malicious actor inside of your organization who would then leak that key and make all of those things decryptable.

So encryption is reversible. It's obfuscated data, but it can be reversed back to the plaintext. Hashing is irreversible. That's a good bit of vocabulary to remember when you're talking about how to treat these passwords. When you hash a value like "peppercorn", you get some output. If you hash "secret1234", you get some output again. But if you have some output, there's no function you can apply, by design and definition, that will get you the input again.

Additionally, hashing is deterministic. This is important for our authentication flow, because when a user logs in for the first time with their password of "peppercorn", we get the output, and when they come back, every time
the output of "peppercorn" is the same. The output of "peppercorn" is the same. So that's another important thing.

It's also important that it's deterministic, but not obviously so. When you hash "peppercorn" twice, you get the same value both times. But if you trivially change the input, for instance by capitalizing "peppercorn", the output is wildly different. And if your output is trivially different, say the least significant bit here is off by one, the input is again wildly different. So there's no good way to use that deterministic nature to find other ones.

Here's an example of our users table again, but now with all of these passwords hashed. This is using MD5, just because that is the shortest output, so it's easiest to fit on a slide, but everything here also applies to other hashing algorithms like SHA-1. So this is good. We can't go backwards now, so our passwords are safe.

But there is a problem, and that problem is that hashing is deterministic. This is a double-edged sword, because the hash of "peppercorn" is always the same. The hash of "peppercorn" is always the same. And that introduces the notion of rainbow tables.

Now, there are three important things to note here. First of all, this slide is awesome. It's the best that's ever been made. Oh, you wait. Second, I've already turned it into an animated GIF for you that I will tweet out later, so you can use it all the time. Third, we are technically not going to talk about rainbow tables. We are technically talking about lookup tables. A lookup table is a rainbow table with a chain of length one, is a way of thinking of it. Rainbow tables and lookup tables are often conflated in literature, particularly amateur literature on the internet. So it's good to know the phrase: people are often referring to lookup tables, and the ways that you combat them are the same. But if I agreed to talk about lookup tables, this would be my slide. So instead you get this. But you have to know that we're talking about lookup tables.
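As an illustration of the vocabulary above (not code from the talk's slides), here is a minimal Ruby sketch contrasting reversible encryption with one-way hashing. The `rot13` helper is a hypothetical name; `Digest::MD5` comes from Ruby's standard library and is used only because the talk's slides use MD5.

```ruby
require 'digest'

# ROT13 is a Caesar cipher with a key of 13: applying it twice undoes it.
def rot13(text)
  text.tr('A-Za-z', 'N-ZA-Mn-za-m')
end

encrypted = rot13('peppercorn')
decrypted = rot13(encrypted) # reversible: we are back at the plaintext

# Hashing is one-way but deterministic: same input, same digest, every time.
first  = Digest::MD5.hexdigest('peppercorn')
second = Digest::MD5.hexdigest('peppercorn')

# A trivial change to the input produces an unrelated-looking digest.
capitalized = Digest::MD5.hexdigest('Peppercorn')

puts "rot13:       #{encrypted} -> #{decrypted}"
puts "md5 (twice): #{first} / #{second}"
puts "md5 (cap):   #{capitalized}"
```

Knowing the key (13) is all it takes to undo the encryption, whereas nothing in the sketch can turn an MD5 digest back into its input.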
So from this users table database dump, we have this list of hashes. What we can do, if we are an attacker, is have this lookup table, where we just take a dictionary of phrases and hash all of them, and then we keep that sitting around, and we can look up using the hash to get that reverse value. It's not a function for going backwards, but it is a way to look up what the original value could be.

As a proof of concept, using this database dump here, we can use the world's most convenient lookup table, which is Google. If you just take any MD5 hash and drop it into Google, you don't even have to leave the results page. It will just tell you what that probably is.

So we need some way to render these precomputed tables, which are easily downloaded across the internet, obsolete, and the easiest way is to change all the inputs. If we take "peppercorn", and we know that this is the output, and we know that this output is in lookup tables all across the internet, we can just change it in some way so that now we have a different hash. This is our secret string that we keep in our app code and append to all of the passwords, and now they all have a little bit of entropy added to them. And now, if we check our lookup table, there's no record. That hash has never been precomputed anywhere on the internet. So we solved password security. It's great. It's easy.

Clapped too soon. It's true that an attacker can't look up those values in a pre-existing table, but they can generate new tables trivially. And again, they can get that altering scheme from our app code or from a malicious employee, because, again, they have already breached our security. On this MacBook Air, which no one really considers a powerhouse of a machine, I can calculate 13 million SHA-1s per second.
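The lookup-table attack and the global-salt countermeasure described above can be sketched in a few lines of Ruby. This is an illustration under stated assumptions, not the speaker's code: the word list, the `GLOBAL_SALT` value, and the `build_lookup_table` helper are all hypothetical.

```ruby
require 'digest'

# Hypothetical stand-in for a real word list with millions of entries.
WORDS = %w[password peppercorn secret1234 letmein].freeze

# Hypothetical secret string kept in app code and appended to every password.
GLOBAL_SALT = 'our-secret-string'

# Precompute digest => word once; "reversing" a hash is then a plain lookup.
def build_lookup_table(words, global_salt = '')
  words.each_with_object({}) do |word, table|
    table[Digest::SHA1.hexdigest(word + global_salt)] = word
  end
end

unsalted_table = build_lookup_table(WORDS)
leaked_digest  = Digest::SHA1.hexdigest('peppercorn' + GLOBAL_SALT)

# The precomputed table has never seen this digest...
miss = unsalted_table[leaked_digest]

# ...but an attacker who knows the global salt rebuilds the table once
# and can then look up every user's digest in it.
salted_table = build_lookup_table(WORDS, GLOBAL_SALT)
hit = salted_table[leaked_digest]

puts "pre-existing table: #{miss.inspect}"
puts "regenerated table:  #{hit.inspect}"
```

The cost to the attacker is one extra pass over the word list, which is why a single shared salt only helps until the salt itself leaks.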
So if you had a list of, let's say, 25 million dictionary words, you could calculate the hash of all of those in two seconds. And if you had to make a new one, because there was this trivial change that people added to make those tables obsolete, it would only take me another two seconds to regenerate that table, and the attack is the same.

For example, this is what happened at Harvest. By "happened", I mean I did it. We did not suffer a security breach. But we had the same thing that we just described: we had this globally salted SHA-1 scheme for authenticating passwords, where your password would come in, we had this little bit of secrecy in the app code that we would append to it, we would SHA-1 it, and that was how we kept our passwords.

Knowing that this was wrong, I decided to try to do the attack before anyone else would, just as a proof of concept, and honestly, I was bored one day. So I spent some time poking around, doing some research, and there are freely available programs that you can download very easily, and they're not too hard to use. Hashcat is one. It's what I used, because I found it first. John the Ripper is another popular one that you can install via Homebrew. So again, none of this is complicated. It's easy to do. You also need a word list. Those are also freely available. If you just Google "password dictionary", you can find a couple of them, push them all together, and get 25 million words in five minutes.

So I did that, and there was "peppercorn", right in the middle of 80,000 other Harvest passwords, and it took me 87 seconds to get them all. In less than a minute and a half, I had 80,000 passwords. That's not even the majority of our users, but it's enough to do a lot of damage, and if I decided to spend longer than a minute and a half, I could probably get significantly more. Additionally, by the way, for any Harvest users out here: I did my best to keep all of this data anonymous. So I ultimately just got a list of words.
I don't really know who they correlated to.

It's also important to note that out of that attack, I got these passwords. That first one is that leetspeak substitution style that's very popular among people, because it's easy to understand and easy to remember. But it's good to remember that these programs are good. They know this trick, and they can take your word list and automatically swap out i's for 1s and e's for 3s. The middle one also seems like a pretty good password. It seems like a random string that wouldn't be in something like that, until you look at your keyboard. If you look at the QWERTY layout of a keyboard, that's just a hardware hack. They're just marching along the keys. The third one, I don't even know. I have no idea why that's in there. There's no good reason that I could figure out. I couldn't find it anywhere. It's probably just by virtue of its length: it's not very long, so it's just a random string that could be calculated quickly. But again, it's not enough to think that you can do a simple trick like this, because those simple tricks are also known by these programs and these attackers.

That brings us to the concept of true salting. Before, you might have called that a salt, because it seemed easy to call it that, but it wasn't really, because we were using a global salt that was on every password. Instead, we can do a per-password salt. So instead of hashing "peppercorn" with this global salt, we hash "peppercorn" with its own salt, and then for anyone else that comes in with a different password, we use a different salt. You store this alongside it in the database. It is not a particularly secure piece of information. It's really just there to add that randomness, and knowing it will not really help the attacker much. So that's where you keep it.
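A per-password salt along the lines just described might look like the following Ruby sketch. It is an illustration, not Harvest's code: `salted_record` and `matches?` are hypothetical helper names, and SHA-1 is used only to mirror the scheme being discussed at this point in the talk.

```ruby
require 'digest'
require 'securerandom'

# Build the record to store: a fresh random salt kept next to the digest.
def salted_record(password)
  salt = SecureRandom.hex(16) # 128 bits of randomness, stored in the clear
  { salt: salt, digest: Digest::SHA1.hexdigest(salt + password) }
end

# To authenticate, re-hash the candidate with the stored salt and compare.
def matches?(record, candidate)
  Digest::SHA1.hexdigest(record[:salt] + candidate) == record[:digest]
end

alice = salted_record('peppercorn')
bob   = salted_record('peppercorn')

puts "alice: #{alice[:salt]} #{alice[:digest]}"
puts "bob:   #{bob[:salt]} #{bob[:digest]}"
puts "same password, same digest? #{alice[:digest] == bob[:digest]}"
```

Because each row carries its own salt, a table built for one user's salt is useless against the next user's row.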
I got rid of the email column just to save space on the slide. You would still have that to look them up.

Very random salts help to avoid users having the same hash. Earlier today we saw this quote, and doing true random salts for every user helps avoid this problem. With enough randomness, if two people have the same password of "peppercorn", the likelihood of them having the same hash output is low. This matters particularly for more common passwords, like "password" or "password1" or whatever the weakest password is that your system will accept. You probably have multiple users with a password like that. This keeps them from having the same hash, so by cracking one, you don't crack all of them immediately.

And with this system of salts per password, instead of having to regenerate the table one time, we now have to regenerate it N times. For every individual user, for whatever that user's salt is, we have to regenerate our table. So that's pretty good. That gets us pretty far.

But unfortunately, that's pretty good for 1976. In fact, in 1976, in Unix's crypt(3), that's exactly what it did. And the reason they did that was because modern hardware at the time could calculate four hashes per second. But today we have these. This is an AMD Radeon HD 7990. It's a GPU.
They cost about a thousand dollars, so it's pretty affordable for anyone that really wants to carry out an attack in a systematic way. They could get a couple of these. My MacBook Air could calculate 13 million SHA-1s a second. This can calculate 1.5 billion SHA-1s a second. So now we're back to the same problem. Generating that one table for all of the users took me two seconds, but generating one lookup table per user using a GPU is feasible again, because now they're so fast and easy to create.

The reason that this problem exists is that most of these hashing algorithms, SHA-1, MD5, weren't designed for password security. They were designed, effectively, for file integrity. They're checksums: when you put in an input, an output should come out. In our case it was "peppercorn" in, output out; "peppercorn" in again, same output out. But you could also do Moby Dick in, the entire novel, output out; Moby Dick in, output out. And then you can do things like, on a network transfer or a transfer across the internet, make sure you have the same thing on both sides. Because we don't want to slow down things like file transfers, these algorithms are intentionally designed to be very, very fast. That's a good thing, usually. But for password security, it's not.

So in 1999, Niels Provos and David Mazières published a paper about future-adaptable password schemes, and what they came up with was bcrypt. Now, bcrypt has all of the goodies that we've talked about already. It's a one-way hash. It's pre-image resistant. It's deterministic. It has built-in per-password salts, which is also nice, because we don't have to deal with salts anymore. They just come along for the ride. But there are two additional things that make it very good for us. One is the underlying cipher, which is Eksblowfish. It's based on Blowfish, which is notoriously expensive to set up, and the "Eks" stands for "expensive key schedule".
It's even more expensive to get started. So before you can get going, just the boot process of getting everything going takes some time and takes some memory. It's memory-intensive, which is good, because GPUs typically don't come with an excess of memory, just computational power.

More important than the underlying cipher, though, is this notion of an adaptive cost. It was even in the title of the paper, it was so important. So let's again look at our database dump. This is our users table, now with a collection of bcrypt digests. You'll see that the salt column is gone again, because it's built in.

If we look at a digest, we can investigate its anatomy. First of all, ignore the dollar signs. They're just delimiters. They're just there to keep all the fields separate. This last field here is the actual checksum. That's the output of the hash. It's a 192-bit number encoded in a modified Base64. And then right to the left of it, that's the salt. So we don't have to worry about storing salts, because they're just a part of the digest, stored right in there. It's a 128-bit salt, too, so you have a decent amount of randomness, with the same Base64 encoding there. Far to the left is this "2a". All that really is, is an identifier.
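To make the layout concrete, here is a small Ruby sketch that splits a digest into the fields named above, plus the second field, which holds the cost. The digest string is only an illustrative value in the standard bcrypt layout, not one from the talk's slides, and `parse_bcrypt_digest` is a hypothetical helper rather than part of any library.

```ruby
# Illustrative digest in the standard bcrypt layout (not from the slides).
DIGEST = '$2a$10$N9qo8uLOickgx2ZMRZoMyeIjZAgcfl7p92ldGxad68LJZdL17lhWy'

# Split on the "$" delimiters; the leading "$" yields an empty first element.
def parse_bcrypt_digest(digest)
  _, version, cost, rest = digest.split('$')
  {
    version:  version,           # algorithm identifier, e.g. "2a"
    cost:     Integer(cost, 10), # the second field: the cost
    salt:     rest[0, 22],       # 128-bit salt as 22 encoded characters
    checksum: rest[22..-1]       # hash output, the remaining 31 characters
  }
end

fields = parse_bcrypt_digest(DIGEST)
fields.each { |name, value| puts "#{name}: #{value}" }
```

The salt and checksum share the last field with no delimiter between them, which is why the sketch slices by length instead of splitting.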
It just means this digest is bcrypt. "2x" and "2y" also signify bcrypt. I think "1" is the iterated MD5, but that just comes from the old /etc/passwd style of storing passwords and knowing what they are, so you can have more than one kind in one place.

But most interesting to us is this second field, and that's the cost. What that means is, if we take bcrypt and we use it to hash "peppercorn" with a cost of 10, we get some output. If we take bcrypt and hash "peppercorn" again with a cost of 10, we get some other output, but we knew that it would be different, because of salting. We already learned about that. And if we take bcrypt and do it with a cost of 14, we get a third output, and you can see right there in the digest is that cost of 14.

None of this really matters much until you look at how long it took to do each of these. The first one took about 0.06 seconds on my laptop, the second one took about 0.06 seconds, and the third one took just north of a second. So by using that cost value, we can march it forward with time. If we determine that a cost of 10 or 11 or 12 is good for us today, two years from now we can use a cost of 14, 15, or 16, and that will make this checking take longer and longer, and therefore take longer and longer for an attacker to do as well. So we can now battle that hardware march with our own computational complexity march. All of these timings are on my computer, currently.

You kind of have to choose the cost with a bit of a gut feeling, depending on your particular case. It will add this amount of time any time you do a password authentication. So if you just log in one time to your web app, and you do the authentication then, and you set a session or whatever, it's probably good to have a higher cost, just because you can afford it in your sign-in flow. If you add half a second to that, no one's really going to notice. But if you support something like basic auth, where a password comes in every single time, you're going to add a half
second to every single API request. So you might not want to use that high of a cost, or you might just move over to something like OAuth 2.

Before, the attack on the Harvest database dump took me 87 seconds to get 80,000 passwords. That same attack using bcrypt would take 84,000 years. So that's a pretty good improvement: it's no longer feasible. bcrypt is kind of the sweet spot between usability and security, and it's nice for us because there's a Ruby library, which we will look at shortly.

Some of you are ahead of the class, and you're just sitting there on your high horses, wanting to ask a question about PBKDF2 or scrypt and say, "Well, what about this? It's just as good." And that's fine. You're probably right. I don't agree with you, for very esoteric reasons, but you're very probably correct, and in fact those are supported by most things that have these strict requirements. If you're doing US government contract work, you actually have to use PBKDF2, because it's the only approved one. But if you already know that, you're good. If you're already using them, you don't have to change. If you want to talk about the finer points, find me later.

So we have now had this attack on our own database. How can we fix it?
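As a back-of-the-envelope check on the cost timings quoted earlier: bcrypt's cost is the base-2 logarithm of the number of key-expansion rounds, so each increment of the cost doubles the work. The Ruby sketch below extrapolates from the roughly 0.06 seconds at cost 10 measured on the speaker's laptop; `estimated_seconds` is a hypothetical helper, and real timings will vary by machine.

```ruby
# Each +1 in bcrypt cost doubles the work (2**cost key-expansion rounds),
# so the time scales by 2**(target_cost - baseline_cost).
def estimated_seconds(baseline_seconds, baseline_cost, target_cost)
  baseline_seconds * (2.0**(target_cost - baseline_cost))
end

# Baseline from the talk: cost 10 took about 0.06 s on the speaker's laptop.
at_cost_14 = estimated_seconds(0.06, 10, 14)
at_cost_16 = estimated_seconds(0.06, 10, 16)

puts format('cost 14: about %.2f s per hash', at_cost_14)
puts format('cost 16: about %.2f s per hash', at_cost_16)
```

Going from cost 10 to cost 14 is a factor of 16, which puts the estimate in the same neighborhood as the roughly one second the speaker reports for cost 14.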
We need to convert all of these passwords, but we need the plaintext version to do that. If we already have the plaintext version, if you're at that first step, it's easy: just run a one-shot script to go through them all and convert them all to bcrypt. Otherwise, we need to hook into our current authentication scheme, because that's the only time we get the plaintext password.

So consider that this is our old hashing scheme, where we're trying to authenticate. A plaintext password comes in. If the digest we have matches the incoming password after hashing, we return the user object, otherwise nil. We want to get to here, where we are using bcrypt to do the same thing.

Very astute readers in here will notice the double equals and think that it might mean I lied to you earlier, that bcrypt is in fact reversible, because we're taking a digest here and passing it into this thing, and then comparing it with this double equals against the plaintext password. That's actually not true. The double equals is not commutative. This isn't math, it's code. bcrypt-ruby overrides the double equals operator and defines it to hash the incoming password and do the comparison against the digest. I think this is an unfortunate design decision, and I keep meaning to change it. I've seen confusion about it before in other pull requests, so it's just a good thing to know when you're using it.

The way that we can do it is, as part of our auth flow, to do a little pre-filter to convert. So when we come in, unless we successfully convert our old password hash to bcrypt, we just return out and fail. If we do, then we just auth with bcrypt. And in that password conversion function, we first check whether it's already a bcrypt hash. bcrypt provides this valid_hash? method. If it is, then we just return true. We're fine. If it's not, we do the conversion in place. First we check to make sure that it matches, that it's the actual correct password. Then we update it in place.
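The conversion flow just described can be sketched as follows. This is not the code from the slides and not Harvest's implementation. To keep the sketch free of gem dependencies, PBKDF2 from Ruby's standard OpenSSL bindings stands in for bcrypt; the `SlowHash` module, the `User` struct, `GLOBAL_SALT`, and the iteration count are all hypothetical. With bcrypt-ruby, the three `SlowHash` helpers would roughly correspond to `BCrypt::Password.create(password)`, `BCrypt::Password.valid_hash?(digest)`, and `BCrypt::Password.new(digest) == password`.

```ruby
require 'digest'
require 'openssl'
require 'securerandom'

GLOBAL_SALT = 'our-secret-string' # hypothetical legacy global salt
User = Struct.new(:email, :password_digest)

# Slow-hash stand-in built on PBKDF2 (NOT bcrypt), stored as
# "pbkdf2$<iterations>$<salt>$<hex digest>".
module SlowHash
  PREFIX = 'pbkdf2'
  ITERATIONS = 20_000 # illustrative work factor only

  def self.create(password)
    salt = SecureRandom.hex(16)
    [PREFIX, ITERATIONS, salt, derive(password, salt, ITERATIONS)].join('$')
  end

  def self.valid_hash?(digest)
    digest.start_with?("#{PREFIX}$")
  end

  def self.matches?(digest, password)
    _, iterations, salt, expected = digest.split('$')
    derive(password, salt, iterations.to_i) == expected
  end

  def self.derive(password, salt, iterations)
    sha256 = OpenSSL::Digest.new('SHA256')
    OpenSSL::PKCS5.pbkdf2_hmac(password, salt, iterations, 32, sha256).unpack1('H*')
  end
end

# The old scheme: globally salted SHA-1.
def legacy_digest(password)
  Digest::SHA1.hexdigest(password + GLOBAL_SALT)
end

# Pre-filter: true if the digest is (or has just become) a slow hash.
def convert_to_slow_hash(user, password)
  return true if SlowHash.valid_hash?(user.password_digest)
  return false unless user.password_digest == legacy_digest(password)

  user.password_digest = SlowHash.create(password) # update in place
  true
end

def authenticate(user, password)
  return nil unless convert_to_slow_hash(user, password)

  SlowHash.matches?(user.password_digest, password) ? user : nil
end
```

A user whose row still holds the legacy digest is upgraded the first time they log in with the correct password; a wrong password leaves the row untouched, and later logins go straight to the slow-hash check.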
We're great. I know this code works, because this is what we use on Harvest. We put this in there to automatically convert these passwords over time, and this is what happened. These are our users who have bcrypt-hashed passwords, and this is about two and a half weeks of natural conversion. You can see there's a big spike up front, as all the daily users and all of our apps that use basic auth authenticated in, and then there's a slow trickle as your weekly users come in. But we're getting better and better.

We needed some way to fill the gap for those people that weren't logging in. There's no real way we can force them, because we don't know their plaintext password. Except I do. I know 80,000 of them. So I just re-white-hat-attacked our database, but this time with conversion in mind, and that got us a lot more. Thanks. This got us most of the way there.

We still had a few remaining ones. For those, we just reset their password and sent them an email letting them know what happened. It wasn't that many, and we got literally zero complaints. So if you're scared about it because you have to send an email that says, "Hey, we need you to update your password", it's not that scary. Most people won't mind.

There is one downside to using bcrypt, and that's that it's an expensive algorithm. It is an expensive algorithm. This is our CPU utilization across all of our servers. After we launched it, it doubled. It went from 12 percent to about 25 percent. There was a big spike up front, when that first spike of conversions happened. There are a couple of reasons for this. For us particularly, we still support basic auth, and we have a lot of people using it, so that increases the load, because more people are using bcrypt for their requests. But this is still well within the realm of acceptability.
So it's not truly a problem, but it is a thing to be aware of.

In a three-act play, act one is exposition. That's where you get all the back story. But act two is where we add conflict. So this is the story of fat binary gems.

We talked earlier about how there is a Ruby library for bcrypt called bcrypt-ruby. You can find it on GitHub. I wanted to add a feature to bcrypt-ruby, so I pulled it down and tried to get up and running, but the tests didn't run, and the dependencies were out of date, and there were docs missing. So that one pull request turned into a dozen pull requests, and the next thing I know, I'm the new de facto maintainer of bcrypt-ruby. It was originally written by Coda Hale. He's the third one. Then it was taken over briefly, I believe, by Aaron, and then moved on to tmm1, Aman Gupta, from GitHub, and now it's me.

So this is what bcrypt-ruby source looks like. But more importantly, this is what bcrypt-ruby source looks like. bcrypt-ruby is just a Ruby wrapper around a C extension and a Java extension. The reason for this is that it could be implemented in pure Ruby, but Ruby has a reputation for being slow, accurate or otherwise, and C is always going to be fast. You have to try to match your attacker, because your attacker will use the fastest implementation possible. If you gave yourself an artificially low cost to match your slow implementation, that would be better for them. So that's why it is just wrapping a C library.

What that means is, when you distribute a version of the gem, you have to distribute it with a compiled binary for the sake of your users, so that they're not responsible for having a compiler on their system. And you can see there are four versions of each gem. So, 3.1.5 at the top there: the fourth one is for the JVM, the third one is just a native one, and the top two are both for Windows.
The top one is for 32-bit and the second one is for 64-bit. And these are fat binary gems, which is this notion of having it compiled against multiple versions of Ruby in the one package, which is useful for Windows.

Luckily, when Aman added all of this to the bcrypt-ruby library, he left this awesome commit message with step-by-step instructions for how to do it. But he left that three years ago, and anything about computers on the internet that's three years old doesn't work. Aaron introduced the notion of fat binary gems about five years ago. First of all, he made the Queen joke that I made, five years before me. But second of all, anything about computers that's five years old on the internet definitely doesn't work.

All of this, though, particularly after Aaron's work, got wrapped up into a tool called rake-compiler. rake-compiler is a Rake tool for building these binary gems. So you can just do, kind of, "rake build native" and get a native one out, and "rake build cross-compile" and get a cross-compiled one out. And it has epic-length docs that you can follow through to get everything up and running. And it didn't work.

So, the Rails core team has this thing called the rails-dev-box, which is a virtual machine that has inside it all of the native dependencies that you need to run the test suite, which you might not have on your native machine, or want to. It has all of the databases you'll need: MySQL, Postgres, SQLite. It also has all the system dependencies you'll need, and it also has things like Node and memcached that you might not be running locally. That's great. So I had a dream that I would create the rake-compiler-dev-box: a virtual machine that had all the Rubies.
I needed GCC, the JDK, and MinGW, which is a tool for compiling Windows binaries on *nix-like machines, and which is kind of the bread and butter of this whole process. And Vagrant exists, and Vagrant makes these lightweight, reproducible, and portable development environments. Exactly what we want. So I followed all the docs and made this Vagrant box, and everything was just great and wonderful, and it didn't work.

So what do you do with anything that doesn't work? You put it on GitHub. And with that, I opened up rake-compiler issue number 79. I pretty much said, "Listen, I've followed the docs ten times over at this point. Nothing really is working for me. What can I do?"

In our three-act play, this would be the climax, the turning point. And you were promised a love story.

This is Luis Lavena. Luis is the developer of the one-click Ruby installer for Windows. Because of that work, Luis is also a member of Ruby core. Because of his work on both of those, he was voted a Ruby Hero in 2010. But most importantly for us here today, he is the developer of rake-compiler, and he is the best friend of every Ruby developer on Windows.

He opened up rake-compiler-dev-box pull request number two, where, in a very long, very detailed, very friendly thread, he dropped triple hearts on me not once, not twice, not three times, but four times. So here's the problem: I'm 12 hearts in the hole, and I need you to help me pay him back. So everyone, take out your internet devices, open up Twitter, and send a tweet to Luis with three hearts. Pro tip: if you're on Mavericks or later on your Mac, you can hit Control-Command-Space and get all the fancy emoji. I'll be here all day. I'll wait. Come on, get on it.

Luis is a great open source maintainer, as you'll know if you follow anything he's done on the internet.
He's very friendly and very accepting and very helpful, even when he doesn't necessarily have to be.

So to half of you, the half that are just boots-on-the-ground developers like me: I encourage you to find your Luis. Find someone that you can collaborate with to work on something interesting that you wouldn't otherwise. It could just be a co-worker, or it could be someone else in the greater community. You can thank them for the work they do, but it's just much more interesting to work with them. Luis splits his time between Argentina and Paris, so it's a nice, fun, global collaboration. He's also a Harvest user, so that was a nice bonus.

But to the other half of you, the half that are already maintaining these giant libraries and know the pain that can come with being an open source maintainer: I want to encourage you to be the Luis that you want to see in the world. When that pull request comes in, or that issue comes in, and it's the same issue you've seen a hundred times, and you just want to say, "RTFM, noob, go away", consider that the problem might be the FM, and try to be a little bit more helpful, and realize that people are on the other end. Open source is people.

So what have we learned today? Number one: just use bcrypt. Just do it. If you're not already, talk to me afterwards. We'll hug, we'll cry, we'll convert our passwords. It will be great.

Number two: distribute a dev box. If you are doing something that has complicated dependencies, it will be a lot easier for other people to use if you give them an easy way to have an environment. Additionally, if you develop a gem that has a C extension or a Java extension, consider using rake-compiler-dev-box. It makes it a lot easier.

But most importantly, number three: I want to encourage you to release, to collaborate, and to iterate. Thank you very much.