Okay, this is "80,000 Plaintext Passwords." This is an open-source love story in three acts, as you were promised in your programs. Like every good three-act play, we will start with a dramatis personae, a listing of the characters in our play.

So first, this is Peppercorn. Peppercorn is one of your users' dogs, and like a good dog owner, the instant Peppercorn's owner got Peppercorn, they changed all of their passwords on all of their services to "peppercorn."

This is Mallory. Mallory is an attacker. Mallory is going to attempt to compromise all of your users' passwords. We'll come back to her in a minute.

And this is me. My name is T.J. Schuck. I am on the internet everywhere as tjschuck, my name without the dots and spaces: GitHub, Twitter, etc. I'm a developer at Harvest. We make the world's best time tracking software. If you do any consulting or freelance work, or if you work for an agency, or if you just get paid money for your time, you owe it to yourself to check out Harvest.

I am from New York, and it took me a very long time to get here. There we go. New York is 12 hours time-shifted from here, so it is literally halfway around the world. It took me more than 24 hours to get here. I left on Monday and arrived here on Wednesday, so Tuesday just disappeared to the skies. So if I start speaking in tongues or anything, it's because the part of my brain responsible for speech and language got mixed up while I was time traveling.

Most notably for this talk, I am not a security expert. There are real-life security experts that get paid lots of money to know a lot of stuff about many things I don't know about. If you have true security problems, you should hire one of them. But what is important is that I have to be a security expert strictly by virtue of the fact that I have users. If there was a breach or a leak or anything, ignorance is not an excuse. You can't say "we just didn't know any better"; that won't absolve you from your sin. So I have users, so I must be a security expert.
You probably do as well, so this is an attempt to get rid of some of that, you know, inexcusable ignorance.

So back to Mallory. Let's talk about her attack.

Good security is about layers. You should have application-level security to protect against SQL injection, XSS, CSRF, all that fun stuff. You should also have infrastructure-level security: you should use a secure data center that people can't just walk into, and you should have physical firewalls between your devices. However, to truly analyze any individual layer of security, you should assume that all the other ones have failed. So we have to assume that this works: Mallory can just run her script and get a database dump of your users table.

So with that in mind, let's analyze how we could keep track of your users' passwords to let them authenticate. The easiest option is just plaintext: just store the plaintext passwords of your users. This is obviously bad, and no one here is doing this, right? Right?

No one raised their hand, but someone here is doing it. They don't want to admit it, but they do it because they have reasons, you know. They run a site that's for ranking animated GIFs. It doesn't matter if there's a leak; people will just be able to rank GIFs on your users' behalf, whatever. But that's not true, because users reuse passwords. We learned this from Peppercorn's owner. So when your database gets breached and they find out that your user's password is "peppercorn," they immediately go from your GIF-ranking site to banking websites and Gmail and Facebook and try that same password. And because users use the same passwords everywhere, that will work, and they will get in further, and it will be a deeper leak into their online identity. So we know this is bad.
So we need some way to obfuscate the data in this dump. The obvious first thing is we'll just encrypt it. This is a very secure encryption cipher known as ROT13: a Caesar cipher with a key of 13. You take all the characters and you move them by 13, so A becomes N, B becomes O, C becomes P. It's nearly uncrackable unless you have the key.

This, for example, could be something more complex, like 3DES or AES-256, but the key to all of them is that they are reversible if you have the key. For this one the key is 13; if you know that, you can decrypt it. For the other ones, if you have the key, you can decrypt it.

This is bad, though, because our system is already compromised. We have to assume that if she was able to get a database dump, she also has access to our application code, where our secret might be, or just the physical servers where the keys might be. It's also important to realize that an attacker could be a malicious employee. They have access to a lot of your things, and if they wanted to use them wrongly, they could. So the key here is that encryption is reversible. The data is obfuscated, but anyone with that secret can decrypt it.

Hashing is irreversible, and that's where we need to go to avoid someone being able to just take that dump and reverse everything out to get plaintext passwords. So if you have a hashing function and you pass something into it, like "peppercorn," you get something out. And then if you take something else, like "secret1234," and hash it, you get something else out. This is the hash, the output, but if you have a hash, there's no inverse function that you can apply to it to get the input. So that's how we go one way there.

Another benefit of hashing is that it's deterministic. If you hash "peppercorn," you get an output. If you hash "peppercorn" again, you get the same output. If you hash it a third time, you get the same output. What's important about that is that it allows us to check that the password, when it comes in, matches what we have
stored.

But also, it's deterministic but not obvious. If you hash "peppercorn" twice, you get the same thing, but if you change the input trivially, like uppercasing "peppercorn," you get a completely different output. And if you have a trivially different output, so the least significant bit here is off by one, the input can be wildly different from the input we know. So there's no good way to use these hashes to calculate similar things.

So great, all of our problems are solved. We can just hash all of our passwords, and now when we get the plaintext password in, we run the same hashing algorithm; if they match, they're authenticated. It's great. We can't go backwards, so everything's safe.

Notably, throughout all of this I'm using MD5 just because its output is shortest, so it fits on a slide well. Everything here is the same for SHA-1 or any other kind of basic hashing algorithm.

But we have a problem, and that's that hashing is deterministic. We just went over this, but it's a double-edged sword. The hash of "peppercorn" is always the same. But the hash of "peppercorn" is always the same, and that leads to the notion of rainbow tables. And before you ask, yeah, this is the best slide that's ever been made. It'll save us a lot of time in Q&A. And more notably, no one calls them this; you will usually see them referred to as lookup tables or something less jokey. It's usually just used in joke form, like I just did. But because every hash is always the same, you know that "peppercorn" going in always comes out as the same thing. You can precompute tables.
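The two properties described above, same input giving the same output and a trivially different input giving an unrelated output, can be sketched in a few lines. This is only an illustration in Python's standard `hashlib`, not anything shown in the talk; the helper names `md5_hex` and `authenticate` are made up for the example, and MD5 is used only because the talk uses it on its slides.

```python
import hashlib
import hmac


def md5_hex(password: str) -> str:
    """Return the MD5 digest of a password as a hex string (demo only)."""
    return hashlib.md5(password.encode("utf-8")).hexdigest()


def authenticate(candidate: str, stored_hash: str) -> bool:
    """Hash the incoming plaintext and compare it to the stored hash."""
    return hmac.compare_digest(md5_hex(candidate), stored_hash)


stored = md5_hex("peppercorn")       # what would sit in the users table

same_again = md5_hex("peppercorn")   # deterministic: identical output
uppercased = md5_hex("Peppercorn")   # trivially different input

# Count how many of the 32 hex characters differ between the two digests.
differing = sum(a != b for a, b in zip(stored, uppercased))
print(stored == same_again, differing)
```

Nothing in `authenticate` reverses the hash; it simply repeats the same one-way computation on the incoming password and compares the results.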
So if you just have a long list of possible passwords, you can hash them once and then use a lookup table to get your input. If we have our dump, we have this hash, and we can put it into a lookup table and see what the value is. As a proof of concept, we will use the world's best lookup table, which is Google. If you just take a hash and drop it into Google, you don't even have to leave the results page. You can see right on there that that MD5 string is just "peppercorn."

So we need some way to make that not possible for an attacker who has gotten this dump. We need to make these precomputed tables obsolete, and the easiest way is to just change all of the inputs. We know that "peppercorn" is always the same, and all these tables exist with "peppercorn" in them with that output. So we can append a string of nonsense to it, and we've effectively just given this user a strong password. We can keep that nonsense in our app code or something, and that's kind of our secret. Then we just add that string to our auth checking and it all works. And you can see we check our lookup table and nothing has that hash; it's never been precomputed in a lookup table.

Great. Did it. We solved password security, guys, way to go. It only took like eight minutes. That was really easy. And that's true: an attacker will not be able to look up that password, or the hash, rather, in a lookup table.

The problem is they can't look it up in an existing table, but they can generate a new table trivially, because hashing schemes are fast. On this MacBook Air, that is just a workhorse power machine, I can calculate 13 million SHA-1s in a second. 13 million every second.

So as a proof of concept, this is where Harvest was with all of our passwords. We were SHA-1ing them with this global salt on all of them, and we knew this was bad. We had it there for a long time.
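The lookup-table idea and the global salt can be sketched as follows. This is a hedged illustration, not Harvest's code: the salt value, the four-word list, and the helper names `sha1_hex` and `build_table` are all invented for the example, and a real table would hold millions of entries.

```python
import hashlib

GLOBAL_SALT = "x7Kq2-nonsense-string"   # hypothetical app-wide secret


def sha1_hex(text: str) -> str:
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def build_table(words, salt=""):
    """Precompute a hash -> plaintext mapping for every candidate word."""
    return {sha1_hex(word + salt): word for word in words}


wordlist = ["password", "letmein", "peppercorn", "qwerty"]

# A generic, pre-existing table knows nothing about our salt...
generic_table = build_table(wordlist)
unsalted_hash = sha1_hex("peppercorn")
salted_hash = sha1_hex("peppercorn" + GLOBAL_SALT)

print(generic_table.get(unsalted_hash))   # found in the generic table
print(generic_table.get(salted_hash))     # not found: the salt changed the input

# ...but an attacker who also has the salt can rebuild the table in one pass.
rebuilt_table = build_table(wordlist, GLOBAL_SALT)
print(rebuilt_table.get(salted_hash))     # found again
```

The salt defeats tables that already exist, but because one salt covers every row, a single rebuilt table covers every user.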
It was on our list of things to fix, but, you know, we hadn't had a breach yet. But no one has a breach until the first time they have a breach. So to be preemptive, I went ahead and white-hat attacked our database.

You can use any freely available program that you can download from the internet. I used hashcat; you Google "hashcat," you find it. You can use John the Ripper; you can install John the Ripper via Homebrew. It's not hard to do. I got a handful of word lists on the internet by googling for, you know, "password dictionaries," put them all together, and got 25 million unique entries. I ran it, and right there in the middle was "peppercorn," along with 80,000 other passwords from the Harvest database. And I got them in 87 seconds. That's a minute and a half to get 80,000 passwords, and it was not that hard. This isn't even a majority of our users, but it's definitely enough to do damage.

And as a note, I did my best to keep all the data anonymized. So if you are a Harvest user, I don't know your password. I tried to work solely on the hashes and not keep them associated.

In this dump there are also things like these passwords. That first one, you know, that's the common leetspeak way to get a better password. But all of these word lists are very smart, and all of these programs are very smart, and they will automatically check all of these alternates. So that "universe" there was cracked. That second one seems like a very secure password as well, until you look at a QWERTY keyboard and realize that's like a hardware hack: it just follows keys on a QWERTY layout. And again, these word lists are smarter than that, and they're all in there. That third one, I don't know why it's in there. That seems good. I haven't figured out whether that's, you know, a character from Game of Thrones or anything. That's just a thing. But again, these things are good.
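The shape of that attack, a word list plus automatic alternates checked against globally salted hashes, can be sketched roughly like this. It is a toy stand-in, not how hashcat or John the Ripper work internally: the salt, the three-row "dump", the word list, and the mangling rules in `variants` are all assumptions made up for the example.

```python
import hashlib

GLOBAL_SALT = "x7Kq2-nonsense-string"   # hypothetical; assumed leaked with the dump

# A tiny leetspeak substitution table (real tools apply far larger rule sets).
LEET = str.maketrans({"a": "4", "e": "3", "i": "1", "o": "0", "s": "5"})


def sha1_hex(text: str) -> str:
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def variants(word: str):
    """Yield a few simple alternates of a dictionary word."""
    yield word
    yield word.capitalize()
    yield word.translate(LEET)


def crack(leaked_hashes, wordlist, salt):
    """Hash every candidate once and check it against all leaked hashes."""
    leaked = set(leaked_hashes)
    found = {}
    for word in wordlist:
        for candidate in variants(word):
            digest = sha1_hex(candidate + salt)
            if digest in leaked:
                found[digest] = candidate
    return found


# A made-up three-row "dump" of globally salted SHA-1 hashes.
dump = [sha1_hex(p + GLOBAL_SALT) for p in ("peppercorn", "un1v3r53", "k#9Lw!zQ27")]
cracked = crack(dump, ["peppercorn", "universe", "qwerty"], GLOBAL_SALT)
print(sorted(cracked.values()))
```

Because the salt is shared, each candidate is hashed once and compared against the whole dump at the same time, which is why the cost of the attack barely grows with the number of users.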
They have many different options there. It's probably just purely based on its length; it just calculated it.

So that brings us to the concept of true salting. In Harvest we were doing a global salt, and that's what we were just talking about, but we can do a per-password salt. For every single user, instead of appending that same nonsense, we can append a different string to every single password. This gives all of our users strong passwords, and they all have unique passwords if we add enough entropy to the end. I got rid of the email column just for space here, but now we have the hash that we store along with the salt.

Having the salt doesn't really help the attacker. This is a game of computational expense, so just knowing it doesn't help them do it any faster. And we need to know it as well, so it needs to be stored somewhere. A very random salt, one with enough entropy, keeps people with the same passwords from having the same hash. If you had a very short salt and thousands of users all with the password "password," there's bound to be a couple that end up with the same salt and end up with the same hash again, which makes cracking one good for cracking multiples.

And this is pretty good. This is actually getting us most of the way there. But mostly this is pretty good for 1976, and that's roughly when this was used in Unix's crypt(3), which was used for the system passwords in Unix. At the time, in 1976, a modern CPU could calculate about four of those hashes every second, so there was enough complexity there to keep people from cracking it.

But today we have these. This is an AMD HD 7990.
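A per-password salt as described above might look something like the following. This is a minimal sketch under stated assumptions: the record layout, the 16-byte salt length, and the helper names are invented for illustration, and SHA-1 is kept only to mirror the scheme being discussed at this point in the talk.

```python
import hashlib
import hmac
import secrets


def hash_with_salt(password: str, salt: str) -> str:
    return hashlib.sha1((password + salt).encode("utf-8")).hexdigest()


def create_record(password: str) -> dict:
    """Generate a fresh random salt and store it next to the hash."""
    salt = secrets.token_hex(16)   # 128 bits of randomness, hex encoded
    return {"salt": salt, "hash": hash_with_salt(password, salt)}


def verify(password: str, record: dict) -> bool:
    """Recompute the hash with the stored salt and compare."""
    expected = hash_with_salt(password, record["salt"])
    return hmac.compare_digest(expected, record["hash"])


alice = create_record("peppercorn")
bob = create_record("peppercorn")    # same password, different salt

print(alice["hash"] == bob["hash"])  # two users, two unrelated hashes
print(verify("peppercorn", alice), verify("peppercorn", bob))
```

With a unique salt per row, an attacker has to redo the whole word list for every user, so cracking one "peppercorn" no longer reveals the others.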
You can buy one for about a thousand bucks. It can calculate 1.5 billion hashes a second. So: MacBook Air, 13 million; this, 1.5 billion every second. That makes generating these lookup tables, one per user, trivial again. So it's now no longer outside the realm of possibility.

The problem is that most hashing algorithms, like SHA-1 and MD5, are not made for hashing passwords. They're made to be fast. They're designed to be fast because they're used for things like checking file validity on both ends of a network transfer or something like that.

So in 1999, Niels Provos and David Mazières published a paper about a future-adaptable password scheme to avoid this very problem, and the thing they came up with was bcrypt. Now bcrypt has all the goodies we've already talked about. It's a one-way hash. It's preimage resistant. It's deterministic. It has built-in per-password salts; we'll see that soon. But it has two specific things that make it better for all the problems we previously talked about.

One is that it's based on Eksblowfish, which is its underlying cipher. It's based on Blowfish, which is notably very expensive, but this is changed to be even more expensive: the "Eks" in Eksblowfish stands for "expensive key schedule." It also requires more memory to get Eksblowfish up and running, so it makes GPUs and other specialized hardware less feasible, because they typically don't come with a plethora of RAM. And so that slows down the process of just getting started.

But more interestingly, it has this adaptive cost that was in the title of their paper. So let's look back at the dump. Again, this is our dump, and now we have bcrypt digests in our password column. So let's examine the anatomy of a bcrypt digest.

First things first, ignore the dollar signs. They are just delimiters between the fields; they don't signify anything special. This last field is the hash. That's the final checksum that comes out of it. This thing right to the left is the salt.
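Since the slide with the digest is not part of the transcript, here is a rough sketch of the layout being pointed at: an identifier, a cost, then a 22-character salt followed directly by a 31-character checksum. The parser and the `BcryptDigest` name are made up for illustration, and the example digest is a placeholder with the right shape, not the hash of any real password.

```python
from typing import NamedTuple


class BcryptDigest(NamedTuple):
    version: str    # algorithm identifier, e.g. "2a"
    cost: int       # work factor; the key setup runs 2**cost rounds
    salt: str       # 22 characters in bcrypt's own base64 alphabet
    checksum: str   # 31 characters, the resulting hash


def parse_digest(digest: str) -> BcryptDigest:
    """Split a bcrypt digest string into its four fields."""
    _, version, cost, rest = digest.split("$")
    if len(rest) != 53:
        raise ValueError("expected 22 salt + 31 checksum characters")
    return BcryptDigest(version, int(cost), rest[:22], rest[22:])


# Placeholder digest: correct structure, but NOT a real bcrypt output.
example = "$2a$10$" + "S" * 22 + "H" * 31
fields = parse_digest(example)
print(fields.version, fields.cost, len(fields.salt), len(fields.checksum))
```

Everything needed to verify a password later (algorithm, cost, and salt) travels inside the one stored string, which is why no separate salt column is required.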
So that's, you know, nice. We don't have to worry about generating our own salts or dealing with our own salts or storing them. They're right in the algorithm and they're stored right in the digest, so that gets rid of a thing we have to worry about.

This first field is just an identifier. This just means this is a bcrypt digest. 2x and 2y also signify bcrypt hashes, for historical reasons that we can talk about later. Other things signify other algorithms, but 2a, 2x, and 2y are bcrypt.

But this one, this is the interesting one. This is the cost. What you do is, when you bcrypt a password to get your hash out, you pass it a cost as well. So "peppercorn" with 10 comes out as a thing. "peppercorn" with 10 again comes out as a different thing, but that's because of salting; we already went over that. bcrypt "peppercorn" with a cost of 14 comes out with a thing again, and you can see right there in the digest, there's that cost of 14 that we just passed in.

What is notable about this isn't the output, but how long it takes to get the output. On this MacBook Air, that first one took about 0.06 seconds. The second one took about 0.06 seconds. But the third one took more than one second: 1.04 seconds. And that's what that future-adaptable scheme means.

This is a rough average of doing a bunch of bcrypt hashes with each cost on this machine. You have to balance, as a developer, how long you want your users to wait. But again, if they're authing into your application, adding a couple of tenths of a second won't necessarily be noticeable, whereas for an attacker it will be. And it's future adaptable because by increasing this cost over time, we can march ahead with the hardware. As hardware gets faster, our passwords can get more expensive, and we don't have to worry about it.

Before, the attack I did on the Harvest database took 87 seconds to get those 80,000 passwords. Now the exact same attack, with the same word list and the same program, will take 84,000 years. Which is a notable
improvement. That is no longer economically feasible for an attacker to carry out. So bcrypt is kind of the sweet spot there. Additionally, it has a Ruby library, which is very useful for us. We'll talk about that gem in a second.

Some people here now are thinking, "Well, you should be using PBKDF2," or "you should be using scrypt," or whatever. And that's fine. You're ahead of the game, way to go. You're ahead of the curve. We can debate the merits of them afterwards. I still think you're wrong, but regardless, if you are already using PBKDF2 or scrypt, you can stick with it. You don't have to convert to bcrypt. But if you are using something like SHA-1, you should consider it.

So how do we fix it? What is the fix? To do the conversion, we need the plaintext password. You can't go from one hash to another. So if you already have the plaintext password, if you're in that first step, it's easy. You can just do the conversion manually. It might take some time, but you'll get through your database eventually.

Otherwise, we can take our old hashing scheme. When we authenticated, we had the plaintext password, we hashed it, and we made sure it matched. We want to get to here, where we take the password coming in and compare it to the bcrypt digest. A couple of you are probably realizing that this seems to contradict what I said earlier about it being irreversible, where we are taking the password digest, passing it into bcrypt, and then comparing it to the plaintext password. That's because the bcrypt Ruby gem overloads the == operator, in probably its worst design decision. But just be aware that it is not in fact reversing it. It's using that to check the hashes against each other.

But to do that conversion, we can just pre-filter in our authentication flow to convert. So we have the password coming in, and we go to this conversion method. If it's already converted, we just bounce back out.
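The convert-at-login idea can be sketched roughly as below. This is not the Harvest code: the record layout, the `scheme` field, and the helper names are hypothetical, and because bcrypt is not in Python's standard library, PBKDF2 from `hashlib` is swapped in as the slow, salted hash, with its iteration count playing the role of bcrypt's cost.

```python
import hashlib
import hmac
import secrets

LEGACY_SALT = "x7Kq2-nonsense-string"   # hypothetical old global salt
ITERATIONS = 100_000                     # stand-in for bcrypt's cost


def legacy_hash(password: str) -> str:
    return hashlib.sha1((password + LEGACY_SALT).encode("utf-8")).hexdigest()


def slow_hash(password: str, salt: bytes) -> str:
    # PBKDF2 stands in for bcrypt here (an assumption of this sketch).
    raw = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return raw.hex()


def convert(user: dict, password: str) -> None:
    """Upgrade a legacy record in place; bounce out if already converted."""
    if user["scheme"] != "legacy":
        return
    salt = secrets.token_bytes(16)
    user.update(scheme="slow", salt=salt.hex(), digest=slow_hash(password, salt))


def authenticate(user: dict, password: str) -> bool:
    if user["scheme"] == "legacy":
        if not hmac.compare_digest(legacy_hash(password), user["digest"]):
            return False
        convert(user, password)   # the plaintext is only available right now
        return True
    expected = slow_hash(password, bytes.fromhex(user["salt"]))
    return hmac.compare_digest(expected, user["digest"])


user = {"scheme": "legacy", "salt": None, "digest": legacy_hash("peppercorn")}
first_login = authenticate(user, "peppercorn")    # verifies, then converts
second_login = authenticate(user, "peppercorn")   # checked against the new digest
print(first_login, second_login, user["scheme"])
```

The sketch verifies against the old hash before converting, so a wrong password never overwrites a record; users who never log in are simply never converted by this path.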
If it's not, we just update it in place and go onward.

Again, as a proof of concept, because we did this with Harvest, this was what happened with us. That exact same code was out of the Harvest code base for the conversion. Over the course of two and a half weeks, we had this curve of natural conversion. There's a big spike up front as all the daily users and people using the Mac app and the iPhone app authed in, and then slowly, as weekly users and biweekly users and less frequent users logged in, we got more and more of them.

But this didn't quite get us all the way. Since I had already white-hat attacked the database, I could just do it a second time with conversion in mind, and that got us a giant spike that got us the rest of the way there. We had a few remaining active users that weren't done. For them, we just manually reset their password and sent them an email to let them know what was up. But it wasn't that many, and it wasn't as hard as it seems like it would be.

There is one downside of bcrypt, and that's what we've already talked about. It is an expensive algorithm, and an expensive algorithm is expensive. You can guess when we launched bcrypt here. This is the CPU utilization on our servers, and it about doubled. Part of this is because the Harvest API still supports basic authentication, and it's used a lot, so every request that comes in has a password along with it, and another bcrypt gets computed. If you're already whole hog on OAuth 2, that won't be as much of a problem. But more so, this is still well within the realm of acceptability, so it's totally worth it.

So in our three-act play, act 1 is exposition. That's the boring part. Now
you're all good. Act 2 is where we add the conflict.

So bcrypt has a Ruby gem that I mentioned earlier, called bcrypt-ruby. This is great for us as Ruby developers because it makes bcrypt easy to use. As part of this, I had a feature that I wanted to add to bcrypt-ruby that I thought would be useful. So I went to go submit a pull request, and the tests didn't run, and there were dependencies that were out of date, and there were missing docs. So that one pull request turned into a dozen pull requests. And if you just do that long enough and pester enough, Aman will get tired of you and just ask for you to get commit bit, and then you will get it, and then you become... Aman was the de facto maintainer of bcrypt-ruby, and now it's me. So, yeah.

Notably, this is what bcrypt-ruby's source looks like. But more accurately, this is what bcrypt-ruby's source looks like: it is a Ruby gem wrapper around a C and a Java implementation of bcrypt. You want it to run as fast as possible because your attackers will be running it as fast as possible, so you want to use a C implementation to match what your attacker will be using.

Along with this, though, when you release a version of the gem, you have to release native binaries so that your users aren't dependent on having a compiler on their machine. It's just a nice courtesy you can provide. So every version of bcrypt-ruby has four versions that get distributed, and the top two there are Windows binaries. I am not a Windows developer.
So I didn't really know how to do that. But there are these fat binaries that provide support for multiple versions of Ruby wrapped up into one binary. Luckily, when Aman added the code to do all of this, he left this nice, long, detailed commit message about doing it. But he left it two years ago, and like all things about computers on the internet that are two years old, it doesn't work.

About five years ago, Aaron, I think, introduced the concept of fat binary gems. This is where I found out that he made the same Queen joke as me, five years before me. But more so, anything about computers written on the internet that's five years old definitely doesn't work.

All of this stuff is ultimately just wrapping up rake-compiler, which is this great gem that does what it says on the tin: it provides a standard and simplified way to build and package Ruby extensions, C and Java, using Rake as glue. Great. It has nice long documentation, and I walked through and installed everything, and it didn't work.

So the Rails team has a project called the Rails dev box. What this dev box has is all of the external dependencies that you don't necessarily want to have on your machine. This allows you to develop on your machine but run tests in this dev box that has all of the external dependencies preloaded in it. So I had a dream that I was going to make a rake-compiler dev box that had all the dependencies we needed: all the Rubies, GCC, the JDK, and MinGW, which is the thing that allows *nix-like machines to compile Windows binaries. And Vagrant made this possible because it's exactly what it says: "create and configure lightweight, reproducible, and portable development environments." Exactly what I wanted. So I did that. I made a Vagrant box. It was awesome, and it didn't work.

So what do you do with anything that doesn't work? You put it on GitHub. And with that, I opened up rake-compiler #79. In our three-act play, this would be the climax.
This is the turning point. And you were promised a love story.

This is Luis Lavena. Luis is the developer of the one-click Ruby installer for Windows. If you do any Ruby development on a Windows machine, you owe him a debt of gratitude because of that work. He is also a member of the Ruby core team, and because of both of those, he was voted a Ruby Hero in 2010. But most notably for me, as not a Windows developer and not in 2010, he is the developer of rake-compiler.

So when I opened that issue saying, "Listen, man, I did everything I could. I followed all the docs. Nothing works," Luis came back and opened up rake-compiler-dev-box #2. And rake-compiler-dev-box #2 you should check out, because this is an epic thread where Luis drops on me triple hearts. Not one time. Not two times. Not three times. But four whole times. 12 hearts, I know.

But here's the problem: now I'm 12 hearts in the hole. So I need all you guys to take out your internet devices and tweet at Luis three hearts, because I have a debt to repay. So have at it. You can use fancy emoji if you'd prefer. And just generally thank him for being a wonderful OSS maintainer and contributor.

Now some of you think you have figured out my scam: that I'm just traveling around the world to (a) repay an emoji debt and (b) kind of, you know, love-troll Luis. But to half of you, I just want to encourage you to find your own Luis. Find someone, thank them for their work. But that's easy and boring. Collaborate with them; find something and work with them. It's surprisingly rewarding. And, you know, Luis, he lives in Paris in Argentina, so it's a fun global collaboration there. It could be any OSS maintainer whose work you admire or want to use, could be a co-worker, whatever. It's easy.

But to the other half of you, the ones who are already maintaining a giant library or already fielding dozens of pull requests, I want to bastardize a quote and encourage you to be the Luis that you wish to see in the
world. So when someone comes to you with, you know, that same issue over and over again, and you're tempted to just throw your hands up and yell "RTFM," consider for a moment that the problem might be the FM, not the library or the person. And do your best to walk them through it. It will make our community much better and friendlier and triple-hearted for all of us.

So what have we learned today? Three big lessons. One: use bcrypt. Just do it. It's not that hard. If you have any questions about it, if you want to walk through it, find me afterwards. We'll hug. We'll cry. We'll convert our passwords. Number two is distribute a dev box. If you have any project that has complicated external dependencies, do everyone else a favor and make a dev box that lets them work on it. Alternatively, if you ship a gem that has native extensions, try out rake-compiler-dev-box. It will make you not rip your hair out anymore compiling native binaries. But most importantly, I encourage you to release, to collaborate, and to iterate. And thank you.

Thanks, TJ, for the awesome talk on security and the little spiel on open source. Any questions for him? Again, the mics are... yeah, you've got to come to the mic. No, because then it won't be on the video. Hi, everyone at home.

Hello, hi there. What does has_secure_password do in Rails? Thank you.

So he asked what has_secure_password does in Rails. That was a little fast, in Australian, to understand. I believe has_secure_password is in Active Support. It effectively just wraps this up for you. You can drop has_secure_password into your model, and it will provide you an authenticate method and a couple of validations around passwords and password confirmations. If you look at the source for it, it's not long. It's, you know, a dozen lines, maybe two methods. You can implement it yourself or use it. I usually don't use it because the validations are a little bit not in line with what I want to do.
But yeah, it uses bcrypt. Rails uses bcrypt. Great.

Hey, Aaron. Hi.

So while you were hacking your application, like, you were attacking the application, would you say that you were harvesting passwords?

Ladies and gentlemen, Aaron Patterson. Yeah, yes indeed I was.

All right, if you do have something and you're too shy, find me afterwards. It'll be great. Thank you.