At 11 p.m., after I got home from the bar, I got a message from my sysadmin. He let me know that we had just gotten rooted. I of course immediately responded, "Where?", and heard the even worse news: we had gotten rooted everywhere. Literally, services I didn't even know we had were also rooted. My immediate response was sheer fright and terror. My next thought was: what do I do now? How do we fix this?

The reason I'm giving you this talk now is because I hope I can give you some hard-earned advice on how to deal with this sort of security breach, what to do immediately afterwards, and how to recover over the following days and weeks. And then maybe I'll come back next year, or next time around, and tell you how it all ended up going for us.

The first thing that most people will tell you when you get hacked is to not panic. It's very important to not panic, but that's stupid, because you're already panicking; you can't control that. So what you really have to do is try to stop panicking. Realize that the primal fear centers in your brain are already activated, they're shooting adrenaline into your system, you're already panicking, and your brain really wants to make this bad thing go away, so it's going to make stupid decisions unless you get it under control. You have to regain control, first of yourself, and then of the systems you work with.

And usually, the vast majority of the time, that doesn't mean log in, kick the guy out, business as usual. It means shut the system down. Take it offline, take it off the Internet, try to contain the failure as much as possible. It's very, very unlikely that you know the full extent of the exploit, or that you know which parts of your system are and are not compromised. So the best and safest course of action is to assume that everything is compromised and shut it down. It's going to suck. It's going to be difficult, especially if we're talking about a live site; in our case, over a thousand PHP applications that we had to take down. So that sucks. It's going to hurt. You're not going to like it. You have to do it.

You have to shut everything down, but keep all of your data, right? Keep all of your log files. Keep all of your access records. Keep anything that was modified around. Don't destroy disks. Don't, in a fit of hoping to clean everything up, try to get rid of evidence. You're going to want it later. You're going to want to understand how this attack happened and how you can prevent it in the future, and if you have any legal recourse you're going to need evidence to document it for your legal counsel as well.

So let me talk to you a bit about how this happened to us. To do that, I have to give part of the original talk I was going to give, which was to explain our infrastructure. The way PHP Fog's infrastructure works is that we have a multi-tier system for every PHP application that we host. We start out with a cache; we use Varnish for this. We then load balance using nginx to a number of app servers, and each of those app servers reads off of a database master and clone. And in the case that all of your app servers are down, which rarely, rarely happens, but just in case, we have a shared Apache failover environment, and that is actually where the trouble started.
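To make those tiers concrete, here is a toy sketch of the request path in Python. Everything in it is an illustrative assumption on my part, including the host names, the health check, and the caching logic; the real stack was Varnish, nginx, and Apache, not Python, but the flow is the one described above: cache, then load balancer, then dedicated app servers, with the shared environment as the last resort.

```python
import random

CACHE = {}                                   # stands in for the Varnish tier
APP_SERVERS = ["app1", "app2", "app3"]       # dedicated app servers behind nginx
SHARED_FAILOVER = "shared-apache"            # the shared failover tier

def healthy(host):
    # Placeholder health check; a real load balancer would probe the host.
    return True

def handle_request(url):
    if url in CACHE:                         # 1. cache tier: serve hits directly
        return CACHE[url]
    live = [h for h in APP_SERVERS if healthy(h)]
    backend = random.choice(live) if live else SHARED_FAILOVER  # 2. load-balance, or fail over
    response = f"rendered by {backend}"      # 3. app tier (reads from the DB master/clone)
    CACHE[url] = response
    return response

print(handle_request("/index.php"))
```

The shared tier only takes traffic when every dedicated backend is down, which is exactly the role described above.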
It started in the shared environment, and then things escalated quickly. And the reason it started in the shared environment is because we did a very stupid thing. We didn't secure it properly, and at the end of the talk, all of you people who have never done stupid things can tell me how dumb I was and how dumb we were not to have secured this properly. But one thing you have to understand is that when you're running a startup and you're running very lean, you don't have very large teams. Sometimes, when you make trade-offs between developing new features and security, you make the wrong trade-off. You make a really stupid trade-off. And so we made a very stupid trade-off when we should have been focusing on security but instead focused on new features. So there's one takeaway I can give you: if you have anything connected to the internet, make sure it's secure first, or at least make sure it's as secure as you can make it. Don't assume that because you're not popular yet, people won't find you.

By the way, do you like my Caucasian-themed cheek mic here? It's "skin tone," but really it's just my skin tone. It's not Brian Lytle's skin tone. I don't know if they make other models.

So we got completely trolled all over our beautiful infrastructure. That, by the way, was the whole talk that I was going to give, so you can see how boring it would have been. This is way more fun. What actually happened was that a script that was supposed to be run in a user's dedicated instance ended up being run in the shared hosting environment, which was unsecured, giving them root access to the shared hosting environment. And not only that, and here is where the story starts to get really interesting: the shared Apache server basically had Apache running as root, without any real access control, without any chroot or other containment for the individual applications. We also had shared SSH keys; private keys for all of our other boxes were sitting on this box. Basically, if you can think of any stupid thing we could have done to make things easy for our hackers, we had done it.
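Just to illustrate the kind of containment that was missing, here is a minimal sketch, in Python, of running an untrusted user script under an unprivileged account inside a chroot jail. This is my own illustrative example, not PHP Fog's code or their eventual fix; it assumes you are root on a Unix system, that a jail directory with a minimal filesystem (including /bin/sh) has been prepared, and that a dedicated low-privilege uid/gid exists for the app.

```python
import os

def run_contained(script_path, jail_dir, uid, gid):
    """Run an untrusted script confined to a chroot jail as an unprivileged user.

    Illustrative only: a real setup would also limit resources, namespaces,
    and network access. The point is that user code never runs as root and
    never sees the rest of the filesystem.
    """
    pid = os.fork()
    if pid == 0:                      # child: lock itself down, then exec
        os.chroot(jail_dir)           # filesystem view is now just the jail
        os.chdir("/")
        os.setgid(gid)                # drop group privileges first
        os.setuid(uid)                # then drop user privileges (irreversible)
        os.execv("/bin/sh", ["sh", script_path])
    _, status = os.waitpid(pid, 0)    # parent: wait for the contained script
    return status
```

The idea is that a script escaping into the shared environment would land in an unprivileged, walled-off process instead of a root shell on a box full of other people's keys.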
And so they got immediately off of the shared hosting box, onto the cache machine, onto the load balancing machine, onto our database servers, even onto the phpfog.com Rails application itself, which is where they found the Twitter password. Let me let that sink in for a second. And then let me also point out that the Twitter password was also the password to our Linode account. And our DNS account. And our Google Apps account. I was unaware of this at the time. They do not share the same password anymore. So our email got hacked. Our Twitter got hacked. Our blog got hacked. Our DNS got hacked and briefly pointed phpfog.com at a site called phpfogsucks.com, explaining the exploit and making us look pretty stupid in the process.

So we managed to regain control of our servers, which we shut down. We shut down a little over a thousand application servers for our customers. While we didn't have any direct evidence that they were compromised, we just didn't want to take chances. There's no point in having someone else's private data also be compromised because of our stupidity. So we took all of our users' servers down, we took our own servers down, and we started figuring out how to bring them back up, how to tell people that this had happened, and what we were going to do about it. Because the most important thing you have to do after a security breach like this is also the most painful thing: you have to tell people.

You have to tell your customers that you fucked up, that their data may be insecure, that you're going to do things to fix it but it's not going to happen overnight, it might not even happen in the next few days, and that their services are going to be down. And this is going to hurt. It still hurts. I did not like having to send those emails out. But what you have to remember, right now in this moment, is that the bad thing already happened. The terrible thing has already happened, in the past. You can't control that anymore. All you can control now is how you respond to the situation. So what you do is honestly disclose any security vulnerabilities that may affect your customers and what they can do about it: changing their passwords, removing data, making sure they don't share passwords with other services that could be compromised, like we stupidly did.

Now, in our case, the actual exposure for our customers was minimized. We don't have any credit card information on our servers, because of PCI compliance, so there's no possibility of stolen credit card information. And we hash passwords, obviously; we don't store plaintext passwords in the database. Jen gave a great talk about Rails app security, but one thing I think a lot of people don't realize is that the SHA-512 (SHA-2) cryptographic hashing function that we use to hash our passwords into the database, even with a unique salt and all of the industry-standard best practices, is brute-forceable. For $300 an hour, I can buy enough compute on Amazon's cluster GPU instances to brute-force 500 billion passwords a second. SHA-512 is not intended as a cryptographically secure means of preventing brute-force attacks on your password database. It's just not going to work. If your database is compromised, you have to assume that all of those passwords either are or will be compromised.

So you have to send out a few thousand, a few hundred thousand, a few million password reset emails. All of your users have to reset their passwords. And if they used those passwords on other sites, because a lot of people do stupid things like sharing passwords across their Twitter and their Linode and their DNS and their Google Apps, people do that in the real world, obviously, then let them know that they need to go change those passwords there too. And stop sharing passwords yourself. Use something like 1Password or LastPass so that you have strong, unique passwords for all of your services. It's stupid not to, and as a company you can sometimes be criminally negligent if you fail to follow proper procedures and do your due diligence. So be careful.
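To put that 500-billion-guesses-a-second figure in perspective, here's the back-of-the-envelope arithmetic. The guess rate is the one quoted above; the alphabet sizes and password lengths are my own illustrative assumptions.

```python
# Worst-case time to exhaust a password space at the GPU rate quoted above.
GUESSES_PER_SECOND = 500e9          # 500 billion hash guesses per second

def worst_case_seconds(alphabet_size, length):
    return alphabet_size ** length / GUESSES_PER_SECOND

print(worst_case_seconds(26, 8))    # 8 lowercase letters: about 0.4 seconds
print(worst_case_seconds(95, 8))    # 8 printable ASCII characters: a few hours
print(worst_case_seconds(95, 12))   # 12 printable ASCII characters: tens of thousands of years
```

A unique salt defeats precomputed rainbow tables, but it does nothing to slow down a targeted brute force of a single leaked hash, which is the point being made here.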
Don't assume that the SHA hashing function you use is secure. Assume that if your attacker got your database, anything in there is potentially compromised. Reset your users' passwords. Make sure that if you have any billing IDs or any keys to any services, you immediately take all of those down and regenerate them. Don't for a second assume that you know the extent of the compromise, or the number of systems that were compromised. You don't know the extent. There could be rootkits on every single one of your drives. All of your services, all of your passwords could be compromised. Just assume everything is compromised and start over. It's the only way to really be sure that you're protecting your users' data, and your own data as well.

So after you've regained control of your systems and shut them down, after you've disclosed to your users the nature of the exploit and what damage may have been done to them, you have to start recovering. The worst possible thing you can do right now is turn those just-hacked systems back on. If you don't know why that is, you don't understand the nature of the hack or the nature of security: if you haven't found every possible rootkit or other trojan that was left on there, you really don't know anything. So the only way you can recover properly is to start from scratch. Rebuild your servers. If you're running in a cloud, just fire up new servers; it's trivial. If you're running on real hardware, reinstall the operating system. Don't try to recover anything from the systems that were affected. It's just going to end in yet another disaster, and you only really get one shot at this.

We had a great talk earlier about configuration management for systems. The only way we were able to rebuild about 1,200 servers over the course of two days was by using configuration management. So we isolated the attack vectors. We figured out how they got into our system. We realized, in hindsight, that they should never have gotten in that way, because it was an obvious and trivial hack. But now we have to move forward, and we have to rebuild a more secure system.

The weird thing about this is that it's not as if we collectively, as an organization, didn't know anything about security. It's just that we had really bad priorities. So when we rebuilt our infrastructure, we knew how to properly secure our SSH by using a bastion host and not allowing SSH directly into boxes within our infrastructure. We knew how to do that. We just didn't do it, because we didn't really have time, because we were supposed to be launching, and it got in the way of our launch schedule, so why do it now? We'll just do it later. We just had really, really stupid, shitty priorities.
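On the bastion-host point, here is a minimal sketch of what "no SSH directly into boxes" looks like from the operator's side. The host names, user names, and the choice of the paramiko library are my assumptions for illustration; the premise is that internal machines only accept SSH connections originating from the bastion, so every session hops through it.

```python
import os
import paramiko

# Plain OpenSSH equivalent:  ssh -J ops@bastion.example.com deploy@10.0.1.12
# Internal hosts firewall off port 22 from everywhere except the bastion.
proxy = paramiko.ProxyCommand("ssh -W 10.0.1.12:22 ops@bastion.example.com")

client = paramiko.SSHClient()
client.load_system_host_keys()        # verify host keys rather than blindly trusting
client.connect(
    "10.0.1.12",                      # internal app server, unreachable directly
    username="deploy",
    key_filename=os.path.expanduser("~/.ssh/id_ed25519"),
    sock=proxy,                       # tunnel the session through the bastion
)
_, stdout, _ = client.exec_command("uptime")
print(stdout.read().decode())
client.close()
```

The win is containment again: one hardened, heavily audited box is the only way in, instead of every server carrying the keys to every other server.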
But what this ended up being was an opportunity for us to rebuild our system from scratch, with security as our first consideration, and to do it properly. And if you don't know how to do it properly, find someone who does. Find someone who's really smart and pay them money so they'll tell you what to do. Sometimes you just have to acknowledge that you don't have the competency you need to deal with a situation. I think it's a major failing of very smart people to assume that competency in one area means, or implies, that they automatically have competency in some other area that they really don't. If you're not a security person, hire some security people to help you fix the problems and to create a security procedure for you, so you can be more and more secure as you move forward. You have to start the work of improving the security of the systems that were compromised: fix all of the holes that you know about, then find as many more as you can and fix those too.

Finally, once all that's done, audit the system one more time and start reconnecting. Reconnect your servers, bring your services back online. If any users were affected, always bring your users back online first, because they don't really care that the company blog is down if their server is still down. They really don't. Get your users back online first, then worry about your own infrastructure. And finally, the only way to save some face and regain some measure of trust from your users is to begin with an honest admission of guilt.

I actually think that our CEO, Lucas, posted a very strong post on our blog detailing exactly what happened and the steps we were taking to work towards improving our security, and you can't skip this step. If you look around, there have been a lot of security breaches recently. RSA, which provides secure login for Bank of America and a thousand other banks, was just breached. PHP itself has had a recent security breach. There have been a lot of security breaches, and historically, the only way companies that have security breaches regain trust is by being honest and open and transparent about what happened: detailing the exact nature of the exploits, detailing the exact nature of the fixes you're putting in place, the ways you're hardening the system, the new policies and procedures that you hope will prevent this sort of thing in the future, and addressing the root cause. And the root cause here isn't a 16-year-old kid in Australia. The root cause here is our terrible priorities, our poorly thought out trade-offs between security and other considerations, other business goals. So the root cause isn't the hacker. It's not actually the hacker's fault that we got hacked; it's our fault that we got hacked. So the first thing you have to do is acknowledge that and be honest. And as a company founded on open source principles, built on open source technology, we don't believe in security through obscurity. We think we can tell you how we're secure and still be secure, so that's what we do.

So I would like to close with a few quotes that I think sum up my feelings about security. They're both by Bruce Schneier, who is a very well-known security expert. He's written a couple of great books on cryptographic security that are worth reading if you care at all about the security of systems and infrastructure in general, and about the thought processes behind making things secure, which is a very strange modality for someone who's used to being an engineer and a developer: you have to think like a criminal, you have to think like a hacker, and that's just not a normal way of thinking for most developers. So I would highly recommend reading those books. I wish I had time to link to them here, but I'll put some links to them on Amazon up on Twitter.

The first quote is that security is not a product, it's a process. That means you can't hire someone to come in, take your money, and give you security. It doesn't exist. What you can do is be consistently improving the factors in your system, towards more security and away from vulnerabilities. It's a constant process; you're never going to stop and say, okay, I'm done now, everything is secure. The second quote, and this is the one that really bites most people, is that security is a chain, and it's only as secure as the weakest link. Most hacks are not one large gaping hole in the system. They're a series of smaller vulnerabilities that, if you exploit them correctly, can lead to a much larger exploit.
So, for instance, there was an exploit on the BlackBerry where you could read the user's supposedly "encrypted" data. The problem was not that BlackBerry didn't encrypt it, or didn't use an industry-standard encryption algorithm. The problem was that they used a two-layer scheme and skimped on the first layer: they did only one pass of an iterated algorithm, because they thought the second layer would make up for it. It didn't, and they got hacked. And as I understand it, BlackBerry's one iteration compares to iOS 3's 3,000 and iOS 4's 10,000 iterations. So don't assume that just because you're using something that has a certificate saying it's secure, you're using it properly. And don't assume that because you put a lot of money and effort into securing most of the system, the whole thing is secure, because any small breach can undermine the security of the entire system.

So I hope you guys have some questions for me. I'd also love to talk afterwards, at the end of the day, in more detail about the nature of the exploit, what we're doing in terms of infrastructure, how we're handling it, and the new precautions we're putting in place. I hope I scared the shit out of you, and if anyone here learns from my experience and avoids this kind of personal catastrophe for themselves, then I think I've done a good job. So thank you very much. I'm happy to take questions; I've got a few minutes.

Can we start with a question up front? I don't really have a question, but I just wanted to point out that, for instance, the default hashing algorithm is SHA, but you can also switch in bcrypt with just a one-line change. Right, and bcrypt provides far better brute-force protection, because it basically increases exponentially the compute time required to check a password, based on the work factor. Whereas SHA-1 hashing doesn't take a lot of time, it's not designed to take a lot of time or to be CPU-intensive, so you can hash a password in a millisecond to five milliseconds, with bcrypt you can force it to take as long as you want. You can force it to take three seconds, meaning that a brute-force approach to recovering a bcrypt-hashed password could take literally until the heat death of the universe to finish. So I highly recommend: if you're using SHA, switch to bcrypt. Absolutely.
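The questioner is describing a one-line swap in their web framework's authentication setup; to make the work-factor idea concrete in a self-contained way, here is the same idea in Python with the bcrypt package (the library choice is mine for illustration, not what was discussed).

```python
import bcrypt

password = b"correct horse battery staple"

# gensalt's rounds parameter is the work factor: each increment doubles the
# cost of a single guess, so verification can be tuned to take tens or
# hundreds of milliseconds instead of the microseconds a bare SHA digest takes.
hashed = bcrypt.hashpw(password, bcrypt.gensalt(rounds=12))

# The salt and work factor are stored inside the hash itself, so checking a
# login attempt just re-derives the hash and compares.
assert bcrypt.checkpw(password, hashed)
assert not bcrypt.checkpw(b"wrong guess", hashed)
```

Raising the work factor as hardware gets faster is what keeps the brute-force math from ever looking like the SHA numbers above.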
Any other questions? In the back here. I read your blog on the web, and I was really impressed by your interactions with the hackers; it sounded like you had a pretty amicable conversation with them. How common is that, and do you have any suggestions? When you are talking to a hacker who has the keys to your kingdom, the last thing you want to do is make them mad or try to get back at them. So I don't know how common it is. Our only real consideration at that point was: how can we make him not do things we don't want him to do? Honestly, what he did was criminal, but it could have been a lot worse. He went in, he stole some stuff, and he posted it. He could have removed all of our files, all of our data; he could have done a lot of worse things that he didn't do. So we were lucky in that regard, and there's no point in antagonizing someone, because we don't have any power in that negotiating relationship. We just want you to please be nice to us and stop hurting us.

Yes? As far as the users' data and applications go, did you have to nuke those as well? No, to the best of our knowledge, our users' data was not compromised. It's a good thing we compartmentalized at least that much. Could the hacker have gained access to that, or was there some security measure against that from the beginning? The really stupid thing we did, sharing key access across machines within our infrastructure, we didn't do in that case. There was basically no way for the exploit to contaminate those systems; there was no path from our systems to their systems. The only paths were through an application RPC protocol that they didn't have access to, and through SSH keys on the machines themselves to git repositories that they didn't have access to.

How did the attack change things around your office? Well, it means that we have a security policy now. And I think if you don't have one, you're as stupid as we were. You should start now with a security policy that includes obvious things like not sharing passwords, like using secure, strong, randomly generated passwords. Start now with a security policy that includes the things you'd have to be stupid to get wrong anyway, because some people will be stupid and get them wrong, and then build from there, and then continuously maintain it.

Yeah, go ahead. Just a quick follow-up, and this relates to the question before: how many of those things in the past did you put off even though you knew it was the wrong way to do it, where maybe laziness contributed? We can all look and see where we'd place ourselves. That's why I'm asking whether you've seen the same thing in previous jobs. Sorry, let me repeat that: the question was, looking back at previous job experience, have I seen any pattern of laziness, of knowing that you're doing a stupid thing, or that there's something you need to do security-wise, and then just putting it off and being lazy about implementing it. I think many organizations are not as secure as they think they are, and many organizations, like ours, are lazy about security that they don't think actually presents a real attack vector. We didn't think that our shared passwords were an attack vector, and now we know how wrong we were. I can see, in previous work experience, other times where we've used shared passwords, and luckily in those situations it didn't come back to bite us, or hasn't yet; it may eventually. That doesn't mean it's right. People should be doing the correct thing, and I think security is really the area where, if you know you should do a thing and you don't do it, you've made a grave and often fatal mistake. It's one of the few places in engineering, in what we do, where you really just can't cut corners. You can't wait until tomorrow. This entire hack happened because things that I was going to deploy on Friday didn't get deployed. So don't wait. If you know about security vulnerabilities, they are now your top priority, and if you can't make them your top priority, figuring out why that is is now your top priority. Jim, do you have a question?
I was just going to point out that you made some really good points about your servers and securing them. We go to conferences a lot, and it's not just your servers; it's your laptops and everywhere you go. Right, you have to think about containment. That's the first key when it comes to security: if and when you do get breached, containing the results of that breach as much as possible. For instance, if we hadn't had shared private keys to the rest of our infrastructure, if we hadn't had a Twitter password lying around, things would have been much different. We would have had a rooted server without access to our blog, without access to our Twitter, without access to our database, and the ramifications would have been much smaller. So compartmentalization, for instance using different passwords on different sites so that if one gets cracked your entire network isn't also exploited; compartmentalization is the first key word when it comes to security. You have to expect that you may get exploited, and when you do, you want to minimize the damage.

How has it affected the business so far, in terms of customers? Well, it's interesting. I'm sorry, the question was how this has affected the business. Our community has actually been surprisingly supportive while all of this was going on. There are obviously a lot of people who have trust issues now with our services; that's expected, I would too in their situation. So now we just work, slowly, through transparency, to build that trust back. But I think we're actually very fortunate that many of our users have been very supportive and understanding of what we're going through, because I think a lot of them are also developers, and they kind of understand what it's like to be in a situation like this. So we're very lucky, I think. And is that everything?