Welcome to Building Effective Attack Detection in the Cloud. Presenting today, you've got Nick Jones and Alfie Champion. I'm Nick; I run our global cloud security team at F-Secure, and when I'm not working on people's cloud security I do attack detection, which is why I'm here. Alfie's sort of the reverse. Alfie? Yeah, absolutely. I lead the global attack detection service for F-Secure Consulting and, much like Nick, in my spare time I have the flip focus: building some really wonderful things in cloud. So, first up, we're going to be talking about the differences between on-premise and cloud attack detection; there are quite a few key differences there to focus in on. Secondly, what's an attacker likely to do? What are they going to try in your environments, and what should you be looking for? And lastly, one of the things we've learned over our last year or two working on this is that there's an awful lot we can learn from DevOps: tips and tricks that really help level up your cloud attack detection capability long term. Absolutely. So the first thing to consider, when it comes to on-prem versus cloud: is this already a solved problem? Can we take the learnings from on-premise detection, which we've been doing for a while now, and apply them to cloud?
Well, when we take a look, in some ways yes; there are some very obvious similarities. Consider things like the automation of attacks and their scalability: we see this never-ending cycle of exploits being released, then proof-of-concepts, then widespread scanning and exploitation taking place. That's still a thing, and there are cloud variations of it, so you see exposure of credentials in things like public S3 buckets, or credentials inadvertently pushed to public code repositories like GitHub. Then there's a newer flavour of attack, what MITRE defines as resource hijacking: the crypto-mining type stuff. There's a similarity with on-prem when it comes to ransomware too; there's a lot of that going on. But a lot of this is, to some degree, covered by the managed services, the IDS-style offerings you see from cloud providers, the likes of GuardDuty, which alert on things like scanning activity or beaconing out to known-bad IPs. What we're looking at, and certainly where we've spent most of our time, pushes up the pyramid from that opportunistic exploitation (abusing the instance metadata service, for instance, which is quite well known when it comes to cloud instance exploitation) and up into the targeted attack: rather than indiscriminate, opportunistic activity, what would it look like for a sophisticated attacker to target you specifically, and the assets you hold in cloud? That's where we've been focusing.
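To make that opportunistic instance-metadata abuse concrete: an attacker with SSRF or code execution on an EC2 instance can ask the metadata service for the instance role's temporary credentials. A minimal sketch; the IMDSv1 paths and credential-document field names are the real AWS ones, while the helper functions and the fake sample document are ours. In a real attack the document would be fetched over HTTP from the metadata IP.

```python
import json

# IMDSv1 base path for instance role credentials (real AWS endpoint).
IMDS_BASE = "http://169.254.169.254/latest/meta-data/iam/security-credentials/"

def imds_credential_urls(role_name=None):
    """Return the URLs an attacker requests: the role listing first,
    then the credentials document for a specific role."""
    listing = IMDS_BASE
    creds = IMDS_BASE + role_name if role_name else None
    return listing, creds

def parse_imds_credentials(document):
    """Extract the key material from an IMDS credentials JSON document."""
    data = json.loads(document)
    return {
        "access_key_id": data["AccessKeyId"],
        "secret_access_key": data["SecretAccessKey"],
        "session_token": data["Token"],
        "expiration": data["Expiration"],
    }

# Fake, truncated credentials document with the shape IMDS returns;
# in a real attack this body would come from an HTTP GET to the URLs above.
sample = json.dumps({
    "AccessKeyId": "ASIAEXAMPLE",
    "SecretAccessKey": "example-secret",
    "Token": "example-token",
    "Expiration": "2020-01-01T00:00:00Z",
})
creds = parse_imds_credentials(sample)
```

Credentials stolen this way and then used from outside the instance are one of the patterns the managed detections such as GuardDuty aim to catch, which is part of why this technique sits at the opportunistic end of the pyramid.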
When we look at this from a detection standpoint, we start with the telemetry: the raw ingredients for detection, right? On-prem, there are three obvious sources. The first is endpoint telemetry: endpoint detection and response (EDR) agents are hugely valuable there, for process creations, registry changes, and so on. We've also got network telemetry, which hosts talk to each other; things like port scanning or domain enumeration would generate some noise there. And we've got application telemetry as well: the way people interact with our internet-facing assets. What telemetry does that provide, and is there any way it can give us insight into malicious activity? If we now look at cloud, all of that is still relevant, but we have a layer above it, the control plane telemetry, that almost encompasses everything we've just spoken about. Consider a classic application tech stack moved into cloud: you may still have the endpoint telemetry for the operating systems hosting everything, you may have the network telemetry of how the various front-end and back-end services interact, and you've got the application telemetry of the thing you've written and are now hosting. But now there's this surrounding layer, the control plane, where we interact with the cloud API. All of the major providers expose an API that you can use to do any number of things within the account, from spinning up virtual machines to provisioning accounts with new permissions. So there's a whole load of new telemetry at our disposal as defenders, and those three sources within the square are of varying importance depending on what your stack actually looks like and what you're hosting in cloud. If you're not using any endpoints any more (no EC2, no virtual machines or equivalent), then endpoint telemetry doesn't really come into play. That's one of the key learnings for us: knowing where
telemetry sources are applicable, and where maybe they aren't any more. Another major finding, or experience point I guess, is around context. When you're dealing with an environment that is purpose-built, the actions that take place in it are very much based on the purpose of everything in there, every asset in there. Consider a given IAM user in AWS, for instance. If it upgrades its privileges, so for some reason it is now an admin, able to perform high-level actions and cause major impact, then how that user obtained those privileges could be of huge significance. If the change was made by a CI/CD (continuous integration, continuous delivery) user, it could be completely benign, expected behaviour, in which case we're okay with it. But the very same change made by an admin user with no 2FA, potentially from an anomalous GeoIP-enriched location: you could say that's completely unexpected, in which case it's something we should raise and pursue as an investigation. So context is key. With all of that considered and acknowledged, what we're facing now is far more complicated in terms of the interconnectivity between services. You start dealing with tons of third-party services, where your crown jewels, as it were, are no longer in a single place in a data centre that you can point to; they're hosted across a plethora of other services. That could be cloud providers in the traditional AWS, Azure, GCP sense, but it could also be Office 365, or Slack for communications, and so on. The trust boundary between those is a major point for us to investigate, and visibility into those various third-party services, and the log sources they provide, is going to be hugely important as we go forward.
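That "same action, different context" idea can be encoded directly in a triage rule. A minimal sketch, assuming CloudTrail-style event dictionaries; the CI/CD principal ARN is a hypothetical example, and a real allow-list would come from knowledge of your own environment.

```python
# Known automation identities whose privilege changes are expected.
# This ARN is a hypothetical example, not a real account.
CICD_PRINCIPALS = {
    "arn:aws:iam::111122223333:user/cicd-deployer",
}

# CloudTrail event names commonly associated with privilege escalation.
PRIV_ESC_EVENTS = {"AttachUserPolicy", "PutUserPolicy", "AddUserToGroup"}

def triage_privilege_change(event):
    """Score a CloudTrail-style event: the same API call is benign from
    CI/CD but high-severity from an interactive admin without MFA."""
    if event.get("eventName") not in PRIV_ESC_EVENTS:
        return "ignore"
    identity = event.get("userIdentity", {})
    if identity.get("arn", "") in CICD_PRINCIPALS:
        return "benign"  # expected pipeline behaviour
    mfa = (identity.get("sessionContext", {})
                   .get("attributes", {})
                   .get("mfaAuthenticated") == "true")
    return "investigate" if mfa else "high-severity"

# An admin attaching a policy with no MFA: the suspicious variant.
event = {
    "eventName": "AttachUserPolicy",
    "userIdentity": {
        "arn": "arn:aws:iam::111122223333:user/admin-bob",
        "sessionContext": {"attributes": {"mfaAuthenticated": "false"}},
    },
}
verdict = triage_privilege_change(event)
```

The point isn't the specific fields; it's that the detection logic consumes the environment's context (allow-listed principals, MFA expectations) rather than treating the API call in isolation.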
Nick, I know you've got a point on this one? Yes. We had some recent experience with one client, for example, who had a primary cloud provider hosting all their applications, but their source code was in GitHub, they were running Jenkins for their CI/CD, and they had G Suite for their mail, their documents, and so on. So we had quite a variety of infrastructure-as-a-service, platform-as-a-service, and software-as-a-service packages in play that we were ingesting logs from, and that we were able to use to track attacker activity platform to platform. That worked out quite well, but it did take a bit of effort to get it all together. Yeah, and I think one of the key things we've learned is around how you design your cloud detection stack. First off, centralising everything is pretty important. I wouldn't necessarily say you need to feed all the data into one central location, but you need to make sure it's easy for your analysts to look at one incident in one place and then pivot from that data point into other data sources, in other applications, cloud providers, whatever makes sense. One of the things we've noticed is that the harder it is for analysts to jump between data sources from different systems and environments, the less likely they are to actively track down and investigate things to the degree needed, or it takes so long that they don't have the bandwidth to handle everything coming in. Getting that right, and supporting the analysts in developing their use cases and having that sort of access, is very important. First up, though: data sources. Before we talk about where we put this data, there are a couple of key data sources to dive in on. First off, you've got
the control plane audit logs, and secondly the service-specific logs. By control plane audit logs, what I mean is CloudTrail in AWS terms, the Activity Log in Azure, and audit logs in Kubernetes. What these give you is visibility of all administrative actions taken within an environment: any API call that's made gets logged there. So we can track creation, modification, and deletion of resources, we can track access in some cases, and we get essentially all the visibility we could want of everything happening at the control plane layer, out of that one data source. That really is critical; if you only get one data source turned on in your environment, that's the one to pick. Then the service-specific logs: essentially, these are logs generated by your S3 buckets, by your Lambda functions, by your KMS keys; cases where the cloud-native services, the PaaS offerings you're using, generate their own logs. Those tend to be very high fidelity if you analyse them right, but they also tend to generate a lot of data, so typically we find clients benefit from turning them on on a case-by-case basis. You might have an S3 bucket full of really important information; having access logs for that is probably worthwhile. Equally, if you've got an S3 bucket set up just to serve static content for a website, you probably don't need its logs so much. So it's about enabling these case by case, working out what you need them for, rather than blanket drowning your analysts in data.
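To ground what that control plane audit log actually gives you: each CloudTrail record is a JSON document describing one API call. A minimal sketch of reducing a record to the fields detections usually key on; the field names are the real CloudTrail ones, but the sample record itself is fabricated.

```python
import json

def summarise_cloudtrail_record(record):
    """Reduce a CloudTrail record to the who/what/where/when that most
    detection rules key on."""
    return {
        "when": record.get("eventTime"),
        "who": record.get("userIdentity", {}).get("arn"),
        "what": f'{record.get("eventSource")}:{record.get("eventName")}',
        "where_from": record.get("sourceIPAddress"),
        "error": record.get("errorCode"),  # only present on failed calls
    }

# Fabricated example record using CloudTrail's real field names.
raw = json.dumps({
    "eventTime": "2020-06-01T12:00:00Z",
    "eventSource": "iam.amazonaws.com",
    "eventName": "CreateUser",
    "sourceIPAddress": "203.0.113.10",
    "userIdentity": {"arn": "arn:aws:iam::111122223333:user/alice"},
})
summary = summarise_cloudtrail_record(json.loads(raw))
```

Every create, modify, and delete in the account arrives in this shape, which is why this single source covers so much ground.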
Yeah, absolutely. So once we're well equipped, we know what we should be logging and we know what the data sources are, the question becomes: what are we actually looking for? That comes down to the threat intelligence problem in cloud. If we look left and right here, we've got the MITRE ATT&CK matrix: on the left the on-premise version, and on the right the cloud equivalent. It's obvious that the on-premise version is far more populated with techniques across the kill chain than the cloud one, and there are obvious reasons for that. The first, of course, is that the on-prem version has been around for much longer. But the way this matrix ultimately gets populated is through threat intelligence, through reports of malicious activity that's been found, and that's what makes its way in; that's how we know what to look for. So it's a bit of a chicken-and-egg: we need to know what to look for, so we rely on ATT&CK for that visibility, but ATT&CK itself relies on us providing those reports to fulfil its potential. So, what is an attacker likely to do? For us, the most obvious place to find that information is the environments we exploit as part of our consultancy work, and I think it's fair to say the attacks can be distilled into one of four categories. The first is identity management, or mismanagement, and our ability to exploit those misconfigurations that let us elevate privileges and ultimately take control of cloud environments. The second is pivoting from other environments: that could be starting on-premise, or from some other internet-facing asset, obtaining some level of access and escalating to the point where we can put ourselves in the relevant groups, potentially via some kind of single sign-on, to ultimately arrive at the cloud environment with administrative privilege. The third is source code management and continuous delivery, which we'll cover in a second. And the last is application
vulnerabilities. So let's take a look at that source code management one now. When we talk about source code management and continuous delivery here, what we're really talking about is an attacker targeting either the code repositories where your application code or infrastructure-as-code is stored, or the delivery pipelines that take what's in those repositories and either build the relevant cloud resources or deploy the right application containers in the right places. In many respects this is tier zero for your cloud security. Everyone always thinks about hardening what's in the cloud, and we've seen quite a few cases where people don't apply the same diligence to the supporting systems, especially from a detection standpoint. Being able to track who's doing what to your pipelines and your source code is pretty important now that these are core components of the security of your entire platform: if an attacker compromises either the pipelines or the source code they deliver, they can take control of basically everything; they can deploy pretty much anything they want into the cloud from there. Especially since we often find that the roles those pipelines run as have very privileged access, often more so than they need. So that's a key thing to factor in; it's one we exploit regularly ourselves on consultancy engagements, and I think it's a matter of time before we start hearing about real attackers doing it, if they're not already. So, we've covered the telemetry sources, what we're going to use for our detections, and we've looked at some of the attacks we've seen across those four key areas. How do we start, then? How do we action this and build effective attack detection? I think the methodology we've employed over the past year is
summed up with this model here. The first thing we do is threat model the environment: understand what could be targeted, and what the attack paths would hypothetically look like, going from initial compromise right through to achieving some objective. What that objective is, again, is defined by the specifics of your environment: if you're hosting sensitive data, that's likely to be the end goal of those attack paths. Once we've defined what those attack paths look like, we prioritise. Which of these have the highest impact? Which of them gets the attacker from A to B in the most expedient way? If there's an obvious path, an obvious exposure, that's the one we need to pay most attention to. Then we need to understand the atomic attacker actions that comprise those end-to-end attacks, the TTPs (tactics, techniques, and procedures): what is an attacker going to do, step by step, to achieve the objective? The next step is verifying that we, as defenders, have the telemetry we need to spot those things. If we have a given attack against a given service, we can look at it and ask: do we have the service-specific logs, do we have the CloudTrail API events that relate to it? In which case we're probably in a good place to start step five, which is actually executing those attacker actions and understanding, end to end, whether our detection cases work; or even, as a step zero, understanding what they might look like, so we can say "these are the specific API events to look for", start building detections, and fine-tune them so they fire should we replay those attacks at a later date. I think one of the key messages we've learned, working with a combination of the operational teams in some of these environments but
also from the detection perspective, is that one of the most powerful things you can do as a detection team in general, not just in the cloud but especially in the cloud space, is move towards detection-as-code: define your detections in a machine-readable format, something you can easily version control and update over time. It provides an easy means to share knowledge within the team. If it's a common format everyone can read, then rather than just talking about a particular attacker activity at a high level and explaining how it works to your more junior analysts, they can go in, read the detections, and really understand exactly what a particular technique does under the hood. We've seen a few key projects in the open-source space around this already, not least Sigma, a SIEM-agnostic rule format that you can compile down into Splunk queries, Azure Sentinel queries, or whatever you're running; and we've also seen some interesting efforts with Jupyter notebooks, building out playbooks you can run to hunt for specific attacker activity. So we took all of this and, over the course of the last year or so, we've been working on moving a lot of it into the cloud, at which point we ended up with Leonidas, an open-source tool we've released. Essentially, the idea with Leonidas is that you have your security team, your analysts, the purple team or red teamers executing these test cases. They define a new TTP, which is committed into a repository, and we then have a CI/CD pipeline that builds out a serverless function based on those definitions. We expose that serverless function to the purple team, and they can use it to execute attacks: you make a single web request to execute a particular attacker TTP, it's executed against whatever target resources you've got set up, and we feed the resulting logs straight into the SIEM.
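The detection-as-code pattern can be sketched in miniature: a rule held as data plus a tiny evaluator. This is a simplified, Sigma-like structure for illustration only; real Sigma rules are YAML with a richer schema and condition language.

```python
# A simplified, Sigma-like detection held as data. Real Sigma rules are
# YAML with a richer schema; this structure is illustrative only.
RULE = {
    "title": "IAM policy attached to user",
    "logsource": {"service": "cloudtrail"},
    "detection": {
        "selection": {
            "eventSource": "iam.amazonaws.com",
            "eventName": "AttachUserPolicy",
        },
    },
    "level": "high",
}

def matches(rule, event):
    """Return True if every field in the rule's selection equals the
    corresponding field in the event (AND semantics, like a basic
    Sigma selection)."""
    selection = rule["detection"]["selection"]
    return all(event.get(field) == value for field, value in selection.items())

events = [
    {"eventSource": "iam.amazonaws.com", "eventName": "AttachUserPolicy"},
    {"eventSource": "s3.amazonaws.com", "eventName": "GetObject"},
]
hits = [e for e in events if matches(RULE, e)]
```

Because the rule is data, it can be version-controlled, reviewed in pull requests, and read by a junior analyst to see exactly what the detection keys on.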
You can then have your analysts hunt for the activity we just executed via that exposed API. I think one of the most critical things is that it's a fairly easy format to work with. The definitions are quite short and sweet, and the underlying framework takes care of basically everything about how the API itself functions, including how different identities within the cloud are handled: for AWS you can hand it roles to assume, or access key and secret access key pairs, and all the analyst has to do is write a single line of Python saying, essentially, "call this boto3 function" for AWS, or the equivalent Azure API call for Azure. We then generate that into an API that runs inside the serverless function. These definitions also contain detection cases; we write the Sigma rules into the same place as the attack definition, so you've got a single file defining exactly how to execute the attack and exactly how to detect it, living as a single source of truth for that attacker TTP. We can take that definition and compile it down to work with the SIEM platform we're using, and we can also generate documentation from it. One of the really powerful things we've found with this is that it's an opportunity to embed human context, notes about your organisation's specific environments, to say things like: "this is usually malicious, but these two particular projects' AWS accounts do it all the time for business reasons X, Y, and Z, so factor that in when triaging these events." That's proven pretty useful too. So let's demo it. This is Leonidas: the web API we expose that allows you to execute the test cases I've just been talking about. It's built by an AWS-native CI/CD pipeline; for Azure and GCP we're building it out using their native tooling as well.
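As an illustration of that single-file idea, an attack-plus-detection definition might look something like the following. This is a hypothetical sketch of the shape only, not the exact Leonidas schema; check the repository for the real format. The one real API call in it, `attach_user_policy(UserName=..., PolicyArn=...)`, is genuine boto3.

```yaml
# Hypothetical sketch of a single-file attack + detection definition.
# Field names are illustrative, not the exact Leonidas schema.
name: Add policy to IAM user
description: Attach a managed policy to a user as a persistence step
input_arguments:
  user:
    description: Username to target
  policy_arn:
    description: ARN of the managed policy to attach
executor:
  clients: [iam]
  code: |
    clients["iam"].attach_user_policy(UserName=user, PolicyArn=policy_arn)
detection:
  status: experimental
  level: high
  sources:
    - name: cloudtrail
      attributes:
        eventName: AttachUserPolicy
        eventSource: iam.amazonaws.com
```

The value of the shape is that the attack executor and its detection criteria can never drift apart: one file, one review, one source of truth per TTP.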
Right now this is hosted in a Lambda function with API Gateway in front of it, and you can see here we've got a range of different test cases across the MITRE kill chain. So let's dig into one of them as an example. Perhaps we have some IAM access and we have a user that we've created as a backdoor, so let's add a policy to that user. You can see here that not only can we pass in the user and the policy we're targeting, we can also pass in a variety of identity data to allow Leonidas to execute the test case as something other than itself. It comes with an IAM role, and the permissions for that role are automatically generated by the build scripts, but it's beneficial to be able to trigger test cases as a range of different identities: in the case of AWS, we can also pass in the ARN of a role we wish to assume, or an access key and secret access key pair. We can also specify the region we're targeting. Now, this interface is quite nice for exploring the API, but for security reasons, in order to trigger any of these test cases you need to supply an API key parameter alongside the request to get it executing properly. So you can either use Leo, a command-line tool that encapsulates these APIs (you run it locally and it speaks to the Leonidas API), or, as we've done with quite a lot of success, work with Jupyter. Jupyter, for those who aren't familiar, is essentially a web-based interface to a Python interpreter that allows you to embed code in and amongst document content and generate graphs, tables, and all these kinds of things. It's a pretty popular data science tool, but we've also found it very effective in the detection and purple-teaming space. So here we've got this Leonidas API endpoint, the same one we were looking at previously, and we're going to load in the set of test cases from this case config.
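The calls the notebook makes are plain HTTP. A small client-side helper might look like this; the endpoint path and parameter names are illustrative assumptions rather than the exact Leonidas API contract, though `x-api-key` is the conventional API Gateway key header.

```python
from urllib.parse import urlencode

def build_test_case_request(base_url, case_name, api_key, **params):
    """Build the URL and headers for triggering a single test case.
    The path layout and parameter names are hypothetical, for
    illustration of the single-request-per-TTP idea."""
    url = f"{base_url}/{case_name}"
    query = urlencode(params)
    if query:
        url = f"{url}?{query}"
    headers = {"x-api-key": api_key}  # API Gateway's conventional key header
    return url, headers

# Example: the "add policy to user" case from the demo.
url, headers = build_test_case_request(
    "https://example.execute-api.eu-west-1.amazonaws.com/dev",
    "add_policy_to_user",
    api_key="REDACTED",
    user="backdoor-user",
    policy_arn="arn:aws:iam::aws:policy/AdministratorAccess",
)
```

A wrapper like this is all the notebook needs; the complexity of identities, permissions, and execution lives server-side in the framework.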
The Leonidas framework will also generate this YAML file for you, defining all the test cases, and you can see here we've got them all loaded into this Jupyter notebook. For instance, the "add policy to a user" case we were looking at before is here at number 35, and we'll make use of that in a bit. To start actually triggering some test cases, you can see here we're calling run_cases, a simple wrapper function around that HTTP API, and we're going to call GetCallerIdentity, which you run to tell you who you are, the identity of the current user or entity. Here you can see Leonidas has assumed a role, this leonidas-dev-app role: that's the default IAM role the Lambda function has assigned to it when it executes, allowing it to interact with the AWS APIs themselves. So let's assume now that we are an attacker. We have a vulnerability in this Lambda function that has allowed us to gain some sort of code execution or command injection, and we're now executing attacker actions against the underlying AWS estate. First off, let's enumerate the GuardDuty detectors that might be listening to us. In this case we can see GuardDuty is running, we've got a detector ID here, so if we try anything too obvious, GuardDuty might spot us. Let's also take a look at whether they've got any CloudTrail trails in the current region. Here we can see there are actually two trails listening to us, so if the defenders are paying attention we'll probably get spotted, but we're going to carry on anyway. What we're going to try now is adding an IAM user. The Leonidas function's IAM role comes with a set of permissions, defined in the test cases, that allows it to always execute the particular test cases it's built with, so we know this is going to succeed; obviously, if you've got permissions boundaries or other controls configured, it might not, but for the sake of this
exercise, we're going to create a new user. So we've got this new user created, with the name we passed in via that API call, and we're now going to add an access key to that user, to make sure we can interact with AWS as them. For good measure, we're also going to attach the AdministratorAccess managed policy: obviously pretty bad practice, but it makes the rest of the demo work quite nicely. With that done, we can start running other test cases using the access keys we just generated, which are sitting in our Jupyter notebook. I'm going to re-run that GetCallerIdentity test case, but you can see we're now passing in the set of credentials we've defined here. If we run it, you'll see that when we come back from the caller identity call, we're now running as a user account; in fact, the user account we just created. We've changed the entity we're executing these test cases as, and this allows us to simulate a variety of different attacker starting points and different types of breaches. Then, from here, we can list the secrets in Secrets Manager; we're now an attacker poking around, trying to find secret data or steal further credentials to get deeper into the environment, and it turns out there's a secret called leonidas-created-secret. So we grab its contents with another Leonidas test case, and you can see some secret data coming back from the AWS APIs. What we've done, then, is execute an entire kill chain: we've simulated an application vulnerability inside the Lambda function as a starting point; we've done some enumeration to see who we are and what defences are likely to be watching for us; we've created a persistence mechanism by creating an IAM user, adding an access key to it, and attaching IAM permissions to that user; and then we've used that new user to enumerate the contents of Secrets Manager and dump some secrets out.
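The persistence step of that kill chain maps onto three IAM API calls, which are also the three CloudTrail events a defender should expect to see. A minimal sketch; `create_user`, `create_access_key`, and `attach_user_policy` are the real boto3 IAM method names, but the function takes any client exposing them, so it's exercised here against an in-memory stub rather than a live account.

```python
ADMIN_POLICY = "arn:aws:iam::aws:policy/AdministratorAccess"

def backdoor_user(iam, username):
    """Create a user, mint it an access key, and attach admin rights.
    Returns the CloudTrail event names a defender should expect."""
    iam.create_user(UserName=username)
    iam.create_access_key(UserName=username)
    iam.attach_user_policy(UserName=username, PolicyArn=ADMIN_POLICY)
    return ["CreateUser", "CreateAccessKey", "AttachUserPolicy"]

class _StubIAM:
    """In-memory stand-in for boto3.client("iam"), recording calls.
    In an authorised test you would pass the real client instead."""
    def __init__(self):
        self.calls = []
    def create_user(self, UserName):
        self.calls.append(("CreateUser", UserName))
    def create_access_key(self, UserName):
        self.calls.append(("CreateAccessKey", UserName))
    def attach_user_policy(self, UserName, PolicyArn):
        self.calls.append(("AttachUserPolicy", UserName))

stub = _StubIAM()
expected_events = backdoor_user(stub, "backdoor-user")
```

Those three event names arriving in quick succession for the same principal, outside a change window, are exactly the cluster a detection for this persistence technique would look for.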
And we've done all of that programmatically, in a way that doesn't rely on the analyst having an underlying understanding of what these test cases are or how the AWS APIs work; we're simply triggering the test cases, the individual TTPs, one by one as part of this larger kill chain. Now, one of the benefits of Leonidas is that not only do we have the ability to trigger these test cases automatically, it also generates its own logs, so you can compare and contrast them with what you're seeing in your SIEM and what alerts you've got firing. So let's jump into the latest set of logs here. Looking at the top, you can see we've got an sts:GetCallerIdentity call, followed by listing the GuardDuty detectors and enumerating the CloudTrail trails for a given region; these are all the test cases, and you can see the arguments we passed in are also represented, so we can track exactly how the test cases were executed and what parameters were passed in. If we go down to where we ran the second caller identity call, you can see that the access key ID we passed in is recorded too, so we have on record what entity those test cases ran as. We can feed all this data into a centralised logging platform, or leave it in CloudWatch if you prefer, and we can then diff it against what we're seeing in the SIEM, which gives us a better understanding of whether our test cases worked or not. And we have those logs there for posterity: the analysts don't need to take detailed notes as they go, because everything they do is recorded automatically. Okay, so one of the other really powerful things, off the back of that: we've got Leonidas running, we can execute these test cases, we can look at the logs in our SIEM, but Leonidas itself also produces its own set of logs of what attacker actions were executed:
what parameters, what context, what resources they targeted. We can feed all of that somewhere as well, into your SIEM or another platform, and then essentially diff the use cases that were triggered in Leonidas against the events in the SIEM, and see whether the detections we were expecting to trigger, either as alerts or as tags or whatever else, actually happened in the correct way. That means that, over time, we can regression test improvements to our detective capability: we can add additional test cases (additional use cases, even) and verify that doing so hasn't broken any others; we can update test cases to match changes by the cloud providers or to catch additional things, and verify we've not backslid in the process. And that means we can iterate much faster, because we're confident in our ability to catch mistakes as we go. Yeah, absolutely. So, conclusions. It might sound corny, but I guess the ultimate point here is that detection is a journey. Your cloud environments are changing all the time; that could be because you're implementing new features, changing the way you work, or changing the services you utilise, so ultimately your detection has to change with them. We can also think about context, and how important that is: it isn't only an issue working to your disadvantage, you can actually use it to make more effective detections. If you know the behaviours of your environment inside out, any deviation from them can become a high-fidelity alert, so use that to your advantage. And then lastly, as we discussed, there's that threat intelligence problem: not knowing what bad looks like, what we should hunt for. The ability to codify, and ultimately share, the use cases you build is going to aid knowledge sharing and ultimately help all of us build more effective cloud detection.
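At its core, that regression-testing loop is a set difference between what was executed and what fired. A minimal sketch, assuming you can export executed case names from Leonidas's logs and fired rule names from your SIEM; the case-to-rule mapping here is our own illustrative data.

```python
def detection_gaps(executed_cases, fired_detections, expected):
    """Return the executed test cases whose expected detection never
    fired in the SIEM: the regressions to investigate.

    expected maps test-case name -> detection rule name; the two input
    lists would be exported from Leonidas logs and the SIEM respectively.
    """
    fired = set(fired_detections)
    return [case for case in executed_cases
            if case in expected and expected[case] not in fired]

# Illustrative mapping and run results.
expected = {
    "add_policy_to_user": "IAM policy attached to user",
    "create_iam_user": "IAM user created",
}
executed = ["create_iam_user", "add_policy_to_user"]
fired = ["IAM user created"]  # the policy-attach rule never fired
gaps = detection_gaps(executed, fired, expected)
```

Run after every change to the rule set, this turns "did we break a detection?" from a manual replay exercise into an automated check, which is what makes fast iteration safe.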
We've also talked about that threat modelling process: going in, identifying attack paths, and identifying potential needs for new telemetry, or for decommissioning something we're not using any more. That constant cycle is something we can use to make sure we're always as well equipped as we can be to detect malicious activity in the cloud. And lastly, please do go have a go with Leonidas and let us know what you think of it. It's available up on GitHub, at FSecureLABS/leonidas. We currently support AWS; Azure and GCP are actually pretty close to being done, we're getting pretty far with those now. At the moment we've got 45 test cases for AWS; you saw some in the demo earlier, and there are quite a few others too. Go have a play with it, see what you think, and let us know. Equally, those test cases are up there because they're what we've needed so far, but they're pretty easy to write, so please do contribute your own; pull requests more than welcome. Awesome. Yeah, that's it from us. Thank you!