Right, hello everybody. This is a bit of a monster of a talk because it has a lot of live demos in it, so I will ask you to bear with me as I flip between windows all the time. Whoops, there we go. Okey-dokey. So, hello, I am Andy. I like to break stuff and put it back together once I know how it's been working. I work at ControlPlane, and I'm very proud to say that I'm a trainer for various organizations (I might disappear eventually). Watch out for SANS SEC584, a five-day cloud native security course coming out next year that I'm pleased to be working on. Oh, out of battery. Marvelous.

I'm a founder of ControlPlane, a continuous security engineering practice with a focus on cloud native and regulated industries, and I want to talk about container breakouts, Kubernetes break-ins, cluster drive-bys, API exploits, pwning everything that we can find (fingers crossed), and how we fix the security skills gap in the cloud native ecosystem. Spoiler alert: we do it by training the next generation of cloud native security engineers and architects on production-like systems that they can hack. They can play CTFs until they can pop shells and really understand what they're doing; they test, remediate and harden, so that we can hack back in. We do this with models that we build of the system (threat models and attack trees, all open sourced under the Financial Services User Group in the CNCF) and with attack simulators, which are production-like infrastructure in safe testing environments. More on this later.

So what are we going to do? I've got some local VMs, I've got GKE, some droplets, some Docker versions and some kernel versions, and I will demo a hacking simulator right at the end if 30 minutes is enough, so I'm racing through this, perhaps prematurely, ladies and gentlemen. Okay, so what are we doing? Docker and Kubernetes. Do we love Kubernetes? Well, broadly. Does it love us? Absolutely not. Why is it so difficult? It is a distributed system: we're all trained on monoliths, and all of a
sudden we have to deal with all sorts of different network-based problems. What is the problem with Kubernetes? Well, it's a layered security onion, but if I can get onto a single node, if I can root one of your workers, I can probably root your whole cluster.

So, containers. Containers are awesome, and we have to thank the hosts of this dev room for all the work that they put in, not only on LXC but also upstream in the Linux kernel. But they emerged from the primordial kernel soup, a child of evolution rather than intelligent design, that has been morphed, refined and coerced into something usable. Don't forget: containers do not exist. They are merely the bundling and isolation left once we've set up namespaces, cgroups and Linux security modules and, finally, started our precious little process inside. That being said, we love (oh, that is a shame, because that is an amazing GIF) Kubernetes. So this talk will go fast; there are lots of demos and I'll do my best. A prayer to the demo gods, and an offering: let's go.

We all know this, right? We shouldn't run privileged, we should not run as UID 0, and the Docker socket should not be mounted inside a container. Why is that, I hear you ask? Let's see. Okay, where are we? We are here, hopefully. Nope, we are here. Okay, so we are spinning up a container with the --privileged flag. What happens when we run privileged? Do you see anything in that list of mounted devices that we shouldn't? Yes: that is the host device, /dev/vda1, mounted with /etc/hosts on top of it. Why is that a bad thing? Because you can act as root inside the container, and, importantly, there is no boundary between root inside and outside the container unless we have user namespaces enabled. What can we do? We can mount /dev/vda1 at any old mount point inside our container, and then what do we have there? The host's root filesystem. That's not a good look. Can it get any worse? Well, yes indeed it can. Let's leave evidence that
Andy was indeed here at one stage. All right, and then we're back on the host. What are we going to do now? Have a look at the root, and Andy was there. Who is that owned by, and when did that turn up? 14:16, which looks relatively recent (and that's not how you spell stat). Yeah, so that's a bad day. Do not run privileged. Having UID 0 inside a container is not a vulnerability in and of itself, but it makes privilege escalation much easier if I can get inside your container.

Okay, and those were misconfigurations, right? That's something that we can actually defend against in the pipeline. So it's all very well talking about the problem; let's talk about the fix as well. In the container's lifecycle, unit tests run in isolation, so I'm taking the test pyramid and superimposing my own view over the top. In this case, unit tests are static or dynamic analysis of the container itself. Integration tests are probably more dynamic analysis inside the container, maybe akin to actually testing the public APIs of the application inside the container. Why is that important? Because the configuration changes with each environment that we promote it through: the container stays the same, but its behavior depends upon environment variables or config that we mount in. So we have to test that, and the end-to-end test is essentially the full system.

So what do we do? Static analysis of the Dockerfiles: we can lint them and determine whether or not we have done certain things wrong. Something that we can't do, or that is in fact very difficult to do, is identify which user will be running, because that is a runtime construct and we can switch users in the entrypoint, for example. We can also do this for Kubernetes with kubesec, which is static analysis for Kubernetes resources. It will tell you "do not mount XYZ", and it will give you a risk score to try and quantify the danger of running a particular configuration. There are lots of things that you can do in a pod YAML to break
it. Right, so what about dynamic testing? Well, we can use InSpec, but it's heavyweight and it's Ruby. Do you want to install Ruby inside your container? Hell no. What about Serverspec? Well, it's still quite nice, but again the same problem: it's Ruby. So what do we use? goss, Go server spec. It is simple, declarative, highly parallelized and written in Golang. What does that say? Cloud native, hooray. So this is what goss looks like: you have a simple YAML-based format, and it runs everything on 50 threads (goroutines) by default. In this case, command is the type of test and the key here is a version command, so we're just making sure that we have a contract with our base container that it's shipping something to us, right? Obviously you can use goss for anything and everything, and I recommend it to the house.

What next? Let's break out of some containers. Who remembers Dirty COW? A copy-on-write race condition vulnerability in the kernel, potentially there since 2007 (version 2.6.22), whereby an unprivileged user is able to write into root-owned memory, execute it and pop a shell. Exploitation of this bug does not leave any trace of what happened. It was detected by some dude running a rolling packet capture on his honeypot; he then pulled out the binary and recompiled it. What a guy. Okay, why is it bad? It hoses your system, and containers in a default configuration at the time did not contain this bug. Containers rely on the kernel for protection: system calls from inside a container do not hit a separate kernel, they go straight to the shared host kernel. This is why the isolation model is more difficult and nuanced than a VM's, which has an entire kernel of its own running on fully simulated hardware, BIOS and all. With containers we get the speed increase, we start our processes very quickly, but we pay the penalty of sharing the host kernel for every system call. If the container is reliant upon the kernel and the kernel lets the container's guard down, we're having a bad day. There are ways around this. Let's have a look.
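Returning to goss for a moment: the idea of a command test (run a command, check its exit status and output, and run many checks in parallel) can be sketched in Python. This is a minimal illustration under stated assumptions, not goss itself or its YAML format; CommandCheck, run_check and run_all are hypothetical names, and the 50-worker default simply mirrors the concurrency figure quoted in the talk.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import List


@dataclass
class CommandCheck:
    """Hypothetical stand-in for a goss-style command test."""
    argv: List[str]                        # command to run, no shell involved
    exit_status: int = 0                   # expected return code
    stdout_contains: List[str] = field(default_factory=list)
    timeout: float = 10.0                  # seconds before the check fails


def run_check(check: CommandCheck) -> List[str]:
    """Run one check; return its failure messages (an empty list means pass)."""
    try:
        proc = subprocess.run(
            check.argv, capture_output=True, text=True, timeout=check.timeout
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        return [f"{check.argv!r}: could not run: {exc}"]

    failures = []
    if proc.returncode != check.exit_status:
        failures.append(
            f"{check.argv!r}: exit status {proc.returncode}, "
            f"expected {check.exit_status}"
        )
    for needle in check.stdout_contains:
        if needle not in proc.stdout:
            failures.append(f"{check.argv!r}: stdout is missing {needle!r}")
    return failures


def run_all(checks: List[CommandCheck], max_workers: int = 50) -> List[str]:
    """Run every check concurrently and flatten the failures in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(run_check, checks))
    return [msg for failures in results for msg in failures]


if __name__ == "__main__":
    # Example contract with a base image: the interpreter starts and
    # reports a Python 3 version on stdout.
    contract = [CommandCheck([sys.executable, "--version"], stdout_contains=["Python 3"])]
    problems = run_all(contract)
    print("\n".join(problems) if problems else "all checks passed")
```

Each check returns a list of failure messages, so an empty list means the contract held; in a pipeline you would fail the build whenever run_all returns anything.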
Non-deterministic live demo, whoo! Okay, so, Dirty COW. This may suffer slightly on this screen. Right, yes. Okay, so what are we going to do here? Start tmux. Sweet. Also, I managed to remove the U key from my keyboard, so you'll notice that there was chocolate beneath it; a late-night hacking session is my excuse.

Okay, so what are we doing? We've got Docker at the top, we have Sysdig twice, and at the bottom /tmp/x, which is a lock file that the exploit uses to determine whether it's been run, because it's just spraying system calls, highly parallelized. It's a copy-on-write race condition vulnerability, so we do as much as possible in order to break it. Sorry, it's not... okay, okay. Do we want to proceed? Yes, please. A number of exploits suggest that the kernel version we're using is vulnerable, and it is. Do we want to run with AppArmor? In this case, no, because we have a specific configuration that will fix it. And off we go. So we're just firing: 0xdeadbeef is the exploit name (wonderful choice of hex value), and we're trying to patch the vDSO, the virtual dynamic shared object, which is kind of a proxy in user space that stops us having to hit the kernel all the time. We then ptrace it, and at the point that we gain control of the process, we inject our own code. You see we've got a listener running here on port 1234 on the host, on all the host adapters. Once this thing kicks in, that host adapter is where the endpoint in the patched vDSO points, so once we pop that, we will connect from inside the container to that listener, and then we have root control of the host.

As I said, this is non-deterministic: if it doesn't finish by the time I finish this sentence, I will come back to it, because it will just have to run in the background. Okay, you will have to take my word for that, and we'll come and have a look in a moment. So what is going on there? We're reliant upon the ptrace system call to patch the vDSO.
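Because this exploit depends on the ptrace system call, stopping ptrace inside the container breaks it. The talk's mitigation is a modified AppArmor profile; as an analogous alternative, here is a sketch that hardens a Docker-style seccomp JSON profile (the defaultAction, syscalls, names and action fields, loaded with docker run --security-opt seccomp=<file>). The deny_syscalls helper is hypothetical, and a real profile should start from Docker's default rather than the toy one shown.

```python
import copy
import json
from typing import Iterable

# Default actions under which a syscall with no matching rule already fails.
BLOCKING_DEFAULTS = {"SCMP_ACT_ERRNO", "SCMP_ACT_KILL"}


def deny_syscalls(profile: dict, deny: Iterable[str]) -> dict:
    """Return a hardened copy of a Docker-style seccomp profile in which the
    listed syscalls can no longer succeed. The input is not modified."""
    denied = sorted(set(deny))
    hardened = copy.deepcopy(profile)
    rules = []
    for rule in hardened.get("syscalls", []):
        if rule.get("action") == "SCMP_ACT_ALLOW":
            names = [n for n in rule.get("names", []) if n not in denied]
            if not names:
                continue  # this rule only allowed denied syscalls: drop it
            rule["names"] = names
        rules.append(rule)
    # A denylist-style profile (permissive default) needs an explicit rule.
    if hardened.get("defaultAction") not in BLOCKING_DEFAULTS:
        rules.append({"names": denied, "action": "SCMP_ACT_ERRNO"})
    hardened["syscalls"] = rules
    return hardened


if __name__ == "__main__":
    # Toy allowlist profile for illustration only; start from Docker's
    # default profile in practice.
    toy = {
        "defaultAction": "SCMP_ACT_ERRNO",
        "syscalls": [
            {"names": ["read", "write", "ptrace"], "action": "SCMP_ACT_ALLOW"},
        ],
    }
    print(json.dumps(deny_syscalls(toy, ["ptrace"]), indent=2))
```

Stripping the syscall out of the allow rules is the part that matters for Docker's default profile, which is an allowlist whose default action is SCMP_ACT_ERRNO; only permissive, denylist-style profiles need the extra explicit rule.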
So, what just happened? Well, theoretically we bypassed container security. But yeah, there we go, hooray. So this claims... oh no, this one is still going. Okay, it is non-deterministic; let's keep on going. All right, so theoretically we bypassed container security mechanisms. If we rerun that same exploit with a slightly modified AppArmor profile, different from the default one that Docker ships, that blocks ptrace calls from within the namespace, we effectively block this exploit. But the actual solution is to patch our kernels: as always, run the latest versions of things. What can we do around this? There are various things here, but ultimately this is a sandboxing fix.

There we go. All right, okay, so theoretically this has worked. You can see it's got "patch 2 of 2"; because of the number of system calls we've made, the second window will keep on spooling ad infinitum, but here we should be sweet. You can see here that we're actually on the host at the bottom now. Because we were inside a container at this point, we shouldn't be able to see anything on the host. So what should we look for? Yes, let's see if runc is there. We are going to grep (we should probably use the word grep), and nope, because it's older than that; there we go. So we should not be able to see that. Okay, obviously this kernel version is a few years old and I've had to keep this VM around, but what we see there is that we're inside a container and we're doing stuff on the host. It's a bad day: container isolation is broken.

Let's continue. So what do we do? We modify AppArmor and seccomp profiles; we should be fine-tuning these things. There are tools like Jess Frazelle's bane, and there's loads of new eBPF-based seccomp tracing stuff that will extract seccomp profiles from running applications. This is what all the big container security tooling does for you anyway, with some bells and whistles (or, as I heard it described, Belgian whistles) around that. And of course we want to allowlist effectively. These slides are available for posterity afterwards. 13 minutes, okay. Bypassing controls: what else
are we going to do? Well... oh yes, thanks, Jim. Some of these are just not loading in time, are they? Bypassing container security controls. Oh well, that does move eventually. So there was an AppArmor bypass recently, and this was quite interesting... these are just not loading. Okay, we get that, cool. So what we've seen here is that we just created a volume and put it over part of the proc filesystem. We shouldn't be able to do that; it was a bug. But it means that when the application looks for information about which AppArmor profile is configured, we've overwritten it and there is none. A sly little bypass. I don't know how long it was around for; probably a good long time.

Yeah, /proc/self/exe: this one is fun. It takes a symlink, or a pointer perhaps, back to Docker's runc binary from inside the container, overwrites it and pops a shell, and I will attempt to demo this one. So what have we got? We have a Dockerfile, and in that Dockerfile you can see we have compiled some exploits. We're actually patching seccomp inside the container, and then we're symlinking /proc/self/exe to the entrypoint of the container. I should point out that none of these exploits are mine; I'm just standing on the shoulders of giants, of course, and all of these are publicly available. So what happens? We build this Dockerfile, and what we've done in the exploit stage (actually it's in the other stage, right here, there we go) is write the string "CVE-2019" onto the end of the runc binary. What would we actually do? Well, we'd replace it with a malicious payload, like an executable or a Bash script that did something that we wanted. So how do we prove this has worked? Let's have a look at where runc is, and we see at the end there is nothing there. Then, if we just run this container that I built earlier (you'll have to trust me), what are we going to see? Nothing... there we go. So these are the return codes of the system calls, and we have appended our string. So again, what's happened?
We're inside a container, a theoretical isolation boundary, and we've been able to influence or impact things that sit on the host. This is a bad day; this is how we break out of containers. What is the fix here? Don't run old versions of software. Really, really easy, but as we know, everything's a people problem, and our organizations probably make it very difficult to keep things patched in a timely manner. This is immutable infrastructure; this is aggressive builds and pipelines for all of our work, for all of our servers, onwards. Right, what have we got next? Done that one. Yeah, here is the lesson: patch your hosts. We can use goss to test the kernel parameters, to test the output of Bash scripts, to test for everything, and we can use it in a sly way.

One of my favorite quotes from our esteemed track hosts: "containers are a user space fiction". I love that, and you will notice that containers don't really exist. Let's not forget that we are still in the host kernel; we are still poking around on the same machine. We don't have the nested kernel, the virtualization, that we would have with VMs. Okay: a collection of stimuli and restrictions born from unintelligent design and years of evolution, a lot like consciousness. Is there a lesson here? No, but everything on the internet and in our organizations is held together with string and sticky tape. We should test everything, because when it gets changed we need to maintain the same behavior we had before. We're doing this for ourselves, for future us; we're doing this for the maintainers of the system who won't even know who we were; we're doing this because we are good open source citizens and colleagues, and we're putting a security test suite in place to help them.

So, testing is a dark art, but it is also a software engineering discipline: we need rigor, we need objectivity, and anything can be a security test. What do we do? Arrange, act, assert: prepare the environment, perform some sort of execution and capture
the result, and then assert that it actually worked. Prove it fails as expected: this is very, very important when writing tests, otherwise you've just got a green test suite that doesn't actually catch anything. And be aware of exceptions in testing. Okay, that's like ops, right? Yeah, let's carry on; I could be a bit pressed for time. Testing updated versions. All right, gosh, again, this is an example of how to paste an animated GIF as a GIF. All right, well, never mind, there is a link there, because basically it's very easy to go back to our test suites. I love them. Testing is cool; it's the only way we prove that we're secure. But we're not proving the absence of bugs, and we're not proving that the system is actually secure. We're just saying that, by our particular model of it and our understanding at this point in time, it conforms to some level of security.

Okay, but that was too easy, right? Let's find some public clusters and pwn those. Right: clusters in the wild. How many insecure Kubernetes hosts do you think we can find? A few? Tens? Hundreds? Thousands? Very good, it'll be a lot. Let's go. Okay, this is... yeah, right, I have my head of security to thank for that. Right, this is BinaryEdge; BinaryEdge is Shodan for infrastructure. This search term finds open Kubernetes instances. It's not a nice day if you're on this list, which, well, it's actually a web page. "This may not be legal in your jurisdiction" alert: what BinaryEdge does is connect to unauthenticated API endpoints, and that is a gray, gray area, so take this as you will. But the platform has already scanned the IPv4 address space for us and then poked at what it has found. We can see up here that is the query, and down here these are some Chinese honeypots. An actual cluster? Who knows; I wouldn't touch it with a barge pole. So here is one I pwned earlier. Let's test the API server and see if it is leaking anything useful. This is an nmap script that runs in the Nmap Scripting Engine. Let's go back up here, and so: nmap against the kube-apiserver. I run this with a
little bit... yeah. So all we're doing here... okay, so what we've done there is look at a certain port, check an HTTP response and run a regular expression over it. The regular expression matches gitVersion and gitCommit, and we can see here that the Kubernetes API server is leaking its version information. You would not do this with nginx; you would not do it with Apache. We learned this lesson a long time ago, but evidently Kubernetes is better than us.

So what do we think we can do with that version? Any ideas how we can attack it? Well, 1.14 has one of the mother of all CVEs associated with it: poor error handling in the mTLS server. Essentially, a WebSocket connection is instantiated and bundled in a TLS pipe, if you like: the encryption is established and then the WebSocket communication goes over that. There was incorrect handling of the WebSocket error code, so the tunnel would stay open and we could then send whatever commands we wanted through it. This was initially thought to be the most extreme remote code execution; actually it's scoped to aggregated APIs, and it's a little bit more difficult to exploit than we first thought, but let's try anyway. Okay, so first of all we will just run this so we can see that we're not actually running the exploit: all we're doing here is opening a socket and sending this WebSocket upgrade request six times, and it's not handled correctly, because we keep on sending it and exiting. If we then say that we will exploit this to run that piece of code, you'll see that what we have at the bottom, a 403 Forbidden, has magically become a 200 OK, as we have an unauthenticated request handled by the API server through that connection. This is exploitable, but a bit more difficult, and I will leave that as an exercise for the reader. Seven minutes, okay. We can now watch it burn: we can run something in there, and if we actually got pod deployment access we could deploy Monero miners, or change
cluster creds, or delete all the nodes. So, more lessons: don't run a public API server endpoint on the internet. It's a privileged API, and zero trust does not mean trusting that everything is infinitely secure and that all our authentication and authorization mechanisms just work. They don't, and we'd laugh at that assumption in any other situation. Tinfoil hats are cool, and Moss knows best.

If you need another reason not to run a public API server, you may have heard of this attack; thanks to Rory McCune, Brad Geesaman and Ian Coldwater for bringing the honk on this one. It's essentially a fork bomb for YAML, like a zip bomb: recursive expansion that will exhaust our API server. It should be noted that any string will do in data key a, but "honk" has been emphatically recommended by the authors. Another open API moral: zero trust; trust, but verify. Mutual cryptographic authentication does not preclude the existence of other bugs. And of course we update, and we keep ourselves the hell offline.

Okay, let's see if this one will be done in the minutes I have left; again, I may suffer. Oh, that's another API server test; let's try that again. I don't know why that failed. Yeah, of course we can write tests for everything and anything we can programmatically do, and this is just a test for the presence of that; you don't care so much, because we're now on to the billion laughs. Okay, so what do we have here? This is my server, and I'm looking for something that I've now lost. Here we go. We'll run a few windows so we can actually see something going on: we'll check the API server logs, we'll watch events, and we will basically just fire loads of... let's see what that says. So we've got an exploit here which is just going to run this. All we're doing is sending a SelfSubjectAccessReview, which asks "what would I be able to do?" as an unauthenticated user; that is an unauthenticated API call. We send our payload, billions of honks, and we're just doing it again and again and again, which is what
the prefix on this is for. Now, source it. Okay, we'll come back to this in a minute, but what we will start to see is the API server exhausting its threads, failing its health checks and restarting, because we keep so many sockets open doing this. That is the end of the API servers even if they're load balanced: there is an asymmetric data flood in this attack, and, unsurprisingly, they cannot handle the amount of recursion necessary. So we will keep on going, in case we have the time. That's not what you want to see.

Testing, testing: arrange, act, assert on network infrastructure. Well, we've built a tool at ControlPlane, netassert; it's highly parallelized nmap. These slides are available later. We love bats-core (I am the only maintainer left on this project, please join me); it's useful assertions for Bash, and we have built some extensive and expansive test suites with it. It's brilliant.

Who runs Istio in the room? A few people; this is for you especially. Okay, Istio threats: we did a lot of threat modeling around this, and there's lots of stuff that can go wrong with Istio. We don't use PodSecurityPolicy, because it just doesn't let us configure it, so we have to use OPA. Let's attack the mesh in the last two minutes. Okay, so what have we got here? I think I may actually be out of time, but suffice to say there is no endpoint security: you can hit localhost:15000/quitquitquit, you can POST to it, and this issue explains how you knock yourself off the mesh. It is going to be fixed. Let's just start: we make the other listener a real interface and recursively correct ourselves; we are almost there. How do we evade detection? We stop the API server emitting its audit logs: we black-hole the traffic, we get in the way of the endpoint, we deny service to the endpoint, we root the cluster and we turn the auditing off. Nice and easy. Kubernetes: hacking in a safe space. How do we teach everybody the extreme amount of content I've packed into 30 minutes? With this tool.
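Picking up the netassert idea from earlier in this section (asserting, in parallel, which ports should and should not be reachable), here is a minimal Python sketch. It is not netassert's implementation or configuration format: it uses plain TCP connects instead of nmap, and the (host, port, should_be_open) tuples and the is_open, probe and check names are hypothetical. The example expectation treats the Envoy admin port 15000 mentioned above as something a workload should not reach; per the talk, inside an Istio pod of that era this assertion would fail, which is exactly the kind of finding a test should surface.

```python
import socket
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

# Hypothetical assertion format: (host, port, should_be_open).
Expectation = Tuple[str, int, bool]


def is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable or timed out
        return False


def probe(exp: Expectation) -> str:
    """Return an error message if the endpoint violates its expectation, else ""."""
    host, port, should_be_open = exp
    actual = is_open(host, port)
    if actual == should_be_open:
        return ""
    state = "open" if actual else "closed"
    wanted = "open" if should_be_open else "closed"
    return f"{host}:{port} is {state}, expected {wanted}"


def check(expectations: List[Expectation], max_workers: int = 32) -> List[str]:
    """Probe every endpoint in parallel and return only the violations."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return [msg for msg in pool.map(probe, expectations) if msg]


if __name__ == "__main__":
    # Example policy: a workload should not reach the sidecar admin port.
    failures = check([("127.0.0.1", 15000, False)])
    print("\n".join(failures) if failures else "all network assertions passed")
```

Asserting that some ports are closed, not only that others are open, follows the "prove it fails as expected" advice: a suite that only checks reachability stays green even when the network policy has silently disappeared.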
My time is up, but it teaches you all of this stuff, and it's really great. This is all wonderful. That's everything; thank you very much!