Hey, good morning, everyone. Welcome to the session "Attacking and Defending Kubernetes TEE Enclaves in Critical Infrastructure." My name is Robert Ficcaglia. I'm the co-chair of the Kubernetes Policy Working Group and a member of the CNCF Security TAG. I have led some security assessments in that group, and I'm also assisting with the Kubernetes third-party audit. In my day job, I work with high-security critical systems: things in payments, banking, health care, and government. And of course, I've enjoyed the Spanish hospitality, as I hope you all have as well. So what is a TEE? When they assigned me the auditorium, I assumed that lots of people were coming for a free T-shirt, but we're not talking about tees in that sense. We're talking about trusted execution environments. These are hardware-enabled enclaves that allow you to protect your code and data while in use. Many of you are probably familiar with the concept of protecting data at rest, on disk storage and in data stores, and in transit, with TLS and TCP/IP sockets protected by encryption and integrity. Here we're talking about actually making sure the code that is executing in the CPU is protected and isolated, and the data is encrypted. Just a show of hands: how many of you are familiar with what TEEs or enclaves are? So, a few. Are any of you using those in production today? So more the experimental stage. I think that's the norm. Those of you who may have done attack modeling, threat modeling, or even trust modeling will at some point start making assumptions and documenting those assumptions. If you're operating in userland, usually you're making assumptions like: I can trust my hardware provider, I can trust my chip manufacturer, I can trust my cloud provider, I can trust my sysadmins and my root users, who are employed and background-checked. But what if you can't? Who would need this level of trust and integrity? Well, we're talking about critical energy infrastructure.
Well, really any critical physical infrastructure systems, and defense and military systems. If there's a large financial opportunity, you may be storing value in the billions of dollars, and so it becomes a high-value target. So that's the audience we're really addressing today. Those are the attackers that we're interested in, and for those of us who are defending, those are the environments that we're defending. I mentioned some of the use cases, but I think where TEEs and enclaves really got their start was in enabling a more scalable version of what had already been used for a decade plus: things that were hardware-based and rooted a trust chain, so that you as a developer could deploy code in a secure way. But that was very hardware-specific. It was often dedicated hardware modules you might plug into a server, or it was a specific board in a client desktop or laptop. So this was a way to really scale this up to cloud and make these more virtual. In general you can also use this to increase your compliance posture, and we'll talk a little about that at the end. If you're operating in a regulated environment (GDPR, of course, is a common topic for us today, operating in the EU), then you might want a way to prove that you are protecting that user data from the cloud provider itself. And then again, as we get into blockchain and smart contracts, or if you want to protect intellectual property, or you're doing machine learning and want to be able to use data that's not yours and/or you don't want to disclose your algorithms, these are all use cases for hardware-based encryption and hardware-based isolation. And so that's a quote from the confidential computing folks: you're basically trying to remove as much of the attack surface as possible, down to as close to the hardware as you can get. So if we take a quick deeper dive into what exactly this means at the hardware level, it's really a special set of registers and opcodes at the CPU level.
That's interacting with, usually, a memory protection coprocessor or unit on the CPU, and it's enabling the separation of a trusted area in memory, usually enabled at the BIOS level and then in a small amount of microcode. That allows you to marshal calls between a trusted, encrypted region of memory and the world outside of that enclave: you can pass secrets in, and you can get secret data out, encrypted. And it's unique to that processor, right? So you know you're operating in a particular environment, and you can verify that; we'll talk a little bit more about that. So it's authenticated; as the data is passed in, it is now confidential; and even at the code level you can be assured that the code itself has been checked for integrity, that no one has tampered with it. It's exactly the code that you expect. There are a few chip implementations. By far the largest is Intel's implementation, SGX, and they have some newer capabilities, TDX, which we can talk about a little bit later. I am not affiliated with Intel or any chip manufacturer. I'm not affiliated with any enclave or TEE software provider. I have no horse in this race. I am essentially an operator, probably like many of you, looking for a way to solve a particular problem, and SGX is one implementation.
That's widely available; Arm and others are making progress. But if you look at how that memory encryption works, you've got some reserved memory, and you've got pages of memory tracked in a lookup table that's protected in the microcode. Any time you deploy an enclave, the reserved memory space for your enclave is managed in a cached array, so that you have a mapping between which pages are being used and how they are assigned to each enclave. An important part of how you establish this trust is that you need to know the code and that metadata are what you expect. There are cryptographic keys for that encryption, and there are cryptographic keys embedded into the hardware unit by the manufacturer to establish that root of trust. You need a way to measure, to attest, that everything is exactly as you expect. And so you're using things like cryptographic hashing and asymmetric encryption. You're hashing all the information, the metadata on the chip. You're hashing the state of a kind of bootstrap set of enclaves on the chip, a provisioning enclave and a signing enclave, and you're then able to know, as the client in the untrusted space, whether you're operating in an environment that you expect. Typically the hardware will have special eFuses that enable that root key, and then typically there's a key derivation function. So you're using key-encryption keys, and eventually, when you call these trusted functions to pass things in and out of untrusted space, you're using those derived keys. When you initialize an enclave securely, you're passing in secrets and keys from outside, and you're doing this attestation process.
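As a rough illustration of the measurement idea described here, the following Python sketch hashes an enclave's metadata and code pages into a single digest and compares it with an expected value. It is a toy model under stated assumptions, not SGX itself: the function names `measure_enclave` and `verify_measurement`, the sample page contents, and the use of plain SHA-256 are all hypothetical choices for illustration. Real hardware computes the measurement in microcode and signs it with keys rooted in the chip.

```python
import hashlib
import hmac
from typing import Sequence


def measure_enclave(pages: Sequence[bytes], metadata: bytes) -> bytes:
    """Hash the metadata and each page, in order, into one digest (toy measurement)."""
    h = hashlib.sha256()
    h.update(len(metadata).to_bytes(8, "big"))
    h.update(metadata)
    for index, page in enumerate(pages):
        # Bind each page to its position and length so reordering changes the result.
        h.update(index.to_bytes(8, "big"))
        h.update(len(page).to_bytes(8, "big"))
        h.update(page)
    return h.digest()


def verify_measurement(actual: bytes, expected: bytes) -> bool:
    """Compare two measurements without an early-exit timing difference."""
    return hmac.compare_digest(actual, expected)


if __name__ == "__main__":
    meta = b"attributes=debug:off"  # hypothetical metadata string
    pages = [b"code page 0", b"code page 1"]
    expected = measure_enclave(pages, meta)
    tampered = measure_enclave([b"code page 0", b"evil page 1"], meta)
    print("untampered accepted:", verify_measurement(measure_enclave(pages, meta), expected))
    print("tampered accepted:", verify_measurement(tampered, expected))
```

The point of the sketch is only that any change to a page, to the page order, or to the metadata produces a different measurement, which is what lets a client in untrusted space decide whether the environment is the one it expects.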
So you're saying that if two enclaves on the same CPU need to communicate with each other, each knows that the other has not been tampered with, has integrity, and has an identity. And if you need to attest to some global registry so that you can move data across CPUs, then you're usually attesting to something like an Intel registry, and that's why they've embedded their public key in the chip, so that you can make that secure connection to Intel and verify. On the chip there is a component for this. We talked a little bit about a provisioning enclave to bootstrap all this; there's usually also a quoting enclave that manages that verification to the global registry. Then there's data: you have data that you've operated on in the enclave, and you need to somehow persist it outside the enclave. That's where SGX and others, such as Arm, have a scheme for sealing the data. That way you can call special routines that allow you to encrypt it with a key that you know came from the enclave, persist it outside of the enclave, and then restore it back into the enclave and use it later. If you're only operating on very small amounts of data, you might not need this, but in any practical application, where we're dealing with machine learning, blockchains, etc., you're probably going to need to seal data and persist it outside of the enclave securely. Although I will note that newer generations of SGX chips do have expanded memory. It used to be limited to a very small amount; now I think it can go up to a terabyte. Let's see. So how, practically, do we make use of this?
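Before turning to that question, the key derivation and sealing ideas described above can be sketched in Python. This is a toy model under stated assumptions, not the SGX sealing API: `ROOT_KEY` stands in for the fused hardware key, and `derive_seal_key`, `seal`, and `unseal` are hypothetical helpers. To stay within the standard library, the sketch only authenticates the sealed blob with an HMAC tag; a real implementation would also encrypt the data with an authenticated cipher inside the enclave.

```python
import hashlib
import hmac

# Stand-in for the fused, per-CPU root key (hypothetical value; on real
# hardware this key never leaves the chip).
ROOT_KEY = bytes(32)
TAG_LEN = 32  # SHA-256 digest size in bytes


def derive_seal_key(root_key: bytes, measurement: bytes) -> bytes:
    """Derive a key bound to both this CPU's root key and one enclave measurement."""
    return hmac.new(root_key, b"seal-key|" + measurement, hashlib.sha256).digest()


def seal(data: bytes, key: bytes) -> bytes:
    """Prefix the data with an HMAC tag (integrity only; encryption is omitted)."""
    tag = hmac.new(key, data, hashlib.sha256).digest()
    return tag + data


def unseal(blob: bytes, key: bytes) -> bytes:
    """Return the data if the tag verifies under this key, else raise ValueError."""
    tag, data = blob[:TAG_LEN], blob[TAG_LEN:]
    expected = hmac.new(key, data, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("sealed blob failed verification")
    return data


if __name__ == "__main__":
    measurement_v1 = hashlib.sha256(b"enclave code v1").digest()
    measurement_v2 = hashlib.sha256(b"enclave code v2").digest()
    key_v1 = derive_seal_key(ROOT_KEY, measurement_v1)
    blob = seal(b"model weights", key_v1)
    print(unseal(blob, key_v1))
    try:
        # A different enclave measurement derives a different key and is rejected.
        unseal(blob, derive_seal_key(ROOT_KEY, measurement_v2))
    except ValueError as err:
        print("rejected:", err)
```

Binding the derived key to the measurement is the design choice worth noticing: data sealed by one enclave build cannot be restored by modified code, even on the same CPU.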
You have a few options. You can handcraft your code: re-architect your application to use only the specific OCALLs and ECALLs, these trusted entryways into the enclave between untrusted and trusted. You can try to wrap it: there's some code that we'll talk about that will allow you to convert your existing code into enclave-ready code. And a newer option is to use things like Wasm to create bytecode that can be used across TEE architectures, so you get a little bit more portability. The idea is that you keep the application you have today and you're just cross-compiling it. So, for those of you who might be fans of La Casa de Papel, I'd like to put myself in the mentality of: what would the Professor do? Probably he wouldn't go for a direct assault through the front door. So you wouldn't attack the boundary directly, although you might use that as a form of misdirection. You would probably try to attack every part of the supply chain. You'd probably attack the personnel involved and the cloud provider. You'd probably attack the provider's hardware and systems integration. There have been interesting reports in the last couple of months from some startups looking at components on a board, which found that a vast majority of them have fakes. You're getting a bill of materials that says it's from this manufacturer or that manufacturer, and you realize: nope, it's been tampered with, or it's something off of a different market that's been repackaged to look like the chip you're expecting. So your highly motivated attacker, going after a high-value target, is going to make no assumptions, right?
They're going to assume that all parts of this system are attackable, and they're going to spend the time and the resources to do that. So it's patient research, and it's not something that can be accomplished in an academic exercise. To enable all that memory encryption, the supervision of those OCALLs and ECALLs in and out of the enclave, CPU cache control, and where and how memory is laid out, you've got to have a lot of features. And then, to build on top of that, you've got firmware, you've got microcode, you're going to have kernels and operating systems and drivers, and then inside the application that you've now somehow enabled for the enclave you've got this whole set of code, all of which is part of the trusted computing base, or attack surface if you look at it from the perspective of an attacker. A big part of the shortcomings of enclaves is that they are susceptible to side-channel attacks. You can measure the timing, you can measure the rhythm and cadence of operations, and then you can baseline good versus bad; you can tease out bits of a secret key just by measuring the kind of thermodynamics of what's happening over time. We'll talk a bit about some of the defenses at the end, but that is an ongoing concern, and the chip manufacturers continue to layer on defense techniques for side channels every generation. I'm not going to go into the detail of what this attack tree looks like, but I want to put out there the idea that every one of those, the CPU caches, the bus, the caches, the BIOS, has attacks that are documented either academically or in some open-source literature. So it's not a given that just because it's in hardware, it's secure. If you take away one thing today, it's that trust is very difficult in these types of systems. Do you trust something because it comes in a box and has an Intel logo? Do you trust it because you hire a third party?
For evaluation, I mean. In smart cards and at some phone manufacturers, they go through a certification process with labs. Is that trustworthy? Maybe it's like RISC-V, where you put out all the designs for the hardware and open-source all the code. Is that enough to be trusted? Are enough reasonably educated folks looking at that code in a meaningful way? Or do you just rely on standards bodies and talks like this to make your own decision? What I would have you come away with today is that it's not a silver bullet. I've heard some high-level hand-waving that if you put everything in the TEE, then you don't have to worry about X, Y, Z; that somehow it's bestowed this magic, that because it's in a TEE no other controls are required and it's perfectly secure. I would definitely take this away today: that is not the case. If anything, you have to be more concerned and scrutinize more closely exactly what is going into your trusted computing base, and try to minimize that attack surface. All those knobs and levers that we looked at in implementing this memory segmentation and memory isolation offer attack points. So where does Kubernetes fit? We're here at a KubeCon; obviously we're all interested in how Kubernetes can use this technology. The most obvious way to think about this is at the container level, at the pod level. We want to deploy our workloads somewhat easily into Kubernetes.
Talking about that attack surface, you probably don't want to bring the entire Kubernetes control plane into that trusted computing base, because again, any flaw in that trusted computing base is exploitable and attackable. So you really want to constrain down to the most useful unit, and for all of us in the Kubernetes environment that would ideally be the pod. That's where we're going to focus most of our attention. Now, if you're trying to operate this, you already have a tough job operating Kubernetes as it is. As a DevOps person, you may now be required to restructure your application, re-architect, maybe cross-compile things. You have to worry about how TEEs are communicating. Are you assigning workloads to the right nodes? Are you deploying the right hardware? Do you have a mix of CPU nodes? Do they have the right feature flags in the BIOS? It gets very, very difficult very quickly. So we're going to talk a little bit about some open-source help. Again, I'm not affiliated with any of these projects, so this is just a menu of options that are out there. Some I've used, some I haven't. So, Open Enclave: I look at this as your first step, right above just taking the SDKs from the chip manufacturer and using their utilities. Open Enclave adds a layer of abstraction, right? They're trying to get multi-TEE support and cross-OS support. When you're developing for enclaves from just the SDKs, you have a lot of responsibility. The crypto libraries that they distribute are usually marked as not for production. They may not have passed FIPS. You don't get any operating system.
So you have to produce libraries of your own or use existing libraries that have been tested on these enclaves. Open Enclave gives you a lot of that functionality at a kind of SDK abstraction and makes it possible to do more powerful things. In fact, most of what we'll talk about next, in some way or another, either uses Open Enclave directly or has learned a lot from Open Enclave and mirrored some of their approaches. The next level of abstraction is that you can wrap your application into a unikernel or LibOS. Basically, you're creating a process. Instead of having a modular OS with kernel and drivers separated, you're taking the exact opposite route: you're compiling everything into a static executable, and that's what you're going to deploy in your pod. There are a couple of open-source projects that you should definitely take a look at. Occlum (and forgive me, project folks, if I mispronounce any of these) is one; Gramine, which used to be called Graphene, is another. But it's important to note there are caveats, and as an attacker you're thinking: these are attack points. Some of the implementations may be works in progress. I found a couple of example nuggets on GitHub, and some of them rely on more academic notions of how to verify. There are a lot of attacks that use things like return-oriented programming and jump-oriented programming, where you try to understand how memory is written, and with what timing, and try to get control of the environment. Not all of these open-source wrappers have been hardened against those attacks, and probably not many of them have been studied. Occlum and Gramine are certainly the ones that have been studied the most and used the most, but there's definitely some fine print that you need to go through, and ideally, if you have the opportunity, review the code yourself. Here's just a quick overview.
I think Gramine is on the left and Occlum on the right, but they basically work within the host. They have an abstraction layer, and then you run your executable, which manages all of the processes. They have a toolchain that takes your code and verifies it, and you can then deploy it in the enclave. There's another way to do it, which is to create a shim. Inclavare is an approach where you create a shim; they call it rune, and it lets you use your containers. They're OCI-compliant pretty much out of the box, right? So you can now just run rune against your container and create an enclave-compatible container. And, like I say, they piggyback on Occlum and Gramine and Wasm. I think this one is a pretty easy way in if you have a pod application, you don't want to or can't do much refactoring, and you're not that interested in the details of how the different SDKs and attestation plug-ins work. This is a pretty convenient way to look at it. Again, the caveat there is that now you're expanding your trusted computing base. For all that convenience, you're bringing in more code that is susceptible to attack. Folks can analyze the way that things are written and how the OCALLs and ECALLs are patterned, and they can extract secrets if they do that level of attack. Another approach is MarbleRun. They're trying to take the microservices approach, giving you a lot more functionality around managing the secrets and all the attestation. You're really just defining YAML or JSON, and it's like any other microservices infrastructure. No code changes required, no special tooling required. And they can deploy Gramine and Occlum apps as well, so it's kind of a nice stack. Again, the biggest concern I would have with this is that now you've added even more convenience and functionality, and even more to your TCB.
So it requires you to really think through the trade-offs. Do you trust all of that code that's providing you that infrastructure? Do you really need a microservices approach for this critical application, or is this maybe something that is down the road? The last one I would take a look at is Enarx. In this case, they're taking your code, you don't have to make much code modification at all, if any, and you're cross-compiling that to Wasm. They manage all of the functions for the OCALLs and the ECALLs. They're doing all the remote attestation. They've implemented pretty much everything you need to deploy your Wasm app. The problem here, of course, is that if you're not familiar with Wasm, or if your application is not organized so that you can quickly partition it into cross-compilable components, this might not be an option for you. I'd say for fairly modular, small applications this is a great idea, and the project is making a lot of fast progress. I think they're not quite up to production standards; I think they themselves would note that in the documentation. But I think it's a way forward in the next couple of years. So I've given kind of an overview. This is my take: I lay out the attributes we're looking for in a platform, and then we overlay those approaches. On the left are the SDKs and the plug-ins you can use off the shelf. You have to do everything yourself, and while there's a very small trusted computing base, you really have to do a lot of work, right?
So, as someone who's deploying these systems, I'm trying to find the balance between making it easy enough for me to run real-world applications and doing enough of the legwork to make sure that I'm confident, and can attest, that I have a trusted computing base that I can myself audit and verify. So my choice... I think we'll talk about that in a little bit. Oh, I'll come back to that. Sorry. If you're thinking about this from a defense perspective, we talked about minimizing the trusted computing base. I would also recommend formal verification; Intel and others have done some. That's where you model the system and use either higher-order logic or symbolic logic to look at what the behavior of the system should be, and then you compare that to the behavior of the actual system. I would say Open Enclave, and even Occlum and Gramine, could benefit greatly from using that formal verification approach. And it's not a silver bullet either. You still have to continue to do that with every change.
It's not going to find concrete bugs in code check-ins. It's going to model the system and find fundamental design flaws, but I think it gives you a level of integrity in the design, so that you can be more comfortable using a particular solution. On the hardware side, the chip providers themselves are adding more control over the memory layout and memory access. For those return-oriented programming and jump-oriented programming attacks, it's all about understanding memory writes and reads in a particular sequence or with a particular timing, and trying to extract from that a signal that gives you the key or branch information, so that you can either extract the secrets or understand more about the algorithm and the code that's running inside. To prevent that, they are adding features. There are also some software approaches, such as a software supervisor that can check those calls and either add random delays and things like that, or just block or reassign things altogether. Intel has TDX; it's their newer capability, and that's adding different keying to the memory encryption and more granularity to the memory page encryption. And it does have support on AKS.
I will note that on AKS, if you're running Kubernetes on Azure, they have a lot of support for all the tools that I've shown. Again, I'm not affiliated with Azure, but you will find it easier to do some of your prototyping on Azure using their confidential AKS support. So let's talk a quick bit about compliance. When I talk about compliance, I'm usually talking in the form of a government standard, something like NIST, where you have to demonstrate that your system meets certain security controls. The security controls that I've listed here are from NIST SP 800-53 Rev. 5, and they have different baselines: high, moderate, and low. Usually for any kind of critical infrastructure or military system you're going to be looking at the high baseline. They lay out specific requirements for a system: you've got to isolate and protect memory, you've got to isolate processes. All of the features we talked about earlier with enclaves really check these boxes. Where it's not otherwise possible without dedicated HSMs or other specialized hardware, this makes it very easy to meet these in a Kubernetes environment, and at least document how enclaves are being used to implement those controls. There's another set of even more granular controls: if you're contracting with government agencies, they can layer on specific controls that may not be in a particular baseline. Here again the formal model for your verification is helpful. You're getting non-modifiable executables. So a lot of these boxes get checked by using enclaves. So where do we go from here? This is derived from an actual system build-out, and for us, we have to look at it from the attacker's view and the defender's view. We really wanted to use things like MarbleRun or Enarx, but again, we were very concerned about the size and the scope of that TCB.
So, at the other end of the spectrum, we didn't have the time and we didn't have the developer team to write to the raw SDKs and the plug-ins. We kind of fell into that middle ground and are testing a combination of Occlum and Gramine, and we are looking at Inclavare. I think our current inclination is not to use Inclavare, just given the extra level of code that it adds for, I think, not that much convenience. That's not to say it's not a good project. It's just that, for us, we'd rather incur a little more of the pain in verifying and understanding how things are implemented, and do a little more of that work, so that we can have an understanding of that trusted computing base. We will be posting build-out updates, and we'll be posting all of the designs, all the attack models, and all the attack scripts on our Twitter feed. You're definitely welcome to check it out and take a look. How are we doing on time? Do we have enough time for questions? Great. Yeah, so, did everyone hear the question? The question is: you're protecting the key, but what about the supply-chain code that you're pulling in? As you're pulling libraries and open source into your application, that is part of the trusted computing base that you're now relying on. There's no real magic to how the enclave is going to protect the supply chain. It will provide code integrity, so that you know that the code that you've compiled into your application, or that you've cross-compiled to Wasm, is the same code that you compiled. When you load it into the enclave, it will verify by cryptographic hash that it is that code, so you know that it hasn't been modified from the time you compiled it to the time it's running in the enclave. But everything up until the point where you compile in that code is outside that guarantee.
That part is your responsibility, right? And I think that probably represents the largest and most fruitful attack surface, if I think of it as an attacker. I can somewhat dismiss a lot of the hardware. I would say for a highly critical system I'm absolutely going to put that on my list, but the low-hanging fruit is to go after the supply chain, right? So I'm going to try to introduce very subtle timing changes in open-source libraries that nobody really looks at too closely. I might go attack their GitHub repos. I might attack their JavaScript packages. I'm going to try to exploit those return-oriented programming and jump-oriented programming tricks and extract data out of the enclave using those open-source bugs, or malware hacks that I put into that open-source library. So that, to me, is why I want to minimize that as much as possible. Going with the folks at Open Enclave, I think, would probably be the safest route, where you really have to scrutinize everything you're bringing in, but it just leaves a lot of work for you. Yeah, so the question is whether the orchestrator and control plane should essentially be in the enclave as well, right? A couple of answers there. One, just at a very high level, there is a performance hit when you work with these enclaves, right? You're losing a lot of the memory optimizations, different levels of caching, and different hardware performance optimizations; you may be losing a lot of operating system caching and performance tuning. So you have to consider what the performance hit is in using this, and if I just stick the whole database, or the whole application stack and control plane and Kubernetes, into the enclave itself, the performance hit is going to be significant. Not to say it's impossible, and certainly with those newer processors giving you more memory, it is possible. Above and beyond performance, you've got to look at why, right?
If you're following that attack tree and you're minimizing your trusted computing base, that's kind of at odds with adding the whole control plane and the whole orchestrator in there as well. So do I need to do that? What am I gaining by adding that in there? I don't trust the operator; that's explicitly part of our model. I don't trust the cloud operator. I don't trust the root users of the system. I don't even trust the board manufacturer. The only thing I trust is that Intel or Arm chip. Everything else is really about convenience and developer productivity. So I think it's just a trade-off. You want to minimize the amount of code that you have to either formally verify, or quasi-formally verify and scrutinize with code audits, code reviews, lab tests, or whatever mechanism you're going to choose. But as you expand that into Kubernetes... As I mentioned at the beginning, I'm helping the SIG Security group do the third-party Kubernetes security audit, and that hasn't been done since 2019, right? So I don't know what's going to shake out of that, but I'm sure there are going to be a few things. The attackers, of course, are going to be looking for every opportunity in every layer of Kubernetes to implant code that will enable those memory-layout and memory-timing attacks, and so it just gives them a huge surface area. So I'd say if you have something at kind of a moderate level of risk, it's definitely something I'd consider, but if it's really, you know, the nuclear missiles, I think it's too much convenience to trade off. Right. Thank you, Rob. Great. Thank you, everyone.