So, this is Josephine, she will give her introduction later on, and I'm Kai, CTO of secunet and CEO of secustack. We want to give you a brief view of what we did with confidential computing to harden the OpenStack control plane. My part is a short intro on what confidential computing is, and Josephine will give you the details of how we did it, which may be the more interesting part.

I don't have much time, so: what is confidential computing about? It's about security. Okay, I'm done. Do you give me some more time, Josephine, to explain? Okay, thank you.

A little more concretely, it's about CIA. What's that? Sounds interesting, at least. CIA is a common abbreviation in the security field for the goals we want to reach with the mechanisms we implement: confidentiality, integrity and authenticity. This is especially interesting in the context of cloud computing, where you have providers and different stakeholders across the software stack. Confidential computing provides mechanisms to secure data, to guarantee its integrity and authenticity, and thereby to provide trustworthiness. You might have heard about confidential computing in the context of workload protection; that is the common use case, protecting workloads against the provider. But we wanted to look at how we can use it for the control plane.

To be a little more concrete: what does it mean to protect data along these three dimensions, and especially code, not only data? That is a nice distinction, a new paradigm, let's say, because it's something we didn't have in the past. What we had, and of course still have, is data-at-rest protection: we can protect data when it's stored on disk, on storage wherever, with encryption and integrity protection, so that data is stored securely.
We can also protect data in flight, when it's transmitted over the Internet. We have had VPN technologies for 30 years, and we have newer versions like WireGuard: different ways to encrypt data when it's sent and decrypt it when it's received. Not easy to get right, but technically it's well understood.

The new thing with confidential computing is protecting data in use. That sounds like magic: how do you protect data, how do you compute on encrypted data, while it is being used? There are two general approaches. The mathematical solution is fully homomorphic encryption, a very interesting, sophisticated mathematical principle. It's currently of quite limited use because it's really hard to compute with, or hard to use for general purposes. There are specific use cases, for example searching in encrypted databases, and a lot of interesting applications are coming up with homomorphic encryption, but it's not ready for prime time yet, I think. In particular, it's not suitable for program code: you can compute on the data, but you cannot protect program code with homomorphic encryption.

Therefore, hardware-supported means have been developed, especially now in the context of cloud computing, and they are suited in different degrees for the OpenStack control plane; the underlying technology is almost the same across the different implementations. It started very early with the Trusted Platform Module, the TPM, developed some 15 to 20 years ago, and there are technologies, mechanisms and software stacks that support trusted computing, nowadays called trusted computing architecture. It is mainly used to protect the software stack against tampering, so it's an integrity-protection mechanism. But it was very hard to manage, so it never saw broad adoption.
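To make the "compute on encrypted data" idea tangible: fully homomorphic encryption is not easy to demonstrate in a few lines, but textbook RSA happens to be *multiplicatively* homomorphic, which shows the principle on a toy scale. The following sketch uses the classic textbook key values and is in no way secure cryptography; it only illustrates that one can combine ciphertexts and obtain the encryption of a combined plaintext.

```python
# Toy demonstration of a homomorphic property, NOT secure crypto:
# textbook RSA satisfies Enc(a) * Enc(b) mod n == Enc(a * b mod n).
# Key values are the classic textbook example (p=61, q=53, e=17, d=2753).

n, e, d = 61 * 53, 17, 2753          # public modulus/exponent, private exponent

def enc(m: int) -> int:
    return pow(m, e, n)

def dec(c: int) -> int:
    return pow(c, d, n)

a, b = 7, 12
product_of_ciphertexts = (enc(a) * enc(b)) % n

# Decrypting the product of ciphertexts yields the product of the
# plaintexts, without the computation ever seeing a or b in the clear.
assert dec(product_of_ciphertexts) == (a * b) % n
```

Fully homomorphic schemes extend this to both addition and multiplication, which is what makes them so expensive, and, as noted in the talk, they protect data but not program code.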
And from what I see, TPMs and the TCG mechanisms are not used that much anymore, because the functionality is being absorbed into the secure-enclave paradigm. As a step in between, there was a mechanism in ARM processors called TrustZone, implemented as a kind of integrated second, secure part of the processor. It was already used for processing data in a secure way, but it was also quite limited in its use.

The new implementations of secure enclaves are in Intel and AMD processors: Intel calls it SGX, AMD calls it SEV, and for ARM it's coming soon, from what I heard; it's not officially there yet. There is also a consortium working on a standardized software stack and APIs for this, the Confidential Computing Consortium, where you can find a lot of information on how it works on the different processors.

I brought a slide from Intel that shows how it works, to make it more visible or tangible. With SGX, you can protect data in an application or process context, inside a secure enclave, in such a way that the data is encrypted whenever it is transferred over the bus and stored in RAM, and only decrypted inside the CPU. So the CPU itself provides the security boundary that protects the data and the program code.

And there is a mechanism that goes beyond what we had until now: attestation. We can implement mechanisms to verify that the code running in the enclave is exactly the piece of code we compiled before. This is done with hash sums: the infrastructure behind it verifies the hash sums, and only if the code matches the expected hash does the CPU execute it. That's a very nice mechanism to provide code integrity. And on top of that, code and data are encrypted whenever they leave the CPU.
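The attestation idea just described can be sketched in a few lines: record a cryptographic measurement (hash) of the code at build time, and refuse to run anything whose measurement differs. This is a conceptual illustration only, not the real SGX protocol, which involves CPU-signed quotes and a hardware root of trust; the function names here are made up.

```python
import hashlib

def measure(code: bytes) -> str:
    """Measurement of an enclave's initial state: here simply a SHA-256 hash."""
    return hashlib.sha256(code).hexdigest()

# At build time, the trusted party records the expected measurement.
trusted_code = b"def handle_secret(kek): ..."
expected = measure(trusted_code)

def attest_and_run(code: bytes, expected_hash: str) -> bool:
    """Execute the code only if its measurement matches the expected hash."""
    return measure(code) == expected_hash

assert attest_and_run(trusted_code, expected)          # unmodified code runs
assert not attest_and_run(b"tampered code", expected)  # tampered code is refused
```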
Otherwise, other processes on the system can look into memory, dump it, and see what secrets or data are in there, and most program code deals with secrets. That's where we came to the idea of using confidential computing to protect the OpenStack control plane, because there is a lot of sensitive material inside it; you know that much better than me. Especially when it comes to key management and key storage, in Barbican or Keystone, it's not good to have those secrets in the main memory, the RAM, of the machine, because there they are attackable. If someone gets root, or physical access, they can read out the memory. There are even attacks like cold-boot attacks, where you can read out the RAM even after the machine has been switched off and recover the secrets stored there. So we think it's very valuable to have a mechanism that protects control-plane data inside these enclaves.

It is also protection against the major attack vector in the cloud scenario: the VM breakout, where someone gains access to the compute node by breaking out of a VM. With root or even ring-0 access, he can do anything on the machine, including reading out the memory. So the enclave mechanism is very interesting for protecting the control plane even if the VM or the operating system on the compute node is not as secure as we would like it to be. That was the motivation, and I think it can really help to harden the control plane against this major attack vector on a cloud stack. And finally, it's about CIA: confidentiality, integrity and authenticity of program code and data on the control plane.

What we did is we started to containerize the OpenStack services. That's nothing really new, maybe you already did that, but we then started to use SGX to protect the system.
This was a joint effort with Intel and Scontain. Intel you probably know; Scontain is a small company from the Dresden area with a lot of experience with SGX and the libraries around it. What we experienced is that we often ran into the current technical limitations of this enclave paradigm, and Josephine will give some insights into the pain we had at some points. But I think it's worth going this way, because as the technology matures we will have more resources inside these enclaves, and we will have encrypted virtual-machine environments in the next processor generations. This technology will be there in the future, and we would like you to build OpenStack services in a more enclave-friendly way, so that we can use this hardening mechanism to run an OpenStack control plane more securely. That was my intro, and now Josephine will give you some more insights into what we really did.

Hi, my name is Josephine Seifert. I'm a developer at secustack. Some of you might know me, as I have also contributed to OpenStack; normally I work directly on the OpenStack code, but now I will give you an overview of our case study with Barbican, which sits on the control plane.

We started with a very, very simple Barbican setup: a controller node with our Barbican container, a config file, and our lifecycle management, which we wanted to include so that image updates could replace any outdated images. The main pain with putting Barbican into an SGX container, and the same holds for other OpenStack services (we also did Keystone and are currently working on Nova), is the forking of processes: it leads to a lot of processes that would each have to be executed in an enclave, and that is way too much. After we got that worked out for Barbican, two open questions remained.
The first was: what do we do with the config file? Where does the data in the config file come from, and how can we protect it while keeping it usable for rebuilds and so on? We used simple_crypto as the use case here, because the hardest thing with simple_crypto is that the key encryption key, the master KEK, sits directly in the config, and that is a wonderful attack vector. If we can protect that KEK, then we can also protect other credentials in the config file, like database credentials. So that is the first question I want to talk about. The second is: we still have lifecycle management, and maybe we want to update the Barbican image because of a security issue. How do we prevent the rollout of outdated or unpatched images? For both, we used SCONE from Scontain.

Let me give you a short overview of the architecture. SCONE consists of local attestation services (LAS) and a configuration and attestation service (CAS), which is more central. Every node that executes a secure enclave has to run an LAS; the CAS can be central, and there can be more than one CAS, running on central nodes. The local attestation service is just for measuring and attesting the local enclaves; as you can see on the left side, it provides a secure hash of the initial enclave state, to make sure everything is okay with the enclave on this node. The CAS is a little more complex: there we can store policies for each service that is executed, like Barbican, and we can provide configuration such as environment variables, even injected files and keys. That is something we can use for the config files, and the CAS can also attest each service instance.
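Conceptually, the CAS policy just described bundles a service's expected image hash, its configuration, and its secrets. The sketch below is an illustrative, simplified layout only, not actual SCONE session syntax; the service name, hash, and secret names are made up for the example.

```yaml
# Illustrative sketch of a CAS policy (NOT actual SCONE session syntax).
# It bundles what the talk lists: a per-service policy, the expected
# image hash, environment variables, injected files, and generated keys.
name: barbican-session
services:
  - name: barbican
    image_hash: "sha256:3f1a..."      # initial checksum used for attestation
    environment:
      OS_AUTH_URL: "http://keystone:5000/v3"
secrets:
  - name: kek                          # generated inside the CAS enclave,
    kind: binary                       # never leaves an enclave in the clear
    size: 32
injected_files:
  - path: /etc/barbican/barbican.conf  # placeholders filled from secrets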
First we took our original Barbican image, which was not SGX-ready, and used SCONE to build an SGX-based Barbican image. We did that by providing a SCONE file that describes the target image, what should go into the enclave, and the confidential properties of Barbican. Those confidential properties end up in the policy that is built for the Barbican service in the CAS, and that policy also includes an initial checksum of the image for image attestation.

Now, since we can place parameters directly into the policy, let's go back to our config file and separate the config values. There are open config values, which can be stored on every node, can be accessed by everyone, and are adjustable by the LCM; the IP of the container, for example, is not confidential. And then there are secure values, for which we use placeholder variables. Here you can see that we replaced the KEK with a placeholder variable, and this is immutable for our lifecycle management and for operators.

There are two ways to get the KEK into the policy. It is either provided in the SCONE file at creation time, or, the better way, you tell SCONE to generate a key. The generated key then lives in the CAS, and as you can see, the CAS and LAS have a little lock on them in the diagram: they are themselves executed in an SGX environment. So if we generate a key in the CAS, it is stored in an enclave there and no one can access it, and when we start and initialize a container, the generated key is transferred directly into the enclave of that container.

The first step of container initialization is, of course, to build the container. We use our new image and the policy stored in the CAS, from which we get the generated key and also the image hash for attestation.
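The split between open and secure config values might look like the following sketch of a barbican.conf. The `[simple_crypto_plugin]`/`kek` option is Barbican's real location for the master key; the placeholder syntax shown is illustrative, not necessarily SCONE's exact templating format.

```ini
# Sketch of the split barbican.conf described above.
# Open values stay readable and LCM-adjustable; the KEK is replaced by
# a placeholder that is resolved inside the enclave at startup.

[DEFAULT]
# open config value: adjustable by lifecycle management, not confidential
bind_host = 0.0.0.0

[simple_crypto_plugin]
# secure value: the master key encryption key never appears on disk;
# only the placeholder does (illustrative placeholder syntax)
kek = $$SCONE::kek$$
```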
The config file from the controller node is also read in at that point, and everything is put together into the SGX container. Before activating it, the second step is to measure and locally attest the container; that is done by the LAS. And there is even more to the secure key management than just providing the KEK for the config file: you can also generate keys for, say, an encrypted file system. These keys are likewise transferred at container initialization, so the Barbican container, or any other container (there may be more use cases with Nova), can access an encrypted file system using that key. Every key that comes from the CAS is reloaded at restart and rebuild time and is never stored anywhere other than in an enclave on the controller node.

For the attestation part, consider this little example. We had a Barbican image in version 1, which ran fine, but then there was a critical security issue. We fixed the image, and now there is image v2. Because we did that with SCONE again, the image hash in the policy is updated automatically: as you can see, the hash is now that of image v2, not v1. What happens if we try to start a new container with the old image? It takes everything and begins to build, but the second step, the measurement, fails, because there is a difference between the hashes and other measurements, so it is not executed; it stops. Then we use image v2: the same building process, first using the policy and the image, and then the measurement passes because the image hash is correct.

To summarize: the main thing we have now is the policy in the CAS, which itself runs in another SGX enclave. There we can keep a key for a potential file system, or any config variable used in the Barbican container or any other container.
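The rollout behavior above, where the policy pins the approved image hash and an outdated image fails the measurement, can be modeled in a few lines. This is a toy model of the idea, not SCONE's real API; the names are made up.

```python
import hashlib

def image_hash(image: bytes) -> str:
    return hashlib.sha256(image).hexdigest()

# Toy model: the CAS policy pins the hash of the currently approved image.
image_v1 = b"barbican image v1 (vulnerable)"
image_v2 = b"barbican image v2 (patched)"

policy = {"service": "barbican", "image_hash": image_hash(image_v1)}

def start_container(image: bytes, policy: dict) -> bool:
    """Attestation step: refuse to start if the measurement differs."""
    return image_hash(image) == policy["image_hash"]

# Security update: rebuilding v2 with the toolchain updates the policy hash.
policy["image_hash"] = image_hash(image_v2)

assert not start_container(image_v1, policy)  # outdated image is rejected
assert start_container(image_v2, policy)      # patched image passes attestation
```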
So we have addressed the question of where the values in the config come from and how we can reuse the same values for rebuilds or new rollouts; for redundancy, you can use the same policy for a redundant Barbican installation. Everything is verified through the hash, so you always run the correct version. There are also some other little gimmicks in SCONE: it checks your local hardware and looks for the right microcode for SGX, for example, and if there is a known security breach it does not execute any container but waits for an update. You have to update your SGX version first before you can execute another secure container on that machine.

That was a lot of theory, so now for the demo. First I want to show you the key-encryption-key protection. For that, we don't even stay on the controller node; we go directly into our SGX-based container and grep for the config inside that container, anything you could do as an operator. Here we grep for the simple_crypto plugin, and we see that the KEK is just the placeholder variable. The original KEK is stored in an enclave, not accessible even from inside this container. Now I can show you that it still works, that a KEK is provided: we leave the container again and just execute an OpenStack command, getting a secret and decrypting it, and it works. I can get the payload, and I have protected my key. That's the first part.

Now, it's not easy to show you where this key is generated in the CAS, because you need special operational keys to access it, and I'm not allowed to do that. But I can show you the SCONE file in which you can either specify the key as a value or tell the CAS to generate the key at that point; it looks like this, and there you can see the KEK. And so we have protected that one value from the config.
Now I've shown you the config; the next part is attestation. We still have our SGX-based container, and we simply make a backup image of the running container. This is a new image. Backup done, and now we stop the container. We inspect the container and grep for the SCONE config ID, because we need this ID when we start the container again, which is done now. There you can see the ID, and we specify not the old image but the backup image, so it's a different image. Now we look into the logs: what happened? As you can see, "suspected replay attack", and a few lines above you can even read that attestation failed. Even though it's just a backup image, it doesn't let me start it, because the measurement is different, the hash is different; I cannot simply inject another image through the lifecycle. That is the second important thing. Okay, that was everything from my side. Thank you very much. Do you have any questions? Please. You should go to the mic because there is a stream.

Question for Josephine: I think you mentioned at one point that forks were problematic with SGX. Bearing in mind developing enclave-friendly applications, what's the problem with forks, are there limits on forks, and what can we do to work around them?

The problem was, if I remember correctly, that there were so many forks later on from the start process we wanted to put into the enclave that it became too much for the enclave, and we had to adjust a few things to parallelize this. That was a big thing which did cost us time. There is currently a memory limit in these enclaves, and if you swap out, it becomes much too slow. The hardware vendors are constantly increasing the limit, but even with the new Ice Lake generation it's still limited in some sense; it's not unlimited as it is when you work outside the enclave.
It does need some care how many forks you do, how much cache you use, and so on. But that's something I think we should summarize in some sort of do's and don'ts and put on a website somewhere, to help people develop enclave-friendly services.

Can you also go to the mic? Have you looked at any other services, especially those that use rootwrap or other privilege escalation to run things at root level, to set up Neutron, to set up networking? And what are the limits of your approach: not just protecting the software, but also making sure it only does the things it's supposed to do?

We are currently going into Nova and trying to put Nova into such containers; that is something we are working on. So far we have done Barbican and Keystone; they are the most lightweight ones, so we wanted to test those first. Two more questions here, and that's it.

What is the strategy of secustack for addressing the latest Spectre issues, for example branch history injection, which was discovered quite recently? Are there countermeasures in your software development, or do you just rely on the restrictive speculation mitigations?

I can comment on this. Side-channel attacks are a big problem, period, and you won't get rid of them completely in a shared environment. The attacks are getting more sophisticated; when we look back at Spectre and Meltdown, that was really a disaster, but now it's getting better. It's also getting slower, you know, so you always have this trade-off, and there is no black-and-white solution. If you want to get rid of side channels, you can separate workloads onto different machines; this is usually done with the anti-affinity mechanisms in OpenStack, but it is overhead, of course.
We also use the Intel mitigations, and by the way, we work quite closely with a company that discovered Meltdown and that has very detailed knowledge of virtualization and side-channel protection, so we are also looking at how the hypervisor could be improved to protect better against side-channel attacks. But that's a long story.

The Scontain toolset is not free, so please push them to open-source it. We would like that, but it's their business model. It is only one way to use SGX easily; you can do it step by step yourself. There is a ton of documentation on how to use SGX, and there are also open-source projects in this field; those are different ways to enable and use SGX. The way we combined it is more a kind of configuration: what we did together with our lifecycle management is not so much implementation work as configuration and deployment work. We have, I think, already integrated it with Yaook, so the new Kubernetes-based lifecycle management is able to deal with this, because it's containerized one way or the other. So it comes together at some point, and if you want to test it yourself, just contact us and ask how we can help you. Okay, I think the time is up. Yeah, the time is up. Thank you very much.