Hello everyone, welcome to this session. My name is Kailun Qin, and I'm from Ant Group. Today my topic is Kata × TEE: a Lego-like two-way sandbox for seamless security and privacy.

First, let's start with this famous saying by David Wheeler, a well-known computer scientist. The saying goes: all problems in computer science can be solved by another level of indirection, except of course for the problem of too many indirections. So let's keep this in mind and see how another level of indirection can help with security and privacy in Kata Containers.

Let's first recap a little of the history behind Kata Containers. In the very beginning, all Linux processes ran directly on top of the same Linux kernel, so if one process was compromised, the attacker could usually use the very same approach to attack all the other processes running on the same host. This is because the processes had no isolation at all between each other. To tackle this problem, containers introduced namespaces and cgroups to provide isolation and to limit resource allocation between containers, so that a process running inside one container cannot easily be attacked or affected by processes running in other containers.

However, is this enough? Let's think about more scenarios. What if some untrusted code in a container tries to attack the host? As we can see in this picture, all the containers share the same host Linux kernel. So if one container can escape to the host, all the processes running on top of it can be compromised. This may be beyond your imagination, but it is not uncommon these days. As you can see in these two diagrams, these are the CVEs for vulnerabilities in the Linux kernel. There were over 400 CVEs in 2017, and with some of them a container can gain root privileges and take full control of the platform.
So how do we handle these security issues? We are thinking about adding another level of indirection by introducing virtualization. By putting processes into a virtual machine, we get more isolation and defense in depth. Here comes our leading actor today, Kata Containers. It is not a normal VM; it tries to combine the best of containerization and virtualization. Kata aims to be as secure as a virtual machine while being small and fast like a container. It also wants to behave exactly like a container from its users' point of view.

So let's take a quick glance at the Kata Containers architecture and core components. As you can see in this picture, we have the Kata agent inside the pod sandbox, that is, the guest VM. It manages all the containers and processes inside the guest VM. We also have the Kata runtime, which is runc-compatible and handles all the runc command lines. To communicate with the guest VM, it uses either virtio-serial or vsock, and once vsock is enabled, the Kata proxy is not needed anymore. There is another important component called the shim, which acts like a reaper: it monitors and reaps the actual container processes running inside the virtual machine. Kata Containers is moving towards its second-generation architecture, which uses containerd-shim-kata-v2 to gain better performance and security. We will talk about that later.

Still, is this enough? If we have some secrets inside virtualized containers, we can think of circumstances where malicious administrators intentionally steal secrets from guests. And we have buggy and complex hosts and hypervisors that can easily be compromised and unintentionally leak guest secrets. As we can see, there are still attack vectors in hypervisors and VMs. That this is real can be seen in these two diagrams. These are the CVEs for vulnerabilities in the QEMU and Xen hypervisors.
So we are thinking: besides defending against attacks from the guest, should we have another type of sandbox, a two-way sandbox, which can also defend against attacks from the host?

We may need a two-way sandbox in various scenarios, such as financial services. For instance, you do not want your payment username or password to be accessed by other users running on the same platform. You may also not want your payment passwords to be accessed by the platform owners. For confidential AI scenarios, you do not want your intellectual property, such as the models trained for your AI, to be accessed or stolen by your platform owners. Two-way sandboxes are also very important for scenarios like blockchain, edge computing, serverless, and bare-metal applications.

So the question is how to make a two-way sandbox. Let's recall our principle of adding another level of indirection. Why don't we introduce another level of indirection? So we are thinking about trusted execution environments, TEEs, also called enclaves. A TEE is a secure area protected by the processor, and only the application owner can access the code and data inside it. Neither the infrastructure owner nor a hacker has any way to access or steal the data inside your TEE. This is brilliant.

So we are thinking about combining virtualization with TEEs. With the help of TEEs, we can put our secrets or processes directly into a TEE, so that the code and data are secured against attacks from malicious admins or from a compromised host or hypervisor. We can also extend the security boundary to the virtual machine level.

So the question now is how to build our Lego-like two-way sandbox for seamless security and privacy with Kata and TEEs. We may all have some experience of playing with Lego toys. When playing with Lego, we usually have quite a lot of Lego pieces, and it is just the same when playing with Kata Containers.
We have quite a few choices of hypervisors, along with quite a few hardware platforms. So let's take a first look at the Lego-like hypervisors supported by Kata Containers. Kata currently supports ACRN, QEMU/KVM, Cloud Hypervisor/KVM, and Firecracker/KVM at the moment.

You may wonder why Kata Containers supports so many hypervisors. The answer is that different hypervisors target different scenarios. For example, ACRN is focused on automotive and IoT, and Cloud Hypervisor, as its name implies, targets modern cloud workloads. Besides the scenarios, different hypervisors have different device models. For example, QEMU/KVM has an extensive device model that can support general workloads, and it is the default backend for Kata. Cloud Hypervisor and Firecracker have limited or reduced device model support with a rust-vmm-based code base, so they can be more secure and performant. So you can just pick your favorite hypervisor to construct your favorite figure of the two-way sandbox.

Besides hypervisors, Kata Containers also supports different architectures. Besides x86, it also supports AMD, ARM, and IBM platforms.

In addition to the hypervisors and architectures, let's think about what else we have in our hands as Lego pieces. We still have TEEs. They can usually be categorized into three types: virtualization-based, application enclave, and hardware-isolated VM. The first type is a purpose-built isolated VM running on top of a lightweight hypervisor to provide TEE functionality. Application enclaves are backed by hardware-based integrity and confidentiality protection and support sealing and attestation services. Hardware-isolated VMs support attestation as well, and they usually provide guest-level isolation with memory encryption. So let's dive deep into them one by one.
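Before going through the TEE types, the hypervisor choices listed above can be written down as a small Python sketch that records only what was said about each hypervisor and filters on it. The `Hypervisor` class, its field names, and the `pick` helper are hypothetical names made up for this illustration and are not part of Kata Containers. Fields that the talk does not state are left as `None`, and `rust_vmm_based=False` for ACRN and QEMU is general knowledge rather than something said in the talk.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Hypervisor:
    """Hypothetical record of one Lego piece; not a Kata Containers type."""
    name: str
    focus: Optional[str]         # target scenario, None if not stated in the talk
    device_model: Optional[str]  # "extensive" or "reduced", None if not stated
    rust_vmm_based: bool         # False for ACRN/QEMU is an assumption from general knowledge
    default_backend: bool = False


HYPERVISORS = [
    Hypervisor("ACRN", "automotive and IoT", None, False),
    Hypervisor("QEMU/KVM", "general workloads", "extensive", False, default_backend=True),
    Hypervisor("Cloud Hypervisor/KVM", "modern cloud workloads", "reduced", True),
    Hypervisor("Firecracker/KVM", None, "reduced", True),
]


def pick(device_model: Optional[str] = None,
         rust_vmm_based: Optional[bool] = None) -> List[str]:
    """Return the names of hypervisors matching every criterion that is not None."""
    result = []
    for hv in HYPERVISORS:
        if device_model is not None and hv.device_model != device_model:
            continue
        if rust_vmm_based is not None and hv.rust_vmm_based != rust_vmm_based:
            continue
        result.append(hv.name)
    return result


if __name__ == "__main__":
    # Hypervisors with a reduced device model.
    print(pick(device_model="reduced"))
    # The default backend for Kata, as stated in the talk.
    print([hv.name for hv in HYPERVISORS if hv.default_backend])
```

Calling `pick()` with no arguments returns all four pieces, which mirrors the point that the choice depends on the workload rather than on a single best hypervisor.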
Firstly, let's have a look at the virtualization-based TEEs. On the left is an architecture presented by Intel, called Trusty. On the right, at Ant Group we have our own virtualization-based TEE, called HyperEnclave. We have our root of trust inside the hardware, and a trusted hypervisor running on top of the hardware provides the separation, or isolation, between the normal world and the secure world by leveraging technologies like VT-d and VT-x.

The second type is called application enclave. The typical example is Intel SGX, Software Guard Extensions. It can protect selected code and data with hardware-assisted confidentiality and integrity support. In other words, all the system software and firmware, such as the BIOS, hypervisors, OSes, and drivers, are outside the TCB. It also supports attestation services and sealing services.

The third type is called hardware-isolated VMs, and I'm going to introduce AMD SEV and Intel TDX. AMD SEV, Secure Encrypted Virtualization, supports running encrypted virtual machines by providing one key for the hypervisor and one key per VM, so that VMs are protected from each other and from the untrusted hypervisor, as well as from administrator tampering, by relying on cryptographic isolation. SEV also maintains compatibility, so normal VMs can run on the same host. There is another technology recently released by Intel called TDX, Trust Domain Extensions. It is very similar to AMD SEV, but uses different methodologies. In general, for hardware-isolated VMs, no changes are required for applications to operate inside the VM.

Beyond the TEEs mentioned above, we have some other types, such as those based on Arm TrustZone. With hardware-enforced isolation built into the CPU, Arm TrustZone can provide one trusted execution environment. However, it is not applicable to cloud usage, since we cannot guarantee multi-tenancy with only a single trusted execution environment.
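The properties stated so far for each TEE category can be summarized in a short Python sketch. This is only an illustration under stated assumptions: the `TeeKind` class, its fields, and the two helper functions are hypothetical names, `None` marks anything the talk does not state, and `cloud_suitable` encodes only the fact that the talk rules out Arm TrustZone for cloud usage.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class TeeKind:
    """Hypothetical summary of one TEE category as described in the talk."""
    name: str
    examples: Tuple[str, ...]
    attestation: Optional[bool]         # None = not stated in the talk
    app_changes_needed: Optional[bool]  # None = not stated in the talk
    cloud_suitable: bool                # the talk rules out only TrustZone


TEE_KINDS = [
    TeeKind("virtualization-based", ("Trusty", "HyperEnclave"), None, None, True),
    TeeKind("application enclave", ("Intel SGX",), True, None, True),
    TeeKind("hardware-isolated VM", ("AMD SEV", "Intel TDX"), True, False, True),
    TeeKind("Arm TrustZone-based", ("Arm TrustZone",), None, None, False),
]


def cloud_candidates() -> List[str]:
    """Categories that the talk does not exclude from multi-tenant cloud usage."""
    return [kind.name for kind in TEE_KINDS if kind.cloud_suitable]


def runs_unmodified_apps() -> List[str]:
    """Categories for which the talk explicitly says no application changes are needed."""
    return [kind.name for kind in TEE_KINDS if kind.app_changes_needed is False]


if __name__ == "__main__":
    print(cloud_candidates())
    print(runs_unmodified_apps())
```

Keeping unstated properties as `None` instead of guessing means a filter such as `runs_unmodified_apps` only returns categories for which the claim was actually made.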
So let's make a simple comparison of all four categories of TEEs. They may differ in protection scope, access level, and SDK, and they may or may not require software changes to our applications. Different TEEs may also have different memory size limitations and TCB considerations. So you can pick your own level of trust, depending on the Lego-like pieces you have in your hand with Kata Containers.

Besides considering how to build Lego-like two-way sandboxes, we are also considering a seamless user experience. Actually, Kata Containers already supports seamless integration with the cloud-native ecosystem. For example, it is already integrated with CRIs like containerd and CRI-O. It takes the runc command line and the OCI specs as input, and it abstracts the hypervisors at the backend. So we can simply use tools like Docker, kubelet, and Podman to create virtualization-based containers just like normal containers.

Next, I'm going to show how to integrate seamlessly with the three different types of TEEs. The first one is the virtualization-based TEE. As you can see in this picture, it is very similar to adding support for another hypervisor. So we can just follow the way hypervisors like ACRN, Firecracker, QEMU, and Cloud Hypervisor were enabled, for reference. We only need to extend the Kata runtime so that our virtualization-based hypervisor creates enclave VMs for us.

The second type is how to seamlessly integrate with application enclaves. As you can see, it is a little more complicated, as we need to extend multiple Kata components. For example, we need to extend the Kata runtime to support SGX-related options, for example for the containers and their dependencies, and we have to expose the configuration to users. We also need to extend our hypervisor with vSGX support.
On the Kata agent side, we need to provide libenclave to manage all the enclaves created inside the namespaces. We may also need to handle enclave signing and maintain compatibility with the normal containers created inside the namespaces. And if we still want the enclaves to be created inside a container, we may need to add an enclave agent inside an enclave container.

For the third type, hardware-isolated VMs, we can see that it has almost the same steps as those previously introduced. We need to extend the Kata runtime to add support for options related to VM memory encryption, and of course we need to expose them to users. On the hypervisor side, we need SEV support.

Besides our goal of a seamless user experience, we also have to consider seamless performance, and Kata is working hard towards that. For example, it has shrunk its shims, from 2N+1 shims down to a single shim v2, to solve the problem of too many indirections. On the hypervisor side, it provides many choices, and some of them are lightweight. So you can pick one with reduced overhead. For example, some have extensions that reduce memory consumption overhead, and some have reduced device emulation, et cetera. So you can usually pick your favorite, or the most suitable one for your specific workloads. For the Kata agent, we are seeing Kata move from the Go agent to the Rust one, using ttRPC to reduce the overhead. Kata Containers is also integrated with Prometheus so that virtual machine metrics can be monitored, which gives us better ways to tune performance.

We are also thinking about how to provide two-way security and privacy. From the security perspective, Kata Containers has already done quite a lot.
For example, as already introduced, it provides Kata shim v2 so that there is one single Kata process per pod, which reduces the attack surface. On the pod side, it is adding in-VM image handling, to pull images from within the pod. On the hypervisor side, we can pick the Rust hypervisors and the reduced-TCB hypervisors to get better security. And as we talked about earlier, the Kata agent is moving to Rust as well, so that we can guarantee memory safety.

We are also thinking about the two-way privacy guaranteed by the TEEs introduced earlier. With the different types of TEEs, we can protect code and data in use. We can provide confidentiality and integrity guarantees. We can have increased isolation and a configurable level of trust.

We are near the end of our topic, so let's wrap it up. As you can see on this epilogue slide: no code, no bugs. "No code is the best way to write secure and reliable applications. Write nothing; deploy nowhere." This is quite funny, but it can be true to some extent. Well, it might be a joke, but we can still get better security and privacy by doing several things.

The first one: minimize the code. We can reduce the TCB, especially for virtualization-based and hardware-isolated TEEs. The second one: encrypt the code. We may have confidential code, and we may have workloads running in regulated industries, so image encryption is inevitable. And the third one: attest the code. We need to integrate with the local and remote attestation processes provided by the TEEs, and we have to make sure that [unclear].

And I think that's all for my presentation. Thanks for listening, and I'm here for your questions. Thank you.