Hello, everyone. Thanks for joining this Cloud Native Security Conference session. My name is Kai Lunqing. I'm a Cloud Software Engineer at Intel with a focus on security and confidential computing. My topic today is towards the hardened Cloud Native cornerstone: container runtime protection, from security to privacy. I'd like to thank my teammates for the tremendous help I received while building this content. So firstly, let's jump into the question about our main character today, the Cloud Native cornerstone, the container: how secure is it? We may not have a direct answer right away, but let's take a look at container security from different views. Let's start with the container security risks from the user's view. Here is a chart from a 2021 white paper by Tencent Cloud, which illustrates the container security risks that users are most concerned about. In this chart, we can find that container escape, which is part of runtime security, together with image security and cluster intrusion, are the top three concerns raised by users. However, it is really hard to address the risks before knowing the potential threats. A systematic analysis of container threats is therefore needed. Here we present a threat model, given by the Open Web Application Security Project, OWASP, for Docker in 2019. The threats to containers have been categorized into six aspects according to OWASP: vulnerable code exploits, container breakout, denial of service, network threats, poisoned images, and compromised secrets. It's easy to understand that most of these threats happen at, or can eventually lead to security risks at, the runtime of containers. In fact, the threat model of containers can be further extended in several ways.
To complete the OWASP threat model, firstly, the end-to-end flow of containers should be considered, including the CI and CD phases, as well as code and image management, where supply-chain kinds of attacks can happen. Secondly, STRIDE analysis can be applied to more fine-grained layers of containers, where we should be able to understand when and how spoofing, tampering, repudiation, information disclosure, and so on can occur in each layer. Moreover, with the evolution of attacks and varied user scenarios, we should take stronger security targets and more advanced attacker assumptions into account.

With this extended container threat modeling, let's take another look from an attacker's view. What does a running container look like in their eyes? A container can run in a VM or directly on top of bare metal, depending on the deployment scheme. In this diagram, there are a host of attack vectors. For instance, vulnerable code exploits can occur almost everywhere the code runs. In addition, bad and non-compliant configurations can expose a large attack surface. Also, the running containers can be connected over insecure networking. Besides, containers can be using compromised images, leaking secrets, and escaping from their security boundaries. It's also worth mentioning that in some scenarios, even the hosts, I mean the host kernels and OSs themselves, can be malicious.

So with these attack surfaces exposed, we're able to summarize the possible attack paths or scenarios of containers. The first attack scenario is the intra-container attack, where the user's applications can be attacked by the language runtime or libraries running within the same container. The second attack scenario is called the inter-container attack, where containers are able to attack each other. The third one is called the container-to-host and host-to-container attack. This is one of the most common attack scenarios.
The fourth attack is the container-to-container-runtime, or sometimes container-engine, attack. The fifth one is the host-to-runtime attack. And finally, we have the last one, which we call the cluster or cloud attack, where attacks can be launched through cross-host networks.

After having a thorough analysis of our enemies, the next question is, how can we defend against all these threats? So let's try to put a shield on the containers. A first attempt is to apply the best practices of container hardening, such as using seccomp profiles, following the immutable paradigm, and so on, as suggested by OWASP. These best practices can help avoid, or at least mitigate, several types of container threats. A second shot can be given by fully leveraging the container security features that we have so far in the container ecosystem. These security features include the ones applied to the images, to the cluster environments, to the daemons that are the core components of containers, and to the containers themselves. Let's revisit here the software-based security features of containers themselves. For example, we have cgroups and namespaces in the kernel space, which are the foundations of containers. Beyond these two, there are also capability mechanisms and syscall restriction mechanisms like seccomp. In addition, mandatory access control, MAC, can be enabled via Linux Security Modules, LSM. There is also container UID and GID management, user namespace remapping, et cetera. With these, we can build up a more security-hardened container.

What else can we try? In production, real-time security detection, which monitors and reports abnormal container behaviors such as high-risk syscalls and file tampering, is usually deployed. A series of vulnerability mitigations in the kernel and in applications can be utilized in parallel. Note that image scanning and encryption are two practical enhancements for images to help raise the security bar.
It's worth highlighting that hardware-based container security techniques, like virtualized secure containers, are regarded as very helpful for defending against container escape. And looking forward, trusted-execution-environment-based confidential containers are appealing for addressing the privacy concerns, which we will talk about further later in this talk.

So with all this black magic, are containers free from security threats today? Unfortunately, the answer might be a little bit annoying and disappointing. The truth is that containers are far from secure. One of the first reasons is that, even though designed almost perfectly, the defenses are undermined by relaxed security feature deployment in real production environments. This can be due to the high entry barrier, low ROI, stability concerns, and so on. The investigation from Tencent Cloud found that 7% of users are still running without any security capabilities, which can be very terrible. Furthermore, some advanced security capabilities that we introduced just now are not yet in use to tackle the advanced exploits. Besides the shortage of deployment, another important fact that cannot be ignored is that weaknesses still exist across every layer: the kernel layer, the container layer, and the orchestration layer. Even in this year and the past year, 2021, CVEs that can lead to container escape and privilege escalation still come up very frequently in the kernel layer and the container layer. The container layer, where the core container components like Docker, containerd, and runC lie, is still facing a huge number of vulnerabilities, covering not only container escape and privilege escalation, but also command execution, denial of service, and misconfiguration. Even the virtualized secure containers like Kata Containers, which were thought to be more secure, are found to be not immune from some advanced hacks. In short, container runtime security is yet another cat-and-mouse game.
Let's then dive into the recent advances of defensive security techniques in the container world, in response to the varied requirements and advanced attacks. In the real world, an innocuous and trusted process can sometimes become malicious during its lifetime, because its bugs have been exploited by attackers, or just triggered or misused by users. A series of sandboxing mechanisms have therefore been invented to help isolate a software component from the rest of the system. In this picture, we are comparing different sandboxing mechanisms, and we will introduce today a recent one called Landlock, which has some advantages in performance, fine-grained control, embedded policy, and unprivileged use. Different from the mandatory access control, MAC, provided by Linux Security Modules, LSM, in the kernel, Landlock is the first MAC available to unprivileged processes on Linux, since Linux 5.13. Landlock is a sandboxing technique to restrict ambient rights according to the kernel semantics, for example global file system access, for a set of processes, complementing the limitations of seccomp. It also helps create safe security sandboxes as new security layers in addition to the existing system-wide access controls. So with Landlock, we are also able to compose access controls from multiple tenants, for example system administrators, app developers, and cloud clients. It's a really interesting technique to enable built-in application sandboxing, protecting against exploitable bugs in trusted applications via embedded policies, or directly against untrusted applications via sandbox or container managers. The container runtime is such a case that can benefit from Landlock a lot. Equipping containers with Landlock is an ongoing effort happening in the runtime-spec and runC communities. Here in this diagram, we are demonstrating the schema in the runtime spec, where users can specify the Landlock unprivileged access control settings for the container process.
Users firstly need to identify a set of accesses to handle and define them as a ruleset. In general, the ruleset is basically the actions on objects that need to be handled, that is, restricted. Next, the rules field specifies the security policies to be added to the existing ruleset, and pathBeneath is the only rule type currently supported, which is an array of file-hierarchy rules. In each rule, users can specify a fine-grained path of the file hierarchy to restrict, as well as the actions allowed for it. A bestEffort control is also provided to help the runtime enforce the strongest rules configured, up to what the running kernel currently supports. In general, unprivileged sandboxing can be enabled through this approach.

Then let's take a look at an advanced attack called the cross-HT attack. HT here stands for hyperthreading, or hyperthreads. A cross-HT attack involves the attacker and the victim running on different hyperthreads of the same core. MDS and L1TF are examples of such attacks, where an attacker is able to steal secrets from another co-located hyperthread through side channels. The only full mitigation of cross-HT attacks is to disable HT, which is considered to be extremely inefficient. Core scheduling, since Linux 5.14, is a scheduler feature that can mitigate some, but not all, cross-HT attacks. It allows HT to be kept enabled safely by ensuring that only tasks in the same user-designated trusted group can share a core. This increase in core sharing can also improve performance; however, it is not guaranteed that the performance will always improve, though that is seen to be the case with a number of real-world workloads. In theory, core scheduling aims to perform at least as well as when hyperthreading is disabled. The basic concept of core scheduling is to let users define groups of tasks that can share a core. From this diagram here, we can observe that some tasks are grouped together.
The grouped tasks hold the same so-called task cookies. Tasks from the same group, for example T1 and T5, can be scheduled on the same core. There are two other rules. The first one is to never mix on the same core tasks from different groups. For instance, T6 and T7 on the right of the diagram will never be scheduled onto the same core. Another one is to never mix on the same core a task that is grouped and one that is not, or ungrouped. Some CPU threads may stay idle even if the run queue is not empty; in this diagram, we can find that T8 and T9 are in this case. Note that core scheduling can also help with some performance use cases, but we will not elaborate on them here.

Enhancing containers with core scheduling support has also been raised in the runtime-spec and runC communities. In this schema, users are allowed to configure the core scheduling options for the container, and are allowed to define the following operations. The create field chooses whether to create a new unique cookie for the process in the container. The shareTo field specifies the PIDs that the core scheduling cookie of the current process should be pushed to, and shareFrom specifies the PIDs that the cookie of the current process should be pulled from. All this provides support for setting and copying core scheduling task cookies between the container's threads, processes, and process groups, which helps define groups of tasks that can be co-scheduled onto the same core, according to the basic concept that we mentioned just now.

The hardware-software co-design is another emerging advance in container security enhancement that is worth introducing here in our talk. For instance, Control-flow Enforcement Technology, CET, delivers CPU-level security capabilities to help protect against common malware attacks that have been a challenge to mitigate with software alone.
CET offers two key capabilities to help defend against control-flow hijacking malware. Here we can find these two technologies. The first one is called indirect branch tracking, IBT, and the second one is called shadow stack, SS. You can see from the left part that IBT delivers indirect branch protection to defend against jump- or call-oriented programming, which we often call JOP or COP attacks. You can see from the picture that IBT will prevent attacks from jumping to arbitrary addresses. And the SS technology on the right delivers return address protection to help defend against return-oriented programming, which we often call ROP attacks. In this picture, we notice that SS will block the return if the return addresses on the two stacks don't match. These attack methods are part of a class of malware referred to as memory safety issues, and include tactics such as stack buffer overflow corruption and use-after-free. That is why these techniques can be used against large classes of vulnerable code exploits in the container world.

Most recently, data breaches have been increasing, which constantly troubles enterprises, regulators, and customers. Containers, which are the de facto vehicles carrying a variety of workloads today, are facing rising concern about privacy, that is, the protection of confidential code and data within them, apart from the security concerns. Consequently, an augmented threat model has come into being in this context, different from the original threat model, which considers the host to be benign and includes the host software stack in the trusted computing base, TCB. The intention of this augmented threat model is to prevent the host software stack from accessing container data and code while in use. A new technology called confidential computing makes it all possible, providing isolation, runtime encryption, and verifiability at the same time to fill in this blank.
With confidential computing, we are able to completely remove the host software stack from the container TCB. The host firmware, kernel, OS, and hypervisor are all out of the TCB, and only the tenant can see and modify its data. We can see that infrastructure owners, sometimes the CSPs, cloud service providers, themselves, are not even trusted in this case. Now, with additional support in the container runtime, a two-way sandbox for seamless security and privacy, which we call confidential containers, can be constructed based on a hardware-based trusted execution environment.

There are in general two types of confidential containers; we can see them in this diagram. The first one is called process-based and the second one is called VM-based. The process-based confidential containers can be built on top of Intel SGX, for example. Process-based isolation is beneficial in drawing the isolation boundary exactly around each container process, and this reduces the trusted computing base to some extent. In contrast, the VM-based confidential containers can be more straightforward, where the hardware directly augments the VM-based secure containers, for example Kata Containers, to guarantee confidentiality and integrity. Additionally, image management and attestation services should be provided. Firstly, they ensure that the container image is always encrypted or signed; I mean, the container images should be protected in some cases, and their post-processing also needs to be protected. Secondly, the ability to attest that the confidential containers run on top of genuine and trusted hardware, and that the workload is exactly the expected one, should be guaranteed.

Here, we are unable to dive into every single detail of the potential gaps that the original containers need to fill in, but we list a few of them from a bird's-eye view. Firstly, confidential container runtimes need to adapt to different hardware-based encryption.
They should also support attestation, key provisioning, and secure agent APIs, which are strictly limited to confidential containers. Also, for confidential container images, image service offloading, layer encryption, and a security context need to be added to adapt to the confidential computing usages. For confidential container OSs, a minimized TCB and unified enclave abstractions are the top two requirements. We should also note that confidential container orchestration and monitoring, especially the logging, debugging, error management, and potential migration with CIA, confidentiality, integrity, and availability, guarantees, should also be considered by the confidential containers.

You may have the ambition to achieve an even stronger threat model. Yes, indeed, it should be possible, ultimately. Imagine we'd like to extend into the container with more fine-grained isolation. In this case, for example, we'd like to consider only the sensitive data compartment within this container, and the hardware it runs on, to be trusted, while all the others, including the libraries referenced by the compartment, can be malicious. Actually, multiple fine-grained isolation techniques are already available to help this dream come true. They often have different execution or switching overheads, so one of the challenges here is to balance between security and performance for different container workloads. One example is to integrate or combine Memory Protection Keys, also known as MPK, with containers. With MPK, we are able to tag memory pages within the containers using pkeys, whose access permissions live in a per-thread register called PKRU. A user-space instruction is provided to update PKRU, letting the hardware enforce the isolation between the compartments within the containers. Well, I think we might be getting the right feeling that advances in container security will never stop, just as new attack methods will continue to emerge.
The future of container security still has a lot to explore and it's worth checking out. That's all for my talk. Thanks for listening. Any questions?