Cool, so welcome to our talk, where we are going to explore an interesting field called digital forensics, and especially a technique that is going to revolutionize this field, which is called container checkpointing.

So, a little bit about us. I'm Daniel Simeonato. I'm currently a technical marketing manager at Sysdig, but I have a past in site reliability engineering. And I'm Javier Martinez. I'm a software engineer currently based in Zaragoza, I have a background in e-commerce, and I'm currently working as a freelancer.

So what is this digital forensics that we're talking about? Well, it is basically a process of retrieving, analyzing, and preserving electronic data related to criminal activity. We want to catch criminal activity whenever it happens in our digital assets, but also retrieve evidence that is enough proof that the crime has been committed. Additionally, you might have heard the term DFIR, which is a wider area: digital forensics and incident response. Incident response is about how you react to these kinds of attacks, right? And how to improve, in a similar way to DevOps.
There's a loop where you need to iterate and try to respond better in case any of these incidents happen. At this point you might be thinking: okay, this might be interesting for security companies, or for applications that treat very sensitive data. But chances are that you're running applications in the cloud. If you heard Pablo's previous talk, he mentioned that cryptojacking malware is running wild. As I said, if you're running something in the cloud, chances are that someone wants to run their own application there and, for example, start mining crypto.

Pablo talked about how Falco is the surveillance camera for any criminal activity, so I'm going to talk about cordoning off, or preserving, the crime scene. Does this image look familiar, if you have seen TV shows or movies about crime? Can anyone tell me the point of putting a cordon around the crime scene, or placing these kinds of flags? Any guesses? Evidence, okay. You try to locate where the evidence is, right? And exactly: no one steps in and breaks or tampers with it, which would completely ruin the evidence. Not only that: the people who discover the criminal activity might not be the same people who are going to investigate it. You see that in movies as well: they wait for the forensic team, who analyze the scene as it was left. And this is why container checkpointing is important.

Okay, this is working again. So container checkpointing is a technique that is used to save the current container state. You can think of it as a snapshot, right?
It's going to take a snapshot of the current state of the container, or you can think of it as a backup. In the end we can find an analogy in video games: it's like saving my game, and of course I can load it at a different time. So in case there's a criminal attack happening, I could save the state in order to retrieve it as criminal evidence, right? You can see why container checkpointing could be a very interesting technique for this.

Basically, this would be the summary: snapshot a container, retrieve criminal evidence, and do that while the attacker is unaware. That matters because if he or she is aware, they can quickly remove any traces. You can see it in this graphic: there's an attack happening, there's a container being affected, and we can quickly create this checkpoint and start retrieving information on the commands that have been run, the files modified, the attacker's path, etc. We're going to see it right away, but first, some background.

Container checkpointing all started with CRIU. CRIU is a project that started as a proof of concept from Virtuozzo in 2011 and reached version 1.0 in November 2013, with the first patches merged in Linux kernel 3.11. So what does CRIU mean? It means Checkpoint/Restore In Userspace. The use case at Virtuozzo was live migration of containers: stopping a container executing on a host, taking a checkpoint, transferring the checkpoint to another host, and spinning it up there. Basically what happens with live migration on a classical VM hypervisor like Xen or VMware, etc. But if you think about it, it's a little bit more complicated with containers. Why? For a hypervisor, the whole machine is virtual, right?
So you have precise control of the CPU registers, of the RAM, of all the memory. The hypervisor has a complete view of everything that is happening. So what happens in live migration, for example in Xen or QEMU/KVM? The definition of the virtual machine is usually copied immediately; it's just a spec file. Then they start copying the memory pages, and the pages that get modified in the meantime are copied incrementally. Once there is just a small amount of pages left to copy, they freeze the processes, copy the remaining pages, copy the CPU registers, and restart the machine on the other host.

This is not possible with containers, right? We do not have a hypervisor. Containers run directly on top of the kernel, and when we live migrate a container we might end up on a different kernel. So what happens? We need to rebuild the whole process tree, we need to load the memory pages, and we need to restart the processes, also reapplying, for example, security settings such as SELinux, seccomp, etc. So it is a bit more complicated. They had a solution already, but it was using kernel modules, so it had components right in the kernel, and they wanted something in user space that they could also contribute to the mainline kernel.

So the main use case, and what it was born for, was container live migration. But of course we are talking here about digital forensics, so it has multiple use cases. If you can take an image of a container, that is, of processes running on a system, you have a lot of possibilities. We will talk about those later.

In the wild, CRIU is used of course by OpenVZ, the platform by Virtuozzo. It is used in Borg, Google's internal container orchestrator. It is used in Linux containers, so LXC and LXD, exactly for live migration. More recently, its use has spread further.
It is also being used in Docker, which added the checkpoint and restore functionality, and lately also in Podman, with the work of Adrian Reber.

So let's see a small demo of container checkpointing, using Podman. Okay, the text should be readable. Here I have an Ubuntu container running in Podman. I will exec into the container. You see the container process is just a tail on /dev/null to keep it alive. I have this script, which is not part of the base image; if I were to restart the container, I would lose this file. I will now execute this file. It just prints the date every second, which keeps it running. I will now disconnect from the container, so the script keeps running, and I will do a checkpoint.

As you can see, there are some options you can give Podman while taking the checkpoint. I will just give the export file, which is a tar.gz file. There is no standard for container checkpoints yet; there might be one in the future. As you can see there are other options. By default, Podman stops the container after the checkpoint, so after this is done we will see that I have no container running on the system. I have my checkpoint, which weighs about 45 megabytes and comprises everything: all the process memory pages, the images, the changed files in the container. As you can see, I have no container running, and I can actually go on and remove the container. So there's no trace of the container on my system right now.

I will start the restore. What happens? Podman restores the container, preserving the exact same container ID. So, for example, if you have that container ID bound, I don't know, to a systemd service, you will recover that ID. The container will be running, of course. If I exec into the container, I can see all the processes are still running and, most importantly, they preserve the same process IDs. Remember, we reconstructed exactly the same situation.
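The Podman steps from this demo can be sketched in a few lines of Python. This is only an illustration, not the exact commands typed on stage: the helper names `checkpoint_cmd` and `restore_cmd` and the archive path are made up, while `podman container checkpoint --export`, `--leave-running`, and `podman container restore --import` are existing Podman options.

```python
import shlex


def checkpoint_cmd(container: str, archive: str, leave_running: bool = False) -> list[str]:
    """Build the argv that exports a checkpoint of `container` to `archive`."""
    cmd = ["podman", "container", "checkpoint", "--export", archive]
    if leave_running:
        # By default Podman stops the container once the checkpoint is written.
        cmd.append("--leave-running")
    cmd.append(container)
    return cmd


def restore_cmd(archive: str) -> list[str]:
    """Build the argv that restores a container from an exported archive."""
    return ["podman", "container", "restore", "--import", archive]


if __name__ == "__main__":
    # Hypothetical container name and archive path, for illustration only.
    print(shlex.join(checkpoint_cmd("demo", "/tmp/demo-checkpoint.tar.gz")))
    print(shlex.join(restore_cmd("/tmp/demo-checkpoint.tar.gz")))
```

The lists can be passed to `subprocess.run` on a host where Podman and CRIU are installed; checkpointing typically needs root privileges.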
Reconstructing the exact same situation is why checkpointing is useful for digital forensics. And as you can see, there's a gap of 40 seconds or so, the time it took for my container to checkpoint and restore, where the process did not print the date. So there's that, and back to you, Javier.

So we have seen Podman in action with checkpointing, and luckily Kubernetes also has a checkpointing feature since version 1.25, where it was introduced as alpha. It's actually based on CRIU, the tool that Daniel showed. Currently we only have checkpointing out of the box in Kubernetes; we don't have restore yet. With our analogy, it's like we are able to save our games, but we are not able to load them yet. But we will see that we have other ways to work around this.

This is the procedure to create a checkpoint in Kubernetes. We call the kubelet API at /checkpoint, providing the namespace, the pod, and the container. Remember that these checkpoints are done at the container level, not the pod level. The kubelet will request a checkpoint, which will result in a tar file being created under /var/lib/kubelet/checkpoints. One little mention: this file will only be available to users with root access, which is important because we will see that this file has a lot in it.

Which requirements do we need for this to happen? Well, we need both CRI-O and CRIU. We have seen CRIU, which is Checkpoint/Restore In Userspace, but we also need CRI-O, the container runtime. They have nothing to do with each other, but they are easy to confuse. Once again, CRI-O is the container runtime that needs to be in use, we need to enable checkpointing as a feature gate in the usual way, and of course we need to check that CRIU support is enabled in the CRI-O config.
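As a rough sketch of the call described above, the following Python builds (but does not send) the request to the kubelet. The function names, the node name, and the token are placeholders; port 10250 is the kubelet's default secure port, and the shape of the JSON response (an object with an `items` list of file paths) is an assumption based on the behaviour described in this talk.

```python
import json
import urllib.request


def checkpoint_request(node: str, namespace: str, pod: str, container: str,
                       token: str, port: int = 10250) -> urllib.request.Request:
    """Build a POST request to the kubelet checkpoint endpoint."""
    url = f"https://{node}:{port}/checkpoint/{namespace}/{pod}/{container}"
    return urllib.request.Request(
        url,
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )


def checkpoint_paths(body: bytes) -> list[str]:
    """Extract checkpoint file paths from the kubelet's JSON answer.

    Assumes a response shaped like {"items": ["/var/lib/kubelet/checkpoints/..."]}.
    """
    return list(json.loads(body).get("items", []))
```

Sending the request with `urllib.request.urlopen` additionally requires an SSL context that trusts the kubelet's certificate, and a token that the kubelet authorizes for this endpoint.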
Okay, and we will see a small demo of this later. What would be the output? The output will be a tar file, as mentioned, and it will contain an archive of all the changed files, as well as images of the processes' memory, file descriptors, some metadata about the tar file itself, some bind mount info for the container, and even some stats and logs. But once again, we're going to see it in action.

Okay, so this is a demo of container checkpointing in Kubernetes. This is just an example cluster. I have an nginx pod running in the default namespace. I will now connect to the node where it is running. Remember, the checkpointing API is exposed at the kubelet level for now because of security concerns. Here I'm running Oracle Linux 8. Here I'm showing that I've enabled CRIU support in CRI-O, as Javier said, and I've also enabled the container checkpoint feature gate in the kubelet configuration. Again, this is a feature at the kubelet level; it's not yet at the API server level. One last thing, especially important if you're on Red Hat 8, CentOS 8, or Oracle Linux 8: you need CRIU version 3.16 or later for this to work. Otherwise the API will just throw you an error saying that it needs a later CRIU version.

So I just exported a token to authenticate with the kubelet, and here is the call. As you can see, the kubelet answers with just a JSON containing the file path of the checkpoint, and here is my checkpoint, just five megabytes. It is not compressed like the one in Podman; it is just a tar file. There is still no specification for the checkpoint file format.

So what can we do with this checkpoint right now? Let's go a bit into digital forensics in practice. Here I have, again, my tar file. There is a tool called checkpointctl. It reached version 1.1.0 last week, actually. It is written in Go.
So you can compile it for basically any architecture, and without uncompressing the checkpoint you can show some statistics: what the runtime was when it was created, what the container engine was, etc. You can show, for example, these statistics here. You can see that the freezing time of the processes, the time during which CRIU stops the processes and starts dumping the memory pages, was under one millisecond. So it is quite fast.

You can see, for example, the process tree: all the processes that were running in the container at the time we took the checkpoint. And if you pass --all, you can see all the information available for the processes: which open files every process had and, most importantly, the open sockets. If you see something suspicious in your container, for example the reverse shell exploiting Log4Shell in Pablo's example, you will see an open socket with the address, so you can start investigating what was happening. You also have all the bind mounts and all the other information available.

But it gets even cooler. This is all without uncompressing the checkpoint. You can also do a memory parse. So, for example, for PID number one I can get a hexadecimal and ASCII representation of the memory. I can do a tail. This would take a little longer for the worker process, because it needs to parse all the memory, but you can see that you will find all the information. And that is also why, at the moment, the API is still at the kubelet level, for security concerns. Of course, if you can output the memory, you can probably read the keys, the secrets, all the tokens that the process has in memory. So again, the checkpoint is available only to root users, and the API is still at the kubelet level.

You can do even more, so let's actually go inside the checkpoint and see what's there, so you don't have to trust us. You see there are some dump files.
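Since the Kubernetes checkpoint is a plain tar file, the same kind of look inside can be sketched with Python's standard `tarfile` module. The member names `config.dump` and `spec.dump` come from this talk; treating them as JSON is an assumption, and the function names are made up for illustration.

```python
import json
import tarfile


def list_checkpoint(path: str) -> list[str]:
    """Return the member names of a checkpoint archive without extracting it."""
    with tarfile.open(path, "r:*") as tar:
        return tar.getnames()


def read_json_member(path: str, member: str) -> dict:
    """Read a single member (for example config.dump) and parse it as JSON."""
    with tarfile.open(path, "r:*") as tar:
        handle = tar.extractfile(member)  # raises KeyError if the member is missing
        if handle is None:
            raise ValueError(f"{member} is not a regular file")
        return json.load(handle)
```

Reading members in place like this avoids unpacking process memory pages to disk, which matters because the archive can contain secrets.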
These dump files are just text files in most cases. Here you have a config.dump, and here a spec.dump. You can see all the configuration of the pod that was checkpointed: the environment variables, the current working directory, the capabilities (the configured ones and the effective ones at the system level), the path of the volume on the root, etc. And in all this information you can also see the annotations on the pod. Of course this was taken on CRI-O, but you can also see the Kubernetes annotations on the pod. Basically, you have all the information you need to reconstruct an exact image of the pod and the container that was checkpointed, and that is crucial when we are talking about forensics and preserving evidence.

Of course, you have the bind mounts. There are some limitations on what can be checkpointed, which we will detail later. You also have the complete dump log, so you can see all the times it took CRIU to dump. What CRIU does is basically this: it starts collecting all the information about the processes from the proc file system. Then it does something really tricky: it saves the code of the process and injects a parasite code, which acts as an interface to CRIU. From this parasite code it stops all the processes and starts dumping the memory pages. But it is quite quick.

There is also this tar file, rootfs-diff.tar, which contains all the files that have been written to the container's overlay file system after the container started. For example, you have all of this cache directory, which gets created at runtime, is not in the base image, and which you would lose if you didn't have the container checkpoint. Or, for example, the bash history, if an attacker didn't disable it.

But let's go to the interesting part. Inside the checkpoint you have a lot of .img files. These are binary files.
So not really human readable. If you were to open one, you would just see garbled binary for the most part. But in the CRIU repository you can find tools that allow you to read these binary files. One of these tools is called CRIT. It is distributed with CRIU, or if you install from packages you can find it in a package. It is Python based and uses the CRIU Python bindings. You can, for example, explore the checkpoint directory, and here you see you get the process tree with the PIDs, etc. You can see, for example, the open file descriptors, and again you get everything that you checkpointed. With CRIT you can also do something similar to checkpointctl and parse the memory, for example by opening the memory pages file.

What you can also do, and we do not show it in the demo, is use a script in the CRIU repository, which I think is called coredump. Maybe it's unfortunate naming, but whatever: you can convert these pages image files into core dumps, which you can then debug using gdb or whatever other tool you want to use to analyze all of this. And here, for example, I'm opening a core image of a process, just one, and you see you get all the low-level details. This low level you weren't really able to see with checkpointctl. So if you have a really low-level investigation going on, your experts will probably appreciate having all these details available. And with that, I think we can go ahead.

Thank you, Daniel. Okay, so we have seen checkpointing in Kubernetes in action. I mentioned earlier that this needs to happen without the attacker being aware of it, and as Daniel said, it happens in a very short time. As I said, this is a snapshot.
So the container is frozen, but for a very short time, so it's not noticeable by the attacker. There's also no interference with the running processes, which is very important.

Additionally, we haven't yet talked about restoring, which is the opposite: we have a checkpoint available and we can load it in a controlled container, like a sandbox. This is useful, once again, for the investigation to happen, but we will see that there are multiple other uses for checkpoint/restore. Once again, restoring is not available in Kubernetes, so you will need to use Podman or crictl. In case you want this in Kubernetes, there is the possibility of using Buildah, a tool that can create an image based on the checkpoint.

It has some limitations, of course, that we are going to summarize. Not everything can be checkpointed. You have seen that it's quite a thorough list when I say that this is related to the container state. But things that are not going to be checkpointed are, for example, devices, open files from mounted file systems, or processes that are already being traced. So it has some limitations.

Kubernetes checkpointing also has limitations of its own. First, we have seen that CRI-O has to be the runtime. For example, if you're running EKS in AWS, the runtime is not CRI-O, so you will not be able to do this. You also need CRIU version 3.16 or higher. There are some security concerns: as you have seen, the dump contains the full raw memory, so the file itself could be prone to leaking important data. And of course, if you're using containerd, there's no current support.
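The requirements just listed can be turned into a small preflight check. This is only a sketch of the conditions named in the talk (CRI-O as the runtime, CRIU 3.16 or later, the feature gate enabled); the function names and the idea of passing the values in as arguments are made up, and discovering them on a real node is left out.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn a dotted version such as '3.16' into a tuple of ints for comparison."""
    return tuple(int(part) for part in text.split("."))


def checkpoint_blockers(runtime: str, criu_version: str,
                        feature_gate_enabled: bool) -> list[str]:
    """Return the reasons why Kubernetes checkpointing would not work."""
    problems = []
    if runtime != "cri-o":
        problems.append(f"container runtime is {runtime}, not CRI-O")
    if parse_version(criu_version) < (3, 16):
        problems.append(f"CRIU {criu_version} is older than 3.16")
    if not feature_gate_enabled:
        problems.append("the container checkpoint feature gate is not enabled")
    return problems
```

An empty list means none of the blockers mentioned here apply; it does not guarantee that a particular container can be checkpointed, given the device and mounted-file limitations above.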
I mentioned that this talk is focused on digital forensics, but of course the sky is the limit when talking about the uses. We can use it, for example, for backups. As we said, CRIU was also used for container live migration. We can even use it to reproduce problems that are happening in production: we can take a snapshot and reproduce the problem in a more controlled environment or a sandbox. And, once again, we can use it to hot start critical applications, because we are basically loading our game at the best spot, so to say.

I want to recap what we have seen in today's talk. First, checkpointing can be crucial for digital forensics and incident response. It is, so to say, a snapshot: an image of a frozen moment for the container. In Kubernetes only checkpointing is available; restore is not. And beware of the requirements, because we have seen that there are requirements on both the Kubernetes side and the CRIU side.

So these are some of the sources, which you can get in the presentation. There are a couple of Kubernetes blog posts by Adrian Reber, again from Red Hat, and there's also a talk at QCon 2021. Adrian is responsible for the PR in Kubernetes, and he also worked a lot on implementing container checkpoint and restore in Podman, for example, and in CRI-O. I think he is also working on the CRIU side, but the process is a bit more involved there. And of course there is the PR where it all started, the API reference, and the repositories for CRIU and checkpointctl. There's also a Sysdig blog post by Alberto Pellitteri from the Sysdig threat research team, which explores this functionality by analyzing a malicious container.

So with that, thank you for your attention, and if there are any questions, we'll be glad to answer them. If you scan the QR code, you can get the repository with the slides, all the files we used for the presentation, and the links.