Thanks. As I already said, I'm going to talk about what I had to do to integrate SELinux into CRIU. The goal was to enable container migration with SELinux enabled before and after the migration, and to have the same context before and after the migration. I am on Red Hat's kernel team. I've been doing process migration, which is what CRIU does, for probably 10 years, I don't know. I have been involved in CRIU since 2012, when it came up, and basically everything I'm going to talk about today is the fault of some CI system which forced me to finally look into it. Honestly, it is all my fault, because I knew that SELinux support in CRIU was missing, but I kept ignoring it for a few years. So what I want to talk about today is a story of long-lasting technical debt.

This all started around 2011, when CRIU was first introduced at LPC. At that time there were multiple checkpoint/restore implementations, which worked in different ways: completely in user space, completely in kernel space, or using syscall interception. There were a lot of things, and CRIU was kind of the result of many different implementations at the time. CRIU stands for Checkpoint/Restore In Userspace. It does a lot of things in user space, but it uses existing kernel interfaces as far as possible.

To understand why the problem with SELinux existed for such a long time in CRIU: the way I first got involved in CRIU was that I introduced a CRIU package into Fedora in 2012.
There was a process, I followed it, and in Fedora 19 CRIU was available for all users. At that time it didn't have SELinux support. For the RPM packaging to pick up SELinux support, the packaging would have had to know that it should install libselinux in the buildroot, and then CRIU would have been built against SELinux. But this didn't exist, because CRIU didn't have any SELinux support, so it just worked without SELinux, and for about three years nobody really looked into SELinux support for CRIU.

In 2015 Tycho implemented SELinux support for CRIU. His work was focused on AppArmor, to get it working with LXC and LXD and to do container migration there while keeping the AppArmor context correct. What he basically did is this: during checkpointing he just read the value out of current (the attr/current file in /proc) for each process, and CRIU dumped it to disk. During restore the value was read back from disk, and once the process was created, for each process and thread he just updated current with the value from the checkpoint.

That was the AppArmor support. The SELinux support basically said: if you're not running in an unconfined context, CRIU will just abort. So if you're running with SELinux enabled and the context is unconfined, then okay, I will not do anything, I will not read out the context, I will not restore it, so it will probably work for you. But if you're running in something else, CRIU will just bail out and you're on your own; it doesn't work anymore.

So CRIU now had some limited SELinux support, but the CRIU package didn't, because I didn't put in the BuildRequires in 2012. This meant that the package in Fedora didn't support SELinux, and when I ported it to RHEL, the RHEL CRIU package also didn't support SELinux.
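To make the dump-and-restore idea above more concrete, here is a minimal Python sketch. It is not CRIU's code (CRIU is written in C); the helper names read_label, write_label and is_unconfined are hypothetical, and the "unconfined" check is an assumption about what such a test could look like, not the check CRIU actually performs. A temporary file stands in for /proc/PID/attr/current so that the sketch runs on any system.

```python
import os
import tempfile


def read_label(attr_path):
    """Return the label stored in an attr/current-style file."""
    with open(attr_path, "rb") as f:
        raw = f.read()
    # The kernel may terminate the value with a NUL byte or a newline.
    return raw.rstrip(b"\x00\n").decode()


def write_label(attr_path, label):
    """Write a label back, as a restore step would do for its own task."""
    with open(attr_path, "wb") as f:
        f.write(label.encode())


def is_unconfined(label):
    # Assumption: an SELinux context looks like user:role:type[:level]
    # and "unconfined_t" is the type of an unconfined process.
    fields = label.split(":")
    return len(fields) >= 3 and fields[2] == "unconfined_t"


def demo():
    # Stand-in for /proc/PID/attr/current (hypothetical path for the demo).
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "current")
        write_label(path, "system_u:system_r:container_t:s0:c1,c2")
        return read_label(path)


if __name__ == "__main__":
    label = demo()
    print(label, "unconfined" if is_unconfined(label) else "confined")
```

On a real system the write goes to the task's own attr/current file and only succeeds if the loaded policy allows that change of context.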
In 2018 I started to work on getting CRIU into Podman for container migration. I implemented it all and submitted a pull request, and I was asked to provide a test. I wrote a test and it worked, because at that time Podman was still running on Travis, and Travis doesn't have SELinux enabled. So all the SELinux things which Podman tried to do were not happening, CRIU was still happy, the restore was still happy, and nothing broke at that point in time.

At some point Podman moved away from Travis to a CI system with SELinux enabled, and checkpointing and restoring containers still worked, because the CRIU package was built without SELinux support. Then, when I continued to work on new features for CRIU and Podman, I had to build CRIU from source, because those things were not in the Fedora package yet. So the CI system built the latest CRIU for me, the CI system had SELinux installed, and CRIU all of a sudden was running with SELinux support, the limited one it had. At that point nothing worked concerning container migration anymore. CRIU broke in multiple ways, which I found out later, but the first thing I saw was this denial.

It's a strange denial. As far as I understood how to read it, the third line says that the source context of the denial is a process running in the container_t context, and it tries to talk to something in the target context, I guess container_runtime_t. The process is top, which was running in the container, and it tries to do a connect. So somehow the process in the container tries to connect to the outside of the container, which looked really strange in the beginning. I titled the slide "parasite code", and the reason for this is CRIU's parasite code, which I just want to introduce briefly. To dump the memory of a process from within the process's namespace, CRIU uses something which is called parasite code. Parasite code is injected
into the process using ptrace, and then it's basically a daemon running in the process to be dumped. This daemon waits for commands, and it connects to the main CRIU process, which in our case is running outside of the container. The parasite code is removed after usage, and the process doesn't know it was under control. This explains the denial: the parasite code is injected into the top process, which is running in the container, and now the process in the container wants to talk to the outside of the container, which seems strange and shouldn't really be necessary. So SELinux blocked this, and that was correct, so I had to fix it in CRIU. It wasn't really too hard to fix, because I just had to read out the context of the process in the container and label the socket for the parasite code on the outside of the container using setsockcreatecon, so that the parasite code in the container is able to talk to the outside of the container. After that I reset the socket context to the default and everything is good.

So this was the theory, and this is where it got really complicated, because the CRIU package didn't have correct SELinux support. To enable the correct socket labeling, I first had to fix all of the existing, or non-existing, CRIU SELinux support. My first step was to do just what AppArmor does: I read out current during checkpointing and write it back during restore. The one thing which seems to be different is that I need an additional policy to allow a process to change from one context to another, because usually it's not expected that processes change their context on their own, and that's what CRIU would do during restore. If we talk about a container context, it would start as a process outside of the container, and during restore the process would be relabeled to be a process inside of the container, which is what it actually is. The idea of SELinux, I guess, is to use setexeccon to
set the context of a new process you're going to execute. But CRIU uses clone to recreate all the processes it wants to restore, so this didn't work for me. So I had to talk to the SELinux policy maintainers to get a dyntransition policy which enables CRIU to change the context of a running process during restore. This meant that if I enabled this in the Fedora package, with the newly implemented SELinux support in CRIU, it would break a lot of users. They were used to CRIU just checkpointing and restoring any process they have, and all of a sudden, if the process is not running in the unconfined context, it would stop. So I had to coordinate the changes in the Fedora package with the dyntransition policy for containers. Now we have a transition policy for containers which allows a change from container_runtime_t to container_t during runtime, and we added an additional boolean in Fedora to allow processes to change their context if the user wants that on the system. They have to enable it manually, but at least there is a way to enable it if they come to the point where it fails.

So everything seemed to work so far. I had to wait for all the changes to appear in the Podman CI, and most of the problems with container migration and SELinux policies were gone at this point. But there was again a CI error. It seemed kind of unrelated, because the container migration which failed was the migration of a Redis server. I had only introduced this test to be able to check the migration of established TCP connections, and because Redis was already used in the CI, I just reused it: I opened a TCP connection, migrated the container, and checked that the TCP connection was still there. But this test failed, and it was the only one which actually uses threads. I had only tested with containers which contained only processes and no threads, and
this was the only one which was using threads. The SELinux error message I got now was totally different, and it took me a while to understand, because it was actually an error. I found this line in the Linux kernel which says that in SELinux it's only possible to change the context of single-threaded processes. I saw that this line was mostly unchanged since 2008, so it seemed pretty unlikely that I would be able to convince someone to change it so that I can change the context of threads with SELinux. I later found out that it is possible if you are using bounded SELinux policies, where the contexts are somehow connected together, as far as I understood it. But talking to the policy maintainers in Fedora, it seemed nothing is using bounded policies there, so I stayed away from using bounded policies for the container migration.

The solution was basically this: for AppArmor, when we restore a process and create threads, CRIU also restores the context for each thread. We don't do this with SELinux. We only set the context for each newly created process, and we then create the threads later, with the label of the previously created process. This means we currently cannot migrate containers whose threads have different labels than the processes in the container. But as far as I have seen, at least for the Podman migration case, all processes in the container so far seem to have the same context. So this is a limitation, but it is not relevant to the use case I was looking at. We didn't change the thread context, only the process context.

This led to another problem with CRIU. To restore a process there's something which we sometimes call the PID dance: you want to restore the process with the same PID it had during checkpointing. There's a file in proc where you can influence the PID of the next restored process, and this file is not
writable from the container_t context the containers are running in. So we relabel the file to be writable from a container during restore, and so we were able to correctly restore containers with multiple processes and with multiple threads. It looked really good. CI was happy, everything seemed to work.

But at that point I found another problem. This one was actually not found by CI but by myself. If I migrate a process from one system to another and it uses sockets, and I try to connect to the socket, the accept fails after the migration. The reason is that I didn't correctly relabel the sockets after the migration. I only labeled the processes, but not the sockets. This was basically solved in CRIU: I read out the label of the socket during checkpointing and write it back during restore. I also read out the attribute which is stored in sockcreate, to make sure that newly created sockets in the restored process are also created with the correct context, in case the process in the container actually has a different context for sockets.

Then there was a small thing. It was not fatal in any way for the migration, but we had SELinux error messages: the log files were not labeled correctly. So I just let Podman pre-create the log files with the correct labels. Then there were a few additional messages where we had file descriptors leaking into iptables, which we use to block the network during the migration of containers with established TCP connections. At that point we were actually able to migrate all the containers in CI, and all the ones I'm testing with, without any SELinux failures and with all labels restored correctly.

So, the steps I just mentioned here today were these: we had to label the socket for the parasite daemon correctly; we had to read and write the value of the proc attribute current correctly; we had to allow different dyntransitions to be able
to migrate a container, so that the restore is able to change the context of the process; we had to relabel ns_last_pid to be able to create threads correctly in CRIU; we had to fix the socket labels, pre-create all the log files, and make sure we don't leak any file descriptors into processes which we are calling during restore. And with that I'm already at the end. The first link is a blog post which contains all the information presented here. Thanks for your attention. Any questions?

So could you tell us a little bit about what you had to do in order to get the order of execution correct? It would seem that when you're restoring the container, you're going to have to be very careful about getting the order in which you're setting things up correct.

So, one of my first tries was just labeling the process as early as possible. I think one of the first things I tried was using setexeccon to run CRIU completely under the destination context. But the problem is that CRIU does crazy things during restore, so I had, I don't know, like 50 different SELinux denials during restore. So we changed it to restore all the labels as late as possible. We do as much as possible during the restore, and really one of the last steps is to restore the labels of the processes. We can do it earlier for the sockets, there's no problem, because they are not really used once they are set up. But for the processes, because CRIU morphs itself from the CRIU binary into the restored binary, we try to do things in our original context for as long as possible. I think basically the third-to-last thing is setting the SELinux or AppArmor context.

More questions? Let's thank the speaker. Thanks.
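The PID dance mentioned in the talk can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not CRIU's implementation: the helper name set_next_pid is hypothetical, and a temporary file stands in for /proc/sys/kernel/ns_last_pid so that the sketch runs without privileges. The idea is that the kernel hands out the PID following the last one it recorded, so the restorer writes the wanted PID minus one before creating the new task.

```python
import os
import tempfile

NS_LAST_PID = "/proc/sys/kernel/ns_last_pid"


def set_next_pid(target_pid, path=NS_LAST_PID):
    """Ask the kernel to give target_pid to the next task that is created.

    The kernel assigns the recorded last PID plus one, so the value
    written here is target_pid - 1.
    """
    if target_pid < 1:
        raise ValueError("target_pid must be positive")
    with open(path, "w") as f:
        f.write(str(target_pid - 1))


def demo():
    # Stand-in file (hypothetical) instead of the real proc entry.
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "ns_last_pid")
        set_next_pid(1234, path)
        with open(path) as f:
            return f.read()


if __name__ == "__main__":
    print(demo())
```

On a real system the write needs sufficient privileges, nothing else may create a task between the write and the clone, and with SELinux enabled the file has to be writable from the context the restore runs in, which is why the talk describes relabeling it.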