Welcome everyone, I'm Philippe Le Foucaillet, I'm a Distinguished Engineer on the Secure team, and today we're going to talk about Docker in Docker and why it matters for the Secure team and for all customers. Hopefully you can see my screen now. Can you see that as well, the whole presentation? Awesome, so let's get started. I have a few slides before we go to the questions. We're going to go through why we use Docker in Docker on the Secure team and why it's important for us, and how to set it up with a GitLab Runner. If you use GitLab.com, obviously you don't have anything to do, but the point here is to deal with the customer setup and to help you get up and running faster. We're going to go through the pitfalls that we've seen so far really quickly, and make sure you understand the whole architecture and why it sometimes doesn't work. And just a few words at the end to explain why we want to get rid of Docker in Docker; that's going to be interesting.

So where do we use Docker in Docker in the Secure team? We use it in a few products, not all of them: currently SAST, dependency scanning, and container scanning. For SAST and dependency scanning, we use it because we need a kind of orchestration layer: the SAST job, for example, is a kind of empty shell that detects the languages and frameworks used in the project and maps them to the analyzers that are available. Based on that, it downloads the analyzers, runs them locally against the current project, gathers all the results into a single output format, and writes out that format to create the report. We could actually achieve the same thing without this orchestration layer, but we would still miss a few features, like ignoring some paths or aggregating the analyzers together. Without this kind of orchestration in place, it's only possible to have one analyzer running at a time.

We also do it this way because we don't want one huge single image containing everything, since that might introduce new and unexpected behaviors. For example, say tomorrow we want to support a new framework for PHP, and that framework relies on some dependencies whose versions are not exactly the versions we ship in the image. That would create incompatibilities inside the image, and on top of that, we would end up with a single huge image that has to be rebuilt every time we want to update anything in any of the tools. That's absolutely not something we want to do right now.

Container scanning is a bit different. We need the Docker server to run the Clair analyzer: Clair runs the image locally and analyzes its layers. There are ways to deal with that other than having a Docker server, but it's a bit more work and we're not there yet. We have some issues open to avoid it, but it's going to be more complex than for SAST and dependency scanning.

Really quickly, how to get started on GitLab CI. First, a quick reminder of how Docker runs: Docker is a client-server application. Keep that in mind. When you type docker in your terminal, you're actually using the Docker client, but this client is completely useless without the server. You need a server to run the containers and to store the images somewhere. The server is in charge of managing the networks, the containers, the images, and the data stores.
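To make that client-server split concrete, here is a minimal sketch on a standard Linux install (the tcp:// address is just the conventional one exposed by a docker:dind service, used here for illustration):

    # The `docker` binary is only the client; `docker version` prints
    # both a Client: section and a Server: section.
    docker version

    # By default the client reaches the server over a Unix socket:
    ls -l /var/run/docker.sock

    # The DOCKER_HOST variable points the client at another server instead:
    DOCKER_HOST=tcp://docker:2375 docker info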
By default, Docker listens on a socket instead of a port; that's been the case for a few years now, for security reasons. It's generally this file that we have here, /var/run/docker.sock. It can change from one system to another, but this is the file you'll commonly see nowadays. The Docker client uses this socket by default, unless the DOCKER_HOST environment variable is set. That's the case, for example, in some jobs, and especially with Auto DevOps: if you have runners running on a Kubernetes cluster or in other setups, you need to set DOCKER_HOST differently. Something else to keep in mind: Docker requires a lot of capabilities on your system, and it has to run as root. The Docker server always runs as root. That's important for the rest of this presentation.

Getting back to GitLab Runner: this is how you configure a privileged GitLab Runner, because that's the requirement. If you want to run Docker in Docker, you need a privileged GitLab Runner. And this is, I would say, the top one pitfall we see when something is not working with the security products: the privileged line is missing from the runner's configuration. So that's the key point here. The second part of this slide is a GitLab CI YAML file where you use a Docker service: it starts a Docker server as a service that you can then use inside your script. You're not supposed to use the Docker server that is on the runner, and we're going to see why in a few moments.

So, as I said, common pitfalls: there are a lot of traps around the security products. The first one is not using the right runners. I've seen it many times: multiple runners are set up on a self-hosted instance, and when the job runs, it doesn't use the privileged runners. So make sure the right tags are in place and the right runners are being used. The second one is using the Docker executor, but not privileged. As I told you, having this privileged = true line is essential; otherwise Docker will run, but it's going to fail. The third one is also pretty common, and I have a full slide on it: mounting the Docker socket. It's pretty confusing, because if you read the official GitLab docs, there are many mentions of mounting the Docker socket into the runner. So you might think: if I want to run a privileged runner, I should mount the Docker socket. It makes sense, right? Not that much. It's this line that you see at the top of the screen: /var/run/docker.sock mounted to the same path inside the runner's containers, because every job is actually run in a container.

But you have to understand the whole picture, so let's break it down to the base. On the host, we have a Docker server, and that Docker server runs the GitLab Runner. When we run a job with the Docker service, we want to have our own Docker server, so that we are completely isolated from the host. And when you run a job like the SAST job, inside the job we run Docker itself to call SAST, because we want to mount the local directory to /tmp. That's oversimplified, it's not exactly what happens, but you get the idea: we're running Docker inside the Docker of the GitLab Runner. The problem is that when you mount the socket, the Docker socket you're using is not the one you think: it's not the one of the Docker service, it's the one on the host.
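Roughly, the two pieces from the slide look like this; a minimal sketch, where the runner and job names are just examples:

    # config.toml (excerpt): a privileged runner with the Docker executor
    [[runners]]
      name     = "my-privileged-runner"
      executor = "docker"
      [runners.docker]
        image      = "docker:latest"
        privileged = true    # the line that is so often missing

    # .gitlab-ci.yml (excerpt): a job using the Docker-in-Docker service
    my_job:
      image: docker:latest
      services:
        - docker:dind        # starts a dedicated Docker server for this job
      script:
        - docker info        # talks to the docker:dind service, not the host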
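And the socket-mounting pitfall is this one line in the runner configuration; a sketch of what to look for:

    [runners.docker]
      privileged = true
      # Looks harmless, but it makes every job talk to the host's Docker
      # server instead of the docker:dind service:
      volumes = ["/var/run/docker.sock:/var/run/docker.sock"]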
Not only are you creating a security issue, because your jobs are going to run directly on the host, but also, when the job runs, the $PWD, the current directory we have there, is not the one you're thinking of. In the context of the job, this folder contains a lot of files, and all the files can be mounted from there. But on the host, guess what? The folder doesn't exist. The Docker server on the host doesn't have any clue about this $PWD, the /builds/your-project path. So you end up with an empty directory. You're expecting to see a lot of files and you don't see anything, and that's a common question we get: it's an empty dir, what's going on, my SAST job doesn't see any of my files? It's because of this: you are talking to the wrong Docker server, one that doesn't know anything about your project. That's probably the most common pitfall we see. I'll take questions on that a bit later.

So Docker in Docker is really a tool for us. It lets us isolate the containers, the context, and a lot of other things, but it has a lot of drawbacks, and that's why we want to get rid of it. First of all, if I take my overall picture of the Docker executor in the GitLab Runner again: we have one Docker server here on the host, and one Docker server in the GitLab Runner itself that is spun up and shut down for every build. And in the SAST job here, we are using docker run, et cetera. The problem is that when we do that, the images are pulled by the GitLab Runner's Docker server. So if I have, for example, JavaScript and Python, and I know that's not the right logo on the slide, that's a private joke, the JavaScript analyzer and the Python analyzer are going to be downloaded and run inside this context. But guess what: once the job is done, all this context is removed. The Docker server here and its data are completely wiped off the host, so the Python and JavaScript analyzer images are not stored anywhere anymore. The cache doesn't persist. If I want to run the SAST job again, I have to download the images again. So if I have a lot of images and a lot of analyzers, I have to download all of that every time. It's not great for the cache, but the cache is not the main reason customers complain about Docker in Docker. The main reason is security, and that's easier to show than to tell, so: demo time. I will stop sharing there and share my terminal instead. Let's make that a lot bigger, even bigger than that.

Okay, let's take a very tiny example. Let's say I want to run Alpine on my machine; I'm on a Mac here. It's pretty straightforward to run. The demo effect is always there: I don't have that image locally, so it's pulling it first. All right, here I'm inside the container itself, so you might think I'm completely isolated. And that's actually the case, unless there is a zero-day vulnerability inside Docker. I can wipe out, for example, the /usr folder without any problem. It's going to crash my container, but if I exit the container and start it again, I get a fresh new version of that container. So that's cool. The thing is, inside this container I have a bunch of files that are specific to this container. For example, here I don't have access to any device that would be on the host, for security reasons. But if I run the same container in privileged mode, then look at that: I have something different here. And I can even...
Oh, there is a device here. See that? I can access this device. What's this device? It's actually the boot device of my virtual machine. Docker on macOS runs inside a virtual machine; if you didn't know that, you can check the Docker documentation. That's why the device I'm seeing here is the one inside the VM. But I'm already outside of the container. I could trash this device, and it would crash my whole Docker installation. That's not really a big deal for my host, my Mac machine, but guess what happens on a Linux server or any self-hosted installation? If I can do that, I'm outside of the container. If I'm able to access the devices, the hard drive, and everything else, I'm basically root on the host.

And the same goes for the /proc file system, for example. Let me use that; it's even easier to show. If I check the swappiness parameter, I can change it: let's write 61 to this file. If I cat it, it's 61. Now I get out of the container, and here on the host I run another container. If I do the same here, you see that I changed the parameter directly on the host. I'm changing the behavior of the host. That means, yes, I'm basically root on the host, and you certainly don't want that. That's why your customers are not really happy with Docker in Docker: it creates security flaws, and they have to be very careful about that. So quite likely, they will not install Docker in Docker even though they want the security products. And that's the end of my presentation. Do you have any questions? I hope it makes sense.

Hey, this is DT, I'll jump in with a quick question. Can you go back to the slide that showed the mounting of the socket being the same? Yeah, this one. This is great; I'm going to have to go through this presentation one more time to get some of the finer details. But can you explain that dual mounting there? Is that part of the core problem of the visibility between the runner and the host? And is that a requirement? It's absolutely not a requirement; it's actually one of the pitfalls. If you add that volume to the GitLab Runner's Docker configuration, you're going to use the socket of the host, so you're going to use the host's server, and the server of the Docker service will just sit on the side: you never actually hit it. That's right. And since the socket is mounted there, when the service's server starts, it says: oh, there is already a file here, there is a server running, I'm stopping. So this server effectively doesn't exist. Exactly. So is that volume specification something that's there by default, or that we documented, or how do we get to that pitfall? It's not there by default, but I saw it multiple times in the documentation. Got it. And that's why I also saw some customers being super confused: they started to configure GitLab the regular way, with the Docker executor, et cetera, and discovered afterwards that they need some privileged runners for the security features. At that point, they still have the first runners they set up, and they're trying to set up this new one, and if they're not on the right page of the documentation, they will probably see that line and think it's a good idea to mount this socket. And actually it's not: if you do that, it's not going to work at all. That's a good smell for troubleshooting there, indeed. Exactly. Okay, thank you.
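For anyone who wants to reproduce the demo, it boils down to a few commands; a sketch, assuming a Linux host (on macOS the "host" is Docker's VM):

    # Unprivileged: the container sees almost no devices.
    docker run --rm alpine ls /dev

    # Privileged: the host's devices show up, including the boot disk.
    docker run --rm --privileged alpine ls /dev

    # Privileged containers can also change host kernel settings through /proc:
    docker run --rm --privileged alpine sh -c 'echo 61 > /proc/sys/vm/swappiness'

    # On the host, or from any other container, the change is visible:
    cat /proc/sys/vm/swappiness    # now prints 61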
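And the empty-directory pitfall from earlier comes down to this; a sketch where the image name and paths are illustrative:

    # Inside the job, the orchestrator does roughly this:
    docker run --rm -v "$PWD:/tmp/app" registry.example.com/some-analyzer

    # With the docker:dind service, $PWD (/builds/your-group/your-project)
    # exists on the server's side, so /tmp/app contains your sources.
    # With the host's socket mounted instead, the host's Docker server
    # resolves the bind mount; that path doesn't exist there, so Docker
    # creates it empty and the analyzer sees no files at all.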
If the result of the scan is empty, it's probably that. First thing: check if that volume is mounted; it's likely to be the reason. Okay. Once you understand this kind of Russian dolls paradigm, where you have a Docker running a Docker that runs a Docker, and you know exactly at which level you are, it gets a lot easier. And here you can see that we are completely bypassing the layers: the job itself is running directly on the host. Exactly. Absolutely not good. Okay, great, thank you.

I was expecting questions about GitLab.com security, actually. That's weird, because we are running the security checks on GitLab.com itself, which means we are using privileged runners on GitLab.com. With what I've just shown you, that should pique your curiosity. Does anyone know how it works on GitLab.com?

I can jump in there quickly. I think it works a bit like what you were showing with your local VM: all the jobs run in a separate virtual machine, and that virtual machine has its own Docker running. So we are not so concerned about a job being able to break out of its Docker container.

You're absolutely right, Bernice, thank you for that. It's part of what we call the autoscaling feature of the GitLab Runner. Autoscaling means we spin up as many runners as we need, and the point is that we are also able to spin up a runner for every build: all the runners have a configuration with max builds set to one, so one runner is used for one build and one build only. There is no way a runner can be reused for two jobs, and we don't have to care about isolation between jobs anymore. We are spinning up a full VM for the job; you can see we are adding yet another layer to the Russian dolls system we have here. The whole host is wiped out right after the job, so we don't care if there is a security issue and the user gets out of the Docker container.

Right, any other questions? Yes, a follow-up question to what you just said. You're saying customers don't like to run privileged mode, understandably, but would it be a model to run it like we do on GitLab.com, in a virtual machine? Absolutely, there is a way to do that, and we are a good example, but it's really tedious for our customers to have virtual machines being created and wiped out. It's a whole new environment to set up if they want to get started, and that's usually one of the drawbacks of Docker in Docker. We are dealing with customers that are evaluating Ultimate, so they are in the middle of a POC: they don't have unlimited resources, they don't have unlimited time, and that kind of workaround is really time-consuming. You want to do that if you have a very stable architecture, like we have on GitLab.com; if it's just to evaluate the security features, that's a lot of overhead. That's why they generally prefer to just remove the SAST layer, the SAST wrapper, and run the analyzers directly on the project, so that they don't have to create any privileged runners.
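For the curious, the GitLab.com-style setup looks roughly like this in the runner configuration; a sketch assuming the docker+machine autoscaling executor:

    # config.toml (excerpt): autoscaled, single-use build machines
    [[runners]]
      name     = "autoscale-runner"
      executor = "docker+machine"
      [runners.docker]
        privileged = true
      [runners.machine]
        # One VM serves exactly one build and is destroyed afterwards,
        # so a container escape never outlives the job:
        MaxBuilds = 1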
I have a question about our SAST orchestrator that will go into history; it has a convenient detection functionality. Are there any plans to somehow recreate this functionality, to benefit from automatic detection of the project by the different analyzer images? That's a great question. Actually, the detection mechanism is hosted in every analyzer: each one comes with a way of saying, if there is this kind of file in the repository, I'm a compatible analyzer. So all this business logic is already held in the analyzers themselves. In the future, when we get rid of Docker in Docker and the SAST orchestrator, all the analyzers will have to run, and the incompatible ones will exit very quickly, because they will say: I'm not compatible, and exit right away.

You might think that's not performant, but it's actually more performant than what we have today, because when you run the analyzers directly instead of inside SAST, they get cached on the GitLab Runner. If you use the Docker executor and you declare an image with the image keyword in the GitLab CI configuration, the runner pulls this image, and the image is stored on the file system for a long time. I don't know exactly how long, but it stays there. For the next execution, it's going to pull again, but since the layers are already there, and it's likely they haven't changed, it uses the layers directly. So all the analyzers will already be there, and they will exit very quickly. And on top of that, they will run simultaneously, in parallel, instead of sequentially like today in SAST. It's not perfect, and we can imagine that in the future we could also have some detection mechanism ported to the GitLab Runner, but that's for the future. Right now, it works right away, just by removing the SAST layer and having the analyzers as the first level. Does that answer your question? Yes, perfectly. Awesome.

We are at time. Is there any last question I might answer? All right. I really hope that was useful for you and for the customer success and support teams. If you have any questions, please feel free to reach out on Slack and I will be happy to answer them. With that, I wish you a happy end of the day. See you all, bye bye. Thanks Philippe.