And thanks for inviting me to this room. I already have great news for you: after this session, it will be raining outside. So enjoy. All right, so today we'll be discussing trust in container images, and distroless, if you're familiar with the concept. And in particular, how and why we built Ubuntu distroless container images. Before starting with the talk, I have a story that I'd like to tell you. This session was originally scheduled to happen in Austin, in the June edition of the Open Source Summit. Some unfortunate events happened, and I'm going to tell you the story very quickly. I was going to the US for the first time in June. I was going to spend the month there to optimize my traveling, since I had another event at the end of July. So that's the first part of the story. Second part of the story: I live in France. I'm French. I'm sorry, nobody's perfect. And I don't know if you know Paris, but we do have these big old buildings. You can imagine seven-story buildings without elevators, or with very small elevators, and very small, irregular stairs as well. So you can imagine me, not too tall, going up the stairs with my two big pieces of luggage. I packed everything I could, because I didn't know the US. And inevitably, I fell in the stairs, broke my leg, and had to reschedule the session. Thankfully, the Linux Foundation organizers were super kind, and they found me this slot in the Dublin edition. But it is much shorter than the one that was supposed to happen in Austin, so the session is condensed. Hopefully we should still be able to cover all of the topics that I want to. And nicely, I just spent two minutes telling you my life story, so we have two minutes less to discuss all of that. Another thing — there's always a good side to every story — is that in the meantime, since June, a lot of stuff has happened.
So I was able to change the end of this presentation to discuss new things that we just announced around the Ubuntu container images, with a new concept that we call chiseled. We'll be discussing that at the end of the presentation, and hopefully by then you'll be happy that I actually broke my leg. I won't ask for a show of hands on that, because it's still a bit painful. So let's get started. Another title for this talk could have been: how to get the advantages of a Linux distribution in a container image, but without the overhead — in particular the size overhead. I'm Valentin; my Twitter handle is there. I've been at Canonical as a product manager and open source contributor for the past two years. I'm the product manager for our container images and anything we're doing in the OCI container space. I'll be around after this presentation, and tomorrow as well, so if you want to come and chat, you're welcome — I have the green sticker on. A fun fact, by the way: when I joined Canonical a few years ago, we were the publishers of Ubuntu, obviously, but we were not the publishers of the Ubuntu base image that you're probably using in your projects or development. That has now changed: we have a great container team at Canonical working on it, and we have exciting stuff coming soon and in the future. But it's always good to know where we started from, and all of this is part of the story of how and why we built these Ubuntu distroless container images. You can imagine that if I left the end of the summer in Nice, in the south of France — where you can have your afternoon meetings on the beach; I have a few colleagues who do, don't tell anyone — to come here to Dublin, it's not for leisure. It's because we have an industry issue, and we need to cover it and discuss it. So we're going to start with that: the why behind all of this story.
So there's a general software provenance and security question in open source software. There was an interesting talk, I think on Tuesday, that discussed a big part of the why, which is how to fund open source software. We're not going to discuss that today. We're going to discuss the other part, which is more about how we structure open source software — and especially in container images, where this issue is even more real. For a very simple reason: people still believe this sentence, the wrong belief that because you're running something in a container, you're secure and you're going to be fine. And it's partially true: you're isolating your process from the OS, you're isolating your processes from each other. But still, you're running something, and that's not enough to make it secure. There's this nice GIF that illustrates the situation pretty well. Look at this poor guy. He is containerized, you can see it. However, he is still running something, and the question is: what do you really know about this something that is still running? And the data shows: not so much. In fact, almost all of the third-party container images running on cloud infrastructure had known vulnerabilities inside them that attackers could exploit. That's what we often call CVEs, for Common Vulnerabilities and Exposures, and those are a lot of smart-sounding words. But what does it mean exactly, and why are we looking at CVEs? In fact, the issue is not so much that we have security issues. We are always going to have security issues in software, and that's fine. The main issue — although it's generally a good thing — is that we share so many dependencies in our open source software, in the software we deliver. In containers in particular, you can think of shared layers: these base images that we use to ship our software.
And because we share these across so many downstream uses, we also share all of the vulnerabilities and issues that come with these dependencies. So whenever there's a known exploit in one of these shared layers — and in particular in this one — everything fails and falls apart. You can think of the Log4j vulnerability that happened at the end of last year, of course. And I love this xkcd drawing that shows the issue. I find it funny because it's true; it's also sad because it's true. But that's what we need to work with. So, running a quick Docker scan through some of the most popular container images on Docker Hub shows that they all had known vulnerabilities inside them, and most of them had high or critical ones that you can actually exploit to do something: steal data, compromise data, anything fun you can think of — I'm sure you have great ideas. So we do have a problem. The question is: what do we do now? And our longstanding answer to that has been: reduce the size of your container images, make them smaller. And if we were to plot the number of vulnerabilities in a container image as a function of its size, we would probably find something like this. I wanted to make it with real data — it's not real data; the connection at the airport was not so great, and I also wanted to sleep a bit. Anyway, that's roughly what you would get if you traced this curve. And it's pretty intuitive if you think about it: the more things you put into your container images, the more likely you are to include security issues. Thinking in reverse, an empty image is going to be the most secure. There's a good comparison for that: if you actually want to secure a server, there's one solution — you just unplug it from the network, or shut it down. I can see some nodding in the audience.
And that's actually not from me; that's from Gilfoyle in the Silicon Valley series, and it's also from a lot of security engineers. So although this is probably true — it will secure your use case — it's also not useful, because you don't have a use case anymore. In fact, it's also confusing correlation with causation. It's not because an image is smaller that it is actually more secure. And although it is true that smaller images tend to be more secure, size is not the reason why. There are a lot of examples of correlation versus causation. I love this one: it shows the power generated by US nuclear power plants, and it also shows the number of people who drowned in a swimming pool — I think it's also in the US. And according to the data, the more power is generated, the more people are likely to die in their swimming pool. I'm sure there's actually a good explanation for it, but I wouldn't take this one as an answer. Also, there's another xkcd drawing. I was planning to take a water break now — I did take it already — but I'll leave you some time to read it. All right. So if the only thing that matters is not size, then what is it? And how do you smartly reduce the size of your software while effectively reducing the chances of vulnerabilities? I recently discovered, through this tweet actually, the Slim.ai tooling, which is pretty cool stuff. It gives you this fancy UI to help optimize your Docker images. And it tends to support the idea that provenance is the first factor to look at when securing your container images — the first action you can take is looking for supported and stable content that you can use in your images. It also reminds us that the correlation trend tends to omit the optimization outliers that exist and that you should be using.
And it's particularly important to me and to this discussion, as the Ubuntu base image has been exactly this kind of outlier for a long time, and it's been very difficult for us to optimize: already pretty good on security, but pretty big in size. And so that's what we are going to discuss now. But first, my first recommendation when you're trying to secure your container images is to look for provenance: secure and stable content with long-term commitments. And actually there's more and more development around software bills of materials (SBOMs) that is going to help with that. Second would be to keep your images up to date. It's pretty funny to see how many people are making great efforts to optimize their images, but they are using deprecated content without knowing it. So make sure to check, and to have a process to keep your images updated. Securing your supply chain is also something I would do before even thinking about reducing the size of your images. It's the same trust-but-verify approach: where are you consuming content from, are you checking signatures, are you scanning your images with tooling like Snyk, WhiteSource, et cetera. And finally I would come to reducing the size of the container images that you're using. So make your containers smaller, which is the topic of this session. And what we actually mean when we say make your containers smaller is: reduce the attack surface. Remove the things that you're not using, in order not to include something that would bring in dependencies and potential vulnerabilities. It does not mean compress your container — that's pretty useless. So yes, make your containers smaller. So then there's this other question that I promised I would answer: why nonetheless use a Linux distribution, and what makes a distroless container distroless? So, Linux distros are great. They remove a lot of the toil from developers' lives.
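To make those first recommendations a bit more concrete, here is a minimal Dockerfile sketch of what pinning provenance and keeping content fresh can look like. This is an illustration under my own assumptions, not an official recipe; the digest is a placeholder you would replace with the one you actually verified.

```dockerfile
# Illustrative only: pin the base image to an immutable digest (provenance),
# then apply pending security updates at build time (freshness).
# The sha256 value below is a placeholder, not a real digest.
FROM ubuntu:22.04@sha256:<verified-digest>

# Rebuild this image on a schedule so the upgrade step keeps picking up
# newly published security updates.
RUN apt-get update \
 && apt-get upgrade -y \
 && rm -rf /var/lib/apt/lists/*
```

The point is that "keep it updated" only works if the update step is part of an automated, repeated build, not a one-off manual action.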
And first of all, they give this provenance, this sense of provenance that we were looking for. The content comes from somewhere, and you probably know whether you can trust it or not. They resolve many dependencies: you often have a nice package manager that does all the work for you and lets you focus on the development of your application rather than on the toolchain. Here there are a lot of different preferences. Some prefer the always-latest-and-greatest content — I have friends using Arch Linux, and I still like them — but others prefer something more supported and stable, even if it's not the latest. It also depends on what you're looking at: for libc, you might not care about having the latest version, but for that fancy npm or Conda library, you probably do. Also, distributions come with an ecosystem of developers: online forums, solving your problems through tutorials, blogs, et cetera. It's very important, because we don't really think about it, but if something is not easy to use, it's probably not easy to secure either. I've seen a lot of solutions that say "we are going to secure your container images", but they are so complex to use that you're probably going to work around them, because someone imposed them on you. So probably not great. Something easier to use is easier to secure. And it's also worth considering the developer experience in terms of compatibility. Containers do solve the compatibility issue when you're running something, but when you're building something, you might run into issues. For example, the most famous one I know of is Ubuntu versus Alpine. A lot of developers were developing on the Ubuntu base image and then reaching out with issues, because they were actually shipping on Alpine and running into glibc-versus-musl library problems. Not that one is better than the other; they can just be incompatible.
And finally, some Linux distributions come with clear release cycles, enterprise support, et cetera. It's not so fun to talk about, but many of us work for companies that have strict policies or even industry regulations. So that could be a reason why you're looking to use a Linux distribution and not just an empty container image. So to put it in a nutshell, you would try to find the right balance between these four S's for a base image: secure and stable on the one hand, and simple to use and small on the other. So all of this brings us to 2017, about six years ago, and this great presentation from Matt Moore — at the time he was working for Google — where he introduced distroless container images to the cloud-native community. The idea is extracting only the content that is needed to support an application at runtime, rather than containerizing an entire Linux distribution. I would advise you to watch the talk — obviously not right now; I'm done in about 15 minutes and you can watch it afterwards. In a few words, the Google distroless images are Debian-based: if I had to summarize them, they select only certain files from the distribution, but those files come from Debian packages. They don't have a shell, they don't have a package manager, which is great for avoiding whole classes of attacks, and you probably don't need them once your application is containerized. It reduces the size of the base image they published to about 20 megabytes — an image useful for running compiled applications: C, C++, Golang, and GraalVM because it comes with its own runtime. So very restricted use cases, in that sense. And it was built using Bazel, a not-super-user-friendly build tool that is used at Google. I don't know if there's anyone from Google here, but I haven't found anyone who was super happy about Bazel.
So it's been six years, and distroless images, although a fantastic solution, are not so well adopted. You don't find them in production use cases that often, and I do think that's because they have a few unfortunate downsides that make them challenging to use. In particular, they require this deep understanding of the Debian packages, as you'd have to select yourself the files that matter — and you obviously don't know which files in a package are important to run your application if you're just the application developer. There was no package manager, as all the work was done through Bazel, as I was saying. And for downstream developers, it's not so great either: they had to use multi-stage builds, which is a great solution as well, but not so easy to use. The images were of course hard to use and debug — no debugging tools; that's attack surface. And related to that, deriving these images was hard: if you wanted something other than C, C++ or Go — say Python or something else — you could not do it as such; you'd have to build your own distroless image. The result being that there were as many distroless recipes out there as there were distroless images. So this brings us to when I joined Canonical, with this mission of reducing the size of the Ubuntu container images to make them more adapted, while keeping them secure — even increasing the security of the images. And therefore we started these discussions about publishing what we would call Ubuntu distroless base images: basically the same thing Google was doing with Debian, but using Ubuntu and packages from the Ubuntu distribution. The idea being that you'd get the kind of support and promises I was talking about, which you might not get with Debian, in particular in enterprise use cases.
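To illustrate the multi-stage pattern mentioned above, here is a minimal sketch of how the Google distroless images are typically consumed: build in a full-featured image, then copy only the resulting binary into the distroless runtime. The Go entry point and image tags are illustrative assumptions.

```dockerfile
# Build stage: full toolchain, shell, package manager — none of it ships.
FROM golang:1.21 AS build
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /app ./cmd/server  # hypothetical entry point

# Runtime stage: no shell, no package manager — just the binary,
# CA certificates, and a few base files.
FROM gcr.io/distroless/static-debian12
COPY --from=build /app /app
ENTRYPOINT ["/app"]
```

This is exactly the "great solution, but not so easy to use" trade-off: it works well for a static binary, but anything needing extra runtime files forces you into building your own distroless variant.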
So we were worried about a few things doing that. First, the name "Ubuntu distroless" is pretty scary — it's kind of saying we are killing Ubuntu. So we wanted these images to be Ubuntu-based, of course, but to keep some of what we think makes the Ubuntu experience: easy to derive, if you want to build a different image than the one we ship and add dependencies. And in fact, discussing that, we realized that shipping base images for every possible use case was probably not the best idea, and that the great thing about having APT in your containers was that we don't need to ship so many different base images — you can create yours, and it's easy to do. As a side effect of making these images distroless, we wanted them to be smaller, of course, inspired by the concepts and patterns from distroless — but really by removing things that are not needed at runtime, not just to reduce the size. We also wanted to remove the shell and the package manager, as they bring a lot of attacks, different classes of attacks, as I was saying. And finally, we wanted this compatible developer experience that I mentioned before: your first stage — or your development environment, if you're not doing multi-stage builds — can be an Ubuntu distribution, where you do everything you want without caring about the size, and then you ship on something that is entirely compatible, using the same libraries. So there had been a first attempt at building an Ubuntu distroless; it was called the Tiny Stack. This was done by the buildpacks team at VMware a few years ago, and there's a public GitHub repository that shows a full example of how to do it. I think it's now deprecated, but you can still look at it. And it's quite interesting because it only uses Dockerfiles and scripts. It's still very hard — not much better than Bazel, to be fair — and you'd still have to know the content of the packages to select which files you want to use.
They did it, of course, the buildpacks way — it is the buildpacks team, after all. So they had a build image, which was the full Ubuntu with a few more libraries. You can see here that they are adding CA certificates: CA certificates are not in the full Ubuntu base image, the one that is already 60 megabytes. And that shows that building the perfect base image is just impossible. You cannot know what users are going to do downstream, and you can neither include everything nor include nothing. In the runtime image, they only had these few packages listed here, which is basically the same set as the Google distroless one. So now, this is where the presentation forks from the one that was originally planned for Austin — in Austin we would have had the hands-on session. Now we're moving to the chiseled Ubuntu containers discussion that I promised. So in the meantime, we released this new thing that we call chiseled Ubuntu container images. The only one we have made public for now is available on Docker Hub and on MCR, because it was a partnership with Microsoft — with the .NET team at Microsoft in particular. And we released these chiseled Ubuntu containers as a runtime for .NET. The one I'm showing on screen is for self-contained .NET applications. I don't know if you're familiar with .NET or not, but if you are, don't listen too closely to what I'm about to say: it's a bit like statically compiling .NET applications. If you know .NET, you probably know that's not quite right, but it should be enough to understand why the image is so small: it doesn't contain the .NET dependencies per se. It's much more like an Ubuntu distroless image. And the result is that we have four-to-six-megabyte images that are made from Ubuntu content. And you can go from a first stage that is full Ubuntu to this image running your self-contained .NET application, without having to wonder whether it's going to be compatible or not.
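As a sketch of that compatible developer experience, a Dockerfile for such an image might look like the following: build on the full Ubuntu-based .NET SDK image, publish self-contained, and ship on the chiseled runtime-deps image. The tags were the ones published around the time of this talk and may have changed since; the project name is hypothetical.

```dockerfile
# Build stage: full Ubuntu-based .NET SDK, no size constraints.
FROM mcr.microsoft.com/dotnet/sdk:7.0 AS build
WORKDIR /src
COPY . .
# Publish self-contained: the .NET runtime is bundled into the output,
# so the final image only needs the base runtime dependencies.
RUN dotnet publish -c Release -r linux-x64 --self-contained true -o /app

# Runtime stage: chiseled Ubuntu runtime-deps — no shell, no apt.
FROM mcr.microsoft.com/dotnet/runtime-deps:7.0-jammy-chiseled
COPY --from=build /app /app
ENTRYPOINT ["/app/myapp"]   # 'myapp' is a hypothetical project name
```

Both stages use Ubuntu content, which is the whole point: the libraries you build against in the first stage are the same ones you run against in the second.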
So we are super excited about this new thing — about reducing the size so much. Actually, we were not expecting such a size reduction. As I was saying, we were not targeting just "smaller"; we were targeting the removal of the dependencies that we didn't need at runtime. And here it's much smaller than even what Google was building five years ago. So there's a hint in the naming we chose: these images are not meant to be generic. They are effectively chiseled — chiseling is a verb, I didn't know the verb; it means carving, like this — for a specific use case. And ideally, we would not be the ones chiseling all of these images. You would be chiseling Ubuntu — the full Ubuntu distribution — for exactly the use case that you have, using this new tool that we call chisel, or other tools that are going to come. So with this, let me open a parenthesis to dive a bit deeper into what this new chiseled Ubuntu thing is and how it works. Very simplistically — I'm sure you know a lot more than this — a Linux distribution like Ubuntu is a bunch of packages that are developed and tested together and that have dependencies between each other. Simplistically again, a package is a list of files — binaries, docs, configs, scripts, et cetera — that compose a working application tree. So when you were creating these distroless images by hand before, you were selecting only the packages and files that you required for the runtime of your application. For example, you could decide not to include the documentation, the man pages of a package: they're useless in a packaged container image. You could also decide to break dependencies, if they're not useful to your specific use case, and remove binaries or libraries from a specific package. But that requires a lot of knowledge about the packages themselves and the packaging of the distribution, as I said before.
And also, this was done every time, for every distroless image definition, so you had to maintain it in your own repo separately. So with chiseling, we introduced this idea of package slices, which are maintained at the same level as the Debian packages. You can define them yourself if you want, but they can also be defined by someone who knows the package a bit better than you. So someone could say: in this package, these files are for documentation; this is one library; this is a second library; this library does not depend on the packages that the package itself depends on, but this other one does; et cetera. So all of this knowledge is now upstream, and you could almost see it as a distribution within the distribution — but it's even more than that, because it's almost an infinite number of distributions within the distribution. You're composing your distribution from the upstream one whenever you're building an image. And for this, we built a tool that is called chisel. It's not super documented at the moment — this presentation is sort of the state of the art for now, and it's probably going to evolve very quickly — but the concepts are super interesting, and you're very welcome to come and discuss them and to look at the repo. I think it's linked there, but it's also a very simple name, so you can just Google "chisel" and you should find it. Effectively, we sort of created a from-scratch package manager: it looks into the Ubuntu archive and uses it, but it only installs, on top of a scratch filesystem, the files needed by the so-called slices that you wanted to install. So now, closing the parenthesis and moving on. In a few words: the mission succeeded. We had the compatible development experience. We had only the content that was needed for the runtime.
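As an illustration of the slice concept, a slice definition maintained upstream might look roughly like this. The exact schema lives in the chisel repositories and is still evolving, so treat the package, slice names, paths, and field names here as examples only:

```yaml
# Hypothetical slice definition: one package split into independently
# installable subsets of files, with per-slice dependencies.
package: openssl
slices:
  bins:
    # This slice needs libssl's libraries, but not the package's
    # whole dependency tree.
    essential:
      - libssl3_libs
    contents:
      /usr/bin/openssl: {}
  config:
    contents:
      /usr/lib/ssl/openssl.cnf: {}
```

The value is in who writes this file: the package maintainer, once, upstream — instead of every downstream image author rediscovering the same file lists.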
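And here is a rough sketch of how chisel itself can be driven from a Dockerfile to produce a minimal root filesystem. The flags and slice names follow the tool as it looked at the time of the talk and are likely to change, so check the repository for current syntax:

```dockerfile
# Build chisel from source (it is a Go program).
FROM golang:1.19 AS tools
RUN go install github.com/canonical/chisel/cmd/chisel@latest

# Cut only the slices we need from the Ubuntu archive into an empty root.
# Slice names follow the package_slice convention and are examples.
FROM ubuntu:22.04 AS rootfs
COPY --from=tools /go/bin/chisel /usr/bin/chisel
RUN mkdir /rootfs \
 && chisel cut --release ubuntu-22.04 --root /rootfs \
      base-files_base libc6_libs ca-certificates_data

# Final image: just the chiseled files, nothing else.
FROM scratch
COPY --from=rootfs /rootfs /
```

This is the "from-scratch package manager" idea in practice: the archive and the content are Ubuntu's, but only the requested slices land in the final image.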
We had trusted, long-term-maintained content, because it comes from our distribution; and it's content that is curated at the right level — not by everyone downstream, multiple times, but upstream, in one place that can be maintained together as a community. One of the lessons we learned is that there's no point in trying to define the perfect base image — at least that's what we think — and that the best approach is to provide the tools for everyone to build their own base image when they are packaging their own software. And I had this example of how you can use it. I'm not going to do a live demo, because they have failed me a few times in the past and I wanted to leave time for questions. But you can also look at the repo: I forked the main repo and created a lot of examples. There's an example that is using buildpacks, there's one that is building the distroless image as shown on screen, and a few others to show how you could use it in your project. For now, again, it's very early: I would not recommend using it in production yet, but I would recommend following this project very closely, because I think it's quickly going to become the future of our Ubuntu container images. Hopefully in a few weeks or months, I can be in front of you discussing how we put them into production. So yeah, that's how and why we built Ubuntu distroless images that are half the size of the Google Debian-based distroless images, and that are built from, I hope, your favorite Linux distribution. That was all for me. Thanks for listening. There's time for questions. I think there's a mic, and the process is that you stand up, you go to the mic, and you talk — or you sing, or whatever you want to do. And otherwise, I'm Valentin; I'll be there tomorrow if you want to chat. And yeah, thanks. Hi. Just to verify: so you're using the regular Ubuntu packages to build the chiseled versions?
So for the next version of Ubuntu: same scripts, new packages, automatically? Yeah, yeah. Okay. I mean, "automatically" is never as automatic as it ought to be — with small variants, but the same concepts. So the binaries are not specifically made for this? No, no, no. Okay, only the slices are defined separately. Thank you. I'm a Debian developer — I'll ask the guys in Debian if we can use it as well. Thank you very much. That would be nice. Actually, for background: we did have the discussion of whether we should try to upstream it to Debian at first, but we thought it was easier to iterate like this and try things out. As a proof of concept, it looks amazing. I think when it's ready, we should start to upstream it. I'm talking on this track tomorrow on the same issue, so I think it's an amazing solution. Thank you very much. Let's go. Thanks. Next question. One of the things that actually scares me a lot with this — containers, and very small containers — is that container scanners don't like them. I fully trust that when you carve out such a small container with chisel, in the beginning it has no CVEs, or limited CVEs. But over time, CVEs will be introduced. Is there a way for container scanners to still find these, given that you've cut out the package database and that kind of stuff? Yeah — well, it's the same question. So the question is: can we still scan these container images, right? These chiseled, or the older distroless, container images. And it has actually been an issue with distroless images for a while. For now, the way scanners work is that they use the dpkg database: they just look into it and match it against the vulnerabilities they know. One approach we considered is keeping some form of the dpkg database. But we have issues with that too: it's one big file that you rewrite every time.
So every time, your layer changes, which is not so great. So at the moment, we don't have it, but we do plan to have it in the form of an SBOM — to kill two birds with one stone, I think that's the expression; an unfortunate expression. But yeah, the idea is that we would ship an SBOM, the scanners would be able to use it, and at the same time you get an SBOM. But at the moment, we don't have it yet; it's part of the plan. Sounds like a good solution, if it gets there. Thank you. Thanks for the question. I don't know if there are any remote questions — it doesn't seem like it. All right, so I think I'll let you enjoy the rain. Should be starting soon. Thanks for coming. Thank you.