Any questions? All right, so this happened. I saw this image. These are Fedora container image sizes from Fedora 22 up to Fedora 30, and it went from about 200 megabytes to over 300 megabytes. There's a dip in here from some sort of minimization effort, and then it just grew again. So that appeared, and the question was: hey, can you make Fedora smaller, please? And I was like, sure. So I just grabbed Fedora (that's me) and tried to make it smaller. And then I realized, no, that's not the way. I need a more sophisticated approach.

So I proposed something called a Fedora Objective. I don't know how many people are familiar with Fedora Objectives, but these are high-level goals for the project. Fedora has, I don't know, four. So I proposed minimization as one of those. I basically said: hey, I want to minimize things, I don't really know what I'm doing yet, I'll figure it out, and I'll let you know what happens. I called it the discovery phase. It got accepted. So let's talk about the discovery phase, what it was, and what I came up with.

So they mentioned images, right? Do we actually need them smaller? Look at base images and applications. No one really runs a base image just for the sake of the base image; people run applications. So we might not need to focus on the base image itself, but on the whole thing. Also, applications have dependencies, and if some of those dependencies are in the base image, shrinking the base image might just move the line without really decreasing anything. It might; it might not. There are cases in which bigger base images are actually better. For example, if you run containers at a huge scale, the base image gets shared because it's a layer, so as a result the total might be even smaller. So my takeaway was: focus on the whole thing, not just the base image.

So why do we care about size? Isn't disk space cheap? That's a good question.
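The point about shared base layers is easier to see with numbers. Below is a minimal sketch, assuming all containers run on one host, use the identical base layer, and need the same common dependencies; the sizes (50, 150, 120 and 20 MB) are made up for illustration and are not Fedora measurements.

```python
def fleet_size_mb(base_mb: float, app_layer_mb: float, n_apps: int) -> float:
    """Disk usage on one host: the shared base layer is stored once,
    and each application layer is stored once per application."""
    return base_mb + app_layer_mb * n_apps


# Hypothetical scenario: every app needs the same 100 MB of common
# dependencies plus 20 MB of its own files.
def small_base_total(n_apps: int) -> float:
    # Small 50 MB base; common dependencies live in every app layer.
    return fleet_size_mb(50, 100 + 20, n_apps)


def big_base_total(n_apps: int) -> float:
    # Bigger 150 MB base that already contains the common dependencies.
    return fleet_size_mb(150, 20, n_apps)


print(small_base_total(1), big_base_total(1))
print(small_base_total(100), big_base_total(100))
```

With one application, both layouts cost the same 170 MB: moving dependencies into the base only moves the line. With 100 applications, the bigger shared base comes to 2150 MB versus 12050 MB.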
And I think there are three reasons why I care about size. First, the compulsory security slide: fewer things installed means a smaller attack surface. That reason isn't storage-related at all, and it's a valid one. Another one: the Internet of Things. Again, it's not about disk space. Raspberry Pis and similar devices may have small storage, but it's really about connection speeds. These devices don't run in the data center; they often run somewhere out in the field with very slow connections, and you need to send updates quickly. If things are smaller, that helps. And then containers: I mentioned containers at scale. If you have a huge number of containers and they're too big, they're actually painful to manage, so making them smaller makes sense even from the storage perspective. So those are the three things I had in mind. That's why I care about size.

All right, an even more existential question: what's the point of Linux distros? Is Fedora even the right place to do this? So how does it work? We have upstreams. They develop software and nice features. And we have users who run it in production. They need fixes fast. Then they hit a bug, and maybe they pressure the upstream: hey, you need to fix our CVE. But upstream doesn't have time, because they actually need to develop things. That's where a distro comes in. Distros integrate things, test things, make them easily consumable and updatable, and basically make everyone's life better. Upstreams can focus on development, users can focus on running things, and distros are the gateway between the two. And you can have various distros with various target groups. You can have community distros like Fedora, and you can have an enterprise distro like Red Hat Enterprise Linux, where you can actually throw money at them, scream louder, and they will do it faster. So that's why we care about distros, and that's why Fedora is the right place.
So let's finally minimize things. How do we do that? Let's have a look at Fedora. Oops. That's the repos: an SVG file that tries to show you all the packages and all their dependencies. I made this for my last presentation. This file took three hours to generate. It's an SVG with 1.6 million lines, 130 megabytes. So even for computers, this is kind of hard to approach. And I was like, nope, I'm not minimizing this; I need a different approach.

So let's focus on use cases instead. People don't use the repos as a whole. They have use cases: they install something very specific for a specific reason, and that's what they care about. We can name a few and focus on them. That's what we do. Around that, there are basically four areas in the project that we'll be focusing on. Making is about making things smaller, the actual minimization: can we optimize things in various ways? The second one: as we saw on the graph, things tend to grow, so we also need to keep them small over time, so that if you make a change, it actually persists. Talking is about making this initiative more visible: teaching other people, guiding, educating. And leading is about coordination in the project, so we don't step on each other's toes. That's what the minimization effort is actually doing.

With that, I proposed a second phase, which was much more specific than "I don't know", and that's what's going on right now. By the way, in the meantime a very nice thing happened: the base image got smaller. You can see F30 and F31. F31 actually got below 200 megabytes, which is even slightly smaller than F22. Thanks to the container team in Fedora for doing this. I think it happened just because the objective was there, we talked about it, and they're awesome people. Now we have to keep it small; we can't just let it grow again. So step one: how do we actually approach these things? Back to the objective.
I said we define use cases. We have a few, like an HTTP server, MariaDB, and Postgres, and we're always looking for more relevant ones that we can take care of. Step two is preventing growth. We do that before actually minimizing, so that if you do some work, it doesn't creep back. This is the area where most of the work has happened so far, and I have a demo for it. It's a service that monitors things; let me just show you. If you open this URL, this is where the service is. I'm quite happy about the URL. Thank you, Patrick. If you go there, you see something called Feedback Pipeline, and it says it's reporting and notifications regarding dependencies and sizes of defined RPM installations. OK, so what does that mean? Let's have a look.

I click here on the results, and there are these dependency reports. I can see two major things: use cases and base images. Use cases are the applications we care about, and base images are where we install them, the contexts, because applications can run in many different places. Let's look at httpd, for example. It's monitored for size, so I have some history graphs. There are actually three lines of the same color; it's a prototype, sorry. This is from the end of September last year, and it's been flat, which is sort of good news, because it didn't grow. But there's an opportunity, of course, to do some work. I don't have the older data from when it went up, so nothing from the minimization is actually shown. But that's what we have, and we can work on notifications and things if it grows.

Where does this data come from? If I scroll down a bit, there is a definition, and that's how I define the use case. I can say packages: httpd; install options: no weak dependencies, no docs; and where it's installed: in three different contexts.
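A use case definition like the one in the demo (packages, install options, contexts) maps fairly directly onto DNF command lines. The sketch below is hypothetical: the `UseCase` class, the context names, and the installroot paths are made up for illustration and are not Feedback Pipeline's actual configuration format. Only the dnf options themselves (`--installroot`, `--releasever`, `--assumeyes`, and the `install_weak_deps` and `tsflags=nodocs` settings) are real.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class UseCase:
    """Hypothetical use case definition: what to install, how, and where."""
    name: str
    packages: list[str]
    no_weak_deps: bool = True  # maps to dnf's install_weak_deps=False
    no_docs: bool = True       # maps to dnf's tsflags=nodocs
    contexts: list[str] = field(default_factory=list)


def dnf_install_command(use_case: UseCase, installroot: str, releasever: str) -> list[str]:
    """Build a dnf command line installing the use case into an installroot."""
    cmd = [
        "dnf", "install", "--assumeyes",
        f"--installroot={installroot}",
        f"--releasever={releasever}",
    ]
    if use_case.no_weak_deps:
        cmd.append("--setopt=install_weak_deps=False")
    if use_case.no_docs:
        cmd.append("--setopt=tsflags=nodocs")
    return cmd + use_case.packages


def commands_per_context(use_case: UseCase, releasever: str,
                         root_dir: str = "/var/tmp/feedback-pipeline") -> dict[str, list[str]]:
    """One install command per context. Each installroot is assumed to
    already contain that context's base packages (or nothing, if empty)."""
    return {
        ctx: dnf_install_command(use_case, f"{root_dir}/{use_case.name}-{ctx}", releasever)
        for ctx in use_case.contexts
    }


httpd = UseCase("httpd", ["httpd"], contexts=["empty", "container-base", "minimal-langpack"])
for ctx, cmd in commands_per_context(httpd, releasever="31").items():
    print(ctx, " ".join(cmd))
```

Keeping the definition separate from the command that executes it makes it cheap to add another context: the same use case is simply installed into one more root and measured there.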
And that's what I can monitor. We can look at the package list. This is the full report of what's in there; it's not the history, it's where we are right now. Many things are visible. There's httpd, which is the required package, and then all its dependencies. Some of them are in the base image; those are the green ones. Can you see something interesting? I took the Fedora container base image, which is about 200 megabytes, installed httpd, and got 243 megabytes. Then I took an empty installation with nothing in it, installed the same thing, and got 379 megabytes. That's about 136 megabytes more. Who knows why? Yeah.

Weak dependencies? That's sort of true, but it's not weak dependencies. It's basically choices in the repo. If I sort the packages by size, I can see glibc-all-langpacks in the middle. That's about 200 megabytes that got pulled in here, but it's not in the container-base variant. Why not? If I sort by size again, the other way around, I can see glibc-minimal-langpack, which is basically zero. It cuts some support for languages, but it prevents the big one from coming in. So the takeaway is that defining the base images, or environments, is worth it because of those choices: you can make specific choices that then make the result more useful for users. And the last context is basically only glibc-minimal-langpack and nothing else, and that's the smallest one, because that's as low as we could get.

It also does dependency graphs, for inspection. If I see, hey, this is too big, or why is this in there, I can show the relations. Here the container base image is squashed into one node. I can expand it by going to the other graph, and I can even click on packages to see the relations. So I can click on systemd and see what requires it, or why it's there.
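Why the same httpd install ends up so much bigger in an empty root comes down to which provider satisfies a requirement. The toy model below is only an illustration, assuming a single capability (hypothetically named `glibc-langpack`) with two providers and a "first provider wins unless one is already installed" rule; DNF's real solver (libsolv) uses a far more involved policy, and the sizes are rounded illustrations.

```python
from __future__ import annotations

# Toy model only: the provider list, capability name and sizes are
# illustrative, and "first listed provider" is NOT how dnf/libsolv decides.
PROVIDERS = {
    "glibc-langpack": ["glibc-all-langpacks", "glibc-minimal-langpack"],
}
SIZES_MB = {"glibc-all-langpacks": 200, "glibc-minimal-langpack": 0}


def pick_provider(capability: str, installed: set[str]) -> str:
    """Return the package that ends up satisfying `capability`."""
    candidates = PROVIDERS[capability]
    for pkg in candidates:
        if pkg in installed:
            return pkg       # already satisfied: nothing new gets pulled in
    return candidates[0]     # toy default when no provider is installed yet


def added_size_mb(capability: str, installed: set[str]) -> int:
    """Size the requirement adds on top of what the context already has."""
    pkg = pick_provider(capability, installed)
    return 0 if pkg in installed else SIZES_MB[pkg]


print(added_size_mb("glibc-langpack", set()))                       # empty context
print(added_size_mb("glibc-langpack", {"glibc-minimal-langpack"}))  # context with the minimal langpack
```

In the empty context, the toy resolver pulls in the 200 MB provider; when the context already contains `glibc-minimal-langpack`, the requirement is satisfied and nothing is added. That is the sense in which defining the contexts lets you make the choice up front.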
And it does some basic clustering, which was useful. For example, in the MariaDB use case there was a bunch of Perl in one of the clusters, and it was easy to trace why it was in there. So it can be used that way.

All right, the base images look very similar. Again, go to results and click on base images. I have four here. We saw three of them: empty, container base, and minimal langpack. I also have IoT, which is just for tracking the IoT initiative. We don't install any use cases on IoT, because those things run in containers there, so it doesn't matter, but it's there. The definition, for example of the container base, looks the same. It has the graph here, so I can see it's mostly flat, maybe a tiny bit down, but not much. Not growing, though, and that's great. Then there are the required packages, the install options, the full package list, and the size. I can also show you the minimal langpack one, which is just a thing I made up: it only requires glibc-minimal-langpack and nothing else. With its dependencies, it's 35 megabytes. There's no package manager, so if you want to use it, you need multi-stage builds. I don't know if you're familiar with those; if not, we can talk later, it's out of scope here. But this can be used to build even smaller images.

So that's Feedback Pipeline right now. It only shows things, so you need to go there and look. I want to make it more reactive, so it actually notifies you if something changes. That'll be useful: for example, if a packager makes a change that they don't see in their own context, but it somehow influences something on the other side, they can get a ping: hey, you caused this, please evaluate and do something about it. So that's the URL again; go have a look. All right, back to the objective. I said: define use cases, prevent growth, and then we can actually minimize them.
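Tracing why something like Perl ends up in the MariaDB use case is essentially a path search over the dependency graph. A minimal sketch, assuming the graph is available as a plain "package to required packages" mapping; the package names and edges below are hypothetical, and this is a swapped-in breadth-first search, not the clustering Feedback Pipeline actually does.

```python
from __future__ import annotations

from collections import deque


def why_installed(requires: dict[str, list[str]], root: str, target: str) -> list[str] | None:
    """Return one shortest requirement chain root -> ... -> target,
    or None if target is not reachable from root."""
    parent: dict[str, str | None] = {root: None}
    queue = deque([root])
    while queue:
        pkg = queue.popleft()
        if pkg == target:
            # Walk parent links back to the root, then reverse.
            chain = []
            node: str | None = pkg
            while node is not None:
                chain.append(node)
                node = parent[node]
            return chain[::-1]
        for dep in requires.get(pkg, []):
            if dep not in parent:  # visit each package once
                parent[dep] = pkg
                queue.append(dep)
    return None


# Hypothetical graph: a helper script package drags in the Perl stack.
REQUIRES = {
    "mariadb-server": ["mariadb-common", "helper-scripts"],
    "helper-scripts": ["perl-interpreter"],
    "perl-interpreter": ["perl-libs"],
}
print(" -> ".join(why_installed(REQUIRES, "mariadb-server", "perl-libs")))
```

Because breadth-first search explores packages level by level, the chain it returns is a shortest one, which is usually the most readable answer to "why is this here?".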
We've already done something there, but this will be the main area now. We'll be restarting some of the discussions. Here's what we're basically looking for. Unnecessary RPM dependencies: there won't be many of those, maybe none at all, but they're easy wins if something was forgotten. Multiple implementations of the same functionality, which happens with, for example, multiple crypto libraries: this is a more systematic approach to looking for those occurrences and maybe choosing one. Then what I call context-specific requirements. That sounds horrible, but, for example, it's completely fine for packages to rely on systemd, because it's on every system except containers. You don't really need it in containers. So in different contexts you have different requirements. Many packages require systemd to create users, which is a great way to create users, but it pulls a lot of things into a container, so we might want to think about that. What we actually did was discuss splitting systemd-sysusers out so it doesn't pull in everything, but that's still in progress. And then there's the classic case of requiring massive things for a tiny fraction of functionality: you have a tiny script you want to run, and you require the whole Perl stack for it. We're looking for those cases as well.

How? By engaging with upstream developers, of course; we're not doing this in isolation in Fedora. Some things are just packaging, some things need source changes, so we're looking at both. Also by implementing process and policy changes. For example, if we figure out a better approach for creating users in containers without systemd, we can make it a policy in the project so it's easier to follow. And then by providing guidance, of course.
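For the "massive things for a tiny fraction" case, a useful number is how much would disappear if one package stopped requiring one dependency. A sketch, assuming a simple requirement graph and per-package sizes (all names and sizes hypothetical), and ignoring provider choices and weak dependencies:

```python
def closure(requires: dict, start: str) -> set:
    """All packages reachable from `start` through requirements."""
    seen = set()
    stack = [start]
    while stack:
        pkg = stack.pop()
        if pkg in seen:
            continue
        seen.add(pkg)
        stack.extend(requires.get(pkg, []))
    return seen


def marginal_cost_mb(requires: dict, sizes: dict, pkg: str, dep: str) -> int:
    """Size that would disappear if `pkg` no longer required `dep`."""
    reduced = dict(requires)  # shallow copy; only pkg's entry is replaced
    reduced[pkg] = [d for d in requires.get(pkg, []) if d != dep]
    dropped = closure(requires, pkg) - closure(reduced, pkg)
    return sum(sizes[p] for p in dropped)


# Hypothetical: a tiny tool that requires Perl for one small script.
REQUIRES = {
    "helper-tool": ["glibc", "perl-interpreter"],
    "perl-interpreter": ["perl-libs", "glibc"],
}
SIZES_MB = {"helper-tool": 1, "glibc": 10, "perl-interpreter": 40, "perl-libs": 20}

print(marginal_cost_mb(REQUIRES, SIZES_MB, "helper-tool", "perl-interpreter"))
print(marginal_cost_mb(REQUIRES, SIZES_MB, "helper-tool", "glibc"))
```

Dropping `perl-interpreter` from the hypothetical `helper-tool` saves 60 MB, while dropping `glibc` saves nothing, because Perl still needs it. Shared dependencies are cheap to require; unique subtrees are where the wins are.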
There'll be many things we discover, and we can teach people how to make decisions. So how can you help? Last three slides. If you're a packager, it's about the minimization mindset. Just think about it: if you add a dependency, it can have a huge impact on many things: containers, Flatpaks, ISOs, or even installations we don't ship as an artifact at all, the ones we never see, where people install things themselves. If you're a developer, you can help improve Feedback Pipeline, or we can talk about other ways to make this easier for everyone. And if you're a Linux distro guru, you tell me, because you're much smarter than I am. In all seriousness, I'm happy to have all kinds of discussions if you have ideas. Let me close with this, and then questions. This is not about cutting features out of Fedora; it's about being more flexible. You have less installed on the system, but you can choose what you need. Let's do it. Thank you. That was awkward. No. Sorry.

I suppose it's going to be capable of doing notifications. What I would personally like to see in a distro, given your example of a single script pulling in the entire Perl stack, is thresholds: if between build three and build four of a package the dependency tree grew by a certain margin, that should be flagged, not just for the maintainers, but ideally for anyone downstream of that package.

Yes, I agree. I'll just repeat for the recording: Steven proposes that Feedback Pipeline should have thresholds, for example for a certain amount of growth, with notifications, so we can actually see the differences between builds. That's exactly what I'm planning. It's a little bit tricky, because there's nothing to query this data from.
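The threshold idea from the question can be sketched as a comparison of two package-to-size snapshots of the same use case. This assumes such snapshots exist for consecutive builds (which, as discussed next, is the hard part), and the 10% default is an arbitrary example threshold, not a value anyone has agreed on.

```python
from __future__ import annotations


def compare_builds(old: dict[str, int], new: dict[str, int], threshold: float = 0.10) -> dict:
    """Compare two {package: size} snapshots and flag growth above threshold."""
    old_total = sum(old.values())
    new_total = sum(new.values())
    if old_total == 0:
        growth = 0.0 if new_total == 0 else float("inf")
    else:
        growth = (new_total - old_total) / old_total  # relative growth
    return {
        "growth": growth,
        "added": sorted(new.keys() - old.keys()),    # newly pulled-in packages
        "removed": sorted(old.keys() - new.keys()),
        "flag": growth > threshold,                  # would trigger a notification
    }


before = {"httpd": 10, "apr": 5}
after = {"httpd": 10, "apr": 5, "perl-interpreter": 40}
print(compare_builds(before, after))
```

Reporting the added packages alongside the percentage gives the maintainer the "you caused this" context, not just a number.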
If I only care about single packages, I can ask about the build, and that's easy to figure out. But here I would, in theory, have to do an actual compose of the whole repo for every build of every package to get this data. So we're going to figure something out; Alexandra, who is looking at me, will be figuring this out very soon. Not for every build, but very often, so we can do those things. That's a very good point, and it's definitely the plan. Yeah. Yeah, the note was that we can actually add this feedback to the package update, in Bodhi, in Fedora. So when you submit an update for a package, you can see: hey, you caused this, are you sure? If you're sure, you can proceed, but if it wasn't intentional, you can revert and try again. Yeah.

Do you plan any support for installing only a subset of a package, like throwing out man pages and documentation? Or do you want packagers to build separate packages for the man pages and the documentation? And the other question... Let's do this one first, and then we can go on. So the question was whether I intend to make it possible to install just parts of packages, leaving out documentation or man pages, or whether I want to break these things into separate packages. I would say I don't really have a preference. I just want it to be possible for people to install only what's relevant for them, however we achieve that in the implementation. I don't dictate the implementation; I just want the result.

But the thing is, upstreams keep adding features. They add new features, and those pull in more libraries and so on. But often the use case that brings in the library is minimal; it's not that you need it, it's just nice to have. So do you want to squeeze these feature pulls, or the base features? That's a very good point.
So the point was that people keep adding new features to their packages because they're used to it, but that grows the size, so how am I going to approach this? That's one of the constraints: this is a binary distro. Some configuration is done before the build; in Gentoo, for example, you can do this very easily. In Fedora we can't. That's why I'm focusing on use cases. I don't focus on httpd as a whole; I want to focus on httpd in a very specific use case, so I actually know what it needs and what it doesn't. And if we define many of those use cases, the project can say: hey, these use cases are the most relevant for us, so we need to prioritize them. Or we can say: there are these huge groups of people using the same thing in very different ways; can we build it twice and make both smaller, or something like that? So there are ways to approach it. But yes, we need to keep in mind that adding new features is nice, but it has an impact. It's a balancing act. I guess I have time for one last question. All right, two questions, very quick.

Just curious, do you have any data online? Whether I have data on dependency growth for individual packages: I don't know right now. I collect the data, so I could visualize it somehow. But there's a service developed by David Cantrell that focuses only on the packages, and because it's just the packages without their dependencies, it's much easier to access the data right from the builds and get fresher results. That's his focus. We've already talked, and this will be part of that.

How are you collecting the data, how often, and from where? How am I collecting the data, how often, and from where? There's nothing to query for those use cases.
So I have to generate it myself, and because I implemented it and I'm not that smart, I install everything and then query it. It takes about four hours, and I do it every day. Maybe someone can fix that, if there's even a way. My motivation was to get it working as fast as possible, because it's a prototype, so I expect to throw lots of code away, and I also didn't want to introduce errors by reimplementing, I don't know, dependency resolution or whatever. So that was the easiest way, but it's a prototype and we can definitely improve it. So I just install from the Fedora repos and then query the result; that's how I get the data. All right, I guess that was everything. Sorry, we can talk later.
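The "install everything, then query" approach can be sketched as querying the RPM database inside an installroot. A minimal sketch, assuming the root was already populated (for example with `dnf --installroot`); `rpm --root`, `-qa`, and `--queryformat` with the `NAME` and `SIZE` tags are standard rpm options, but the helper functions are hypothetical and not the prototype's actual code.

```python
from __future__ import annotations

import subprocess

# rpm query tags: package name and installed size in bytes, tab-separated.
QUERYFORMAT = "%{NAME}\t%{SIZE}\n"


def parse_rpm_sizes(text: str) -> dict[str, int]:
    """Parse 'name<TAB>size' lines into {name: total installed bytes}."""
    sizes: dict[str, int] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, size = line.split("\t")
        # Multilib can install the same name for two architectures; add them up.
        sizes[name] = sizes.get(name, 0) + int(size)
    return sizes


def query_installroot(root: str) -> dict[str, int]:
    """Query every installed package in an already-populated installroot."""
    result = subprocess.run(
        ["rpm", f"--root={root}", "-qa", "--queryformat", QUERYFORMAT],
        check=True, capture_output=True, text=True,
    )
    return parse_rpm_sizes(result.stdout)
```

Calling `query_installroot` on a populated root yields a package-to-size mapping; the parsing is kept separate so it can be tested without an RPM database.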