Okay, thumbs up, I'm starting. I'm Will Woods, and this is Scriptlet Reform and RPM Independence: Building Images in the Container Era, which is kind of a mouthful. I've been living with academics for the past few years and academic usage has crept into my brain. Will, can you be a lot louder? Huh? Can you be louder? Oh, I certainly can. Awesome. There we go.

All right, so who's this guy? I'm Will Woods, I do stuff. I've done a whole bunch of things. Most of my career has been dealing with getting systems installed and booted. Once the system is actually running, I get bored. But all of our things that build images or do installs are things I have worked on. I'm also the person who said the phrase "Beefy Miracle," so sorry about that. Or you're welcome, I don't know. Thank you.

So the main thing I want to talk about here as background is: where do images come from? Because we have a whole lot of terminology and ideas around how this process works. The last time we had a big talk about these things, any time anyone mentioned anything like a module or a component or a group, I would write it down on a post-it so we could try and figure out how these things are related. I ended up with a stack of post-its like 40 high, and we never figured it out. So instead I'm gonna tell you how it works.

And really it works like this, and it's pretty easy. Out in the world, ignoring us for a minute, just talking in the abstract: there are upstream projects, and they do source releases. We take an individual source release and a build configuration (which right now is a spec file, instructions about how to build things, plus all the other build settings and stuff like that), put that into a build environment, and we do a build. You put those things together and you get binaries; you get a build. And when you want to make an image, you take a recipe and a whole heap of builds, you do some dep solving and you mash it all together, and you end up with an image. This is a really oversimplified version, but that's basically how it works. And it's sort of important to make sure everybody's thinking about these things in the same way.

One of the important things is that it is a loop. You need a build environment in order to make builds, but you need builds to make the build environment. So that's a thing to keep in mind. You can get real mathy with it, and we kind of want to, because one of the problems we have is side effects: when you're doing mathy stuff, you don't have side effects. You add two and two, you don't get four but also a lizard is in your house now. You just get four. So we want to make our processes work a lot more like math and avoid side effects whenever we can.

All right, so there's that third piece in the middle there, but we'll come back to that. But yeah, this is roughly what we're talking about here, and you can kind of follow along: you've got a project, and it releases a bunch of sources, and then you take the source and the build environment and you get builds; a build is build artifacts plus metadata. And then you have a recipe, which is "the user wants a few things," and we can use that metadata and turn it into a manifest, a list of the actual builds that would satisfy the recipe.
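To pin that vocabulary down, here's a rough sketch of the model as Haskell types. This is my own illustration, not the project's actual types; all the names here are invented.

    -- A rough sketch of the model from the talk; names are invented, not bdcs's.
    data SourceRelease = SourceRelease { project :: String, srcVersion :: String }
    data BuildConfig   = BuildConfig   { specFile :: FilePath, buildSettings :: [(String, String)] }
    data Build         = Build         { artifacts :: [FilePath], buildMetadata :: [(String, String)] }
    data Recipe        = Recipe        { userWants :: [String] }  -- "the user wants a few things"
    type Manifest      = [Build]       -- the concrete builds that satisfy a recipe

    -- One build:  source release + build config, run in a build environment.
    -- One image:  recipe + heap of builds, depsolved into a manifest, assembled.
    -- The loop:   the build environment itself has to be assembled out of builds.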
Once you have that list of builds, in theory, if the world were easy, you could just take the contents of all those builds, smack them into some sort of container, and be done. But that doesn't work. We need some sort of magic. There's some stuff that needs to happen in between taking all of the build output and having it work in the image. And that's essentially where scriptlets happen, and it's kind of a problem, because it's magic and we don't really know how it works or what it's doing. But we can deal with this.

So yeah, the magic is what makes this hard. In theory, the process of building an image, like I'm saying, should just be: all right, we've got all this build output sitting around, cram it into a folder and you're done. Or maybe you have to do some other stuff, and we know some of that, but we really don't know what goes on. We have this enormous heap of magic scripts that happen and they do stuff. And because of that we have problems: composes take a really long time; sometimes they fail; we need root to do this for some reason, even though it's all just data, we're just moving data around. You shouldn't need to be the superuser to make an image. You're just moving files from one place to another. It's just moving bits around. So why is it all so hard? Why does it take so long? Why does it keep getting harder? Why do we feel like we're fighting with our tools a lot?

And kind of the answer is that RPM was not designed to do this. RPM was designed as a way of basically delivering updates and other things. It's a nice little package format, and there's a bunch of stuff going on in there: it's a build system, it's a packaging format, and it's a delivery mechanism for data. But it wasn't really designed to make images. It was designed in an era where the idea of having to build a whole lot of system images was not really something people thought about. You didn't have virtualization. You didn't have containers or anything like that. You had a computer, you did an install on it, and then you carefully tended that thing for years. That was not what RPM was designed for. We've made it do all of this stuff, but that's not what it was made for, and we're bumping into these limitations a lot.

So, thinking about what we're trying to do and that abstract view of it: one of the central questions of what I've been working on for the past year or two is, if we were doing it all now, what would we build? What would it look like? And the more important part is: how do we do that without just burning everything down and starting over? Because as fun as that sounds, we can't do that. We have an entire enormous ecosystem built above and around RPM, and we've inherited some of its limitations and we'd like to remove those, but we don't wanna topple the entire tower. We don't want to blow up the engines while the plane is flying; we need to replace them. So that's what this is for.

So this is what we've been concerned with. Project Weldr is what we're calling the thing that we've been working on, where "Weldr" kind of stands for, well, it's stood for a lot of things. The joke right now is that it's the Wiggum Enterprise Linux Distribution we're living in. So yeah, good for Ralph.
It's sort of an experimental toolkit that we are working on to look at how we could import build artifacts, store them efficiently, and make it easier and faster to generate custom images, which is the thing lots of people wanna do, and they have a hard time doing it. Part of this is also a piece called Composer, which exists and is actually working: a Cockpit-based UI for browsing content, building up recipes, and composing images from those recipes. It's nice looking. I'm not gonna get too deep into that, because it's sort of a different project than the stuff we're talking about here. The stuff I'm talking about today is more about laying foundations, making this more efficient. Because if we tried to run this thing using the tools we have today for building images, it would fall over very, very quickly. It takes like 10 to 30 minutes to build an image, and then you have to store all of them, and it just wouldn't scale well. We can get this working at a small scale, but if we wanna do this at a much larger scale, we're gonna need better ways of building images.

And a couple of quotes. I have a couple of little demos, but I like combining these two quotes. There's the standard Arthur C. Clarke one: any sufficiently advanced technology is indistinguishable from magic. But as Teller, of Penn and Teller, has said: sometimes magic is just someone spending more time on something than anyone else might reasonably expect. Which is a thing that has come up a couple of times.

Here is, I hope, a demo of us using the Weldr toolkit to build a Docker container that has a web server in it. I'm kind of skipping through it a little bit because I don't need to play the whole thing, but the important part right here is: we've got a recipe, which we'll show. Yeah, so there's a recipe that lists some stuff that it wants to have in it. Oops, yeah, there you go. It's just a few packages; it's not important what's in there. Could it be made bigger? It's a video, so I kind of can't. Can you go full screen on the video? I don't know, can I try? Is that better? This is about the best I can do. Is that a full screen button at the bottom right? Oh, there is one. Bigger and bigger. Oh, much bigger. That's just about big enough.

Now, it's not super exciting; text-based demos are not the best thing. But really what I want to show off here is the time it takes to do this. All right, so I type my password and it's doing depsolving. It started the compose, and in a couple seconds now the compose should finish, and it starts doing a docker import. So at that point we're done. That was a complete compose of a web server, and the rest is the docker import, and then it takes a second to start up. It's mostly just I/O at this point, and there you go: we enter the image and, whoosh. So the problem is, it doesn't work, and I'll get to why in a minute. The second demo is similar, except this time it does work. It's the same sort of thing: we're gonna generate an image, except we're generating a qcow2 image this time around, and we're booting it in QEMU.
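For reference, this is roughly what one of those recipes looks like in the weldr tooling, as far as I can reconstruct it; treat the exact field names and values as illustrative rather than authoritative:

    # Illustrative recipe; details may differ from the real format.
    name = "http-server"
    description = "Minimal web server image"

    [[modules]]
    name = "httpd"
    version = "2.4.*"

    [[packages]]
    name = "openssh-server"
    version = "*"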
And so, I mean, this one takes a little longer, mostly just because we actually need to do formatting and stuff for this image, but it's roughly the same thing. I think this one takes about 15 seconds, 20 seconds, something like that. For a lot of the stuff we currently do for continuous integration, I think the average run time from kickoff to booted system is about 10 minutes right now (there you go, it's already booted), so we're getting that into the range of 30 seconds, and we haven't even tried to optimize this yet. So yeah, here we go: it's booted and I can log in, and I'll log in and then you'll see... yeah, there's no netstat. Oh, but the demo continues, right? This is where you add it in. Oh yeah, well, this is where I do a nasty bash hack to see whether or not the port is open. It isn't. And then I start the web server, and it starts.

So the point here is that we can build custom images really, really quickly, and they work like they're supposed to. And then in this demo I go on and redo it, adding the package that contains netstat, just to prove that yes, we are building custom images at the time you ask, and this isn't, you know, smoke and mirrors. It's a little bit smoke and mirrors, but everything's smoke and mirrors, right? But there we go: we're recreating everything, it built the image, right now it's doing some disk processing stuff because it's creating a qcow, and then it'll boot. So the point is: it works, it makes images that work, and it works real quick. Which would be super helpful for things like CI and CD.

So what's skipped here, sorry? What gets skipped to make it fast? Yeah, so there is the question: how does it work? There's a bunch of parts to this whole thing, and mostly what we're skipping is scriptlets, but we're skipping a whole lot of stuff. One of the things we're doing is, instead of pulling everything out of RPMs, instead of decompressing all of the RPMs, we've got everything stored in what we call the BDCS: the binary data content store, or originally the Big Damn Content Store. It's deduplicated, content-addressed storage built on OSTree, doing file-level deduplication, so we're not storing multiple copies of everything that's in there. And this is super helpful, because 99% of the time, two subsequent builds are 99% the same thing; it's a couple of one-line changes or a small patch. So by storing it in this form instead of as individual RPMs, it's equivalent data, but we're using a lot less disk space, and we can pull things out a lot faster, because we don't have to decompress the entirety of every package as we go through them.

So yeah, as you can see, it's generally around 50% or less of the size, depending on what you're putting in there. For Fedora 26, I put in every arch but only the GA release, and it's about half as much disk space. For the trial with RHEL, we just did one arch, but we did it over time, and it went down: it's eight gigs instead of 22 gigs. That's compressed RPMs on one side and uncompressed but deduplicated content on the other? Correct, okay. Are you sure it's uncompressed? Well, the individual files are zlib'd, so I guess they're gzipped, but we're not compressing the entire thing into anything super heavy.
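The file-level dedup idea itself is simple. Here's a minimal Haskell sketch of a content-addressed store, assuming the cryptohash-sha256 and containers packages; the real BDCS gets this behavior from OSTree's object store rather than anything like this toy:

    import qualified Crypto.Hash.SHA256 as SHA256
    import qualified Data.ByteString as BS
    import qualified Data.Map.Strict as Map

    -- digest -> file content; identical content always lands on the same key
    type ObjectStore = Map.Map BS.ByteString BS.ByteString

    -- Importing a rebuild that is 99% identical to the previous build adds
    -- almost no new objects, because unchanged files hash to existing keys.
    store :: ObjectStore -> BS.ByteString -> (BS.ByteString, ObjectStore)
    store objs content =
      let key = SHA256.hash content
      in  (key, Map.insertWith (\_new old -> old) key content objs)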
So is that the input or the output? Those are the input: the BDCS is where we store everything. We take all of the RPMs, and instead of storing them in a yum repo, we cram them into the BDCS. And this is partly just so that we have different forms of how this stuff looks on disk, and different things we can do with the metadata, which is the next piece: the MDDB, the metadata database, which is technically part of the BDCS, but we kind of think of it as a separate object. This is all very early alpha prototype stuff, but it's really an exercise in thinking about how we do these things. So the metadata database that goes in there is RPM-derived; we're just cramming it into a big SQL database, but with a slightly different structure, so that we can extend it if we feel like it. We could do things like, you know, we have docs sub-packages today, but you could do other stuff: we could have multiple markers for things that are both docs and binaries. There's a lot of possibilities that open up if we start adding arbitrary data into our data stores, and we can go into that in a little bit, but I wanna get to the scriptlet bits. There's a lot of interesting stuff we can do with more metadata, but that's some blue-sky stuff I wanna get to in a little while.

So the other thing is we're using our own depsolver, a modular depsolver in fact, so that we can extend things. It's a standard SAT-based depsolver, like libsolv; it's a very well-studied problem, fairly easy computer-science-type stuff. So we have an RPM-compatible depsolver that's not using the RPM code base, so that we can do RPM things, but we could do other things too. So when you say RPM compatible, do you mean actually compatible, or just works with the same data? Let's talk about equivalence, because a lot of times we talk about things being compatible or equal and we don't do a good job of defining what we mean by that. We're kind of cheating a bit here, in that we define "compatible" by whether the outputs work. I don't care whether the dependency-solving output is identical to RPM or yum; I care whether everything the user expects is in the output, and whether the output works. So... not compatible. I mean, it depends on what you mean by compatible, right? We use the RPM metadata, we get approximately the same results, we get working images, and we do it a hundred times faster.

And also we can extend it. The core of this is generic; it's just a SAT solver, so we could add modules for any other types of things we wanted to solve. In theory, we could teach it about, say, Cargo crates, and just start pulling Rust crates directly into the BDCS as they are, rather than having to manually repackage things. Then a user can construct their thing out of Red Hat content, but also upstream content or Fedora content or whatever they can get their hands on and whatever we know how to handle. There's probably some wrinkles to that, but so far there's no technical reason it couldn't work. We just need enough metadata to understand how things work, and most of it's already there.
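To give a flavor of what "SAT-based" means here, this is a toy Haskell encoding of RPM-style dependencies as boolean clauses. The real depsolver is more involved, and these types are mine, not the project's:

    data Lit = Pos String | Neg String deriving Show
    type Clause = [Lit]   -- a disjunction of literals

    -- "pkg requires cap, and these packages provide cap":
    --   not-pkg OR provider1 OR provider2 OR ...
    requiresClause :: String -> [String] -> Clause
    requiresClause pkg providers = Neg pkg : map Pos providers

    -- "pkg conflicts with other":  not-pkg OR not-other
    conflictsClause :: String -> String -> Clause
    conflictsClause pkg other = [Neg pkg, Neg other]

    -- The user's request becomes a unit clause like [Pos "httpd"]; any
    -- satisfying assignment over the whole clause set is a candidate manifest.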
And also things like pluggable policies; there are things in modularity land where you're going to want that. Sometimes a user will want the newest version of a package if there are multiple providers; sometimes they'll want whatever is "best," or whatever is recommended by something. So having more types of metadata, and more inputs for that, allows more flexibility.

We wrote it in Haskell. Oh wait, that's the wrong one. Wow, whatever. We wrote it in Haskell, there it is, partly because all of our previous stuff is in Python, and Python carries baggage with it: you have to have your entire runtime environment, and you have dynamic typing, so you end up with a lot of NoneType errors. So Haskell is harder for us up front, but the type system ensures that every time we have code that doesn't compile, that's one more bug that never gets reported; we have to fix it before it goes out at all. It's also really heavy on referential transparency and not having side effects, and like I'm saying, a lot of the problems we have with RPM as it stands are about side effects and not knowing what's going on with scriptlets. So there's a whole lot of really nice stuff about working with Haskell for this sort of thing. It doesn't have to be Haskell, because mostly this is a research-and-development, experimental thing, where a lot of what's gonna be born out of this project is not code but blueprints for how to do things, or data, or schemas for storing things. And, you know, plus Facebook's using it, so it's cool now, right?

So yeah, RPM compatible... what, oh yeah. Is there an Easter egg in the second-to-last bullet there? We might not understand what that means. What, I mean, if you've ever looked at Haskell, everyone's like, well, what's a monad? A monad is just a monoid in the category of endofunctors. If you have any background in category theory, it's trivial. Oh, thanks for that, I was about to quiz you on that. No, Haskell is kind of an obtuse programming language, because it was designed sort of for and by computer scientists, but a lot of the concepts of functional programming languages are finding their way into mainstream programming paradigms, and everyone's like, wow, this stuff is great, and all the Haskell guys are like, yeah, we told you so.

There's a lot of stuff about it being referentially transparent, so we always know exactly what's happening in the code, and that allows for really interesting stuff. When we know that two pieces of code aren't going to interact, because we can see every point where they could and there are no side effects where there could be hidden interactions, you can do things like automatically parallelizing code: just tell it to be parallel, and it does. It's pretty cool. It has a lot of really nice stuff. It's not as easy as writing Python, but for really important code, I think it's all right if it's hard to write, as long as it gets written well and doesn't do anything it's not supposed to.

So yeah, what we're talking about here is RPM compatible, but RPM independent is the goal we're shooting for: it does everything it's supposed to do, but without using any of the existing code base. Part of the larger problem here is that we, as a community and ecosystem, don't know what all our parts are doing, and if we want to improve them, we kind of have to pull them apart, look hard at them, and ask: what are we actually using RPM for here? Sometimes the answer is, well, it's a pretty convenient data storage format.
Well, there are better data storage formats that could be equivalent, and that's fine, but sometimes you have to know what all the pieces are to actually be able to know whether it's equivalent. So even the exercise of trying to make equivalent software, whether or not it succeeds, still allows us to understand what's going on and improve things, because we've identified how all the pieces fit together. There's still value in that.

So: scriptlet reform. One of the reasons this is important is what you see when you look at all of the scriptlets. When you do a compose normally, you end up with a list of like 300 to 1500 packages, depending on what kind of system you're building, and the reason that what we do is fast is that we're basically skipping the scriptlets. Normally you have to install package one: do a pre-script, install the payload of package one, fsync, do the post-script, fsync, and so on and so forth. It's very slow, and a lot of the time there's a lot of wasted effort. Look at the %pretrans and the %pre and %post. Yes: there's %pretrans, fsync, %pre, fsync, install payload, fsync, then %post, then fsync, and then maybe another %post and another fsync. There's a lot going on there that doesn't need to be happening, because at the time we start the transaction, other than that wrinkle of scriptlets, we know exactly what files are going to be in that filesystem. But then we need to run some scripts, and because we don't know whether any of those scripts requires anything from the previous packages, we have no choice but to run them in order. We cannot reason about anything that's going on there, partly because RPM doesn't give us any way to express it, and we can't introspect on it.

Like: what does that scriptlet do? It creates an empty file. Right. I found six different ways that scriptlets create empty files, because I went through and read literally every scriptlet in RHEL 7; because sometimes magic is just spending more time on something than anyone might reasonably expect. So I went through and read literally every scriptlet in all of RHEL. Yeah, exactly: why would you even do that? Probably because somebody's Makefile did it that way. Right, exactly: scriptlets are written by individual contributors, often in little silos, and they don't necessarily know. We have some guidelines, but there are no hard and fast rules about how things should get done. Nor are there tools for the common tasks. Nor have we really identified what the common tasks are. We're just sort of like: well, here's a hook, best of luck. And so we have this heap of scripts that's accumulated over the past 15 years, maintained by disinterested people who don't necessarily know anything about shell. The Java package scriptlets are a thing of horrific, Lovecraftian beauty. But anyway, ignoring all that. Oh, and there's also the fact that there are like three separate Turing-complete languages in there, because you have shell and Lua, and the spec template language itself is Turing-complete.

So in your review of all those scriptlets, did you find meaningful scriptlet functionality that you need to take into account? I did. I found six categories; 99% of them are doing one of these six things. So yeah, mostly what happens here is one of these six things.
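To illustrate the empty-file point from a moment ago: all of the following do the same thing, and variants of each turn up in real scriptlets. These are illustrative paths and commands, not the literal six from the audit:

    # Several interchangeable ways scriptlets create an empty file:
    touch /var/lib/foo/registry
    : > /var/lib/foo/registry
    cat /dev/null > /var/lib/foo/registry
    echo -n "" > /var/lib/foo/registry
    install -m 0644 /dev/null /var/lib/foo/registry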
There's some stuff that's totally insane, where it's like: oh, let's load a kernel module because I might want to use it. Or: despite whatever the user might want, I'm gonna forcibly turn on this service. There's some questionable stuff in there. But 99% of it is doing one of these six things. And we do have tools to handle all of these; systemd now has stuff to handle a lot of them. Creating users and groups: we have sysusers.d. I would say we could just use that, but sometimes these file formats might be a little too simple. There's a whole problem with sysusers.d where, if you tell it to make a user with a leading zero, like say "0day" as a username, yeah, you end up with a root user. Sometimes file formats can be a little too simple, and this is why types are a good thing, to go back to Haskell for a second: if you know you're expecting a string, you don't care that it starts with a zero. So you might want to be a little more careful.

So part of what I'm proposing here, and I'm still working on the schema and all that, what I'm proposing as a first step towards improving the entire ecosystem, is that we identify all of the places where these tasks are happening, create standard tools that handle them, and convert packages to use those tools. To give you an example, the users-and-groups one is the easy one. I've got a schema here that basically says everything you can do when you're creating a user. This encapsulates all the stuff that happens, and it's not super complicated; it's long, but it's just like: okay, your GID is gonna be an integer. So this is a schema that describes everything you can do with a user. And then what a maintainer would do, instead of one of the six different ways I've seen users get created in scriptlets, is just write that. And in fact, what you would do is wrap it in a scriptlet. So what you have here is equivalent to whatever useradd line they had before, except the logic of whether to add the user, and when to add the user, would be handled by the interpreter.

Now, the fact that we're using an interpreter here has two purposes. One is that we have a standard tool to handle it. The other is that external tools can examine the headers of the RPM, and when they see that that's the interpreter, they know what the content is going to be. So we've moved from opaque code to nice introspectable data. That does have to be %pre and not %post. Does it? Yes, otherwise when RPM installs the files, the owners have to be right. Oh yeah, you're right. See, I thought you could just lay down the files, I thought RPM stored the owners numerically. No, RPM does store them as strings. So yeah, RPM wants to chown the file and it looks up the UID from there. So this would have to be %pre. One thing I would note at this point: this doesn't go against RPM for the most part. No, I don't think so. RPM is essentially a macro processor anyway; this is just a use of that. Yeah, exactly. So this is a good example of a way that we can make changes to our ecosystem that let us do some really interesting stuff without burning the world down.
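To make that concrete, here's a sketch of what such a scriptlet could look like. The "%pre -p interpreter" mechanism is real RPM syntax; the interpreter path and the field names are hypothetical, invented to match the schema idea described above:

    # Hypothetical declarative user creation; the payload is data, not shell.
    %pre -p /usr/libexec/scriptlets/user
    [user.apache]
    uid    = 48
    gid    = 48
    gecos  = "Apache"
    home   = "/usr/share/httpd"
    shell  = "/sbin/nologin"
    system = true

Because RPM stores file owners as strings and looks up the UID at install time, this has to run in %pre, before the payload is unpacked; but any external tool that sees that interpreter in the RPM header knows the payload is introspectable data.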
With this sort of thing in place... in the demo that I showed, the second demo, we went through and looked at all of the scriptlets and found every place something was creating a user, and we just have a hack that creates every possible user when we build the image. But with things like this, we can examine the payload of what we're trying to install, and we can know whether we need to create users, or whether we can do it later. We can reason about when and why we need to do certain tasks. For instance, going back a bit to those scriptlet tasks: some things, like registries, caches and things like that, you don't need to do until first boot. So if you're pressed for time, if you want a --turbo switch: yeah, just skip all of the registries until first boot. And actually, nothing of type four should really ever exist in a spec file. We're trying to get to the point where it doesn't. Right: RPM file triggers should conceivably take care of everything. And almost all of that is already done, except for ldconfig, because glibc is resistant to change. But there's still the problem of knowing that that's what's going on in a post script, or in a trigger. We can't just assume that every file trigger is a cache or a catalog or a registry. Some things are required: some things, if you don't run them, your stuff's not gonna work. So having a way to express that sort of metadata is super useful. I'm just saying that the package itself should carry that. Right, and that's the long-term goal: to get this stuff back into data, where you declare what you want to happen and the tools figure out what to do, rather than it being code that we just have to run; and we have to run it as root because we don't know whether it needs root, and we have to run it immediately because we don't know whether we can do it later. One of the major themes of making this stuff faster and more predictable and less hard is: any place where we can take complexity in code and turn it into nice declarative data is a really good idea.

And yeah, we already have tools that handle most of this stuff, but the point is having a standardized interface, and we can have a nice little RPM macro that would just be, like, %user whatever; you give it the necessary stuff and it expands to whatever we need. I still want to see the interpreter line in there, so that external tools can identify what's happening. Can you have multiple scripts? Yeah, you can have multiple for all of these; you just separate them into multiple scripts. You can have multiple pres and multiple posts, and it just runs them in order. Yeah. And the other part is that this allows us to actually do security auditing: to automate the security auditing before the change even comes into the collection. Exactly. And another thing this allows is that, in theory, you can confine these things. Anything that has that interpreter should not be reading or writing any file other than /etc/shadow, /etc/group, and /etc/passwd, and if it tries to do anything else: no, you go away. So there's a nice little security story there. I have schemas written up for some of the other tasks listed here, but I kind of ran out of time with my slides, sorry.
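One footnote on the type-four tasks from above: the file-trigger mechanism is real, current RPM syntax (4.13 and later), and something like this is the kind of thing that would replace per-package ldconfig calls:

    # Run once per transaction when any package drops files into these
    # directories, instead of every package carrying its own %post ldconfig:
    %transfiletriggerin -- /usr/lib64 /usr/lib
    /sbin/ldconfig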
So that's the general idea: the background and the big ideas, but also this one very specific little slice that we'd like to get moving on, to enable us to make forward progress. So yeah, that's my pitch and some background. Are there any questions?

So what you envision with this is, you're hoping to get to a model, a universe, where for most of the RPMs you care about, your new depsolver or installer can actually look at them and say: oh, all of these scriptlets are run using interpreters I'm familiar with, and I know that I can safely batch the activities they're doing until the end. Yep, exactly. And there are other nice things about that. Like, for the ones that are turning services on and off: what if we have RPM spec files declare what services are in there? Then you can look at a set of packages and know exactly what services are going to be in the resulting image and whether they will be on or off by default, and then you could override that as a user. Making it less scriptlet-oriented and more data-oriented allows us to reason about what's happening, rather than just having to dump it all in the box. You should only need the list of the services, because the presets already handle which ones will be on and off; and if they're doing otherwise in the RPM, they are breaking the rules. Yeah, it's surprising how much that happens.

Well, actually, just the audit, just the work you did to collect that data, is terribly useful, because in the packaging guidelines we're trying to get to: there's one way to create users, and that's the one you should be using. We'd hate to have anybody ever have to cut and paste code; rather it'd be automatic, but yeah. Was that audit about RHEL? What's that? The audit, was it done on RHEL? R-H-E-L, yeah. Not Fedora? I went through a lot of Fedora too, and I don't recall whether or not it was the same situation. I mean, there's a lot more of it, so it gets a lot weirder out at the edges. But no, even in RHEL I was still finding things like non-standard programs I didn't even know existed being used to create users. I'd look up what this program did and I was like: oh, you're creating a user. That's new. Those should technically be considered bugs. They should. Not in RHEL, but in Fedora, yes. Sure. But even if we standardize on a standard script that people should be using, if it's a script, it's opaque code, and we'd have to go through and change all of them later. Except that you could do that in an automated fashion, whereas if it's random stuff, you can't.

The other thing that I've started doing is writing a shell parser that simulates shell, to look at scriptlets and figure out what they're doing, so we can convert from existing scriptlets to these data formats. It's not complete yet, because parsing shell is horrible, but it worked well enough that we got a complete list of all of the places that users were being created. And we could probably extend it to other stuff. It's good.
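As a toy illustration of that audit idea, here's a Haskell sketch that just scans scriptlet lines for known user-creation commands. The real tool parses and simulates shell, which is much harder than this, and the command list is illustrative:

    -- Toy scanner: flag scriptlet lines that look like user creation.
    userCreators :: [String]
    userCreators = ["useradd", "adduser", "luseradd"]

    flagUserCreation :: String -> [String]
    flagUserCreation scriptlet =
      [ line | line <- lines scriptlet
             , (cmd : _) <- [words line]        -- skip blank lines
             , basename cmd `elem` userCreators ]
      where basename = reverse . takeWhile (/= '/') . reverse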
I imagine that, depending on how easily we can convince people to convert scriptlets, it might be the case that we want to have something like that as sort of the Python 2-to-3 tool: run this against your scriptlet and it'll tell you what you probably should be doing, but it'll also tell you if there's stuff it doesn't understand that you need to deal with yourself. Or we can just say: everybody has to re-convert all their stuff now. But first we need to define the actual tools and all of that. The tools are in progress but not finished, and the schemas are in progress but not finished. So that's a ways off, but yes. Nice. What's the timeframe, like October or September? I'm writing this answer down. I'd say, if it was my only priority, we could probably have something like that soon; it's just a matter of having the tools. Something that actually works, so we can go to packagers and say: can we put this in your spec? Well, are you asking about the technical side of it or the political slash policy side? Because I can't predict the political part right now. What's the rough timeline when we cut over, I guess when we open it up? What about F28, 29? Or even, what month is this? 27 released recently; we just started 28. Okay, so we just started 28. Definitely 29. We could probably have tools ready for at least users in the next month, easy. So, next couple of weeks.

Are the interpreters, are you suggesting those be written in Haskell as well? Whatever; they should probably be written in something that compiles to native binaries. I would write them in Haskell just to get my Haskell muscles up, but. Then do the modularity people care that you've now put Haskell in the minimal set of things you need? No, it's in the build group for minimal. Yeah. They sort of care. For self-hosting? No. Nobody has a build group for minimal. The build group for minimal, the platform, is a lot bigger than minimal. Right, but you're trying to shrink it. But if you take Haskell in there as a hard dependency, then...

But if you start talking about the larger picture of what we're actually trying to accomplish: a point that I didn't make really well here is that we're constructing images totally differently than we have in the past, and we're constructing them using none of the tools that are inside them. We're constructing them completely from the outside. That's a pretty major shift in thinking. So things like "what's in the build group?": I don't care what's in the build group, because the build environment is a totally different system. Maybe; most people already have pandoc around anyway. Yeah, right. But somebody cares. Somebody else might care. There are people who are trying to figure out the minimal set for self-hosting, which includes being able to build itself and install itself; I mean, there's another effort just down the hall that is worried about things like adding Haskell to that minimal set. That's the only reason I mentioned it. Haskell's more important than Rust right now. Well, sure. Haskell has dynamic linking, doesn't it? So it wins, as far as I'm concerned. I would suggest not blocking on the decision about whether or not to. Oh no, of course not!
Yeah, I mean, I could write these things in Python, but I wouldn't want to, and I wouldn't want to carry that, because then you have the problem of Python being in the pre-scripts for things. Right. So there's a little bit of finesse to be done there.

So, why not write it so maintainers don't have to see it? I mean, we can extend RPM. We could add a macro that's %user, you know, and just do that, and it would expand out to the part with the interpreter. So behind the scenes, that's what it would be, but the maintainer doesn't need to handle that. And that would be my goal; I just kind of ran out of room for that in the slides. But yeah: adding a set of macros that would do this for the maintainers. We'll go there eventually. Yeah, that's the user-facing part. Yeah, exactly. And as long as the macro expands to something that has a well-known thing in the header, and we know what the data inside is going to look like, then you unlock all sorts of new tricks. So you could actually do the macro now? Yeah. And it could expand to what we use now? Yes. And then behind the scenes, it expands to something else later? There you go. Or it could expand to two things, one of which is just creating a new header that we use. There's a lot of ways to do it. Okay.

So one of the things for the interpreter is the assumption that the format never changes, right? You're just going to say "user" and then the add-user data. If you don't put, like, a version on the schema... I mean, that's fair. For user accounts specifically I'm not as worried, because user accounts haven't really changed their schema in 40 years, and I think we're likely to keep doing local account storage the same way. But for the other stuff, that's a fair point: there's always stuff that can change. So a version in there somewhere is probably a good idea. I mean, if you're hiding it behind macros, who cares? People don't have to see it. Yeah, and that's the sort of thing that we can probably hash out sometime during the rest of this week, because this was just sort of the general idea: hey, more data, less code.

Did you want to mention your talk? Oh, yeah. Do you want to plug your talk, or do you want me to plug it? Well, I plugged it this morning, but: similar things. There are a bunch of scriptlets out there that do things like set up system-specific information about a service when it's installed, which is really bad when you're trying to generate something like a container image or a virtual-machine gold master. So a while back, I worked up some packaging guidelines, which I wanted to plug, about how we can take those things and make them use systemd to solve them on the first start of a service rather than at installation time, so they can also be wiped at any time to reset the system to factory defaults, essentially. And that's scriptlet task number six. Right, exactly. So that's a talk I plan to give on Thursday morning at 11:30. Right: anything that's generating a unique ID or an encryption key or things like that, yeah, that needs to move to first boot or first start.
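A sketch of that task-six pattern, with a hypothetical service and path: instead of a %post generating a key at install time, the unit generates it the first time the service starts, so image builds never run it, and deleting the file resets to factory defaults:

    [Unit]
    Description=Example daemon

    [Service]
    # Generate the per-system secret on first start, not at package install:
    ExecStartPre=/bin/sh -c 'test -f /etc/mydaemon/secret.key || \
        (umask 077; head -c 32 /dev/urandom > /etc/mydaemon/secret.key)'
    ExecStart=/usr/sbin/mydaemon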
The other problem, certainly, is the SELinux context the scriptlets run in. Some of the operations will be denied under that context if a scriptlet is launching certain things as part of the install; it's not a great place to be a scriptlet. Right, right. So a separate interpreter could be doing an SELinux transition, a transition into a more confined domain. The mechanism I developed also allows a transition. For example, we can use systemctl, because that's just D-Bus, so we can confine what systemctl under an RPM scriptlet can do in that context, using D-Bus policy toward systemd. But if it's something like that kernel module load you showed on the earlier slide, that's something that you can just deny completely in the RPM scriptlet or RPM install context. Yeah; really, that should be a line that declares "I want this file to exist and have this mode." And that's a different theme. Yeah, absolutely.

This is real quick: rpm-ostree today, when you're doing composes... not yet when you're doing composes on the server side, but on the client side, we basically have, like, an 80% reimplementation of librpm, and we run the scriptlets in bubblewrap. So right now, every RPM scriptlet: like, on my workstation I have a couple of packages layered, and those scripts don't have access to my home directory. They just can't see it. We don't actually disable systemctl, because part of the whole idea is to do an offline update, rather than change your currently running system and potentially restart services. But there's not much beyond that today. So yeah, I think we'll have to try and make sure some of these efforts are aligned, because we have a wider scope.

And I will say that some of this stuff, mostly the registrations and caches: there's a lot of stuff that does that, and not all of it is designed to be able to operate from outside the system. I don't know if all of it's arch-neutral, and I know not all of it is designed to operate from outside a system. As far as I can tell, any of these tasks except number four can be done from outside the system pretty easily with existing tools. A lot of the registry and cache type stuff can be, but it isn't currently shipped that way, and some of it just can't, and that's gonna be the tricky part. Part of six is potentially arch-specific too. Right; anything that's system-specific should not happen until first boot, obviously. Yeah, that is exactly that topic; that's what we're talking about. So, as soon as we conclude that anything system-dependent waits for first boot, two and three should be trivial. Three's basically tmpfiles.d and systemd-tmpfiles, right. And six, systemd can solve. Yep. So how much of a problem are four and five? Well, five is arguable. Actually, you can make an argument that some of the stuff in one and five, creating users and groups and messing with the firewall configuration, is really just modifying a system configuration file, so you could probably merge those together; conceptually it's easier to think of it that way. Making those declarative would be a big step forward. Yeah, exactly. And the existing tools that fall under that bucket of system configuration can, I think, be driven from the outside right now, so probably we can figure that out. But yeah, that fifth one is sort of a miscellaneous bucket at the moment, so a little more research might be needed; so far it hasn't yielded anything scary.
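Circling back to task three for a second: in systemd terms it mostly reduces to real tmpfiles.d lines like these (paths illustrative), applied by systemd-tmpfiles instead of by scriptlets:

    # /usr/lib/tmpfiles.d/httpd.conf
    d /run/httpd              0710 root   apache -
    d /run/httpd/htcacheclean 0700 apache apache -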
As for four: there's a lot of stuff that's like, okay, and then you have to run whatever code inside the image. And everyone's just like: yeah, sure, that's totally cool, run whatever binary you want as root every time I update this package, that sounds great. That's not my favorite. And a lot of those things are really old, things like terminfo files being compiled. I don't think the terminfo guys ever said, "someday people are gonna want to create Docker images." No, they didn't think of that. So there's gonna come a point where we hit some necessary tool that needs to build something that's not technically arch-specific, but it doesn't know how to build from the outside, and we're gonna have to think about modifying the tool to do it that way. Because I really, really believe that for the long-term health of how we construct images, doing them from the outside is the right way to do it. It's the only way that makes any sort of sense.

Do you, and if this is obvious to everyone else, forgive me, do you envision a day where the packaging guidelines forbid arbitrary bash scripting? That would be my first decree. Oh my God, sorry, talk about that. I found that most of the time when you find it, it's things you really wish weren't in there, like "I'm going to modify your existing configuration file for this thing." Yeah, there's a lot of random sed, and I definitely found places where people do it wrong. I found a whole lot of bugs in scriptlets where it's like: oh, you got the sense of this test inverted, so that will never happen. Cool, well, okay. And in that sense, the migration from init scripts to systemd is illustrative, right? Because you had all sorts of weirdly arbitrary ways of doing start, stop, restart, and now it's just a couple of macros and there are only two different patterns you're supposed to use. And it works, and they're all better off. Exactly. And a lot of people were just so angry about it. As long as you can tell people: all right, you can't do whatever you want, but for all of the stuff that you want to do here, here's an individual tool for each of them. People don't seem... well, no. There are plenty of people who are mad about that, but they're the sort of people who are mad about anything that's different. That's the 20% you haven't finished yet. Yes.

We are out of time, but he'll happily talk about this for hours, so I'm gonna say: after five more minutes, you have to buy him a beer if you wanna keep talking about it. All right, I'll buy the beers. Hell yeah.

You use OSTree as an intermediate format; are the things you're generating, like, regular images at this point? We can generate OSTree output. It's not rpm-ostree, because we're not generating an RPM database inside the output. We could do that, and I've kind of come to the conclusion that we might have to synthesize an RPM database in our output, and Colin's had some suggestions and tricks around that, but I really don't want to. Because in theory, if you've got a system where you can build images almost instantaneously (the reason it takes six seconds is that we haven't optimized it yet; we're essentially I/O bound in creating images), then if you want to update an image, why are we building an RPM database inside it afterward?
Why don't you just generate a different image that has the stuff you wanted in it? That would be faster and easier. That covers a lot of use cases. Exactly. Are you updating the RPM database or generating a new image? Generating a new image. And the other thing about doing it like that is, since it's almost instantaneous, we don't even have to save images to disk; we can just generate them on the fly as soon as somebody requests them. Also, if we're doing it so that the recipes are all deterministic: if we know you did it on this day, we know you had this metadata and this recipe, therefore we already know what the output is. And if somebody already did that and we cached it, we can just hand it to you again.

We need this in production, though: all this information. I mean, in production, the metadata about what's in the system. Yes. How the things on the image that is being run differ from what was installed originally: if the config files are changed and things like that. Because the whole sosreport tooling that we use in Red Hat support depends on that. Yeah, so we need to solve this case, like Adam said. Yeah, this is definitely another one of those cases where we have to look at the things that RPM does provide for us. And having a manifest inside the image that records the provenance and exact signatures of all of the files that went into it is a really good thing. It's not insurmountable. It has to be inspectable from inside the running image, too. Yep, yep; that's definitely something that we need to do. We just haven't figured out how we're going to do it, and I'm not convinced the RPM database is the best or only way to do it. Which RPM database? They're changing it anyway. Yeah; the fact that if you want to install a package, especially if you're installing a module, you need like an XML parser, a YAML parser, a SQLite database, a Berkeley DB database, the RPM header format, and I'm missing at least two other data formats. When it could really just be: we have one database and one data format, and you're good. We'll get there eventually, but we're not gonna worry about doing that all at once, because we can't just tear it all down. Sorry, Raymond.

Question: the syntax that you had on the previous screen. Yeah, that one. And going back to the next screen. So that syntax looks a whole lot like Ansible definitions, or Ansible variables. And we're working on the system roles, using Ansible to define a consistent way to configure all of those subsystems. We've already got a lot of modules that add and remove users and basically configure the system, so we're addressing a lot of the items under number five, like SELinux and firewall, things like that. Is that useful at all, already having something that we're using to configure systems, and leveraging that same work to integrate in here? We'd just need to change the output of what it's writing to. I'm pretty sure their stuff is YAML, so we could... I mean, there's a little bit of wiggle room, but you can kind of convert this back and forth. This is TOML, which is basically just JSON with comments that looks more like an INI file; you can convert it back and forth to JSON, but you can also put comments in it. So it's better than JSON, yeah.
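For instance, the same recipe data round-trips between the two formats; this is a small hand-made example, not a real recipe:

    # TOML:
    name = "http-server"

    [[packages]]
    name = "httpd"
    version = "2.4.*"

    # Equivalent JSON, minus this very comment:
    # {"name": "http-server",
    #  "packages": [{"name": "httpd", "version": "2.4.*"}]}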
But so yeah, the data formats are a good idea, and we can just translate back and forth between those things; that's one of the things computers are great at. The tricky part is that the way the Ansible stuff wants to actually make the modifications happen is, essentially, as far as I've seen, you have a Python module that wants to run code inside the target system, which doesn't really work with our model. I've talked to the Ansible folks a little bit about trying to make their tools able to operate from the outside or the inside, about thinking about doing it that way, but that's not really what they're thinking about right now. But having common data formats for the two things is good; and if we ended up building a tool that does the same thing, but could do it from the outside of a system instead of from the inside, that wouldn't be bad for them either. So I think there's definitely some back and forth to be done there, but it's not like we get to immediately use it to solve the problem for us, unfortunately. Does that make sense? Yeah, it does. I'm gonna harass you with that question more later. Feel free. I am kind of hard-line on the fact that we need to be building images from the outside; it's the only sane way of doing it, because it's the only way to make it predictable. If you're running arbitrary code inside a system, you don't really know what happens at that point. All bets are off, so don't do that.

Also, the other thing you should know about the demo is that it all ran as a user. We didn't need to be root to do any of that, which is kind of nice for the security of builders and things like that. And that's what happens when you make your entire process about transforming data, instead of taking code that you're not sure what it's doing, running it, and hoping for the best. Well, still, the demo had a sudo in it at one point. Yeah, only because I had to talk to the Docker daemon, dammit. But there's really nothing in that that needs it; that's the only point where it actually sudos, and I needed a pause there anyway so that I could look at the thing. But yeah, actually generating an image does not need root. Unless you're generating a qcow2; you need root for that right now, because you need to do partitioning and loopback mounting. But that's just data; we can fix that, we've just been lazy about it. We'll get there.

This is part of a really long thing in my head to rework how stuff gets built in general, and this is the first tiny little slice towards redoing that entire big build loop I was talking about earlier. Most of what we need to do is figure out how to make things not be opaque code when they could be data that we can reason about, if that makes sense. So, it's now 6:12. Any last questions, or is it time for escaping and getting a beer? All right. Thanks very much.