Alright, our speaker for right now, on RPM Dependency Abuse: we have Will Woods, a Red Hatter of 15 years? Almost, this is my 15th year already. Nice. Give it up for him.
Hey, all. Hold on one moment. Eatin' crackers and sand all day and parched. Hi, so this is me talking about RPM dependency abuse, and thank you all for coming. I want to say in advance that, due to some circumstances outside of my control, I didn't get as far in my research on this as I wanted. I was hoping to present working demos of some of the stuff I want to talk about, but I just haven't been able to get around to that. Hopefully my brave employer will see fit to let me have some time to work on this, on setting the problem straight, or at least firmly crooked, but we'll see.
So, RPM dependencies: they're fun. By the way, nothing I'm saying here is Red Hat official. It's just me ranting as a person who has seen way too much of the inside of RPM and is worried about it. And yeah, I wrote these like two hours ago.
RPM dependencies are very, very complicated. We keep pushing more and more of the complexity of the system we use to build software and put it out to the world into RPM dependencies. So when you look at the system as a whole, a lot of its operation is encoded in a big ball of a graph of dependencies, where all these little things point to each other and they're all connected in ways that nobody fully understands anymore. And the whole thing runs as root, which is super cool.
There is no specification for RPM. I mean, there is mention of it in the Linux, what is it, the LDP, yeah, the LSB, but that's the de facto specification. If you implemented the stuff that's in there, it won't work, because we changed it.
We don't publish a spec for any of this stuff. We sort of publish documentation about how it's supposed to work, but it's basically: the code works, use that. This is why there are no third-party RPM installer tools, because nobody else can get it right. The only canonical implementation or description is the RPM code itself, which seems kind of bad, because we keep adding more and more features and we don't really know how they all interact.
So just to go through the basics of RPM a little bit. Dependencies are a pretty good idea, right? You've got a software package of some sort, and it has some capabilities. Packages should be able to require each other: this thing uses this library, so if I install this thing, I also need to install this library. Okay, that's sensible.
And then you get versions involved. Every package now implicitly provides itself at whatever version it is, and we have tools to compare versions, so you can say, I require something greater than this version. So now we're doing a matching operation. Instead of just saying, find one from this big set of things that exist, you're saying, find all the things that match this and only this part. And then you have to do some other stuff. So we're building a weird little set theory thing here, which can start to get complicated later.
So yeah, basic RPM dependencies. You have requires, you have obsoletes, you have conflicts. You want to say, all right, this package replaces this one. They're all sensible ideas. But the way they interact after a while, it's interesting. There are also things like build requires. Which is to say: when I say this package requires this thing, what do I mean by that?
Implicitly, whenever we talk about requirements in the RPM world, depending on how you think about it, we're basically saying: when I want to run the software that is in this package, it won't run correctly unless this other thing is there. Which is sort of a vague notion. We have one other version of that, which is to say, oh, right, that doesn't cover every case, because sometimes we need a package to be installed when I build a thing. So we added one more. But there are a lot more subtle interactions between packages that we just don't have tools or tags to capture. So we've started doing other things to sort of fake it rather than actually getting good data.
So now we have weak dependencies, where you can say, okay, this package kind of wants this other package to be there, but it's cool if it's not. Or this package wants this thing to be there, and if it's there, that's great, and if you don't feel like installing it, I don't know, that's fine too. We do have precise definitions for these things, but before that, the only tool we really had was saying either this thing requires this or it doesn't. And now we can say "kinda", which doesn't really help with everything.
And then we're like, well, what if we could do it backwards? Maybe that'll help. So now you have supplements and enhances, where you can say, I help out this other package, so if you have that one, maybe you want me too. Or an even weaker version of that. So we've got the concept of the strength of a connection between two packages, but nothing else about it. We had one connection between packages, and now we have a bold version and an italic version, but it's still just a connection between the two packages.
And now we can do backwards connections, where I say, that guy over there, he likes me. Okay, cool, I guess. And then we got real silly with it, where you can start doing things like: I want this package and that package to be installed, or I want that package if it's greater than this version, or this other thing is fine too. Or you can do with and without: I want these two things, and they have to come from the same package, or something like that. So you can start putting all of these things together.
So at this point, with all of these things, I think RPM dependencies are Turing complete. And this is the theory, right? Imagine you have a package that's called cpu-ax, like a register on a CPU. You've got versions 0 to 255, so you've got a high byte and a low byte. Then you've got another package that's instruction one, set AX to whatever, and it just says, okay, I require that package. And then you've got a next instruction which says, I want to increment AX. So that means if it's zero, go to one. I mean, it would be a really gross thing. And each instruction obsoletes the previous one, thus booting it out of the dependency transaction as it's being solved. I'm pretty sure that if you put enough of these things together, you could do arbitrary calculations during depsolving inside of RPM.
I don't think we meant to do that. I don't think that was on purpose. I think this happened by accident. I also don't think that's great, because it all runs as root. Any package can make these sorts of declarations, and we have the reverse ones now, so there's enhances as well as suggests. But you could probably also do things like this: take some package that's on everybody's system, and say that you obsolete it, but you also require it.
And I think what'll happen is that the depsolver will see that and be like, okay, your system has this installed and this new package obsoletes it, so I need to pull this in. And then, just to make sure we don't break people's systems, it also pulls in the thing that you say you're replacing. And then I can do whatever I want. I could probably just write a package that injects itself into your next update transaction and then runs whatever scripts I want.
And okay, you're going to notice if there's this weird package that gets installed on your system and the post script says rm -rf / or whatever. But what if I start hiding what's in the scripts? What if I hid things in the changelog of some other package, and I pull parts of that other package's changelog out and execute them, so that the script I'm running on your system is actually being pulled from places that aren't in my package? It would be really hard to trace back where the code that got executed even came from.
So at a certain point, I'm pretty sure that I could craft a package that would inject itself onto every Fedora system in the world and figure out what users are on it. Like, I could figure out if it's Adam in the room. Well, I was going to pick on Adam Williams, but oh, there you are, hey, Adam.
So what I was hoping to do for this talk was try all of this out. I was going to get a package injected into every system, with some innocuous name that you wouldn't even notice was getting installed during an update. And I was going to have it wait until today. Oh, and it was also going to check which users were on the system and look for your username.
And it was going to wait until this time of day, and then pop up a whole bunch of hot dogs on your screen during the talk, just to prove the point. It's left as an exercise for the reader whether or not I can actually make hot dogs pop up on your screen. But the pieces are all there, as far as we can tell.
Yeah. If I do questionable things with spec files, then like 10 minutes later someone who knows what the package is like goes, what the hell are you doing here? You're watching every single commit to every venerable package? Yeah, there are people who do. It's like a code rate from the win.
Yeah, there are people watching, and you do want to watch out for that. But scriptlets can be super duper complicated. You can write them in Lua, and not everybody knows Lua. And if you're clever or determined, you can make your script look innocuous, and you could probably sneak it through a code review. You can also have packages remove themselves, or have a pair of packages, where one shows up later and removes the other. So the thing that got installed and did the damage to your system isn't there anymore and leaves no trace. Or, you know, you'll have some record that it was there, but we don't keep a historical record of every package that has ever been installed on your system. It's in the yum log, I think. But we're not sure about that. There's an RPM log. Wait, there is an RPM log? There is. Okay, good.
There is a lot of that in RPM. There's a larger thing here, and I wanted to talk about this part because it kind of scares me. There's a lot in RPM that we don't fully understand or really use anymore. Did you know RPM has per-file dependencies? Yeah, we don't use them, but they're there for every single file in every single package. There are 15 million files in Fedora 27.
And yeah, we have the output of the file command for each file, and a relationship between which dependency attaches to which file. And we don't use any of it. We also have great things in RPM headers like the file device number. When you have a file on disk, it has a device number. When you have a file in a tarball, it doesn't, because that doesn't make any sense unless you're actually using the tarball as a backup. But they're still in the RPM headers. They're always zero. It's a 32-bit number. So that's 15 million 32-bit zeros across all the packages. That's great. Not 90%, but some huge percentage of the metadata in package headers is totally unused or nonsensical at this point.
Like the package icons? Oh, yeah. Package icons and things like that are really great. Well, I mean, there are headers that we don't use, like icons, but there are also headers in RPM that we just ignore, like file device and file inode number. There's an inode number stored for every file in every RPM. Apparently we wanted to have complete CPIO headers for every file, so we did that.
There's another bunch of stuff, like RPM only knows about eight types of data. It knows null, 8-bit integer, 16-bit, 32-bit. It used to know about 64, but we stopped using it. It knows about strings. It knows about arrays of strings, and that's it. Oh, and binary blobs. That's the other thing it knows about. One thing you'll notice is that it doesn't know about arrays of binary blobs, and that's why we store all of the hashes as strings, as hex strings, which means they're all twice as big as they need to be, just because we don't have a type for that. It's just easier to put it in as a string. Okay, these are all little nitpicky things, but when you add them all up, there's a huge amount of slack and unused stuff in RPM.
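To put rough numbers on the two size complaints above, here is a small arithmetic sketch. It is only an illustration: SHA-256 is chosen as an example digest (the talk does not say which hash is meant), the payload is made up, and the 15 million figure is the file count quoted in the talk.

```python
import hashlib

# Made-up payload standing in for the contents of one packaged file.
payload = b"example file contents"

# The same digest in its two encodings.
raw_digest = hashlib.sha256(payload).digest()     # binary blob
hex_digest = hashlib.sha256(payload).hexdigest()  # hex string, as stored today

# How much bigger the hex string is than the binary form.
digest_overhead = len(hex_digest) / len(raw_digest)

# File count quoted in the talk; each file carries a 32-bit (4-byte)
# device number that is always zero.
FILE_COUNT = 15_000_000
zero_device_bytes = FILE_COUNT * 4

print(len(raw_digest), len(hex_digest), digest_overhead)
print(zero_device_bytes)
```

The hex form costs two characters per byte, which is where the "twice as big" comes from, and the always-zero device numbers add up to tens of megabytes of metadata that carries no information.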
Some of it's just useless, some of it just slows things down, and some of it is, frankly, dangerous. I want us to start thinking, as a community and an industry, about fixing some of this stuff, frankly, and about what comes after RPM, because I'm pretty convinced that a lot of this is really super unsafe, and it could be better. To start with dependencies: we'd be a lot better off if we had a purpose for our requires.
The larger point I'm trying to make is that I think we need to actually sit down and design something. Which is weird and scary, when we've just had this thing we've been using for 20-something years, tacking on more pieces until it accidentally became Turing-complete and sentient and maybe can destroy all of your computers, but it hasn't yet, so we're cool with it. We should probably design something that does the stuff we want it to do and doesn't blow up.
So the point for this particular part is dependencies, and the one thing I think we're missing. We keep pushing more complexity into them. Like, the way we do language packs now is through the soft requirements, either enhances or whatever. The correct operation and construction of the system keeps getting pushed into this thing that we don't understand, that's actually Turing-complete, and that keeps blowing up.
So here we go. I'm thinking we should have a purpose for a require. A require should have three bits. I mean, probably strength is in here too, but I'm not convinced that a require strength is actually what we want. I think when we're trying to say, well, this maybe wants this other thing, what we're actually trying to express is some other type of relationship between the two packages, and we need to be able to add those. On the previous slide, I talked about test requires. We don't have that.
We've been fighting about it for 15 years, trying to figure out whether we're going to be able to add these other things. So the fact that we can't extend RPM to add new things easily is holding us back in ways that are becoming dangerous. I don't know about strength. I'm willing to have the conversation about whether having a strength for a require is a good idea, but I think purpose is the one we're missing. Because if you have a man page and you require the man page reader, does your program actually need man to work, or is it the man page that needs it? So there are different relationships happening there other than just requires. These should be at the file level, I think, probably, or at least at the sub-package level, but we also need to be able to talk about different purposes and add new purposes.
And then component expression is the thing I've put there. On the right side, when you say require, you can name a thing, and it used to be just a version number, and then it got real... and now you can do all these things, and you can nest the hell out of them, and you can do all sorts of crazy stuff. You're doing a bunch of stuff there and it kind of hurts my brain. But in the simpler model, what I'm trying to say is: for this reason, I want one of some set of things. So you need to be able to describe a set of packages somehow, and you just need some sort of expression for doing that. Right now, the only thing we have is a name, and that's why the names of packages keep getting weirder and longer: the only thing we can use to look up a package is its name, so that's the only thing we can change to make two packages different. We're tying more and more things into the name, which is why we have lib sub-packages and all of the other things.
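As a rough illustration of "for this reason, I want one of some set of things", here is a minimal sketch of a require that carries a purpose plus a component expression. Everything in it is hypothetical: the `Package`, `Require`, `named`, `at_least`, `tagged`, `all_of`, `candidates` and `for_purpose` names are invented for this example, versions are simplified to integer tuples rather than real RPM version strings, and none of it corresponds to anything RPM actually implements.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable, List, Tuple


@dataclass(frozen=True)
class Package:
    name: str
    version: Tuple[int, ...]          # simplified; real RPM versions are strings
    tags: FrozenSet[str] = frozenset()


# A component expression is modelled as a predicate that selects packages.
Expr = Callable[[Package], bool]


@dataclass(frozen=True)
class Require:
    purpose: str                      # e.g. "runtime", "build", "test", "docs"
    expr: Expr


def named(name: str) -> Expr:
    return lambda pkg: pkg.name == name


def at_least(version: Tuple[int, ...]) -> Expr:
    return lambda pkg: pkg.version >= version


def tagged(tag: str) -> Expr:
    return lambda pkg: tag in pkg.tags


def all_of(*exprs: Expr) -> Expr:
    return lambda pkg: all(e(pkg) for e in exprs)


def candidates(req: Require, repo: Iterable[Package]) -> List[Package]:
    """Every package in the repo that satisfies the require's expression."""
    return [pkg for pkg in repo if req.expr(pkg)]


def for_purpose(purpose: str, reqs: Iterable[Require]) -> List[Require]:
    """Only the requires that matter for one purpose, e.g. installing docs."""
    return [r for r in reqs if r.purpose == purpose]


# Hypothetical repository and requires for one program.
repo = [
    Package("libfoo", (1, 2), frozenset({"library"})),
    Package("libfoo", (2, 0), frozenset({"library"})),
    Package("man-db", (2, 9), frozenset({"man-reader"})),
]
reqs = [
    Require("runtime", all_of(named("libfoo"), at_least((2, 0)))),
    Require("docs", tagged("man-reader")),
]

runtime_matches = [candidates(r, repo) for r in for_purpose("runtime", reqs)]
docs_matches = [candidates(r, repo) for r in for_purpose("docs", reqs)]
```

The point of the shape is that the man page reader is matched by a tag and only for the "docs" purpose, so nothing has to be encoded in a package name, and adding a new purpose such as "test" is just another string rather than a new kind of dependency.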
You're talking about the purpose of the package. You're talking about what those parts of the package are for, but that doesn't need to be in the name of the package. So we need to start designing different require types and purposes, so that we could just say, okay, this is required at runtime, or this is required for testing. And we should think more about packages and how you match pieces of packages against each other.
When you think about sub-packages, they're kind of tags, really. What you're saying is, these files I'm tagging as libraries, or man pages, or whatever they are. We could have tags. We could do that. That would be pretty cool, because then you could have multiple tags applied. Right now a file can only be in one package, so it can't carry more than one tag. We could fix that. So you have some way of describing the data that you're providing about your stuff, and then you need some other way of matching against that data to get the pieces that you need to build whatever it is that you're building.
These are not cutting-edge computer science craziness ideas. This is really simple stuff. It's just that nobody's bothered to sit down and design a thing for doing it in RPM, or whatever the next thing's going to be. So I think that's what we should do. Let's write some damn specs and figure out what it is we're actually trying to do here, and write code that does it and does it well. So yeah, that's my pitch.
So are there any questions about anything at all? Or do you just want to hear me yell about RPM metadata more? Because oh my god, I hate it so much.
So on projects like X2Go and TigerVNC, we provide an X server. We have the common problem that packages explicitly require Xorg when they don't need Xorg specifically, they need an X server in general. How do you think that should be solved?
Yeah, so in RPM land we have virtual provides, which are like a hack that we added, where you can say a package just provides something, and it can be some abstract concept, like love. It's whatever. It's a free-form string, but you can say, yeah, my package provides love. And then if there are like 15 packages that provide love, your package can say, I need love, because everybody does, and you'll get one of those packages. And the question is, which one do you get? We don't have a way of expressing that sort of preference. What used to happen, in at least early versions of YUM, is that there's a built-in heuristic. Does anybody in the room know the exact way this happens? I bet Neil does.
It's stat order. What? Yes. It used to be the shortest, well, there's a bunch of things. So it used to be the shortest name. Then it changed to minimum of penalty three leg. And now it's just ls stat order.
Yeah, and there's no spec for this, y'all. And it changes all the time. The other guys just decided to change it. YUM had this really complex heuristic for doing it. DNF was like, no, that's stupid, let's have a totally different, really stupid heuristic for this, and let's not tell anybody or increment any sort of version number or have any sort of specification. You need experimentation to figure this out. Yeah, yeah.
The idea of virtual provides isn't a bad one on its own, for a package to be able to say, I provide this thing. But we need a richer way of matching against those things to be able to express these sorts of preferences, and the stuff we have just isn't cutting it. But yeah, virtual provides, or capabilities, things like that would be a good idea. We should probably also have types for that stuff, because it's just free-form strings in RPM land. And it would be nice if you could say, I need this amount of love.
I think distro defaults would be good.
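To make the "which one do you get?" problem concrete, here is a toy sketch contrasting a shortest-name heuristic with an explicit distro default. It is not the real YUM or DNF logic, which the discussion above says is unspecified and has changed over time; the package names, the `PROVIDERS` table and both functions are invented for illustration.

```python
from typing import Dict, List

# Hypothetical capability -> packages that declare a virtual provide for it.
PROVIDERS: Dict[str, List[str]] = {
    "love": ["puppy-cuddles", "hug", "mixtape-maker"],
}


def pick_shortest_name(capability: str,
                       providers: Dict[str, List[str]] = PROVIDERS) -> str:
    """Name-length heuristic: shortest package name wins, ties broken
    alphabetically. The choice has nothing to do with what the user wants."""
    return min(sorted(providers[capability]), key=len)


def pick_with_default(capability: str,
                      defaults: Dict[str, str],
                      providers: Dict[str, List[str]] = PROVIDERS) -> str:
    """Prefer an explicitly configured default; fall back to the heuristic
    only when no valid default is set for this capability."""
    preferred = defaults.get(capability)
    if preferred in providers[capability]:
        return preferred
    return pick_shortest_name(capability, providers)


heuristic_choice = pick_shortest_name("love")
configured_choice = pick_with_default("love", {"love": "puppy-cuddles"})
fallback_choice = pick_with_default("love", {})
```

With the heuristic alone, renaming a package changes which provider everybody gets. With a defaults table, the policy is written down in one place where a distro or a user can see and change it.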
Like distro default services to enable or disable. You could have: distro default X server is Xorg. And default love provider is, well, I'll leave that to you.
Right. Yeah, I mean, at a certain point, you need some way for the system to arbitrate: okay, of the set of things that could match, which one should I pick? And that policy will change depending on what you're talking about. Without better metadata, we can't make those sorts of decisions. And that's why you randomly got exim as your mail transfer agent for a while, because it was a shorter name than postfix. Nobody knew what it was, and it's not user configurable, and it's just strange. So yeah, there should be a user default for that sort of thing too. Anyway, I'm sorry.
The distro defaults are a big deal for services, so that makes sense here. So thank you. Yeah.
So this is more of a question of absurdity. Have you ever tried putting emoji into these strings? Extended Unicode, basically. What happens if you put it into package names?
Yes. I think we specifically disallowed that at some point, but I'm not certain. The fun part is that version numbers don't have to be numeric. Anything goes. As long as they sort, they can be whatever. But yeah, it's all just strings.
Oh, yeah. Did you find out about the tilde, the special behavior with tilde? Yeah, that's fun. So it no longer does what the original version of Shepard does. For a long time, Shepard used to be like very... Yeah. And because there's no specification for any of this, everybody has to write their own, and everybody gets a few quirks wrong, and there's nothing that says... Yeah. Right. So now we have the chicken and egg problem, where there's a function now... Yeah.
This is sort of the other part of my pitch: we're in this unfortunate circumstance where everything needs RPM to work correctly, and we want to fix RPM, but we can't fix RPM because everything is using it, and without it nothing works correctly. So we can't change anything. So we're going to have to build something else alongside it. I don't think we can start by just fixing RPM. I mean, some of these things we can add to RPM, but I think it probably behooves us to try and build a second thing and then mash the two together, or, you know, get from here to there. I don't know that we can apply these sorts of fixes directly to RPM. I do know that writing some specs and actually having a model for how it should work is a really good first step. That's my opinion anyway. Feel free to tell me I'm wrong.
Sorry. In the description we get the most dangerous thing is where you get the most fault. But then how would you have anything that would be nearly powerful enough to give you a way to do the stuff you need post-install and not have the ability to execute scripts anyway?
What do you need to do after install? Because this is a different talk that I gave, but... so you're asking what would happen without post, or like how do you...
Maybe an option is just dropping post entirely. What I'm saying is, of the things you exposed, I'm not too concerned about the Turing-complete thing, because of performance concerns. I think the Turing-complete machine might be slow. It's more of a parlor trick than an actual useful... But executing stuff and doing that silently, that might be dangerous. But then are you going to conclude that we get rid of any way to execute scripts in RPM? Is that where you go?
Yes. Short answer? Yeah, no. Scriptlets have to die. Longer answer is that if you go through and look at all the things that happen in scriptlets, they're...
Well, some of them are optional, but we don't know which ones are optional. Some of them are just a performance enhancement, but we can't tell the difference between a post script that means this won't run unless I run this script, and one that's like, this builds a cache, you can skip it if you want. So there are two things I think need to happen. One is we should totally get rid of scriptlets and instead just have a carefully selected set of things that we know to be deterministic. We know what they're going to do. It's not a problem if you're like, I want to create a file. Sure. I want to write some data to a file. I know what that's going to do, and I can decide when to do that. And we can make allowances: for anything that actually needs some code to be run after the package is installed, there might be an argument for that. But I went through and read every scriptlet in RHEL 7. Like, every scriptlet in every package in RHEL. There are only like six things that are happening, and it's mostly adding users, starting services, building caches, and sometimes it's making keys and things like that. So we don't need to let users or packages run arbitrary scripts. That's dangerous and, frankly, kind of silly. We can hand them the tools that they need and let them use those tools, especially if we design them to be safe, and especially if we design them so that we know when they're going to operate, so that we can make smart decisions about when to do the things. Did that answer the question?
It does. And what about having a rule where, you know, you add some stuff that allows you to execute a command, but all the commands have a common prefix, like for git? Sorry? All the commands have a common prefix that is added automatically, like for git, so you'd have rpm-something and you can only execute rpm-something.
Oh yeah, yeah, we could totally work this in.
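One way to picture "a carefully selected set of things" in place of arbitrary scriptlets is a package that declares post-install actions as data, checked against an allowlist before anything runs. The sketch below is hypothetical throughout: the action names come loosely from the tasks listed in the answer above (adding users, starting services, building caches, making keys), and `Action`, `ALLOWED_ACTIONS`, `validate` and `plan` are invented for this example. It only validates and plans; it never executes anything.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical allowlist, loosely based on the tasks named in the talk.
ALLOWED_ACTIONS = {"add_user", "enable_service", "rebuild_cache", "generate_key"}


@dataclass(frozen=True)
class Action:
    kind: str                 # must be one of ALLOWED_ACTIONS
    args: Dict[str, str]      # plain data, never shell code
    optional: bool = False    # True for things like caches that can be skipped


def validate(actions: List[Action]) -> List[str]:
    """Return a list of problems; an empty list means everything is allowed."""
    return [
        f"action {a.kind!r} is not allowed"
        for a in actions
        if a.kind not in ALLOWED_ACTIONS
    ]


def plan(actions: List[Action], skip_optional: bool = False) -> List[Action]:
    """Reject the whole set if anything is off the allowlist, then decide
    which actions to run. Optional ones can be dropped or deferred."""
    problems = validate(actions)
    if problems:
        raise ValueError("; ".join(problems))
    return [a for a in actions if not (skip_optional and a.optional)]


# Hypothetical declarations for one package.
declared = [
    Action("add_user", {"name": "exampled"}),
    Action("rebuild_cache", {"cache": "icons"}, optional=True),
]
full_plan = plan(declared)
minimal_plan = plan(declared, skip_optional=True)
rejected = validate([Action("run_shell", {"script": "anything"})])
```

Because each action is data with a known meaning, the installer can tell a required step from a skippable cache rebuild, which is exactly the distinction the answer says scriptlets cannot express.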
Actually, in what we're working on in the current stuff, we're trying to add basically macros for all of the common tasks, and eventually I would like to see it so that, out in the packaging world as it exists now, there's a rule that says after a certain point your script can only use these macros and anything else will be rejected.
Anything else? Let's thank our speaker. Thank you all very, very much.
So what's going to happen is at six o'clock we all have to be out of here and in the lobby, and then once the party starts at seven you guys can come back in. We have some games going on in the lobby to tide you guys over. Yeah, downstairs. I think we're good for now, but at six o'clock we're told that everyone has to be in the lobby.