So welcome to the talk from Niels about dependency resolution in deterministic polynomial time.

So hi, I'm Niels. I'm doing my second talk today, this time about something slightly more technical. I wanted to do dependency resolution in deterministic polynomial time, which is what this is all about. I didn't get as far with this as I wanted — I wanted to have a tool ready and have some results, which I didn't quite finish. So some of this will just be my findings, work-in-progress notes.

We have six topics to cover: a quick introduction; then something about the hard problem; then the part where it actually turns out to be highly tractable in practice, if you don't think too much about it; then some installing, upgrading, and installing again, with ideas for how to improve your resolvers; and finally a conclusion. I'm hoping mostly to debunk the myth that dependency resolution is a very hard problem that we cannot solve, and that we should remove packages in the hope that this will somehow keep the archive from exploding, the dependency resolvers from breaking, apt from dying, and God knows what.

I've been working on Britney, which is a different problem to solve, but it revolves around the same techniques — or uses some of the same techniques. Some will be directly applicable to apt, some will not. There is no one-size-fits-all solution, sorry — and much less have I written it yet, even if there were one.

So, defining the problem. When you try to install a package on your system, there are actually two problems being solved. One is the part where apt figures out that if I want to install eclipse, I'll be needing a lot of Java packages and a lot of other stuff — it figures out which packages are needed for this to make sense. The separate, second problem is that once you have this list, the packages have to be unpacked and configured in a certain order for it all to make sense. They are, in theory, two distinct problems. I'll be talking about the first one, because that is the actual dependency resolution; the other part is just ordering. The ordering is certainly also a very important part — I don't want to dismiss it — it is just, in fact, a polynomial-time problem.

To solve the ordering problem, you basically take the set of actions that need to be done. There are constraints between some of them — you unpack a thing before you configure it, maybe you deconfigure something else before that — so you end up with a list of partial ordering constraints. From those you build a graph of the ordering; it's fairly simple to do in practice if you have the right tools. Without cycles this is trivial: you topologically sort it, and it gives you the order — go fix. When there are cycles, you will get a single "action" consisting of multiple things to do at the same time, which is really impossible to do. It turns out that, the way we have defined this whole thing, a package that does not have a postinst script doesn't need to be configured separately — it's just unpacked or not unpacked — and that tends to break most cycles. So if you want fewer problems with ordering constraints: help us find a way to get rid of most postinst scripts, such as the ones just running ldconfig and nothing else. That would solve some of the cycles. Otherwise, it's just a matter of feeding it to dpkg, and it works.
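As an illustration of the ordering step, here is a minimal sketch in Python — my own construction, not dpkg's actual code — that models the "unpack before configure" constraint plus any extra ordering constraints, topologically sorts the actions, and reports a cycle when one exists (the package names are hypothetical):

```python
from graphlib import TopologicalSorter, CycleError  # Python 3.9+

def plan(packages, extra_before):
    """packages: names to install; extra_before: (a, b) pairs meaning
    action a must happen before action b."""
    graph = {}  # action -> set of actions that must run first
    for pkg in packages:
        graph[f"unpack:{pkg}"] = set()
        graph[f"configure:{pkg}"] = {f"unpack:{pkg}"}
    for before, after in extra_before:
        graph.setdefault(after, set()).add(before)
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as err:
        # A cycle means some actions would have to happen "at the same
        # time"; dpkg escapes most of these because a package with no
        # postinst script needs no separate configure action at all.
        raise SystemExit(f"ordering cycle: {err.args[1]}")

# Hypothetical constraint: libfoo must be configured before app is configured.
print(plan(["libfoo", "app"], [("configure:libfoo", "configure:app")]))
```

Without cycles the sort just hands back a valid order; a cycle corresponds exactly to the "single action consisting of multiple things at once" case described above.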
If you have a cycle — yeah, fixing that is a separate problem, and I won't be covering it. As I said, the runtime is polynomial: something like the size of the package graph, a regular graph, which is fairly polynomial. This even covers finding the cycles. Again, breaking cycles is mostly a matter of moving postinst scripts. And of course, if you think this is too simple, you can always implement more features on top of it, such as trying to optimize for minimal downtime of services and all that — you can make any trivial problem very hard very quickly that way.

So, the hard problem. First of all, the players: we've got apt, aptitude, cupt, whatever; we've got Britney; we've got dose — or whatever its name is today. They solve the hard problem, or at least that's what they should be trying to do. Notable tools not affected: dpkg. If you're missing a dependency, dpkg truly just figures that out and says "I cannot resolve this, and I don't know where to fetch it, so I'm giving up" — which is the only sane thing it can do. That means it only verifies a solution, which is known to be polynomial, and therefore it is not a player here. And dak rm is also only doing a polynomial-time check; it just happens to be slow for other reasons. So those are the known players — there are probably others.

Moving on to the hard problem itself. The problem we actually have is: we've got versioned dependencies, we've got alternative choices, we've got virtual packages. All three make this problem hard. You basically have to remove all possible alternative choices — all guessing — to make this simple. It just so happens that if you remove versioned dependencies, among other things, things become really fun: it can feel easy to solve the problem, but upgrading gets really, really spicy. How do we ensure the new dpkg is configured before a package using a feature of the new dpkg is installed? That's sort of impossible then. So the problem becomes simple, but we end up with a broken dependency graph — not really an option. Technically, there's also a way to make it simple by allowing multiple versions of things to be installed and removing negative dependencies, but that's not necessarily easy to do either: presumably file conflicts will get in our way, or something — we would have to redefine where we place all files. That's not going to be something we do anytime soon. So — yeah, not really an option.

To get a better understanding of the problem, please consider this example. We have coreutils at some version depending on libc (>= 2.19) — give me this version or a newer one. From this alone, if libc 2.19 is known to be installable, can we immediately conclude that coreutils is also installable? How many people think we can conclude something from this? How many people think we cannot conclude anything? Well, that's good. It turns out it doesn't really matter whether you have negative dependencies or not. With negative dependencies, libc — or whatever it depends on — could just conflict with coreutils, and we're broken. And without negative dependencies, something further down could depend on a version that is strictly newer or older than the one we picked. So in theory you can't conclude anything locally, and that can become a problem.
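To make the audience-poll point concrete, here is a toy model (package data invented for illustration): libc 2.19 is installable on its own, coreutils needs only libc (>= 2.19), and yet a hypothetical Breaks relation makes coreutils uninstallable — so nothing follows locally:

```python
def installable(pkg, deps, breaks):
    """Brute-force check: collect pkg's transitive dependency closure (this
    toy has no alternatives, so the closure is unique) and reject it if any
    incompatible pair would end up co-installed."""
    selected, todo = set(), [pkg]
    while todo:
        cur = todo.pop()
        if cur not in selected:
            selected.add(cur)
            todo.extend(deps.get(cur, []))
    return not any({a, b} <= selected for a, b in breaks)

deps = {"coreutils": ["libc-2.19"]}    # coreutils Depends: libc (>= 2.19)
breaks = [("libc-2.19", "coreutils")]  # hypothetical: libc Breaks coreutils
print(installable("libc-2.19", deps, breaks))  # True  - fine on its own
print(installable("coreutils", deps, breaks))  # False - broken anyway
```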
Anyway, it is highly tractable in practice. If you do have a Breaks or Conflicts or some other negative dependency, it tends to be an upper bound — so if the previous version of coreutils was installable alongside that version, the new version will probably be installable too. Likewise, most dependencies, versioned or otherwise, tend to be lower bounds. There are cases of version ranges — a lower bound and an upper bound at the same time. They exist; they're just not so common yet, and that's a good thing. And the number of alternatives — the number of possible solutions to any clause — actually tends to be fairly limited.

Of course, there are exceptions. Packages from stable might simply be missing in the new suite, in which case it's really hard to solve the dependency. You've got mutually exclusive packages, like all the sendmail stuff — the MTAs. There are the version ranges I mentioned before. And then the strictly-equal versions, which you see between packages built from the same source package. The redeeming feature there is exactly that — same source — because it is very simple to figure out that they come from the same source, they tend to be upgraded in lockstep anyhow, and the new versions tend to be available at the same time.

The problem made hard, in a nutshell, is this contrived example. You have a package, try, which depends on any number of foos, and each foo can be satisfied either by picking a bar package or a good one — the names might be a hint as to which we would choose. The bar ones each depend on a bad package that does not work with the starting package — we're broken — and the good package has no dependencies at all, and is therefore good. In the perfect run you solve it with try, foo, good, foo, good, and are done. In practice, if you have no way of ordering these — no way of knowing that one of them is better than the other — you have an n-by-m search where you guess back and forth, backtrack up and down, and it's going to be horrible. This contrived example can of course be solved by some solvers that can see the pattern, but you can always make a more contrived pattern that is harder to recognize.
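Here is a sketch of that contrived example — the package universe and the resolver are my own construction. A naive depth-first resolver that validates conflicts only after all choices are made visits on the order of 2^n leaves when it guesses bar first, and succeeds at the first leaf when it prefers good:

```python
def build(n, good_first):
    """trypkg depends on foo0..fooN-1; each fooI is satisfied by barI or
    goodI; every barI drags in 'bad', which cannot coexist with trypkg."""
    uni = {"trypkg": [[f"foo{i}"] for i in range(n)], "bad": []}
    for i in range(n):
        alts = [f"bar{i}", f"good{i}"]
        uni[f"foo{i}"] = [alts[::-1] if good_first else alts]
        uni[f"bar{i}"] = [["bad"]]
        uni[f"good{i}"] = []
    return uni, [frozenset({"bad", "trypkg"})]

def solve(uni, conflicts, goal):
    nodes = 0
    def search(selected, agenda):
        nonlocal nodes
        nodes += 1
        if not agenda:  # conflicts checked only at the leaves (the naive part)
            return None if any(c <= selected for c in conflicts) else selected
        clause, rest = agenda[0], agenda[1:]
        if any(alt in selected for alt in clause):
            return search(selected, rest)          # clause already satisfied
        for alt in clause:                         # guess alternatives in order
            found = search(selected | {alt}, rest + uni[alt])
            if found is not None:
                return found
        return None
    return search(frozenset({goal}), uni[goal]), nodes

for good_first in (False, True):
    uni, conflicts = build(12, good_first)
    _, nodes = solve(uni, conflicts, "trypkg")
    print(f"prefer good first: {good_first} -> {nodes} search nodes")
```

Running it prints a node count in the thousands for bar-first and a few dozen for good-first — exactly the gap between "some way of ordering these" and blind backtracking described above.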
So I tried to keep it somewhat simple. The good thing is that very few packages actually look like this, and that is the most redeeming feature of the package graph. We do have exceptions: ispell dictionaries — and aspell dictionaries. If you depend on ispell-dictionary, which is a virtual package provided by 50 different ispell or aspell packages, and those depend on something else after that, then in theory they can be part of creating this problem. Fortunately they themselves are fairly simple once you get past them. You also have Multi-Arch: foreign, and Multi-Arch: allowed with package:any dependencies: with multiple architectures, in theory any number of packages can satisfy the same clause. This creates extra, unnecessary work for most resolvers if you enable multi-arch and cross dependencies. We have 12 architectures, but I suspect most multi-arch users are limited to two, maybe three — a possible exception being people who write solvers and like to torture-test the system.

To really produce the hard case in the archive, you would need, for example, to write N different awk implementations — we have three — and then, below those, M distinct but equally valid libc implementations. That's not something we're likely to do in the near future, and you would have to do it on multiple levels. You can pre-solve the essential set most of the time, so that particular corner is not so interesting. And the data packages can in theory blow up where you have truly interchangeable data, like we do with the aspell packages — but they tend either to have no dependencies beyond the big data package, or to have a simple loop back to where you came from, so the blowup tends to be limited to one level. If you have enough of those it is still going to be a problem, but it keeps things simple.

So, on to installing. The easy case, as mentioned, is when you have a single suite and a single architecture. A lot of the graph collapses into what looks like a regular graph and therefore becomes simple, trivial. This happens so much that if you take lintian, it has 26 distinct dependency clauses, and if you look at each of them with only a single suite and architecture, there is at most one package directly solving each clause locally. Further down the graph somewhere you depend on, for example, debconf or debconf-2.0, which is one of these alternatives that gives rise to a blowup — but lintian itself never becomes a backtrack point. There is no guesswork when you reach lintian about which of its dependencies to pick, or which version; you have to go further down to see any of that. And actually, by the time you reach the debconf choice, it has already been solved by something else — as I recall, the dependencies of debconf are already covered by the time you're forced to resolve that choice. So it's not really a big issue either: you just pick the debconf one.
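Here is a sketch of why that collapse makes resolution trivial, under my simplifying assumption that "available" just means "present in the universe": when every reachable clause is either already satisfied or has exactly one available candidate, resolution is a plain graph walk with no guessing:

```python
def resolve_without_guessing(universe, root):
    """universe: pkg -> list of clauses (lists of candidate names); a name
    is 'available' iff it is a key of universe.  Succeeds only while every
    reachable clause is already satisfied or has a unique candidate."""
    selected, todo = set(), [root]
    while todo:
        pkg = todo.pop()
        if pkg in selected:
            continue
        selected.add(pkg)
        for clause in universe[pkg]:
            candidates = [c for c in clause if c in universe]
            if any(c in selected for c in candidates):
                continue                   # e.g. debconf picked earlier
            if len(candidates) != 1:
                raise ValueError(f"choice point in {pkg}: {clause}")
            todo.append(candidates[0])     # no guessing needed
    return selected
```

In the lintian story above, the debconf | debconf-2.0 clause is either already satisfied by the time it is reached, or it is precisely the point where a sketch like this would have to refuse to guess.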
So when do we have these conditions? We hit them a lot, actually. If you do builds in unstable, for example in a pure unstable chroot: only one suite, only one architecture — piece of cake. If you install packages on a pure jessie system with multi-arch, you have the single suite, which limits a lot of it, but not the single architecture. On pure squeeze you won't have multi-arch at all, because it didn't exist back then, so that's even simpler. You can also do unstable plus experimental, or stable plus backports: if you fiddle a bit with the resolver and say it is not allowed to pick from experimental unless explicitly requested, and from there it only picks what it absolutely needs, then you basically get the same result as the single-suite restriction — but that requires you to actually code something for it. And then there is Britney, the testing migration tool: she basically takes things moving into testing, builds a new version of testing, and tries to verify that it is still a valid solution — so she forces it into a single suite, currently.

So these are common cases where it happens, but not everything works like this. Stable-to-stable upgrades are, funnily enough, not a single suite. It still works out there, because there is only one instance of a package: one version, one architecture, solving any particular dependency. Whereas with multi-arch you could have the i386 and the amd64 version, and during an upgrade you can have the old version and the new version, which may or may not satisfy it.

Also, we don't really like libraries to have alternative implementations; it breaks things horribly, especially when they don't actually agree on the interface they provide. And there is the social aspect: we don't bother having 200 interchangeable implementations of everything. I think the record is five or six different sendmail implementations — do we have more than exim, nullmailer, postfix and some other thing? I don't think so, but we might.

And even when you do have one of these explosions, the breakage actually has to hide beneath one of these alternative explosions for it to be a problem. You might have aspell over here on one side and a breakage over there which is trivial to find; then, after solving the first explosion, the solver realizes the other part is not working and just bails out.

So there are a couple of things we can do to make this easier. The most interesting question is of course: can we still make this simple without the single-suite and single-architecture restriction?
We can, to some extent — and that's where we move to upgrading in deterministic polynomial time. This is where I spent most of my time.

To start with, consider an upgrade from stable to stable — and here I mean pure stable, no backports or anything. The generally recommended way to do it is: replace wheezy with jessie in your sources, run apt-get update, then apt-get upgrade and dist-upgrade afterwards. apt replaces all the old packages with the new version, if there is a new version. We also want all the packages from wheezy that do not have a new version in jessie to be removed, and there might be a new essential package somewhere. Long story short: upgrade what is present, remove what is not present, then install the new essential packages, if any — and those don't tend to change that often. I'll be ignoring the installing part for now; it is of course valid and interesting, but we'll skip it.

So let's take a simple example — somewhat contrived, somewhat made up; the numbers are not too far from reality, but not exact either. We are upgrading a system from wheezy to jessie. I claim there are 30,000 packages in wheezy and 35,000 in jessie, and the system we are working on has 2,000 packages installed. Mine here has something like 2,200 — if you do a simple dpkg -l and then a line count, you can see how many you have on your own system, plus or minus five or so, to give you an idea of how many packages we're actually working with. And we assume that every wheezy package got a new version in jessie, because I'm about to do a pop quiz.

With these numbers, what is the maximum problem size of this upgrade? Any takers — is it 30,000? Anyone for 35? 37? 60? One for 65, one for 67,000? And who believes I was an asshole and didn't include the right answer? Oh, ye of little faith — I'm glad you believe me. The right answer is 37,000.
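Spelling the quiz arithmetic out, with the talk's illustrative numbers and my reading that the answer is the installed wheezy versions plus the full jessie suite:

```python
wheezy, jessie, installed = 30_000, 35_000, 2_000
problem_size = jessie + installed    # jessie's packages + the wheezy versions
                                     # still sitting on this machine
worst_case = wheezy + jessie         # if the old stable archive were kept too
print(problem_size, worst_case)      # 37000 65000
```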
The trick is that when we do the upgrade, we replace wheezy with jessie and run an update. So all the wheezy packages not on the local system disappear from view, and you get all the jessie packages instead. With the assumption that every package has a newer version in jessie, you've got the 2,000 wheezy packages on your system plus the 35,000 packages from jessie. That means your average stable-to-stable upgrade, done this way, is a good 40% smaller than the worst case it could have been — if you had kept the old stable repository as well, the answer would have been 65,000 packages, which was also a possible answer.

There's another nice feature of removing the old stable repository: every time you upgrade a package, your effective problem size decreases by one. That's not always useful, and not always the thing that solves your problem, but it means that if you can upgrade a single thing every now and then, you might eventually be able to figure out the rest of the upgrade path after you've spoon-fed it 200 packages or so. As we upgrade, we move towards a single-suite situation: the more things we upgrade, the closer we get to a single suite again — mostly true for pure stable-to-stable upgrades.

Now, upgrading as described should be as easy as: mark all new essential packages for install, mark everything that has a new version for upgrade, remove stuff, then figure out if something is missing and install that. But this solves upgrading by installing, and I'm not really interested in doing that, because installing is hard. We can do something smarter for upgrades, in deterministic polynomial time. I'm not going to give you a hundred-percent solution — if I could, I would be very, very rich. The intent is rather to find some easy, low-hanging fruit that can be solved cheaply; then you can throw a general-purpose resolver at the rest of the problem after the simple parts are done, or feed your solver a partial solution and ask it to fix it up, so it gets a slightly smaller problem size.

It relies on two things. One: it exploits a lot of common patterns in how we do dependencies. Two: if you have a valid system state — a system where all your packages are in fact installable, which dpkg tends to enforce — you can verify that this is true in polynomial time, and you can also verify a change to it in polynomial time.
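A minimal sketch of that invariant, using a data model of my own (dependencies as clauses over package names, negative dependencies as unordered pairs): validity of a candidate set is checked in time polynomial in the size of the encoding, so a speculative change can be committed or rolled back cheaply:

```python
def is_valid(selected, universe, negs):
    """Polynomial check of a candidate set: every dependency clause of every
    selected package is satisfied from inside the set, and no two packages
    that don't work together are co-selected."""
    for pkg in selected:
        for clause in universe[pkg]:
            if not any(alt in selected for alt in clause):
                return False
    return not any(pair <= selected for pair in negs)

def try_swap(selected, universe, negs, remove, add):
    """Speculatively apply a modification; keep it only if still valid."""
    candidate = (selected - remove) | add
    return candidate if is_valid(candidate, universe, negs) else selected

universe = {"libc6": [], "coreutils": [["libc6"]]}
state = {"libc6", "coreutils"}
print(is_valid(state, universe, []))                    # True
print(try_swap(state, universe, [], {"libc6"}, set()))  # rejected: unchanged
```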
Here I'm going to do a bit of theory with a bit of English in between. We start with a non-broken system — that is, we have a set of installed packages that are mutually co-installable. Call it I. We can add things to I, remove things from it, or take something in it and replace it with something else. That can be done in linear time; if I take a random package and mash it in, that's constant time, maybe, depending on your implementation. The theory is: we can compute a new set where we replace something with something else, and then verify that this new set is indeed still a valid solution, in polynomial time. So a constant-time modification can be verified in polynomial time. The issue of the day is that randomly feeding packages in and out of that set is not going to work very well, so we need something smarter than random modifications. Well, somebody might actually write something that works that way, but anyway.

The first thing I did: as I mentioned earlier, we have these strictly-equal dependencies between binaries from the same source. So I grouped binaries by source package: if I mark one of them for upgrade, I immediately pull in the rest of them as well, if they are installed. That happens to also sort out a very common case of Breaks plus Replaces — when you move files between two binaries of the same source. Then you try to upgrade each of these groups in some order — preferably deterministic, but I didn't get that far — and if the result of one of these modifications is still valid, which we can test very cheaply, we commit it; rinse and repeat until we run out of things to test.

How well this works depends largely on what you have installed — unsurprisingly. With few packages it may work better than with more, and sometimes it works better when you have more, which is really exciting. I built an example wheezy installation based on what I have installed on my machine — not quite, but close — and libc, for example, was immediately upgradable by this procedure, as was some Java package. man-db and eclipse came after I had done a couple of packages first, which were all upgradable — I think it was primarily libc, and maybe some libselinux thing. So there is a set of packages you can actually upgrade like this.

This is of course sensitive to the particular configuration: here I could upgrade those packages this way, but I don't expect that to work in all cases — in fact I find it highly unlikely, because in wheezy-to-jessie, dpkg has tons of Breaks for all sorts of things. If you have the wrong thing installed, upgrading one thing breaks something else that has to be upgraded at the same time, and that is sure to build a loop eventually.

Basically, what we are exploiting here is the greater-than-or-equal versioned dependency, which is the common way of doing things. We rebuild a package, and it gets a dependency on a higher version of something. Because these are lower bounds, everything depending on the foo in stable is equally happy with the newer foo in jessie. So that works very well.
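A sketch of the grouping pass just described, assuming an invented metadata layout and a valid() predicate playing the role of the polynomial checker sketched earlier (adapted to (name, version) pairs): binaries from one source move in lockstep, and a group's upgrade is committed only if the whole system state stays valid:

```python
def upgrade_pass(installed, by_source, new_version, valid):
    """installed: set of (name, version); by_source: source -> binary names
    built from it; new_version: name -> version in the new suite; valid:
    polynomial-time checker over (name, version) sets."""
    changed = True
    while changed:                        # rinse and repeat until a fixpoint
        changed = False
        for source in sorted(by_source):  # some deterministic attempt order
            names = by_source[source]
            current = {(n, v) for (n, v) in installed if n in names}
            # The whole group moves at once; binaries without a new version
            # drop out, matching "upgrade if present, remove if not".
            upgraded = {(n, new_version[n]) for (n, _) in current
                        if n in new_version}
            candidate = (installed - current) | upgraded
            if current and candidate != installed and valid(candidate):
                installed, changed = candidate, True   # commit this group
    return installed
```

Each group can commit at most once, and groups that fail are retried on the next sweep once other commits have changed the state — the rinse-and-repeat fixpoint from the description above.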
This is the common case for libraries with ABI transitions — apparently including libc — and it enables you to upgrade from the inside out: from the core of the onion outwards, if you think of the graph as an onion.

The algorithm is fairly dumb, unsurprisingly. It can't solve perl, it can't solve python, because those tend to involve a new package — you're pulling in a new package, and you would have to install it, and all that. Perl it could technically solve if it merged groups together into larger groups. Anyway, the runtime of this is something like I² · (I + E), which is bounded above by I⁴. It's fairly cheap, it is definitely polynomial, and it's very trivial to do.

We can do better than that. As mentioned, if we could have it figure out that a dependency needs a new package and group that in, we could handle more cases — it wasn't needed for my system, but it might be for others. Then there's the part where a package breaks other stuff, so you might have to upgrade things at the same time as, or before, the package; you end up with some sort of big loop or tree structure of groups that have to be upgraded together, and it should just work. There's also the part where we have to handle renamed packages, and these come in two kinds. First, packages renamed in an ABI bump — which would be very useful for stretch, due to GCC 5. A sane restriction here could be something like: we have a clause that is not satisfied, but the thing that would satisfy it is a package introduced by the new version of the same source — then we pull that in, and if it has the same dependencies, we have already solved those, sort of thing. Second, there's the case where people rename a package from foo to foo-replacement or something like that; again, if it's the same source, we might do some magic, some heuristics, there. And at some point you will end up needing to actually install stuff for wheezy-to-jessie, because it pulls in a new init system. There, if you have trivial no-guess solutions — by which I mean the first of the alternative choices is a real package and there's only one of them — you could solve that automatically, in a deterministic way, and otherwise give up and try something else.

So that is the basic algorithm and the basic ideas I've put down and have been fiddling with, without getting very far yet.

Now: installing, part two. After what we learned from upgrading — installing with a single suite and single architecture is trivial, and for upgrades we can usually reduce the problem size to some extent — we can do better on the installing part as well. Look at the aspell packages again, because they are one of the big problems. We have a set of packages that appear to be almost identical, and we also have packages that are clearly superior to other packages.

Identical shows up like this: if we stare long enough at the aspell package for Ukrainian and the one for Russian, something very special happens. You will notice that the primary difference between the fields I have selected is the package name and the version. Exciting — because it means that if I can install aspell-uk as the only one of them on the system, the same solution is valid for the Russian one. Syntactically they differ, but semantically they differ only by name and version. And sometimes you have versioned dependencies that are always satisfied — in old stable, for example — and then it's the same game again. This is a simplified view, because in theory you could have a package
that refuses to work with the Ukrainian one but not with the Russian one, or vice versa — then they actually become distinct solutions. But the general use of an aspell dictionary tends to be: I need one of them, and I really don't care which one you pick.

So we find these by saying they must have the same effective dependency clauses. We also look at the negative dependencies — that's fairly important too; they have to have the same ones. Here we have to remember that negative dependencies, which we like to think of as directed, are not: it really doesn't matter whether foo breaks bar or bar breaks foo; the important part is that they don't work together. And then we have to assert that they satisfy the same clauses — that each is a valid solution to exactly the clauses the other solves. This becomes interesting with dependency ranges — not a true range, but two clauses saying "I need foo greater than version one" and "I need foo strictly less than version two": one of the packages may fall inside both while the other does not, for instance when you're doing upgrades. So satisfying the same clauses matters as well — a tricky thing I discovered a little too late, but it's fun. All of this can generally be done in polynomial time, that ballpark; I haven't really computed the actual bounds.

But we can go further, because equivalent — identical — packages are easy to find, and there is something better than that (not stronger: better). We can also have the case where one of the packages is clearly better than the other: it has fewer of the same effective dependencies, so I need less to install it; it has fewer of the same negative dependencies, so there are fewer things that refuse to work with it; and finally, it satisfies at least as many dependency clauses as the other one. This is truly something you find in upgrade scenarios, where the old version and the new version do not have any differing dependencies — no new Conflicts relations, for example — but the new version might satisfy more versioned clauses; there might be more packages it can satisfy. In that case you can just unconditionally take the newer version: if a solution works with the old one, it definitely also works with the new one, whereas the reverse may not hold. So that's sort of a freebie — you solve it for one of them, and if that doesn't work, you know you don't have to try the other.

The point of these is basically that identical packages are two-way substitutable, while this is a one-way substitution instead. I haven't figured out how to find the one-way ones in general more efficiently than I find the identical ones — the identical ones are easier to find — but I would like to get to where we can find these superior packages cheaply, because they are more useful in general, and there might be more of them as well.
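A sketch of both ideas — the data layout is mine, and a real implementation would also have to honour version ranges, which this toy ignores. Two-way substitutable packages are found by hashing a name-stripped signature; "clearly better" is a one-way subset test:

```python
def signature(pkg, deps, negs, satisfies):
    """Name-stripped fingerprint: dependency clauses, undirected negative
    dependencies, and the clauses this package can satisfy."""
    strip = lambda n: "*self*" if n == pkg else n
    return (frozenset(frozenset(map(strip, cl)) for cl in deps[pkg]),
            frozenset(frozenset(map(strip, pair)) for pair in negs[pkg]),
            frozenset(satisfies[pkg]))

def equivalence_classes(pkgs, deps, negs, satisfies):
    groups = {}
    for p in pkgs:
        groups.setdefault(signature(p, deps, negs, satisfies), []).append(p)
    return [g for g in groups.values() if len(g) > 1]

def dominates(a, b, deps, negs, satisfies):
    """One-way substitution: any solution containing b also works with a,
    because a needs no more, conflicts with no more, and satisfies at
    least every clause that b satisfies."""
    return (deps[a] <= deps[b] and negs[a] <= negs[b]
            and satisfies[b] <= satisfies[a])

# Hypothetical data: the two dictionaries differ only by their own names.
pkgs = ["aspell-uk", "aspell-ru"]
deps = {p: frozenset([frozenset(["aspell"])]) for p in pkgs}
negs = {p: frozenset() for p in pkgs}
satisfies = {p: frozenset(["aspell-dictionary"]) for p in pkgs}
print(equivalence_classes(pkgs, deps, negs, satisfies))
```

Here deps[x] and negs[x] are frozensets so the subset tests in dominates work directly; aspell-uk and aspell-ru land in one class because their signatures agree once their own names are masked out.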
And this is as far as I got — including writing slides up to five minutes before the talk. So, are there any questions? There's a mic there, or there's a runner.

Q: Thanks for your interesting talk; I have two questions. The first concerns your problem of finding equivalent packages — are you aware of the coinst tool?

A: I have read the article about it. The thing I remember is that it noted somewhere that this should blow up exponentially, but it doesn't.

Q: Well, as with all theoretically hard problems, of course. So this tool in fact tries to classify packages with respect to which other packages they are co-installable with, and one of its steps is to group together packages which behave the same with respect to co-installability with other packages. That would be your notion of equivalence, if I understood it correctly.

A: It sounds like it — at least it's very similar.

Q: Maybe we can go through this offline later, but it's definitely something you should look at, I think. My other question is about this deterministic p-time installability. You're well aware, of course, that this is a result which holds under some restrictions on the problems — you don't claim to have shown P equals NP. We already know the problem has this good complexity when you don't have any alternatives, explicit or implicit (as you also said), and also when you don't have any conflicts, implicit or explicit. You seem to have found a different class. Can you characterize precisely under which conditions on the dependency graph you get p-time complexity?

A: Actually, the negative dependencies are not the issue. If you take a package, and the entire set of transitive dependencies below it can always be solved by following clauses where there is only one option — or an option you already picked earlier — then locally this becomes a simple regular graph; well, you can massage it into one. I don't think that happens often enough for us to rely on it a hundred percent. There might be cases, corners of the graph, where it holds locally, and then when you leave that part — go further down the stack, or outside it — you get back to the original problem. But you will have parts of the graph that are locally polynomial, and those would be interesting to find.

Q: Whenever dependency resolution fails and you have to fix things manually, I've always thought that would be an obvious place for users to submit the way they resolved the issue to some public place and share it, so other people could solve the dependency problem more easily. We already have things like popcon, where you upload which packages you have installed, so it would be a small step towards uploading the problems you hit and their solutions. I'd like to hear your views on that.

A: I think it could be interesting to have that, but mostly as a way of generating test data for solvers. My take is that, for the user, this problem should be simple enough that we solve it in the tools, so the user doesn't have to fix it manually. But as you said, getting actual test-case data from that feedback might be useful.

Q: In one of your slides you said that an upgrade from one stable distribution to the next is only considered successful when you end up with only packages from the new stable — which means that a package which no longer exists in the new stable gets removed. Is that really a good thing?
A: That was the short version of it. Usually, when there is no new version in the new stable release, it's because we removed the package and no longer support it. I suppose there are cases where the user might still want the old version to stay, if it's still installable and all; this is one of the places where we might debate what is practically useful versus what the idealized dist-upgrade should be. The definition was mostly useful for me to get an idea of where I was headed — what problem I was trying to solve, basically.

Q: Maybe I can just add a remark to that question. The problem when you try to pre-compute solutions to installability problems is that the answer depends not only on which suites you take the packages from, but also on what you currently have installed on your machine — and there are hardly two users with exactly the same packages installed, which is why it is difficult to pre-compute these kinds of problems, I would say. The other remark is that in practice, even for very, very hard instances — many different suites to install from, pin preferences among them, and so on — you get very good solutions with solver technologies like SAT solving, or the other solvers that can be plugged into apt as external solvers, for precisely the reason you already identified: in practice, even theoretically hard problems are often much, much easier than the worst case.

Moderator: I think we're running out of time, so if there are no more questions — okay, one last question.

Q: Do you have any preliminary results — timings or something — at least for your own system?

A: No. What I've got is from Britney, actually. I added some stats to Britney because we got a data set from Kali Linux: they were trying to go from their version of wheezy to jessie back in August, so before the jessie freeze. Basically, we were working with 70,000 nodes — packages: all of wheezy plus all of jessie, each with a unique version. The interesting thing is that the average node in that graph had three to four dependency clauses — the median was three and the average was 4.3 — and each of these clauses had (I can't read this anymore...) a median of two possible options and an average of 1.8. And this is before Britney excludes anything: this is the raw graph. Based on it she selects what is in testing and ignores everything outside, so where I have two options, a lot of the time she would throw one of them out because it is not in testing. Those are the numbers I have — I have some more if you want to see them, but it's not at the point where I think I can predict something useful from them.

Moderator: Okay, thank you, Niels, for your talk — a really good insight into the problems of dependency resolving.

A: Thank you.