Okay, this talk is about the SAT solver and the algorithm behind it — which is actually not the real reason why the new code is so fast; the real reason is the different repository handling, but that's a rather boring topic. The interesting topic is the solving part, so I'm talking about the solver here.

First of all, what was the issue with the old solver? I mean, it just got replaced for 10.1, and now we're replacing it again — what on earth happened? Basically, it turned out that it was just too slow. The old solver came from the Red Carpet project. It was written so that a company could keep its workstations up to date, with the mindset that there's one small update repository — they called it a channel — which you subscribed to, and the solver then just always installs the newest version of a package. When we used it for complete installations, especially with the Build Service, where people have 10 or 20 repositories added, it just broke down. We really had cases where the solving took several minutes. That's why some things got disabled in the code: the original Red Carpet solver branched at every alternative, tried every solution, and then used a metric to choose the best solution at the end. It turned out we couldn't do that with so many packages — openSUSE has about 10,000 packages, and it took 10 minutes or so. So part of that code got disabled very fast, but then of course you don't get an optimal solution in such cases.

The other problem we had: it really could get stuck. There were cases when it couldn't find a solution and just hung — probably due to bugs, we never found out exactly what was going on there. So the YaST team actually implemented a timeout: after a couple of minutes you get told, sorry, something's wrong, do something different, I can't help you with that solving — so that you at least get some feedback.
Another issue we had: we made that extension with the weak dependencies, where we have recommended packages and suggested packages. The idea behind recommended packages is that they get installed when it's possible. But as the code was pretty fixed at that point — we took it over from Red Carpet — this really didn't integrate well into it. Basically the solver more or less treated recommended packages like required packages; it couldn't go back, deselect a recommended package and then branch to something else, because that wasn't in the code.

[Question:] Are you talking about the solver in 10.1 or 10.2? — That's basically the same for 10.1 and 10.2: the libzypp code took over the solver from Red Carpet. The Red Carpet code was ported to C++ and cleaned up a little, but the algorithm is the same.

Another thing that really annoyed the users is the bad diagnostics. If the user's selection turned out to be unsolvable, you got an error telling you that libfoobar requires dependency bar and none of the providers can be installed, and you think: gosh, what's that? I never requested libfoobar — why is it telling me something about libfoobar, why should that package be installed at all? The user just doesn't know what to do with that. And the suggestions, as I've already told you, were 'don't install libfoobar' or 'break libfoobar' — but the user doesn't know what libfoobar does or why it's there. So that's not great either. This is something the new solver does very, very much better than the old solver.

Speaking of the new solver: why is it called SAT? Because that's the name of the standard problem the algorithm solves. It's called the Boolean satisfiability problem, which basically means: you've got a big Boolean expression with some variables in it, combined with AND, OR and NOT, and the job of the algorithm is to find a solution for the problem, where a solution is defined as an assignment for all the variables such that the resulting expression is true.
So this is actually NP-complete — a hard problem; once the clauses look like the ones in the expression here, it's not trivial any more, it's NP-complete. What you normally do with that problem, and what the algorithm also does, is some sort of search with backtracking. An easy preprocessing step is normalization: you take the big expression and normalize it so that it has this form — clauses of variables, possibly negated, connected with OR, and all the clauses connected with AND. That's the normal form. Here's an example: you have (A OR B OR C) AND (NOT C) AND (NOT A OR C), and a solution would be: set A to false, set B to true, set C to false. Then the first clause is true because of B, the second is true because C is false, and the last clause is true because A is false. So this is a solution. Normally there are multiple solutions for a problem, but I'll come to that later.

What are the advantages of using the SAT algorithm? The very big advantage is that it's a very well researched problem: there are lots of papers, lots of intelligent people have thought about how to do this really, really fast, and there are very, very good algorithms out there to solve such problems. An example is Chaff, a very good solver which introduced some special techniques to make it fast, and MiniSat is pretty much state of the art — my code is actually based, algorithm-wise, on the MiniSat solver. So it's really, really fast because it's researched that well. And the good thing is: for the SAT folks, package solving is basically trivial. They have yearly competitions where the solvers compete against each other, with millions of rules and hundreds of thousands of variables. Our little dependency problem is so small a problem for them that the SAT code normally solves it in milliseconds — they don't even bother thinking about problems that size.
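As a quick sanity check, the worked normal-form example — clauses (A ∨ B ∨ C), (¬C), (¬A ∨ C) with the assignment A=false, B=true, C=false — can be evaluated mechanically. This is just my own sketch for illustration, using the ad-hoc notation `"-A"` for a negated literal:

```python
# Sketch: check a CNF assignment against a clause list.
# "A" means A, "-A" means NOT A (assumed notation for this illustration).
clauses = [["A", "B", "C"], ["-C"], ["-A", "C"]]
assignment = {"A": False, "B": True, "C": False}

def literal_true(lit, assignment):
    # A negated literal is true when its variable is assigned false.
    if lit.startswith("-"):
        return not assignment[lit[1:]]
    return assignment[lit]

def satisfies(clauses, assignment):
    # The whole expression is an AND of clauses; each clause is an OR of literals.
    return all(any(literal_true(l, assignment) for l in c) for c in clauses)

print(satisfies(clauses, assignment))  # → True
```

Note that any assignment with A true and C false fails the last clause, so the same check reports the expression as unsatisfied for those.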
Their problems take minutes or even hours to solve, so it's really not hard to solve dependencies. The core algorithm is pretty easy to understand if you come from the SAT field — if you read some papers, the algorithms are very basic, and I'll show you some of the main ideas of how the solving is done later on. So it's not hard to understand, and it's just a couple of hundred lines of code, whereas the old Red Carpet solver was a couple of thousand lines — about ten times less code to understand. And that's good: if the community is to work on it, then with thousands of lines of code it's hard to find people who really dig into the code and understand what's going on; with just hundreds of lines, people will contribute. And of course, as I said, if something is unsolvable, the algorithm gives you really good suggestions for how the problem can be turned into a solvable one — this is also much better than with Red Carpet.

So let me start digging deeper into how the normal package dependencies are turned into a SAT problem. Say we have a package A, and A has a requires dependency on B, and B is provided by the packages B1, B2 and B3. The idea is that this can be translated into the following rule — as you remember, all rules must be true, and all the literals in a rule are connected with OR. The rule is: either A is not installed, or one of the three packages is installed, (¬A ∨ B1 ∨ B2 ∨ B3). This is exactly what the requires says: if A is installed, then we need one of those; otherwise I'm okay with it.
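The requires translation just described can be sketched in a few lines. These are hypothetical helper names for illustration, not the actual libsatsolver API; negative integers denote negated literals, as SAT solvers commonly do:

```python
# Sketch: encode "A requires B; B provided by B1, B2, B3" as one SAT clause.
# Packages are mapped to positive integers; -n means "package n is NOT installed".
ids = {"A": 1, "B1": 2, "B2": 3, "B3": 4}

def requires_rule(pkg, providers):
    # (-A or B1 or B2 or B3): either A stays uninstalled,
    # or at least one provider of the requirement gets installed.
    return [-ids[pkg]] + [ids[p] for p in providers]

rule = requires_rule("A", ["B1", "B2", "B3"])
print(rule)  # → [-1, 2, 3, 4]
```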
Same with conflicts. If I have a conflict dependency — A conflicts with B, and B again is provided by B1, B2 and B3 — then obviously I generate three rules from that, namely (¬A ∨ ¬B1), (¬A ∨ ¬B2) and (¬A ∨ ¬B3). The first one is true if either A is not installed or B1 is not installed, and only false if both are installed — exactly what the conflict says: you must not install both A and B1. The second one says you must not install both A and B2, and the third, A and B3.

Obsoletes work pretty much the same way. For not-yet-installed packages, obsoletes are treated as conflicts, because the solver doesn't know the installation order: if you select a package A for installation, and A was not installed before, and you select a package B for installation, and B was not installed before, and A obsoletes B — what is the result? If you first install A and then B, both packages end up installed; but if you first install B and then A, then B gets obsoleted. So it's undefined, and since the two packages can't really coexist in the system, it's a conflict. If the obsoleted package is already installed, of course, it's a different case and it works out.

That covers the obsoletes that are really written in the package, but there are also implicit obsoletes, namely packages with the same name. If you have a package A with version 1 and another package A with version 2, you can't install both — they obsolete each other, more or less: if I installed one with rpm -U, the other package would be replaced. So packages with the same name also get those conflict rules. This is where you can do special-casing: in SUSE, for example, multiple kernels can be installed in parallel, so installing a kernel doesn't automatically de-install the other kernel. Wherever you have such a special case, you just drop the rules for it; but normally you want packages with the same name to conflict.

Then there are also unary rules; these are basically special cases. When there is nothing that provides a requirement of A, we get the unary rule (¬A), which just says to the SAT solver: package A can't be installed. That can be because something isn't provided, or it can be a request from the outside — this is where the user interface comes in. When the user selects 'erase' on a package, this rule gets added so that the solver doesn't install it, because the user wants it erased. Same with installation: the user clicks 'install this package', and what really happens down in the machinery is that the unary rule (A) gets added, which says to the SAT algorithm: this must be true, A must be installed. True means installed, false means not installed. If you have questions about something, just don't hesitate to ask.

Now some slides about how the solving works, and then I'll show you what that means for dependencies. The main algorithm in the solving is unit propagation — 'unit' is a technical term from the SAT field: a rule is called unit when all literals but one are false, and the special thing is that if a rule is unit, the remaining literal must be true. Here's an example, say with the rules (¬C), (C ∨ ¬A) and (A ∨ B ∨ C). The first rule is an assertion, so C must be false, because the complete expression must be true. Then the second rule is unit: C is false, so only one literal is left, and it must be true — that means A must be false. And then look at the third rule: C was false, A was also false, so this rule is also unit, and B must be true. So we have a solution for this problem purely by unit propagation.

The complete solving algorithm works like this: if there's nothing left to propagate, we need a free choice, so we pick some undecided variable and assign it some value — this part is basically heuristics. And then the next step is to take all
rules that are now unit, and propagate them. If no rule is unit any longer, continue with the first step, and do this until you have assigned all variables — then you have found the solution. Of course you're thinking: this is stupid, picking some random variable won't help us very much. But this free choice is exactly where you program the direction the solver should take: here is where you program that the solution must be minimal, must change the minimum number of packages, or must update as well as possible. This is where you program your goals — and what unit propagation gives you is what's forced by the dependencies from the RPMs. But I'll come to that later.

Let me show you what unit propagation means if you think about RPM dependencies. Take a requires rule and say it is unit — that means all literals but one are false. Say B3 is the one that is unassigned and all others are false: so A is true, B1 is false and B2 is false, and unit propagation says B3 must be true. That's basically what you'd expect; put into a sentence it reads: if A is installed, and B1 and B2 are not installable, then I must install B3. So for a package that is installed, if the alternatives of a dependency are reduced to one that can still be installed, then I must take that alternative. This is very easy to understand: it adds packages to the set of installed packages, and it's basically how every other solver works too — it checks dependencies that are unresolved, and if there's more than one alternative it may try branching or something different, but if just one package is left to install, it chooses that one.

The interesting case is the other direction, which normal solvers don't do: if ¬A is the literal still left and the others are false — that is, if none of the providers of the dependency can be installed — then A also can't be installed. So this adds packages to the
list of conflicting packages, the set of packages that must not be installed. So one direction of the rule grows the set of installed packages, and the other grows the set of packages that are forbidden for installation — this is very nice to have. Conflict rules are straightforward, of course: if A conflicts with B, then if A is true — meaning A is installed — B must not be installed.

Now, as in normal solving, as you probably know, you sometimes get contradictions, and the same is true with SAT solving: unit propagation can lead to contradictions. Here's an example. Say the rules are (¬A ∨ B), (¬A ∨ C) and (¬B ∨ ¬C), and the SAT solver chose to install A. The first rule is then unit, so we know B must be true; the second rule is unit because A is true, so C must also be true; and then we have the third rule, which tells us B conflicts with C — but we had to install both B and C. So this is a contradiction. What happens then is that the SAT algorithm looks at all the rules that were involved in the contradiction, learns a new rule from them, and adds this learned rule to the set of rules. In this case it's very easy: the learned rule is just (¬A) — I can't install A, or I get this contradiction. But a learned rule can actually contain any number of literals, with negations, and can be more complex than this one. If I can't go back — if I can't undo any steps — then the complete problem is unsolvable. In this case I could go back, because the contradiction was only there because A was set to true: I undo the steps that led to the contradiction and then continue the solving. The idea of learned rules was a major breakthrough for the SAT field; it was first implemented in the GRASP solver in 1996. This is
really what makes the solving reliable: if there is a solution, the SAT solver will find it, and otherwise it will return a proof of why it's unsolvable. So the code is really reliable because of this — it doesn't get stuck somehow, it doesn't get stuck in an endless loop like the old solver. It's actually surprising, when you know the algorithm, that other solvers like Smart don't use it, because it's easy to implement, it's fast, and it works really, really well. [inaudible joke]

Okay, back to the free choices. This is where you direct the solver towards its goal. The normal goal is: try to keep packages that were installed installed — so erase as few packages as possible — and also minimize the number of packages that get added, because the user doesn't want changes that don't need to happen. The algorithm I implemented to do this is pretty easy. First of all, on a free choice, it checks if there are packages that were installed before and are not yet set to install, and then it chooses those. So as a first step, we try to keep all packages installed that were already installed. Of course this depends on what the goal is: if your goal is to always have the newest version installed, you would change this to 'if the package was installed, try to install the newest version'. So here you can tell the solver how it should behave. The other part is how to find a minimal solution. If there are rules that are not yet true, and all their negative literals are false, then I can choose from the positive literals any package, with some metric — maybe the best version — and install it. So in our example, if A is true, then I have an unfulfilled dependency, and I have to choose between B1, B2, B3,
and I normally choose the package with the highest version and install it. The strategy is: once those two points are done, I can set all remaining packages to false and have a valid solution, because all the rules that have negative literals are already satisfied, so I can just set everything else to false and have a solution. And this is the minimization part: I only invest work where I must, installing packages to fulfill dependencies, and when all rules are fulfilled, I'm basically done. That's maybe the best way to explain it.

Okay, let's talk about policies. The thing is, if I only had the dependencies coming from the RPMs, the trivial solution would always be: don't install anything — no RPM installed means no dependency broken, so we're finished. But that's obviously not what we want, so we have system policy rules. A policy rule basically defines what to do with installed packages: maybe they must not be de-installed, or downgraded, or must not change architecture. That's also what we normally do in SUSE: say we have an installed 32-bit package we want to keep — take glibc, i686 — then the solver must not suddenly change it to i586; we insist that the architecture doesn't change without the user confirming it. A vendor change is the same thing: if the installed package is from SUSE and some repository contains that package, say, from Packman, then the solver must not use the other package without asking the user whether it's okay to change to a different vendor. Such policies are defined with policy rules, and the rule format looks like this — you'll notice there's no negated package in it: (A ∨ A2 ∨ A3 ∨ A4). It says A must stay installed, or A2, or A3, or A4 must be installed — which pretty much says to the solver: you may replace this installed package A only with one of those. Packages with a different architecture or a different vendor
are simply not in this list, so the solver won't pick them. Now, as I said, with those policy rules you can get unsolvable problems: maybe you want to install the newest version of emerald, which needs some other package from the Packman repository, and then the system becomes unsolvable, and you want to ask the user: is it okay to switch the vendor? This is done with the standard problem reporting mechanism, and the trick is, as I said: if you look only at the RPM dependencies, the system is always solvable. Turn that around: if I find that the system is unsolvable, there must be at least one non-RPM rule involved in the proof — either a job rule, because the user clicked 'install that package' or 'erase that package', or a policy rule, which says 'this package must only be replaced with these packages'. Furthermore, I get from the algorithm all the rules involved in the proof of why it's unsolvable. If I now break any one of those rules, the contradiction is gone and the system may be solvable again. So the suggested solutions are basically: break any one of those rules. As I said, there is at least one system or job rule among them, and we create the suggestions by leaving out all the RPM rules, because we normally don't want to break RPM rules — that leads to an inconsistent system. So we just ask: is it okay to break that job rule? — meaning: you clicked 'install that package'; maybe you don't want to do that, and just leave the old package alone or don't install it. Or break a policy rule — meaning: is it okay to delete that package, or is it okay to change the vendor? And the good thing is, the user knows all of that, so he understands what's going on: if it's a job rule, the user directly clicked on it, and the user also has an understanding of the policy rules, because they are so simple — just: don't change the
vendor, don't change the architecture, or whatever. So the user gets suggestions that he understands.

Okay, so basically I've only scratched the surface of the algorithm. If you're interested in the more in-depth parts, you currently have to look at the code, or ask me. The code is in a library called libsatsolver, which is currently in openSUSE Factory; unfortunately, as we're heavily hacking on the code, the documentation really isn't there yet.

[Question:] How feasible is it to use it in distributions other than SUSE? — It's actually used in different projects, not only in SUSE, because it's just a generic solving library. [Question:] I was thinking outside of SUSE — would it be easy to port? Do you maybe have a Python interface? — We do have bindings for it. It's very modular and generic; we try not to make it SUSE-specific. It actually already contains some code for Debian, if somebody wants to use it for Debian solving — Debian has some differences in provides and requires — so it's pretty generic, and all the SUSE-specific stuff is in libzypp.

We're also working on other things, but those are algorithmically not that interesting. We're working on a repository format to replace the XML format, which really makes things fast, because the repository files are very, very small compared to the XML. The trick is that it's dictionary-based: we first have a string space at the front of the file, where we define all the strings and assign integers to them — string 1 is here, string 2 is there — and then all the dependency lists and so on are just lists of integers. That's good because an integer still fits in 32 bits even on a 64-bit system, whereas a pointer to a string is 64 bits there — so solving doesn't take more
memory when you move to a 64-bit machine. And since the strings in the dictionary are unified, string comparison becomes easy: you can do an exact compare by just comparing the integers — if the integers are not the same, you know the strings are not the same. This is part of what really makes the solving fast too, but that's a topic for another time. Okay, any questions on the previous slide?

[Question:] When you're trying to remove a rule to turn an unsolvable problem into a solvable one: do you try removing one rule at a time, and if it's still unsolvable, do you then try removing two, or not? — What actually happens is that there's a function called refine_suggestion that checks whether the problem is now solvable, and if not, it adds more rules to remove; it knows what to do. Normally, if I delete, say, Perl, then I want a list of all the packages that need Perl and would also have to be removed. Otherwise the solver would give you just one, you click okay, and then it brings up the next Perl rule — and that's not what the user likes; the user likes a list. So that's why the refine step adds more rules. We're doing some clever things there so that user interaction is minimized, but this is more complex; I kept to the easy stuff here. Another question?

[Question:] Presumably, once you've found a solution, the intention is that you do a single RPM transaction to take the system to the new solved state. Have you thought about the fact that RPM is not truly transactional? For example, if you're removing an RPM and the uninstall script fails, there's no easy way of reliably doing a rollback. Have you thought about maybe doing a separate step where, after you've got the solution you're trying to get to, you break up the journey to that solution into multiple smaller transactions, to make the thing more reliable — so that if one smaller transaction fails, then the
rollback isn't as painful? — To tell you the truth, libzypp normally doesn't do it in one complete RPM transaction anyway, but in single steps: it doesn't call RPM with one big transaction, it feeds one RPM after another, so you don't have the big-transaction problem. But you still have the problem that if a script fails — uninstalled RPMs normally don't have some downgrade script or so, because there was never something like that in RPM — the rollback basically means putting back what was there before. And even if you roll back, you're not moving forward, and you really want to move forward; so there's no real solution to that.

[Comment from the audience:] What's supposed to happen is that QA takes out the script failures and typos beforehand. RPM goes to great lengths to compute the disposition of every file before anything happens, so there shouldn't be failures — the transaction itself is just a state machine. Okay, there are exceptional failures, and if it fails you get an ordering failure out of it, but the scripts are a whole extra dimension. Breaking it down into smaller transactions isn't too hard, though: RPM builds an ordering tree, so we could process subtrees as sub-transactions. — But that's the layer above this, doing the real installation; it's different from what I'm working on. Can we try integrating that with RPM? Feel free. Okay. I noticed that when doing it in many smaller transactions instead of one larger…
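To tie the technical parts of the talk together, here is a minimal sketch of the unit propagation described earlier — my own illustration in Python, not the libsatsolver code. Clauses are lists of integers, with a negative number meaning "package must not be installed"; it reproduces the two worked examples: the requires rule forcing B3 in, and the contradiction that arises from installing A (after which the solver would learn the rule ¬A):

```python
# Sketch: unit propagation over integer clauses (-n = "not installed").
# My own illustration of the algorithm described in the talk, not libsatsolver code.

def unit_propagate(clauses, assignment):
    """Repeatedly assign the last free literal of any unit clause.
    Returns the extended assignment, or None on a contradiction."""
    assignment = dict(assignment)
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            unassigned = []
            satisfied = False
            for lit in clause:
                val = assignment.get(abs(lit))
                if val is None:
                    unassigned.append(lit)
                elif (lit > 0) == val:
                    satisfied = True
                    break
            if satisfied:
                continue
            if not unassigned:
                return None           # all literals false: contradiction
            if len(unassigned) == 1:  # unit rule: the last literal must be true
                lit = unassigned[0]
                assignment[abs(lit)] = lit > 0
                changed = True
    return assignment

A, B1, B2, B3 = 1, 2, 3, 4
# Requires rule (-A or B1 or B2 or B3): with A installed and B1, B2
# uninstallable, propagation must force B3 in.
result = unit_propagate([[-A, B1, B2, B3]], {A: True, B1: False, B2: False})
print(result[B3])  # → True

# Contradiction example: A needs B, A needs C, but B conflicts with C.
# Installing A propagates into a conflict; a real solver would learn [-A].
rules = [[-A, B1], [-A, B2], [-B1, -B2]]
print(unit_propagate(rules, {A: True}))  # → None
```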
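The dictionary-based string handling described for the new repository format can also be sketched — again just an illustration of the idea, with made-up class and method names, not the real file format or API:

```python
# Sketch: dictionary-based string space, as described for the new repo format.
# Every string is interned once and gets an integer id; dependency lists are
# then plain lists of integers, and equality checks are integer compares.

class StringSpace:
    def __init__(self):
        self.ids = {}      # string -> id
        self.strings = []  # id -> string

    def intern(self, s):
        # Unified strings: the same string always maps to the same id.
        if s not in self.ids:
            self.ids[s] = len(self.strings)
            self.strings.append(s)
        return self.ids[s]

space = StringSpace()
deps = [space.intern(s) for s in ["glibc", "libz", "glibc"]]
print(deps)                    # → [0, 1, 0]
# Same id iff same string, so comparing ids replaces string comparison.
print(deps[0] == deps[2])      # → True
print(space.strings[deps[1]])  # → libz
```

The design point from the talk: the integer ids stay 32-bit even on a 64-bit machine, whereas string pointers would double in size.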