Hello, everyone, and welcome to this presentation about opkg, Debian's little cousin. This presentation is a recording, so feel free to put your questions in the chat window; at the end I'm going to be available to answer them.

A little bit about myself: this is how I look, so if, in a better world where we can see each other at conferences again, you see me in the hallway, please come say hi, and we can talk about opkg. These are the projects I've been involved with. I work for NI, the company that used to be known as National Instruments; it just went through a rebranding. My involvement with open source goes back about ten years. We have a platform of embedded controllers, and we needed to modernize that platform, so around 2010 we decided to use Linux with the preempt-rt patch, and a group was created to build a distribution, which we called the NI Linux Real-Time distribution. That distribution was based on OpenEmbedded and Yocto, which got me involved with those projects. Our software stack is pretty complicated; it has a dependency tree that is very wide and deep, so we quickly started hitting the corner cases and limitations of opkg, the package manager used by OpenEmbedded. So when there was an opportunity to step up, since the previous maintainer was no longer able to maintain opkg, I stepped up and became the maintainer, and I've been the maintainer ever since 2015. Below are some other projects I'm involved with: JupyterHub and SaltStack.

Today I'm first going to give you some historical context on the project, then I'm going to dive into the architecture. The most interesting part of the architecture is the dependency-management solvers, so I'm going to be talking about both the internal solver and libsolv. Then I'm going to finish up with some ideas on where I see the
future work on opkg, and there's going to be some time to answer whatever questions you may have; please put those in the chat.

So, let's start with historical context. opkg is based on ipkg. ipkg was created in 2001 by Carl Worth, a long-time contributor to open source, maybe better known for being the creator of cairo, the 2D graphics library for Linux. I think its name comes from "itsy package manager". It started as just a shell script, but as it became popular it was rewritten in C. It was originally created for the Linksys NSLU2, which is the appliance you see on the right: you connect it to Ethernet, plug in USB sticks, and you have network storage. This platform was running an embedded Linux distribution. However, the project lost momentum, and the last commit was in mid 2007.

Fast-forwarding to 2008, there was a Linux distribution called OpenMoko, which was targeted at smartphones, back when smartphones started to become a thing. OpenMoko started using ipkg as its package manager, but since ipkg was no longer maintained, patches started to pile up, and they decided to fork. Since there was a trademark on ipkg, they called the fork opkg; so opkg came from OpenMoko. Around the same time, Marcin decided to also adopt opkg for OpenEmbedded, which at the time was using ipkg as well. opkg is actually two projects: opkg itself, and opkg-utils, which is a repo with helper scripts. For example, there's opkg-make-index, which creates a package index, and opkg-build, another script that helps you build packages; these scripts are used by OE too.

OpenMoko eventually lost momentum, because Android became the de facto standard for Linux on mobile, and most of the effort behind the development of opkg
moved to OE. Below is a list of previous maintainers: Thomas Wood, Tick Chen, Graham Gower and, before me, Paul Barker.

opkg now sits under the Yocto Project umbrella: they provide git hosting as well as the Bugzilla ticketing system, and the mailing list is on Google Groups. I'm maintaining it, and I'm releasing it twice a year, in June and December, so there's an upcoming release within the next few days. I will say that at this point opkg is pretty mature. There was work, which I'm going to be covering later in the presentation, to create a pluggable solver backend, which vastly improved the project, and most of the features that you would expect from a package manager are done. So if you're considering a package manager for embedded, I highly encourage you to take a look at opkg.

Okay, so now I'm going to cover some of the architecture. I copied this quote from the original ipkg FAQ. It basically says that ipkg should try to do things the way Debian does them, unless there is a very strong reason not to. I would say this is the golden rule of opkg development: it is all over the source code, and at the very least it's something that I follow very closely. If there is a bug report, I try to see how Debian handles the case, and I try to model opkg after that.

You may also ask: there are already very good package managers, so why do you need another one? Why not just use Debian's dpkg?
There are a few reasons, some of them historical. To me, the biggest one, the one that resonates most with me, is size. Here I have benchmarks of an OE build of the smallest image, which is called core-image-minimal, with the different package managers. If you use RPM, the package manager is going to be DNF, which is based on Python, and that's going to set you back 245 megabytes. If you use apt, it has a dependency on Perl, and that's going to cost you 37 megabytes. And then opkg, since it's written in C and only has one hard dependency, which is libarchive, is just going to be 4.6 megabytes. So, big difference.

The internal structure of an IPK, the package format that opkg handles, is super similar to a Debian package. You have an outer ar file. Inside, you have your root filesystem in data.tar, a compressed archive, and then you have another archive with your control file as well as your maintainer scripts. So you have things like preinst, postinst, prerm and postrm, which are scripts that run at specific points during install, removal and upgrade, to give you hooks to do things with the system: start services, et cetera. There's also an optional conffiles file: if you list paths there, those are treated as configuration files, so if you are upgrading a package and a conffile was changed, a backup is created. And there's another optional file called md5sums: if you have that file, with paths and MD5 sums, you can run opkg verify, and that's going to make sure that the files on your filesystem have the same MD5 sums as the files in the package. I have bolded the pieces that are required; the other ones are optional.

On the right-hand side of the screen I have an example control file, and in red
I have the fields that are required; the other ones are optional. So really, you just need a package name, version and architecture to be able to have an installable package, and that's different from Debian: Debian requires more fields.

So, differences from dpkg: there are a few, and I'm going to highlight a couple. The way architectures are handled is quite different. In opkg, you can define, for a specific target, which architectures the target is compatible with. How do you define that? In the config file. Here I'm saying that this example target is compatible with the architectures all, x86, x86-64 and core2-64. The column on the right is the priority: if you have a package that is available in multiple repos, the priority determines which one gets installed. Following the example, I'm defining three repos: one for the architecture all, one for core2-64, and one for x86-64. Let's say that all of them have a busybox package. If you do opkg install busybox, you will install the one with the highest-priority architecture, which will be the one in the repo for core2-64. This is different from dpkg: with dpkg, targets have a fixed architecture, and the package being installed has to match that architecture. There's some flexibility, like the option to use wildcards, or the all and any architectures, but in general it is more rigid.

The other difference is that opkg tries to be simpler. What I'm showing here are the state diagrams for a few operations; here's the one for install. When you install, there's a set of things that happen: you first run your preinst script, you unpack your files, and then you run your postinst script. In dpkg there are basically more failsafes, so if your preinst script failed, dpkg has a way to recover from that.
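Before moving on: to make the IPK layout from the previous slides concrete, here is a minimal sketch in Python that builds an IPK with only the three required control fields. This is not how packages are normally built (opkg-build does the real job); the ar-writing helper, package name and file contents are all illustrative, and I'm assuming gzip-compressed tars and the Debian-style ar layout described above.

```python
import io
import tarfile
import time

AR_MAGIC = b"!<arch>\n"

def ar_member(name, data):
    # 60-byte ar header: name(16) mtime(12) uid(6) gid(6) mode(8) size(10) + "`\n"
    header = "{:<16}{:<12}{:<6}{:<6}{:<8}{:<10}".format(
        name, int(time.time()), 0, 0, "100644", len(data)).encode() + b"`\n"
    if len(data) % 2:          # ar members are 2-byte aligned
        data += b"\n"
    return header + data

def tgz(files):
    # build an in-memory .tar.gz from a {path: bytes} mapping
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for path, content in files.items():
            info = tarfile.TarInfo(path)
            info.size = len(content)
            tar.addfile(info, io.BytesIO(content))
    return buf.getvalue()

# only the three required control fields: Package, Version, Architecture
control = b"Package: hello\nVersion: 1.0\nArchitecture: all\n"

ipk = (AR_MAGIC
       + ar_member("debian-binary", b"2.0\n")
       + ar_member("control.tar.gz", tgz({"./control": control}))
       + ar_member("data.tar.gz", tgz({"./usr/bin/hello": b"#!/bin/sh\necho hello\n"})))

with open("hello_1.0_all.ipk", "wb") as f:
    f.write(ipk)
```

Back to the state diagrams.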
So dpkg can basically call your postrm script with a specific parameter, abort-install, and you can have code in your postrm, with a case statement, to recover from that. If you can recover, you end up in an OK state, instead of in a half-installed state that would require manual intervention.

Here I'm showing the diagrams for remove, and as you can see it's super similar, the difference being that dpkg has a way to recover from a remove-script failure. The last diagram is for upgrade, and I'm really not going to go through it; it's super complex, as you can see. My bigger point is that dpkg tries really hard to recover from problems that you can have with scripts, while opkg takes a simpler approach: if a script failed, you're in a bad state, and you may need to follow some manual steps to recover.

Okay, so here what I'm showing is the recipe for the latest version of opkg, and what I wanted to show is that opkg has only one hard dependency, and that's libarchive. It has this dependency because, via libarchive, all the different types of compression are supported; doing that with custom code, which was done in the past, was hard and was a limiting factor, so back in opkg 0.3.0 the hard dependency on libarchive was added. There's a bunch of other things that you can select to use: for example, if you want repo signing, you may want to enable GPG, and if you want to use curl instead of wget, you may want to enable curl. The other one that is very relevant here is libsolv: if you enable libsolv, you will have a dependency on the external library libsolv, but then you will get a very powerful solver for dependency management. And just to give you an idea of space,
I think libarchive adds about 600 KB; opkg by itself is about 200 KB, and libsolv is about 500 KB. This is on an x64 architecture.

Before we move on to solvers, I want to talk about opkg's test suite. This is something that Paul Barker did, and I think it's great; I like it a whole lot, and it makes my life so much simpler. The way it works is that it's Python based, and you can define different scenarios to check. For example, here I'm saying: there's a package A which depends on B, and there's a package B. Write that down to a Packages file, run opkg update, then install A; make sure that A is installed, and error out if not; make sure that B is installed, and error out if not. I'm mentioning all this because this is terminology that I'm going to use in the rest of the presentation whenever we go through scenarios.

Okay, so now I'm going to be talking about solvers. There have been changes in opkg since version 0.3.1 that improved the solver mechanism a whole lot. First, I'm going to walk you through how the dependency engine worked in versions before 0.3.2. What I'm showing here is a scenario where you have a package A that depends on both packages B and C; the green arrows mean "depends on". Package C conflicts with D; the red arrow means "conflicts with". So if I say "install D", we go ahead and install D. Then, if I say "install A", what opkg used to do is a depth-first search. It would start at A and say, well, A depends on B. Then it would ask: does B have dependencies? No, B doesn't have dependencies, so let me go ahead and install it. Then it would ask: does C have dependencies? It would see that C conflicts with D, which is already installed, and it would error out. But then your system would be polluted, because B would be left installed. This was a major problem.
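That failure mode can be sketched in a few lines of Python. This is a toy model of install-while-solving, not opkg code; the package names and data structures are made up to match the slide's scenario.

```python
# hypothetical dependency graph from the slide: A depends on B and C,
# and C conflicts with D
DEPENDS = {"A": ["B", "C"], "B": [], "C": [], "D": []}
CONFLICTS = {"C": {"D"}}

installed = set()

def naive_install(pkg):
    """Old behaviour: modify the system while still solving."""
    for dep in DEPENDS[pkg]:
        naive_install(dep)                      # depth-first search
    if CONFLICTS.get(pkg, set()) & installed:
        raise RuntimeError(pkg + " conflicts with an installed package")
    installed.add(pkg)                          # side effect happens immediately

naive_install("D")
try:
    naive_install("A")          # B gets installed, then C hits the conflict
except RuntimeError:
    pass
print(sorted(installed))        # ['B', 'D'] -- B leaked in even though A failed
```

The point of the sketch is the last line: the error leaves B behind, exactly the pollution described above.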
It was basically installing as it was solving; it was one operation. So in version 0.3.2 I split the operation: solve first, and then modify the file system. This is the exact same scenario: you have A, which depends on B and C, and C conflicts with D. You solve first. So here I'm saying: install D, and then try to install A. During the solving portion of the operation, we determine that there's a conflict, so opkg says "I cannot continue", and your system is left without modifications. The second scenario just shows how a successful operation looks: we have the same dependency graph, and we say "install A". The internal solver walks the dependency tree and determines that this is solvable, so it passes all the things that need to happen to the second part of the process, which goes ahead and installs A, B and C.

So this was a great improvement over what we had. However, the internal solver remained ad hoc, and honestly, it was the main cause of bugs. It had a lot of tentacles: if you wanted to implement something new, you would modify a certain area of the code, and that would then affect other use cases. It was very tightly coupled and hard to modify; it was, and is, very complicated code. On the other hand, dependency management is a very well-researched topic. So with all of that, I managed to convince my boss to get an intern that I could work with, to go ahead and add support for libsolv, a package-dependency-management library. I got a chance to work with Eric, and we were able to add libsolv support to opkg, which was a major milestone.

So, what is libsolv? libsolv was created by Michael Schröder, from SUSE, in June 2007 during a hack week, and it is a library that solves dependencies for packages. So what he did is that he grabbed
So what he did is that he grabbed a Mini set, which is an open source library that implements a specific type of sat solver And it's actually very small. It's like six hundred six hundred lines of code and and he re-implemented it both added a package manager specifics. So and for example when when you are solving you want a Favor to leave packages that are installed and installed on the system. So he added those heuristics and enhance a The core algorithm of mini sat to then have a library that is very well suited for a package management dependency a resolution and To me my point of view was this is a very hard a Domain and there are domain experts. So Let's delegate all that to them and and I can focus on in another areas of the project so and Leaves off a was created by open Susie and is used on seeper and on and Other package managers that are RPM based. So what about Debian and Debian has also been experimenting with Sat solvers What they did is that they created this protocol called EDSP external dependency solver protocol and then you can choose I've been installed a solver and use the dash-dash solver flag to tell the package to and I'm sorry a pity get to use a different and dependency solver So package could have used a Similar mechanism or could have plug into EDSP But honestly leaps all these working so well Michael the maintainer is very responsive and he either has guide me into Into the leaps all code base to to fix my own errors or in my own box or he has fixed them for me and That right now. I'm very very happy with Leapsolve and a nice project could be to enhance or package to also support and EDSP Backend but right now like Leapsolve works really well for us Okay, so on the next couple of slides. I'm gonna get covering a sat solvers So, where are they? 
They're software engines that try to solve the Boolean satisfiability problem. What the Boolean satisfiability problem is about is this: if you have an expression like the one at the bottom, where the symbol next to the A means "not", try to find values of A, B and C that make that expression true. This problem is NP-complete, so it's very hard to solve in a generic way, but SAT solvers use heuristics, and they have been very, very successful. SAT solvers are not only used for dependency management; they're widely used, for example in EDA, for routing on FPGAs, with millions of variables.

Following this example, let's say that you have (¬A ∨ B) ∧ (¬A ∨ C). If A equals true, the only way that expression is going to be true is if B and C are true. That's the solution, and it means that the expression is satisfiable. Once I set A to true, the first clause became (false ∨ B) and the second clause became (false ∨ C). The process of determining that B needs to be true and C needs to be true is called unit propagation, and that's something that I'm going to cover in a few slides.

Okay, so how do you use SAT solvers in package managers? This is roughly what libsolv does. You first need to translate your package dependencies into what are called disjunctive Boolean clauses, so only using ORs, and here I have a few examples. If you have a dependency like "A depends on B and C", you say (¬A ∨ B) ∧ (¬A ∨ C), because if you think about it, if A equals true, meaning it's installed, then B needs to be true and C needs to be true; that basically reflects the depends relationship. The conflicts relationship is expressed with two nots, (¬A ∨ ¬B), because if A equals true, then ¬A equals false,
so then basically both cannot be true at the same time. In the last one, I'm saying that if A depends on B, and there are two versions of B, you use two expressions: one saying that A depends on either B1 or B2, and then another clause saying that B1 and B2 cannot be on the system at the same time. You then translate what you want to do into Boolean clauses too: "install A" is just A; "remove B" is ¬B. You solve, and you get back either a solution, saying this is solvable, or "not solvable"; and then you get a list of transactions that you need to apply to get the system to the correct state.

libsolv is a specific type of SAT solver, because there are different types: it is what is called a conflict-driven clause-learning (CDCL) SAT solver. This is how it works, and don't worry if you don't get all of this; I'm going to walk you through an example after this slide. You basically start by doing unit propagation, to make sure that you're in a good state, and then you initialize something called the decision level to zero. Then, as long as there are unassigned variables, you do an assignment, and this is where the package-manager heuristics come into play: you will, for example, favor leaving installed packages installed, or favor the highest versions of the available packages. After you assign a variable, you increment the decision level, you do unit propagation, and you keep track of everything that you're doing in an implication graph. If there's a conflict, you go to your graph and try to find out where the conflict happened; you then add a clause that is the negation of the assignments that led to the conflict, and you backtrack to the decision level before the conflict happened. So if you think about it, a conflict-driven clause-learning SAT solver is a backtracking algorithm at heart.

Okay, so this is an example.
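Before walking through it, here is a minimal sketch of unit propagation in Python, since that step does most of the work in what follows. This is an illustration, not opkg or libsolv code; literals are strings, with a leading minus meaning negation, and there is no implication graph like a real CDCL solver keeps.

```python
# clauses are lists of literals; "A" means A=true, "-A" means A=false
def unit_propagate(clauses, assignment):
    """Repeatedly assign variables forced by unit clauses; return the
    extended assignment, or None on a conflict (an all-false clause)."""
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            unassigned, satisfied = [], False
            for lit in clause:
                var, want = lit.lstrip("-"), not lit.startswith("-")
                if var in assignment:
                    satisfied |= assignment[var] == want
                else:
                    unassigned.append((var, want))
            if satisfied:
                continue
            if not unassigned:
                return None                    # conflict: clause cannot be satisfied
            if len(unassigned) == 1:           # unit clause forces a value
                var, want = unassigned[0]
                assignment[var] = want
                changed = True
    return assignment

# (¬A ∨ B) ∧ (¬A ∨ C) with A=true forces B=true and C=true
print(unit_propagate([["-A", "B"], ["-A", "C"]], {"A": True}))
# {'A': True, 'B': True, 'C': True}
```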
In this example, I have a package A that depends on X, and X is provided by two packages: it can either be provided by B, which has a conflict with D, or by C. If D is installed, what you would expect is that X will be provided by C, the one that doesn't have a conflict, so you end up installing A and C.

In solver notation, this is how it looks: the first clause says that A depends on B or C; then we say that B has a conflict with D; and "I want to install A". Let's work through that expression. You first set A equal to true, because A is what you want to install. Then you do unit propagation: if you set A equal to true, you end up with the first clause being (false ∨ B ∨ C); the rest stay the same, and the last clause becomes true. You then select a variable to set, and this is where you use the package-manager heuristics: you pick D, because D is already installed, so you set D equal to true. Then you replace D in the expression: the first clause stays the same, the second one becomes (¬B ∨ false), and the third is the same. Then you do unit propagation, because the only way that middle clause can be true is if B equals false. With that, you end up with an expression where the only way the first clause can be true is if C equals true, and with that you have a solved expression: it is satisfiable, with the values A = true, B = false (so B stays uninstalled), C = true, and D = true (so D stays installed). That translates to "install A and C", which is the correct solution.

So, yeah, solvers are great, and there's one more scenario that I want to walk you through, which I think shows the power of SAT solvers.
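As a quick check before that scenario: the example we just solved is small enough to verify by brute force. This sketch enumerates all sixteen assignments, keeps the satisfying ones, and then applies the keep-installed-packages heuristic for D; the clause encoding mirrors the slide, and the helper is hypothetical, not solver code.

```python
from itertools import product

# A depends on (B or C); B conflicts with D; install A
clauses = [("A",), ("-A", "B", "C"), ("-B", "-D")]

def satisfies(assign, clause):
    # a clause holds if any of its literals evaluates to true
    return any(assign[l.lstrip("-")] == (not l.startswith("-")) for l in clause)

solutions = []
for values in product([False, True], repeat=4):
    assign = dict(zip("ABCD", values))
    if all(satisfies(assign, c) for c in clauses):
        solutions.append(assign)

# package-manager heuristic: prefer to leave the installed package D alone
best = [s for s in solutions if s["D"]]
print(best)   # [{'A': True, 'B': False, 'C': True, 'D': True}]
```

With D kept installed, the only satisfying assignment is the one from the walkthrough: install A and C, leave B out.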
So in this scenario, I have a package A at version 1.0 that depends on B, and I have package B at version 1.0. I say: go ahead and install A. As you would expect, you install A and B at version 1.0. But now let's say that you add more packages: you end up adding a new version of A, 2.0, and two new versions of B. B version 2.0 doesn't have conflicts, but B version 3.0 has a conflict with A. So you can think of it like this: you have one new version of A, 2.0, and two new versions of B, but the higher version of B has a conflict with A.

This is what happens with the internal solver of opkg: A is correctly upgraded to version 2.0, but B stays at 1.0; it's not touched. And this is the same thing that apt does: it will upgrade A, and it will leave B untouched. opkg with libsolv is smarter than that, and it will upgrade B to 2.0, which is arguably a better end state. SAT solvers have the power to figure this out. Ad hoc solvers just try to favor the higher version: they try the highest version of B, and if that's not installable, they give up and say, okay, then I'm not going to touch B. So with all this, what I'm saying is: use the libsolv backend. If you can spare the 500 KB, it's great.
It works really well, and in the code base I'm de-prioritizing fixing the bugs in the internal solver, because libsolv just works so well. With OE it's the default: if you do a build, by default it's going to include libsolv.

So, future work. I'm terrible at drawing, so if some of you want to collaborate and are good at logos, that would be great. The build system right now is based on autotools, and it could use a modernization, so I'm thinking CMake; right now I think it is overly complicated. Error handling in opkg is not that bad, but it could be better; I think that's an area where the code base needs to be improved. The sister repo to opkg, opkg-utils, where a lot of the different utilities live, is probably the one that needs the most work; it could use some cleaning. The test suite: I like it a lot. With the language that I just showed you, it is very easy: whenever I get a bug, I first go and try to replicate the bug in the test suite and make that test fail, then I implement my fix, run it, and make sure that it works. So it's great, but it's not exercising a lot of the different config options, and that's something that could be improved. Right now there is no website; that's something that's been on my radar for a while. Also, I want to talk a little bit about OpenWrt, and I'm going to do that in the next slide. The last bullet point is the link to the open bugs; we have a little bit over 40.
So that's another great place to to start if you want to engage with the community and with the project which Which would be great So open to a WRT and That's a as you probably know Alino distribution that runs on routers is very popular and And they run a whole package But they basically forked a package on a very old version and then they're doing cherry picks and Back in 2016 there was an effort by Floreen a grandi to make a always old package a Support all the things that open WRT Required so I work with him We putting a bunch of package, but in the end the effort was not picked up and by the open WRT community and and I Feel like there's duplication of efforts every time that There's a split it sounds to me like a like a missed opportunity So I'm not sure if you guys saw this but There this was all over the news. There was a security vulnerability on the O package open WRT fork and The vulnerability did not affect a OISO package. It was on specific code of open WRT and was pretty bad It basically and allowed you man the middle attacks that will let you and bypass a Consistency checks and so you will be running a arbitrary code on on on your router And and the first thing that that I thought when I saw this is if we were on the same code base and Maybe this will have happened so and If there's someone from the open WRT community and it's interested in emerging the code bases and please reach out and I would love to do it. I think some of the Specific things that open WRT need Like being flexible on on on what you link to so you get small binaries is something that we can work on and so yeah, if you're interested, please reach out to me and With that and let's go ahead and go over questions. Thanks again for Listening to my presentation and hope to see you in person at some point in the future. Hello again. Thanks for Watching my presentation. 
I'm now going to go over and answer some of the questions that you put in the chat window.

The first one says: how does opkg keep track of the installed packages? Does it maintain a package database containing all the information about installed packages? The answer is yes. It actually uses the same format as dpkg: it stores a status file, and in that file you have package metadata and the package's state, so it says whether a package is installed or half-installed; I think it uses pretty much the same format as dpkg. Also, when you do an opkg update, opkg goes and gets all the metadata for the different feeds that you have configured, and it stores that metadata in files in the same way as dpkg or APT.

Okay, the second question says: why not give the option to build opkg without the ad hoc solver? How much memory would that save? So, effectively, when you configure opkg to use libsolv, you are not compiling the ad hoc solver. I didn't mention that in the presentation, but that's how it works: if you use configure --with-libsolv, you're not compiling the internal solver, which makes it a real pluggable architecture. So if we wanted to add a new solver, you would not be paying the price for the libsolv or ad hoc solver hook-ins.

The next question says: have you considered Meson for the build system? Hmm, I have not. I'm pretty open to using something more modern over what we have. I'm not super familiar with Meson; I'm more familiar with CMake, but I'm certainly open to collaborating and using something more modern. So thanks for the tip; I'll definitely look into Meson.

The next one says: I think OpenWrt changed the package format; won't that be a problem for merging them? Hmm, okay.
I don't have enough information to answer this question, but I would think that as long as the format didn't change in a very fundamental way, I would be optimistic. I think there are ways to accommodate even changes in the format, as long as the control file stays the same; if it's just the packaging around it, the way it's being compressed, et cetera, that's something that I think we can work on. opkg has a lot of different build flags as well as runtime flags, with the main idea being that you fine-tune it to whatever you need; you make the trade-offs that make sense for you. So I can totally see having a flag for different formats being a thing.

The next question says: have you compared opkg to APK, as used by Alpine Linux? Okay, huh, this is a very good question. I saw that coming up on the mailing list. Yes, there's some interest in APK, and I meant to look into APK to check how it does its dependency management, and I haven't done it yet. I know APK is super popular because of Alpine Linux. So I think it's a good conversation to have, and it's happening on the mailing list, but we should definitely look at the merits of APK and see if it makes sense to have another backend; sorry, not another backend, another package manager added to the family of OE-supported package managers.

Okay, so William is saying that he thinks OpenWrt replaced ar with another tar. Yeah, okay, so that would be an easy one. Actually, I think right now opkg already has support for the outer layer being a tar instead of an ar, so if that's the only thing that changed, it should either already work, or it should be fairly simple to get it to work.
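Going back to the first question about the status file for a moment: since the stanzas are in the Debian RFC-822-ish style, you can parse one with Python's standard library alone. The stanza below is made up for illustration; a real status file has many stanzas separated by blank lines, which you would split on first.

```python
from email.parser import Parser

# a hypothetical stanza in the Debian-style format opkg's status file uses
stanza = """\
Package: busybox
Version: 1.31.1
Architecture: core2-64
Status: install ok installed
Depends: libc6
"""

# RFC-822-style "Field: value" lines parse as message headers
fields = dict(Parser().parsestr(stanza).items())
print(fields["Package"], "->", fields["Status"])
# busybox -> install ok installed
```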
Okay, so That's all I have and If there are no other questions We can finish the session and thank you so much for a Being here and if you want to continue this conversation and I'm on the slack channel so a ping me and we can talk more there