All right, good morning everyone. The EuroBSDCon Foundation is happy to present Marc Espie, who will be presenting to us about near-perfect packing lists.

Thank you. Yes, welcome. I'd also like to thank the EuroBSDCon Foundation for inviting me, and for the hotel and everything. Beautiful sunset here, as you can see.

So this is work I've done on OpenBSD. It's probably not applicable per se to other BSDs, but you might use some of the ideas to get, say, better packing lists in NetBSD and FreeBSD, because there are algorithmic parts which are interesting. There are also implementation details that obviously you won't be able to reproduce exactly, since we're using Perl.

I wanted to start with something slightly different. I don't know if everybody is aware of that yet, but LLVM pulled a dirty trick on us: they changed their license from basically BSD to Apache v2. They were somewhat quiet about it. Some people know about it, but, well, it's way too late to change things. It's a bit like the FSF and GCC and GPLv3 all over again. At some point, if we really want to stand for something and be BSD, we should probably speak up and tell them that we don't like that, because once again for OpenBSD it means that the base compiler is going to stay stuck at LLVM 8, and we're going to have to use everything in ports all over again. So if you don't care about it, that's fine, but if you want to keep political things BSD, you should speak up about it if you can. Thank you.
Yeah. So, a bit of context first. We inherited everything from FreeBSD 20 years ago; well, that part at least. At some point Theo decided that it was good to actually have binary packages that work, so we moved from the traditional way to do things (you build, you install, you create the package from that, you try to install it on another machine, and you discover that it doesn't work because you missed some files, obviously) to what's called staged installs in other systems, where you put everything in a specific location on your system that's unrelated to the running system, and then you make your package from there. We called that "fake" at the time. Maybe "stage" would be a better name, but we did it first, so we got to choose the name.

Obviously, in order for that to work you need some kind of manifest, a list that decides what goes into the package. It was generated with somewhat stupid tools at the time: the first version was just 150 lines of Perl that looked under the local file system, figured out which files were new, and created the packing list from there. Moving to the staging area idea made things much faster, because instead of scanning the whole file system you only need to scan the things that you just installed. So basically you just need to copy every file name, plus a few annotations, which looked nothing like what we have these days: you used to have to say explicitly that you needed to rerun ldconfig to get libraries, and shit like that.

On the other side, at that point pkg_create was a bit of a mix between directly reading the packing list and also using sed to generate stuff. We soon got rid of that shit, specifically because sed does not have any semantic meaning: you can put anything in your packing lists, and pkg_create is basically unaware of what's going on.
So it's going to fuck up just about every time. sed is definitely a bad programming language. sed, m4, cpp, all that shit: if you can put it in the trash bin and do better stuff, it's always a good idea.

So at about that point I decided to finally replace the old package tools from FreeBSD, mostly for security reasons, because doing stuff that's supposed to be secure and that handles lots of character strings is more or less insane. Well, I mean, you can do it if you have lots of developers that are going to peek at just about everything, and even then you'll get bugs. So we decided to move to a more modern programming language, and since we had Perl in base, I decided to do it with Perl. Perl is actually a pretty modern programming language when you know how to use it, and you can do object-oriented stuff.

So instead of having a flat structure for packing lists, I have a fully object-oriented one. (No cats in the room.) You've got the basic packing element base class, and then you have subclasses that correspond to everything that you can put in your packing list. For instance, here you've got just a part of the hierarchy: objects are basically stuff that actually should exist in the file system, which have a current working directory, and where you can put absolute file names, for instance for configuration files, but not for anything else; you actually have methods that can compute the full name from what's in the packing list, etc., etc. And then, based on that, you have even more specific behavior. For instance, a library for us these days is an object in that hierarchy that's just annotated in some cases, so that: okay,
I've seen a library in that directory, so later on, before I run anything, I will need to make sure that that library is registered with the system. That way we got rid of almost every single snippet of shell code that used to exist to support packing lists. Everything is done through that object hierarchy these days.

So I did stuff to make-plist to let it cope, and it grew to 300 lines, for libraries and stuff, and also annotations like owner, mode, everything. Still fairly reasonable.

Then we got flavors and multi-packages. If some of you are not too familiar with OpenBSD: flavors are just a way to record options into packages. But that means that in some cases you're going to get specific fragments that say: okay, that part won't be in that specific version of the package, because it's only, say, an X Window client, and we are building stuff without a GUI, for instance. And because you could have flavors within flavors and shit like that, I had to get rid of all the sed stuff, and these days pkg_create is able to read everything directly. It has a kind of small preprocessor that's very limited, and that's only used to expand variables and read fragments.

So for instance, this is a modern packing list. I took something from one of my own ports; this is NetHack. You see variables; you can get them just about anywhere. You also see annotations, like "the following files are going to be setuid", or setgid, yes, that's it, and then you get back to normal mode, and shit like that. And here you've got this specific marker that says that there is an adjunct to this packing list that's only used when you are not building the package with the no_x11 flavor. Pretty straightforward, right?
Except that you have to get make-plist to understand this, and it becomes a bit tougher because of the variables, because of the fragments and everything.

You also have multi-packages. The idea is that, since the OpenBSD team is a bit small, we need to compile once and package several times. For instance, instead of having to take apart a Makefile to say "I only want to build a very specific plugin for Qt", we build everything at once: you build the whole of Qt, along with PostgreSQL, along with SQLite, along with everything, and then you separate it into several packages that you don't need to install all at once. It actually makes things much simpler for porters, in theory at least, because you don't have to decide how you are going to split things up during the build process; you only need correct packaging information. You see the problem: obviously it makes things simpler to build, but then I'm left with the problem of having a make-plist script that's going to be able to sort through everything and put the files in the right location.

So here is another old example: this is the stuff that used to be used by Japanese people to enter text. It has support for several languages, like Chinese, Japanese, and Korean, and this means that you've got lots of subpackages. Okay, so those are just the comments for each package, the full list, and the package name for each of them.
That's very simple, very straightforward. The part that's not so straightforward, and that complicates doing packing lists, is that that stuff actually doesn't install in the same location for everything. You have the actual binaries that go under LOCALBASE as usual, but the dictionaries proper are going to go into some subdirectory of /var. This means that in order to support that, you need to extend make-plist so that it can take a lot of options, because you have to get the correct environment for each package, and that possibly includes different prefixes for various subpackages.

So we're probably around 2012 or something like that, and make-plist has grown to over one thousand lines, and frankly, it's a mess, a complete mess. You've got a list of variables that don't make any sense. The percent signs in Perl are for hashes: this means that you have more or less every variable duplicated for each subpackage in a multi-package. You've got the option parsing, which is completely crazy: instead of having simple options, you've got options that can possibly be suffixed by the subpackage name, and... wow. This is what you get when you take a simple program and you try to grow it until it breaks, more or less. Even the comments don't make any sense. I actually had a data structure called a haystack, which is just a jumble of things that we put on the side, and we're going to try to find files in there. It doesn't work all that well.

So it was time to sit and think about it, more or less, because everything was getting bad. The thing is, you've got this bad smell: when you've got a tool, and you notice that each new thing you try to implement, that should be easy, actually gets more and more complex as time progresses because you have to consider everything, then it's probably time to throw everything away and start over from scratch. This was the situation at that point. It kind of
works, but you know, it was probably the crappiest code that I had in the tree at that point, so it was time to get rid of everything. Around 2015, I guess, I decided to stop implementing new stuff. I told people: no, this tool is no longer supported, I won't be adding anything to it. And of course people did need to do stuff, so perfectly reasonable people like Stuart started having their own scripts on top of make-plist to do things correctly in complex situations, like Python for instance; I'll talk about that later. So it was really, really bad.

I admit that I kind of messed up on that one, because it took me three and a half years to actually replace the old tools. The main problem is that the basic framework was quite simple: I knew exactly what I wanted to do and how I wanted to do it. But there are actually lots of special cases in the old make-plist. So the basic framework for replacing everything was done in about two or three months, something like that, and then I had to sit down and look at everything that the old tool was doing and figure out a way to do the same thing in a better way. That's what took a lot of time, especially since, well, I wasn't only doing this.
I was doing this as a long-running job, but obviously I had requests from various developers to implement various things, such as dpb and stuff. So it took much longer than it should have. I have no idea how I should have done better, but I definitely know that I fucked up a little bit.

As for the design, there was stuff that was done the wrong way at the beginning that I could do in a much better way now. For instance, the old make-plist predated moving pkg_create to Perl, and it had to parse options in its own way. But the new pkg_create is, again, object-oriented Perl. This means that instead of doing my own option parsing, I could just take the few methods that I have in pkg_create and reuse them directly for update-plist. I did not have to change parameters: I could use the exact same parameters that I use for pkg_create, and literally reuse the code from pkg_create to do my option parsing. The very cool point about this is that now, if I want to implement new stuff in pkg_create, update-plist is more or less going to get about three quarters of the way there automatically. Well, of course, I still need to generate the packing list, but for the option parsing there's nothing to do: it will just work.

Of course, you have to not get mixed up, because when you are running update-plist you've got two levels: you've got the level of update-plist proper, which has its own state, and then, for each packing list that you want to create or recreate in a multi-package setting, you need a specific state for that packing list. The actual code looks like this.
I didn't take anything out. In order to process the next package, we create a new plist reader. That one just has the global state somewhere, and the local state for this package is going to handle the options by itself. There is some maintenance stuff that says: okay, I have a full list of plists, I just need to say, for this package, this is your packing list, the stuff read from the disk, and this will be the new packing list that we generate. And then I actually call the code from pkg_create to read the fragments, to read the packing list. It's the exact same code; nothing changes. And then, when I've done this for every subpackage, I'm done: I just have my data structure. I don't have to do anything in make-plist that's actually different from pkg_create for reading existing packing lists. That's cool.

So afterwards, the global structure of the new tool looks like this. First I read the old packing lists. Then I process what's in there in order to decide which files and which directories I know. The idea is that if you have a multi-package structure, from time to time when you update your packages you're going to have new files and new subdirectories. We actually don't have any specific support for that. Maybe you would say: okay, I want to say that if something matches that specific regular expression, then it's going to end up in that package. Nope, we don't need that. We just look at what was there, and if we've got a new file, we say: hmm, it lives in that directory.
That directory was actually owned by that package, so it's going to end up in the same location. And that's quite enough for 99 cases out of 100, more or less, as far as I know.

Then we actually do the scan of the file system with a somewhat more advanced scanner, but it's more or less find, plus looking at some file names, plus possibly running some code, like objdump for instance, to figure out whether this is really a library or whether it's just something that masquerades as one.

We then copy the objects that we know about into the correct packing list, since now we have everything from the file system and everything from the old packing lists.

Then there are the missing tags: you have some stuff that does not really exist in the file system. I'll give you some examples later. For instance, with configuration files, you usually have a sample configuration file somewhere and you want to install it under /etc. In that case you get a sample annotation, and that stuff doesn't exist in the install directory. You just have to say: okay, this should tag along with the file that it's supposed to sample. Plus a few details to finish, because, you know, everything lies in the details; you still have to fix everything at some point. So that's it: what you have here is the full list of what update-plist does.

Let's talk a bit more about the preprocessing part of pkg_create, because that's a really interesting part. You have variables in there, and you have files on the file system, with paths, and you actually want to be able to figure out whether there was a variable that was used for that file, that you want to use all over again. So I actually changed pkg_create a bit to support update-plist. Yeah, I know. There's just one single line in pkg_create that says: okay,
I've read this line from a file, from an actual packing list, and I'm going to substitute variables. The cool thing is that I can actually store both versions for update-plist: I've got the full path with everything substituted correctly, and I also have the unsubstituted version, where you still see the variables. So you can stay consistent: if someone has used some variables in their packing list, you're actually going to keep those variables and not have to do any guesswork in most cases, because you already have all the information in the stored packing list.

This is probably the most complex part of the new update-plist. It's called reverse subst. The old version, in the old make-plist, was a complete mess: you had to sort your variables by hand and say, okay, I'm going to put this variable first so it's going to be reverse-substituted first, and then the other one. The new one actually takes a global approach. The first thing we do, which works very well, is to take each variable's contents and sort them by length. So for instance, if among your variables you've got something that says lib/python3, slash something, it's fairly long, so it's probably going to be picked first for reverse substitution.
And then if you've got a much shorter variable, like just the Python version that expands to 3, then it's almost never going to be seen, because you've got a longer variable first, which is cool.

And this is the place where everything does not quite work automatically, so we actually have some specific options to avoid stupid substitutions. For instance, you can have variables that substitute only at the start of paths, like /etc, because otherwise you've got lots of "etc" in the middle of paths; also suffixes for files, like Python compiled files (it doesn't happen all that often, but to be safe); and some variables that are so difficult to handle automatically that we won't ever add them to an existing packing list, but we will still keep them if they're already there, obviously. And we even have variables that expand to nothing, which is a bit weird, because you've got this path, and at some point you're going to decide: okay, I want to introduce a variable in the middle of there, or keep it there if it's already there, and whatever. That's more or less Python. That's a big problem, and it's only there because of Python. Whenever we finally get rid of Python 2, that will be a nice day.

So this is how Python looks. Sorry for the small font, but the full lines wouldn't fit otherwise. You've got several fun things in there. MODPY_PYCACHE is one of those variables that can actually expand to nothing. I don't remember whether it's on Python 2 or Python 3.
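Stepping back to the reverse substitution rules described above (longest variable values first, and some variables only allowed at the start of a path), here is a rough Python sketch of the idea. It is an illustration only, not the Perl code in update-plist: the function name `reverse_subst`, the `start_only` parameter, and the sample variable values used with it are assumptions made up for this example.

```python
def reverse_subst(path, variables, start_only=frozenset()):
    """Replace expanded values found in `path` by ${NAME} references.

    `variables` maps variable names to their expanded values.
    Names listed in `start_only` are only substituted at the very start
    of the path (think of a value such as "/etc").
    """
    # Longest values first, so that "lib/python3.7" wins over "3.7".
    # Variables that expand to nothing are skipped entirely here.
    ordered = sorted(
        ((name, value) for name, value in variables.items() if value),
        key=lambda item: len(item[1]),
        reverse=True,
    )
    # Each segment is (text, done); done=True marks an already inserted
    # variable reference, which must never be substituted again.
    segments = [(path, False)]
    for name, value in ordered:
        marker = "${" + name + "}"
        result = []
        for index, (text, done) in enumerate(segments):
            if done:
                result.append((text, True))
            elif name in start_only:
                if index == 0 and text.startswith(value):
                    result.append((marker, True))
                    result.append((text[len(value):], False))
                else:
                    result.append((text, False))
            else:
                for i, piece in enumerate(text.split(value)):
                    if i:
                        result.append((marker, True))
                    result.append((piece, False))
        segments = result
    return "".join(text for text, _ in segments)
```

With a hypothetical variable set such as `{"MODPY_LIBDIR": "lib/python3.7", "MODPY_VERSION": "3.7", "SYSCONFDIR": "/etc"}` and `start_only={"SYSCONFDIR"}`, the longer library directory is picked before the bare version number, and an "etc" in the middle of a path is left alone.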
I guess that Python 3 is probably more complex, so you probably have a cache on Python 3 and not on Python 2, something like that. And you even have that shit on line number 4, which starts with a variable that actually expands to a comment. More specifically, pkg_create is pretty picky about things: for instance, if you have the same directory twice, it will simply refuse to create the package. And you can see how this works, because here you've got this stuff (oops, sorry) which will expand to nothing because of MODPY_PYCACHE, so this is the exact same directory as the directory on line 2 on some versions of Python. In that case you have to add a comment, so that one of the directories is going to vanish.

In the end I actually wrote some specific code in update-plist to be able to handle that. It says: okay, you've got this variable which expands to nothing or to a comment depending on the Python version, and whenever you see the same directory twice, you should actually add that variable at the front, so that you don't have duplicate entries in your packing list. It's actually fairly straightforward. It looks like this, more or less: at the last point before I actually write stuff to the resulting packing list, I just check whether the entry I'm trying to write might be a candidate for a comment, and if I have actually seen it already, then this one is transformed into the comment variable followed by the actual directory name. So it happens automatically.
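The duplicate-directory check just described can be sketched in a few lines of Python. This is a simplified illustration, not the update-plist code: the function name `add_comment_variable` and the pair-based entry format are assumptions, and unlike the real tool the sketch treats every repeated directory as a candidate for the comment variable.

```python
def add_comment_variable(entries, comment_var="${MODPY_COMMENT}"):
    """Prefix repeated directory entries with a comment variable.

    `entries` is a list of (source_text, expanded_path) pairs, in the
    order they are about to be written. Directories are assumed to be
    the entries whose expanded path ends with "/".
    """
    seen = set()
    result = []
    for source, expanded in entries:
        if expanded.endswith("/") and expanded in seen:
            # Same directory a second time: hide it behind the variable,
            # which expands to a comment exactly when the clash occurs.
            result.append(comment_var + source)
        else:
            result.append(source)
        seen.add(expanded)
    return result


# Hypothetical entries for a build where MODPY_PYCACHE expands to nothing,
# so the second directory expands to the same path as the first one.
EXAMPLE = [
    ("lib/python${MODPY_VERSION}/site-packages/foo/",
     "lib/python2.7/site-packages/foo/"),
    ("lib/python${MODPY_VERSION}/site-packages/foo/${MODPY_PYCACHE}",
     "lib/python2.7/site-packages/foo/"),
    ("lib/python${MODPY_VERSION}/site-packages/foo/bar.py",
     "lib/python2.7/site-packages/foo/bar.py"),
]
```

Running `add_comment_variable(EXAMPLE)` leaves the first and third entries untouched and prefixes the second one with `${MODPY_COMMENT}`.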
You don't have to do anything: MODPY_COMMENT is supposed to come into play without you having to do anything.

How much time do I have left? Okay. So that's the MODPY_COMMENT that expands to nothing.

Also, we've got some specific stuff to handle LOCALBASE and PREFIX, because both variables will expand to the same thing most of the time. In case you have PREFIX in your packing list, which sometimes happens, you don't want it to be replaced by LOCALBASE, or for update-plist to say "it's ambiguous, I don't know which one to choose". So it will always prefer PREFIX to LOCALBASE, in fact.

Let's talk a bit about the file system parser. It's not that complicated. More or less, you take any entry in the file system and you pass it through a list of recognizers, a bit like brokers, you know, and at the first recognizer that actually says "okay, this is something that I can handle", you stop there. Fairly straightforward: just a simple object-oriented structure, so that you can add to it. Right now everything is centralized in make-plist, but I could actually say: okay, you can extend it if you want, for specific tag types. We didn't do it that way because that would be extra complexity for nothing; it's just simpler to have everything in one single location. But in theory, if you want to make this extensible, it's already extensible, more or less. So each object in the file system is going to have its own type, like libraries, and even some files that won't really map to specific types in the packing list.
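The "list of recognizers, first match wins" structure can be sketched as follows. This is a minimal Python illustration under assumptions, not the actual scanner: the class names, the `kind` labels, and the regular expressions are all invented for the example, and the real tool looks at more than file names.

```python
import re


class Entry:
    """One object found while scanning the staging area (hypothetical)."""

    def __init__(self, path, is_dir=False):
        self.path = path
        self.is_dir = is_dir


class Recognizer:
    """Fallback recognizer: anything not claimed earlier is a plain file."""
    kind = "file"

    def recognizes(self, entry):
        return True


class DirRecognizer(Recognizer):
    kind = "dir"

    def recognizes(self, entry):
        return entry.is_dir


class LibRecognizer(Recognizer):
    kind = "lib"
    pattern = re.compile(r"(^|/)lib[^/]+\.so\.\d+\.\d+$")

    def recognizes(self, entry):
        return not entry.is_dir and bool(self.pattern.search(entry.path))


class ManRecognizer(Recognizer):
    kind = "man"
    pattern = re.compile(r"(^|/)man/man[0-9a-z]+/[^/]+$")

    def recognizes(self, entry):
        return not entry.is_dir and bool(self.pattern.search(entry.path))


# Order matters: the most specific recognizers come first, the fallback last.
RECOGNIZERS = [DirRecognizer(), LibRecognizer(), ManRecognizer(), Recognizer()]


def classify(entry, recognizers=RECOGNIZERS):
    """Return the kind given by the first recognizer that accepts the entry."""
    for recognizer in recognizers:
        if recognizer.recognizes(entry):
            return recognizer.kind
    raise ValueError("no recognizer accepted " + entry.path)
```

Adding a new type in this sketch means writing one more subclass and inserting it in the list before the fallback, which is the kind of extensibility mentioned above.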
I'll talk about that later. The cool thing is that you can even have specific behavior for some objects when you are doing variable substitution. In OpenBSD we've got this whole mechanism where we take control of every library soname and version number, because it changes when you change compilers, obviously, so we can't let upstream do things for us. This means that you've got lots of variables that expand to almost nothing, like just a version number, see, but these variables have specific names, and because we know the type of the libraries, we will only consider one of these variables for the specific library that we want to substitute. So you can do this kind of magic. It's specific to OpenBSD, but you could adapt it to just about any system.

There's also a bit of code specifically for resolving some symlinks, because when you do your staged install, everything is going to point to nowhere: the symlinks refer to the final install location, without the destination directory. So you have to actually resolve things by hand in order to get to the actual file.

For performance reasons, recognizers may leave data around. If you are running objdump to figure out the file type, and objdump tells you that it's not the right file type (okay, this is not an executable but a shared object), let's keep that information for later, so that we don't run objdump all the time for the full file system. That's about it. That's very straightforward code. This part is, in my opinion, fairly easy to read and fairly easy to change if you need to.

The last thing about perfect packing-list generation is usually: how do we sort everything?
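Going back to the symlink point above: resolving a link "by hand" inside a staging area might look like the following Python sketch. It is an assumption-laden illustration rather than the tool's code: the function name `resolve_in_stage` and the `max_links` limit are invented, and only a symlink in the final path component is followed, not symlinked intermediate directories.

```python
import os


def resolve_in_stage(destdir, path, max_links=32):
    """Map an installed path to the real file inside the staging area.

    `path` is the location after installation (for example
    "/usr/local/lib/libfoo.so"); `destdir` is the staging directory.
    Absolute link targets are re-rooted under `destdir` instead of being
    followed on the running system, where they would point nowhere.
    """
    current = path
    for _ in range(max_links):
        staged = os.path.join(destdir, current.lstrip("/"))
        if not os.path.islink(staged):
            return staged
        target = os.readlink(staged)
        if os.path.isabs(target):
            # Absolute target: it names the final install location.
            current = target
        else:
            # Relative target: interpret it relative to the link's directory.
            current = os.path.normpath(
                os.path.join(os.path.dirname(current), target))
    raise OSError("too many levels of symbolic links: " + path)
```

The result could then be handed to whatever inspects the file, such as an objdump-based check, without ever touching the running system's copy of that path.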
The thing is, in many, many cases you are not talking about a new port; you're talking about updating something. The bigger it is, the more painful it is to update, so you might want to wait for a bit, and then changes accumulate, and you realize that it's complicated to build and that part of the file structure has changed. With the old tools it was even more horrible, because you would end up with a packing list that didn't make any sense and that you had to fix by hand. These days it's much better. More or less, I decided that I wanted to keep things sorted based on actual file names: for instance, if you introduce variables in your packing list, it won't change the order of anything. That's usually what you want: you want to be able to compare the old packing list with the new packing list, and that's really important. This leads to some amount of jitter for Python, for instance, because if you have a packing list generated for Python 3 and you regenerate it for Python 2, the file names are going to change, so obviously the order might be completely different.

You also have stuff that is position-dependent in the packing list. Basically, when you read the original packing list, you just look at stuff like option no-comment, sorry, option no-checksum, and shit like that, or sample, and put it in the right location, because it should tag along with its file. And update-plist is going to tell you: okay, I've seen this sample, but I don't know what it corresponds to anymore, because this file is no longer there, so you will have to do something. That's the one percent that doesn't work.

Another example of an actual packing list: this is part of PHP. You see you've got lots of variables everywhere. You also have some state annotations, like which group you're going to install stuff as. There was a choice to make: whether I'm going to try to group things together according to ownership, or group things together according to the file system.
I decided to go with the simpler approach, which is to keep the file system structure straight, and, well, if there are lots of mode and owner annotations, that's it. It's not really a problem; it's still possible to read through it. And you've got an example of a sample, for instance, which is going to tag along as near as it can to where it should be.

Comments are somewhat tricky, because they might or might not correspond to files: you've got this shit that gets installed, and you actually don't want it in your packing list. So you usually keep them sorted along with files if they correspond to files.

So things might jitter a bit. This is the most tiring part about the new tools, but I've actually run it in most cases, and since I have big machines to test things on, I was actually able to regenerate every packing list from every port and check, more or less, that everything works, and then I waited for my fellow developers to check that everything actually worked.

A few details. I did a tag system for specific files recently, and it was very easy to add to update-plist. So I guess that for now this tool might be good for another five or ten years. I don't know, maybe. It looks like it should work. And there are also some new improvements: we are now actually building packages as a separate user, so update-plist is already privilege-separated. The file system scanner runs as _pbuild, and the rest of the code runs as the user that's going to write stuff in the package directory. As a bonus, we can even run pkg_locate, so that when you are generating a new packing list, it's going to tell you: okay, this actually conflicts with existing packages. And that's it.

So I guess this is about all. I'm just going to finish on this slide, which is a completely different side project that I did with OpenBSD. We actually have nice background pictures these days, thanks to fellow developers. So if you want, look it up on GitHub.
You can use it on anything, even non-OpenBSD if you want, and we've got 500 nice pictures: stuff from Christophe, for instance, who takes diving pictures; stuff from Kirill Bychkov, who did lots of stuff in the north; and even some from Theo. That took some convincing, but he did some fun stuff.

Okay, so that's it. Do you have any questions?

Use the mic, please. Surely there must be questions. Have I lost everyone? Maybe.

If not, well, thank you very much, Marc.