Okay everybody, this next talk will be by Helmut Grohne, about how to actually bootstrap a new Debian architecture without hiring those two people up front to spend a couple of weeks doing it. So, Helmut.

Thank you very much. So yeah, this is my first DebConf talk; I hope I'm doing it well. Many people have contributed to this topic, so I put their names into the footnotes instead of pronouncing them and making a fool of myself. The talk contains quite a bit of detail, which I hope is useful to some of you, so don't get scared by it. Still, if I'm too fast and you have a question about understanding something, interrupt and ask. For discussions, please wait for the end or come to the BoF tomorrow.

So the first question we should answer is: what is an architecture bootstrap? If we have a new Debian architecture and we want the initial population of the Debian archive for that architecture, we call that a bootstrap. It's a means to obtain the initial set of packages. In this talk we are not looking at the native phase. We are only looking at the part that finishes when we have the build-essential base system. Once we have the build-essential packages, we are done for this particular talk.

The first question that may come to mind about bootstrapping is: why should we care? Why is it important to work on this particular thing? A good reason is how often we do it: about once a year for the past 20 years, so it's quite common. The most recent bootstraps have been arm64 and ppc64el. There has been an out-of-archive bootstrap of OpenRISC, and we may see mips64el being added to the archive. There are also ideas to do a RISC-V bootstrap, and then there is the musl C library, which may turn up as architectures.
So it's actually quite common to bootstrap things, and we should maybe automate that. Another good reason to do bootstraps is that it gives a bit of freedom. We can build the base system from source again, instead of relying on a build-essential system that was built once for every architecture and only being able to go on from there. It means that if we have the build-essential packages for one architecture, a bootstrap method lets us obtain build-essential packages for another architecture, which currently is rather difficult. If bootstrapping were more automated, we could even have more architectures for optimized builds. Maybe x32 comes to mind, the variant of amd64 with 32-bit pointers, and other such architectures may become more feasible. Also, if Debian is bootstrappable all the time, it may become more attractive as an embedded distribution that tailors a package set to a particular application. Emdebian has been doing this, and it may get better with a more integrated bootstrap experience.

When I got interested in this topic about two years ago, I started looking into a sub-architecture bootstrap and wrote down what I experienced. Then I figured I could do this in a machine-readable way. That became a little script, now called rebootstrap, which developed into a QA tool. It tries to bootstrap an architecture from nothing but binaries from a build architecture. It's currently running on jenkins.debian.net, occupying some cores over there. It runs every day and covers 20 architectures; I think each architecture is tried about once a week. It currently cross builds about 100 packages. This is not the build-essential base system, only an initial part of it, and it needs to get more packages built.
So it currently cannot be used to obtain the base system, but it goes a bit in that direction. In the process of developing it, 190 bugs were filed, and thanks to lots of maintainers applying these patches, 120 are already fixed. So the point is less about having a tool that does the bootstrap and more about integrating the resulting fixes into the archive, into the packages being built. It's great to see the support from various maintainers, in particular the GCC maintainer and many others, so thank you very much. That concludes the introduction. The rest of the talk covers three topics. First we briefly talk about cross toolchains. Then we talk about actual cross building, because it's a prerequisite of bootstrapping. Finally we cover the actual architecture bootstrap phase.

When we talk about cross toolchains, I'd like to give a little reminder, or introduction, about the terminology. We use the GNU terminology to refer to architectures. When I say build architecture, I mean the machine I am building on. When I say host architecture, I mean the machine I want to run the built package on. Target only applies to compilers: it says what the compiler is supposed to produce code for. Whenever these three architecture values differ in some way, we say it is some kind of cross build. If you take nothing from this talk but this slide, remember this: whenever you are unsure which of these to use in a package, host is most often right, with some exceptions.

So, cross toolchains. The good news is that this is mostly a solved problem. We have cross toolchains in the archive in unstable right now. They can be installed, and they work. That wasn't the case two years ago.
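To make the build/host/target terminology concrete, here is a minimal Python sketch that classifies a build from its three architecture values. The names `Triple` and `classify` are hypothetical and do not come from any Debian tool. Debian architecture names stand in for GNU triplets for readability.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Triple:
    build: str                    # the machine doing the build
    host: str                     # the machine the built package will run on
    target: Optional[str] = None  # only meaningful for compilers


def classify(t: Triple) -> str:
    """Name the kind of build described by a GNU build/host/target triple."""
    target = t.target if t.target is not None else t.host
    if t.build == t.host == target:
        return "native build"
    if t.build == t.host:
        # Runs where it is built, but emits code for another architecture.
        return "native build of a cross compiler"
    if t.host == target:
        return "cross build"
    return "cross build of a cross compiler"


if __name__ == "__main__":
    print(classify(Triple("amd64", "arm64")))  # prints "cross build"
```

An ordinary cross build of a package only needs build and host. Target only matters when the package being built is itself a compiler.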
In a bootstrap setting, there is a little dance involved. First a little bit of GCC has to be built, then the glibc headers, then a bit more of GCC, then a glibc library package, and then more of GCC. Once this is done, we have a cross compiler with the libc integrated into it. There are currently two approaches to producing cross compilers. They differ in whether they use cross-architecture dependencies, which the archive does not really support at the moment. From my perspective, both approaches work, and that's good. A few glibc patches need upstreaming, and there's work in progress on that. So if you try this at home, you need to get the patches either from cross-toolchain-base or from the bug tracking system where they are filed. This works for most architectures, but for some it doesn't work as well. If you see your pet architecture on this slide, get in contact with me so we can maybe fix that. So from my perspective, cross toolchains mostly work, and I use them.

Let's get into cross building. Cross building has worked in principle for years. Some packages have cross build support that is 10 years old, and Debian has done lots of work here. At this point you can mostly just do cross builds with the sbuild version in unstable, although the experimental version works better. You still need to pass lots of different flags to make it work, but I think that's going to be fixed. If you build with plain dpkg-buildpackage, you just add an architecture flag and you're doing a cross build. That's great. But before we can look into actual cross building, we need to look at whether build dependencies can be satisfied. In the native case, we take it for granted that we can install the build dependencies.
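The remark that a dpkg-buildpackage cross build is "just a matter of adding an architecture flag" can be sketched as a small command builder. It uses dpkg-buildpackage's `-a` (host architecture), `-B` (architecture-dependent packages only) and `-P` (build profiles) options. The function name is hypothetical. Pairing the nocheck profile with `DEB_BUILD_OPTIONS=nocheck` reflects how the talk later describes the two being used together.

```python
import os


def cross_build_command(host_arch, profiles=()):
    """Return (argv, env) for an architecture-dependent cross build with
    dpkg-buildpackage; run it from an unpacked source tree."""
    argv = ["dpkg-buildpackage", f"-a{host_arch}", "-B"]
    if profiles:
        argv.append("-P" + ",".join(profiles))
    env = dict(os.environ)
    if "nocheck" in profiles:
        # The nocheck profile is meant to be paired with DEB_BUILD_OPTIONS=nocheck.
        opts = env.get("DEB_BUILD_OPTIONS", "").split()
        if "nocheck" not in opts:
            opts.append("nocheck")
        env["DEB_BUILD_OPTIONS"] = " ".join(opts)
    return argv, env


if __name__ == "__main__":
    argv, env = cross_build_command("arm64", profiles=("nocheck",))
    print(" ".join(argv))
    # To actually build: subprocess.run(argv, env=env, check=True)
```

The sketch only assembles the command. It does not address the build-dependency satisfiability problem, which is the real obstacle in practice.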
This is not so easily done in the cross case, where we use the multiarch facilities. When dependencies must be satisfied across two architectures, many packages have unsatisfiable build dependencies. It turns out that out of 20,000 source packages, the build dependencies can be installed for only 3,000. There is a nice page generated by the botch tool of Johannes Schauer that displays the problems per package. Looking at the big picture, there are basically two kinds of problems. One is cross toolchain dependency translation, which I'll come to in a minute. The other is that, in general, more things need to be multi-archified.

So let's look into toolchain dependency translation. We have a compiler transition going on at the moment. To make packages work, we often add build dependencies on a versioned compiler package right now. When a cross build sees such a dependency, it is resolved for the host architecture. So what we actually get is the native compiler for the host architecture. We cannot execute that on the build machine, if it is installable at all. That is not what we want. For a cross build, we want a compiler that runs on the build architecture but produces code for the host architecture; in other words, a cross compiler. So we need to somehow translate these dependencies. An idea that came up a year ago is a special package called g++-for-host. You put it into your build dependencies, and it is supposed to ensure that a compiler for the architecture you are building for is available on the system. There is a proof of concept in experimental, in the GCC cross-support package, and your review is welcome to help make it work. But please don't upload packages to unstable with such a dependency yet; it's not finalized. Okay, now let's look into the multiarch part.
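The translation the talk asks for can be pictured as a package-name rewrite. The sketch below is hypothetical and is not how g++-for-host works, since that is a package in the archive rather than a rewrite rule. It only shows which package a cross build actually wants. It assumes Debian's naming scheme `<compiler>-<GNU triplet>` for cross compilers, as in `g++-5-aarch64-linux-gnu`, and uses a small, illustrative triplet table.

```python
import re

# Illustrative subset of Debian architecture -> GNU triplet; the authoritative
# answer comes from `dpkg-architecture -a<arch> -qDEB_HOST_GNU_TYPE`.
GNU_TRIPLET = {
    "amd64": "x86_64-linux-gnu",
    "arm64": "aarch64-linux-gnu",
    "armhf": "arm-linux-gnueabihf",
    "mips64el": "mips64el-linux-gnuabi64",
    "ppc64el": "powerpc64le-linux-gnu",
}

# A compiler driver, optionally with a version suffix such as "-5".
COMPILER_DEP = re.compile(r"(gcc|g\+\+|gfortran|cpp)(-\d+(?:\.\d+)*)?")


def translate_toolchain_dep(dep, build_arch, host_arch):
    """Rewrite a compiler build dependency into the matching cross compiler
    package name; leave every other dependency alone."""
    if build_arch == host_arch or not COMPILER_DEP.fullmatch(dep):
        return dep
    return f"{dep}-{GNU_TRIPLET[host_arch]}"
```

For example, `translate_toolchain_dep("g++-5", "amd64", "arm64")` yields the cross compiler name, while `libssl-dev` or a native build are left untouched.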
We have been doing multiarch since Wheezy and Jessie, and the work continues. Some packages required for bootstrapping still lack proper multiarch support, and someone just needs to add it. But there is also the funky multiarch interpreter problem, which needs a bit more explanation. It commonly occurs with a particular dependency chain. An application package depends on a Perl module that consists only of scripts. That module in turn depends on a Perl extension, which is loaded into the interpreter. For this to work, the application and the extension need to have the same architecture; otherwise they can't coexist in the same Perl interpreter. Now let's assume the Perl extension is already properly multi-archified and is Multi-Arch: same. In principle, we could then install the application for a foreign architecture. But Architecture: all has a special meaning: such a package is considered installed for the native architecture. So if we try to install the application for a foreign architecture, the upper dependency, from the application to the Perl module, cannot be satisfied. We might think we should mark this Perl module Multi-Arch: foreign, but that would be wrong: it isn't foreign. We've been looking for solutions and haven't found anything convincing yet. There is a workaround we might apply, though. We can turn the package into Architecture: any, Multi-Arch: same, even though it is truly architecture-independent. You may then have to install it multiple times, once per architecture, but it passes the architecture restriction through the dependency tree. At most 1,600 packages might need this workaround. What's still missing is why this affects the bootstrap.
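The interpreter problem can be reproduced in a toy model of dependency resolution. This is a deliberately simplified sketch, not dpkg's or apt's actual resolver. It assumes amd64 as the native architecture and hypothetical package names. It models only two rules: an Architecture: all package counts as native, and a dependency on a Multi-Arch: foreign package carries no architecture constraint.

```python
from dataclasses import dataclass

NATIVE = "amd64"  # the architecture dpkg considers native (assumption)


@dataclass(frozen=True)
class Pkg:
    name: str
    arch: str               # "all" or a concrete architecture
    multi_arch: str = "no"  # "no", "same", "foreign" or "allowed"

    @property
    def effective_arch(self):
        # An Architecture: all package counts as installed for the native arch.
        return NATIVE if self.arch == "all" else self.arch


def satisfies(depender: Pkg, dependee: Pkg) -> bool:
    """Simplified check for a plain (unannotated) dependency edge."""
    if dependee.multi_arch == "foreign":
        return True  # no architecture constraint on the edge
    return dependee.effective_arch == depender.effective_arch


app = Pkg("app", "arm64")                          # foreign-architecture application
perl_module = Pkg("libfoo-perl", "all")            # pure-Perl module, Arch: all
perl_ext = Pkg("libfoo-xs-perl", "arm64", "same")  # compiled extension, M-A: same

# The upper dependency fails: libfoo-perl counts as amd64, app is arm64.
print(satisfies(app, perl_module))  # False

# Workaround from the talk: Arch: any + Multi-Arch: same, built per architecture.
perl_module_any = Pkg("libfoo-perl", "arm64", "same")
print(satisfies(app, perl_module_any), satisfies(perl_module_any, perl_ext))  # True True
```

The model also shows why marking the module Multi-Arch: foreign is wrong. The upper edge would pass, but the module would still count as amd64. Its own dependency would then be resolved for amd64, so it would not reach the arm64 extension the application needs.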
The reason is that the upper dependency can come from a source package. Build dependencies are treated as host-architecture dependencies, and in a cross build the host architecture is foreign. So for build dependencies, the upper dependency is always for a foreign architecture, and that's how we get this problem.

Now let's assume the satisfiability problem gets solved at some point and dive into actually cross building packages. Thanks to Deb-o-Matic, I was able to select 1,000 packages by popcon and just build them. It took a day, which was great. I used amd64 as the build architecture and arm64 as the host architecture. I chose arm64 because it was recently bootstrapped and has proper support in the config.site files, so I expected it to work fairly well. It didn't: less than half of the packages built successfully. So this warrants a deeper look at the problems.

The most common problem is that packages still use the build architecture toolchain. If a package just invokes gcc, it gets the build architecture compiler. It has to use the triplet-prefixed version of the compiler instead, which encodes the host architecture. When a Makefile relies on make's default value of the CC variable, one usually gets the wrong compiler. So fixing this for non-autotools packages in many cases means setting CC to the right compiler. With autotools, many packages just work. That's great.

Another problem is packages that list python, or another interpreter like perl, in Build-Depends. When we just write python there, the build dependency system reads it as a request for the host architecture version of python. But usually a package lists python in Build-Depends because it uses it as a script interpreter, processing some input to generate further sources or similar. So what one usually wants is the build architecture python.
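For a non-autotools package, "setting CC to the right compiler" amounts to passing the triplet-prefixed compiler names to make. Below is a minimal sketch with a hypothetical helper and an illustrative triplet table. In a real debian/rules the triplet normally comes from DEB_HOST_GNU_TYPE, as reported by dpkg-architecture.

```python
# Illustrative subset; in debian/rules the triplet normally comes from
# DEB_HOST_GNU_TYPE as reported by dpkg-architecture.
GNU_TRIPLET = {
    "amd64": "x86_64-linux-gnu",
    "arm64": "aarch64-linux-gnu",
    "armhf": "arm-linux-gnueabihf",
}


def toolchain_vars(build_arch, host_arch):
    """Make variables that point a non-autotools build at the host compiler."""
    if build_arch == host_arch:
        return {}  # a native build can keep make's defaults
    prefix = GNU_TRIPLET[host_arch] + "-"
    return {"CC": prefix + "gcc", "CXX": prefix + "g++"}


def make_argv(build_arch, host_arch, goal="all"):
    """A make command line that overrides CC/CXX for cross builds."""
    overrides = [f"{k}={v}" for k, v in sorted(toolchain_vars(build_arch, host_arch).items())]
    return ["make", *overrides, goal]


print(make_argv("amd64", "arm64"))
# ['make', 'CC=aarch64-linux-gnu-gcc', 'CXX=aarch64-linux-gnu-g++', 'all']
```

Passing the variables on the make command line overrides assignments inside the Makefile. That is why this usually works even for Makefiles that hard-code `CC = gcc`.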
A workaround for python is to annotate it with :any. Python is Multi-Arch: allowed, and writing python:any makes use of that: any architecture's version will do here. In practice that most likely ends up being the build architecture version.

Another big problem is packages executing host architecture binaries. A package builds something, most likely with the right toolchain, and then executes it as part of the build. One example is help2man. You build your tool and then want to generate its manual page. So you run help2man on your program, which executes it and produces the manual page. Except that in a cross build, you cannot execute the program you just built. I don't have a good solution for help2man. Another class of execution problems is packages that need themselves. For example, a package may have some kind of database that is initially populated with the tools from the package. In this case, one solution is for the package to build-depend on itself. Of course that needs further thought so it doesn't break the native case.

And then there's the problem of configure checks. An autotools configure script executes lots of checks. Some old configure scripts haven't been regenerated with autoreconf for a long time. They contain checks that newer autotools could make work for cross builds, but they don't work yet because the script is stale. Another class of problems is checks that cannot be implemented for cross building at all. For instance, a check may observe the behavior of a library function: if you ask malloc to allocate zero bytes, does it return NULL or not? The results of such checks need to be preseeded into the build somehow, and this is done in a file called config.site. These files currently live in the dpkg-cross package and are centrally maintained. I'm not sure that is a good long-term solution, because it may become a maintenance burden.
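Preseeding works because autoconf-generated configure scripts source the file named by the CONFIG_SITE environment variable. Any cache variable set there skips the corresponding test. The sketch below writes such a file for the malloc(0) question. It assumes a glibc host, where malloc(0) returns a non-NULL pointer, and uses autoconf's `ac_cv_func_malloc_0_nonnull` cache variable from AC_FUNC_MALLOC. Debian's real files ship in dpkg-cross; this only illustrates the mechanism.

```python
import os
import tempfile

# Assumed answer for a glibc-based host: autoconf cannot run a test program
# built for a foreign host, so the result is supplied as a cache variable.
GLIBC_ANSWERS = {
    "ac_cv_func_malloc_0_nonnull": "yes",  # AC_FUNC_MALLOC: malloc(0) is non-NULL
}


def write_config_site(answers, directory):
    """Write a config.site (a shell fragment configure sources) and return its path."""
    path = os.path.join(directory, "config.site")
    with open(path, "w") as f:
        for name, value in sorted(answers.items()):
            f.write(f"{name}={value}\n")
    return path


def configure_env(config_site):
    """Environment that makes an autoconf-generated configure read config_site."""
    env = dict(os.environ)
    env["CONFIG_SITE"] = config_site
    return env


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        env = configure_env(write_config_site(GLIBC_ANSWERS, d))
        print(open(env["CONFIG_SITE"]).read(), end="")
```

The maintenance concern from the talk is visible even here. Every answer is a per-host fact that someone has to keep correct for every architecture and C library.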
So that might need further thought. Let's conclude our exploration of cross compilation problems here and look into the bootstrap. How does the bootstrap actually work? First, of course, we build a cross toolchain. Then we need to build the build-essential packages: the essential packages, then build-essential, and everything needed for those. In the best case, we can then continue building natively, though sometimes we need to cross build more to break cycles. And throughout the whole bootstrap process, we need to break cycles. Here is what that means. Two packages may depend on each other, directly or through a longer dependency chain. Then one of them has to be built first somehow, but it depends on the others, so it may not be buildable yet. That's what we call a cycle, and it needs some kind of solving.

I already said something about the package selection. We start from everything marked Essential: yes plus the build-essential package. We take all of their dependencies, which gives us libc and other libraries. We also take all of their build dependencies, so that we can build them. We skip build dependencies marked Multi-Arch: foreign, of course, since packages that already exist for the build architecture can satisfy those. Then we build the transitive closure of this, because packages we build-depend on may introduce further dependencies, and so on. If we do this and look again at the botch-generated pages, the result is more than 500 source packages, which is a lot. At the beginning we saw that our current tool regularly cross builds only about 100 packages, so this needs a lot more work.

So let me introduce build profiles at this point, which are helpful for attacking these cycles. Build profiles add new metadata to the control file that affects the build. They are a way of building different flavors of a package. People who know Gentoo should think of USE flags.
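The package selection described above is a worklist computation of a transitive closure. The sketch below uses a simplified, hypothetical data model: dictionaries instead of real Packages/Sources files, and no alternatives, versions, or virtual packages. Run-time dependencies are followed for the host architecture, and source build dependencies are followed unless the dependee is Multi-Arch: foreign.

```python
from collections import deque


def bootstrap_closure(seeds, binaries, sources):
    """Source packages that must be cross built to provide `seeds`.

    seeds:    binary package names to start from (Essential: yes + build-essential)
    binaries: name -> {"source": str, "depends": [str], "multi_arch": str}
    sources:  name -> {"build_depends": [str]}
    """
    needed, seen = set(), set()
    queue = deque(seeds)
    while queue:
        name = queue.popleft()
        if name in seen:
            continue
        seen.add(name)
        binary = binaries[name]
        queue.extend(binary["depends"])  # run-time deps are needed for the host
        src = binary["source"]
        if src in needed:
            continue
        needed.add(src)
        for bd in sources[src]["build_depends"]:
            # Multi-Arch: foreign build deps can be satisfied by packages that
            # already exist for the build architecture, so they are skipped.
            if binaries[bd]["multi_arch"] != "foreign":
                queue.append(bd)
    return needed
```

With real archive data, this kind of closure is what yields the "more than 500 source packages" figure. The toy data used in the test below is made up.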
This is basically the same thing for Debian. A build profile can affect which binary packages are emitted: depending on which profiles are passed to the build, packages can be added or removed. It also affects build dependencies, since some of them can be made conditional on whether a profile is active. As far as we know, all relevant tools support build profiles starting with Debian Jessie. If you find a tool that doesn't support them, get in touch with me or Johannes Schauer so we can fix it.

Finally, let me give an example of a build profile. We saw earlier that some packages need themselves during a cross build, and we would rather not impose this extra dependency on the native build. To avoid that, we annotate such a dependency with a special profile, cross. It is activated whenever doing a cross build and absent in a native build, so the native build is unaffected. The cross profile is not meant to change which binary packages are emitted, or anything else.

So with this in mind, we can look at how to get down from these more than 500 packages (actually more like 600) to fewer. One good way to make the bootstrap easier is to move things into architecture-independent packages. This helps when it lets us move dependencies to the Build-Depends-Indep field. In a bootstrap setting we assume all architecture-independent packages are available, so we never need to install Build-Depends-Indep and can ignore it. So that's a good thing to do. Another way of shrinking the bootstrap is the nocheck profile, which complements the nocheck option in DEB_BUILD_OPTIONS. Your test suite may need a special library or tool, but in most cases we can't run the tests during a cross build anyway.
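The profile annotations discussed here (`<cross>`, `<!nocheck>`, `<stage1>`) form a small restriction language. A dependency can carry several `<...>` lists, and it applies if any list is fully satisfied. Inside one list, all terms must hold, and `!name` means the profile is not active. Below is a minimal evaluator with hypothetical function names. It ignores `|` alternatives and `[arch]` qualifiers, and `mypkg-tools` is a made-up stand-in for a package's own tools.

```python
import re


def parse_entry(entry):
    """Split 'foo (>= 1.0) <!nocheck> <cross>' into
    ('foo', [['!nocheck'], ['cross']])."""
    name = re.split(r"[\s(<\[]", entry.strip(), maxsplit=1)[0]
    groups = re.findall(r"<([^>]*)>", entry)
    return name, [group.split() for group in groups]


def restriction_holds(formula, active_profiles):
    """A formula is a disjunction of <...> lists; each list is a conjunction
    of terms, where '!name' means 'profile name is not active'."""
    if not formula:
        return True  # unrestricted dependency

    def term_true(term):
        if term.startswith("!"):
            return term[1:] not in active_profiles
        return term in active_profiles

    return any(all(term_true(t) for t in group) for group in formula)


def active_build_depends(build_depends, active_profiles=frozenset()):
    """Names from a Build-Depends value that apply under the given profiles."""
    names = []
    for entry in build_depends.split(","):
        if not entry.strip():
            continue
        name, formula = parse_entry(entry)
        if restriction_holds(formula, active_profiles):
            names.append(name)
    return names


BD = "debhelper (>= 9), help2man, mypkg-tools <cross>, python-pytest <!nocheck>"
print(active_build_depends(BD))                        # native build
print(active_build_depends(BD, {"cross", "nocheck"}))  # cross build
```

With no profiles active, the self-dependency disappears and the test tool stays. With `cross` and `nocheck` active, it is the other way round.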
So currently all cross builds are done with DEB_BUILD_OPTIONS=nocheck. Because of that, we can often drop certain testing tools from the build dependencies by annotating them with <!nocheck>. It's a double negation, but it works, and by default the check dependencies stay available. Another kind of profile helps less with reducing dependencies and more with actually solving cycles: the stage profiles, which mean "build a subset of the package". Here is an example. The OpenLDAP package and the Cyrus SASL package integrate with each other. Cyrus SASL has a module to authenticate against LDAP, and the OpenLDAP server is able to use Cyrus SASL. That is great until you try to build one of them. In this case, we can add a build profile to OpenLDAP so that it builds just the LDAP library instead of the OpenLDAP server. This example is deliberately easy, because the cycle involves just two packages. In practice we have much longer cycles, and these are also solved with stage1 profiles.

Let me make a little excursion into another use of the bootstrap tooling: the musl C library. Debian has traditionally been able to swap almost every part of the system for an alternative: the kernel, the graphics system, the init system. What we haven't seriously tackled so far is the C library, glibc, and being able to bootstrap lets us tackle that one. Why does this require a bootstrap, you may ask? Because the C library is part of the ABI, and the ABI is encoded in the architecture, so a new C library needs a new architecture. musl is a C library that I'd call a POSIX C library. It tries to adhere to POSIX to the point of refusing fixes that would make other packages work. Well, that's a decision its developers made. The musl architectures will generally be called musl-linux- followed by your favorite architecture name.
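How a stage1 profile breaks a cycle can be shown with a topological sort, which fails exactly when the build graph still contains a cycle. The dependency data below is made up and heavily reduced. The `<!stage1>` guard on OpenLDAP's SASL build dependency is an assumption for illustration, not a copy of the real package. The sketch uses Python's standard `graphlib` (3.9+).

```python
from graphlib import CycleError, TopologicalSorter  # Python 3.9+

# Made-up, heavily reduced build dependencies: source -> [(binary, profile guard)].
BUILD_DEPENDS = {
    "openldap": [("libsasl2-dev", "!stage1")],  # SASL support left out in stage1
    "cyrus-sasl2": [("libldap2-dev", None)],    # LDAP auth module needs libldap
}
PRODUCED_BY = {"libsasl2-dev": "cyrus-sasl2", "libldap2-dev": "openldap"}


def source_graph(stage1_sources=frozenset()):
    """Map each source to the sources it must wait for, honouring stage1."""
    graph = {}
    for src, deps in BUILD_DEPENDS.items():
        waits_for = set()
        for binary, guard in deps:
            if guard == "!stage1" and src in stage1_sources:
                continue  # dropped when this source is built with stage1
            waits_for.add(PRODUCED_BY[binary])
        graph[src] = waits_for
    return graph


def build_order(graph):
    """A topological build order, or None if the graph still has a cycle."""
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError:
        return None


print(build_order(source_graph()))              # None: the two sources need each other
print(build_order(source_graph({"openldap"})))  # ['openldap', 'cyrus-sasl2']
```

The resulting order is only the first pass. After cyrus-sasl2 has been built against the stage1 LDAP library, openldap is rebuilt without the profile to get the full package set.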
Here is an example of how this can be useful even if the port never becomes useful on its own. Because of its POSIX adherence, musl exposes bugs such as using ULONG_MAX without including limits.h. So, like the other kernel ports, it uncovers bugs and compatibility issues. This has been a fun project within the bootstrap work, and I'm looking forward to how it turns out.

Let me conclude the talk by summarizing a few problems that remain unsolved. We've seen that the toolchain dependency translation issue needs a proper solution in the archive. We've also seen the multiarch interpreter issue, which I'm highlighting again here. Let me also introduce another problem. Suppose a Multi-Arch: foreign package is currently uninstallable for the build architecture for whatever reason, such as breakage in sid. Then apt is free to pick another architecture for that package. It may end up installing the host architecture package. It happily does that, and then you start building and get very weird failures. So it would be good to be able to say that these packages must not be installed for foreign architectures. We currently don't have that facility, but maybe we'll arrive at a solution. And then there is Perl, which upstream never intended to be cross built. Cross building Perl basically means you already have a machine of the architecture you're building for, reachable via SSH. Configure repeatedly SSHes into that machine and runs the checks there, even if you have a cross compiler. So getting Perl to cross build properly is a challenge, if anyone is interested in taking that further. That would be great. Further discussion of these topics probably belongs in the BoF on all these things. It starts tomorrow at 11 o'clock and runs for five hours, in Stockholm. So let me conclude the talk here and take questions. Thank you very much.
So, about the issue of apt picking the host architecture for Multi-Arch: foreign packages. Wouldn't it be a good idea if build dependencies could specify something like :build or :host instead of :any? Then you could state clearly that you need this package for the build architecture, or that package for the host architecture.

Build dependencies are already qualified: all of them are treated as host-architecture dependencies, unless you explicitly annotate them with :native, which means use the build architecture. But the problem with Multi-Arch: foreign packages is that they can turn up at any level of the dependency tree. It doesn't have to be a direct dependency. You depend on something, and further down you cross the Multi-Arch: foreign boundary, and there you get the wrong dependency. So that's not a solution, unfortunately.

Isn't the problem that "foreign architecture" has a different meaning here? Because when cross compiling, you also have a foreign architecture that isn't able to run things.

Yes.

But we also have some packages that are only available on some architectures; I think wine was one that is only available as a 32-bit build. Suppose we have a properly working multiarch build that depends on wine, even when cross compiling something on amd64. Then disallowing foreign packages, even for architectures that can run them, might exclude more or make it impossible. So it might be better to somehow declare which specific packages cannot currently be run on a specific foreign architecture. For that you would need some means to express which packages contain something you want to run.
We don't have something like this, but perhaps someone has a good idea in this direction.

There is a bug against dpkg asking for different kinds of foreignness, but there's no agreement yet on whether this belongs in dpkg or how it should be implemented. Am I phrasing that correctly, Guillem?

Regarding the fact that this problem can appear anywhere deep in the indirect dependencies: it's not a complete solution, but in many of those cases the real issue may be a missing build dependency. The package should directly declare a build dependency on whatever tool is being executed. Most things you actually run during the build are invoked by the build system itself and should have a declared build dependency. I understand that covering everything would need some semantic extensions. But in many of these cases we could quite possibly just add a build dependency that declares its architecture affinity.

Well, as far as I understand it, that doesn't solve the problem either. As soon as a binary package is marked Multi-Arch: foreign, the architecture restriction expressed on the build dependency no longer applies. The mere presence of the foreign tag means the direct dependency no longer enforces it.

That seems like a bug in how we're interpreting those tags. I don't recall what the spec says, but if you declare :native...

The spec says clearly that when a package is foreign, the inbound dependency edges on that package don't enforce any architecture restriction.

Then I think the spec is wrong: if you write :native in a build dependency, it should be allowed to enforce it.

Yeah, maybe.
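The rules discussed in this exchange can be summarized as a function deciding which architectures may satisfy a single build dependency in a cross build. This is a simplified sketch that follows the interpretation stated in the discussion, namely that a Multi-Arch: foreign dependee ignores even `:native`. It is not taken from apt or dpkg source code, and the function name is hypothetical.

```python
def acceptable_archs(annotation, multi_arch, build_arch, host_arch, available):
    """Architectures that may satisfy one build dependency in a cross build.

    annotation: None, ":any" or ":native" (as written in Build-Depends)
    multi_arch: the dependee's Multi-Arch value: "no", "same", "foreign", "allowed"
    available:  architectures the dependee is currently installable for
    """
    if multi_arch == "foreign":
        # As described in the discussion: edges into a foreign package carry
        # no architecture constraint, even when the dependency says :native.
        wanted = set(available)
    elif annotation == ":any":
        # :any is only meaningful for Multi-Arch: allowed packages.
        wanted = set(available) if multi_arch == "allowed" else set()
    elif annotation == ":native":
        wanted = {build_arch}
    else:
        wanted = {host_arch}  # plain build dependencies mean the host architecture
    return wanted & set(available)
```

The last case in the test below reproduces the failure mode from the talk. A Multi-Arch: foreign tool that is uninstallable on amd64 is silently satisfied by its arm64 version, even though the build dependency said `:native`.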
This problem doesn't occur that often in practice, because the archive is rarely that inconsistent. I'm just listing it because it sometimes happens and produces weird problems that are hard to debug. After a while one figures out that the build dependencies are the place to look.

You had that list of 1,100 packages you tried to build, and only 450 succeeded. Do you have a list of the packages most worth fixing, because fixing them would make the most other packages succeed? Something like the single biggest build dependency problem?

There are multiple classes of problems there, and I've tried to categorize them already. One class is the satisfiability of build dependencies. For that, where is the relevant slide... it's this URL. It tells you very precisely which packages have problems and what the problems are. The URL without "all" restricts that to the packages relevant to the bootstrap. So that's one part of the answer. The other part is that I tried to dissect the list of build failures. One common problem was that debhelper's CMake support didn't pass any host architecture information to CMake, so it would just build for the build architecture. A patch for that has been submitted to debhelper. I think the toolchain bugs are mostly being fixed right now, and the bulk of the work is getting all the little packages right. Also, not every one of these 1,000 packages I tried to cross build is essential to the bootstrap. I just wanted to see how cross building works in general.

On bootstrap.debian.net, the main page has more links, including some that don't yet exactly answer your question. But one link tells you, for every source package in Debian, how many packages would transitively become compilable if that package were compilable. It doesn't exactly answer your question, because it covers all of Debian rather than only the cross build phase.
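A rough version of the "how many packages would this unblock" ranking can be computed from reverse dependencies. The sketch counts, for each source package, how many others depend on it directly or transitively. That is an upper bound on what fixing it could unblock. The real metric on bootstrap.debian.net may be defined differently, for example by counting only packages for which it is the last blocker. The data below is made up.

```python
from collections import defaultdict, deque


def reverse_closure_sizes(needs):
    """needs: source -> set of sources whose binaries it (build-)depends on.
    Returns source -> number of other sources that depend on it directly or
    transitively, i.e. an upper bound on what fixing it could unblock."""
    rdeps = defaultdict(set)
    for src, deps in needs.items():
        for dep in deps:
            rdeps[dep].add(src)
    sizes = {}
    for src in needs:
        seen, queue = set(), deque([src])
        while queue:
            for parent in rdeps[queue.popleft()]:
                if parent not in seen and parent != src:
                    seen.add(parent)
                    queue.append(parent)
        sizes[src] = len(seen)
    return sizes


needs = {
    "glibc": set(),
    "zlib": {"glibc"},
    "openssl": {"glibc", "zlib"},
    "curl": {"openssl", "zlib"},
}
sizes = reverse_closure_sizes(needs)
print(sorted(sizes.items(), key=lambda kv: -kv[1]))  # most "unblocking" first
```

Restricting `needs` to the packages of the cross build phase would give the narrower answer the questioner asked for.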
But it would be easy to limit the package set, so that the numbers tell you how many packages from the cross build phase could be bootstrapped if that package became cross-buildable now.

You gave the example of disabling the test suite to reduce build dependencies. What about disabling the documentation that many source packages include and build during the package build?

We prefer moving the documentation into architecture-independent packages, because that works right now. The alternative is removing documentation from existing packages. For example, a library package that also ships documentation would be built without it. We consider that maybe acceptable, but it changes the package contents without reflecting this in the dependency system. So far we have avoided that and instead moved documentation into architecture-independent packages.

How do you identify packages built with profiles?

A binary package built with profiles has a field in its control data that says which profiles it was built with. Does that answer your question?

About the doc packages: I don't see how that helps. You said you prefer to have them in architecture-independent packages, but you still need the build dependencies installed to build that architecture-independent package.

That's true, but...

You can do a -B build and then it might work, but in many cases you still need them installed.

We do distinguish between architecture-independent build dependencies and the remaining build dependencies. Cross builds only build architecture-dependent packages, so we don't care about Build-Depends-Indep.

You mentioned that some architectures are a bit troublesome, like ARM hard-float, for example.
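The answer relies on the split between build dependency fields. An architecture-dependent build (`dpkg-buildpackage -B`), which is all a cross build does, needs only Build-Depends and Build-Depends-Arch. Build-Depends-Indep is only needed when building the architecture-independent packages. The sketch below is a simplified illustration with hypothetical names and ignores profiles and alternatives.

```python
def fields_for(build_type):
    """Source control fields whose entries must be installed.
    build_type: "any" (dpkg-buildpackage -B), "all" (-A) or "full"."""
    fields = ["Build-Depends"]
    if build_type in ("any", "full"):
        fields.append("Build-Depends-Arch")
    if build_type in ("all", "full"):
        fields.append("Build-Depends-Indep")
    return fields


def build_dependencies(control, build_type):
    """Flatten the relevant fields of a source stanza into a list of entries."""
    deps = []
    for field in fields_for(build_type):
        for entry in control.get(field, "").split(","):
            if entry.strip():
                deps.append(entry.strip())
    return deps


control = {
    "Build-Depends": "debhelper (>= 9), libfoo-dev",
    "Build-Depends-Indep": "doxygen, graphviz",  # only needed for the -doc package
}
print(build_dependencies(control, "any"))
```

Moving a documentation generator such as doxygen into Build-Depends-Indep takes it out of the cross build phase entirely. This only works if the documentation lives in an Architecture: all package.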
I remember struggling with that too. After trying several configurations with crosstool-NG, I actually did successfully build glibc and everything, to have a full...

Which version of glibc and which version of GCC?

That's the thing I don't remember right now. I just remember I tried several.

We recently switched the default GCC, and that's why this only became a problem recently. The old glibc still in the archive enforces GCC 4.8 as its compiler and does not build with GCC 5, which is to be expected, right? So we probably need to upgrade glibc at some point, and then this problem goes away.

Okay, good luck then.

Yeah, I'm following up on the question from two questions ago, about a list of things that would unlock a lot of packages if fixed. I think that gets to the heart of something really important. Judging by the number of people in this room, there's a lot of interest in this topic. But how many of us are actually working on this problem today, and how could you give us what we need to be effective contributors? A slide you just showed had links to cross.html and one called cross all. Are those the ordered list of things you would like us to work on, or is there other stuff that would be useful? And a suggestion for your answer: can you post something to the mailing list telling us what to work on?

On which one? There are several mailing lists. Yeah, we can do that. The problem with giving good hints is that all of these problem summaries are hard to understand. The cross all page is not specific to the bootstrap; it covers the whole archive, which is also a nice feature. The cross.html page covers the packages we are actually interested in.
Unfortunately, 90% of the problems are not to be fixed in some packages. So I've been trying to file bugs for the problems I understand, and applying those patches is already very helpful. If you're interested in these things anyway, I'll try to do another cross build of more packages at a later point and publish the logs. You can try to find your package in there, and, well, you can look at cross.html, and if it doesn't look like a problem that I should be solving, then maybe you can solve it. So I think we're running out of time. Yeah, so my question is basically in the same direction: if I wanted to support this initiative, should I just wait for a bug report from you, or is there something I can do proactively? How can I find out whether my package is affected? And you mentioned these profiles, stage1, blah, blah. How do I know which features to disable to help? I mean, I have no idea where I should look for information. So in general, assume that you don't need to disable things and you don't need to add stages. This is a very hard thing to do, because it requires deciding which package needs to be modified, and that needs a very high-level view. So unless some bootstrapper tells you "please add a stage", most likely don't. Other than that, I think we need more QA infrastructure to highlight the problems, maybe in the package tracking system or something like it. Another archive rebuild may help with that. Other than that, do you have any ideas? Yeah, that's probably the best thing to do.
So right now we have cross.html, for example, which tells you which packages are affected. But as Helmut already said, it tells you about every package involved, because it's a problem that goes across package borders: even if your package is part of it, it might not be the one where the problem is to be solved. That is also why we currently do not export it to the tracker or UDD. It might not affect your package at all, figuring out whether it does means you have to look into the problem, and you might get annoyed at having the problem show up for your package when it's not the one where it has to be fixed. So it's very hard to automatically tell maintainers what to do; it needs human intervention, I think. What would help, though, is if more people tried to go through the issues that turn up in our diagnostics and sort them into things that are fixable and things that are not. I'm glad to help anyone who is interested in doing that. I think that we are running out of time and that we should postpone the remaining questions. Yes. Okay, one small announcement. Just in case the idea of five hours of this fills you with horror, we've split it up a bit: the first couple of hours are primarily about multi-arch aspects, and from one o'clock it's about cross-building aspects, in case you don't want to turn up for the whole thing. We're very interested in feedback from anybody who has been doing work on this, and obviously we can discuss this whole wide selection of problems, some of which need to be fixed centrally and some in packages, and so on. There's actually a Gobby document with the details, if you want to have a look. So I guess we will conclude this session now, and I guess lunch, dinner, food, whatever is already ongoing, and yeah, thank you for showing up.