cross-building talk, and please welcome today's presenters. It's really not going to be that good. Okay, so today we have an exciting all-day session of cross-building, multiarch and ARM stuff. It's going to be great. So I get to start. The idea is we'll have a talk now where I basically present the current state of affairs: what works and what doesn't, and how to use all this marvellous new tech, which kind of works. Then in the next session I want to talk about some of the things that are still broken and what exactly we think we should be doing about them. There are lots of things we could do, and there's a question of how much extra work to impose on other maintainers, what we think we can get away with, and so on. So I'll tell you a bit about how we got to where we are and the current infrastructure that's been built, to give us some idea of what works and what doesn't: how the new multiarch cross-building works as opposed to the old ways of doing things, some details about the mechanisms involved, all the things which still don't work and why, and where we've got to. And then there's a kind of sub-talk, which is basically Johannes Schauer's bootstrapping work. Technically bootstrapping is separate from cross-building, except that in practice it isn't, so they go together quite well. So, for those of you — how many people here know all about this already? About half. Okay, we always have to have this bit at the beginning, because the terminology is incredibly confusing and anybody who hasn't done a lot of it will be confused. This is the GNU terminology for build machines, host machines and target machines. When we say "host" we don't mean the machine you're building on; we mean the thing you're building for. When we say "build" we mean the thing you're building on, not the thing you're building for. Obvious, isn't it?
And "target" you only use when talking about compilers, which build code for a particular thing but can do so on another architecture. So most of this is about build and host: host is what you're building for, build is what you're building on. There has been cross-building support in Debian for many years. Roman Hodek started dpkg-cross in 1997, which was a horrible hacky script that kind of worked, and that's been developed by various people who've taken an interest over the intervening 15 years; we still need it, although much of what it does is becoming irrelevant. And Emdebian has been providing cross-toolchains for many years, and that's still the place to get toolchains that work on Debian. The other major part, apart from actually having a cross-compiler, is being able to install the build-dependencies in a cross-aware fashion, and apt-cross was the first tool for doing that, some time ago: I wrote a shell script and Neil made it into some proper software. That had various issues — it was complicated and it mostly broke — so we've had various other attempts since. xapt is a much simpler approach to the same problem, and pdebuild-cross is a mechanism for wrapping that in a chroot and building things. Meanwhile, over in Ubuntu land, Colin Watson wrote a tool because they needed a way of cross-building Chromium OS, and that later got renamed to xdeb, which is actually quite a useful tool for installing dependencies and cross-building things using the dpkg-cross mechanism. I guess this is the time to mention what dpkg-cross does: it takes libraries from the host architecture — the other architecture — mangles all the files around to different paths so that the cross tools can find them, and lets you install them on your build architecture. That's how we've been doing cross-compiling for the last 15 years. More recently, Linaro started doing work on this; that's who's paying me to do this, so thank them for progress.
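To make the build/host terminology concrete, here's a minimal sketch of how the two triplets show up on a configure command line (the triplet values are just examples):

```shell
# "build" is the machine you compile ON; "host" is the machine the result runs ON.
BUILD=x86_64-linux-gnu          # building on an amd64 box
HOST=arm-linux-gnueabihf        # producing binaries for ARM hard-float
# A cross build is simply any build where the two differ:
if [ "$BUILD" != "$HOST" ]; then
    echo "cross building: ./configure --build=$BUILD --host=$HOST"
fi
```

When the two triplets are equal you have an ordinary native build; --target only enters the picture when the thing being configured is itself a compiler.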
So there's a set of cross-toolchains which are now in Ubuntu, which work pretty well but are built in a kind of ugly way — the details are not nice. And in the last year I've set up a cross-build daemon, because I got very bored of building things, trying things, seeing if they worked, and then building them again to see if they work today. I'll show you that in a moment. More recently, since multiarch became available, we can do much cleverer things, and the system is a lot more reliable in terms of installing cross dependencies. sbuild now understands how to do that, so it's now trivial to just run sbuild in a cross way, and in principle that should just work. And most recently, Thibaut Girka has been working on multiarch cross-toolchains, which are built using multiarch as well as being able to build things using multiarch — I'll explain the difference in a bit. So, in order to see what works and what doesn't, we have a cross-build daemon. It's online — that's the URL, so you can go there. It is currently building all of stable and quantal — not all of, sorry, about 300 packages of — and that gives us a pretty good idea of where we're at. Just to explain how this works, because it's actually useful outside the cross-build context: at the moment, as many of you have probably discovered, it's hard to set up a buildd, and it really shouldn't be; lots of people could use one for all sorts of reasons. Because the official Debian buildd infrastructure is incredibly complicated and hard to install, I didn't use that. reprepro does the repository part, and there's a handy tool called rebuildd, which is a fairly stupid build daemon — but if all you want to do is build some stuff for a distro, or maybe a couple of suites in a distro, it's great. Anyone can understand how to configure it; you just install it and it works. I've hacked it about a bit so it understands cross-building; that's not upstream yet.
And that just uses sbuild — which does understand cross-building — to do all the hard work, so it's fairly simple. You pull stuff from the upstream archive; I've got a filter set to just do the 300 source packages we care about for now. Then reprepro has a handy command called build-needing, which basically says how many binaries are out of date in comparison to the sources I've got, and gives you a big list. You just feed that into rebuildd, which churns through them all, the output of sbuild goes back into reprepro, and it runs around in circles until they're all done. This is all tied together — all the little green bits on the diagram are horrible cron jobs and scripty foo, which isn't very nice. I've started an xbuilder package which contains the glue bits, and ideally that will one day be something where you can just say "apt-get install buildd" and get something fairly simple which works. That will be nice; we're not quite there yet, but it's reasonably usable. And that buildd-sync-logs thing at the end: this is all running on my machine at work at the moment, so I just suck all the logs over to the web so that you can read them online. If anyone's got their browser up you can have a look at the pretty statistics — it's quite nice; I should have shown a picture on here. So, some statistics. As you can see, they're not great. The 99 packages we're looking at here are basically the debootstrap set — what you need for a minimal system. As you can see, we got nearly half of it working in precise before we stopped fiddling with that, and then broke everything again. Ubuntu does better because more things have been multiarched, basically; there are a lot of basic tools in Debian which still don't say "I'm Multi-Arch: foreign" — which is code for "the right thing will be installed". So as you can see we get 65 dependency failures out of those 99, which basically means I couldn't install the dependencies.
I didn't even try to build those. These bugs are astonishingly easy to fix — it's a one-line patch for every package that needs fixing; just go and fix them all. And as you can see, there's quite a lot of stuff that would build if its dependencies were installable. So if anyone's enthused to help with this, it's very easy and quite rewarding: "fixed — that was easy". I'd love some help, because it takes ages. So: if you want to build things you need a toolchain. That's the easy part. You also need to be able to install the build-dependencies. The point here is that you need the libraries for the host architecture, but the tools for the build architecture, so that they'll run on the build machine. You want a version of intltool or libtool or autoconf or whatever that runs on the build machine, whereas you want libfoo-this-that-and-the-other for the host architecture. Historically this has always been problematic, but multiarch essentially marks which is which for us, and we can use that information to just tell apt to install the stuff. There are some exceptions to what multiarch says, which we mark in the build-dependency headers, and I'll show you the exciting table for that in a bit. The other thing dpkg-cross gives you is for the tests you can't run during the build because it's the wrong architecture: you need to cache the answers somehow. autoconf has a nice mechanism for this that has worked for at least a decade, and dpkg-cross supports it. The other part is avoiding running tests which you can't run because it's the wrong architecture, either because you'll get the wrong answer or because you simply can't run the test at all. So: multiarch toolchains. When we say that, we mean two different things, and we need to be clear about which is which. One is the toolchain understanding the new multiarch paths, as opposed to the old sysroot-style paths — basically, where it looks for headers and libraries.
I'm not going to read the list out, but we used to look in /usr/<triplet>/... and now we don't do that any more. The other half is: does the toolchain itself use the multiarch mechanisms to depend on libraries? So your ARM cross-toolchain depends on an ARM libc. Historically, we used to take that, mangle it about with dpkg-cross, build against it, and generate a package called something like libc6-armel-cross. There's no need to do that any more: we can just say "install libc6:armel" and use that — it seems to me that's how it should work. And that is the work that Thibaut has basically just done as part of a GSoC project over the last few weeks. It's working: we tested it yesterday, and as far as we can tell it works. There are some ifs and buts, and a very long argument about libstdc++-dev which we had for most of yesterday, but once that's sorted... Those toolchains are already available in a repository — I should have stuck the URL in here: emdebian.org, ~tg — and you can try them out. Do please tell us if you find anything bust about that toolchain; it will become, I hope, the default Emdebian toolchain. The advantage of building it that way is that it can be built by an autobuilder within the archive. The old way of building cross-toolchains is special, and there's no way the autobuilders would ever really do that, because it depended on things from other architectures, and until multiarch we had no mechanism for specifying that. So these new toolchains should be something which will eventually end up in the standard archive — that is the goal during wheezy+1. But we've managed to sneak a little bit into dpkg such that these cross-architecture dependencies will work in wheezy, so you will be able to use these cross-compilers before they're actually official and kosher. The autoconf caching mechanism — this is just how it works. dpkg-cross provides a load of files:
cross-config.cache for the generic stuff, and cross-config.<arch> for things which are architecture-specific. They contain lots of incredibly boring information, like how big a float is on this architecture, and then lots of — I don't know what GL is, some library or other — "does it have this function or not?". Where autoconf would normally compile a little program, run it as a test and see if it worked, if it can't do that we just tell it the answer. Now, the problem with this information, of course, is that it can go out of date if nobody's maintaining it; but by having a centralised list, it should be easy to keep it right. In general this mechanism works just fine: you'll find you can cross-build a lot of stuff and it appears to have worked. But, for example, if you haven't got this right, you'll get no job control in your bash. So, you know, it "works" — but job control's quite handy. There are quite a lot of variables not set in here that probably should be; you need to go through builds and find out what's actually broken. One thing about the last couple of years of cross-building work: I have cross-built an awful lot of stuff, and I have tested very little of it to see whether it actually works. Once we're at the stage where you can build a whole image, we will need to do a lot more testing to prove that the stuff we're cross-building is in fact churning out things that work properly. If anyone's interested in that, we have a load of ideas and nobody's done any of the work needed: you could compare the ELF layouts and see whether you got the same thing, check for bits that are accidentally from foreign architectures, and so on. So, we have all the core pieces, but to make this work smoothly within Debian and Ubuntu you need a whole load of extra bits. We have build-essential for building things, which ensures that you have a compiler and make and the whole load of basic tools that you won't get anywhere without.
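The cache files described above are just autoconf variable presets; here is a sketch of the sort of entries they carry (the names follow the autoconf cache-variable convention, but the selection and values are illustrative):

```shell
# Entries in the style of cross-config.cache / cross-config.<arch>:
# each line preseeds an autoconf test result, so configure never has to
# run a host-architecture test binary on the build machine.
ac_cv_sizeof_float=${ac_cv_sizeof_float=4}
ac_cv_sizeof_long=${ac_cv_sizeof_long=4}
ac_cv_c_bigendian=${ac_cv_c_bigendian=no}
ac_cv_func_setpgrp_void=${ac_cv_func_setpgrp_void=yes}
```

The bash job-control breakage mentioned above is exactly this failure mode: a package-specific cache variable (a bash_cv_* one, in that case) missing or wrong, so configure silently picks a bad default.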
We need the same concept for cross-building, to ensure that you have a cross-compiler for this architecture, a cross libc, a cross libstdc++, cross pkg-config, and so on. For a long time the set of packages you needed for cross-building in Ubuntu and Debian has been slightly different, because the toolchains come from different sources and slightly different names are used, and that's very annoying for a package like sbuild that just wants to be able to say "install me the cross stuff, please". So sbuild now installs a cross build-essential package named for the architecture, and expects that to be present and to gloss over any differences between distros. And obviously, if you're using this for your own work, you can just provide your own cross build-essential for your architecture — if you're the clang weirdo, you can say "I need clang for my cross-builds". Some such packages exist in Patrick McDermott's repository; they're not anywhere very useful yet, but they will be soon — they're trivial to make, each just depends on six things. The other problem is that, because of the way the gcc-defaults mechanism works, if you install a cross-toolchain you don't actually get an unversioned <triplet>-gcc command — you only get the versioned <triplet>-gcc-4.x. So every single autoconf test will fail and say "you have no compiler, I can't compile anything" — and technically it's correct. So there has to be a cross gcc-defaults package, or something that provides that link; for now you can just make it yourself. One of the questions for later is exactly which package should provide that link. I have wondered about this and I'm not quite sure; the toolchain people think it should be a gcc-cross-defaults package, and they're probably right. But at the moment you just don't get one on the Debian side, and it doesn't work. Cross pkg-config: pkg-config is perfectly capable of doing cross stuff, but you have to call it the right way, as <triplet>-pkg-config, so it knows which paths to use when looking things up.
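What such a <triplet>-pkg-config wrapper amounts to can be sketched like this (the paths follow the multiarch layout; the real wrapper shipped with the pkg-config package may differ in detail):

```shell
#!/bin/sh
# Sketch of an arm-linux-gnueabihf-pkg-config wrapper: restrict lookups
# to the host architecture's .pc directories instead of the native ones.
triplet=arm-linux-gnueabihf
PKG_CONFIG_LIBDIR=/usr/lib/$triplet/pkgconfig:/usr/share/pkgconfig
export PKG_CONFIG_LIBDIR
exec pkg-config "$@"
```

/usr/share/pkgconfig holds architecture-independent .pc files, which is why it stays on the search path.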
Otherwise it will look up your build-architecture libraries instead of your host-architecture ones. So there's a wrapper in the pkg-config package which you just need to call as <triplet>-pkg-config. And again the question is: where do all those links come from? Which package provides them? Because you get one per architecture, so it's a package per architecture containing one link — and you go "ugh". But I haven't thought of a better way, and this problem will apply to quite a lot of tools: anything which does architecture-dependent work needs to be callable as <triplet>-tool; that's the only way to make everything work properly everywhere. And the question is where we're going to produce all these useless tiny packages from. So: multiarch cross-dependencies, and how they actually work. There's this annoying autotools convention that you don't use the prefixed name for the native version — you don't call x86_64-linux-gnu-gcc every time you run gcc. That would be lovely; it would improve my life dramatically. I wouldn't need thousands and thousands of "if building natively, run gcc; otherwise run <triplet>-gcc" conditionals. The missing <triplet>-gcc isn't necessarily an autoconf default — it's probably just that the autoconfs in use are very old. A lot of the autoconf rebuild and autoreconf work will actually allow the triplet-prefixed tools to work even for a native compilation. So yes, there are a lot of places where, if you could always call <triplet>-something and it worked, that would make things more symmetric — and of course the whole point about multiarch is that it's nice and symmetric; that's one of the pieces that's missing. So, for people who haven't been following multiarch in detail, it's another fine example of confusing terminology. Packages can be labelled as one of three things. If they're Multi-Arch: same — sorry? Four things, including nothing, you mean? Okay, yes.
You can say nothing, and generally things will behave as they did before — or correctly. If you want things to be co-installable — where "things" in practice usually means libraries — that is, if you want to be able to install the build version and the host version side by side, you need to mark them Multi-Arch: same and make sure that none of the files that differ have the same file name. In general, that's a matter of putting libraries in /usr/lib/<triplet>. Anything which you just want to run, where it doesn't matter what architecture it is — you just need it to work, so make or intltool or awk, all that stuff — that's a Multi-Arch: foreign package. And the main reason most of this doesn't work at the moment is that an awful lot of tools like that are not so marked, so apt tries to install the host-architecture version and then fails to run it. Then there are some things which can be either. gettext is actually the most common example: packages which contain both a library you link against and a tool you run, where it depends on the package you're trying to build which of those two functions it wanted. So multiarch can't tell us which we want; it just says "you can do either of these things" — that's Multi-Arch: allowed — and you need to specify in your dependencies which one you want. That's basically it. To use this, all you have to do is add the foreign architecture you wish to build for, and then you can do "apt-get build-dep -a <architecture> <package>" and that should install the build-dependencies. Quite a lot of the time it doesn't, but when it works it's incredibly cool. Then there's this scary table of exactly what happens depending on what you specified in your build-dependency and what the Multi-Arch field says; the ":any" and ":native" markers are the exception mechanisms for saying "I wanted the opposite of the default", basically, and there's a set of combinations which are simply not allowed — that's the "wrong" column. It's a little bit confusing, but mostly only the apt people need to worry about it.
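For reference, the three explicit markings look like this in debian/control (package names and contents are just illustrative):

```
Package: libfoo1
Architecture: any
Multi-Arch: same
# co-installable: libfoo1:amd64 and libfoo1:armhf side by side

Package: foo-utils
Architecture: any
Multi-Arch: foreign
# any architecture's copy satisfies a dependency; for tools you just run

Package: gettext
Architecture: any
Multi-Arch: allowed
# the depender chooses: "gettext:any" accepts any architecture (tool use),
# while a plain "gettext" dependency keeps architecture-matching (library use)
```

And the enabling step on the build machine is just: dpkg --add-architecture armhf, apt-get update, then apt-get build-dep -a armhf <package>.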
There is an interesting point about Multi-Arch: foreign. There are about a thousand tools it would be useful for, but in fact, as Steve has worked out over there, we could just fix apt so that, for build-dependencies, it reasons: if it's an Architecture: all package, it's almost certainly effectively Multi-Arch: foreign, so let's just assume that. We don't do that for normal dependencies, because that would be the opposite of the existing behaviour and everything would blow up; we could change it in the future, but we can't do it now. So unless anyone objects violently we should probably just do that, and loads of stuff will start working. We have to argue with the dpkg people a bit — that's a pending argument — but unless someone can come up with a reason why we shouldn't, we should probably push it. It's a bit late for wheezy, I guess, sadly. Transitive build-dependencies: this is something you'll notice if you maintain a package with a versionless library dev metapackage — one which actually depends on whichever particular version of the library we're currently making the default. libdb-dev, for example, used to be an Architecture: all package, and the problem is that that breaks the multiarch dependency chain, because apt then doesn't know which architecture's version of the library you wanted. So we need to make libdb-dev Architecture: any, so that it depends on the corresponding architecture's version of the actual library. There were quite a few of those that were bust — maybe we've fixed most of them by now — but it's not entirely obvious, if you haven't thought about it for quite a long time, how that should work, so I thought I'd put it in. Things that don't work: running wrong-architecture tools. Quite a lot of builds will try to run something they just built, and of course when you're cross-building that doesn't work.
We should just stop them doing that — which is fine, except when they do something really important, and then life gets harder. Quite a lot of things use help2man, for example, which is extremely annoying if you're a cross-builder: all it does is run the freshly built program to get the help output and put it in the man page. Seems perfectly reasonable, except we can't do that. Now, that one doesn't matter much — you can just skip it and you don't get a man page — but otherwise the build simply fails. Or you can use QEMU: QEMU can gloss over a lot of the failures I'm about to go through, which is great if you've got one. But, for example, for the new arm64 bootstrap which I'm currently doing, there is no QEMU, and there will not be a QEMU for some time, so that doesn't help us at all. I'd like to fix as much as we can without depending on QEMU, but it's trivial to give your cross-build chroot QEMU support as well, at which point a whole load of these failures will just get glossed over and mostly work. Config scripts: loads of packages contain a naff little <foo>-config script which tells you how the package was built. That's great until you're trying to cross-build things, at which point you get the wrong answers. The fix for most of these is to persuade people to use pkg-config instead, because pkg-config is declarative and just works in a cross context; all these config scripts don't. And there are interesting questions — where tclConfig.sh should live in a multiarch world is one of the other things to get clear later. Some packages don't cross-install: libraries which run some kind of helper to register plugins or something when installed — when you install the wrong-architecture version, it runs its postinst and explodes. That's very annoying, because you only wanted it for the headers, or to link against; you couldn't care less about the stupid plugin thing. So we fixed libglib by basically just saying "don't care if it fails".
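The libglib-style fix is essentially a one-line change in the maintainer script; a sketch (the helper name is made up for illustration):

```shell
#!/bin/sh
set -e
case "$1" in
configure)
    # The registration helper is a host-architecture binary; when this
    # package is installed for a foreign architecture it cannot execute
    # here, so tolerate the failure instead of aborting the install.
    register-foo-plugins || true
    ;;
esac
```

The cost of the "|| true" is that a genuine native-install failure is also silenced, which is exactly the trade-off discussed next.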
Now, I don't know if there are packages where these scripts are important and you'd really want the package install to fail when it's done natively, in which case we need a bit of "if I'm doing this natively then complain; otherwise it's all right, doesn't matter". There aren't loads: I went through the list of 300 packages and those five are the only ones I found. They need fixing. Then there are tools which are architecture-dependent. chrpath is a thing for editing the RPATH in binaries you just built, and it is used in a reasonable number of builds. The problem is that it needs to know about the architecture whose ELF headers it's fiddling with in order to do the right thing, and at the moment it assumes it's being run natively, so it always fails. In practice we can just not run it; I don't know if that ever matters — I'd like to hope not, because we don't like RPATH anyway. GObject introspection is a much bigger problem: it does scary stuff with binaries to look at the object interfaces and generate XML foo for — I don't know what the hell these people do — but it's used a lot. At the moment we've been able to just not run it and nothing died, but I really don't think that's going to keep working for long. The output it produces is architecture-specific, and the binaries it scans are produced in an architecture-specific way. Gloss over it with QEMU? I don't know; really, I think you need a cross object-introspector, but I need someone who can explain what it does, and why, and how hard it might be to make a cross one — some help would be great on that. So, if you're a packager, what do you need to do? There are some cross-build packaging guidelines where we've started writing down, you know, things to do and things not to do.
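One pattern those guidelines suggest can be sketched as a debian/rules fragment: skip steps that need to execute freshly built host-architecture binaries whenever build and host differ (the help2man invocation and file names here are hypothetical):

```make
include /usr/share/dpkg/architecture.mk

override_dh_auto_build:
	dh_auto_build
# Only run the just-built binary when building natively:
ifeq ($(DEB_BUILD_GNU_TYPE),$(DEB_HOST_GNU_TYPE))
	help2man --output=debian/foo.1 ./foo
endif
```

The include is the dpkg-provided header mentioned next, which defines DEB_BUILD_* and DEB_HOST_* so you don't have to spell out the dpkg-architecture boilerplate yourself.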
In general: there's now a handy little dpkg feature — since some fairly recent version of dpkg there's a little header file, to save you putting the boilerplate in the packages yourself, which sets the multiarch variables and the host and build variables; you just include that and you'll have the variables you need. And if you use pkg-config, and Autotools or CMake — build systems that actually work — things will generally be sorted out for you, and debhelper too; all of these will generally do the right stuff. If you do anything clever, it gets harder. There's also another wiki page, which Peter Pearse wrote, containing a lot of useful info, like a description of the GObject introspection problem. Now, I haven't got time to go through all of the following slides, because there's quite a lot of it and it's very interesting, but I'll give you a flavour of what we're doing on the bootstrapping side. There's a thing called dose — it's what the Debian Weather thing is done with, by Pietro Abate — which is basically a set of OCaml tools for examining dependency-relationship foo and statistics; it's all really scary stuff, but we can use it to ask "in order to build this, what do I need, and what does the dependency network look like?", and you can just download these tools and try them out. So we have a tool to examine — let's get this right — "the source packages that are required to be cross-compiled for a minimal build system": that is, how many things do I need to cross-build in order to build this set of packages? It also checks whether the sources and binary packages actually match up, because it turns out they don't on oneiric or quantal, though we seem to get it right in Debian — and the problem is that a binary with no corresponding source blows up the tools: "well, I can never build that — I'm screwed". So, the thing about bootstrapping is that you've always got to cross-build something, because you've got a brand new machine
with no software for it: you can't do anything until you've got, essentially, build-essential. You need to get yourself a toolchain and a make and an awk and a sed and various other bits and bobs, so there's always some set of stuff you've got to cross-build, and there's an interesting question of how much of it you cross-build before you say "now I've got enough to call this a real, useful computer, and now I can start building natively". Documentation packages are interesting, because there's lots of those — you go "damn, I need TeX"; are you going to cross-build TeX? So these tools let you do analysis to say how many packages you need to cross-build before you can swap over. Here are some statistics: how many things are required, how many things are essential — there's twice as much required stuff in Ubuntu; I didn't know that. These tools are really quite interesting. Somewhere in here was the — yes, the reduced set. In practice, if you run this on unstable, there are 38,000 binary packages and 18,000 source packages, and it's really, really slow — it takes like three and a half hours to do an analysis. So pretty much the first thing he did, after using it for five minutes, was write this, which basically says "make me a reduced set". The default reduced set is essential plus required; you can add a few packages — "and I also want this, this and this" — and it will give you a list of how many source packages and binary packages that is, and make a Packages set of it, and then you can do your analysis on that, which speeds things up by an order of magnitude. That is the set of stuff which can be built from itself, basically. So yes, here are some numbers: there are lots of packages in Debian — we knew that — and a basic required-plus-important set turns out to be 645 source packages, which is slightly bigger than I expected; it's possible that these tools don't actually work right yet — they are brand new, we discovered this last week — but it's quite fun to play with. What else have we got that's
worthy of note? Yes — so there's the point: it takes three and a half, nearly four, hours to analyse the whole thing, or 12 minutes to analyse the sensibly sized set. And this is for analysing how many things you have to cross-build in order to be able to build the rest. So, for example, for that set of 2,400-odd packages in the required-plus-essential set, you need to cross-build 55 things in unstable — doesn't sound too bad — but the bar there was a slightly different analysis, where it turned out you needed to cross-build 158 things. So that's the kind of size of job we need to do before we've effectively achieved a bootstrap and can work on the native machine afterwards. "Just to clarify: you cross-build 158 packages, and that gives you a big enough set to natively compile the rest?" That's right. "This is for bootstrapping a new architecture?" That's correct. So there's now a kind of interactive thing where you can push little buttons and say "what happens if I try to do this package, what does it depend on?" — download it and have a play, and get dot graphs. Most of the rest of this you don't care about. Yeah — so we can generate some pretty pictures showing classic dependency cycles. The part of this that's not finished is the analysis of the dependency cycles which need breaking: we can now find them, or at least some of them, and we need to add the information about staged builds — to say "I can build this package without the database part, I don't care" — in order to make this linear. That is being worked on for the rest of GSoC, and we hope we will have something which can actually run through a bootstrap of, say, the first hundred-odd packages — assuming they all cross-build, ha! So yes, that's my time up, and here's a little bit of thank-you-very-much to people for helping out, because I haven't done most of this work — other people have done most of this work and I just hassle people. It's always dangerous listing names, but all those people have definitely been helpful; others have, too —
I'm sure. I think that's all for now. So for people who are interested in this subject, or thought any of that was relevant, we'll be spending another hour or so on it — I have a list: I mentioned some of the things, and there are a few others — what do we think we should be doing about x, y and z, which packages should be providing what, to what degree do we assume QEMU, and so on. Yeah — thank you very much.