Hello, hello. I'm Wookey. I'm here to represent the ARM port today, although I see a worrying number of people down here who know at least as much about this as I do, so I expect contradiction later. Yes, we have a new port and a new ABI. I'm here to explain a bit about how it works and what that means for Debian. We'll start off with a little bit of history, just so you know how we got to where we are; some gory details about ARM issues and the ABI; why this matters, why it actually makes a difference to what we're doing; what Debian is going to do as a result of these changes; and where we're at at the moment.

First, just to be clear, we're talking about the ABI here: the application binary interface. That's bytes in memory and on the stack and how they're passed between things, not the API, which is about which parameters you pass to things. Fundamentally, this is the C calling convention, how things are arranged in registers and on the stack between functions, and the important thing about it is that everything on the system has to be using the same mechanism. You can't mix stuff using one ABI with stuff using the other. Well, you can technically, but life gets horribly broken almost immediately, so you don't want to do that. So we have to make a switch: you've got to use one or the other. Whilst we're doing this, because it involves massive breakage anyway, it seems a good time to change the kernel syscall convention as well, because there are efficiency reasons for doing that, which I'll cover later. That's not part of the ABI, it's basically entirely separate; we're just doing both at once in one go.

A bit of history. The wider Linux ARM kernel port was started in 1998 on an Acorn A5000 with an ARM3 in it. That essentially used GCC's existing calling convention for ARM at the time. So this is about why things are the way they are: it's basically history, what was convenient at the time. The kernel syscall interface was designed to be efficient by passing most of the parameters in registers, because a lot of calls have fewer than five parameters. That more or less used the existing RISC OS convention, because that was what desktop ARM was using at the time, which I think actually comes from the APCS, the ARM Procedure Call Standard, of a very long time ago indeed. The change was that a couple of spare bits at the beginning of the return address had been used to return errors, and we don't do that in Linux.

The other thing was floating point. Simple enough: if you had floating-point instructions, you used the floating-point co-processor to process them, just like on x86. If you haven't got a floating-point unit present, and it isn't always present, you emulate it.

The Debian port was started a couple of years later, in 2000, primarily because of the Rebel NetWinder device. The NetWinder people did a lot of the initial work on making things actually go. A guy called Jim Pick, I think, was one of the main people at Debian doing the work. I noticed his LinkedIn profile says he worked on Debian and nothing else for two and a half years around about then, which would explain a lot. We've supported various machines over the years in the installer, and quite a lot are used without explicit installer support or even necessarily mainline kernel support; people run the Debian ARM port on wonderful devices all over the place.

So, some actual issues about the EABI itself. Floating point is one of the most obvious. As I said, the original floating-point scheme was fairly reasonable: use the floating-point unit for the floating-point instructions.
The only problem was that, in fact, there have only ever been two CPUs in the world with a real ARM floating-point unit in them: the original ARM FPA thing, which demonstrated the principle but never really worked quite right, and the ARM 7500FE device, which was quite popular in its day, circa 1996. Since then there have been thousands and thousands of devices, well, hundreds of different designs, none of which had FPUs in them. In fact, some recent devices have had FPUs, but they're not ARM FPUs with the same instructions, so they're different. For example, Cirrus produce a thing called MaverickCrunch, which is basically just about enough floating-point unit to process MP3 efficiently. Intel have iWMMXt, their MMX-alike for ARM; I'm not quite sure what people use that for. Nothing, as far as I know. More recently, ARM themselves have specified a new floating-point scheme called Vector Floating Point, VFP, which is in recent chips, ARM11 primarily; the Philips chip is an ARM9 with VFP in it, apparently. So something like 99.7% of all the ARM CPUs ever sold do not have floating point. Effectively you've never got it, so you have to emulate or otherwise deal with the issue, because people persist in using floating point in their software. I don't know why.

The original way of doing it was an emulator. The obvious thing to do: the CPU will give you an undefined-instruction abort when it hits an instruction it doesn't know what to do with. The kernel can trap that, set up the registers, run some code which emulates the instruction, and then hand back the results as if they had come from a real FPU. The problem is that there are hundreds and hundreds of instructions of overhead for every floating-point instruction you want executed, just in the trap set-up and arrangement. It's grossly inefficient, so it's really, really slow. The original software to do that was a binary module from Acorn, the FPE, which was exclusively licensed to Russell King as part of the kernel port, so we could at least do it, but obviously that's not very useful in the free software world. So the NetWinder people wrote a free software implementation, NWFPE, and later FastFPE, which is the same thing but doesn't worry about the last few significant digits, in exchange for a significant improvement in speed.

A better way of doing this is to not bother with the instruction trapping and the set-up and all that, because you know perfectly well you haven't got an FPU before you start, so you compile a set of instructions which emulate the operation into the original binary. That's much more efficient. GCC provides this functionality; it's called soft-float. The problem is that these two calling mechanisms are not the same, so they're incompatible. You can't just compile a little bit of code with soft-float because you cared about floating-point performance and leave the rest as it was; you've got to do it all that way. And of course the Debian ARM port, because it's old, used the convention of the time, which was the hard-float emulation. Soft-float came along later, so we can't use it, essentially.

The other thing about floating point on ARM is that it has a peculiar format: the endianness of a double. Within each word the representation follows the endianness of the CPU, but the two words are always in big-endian order. So on a big-endian ARM that's okay: both of those are big-endian, and you have a normal big-endian representation. On a little-endian ARM the words are big-endian but the bytes are little-endian. Nobody else does this.
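To make that concrete, here's a minimal C sketch of my own (not from the talk) that prints the byte layout of a double on whatever machine it runs on:

    /* Minimal sketch: print the in-memory byte layout of a double.
     * On a conventional little-endian machine (including EABI ARM),
     * 1.0 comes out as 00 00 00 00 00 00 f0 3f.  Under the old FPA
     * format on little-endian ARM the two 32-bit words are swapped,
     * giving 00 00 f0 3f 00 00 00 00: the mixed-endian layout
     * described above. */
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        double d = 1.0;
        unsigned char b[sizeof d];

        memcpy(b, &d, sizeof d);
        for (size_t i = 0; i < sizeof d; i++)
            printf("%02x ", b[i]);
        putchar('\n');
        return 0;
    }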
The format is IEEE 754 compliant, apparently, but in practice everybody's software goes "you do what?". This means that anything which manipulates doubles itself goes wrong. That means glibc; fortunately nearly everything uses glibc, so once you fix that a lot of software works. But there's a fair amount of stuff which doesn't: Perl, Mozilla of course, Numeric, various maths libraries, Java, some other languages; they all go and do their own floating-point manipulations and have had to be hacked about to support this strange format. Now, of course, over the last nine years we have fixed most of that software, but now we're going to have to unfix it again, to say: ah well, if it's an ARM, only do this if you're doing things old-style; if it's new-style it's just normal, so normal big-endian or little-endian it'll be, depending which way around you are, I think. So the EABI does away with this: we just get conventional formats, like other little- or big-endian CPUs.

So, a summary of the significant changes. There are a whole load of little detail changes in the EABI which you don't care about and I don't even know about; these are the important ones.

Structure packing was something that used to catch people out on ARM. Everything was aligned on four-byte boundaries, even chars, so you only got one char every four bytes in a struct. That caught people out because they expect chars to be packed four to a word, so you'd get four chars in it; on ARM you didn't, and that broke a lot of software. That changes: the EABI has natural packing, so one-byte things pack on byte boundaries, four-byte things pack on four-byte boundaries, eight-byte things pack on eight-byte boundaries, which again is a lot more like other architectures, so more software will just work pretty much the same. Structure and argument alignment is really a similar issue.

Enums are a bit odd. The EABI actually allows enums to have variable type. The E in EABI stands for "embedded", strictly speaking: it was an agreement amongst lots of ARM CPU licensees, like Philips and Cirrus Logic and all these people who make chips using ARM's designs, on how stuff should be done, so that everybody could agree on one way of doing things. The more embedded you are, the more you care about shrinking things down, and there was quite some pressure for an enum that only had four possibilities to be represented in one byte, only using four bytes for an enum if you really did want to represent millions of things. Nevertheless, the Linux people decided they weren't going to use that, because too much stuff would break, apparently. So GNU/Linux is actually a slight deviation from the EABI as probably used by other compilers, and enums will remain four bytes.

And, as I said, the floating-point stuff is fixed. Now you can interoperate: you can choose whether to use VFP or soft-float, i.e. hardware floating point, whatever is provided, or MaverickCrunch or whatever, or soft-float emulation, depending on what you've got, and you can mix and match that code.
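Here's a minimal sketch of what that packing and enum behaviour looks like from C; the old-ABI sizes in the comments are as the talk describes them:

    #include <stdio.h>

    /* Under the old ABI, as described above, every member was aligned
     * to a four-byte boundary, so this struct took 8 bytes; with EABI
     * natural alignment the char packs on a byte boundary and the
     * short on a two-byte boundary, giving 4 bytes, like on most
     * other architectures. */
    struct s {
        char  c;
        short h;
    };

    /* Strict EABI would let a compiler shrink this enum to a single
     * byte; GNU/Linux deliberately deviates and keeps enums at four
     * bytes so less code breaks. */
    enum tiny { NORTH, SOUTH, EAST, WEST };

    int main(void)
    {
        printf("struct s: %zu bytes\n", sizeof(struct s));   /* 4 under EABI */
        printf("enum tiny: %zu bytes\n", sizeof(enum tiny)); /* 4 on GNU/Linux */
        return 0;
    }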
So why would anyone care about all this stuff? What's wrong with the old ABI? Well, as I said, most of the ARM oddities caused software breakage, some of which still hasn't been fixed in the nine years we've had to try and fix it all. Nearly everything builds for ARM these days, but there's a fair amount of software that doesn't actually work still; if you start looking in numerical libraries you'll find it's all bust, not that we really care. So it's just easier: people's software works, there's less porting to be done, and strange bugs don't appear.

The ability to interwork hard- and soft-float is important; that can make big differences to how fast things go. People did some tests and found the new ABI was 22 times faster on something with a little bit of floating point in it. You're still emulating, but it's 22 times better; that's worth having. For normal code it'll be a tiny bit faster, but not a great deal.

We also get some standardisation we didn't have before. Now you can build the same binaries, at least in principle, using ARM's commercial ADS compiler and GCC, or even Green Hills tools, whatever; they should all spit out the same binaries. In the past that wasn't true: if you used ADS you got completely different binaries than if you used GCC on the same sources, and that was annoying. Also, commercial debugging tools will now work, so if you're actually a professional developer and you want to go faster, and people have cool toys to play with, you can still use GCC and still use that fancy stuff. It works with Thumb, which I'll talk about in a minute; probably not a huge issue in Debian, but handy nevertheless. And interchangeable binaries is again something we free software people don't care about much, but it's certainly useful for some people: the point is that binaries targeting Palm OS, Linux or Symbian OS should be the same, which means people can produce nasty evil binary-blob code which they can sell to all of those people and have it work. So it's convenient for binary sellers, but makes no difference to us.

The syscall convention is slightly more efficient; I'll cover that in a moment, a more tiny speed-up. The big disadvantage is that all this new stuff is entirely incompatible with what's gone before, and that is of course a big deal.

So, the syscall convention. As I say, this isn't really part of the ABI, but people have been wanting to change it for a while, and now is the time if we're ever going to do it, so we're taking the chance now. This is the interface between user space and the kernel: you have to call stuff, there has to be an agreed standard way of doing it, and we don't want to change it very often. The old way essentially put the parameters into registers and then called SWI, a software interrupt; it jumped somewhere using a table in memory, offset by the call number. That was one fewer instruction than the way we do it now, which is to explicitly put the call number into a register and then jump to the same place. The first one works better on a von Neumann architecture, which is when you have a combined data and instruction cache, because the table is almost always in the cache; even though in theory you're reading a data table out of memory, in practice you've almost always already got it, so it's really quick and saves you an instruction. But if you've got a Harvard architecture, which nearly all ARM CPUs for ages have had, with a separate data and instruction cache, the old scheme pollutes the data cache when all you're wanting to do is execute some instructions, and that's a bit tiresome. So the EABI method is slightly faster, in theory.
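As a minimal sketch of the new convention, assuming an ARM EABI Linux target and GCC inline assembly, a raw getpid call looks like this:

    /* Minimal sketch: a raw getpid() system call under the new EABI
     * convention.  The call number goes in r7 and "swi 0" traps into
     * the kernel; under the old ABI the number was instead encoded in
     * the swi instruction itself (swi 0x900000 + number), which the
     * kernel had to fetch back out of memory as data. */
    #include <stdio.h>

    int main(void)
    {
        long pid;

        __asm__ volatile (
            "mov r7, #20\n\t"   /* 20 is __NR_getpid on ARM Linux */
            "swi 0\n\t"         /* trap into the kernel           */
            "mov %0, r0"        /* result comes back in r0        */
            : "=r" (pid)
            :
            : "r0", "r7");
        printf("pid = %ld\n", pid);
        return 0;
    }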
In practice, of course, at the moment kernels support both of these call mechanisms, and you generally leave that turned on; if you keep the support for the old mechanism then it doesn't go any faster, which is where we're at at the moment. This changed in 2.6.15 or 2.6.16, which I think was the beginning of 2006. There had to be a corresponding transition in glibc: up to 2.3.6 it used the old scheme, from 2.4 it used the new scheme.

Just a little bit about the history of where this came from. It's obviously been driven by ARM Ltd, ultimately. As much as possible they've used external open specifications: ELF, the DWARF debugging format, the standard C++ ABI. There are some of their own documents, like the AAPCS, and some extra bits and bobs. There are also new instructions coming in forthcoming ARM architectures, and they kind of thought, if we're going to do that, while we're changing everything we'll bear that in mind, so there are a few new bits. It's a reasonably open specification: it is open, you can read it all, do what you like.

Some timeline of when and where things happened. CodeSourcery were doing the GCC changes to support all this, which obviously is the first stage; that was finished towards the end of 2005. The first time you actually got a free compiler that would spit out EABI binaries was GCC 3.4.4 in 2005. A few people, so Nokia on the 770 and MontaVista, started using that, but still with the old kernel syscall mechanism, so that's kind of half new ABI, or three-quarters, or whatever you want to call it; the kernel syscall change came a bit later. Debian started deciding we ought to do something about this in 2006. I worked on toolchains at CodeSourcery, so we had a working GCC 4 toolchain at the beginning of 2006, but it was still difficult to port Debian, because of the problem that you need a working system to build all your stuff on: everything's natively built, and cross-building everything is a pain in the bum. Life got a lot easier once the OpenEmbedded people had managed to cross-build a whole working OE setup, so you had a filesystem enough like what you wanted to end up with that you could actually build all the stuff you wanted to end up with on it. Lennart did a sterling job in actually getting all that working at the beginning of this year, targeting v4T; I'll explain what that means in a minute. But Lennart's not a DD, so those are unofficial packages. Since then, Riku over there has set up a couple of buildds for armel, and we've been building away for about a month and a half now, and that's going quite well, so we pretty much have a functioning port. I'll come on to the actual status towards the end.

Just a little review of what you need for this to work. GCC 3.4.4 was the first thing that would do this, but the mechanism changed between GCC 3 and GCC 4: it used to be just an option saying "build it like this", and it later became a different GNU architecture, so the triplet used to be arm-linux-gnu and now it's arm-linux-gnueabi. For glibc you need the right version: 2.4 was the first one to support this properly, but in fact it didn't work very well on ARM, so in practice you need glibc 2.5 to have stuff that all works. And you need kernel support, across all the blah blah blah CPU versions.
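Here's a minimal sketch of how you can tell which ABI a toolchain targets at compile time; GCC defines the __ARM_EABI__ macro for the new arm-linux-gnueabi targets and not for the old arm-linux-gnu one:

    /* Minimal sketch: detect at compile time which ABI the toolchain
     * targets, using GCC's predefined macros. */
    #include <stdio.h>

    int main(void)
    {
    #if defined(__ARM_EABI__)
        puts("new EABI toolchain");
    #elif defined(__arm__)
        puts("old-ABI ARM toolchain");
    #else
        puts("not an ARM target at all");
    #endif
        return 0;
    }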
So now we get on to Debian and what we're going to do about all this. One thing you need to understand is what these version numbers mean. As well as ARM7 and ARM9 and ARM11, which are core design version numbers, we also have instruction set version numbers, which use basically the same numbers in a jolly confusing fashion unless you follow this in loving detail. The version 3 instruction set was what was used in the RiscPCs quite a long time ago. The version 4 instruction set was DEC's StrongARM, which was later taken over by Intel. V4T means the version 4 instruction set with Thumb instructions. Then there's version 5, which is what most CPUs available now are using, and version 6, which is just starting to come out and will have funky new stuff, I'm assured.

The issue here is Thumb interworking, the thing that allows you to switch between Thumb and normal instructions; that uses instructions which don't exist on all these instruction sets, or do slightly different things. So there are issues about which of these instruction sets you can support, and therefore which processors. Debian does its best to support everything that's reasonable, and we have to decide where to go on this list. Officially the EABI spec only goes down to v4T; it doesn't support v4. I'll explain why in a minute.

Thumb, for those who don't know, is a 16-bit opcode set, instead of the 32-bit set normally used. The advantage is that you can fit exactly the same code into 30% less space. If you're a mobile phone manufacturer that's great: you can use a 16-bit-wide ROM instead of a 32-bit-wide one, and save money and wires and routing space. Oh, it's just great. 16 bits, 32 bits, who cares? There is nothing in Debian that uses Thumb right now, partly because you couldn't under the old scheme in a sensible way. I don't know whether we'll ever want to produce anything using Thumb, but who knows; the EABI explicitly allows every function to be one or the other, so you can swap from Thumb to normal and back every function, should you so wish.

The CPU needs to be synchronised: you need an atomic instruction that makes sure the CPU knows its state. On v4T the BX instruction is used to do that; on v5 the loads to the PC, LDR and LDM, do it too. Unfortunately v4 chips don't have the BX instruction; it's simply not there. So if you build EABI stuff, every function return will contain a BX instruction, and every one of those will abort on a v4 CPU. The problem with that is that StrongARM is v4, all our buildds are StrongARMs at the moment, pretty much, and there are still quite a lot of those in use. You can work around that by first checking whether a switch to Thumb is actually needed, which is what "tst lr, #1" does, and if it isn't, skipping the BX instruction; it'll still work. There is a recent GCC patch to do that, but it's not well tested and we haven't really proved that it all works to our satisfaction. So right now everything's being built for v4T.
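As a minimal sketch of the two return sequences at issue, assuming GCC on an ARM target, with the function bodies hand-written in assembly:

    /* Minimal sketch of the two return sequences.  BX examines bit 0
     * of the address in lr and switches to Thumb state if it is set,
     * so Thumb callers work; it exists on v4T and later but not on
     * plain v4 (StrongARM).  "mov pc, lr" works everywhere but never
     * switches state, so returning to a Thumb caller goes wrong. */
    #include <stdio.h>

    __asm__(
        ".global ret_interwork\n"
        "ret_interwork:\n"
        "    mov r0, #1\n"
        "    bx  lr\n"        /* v4T+ interworking return */

        ".global ret_plain\n"
        "ret_plain:\n"
        "    mov r0, #2\n"
        "    mov pc, lr\n");  /* v4-safe, non-interworking return */

    int ret_interwork(void);
    int ret_plain(void);

    int main(void)
    {
        printf("%d %d\n", ret_interwork(), ret_plain());
        return 0;
    }

The workaround just mentioned would test bit 0 of lr first and only take the BX path when a state switch is actually needed, at the cost of a couple of extra instructions per return.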
So, at the point of deciding on the EABI port: we could just stick with the existing ABI. It's not broken, it's just a bit slow, and it supports everything. However, the whole world is moving to the EABI — well, the ARM world — and we are probably going to be forced to follow. We've done the work now; we get significant advantages in terms of things going faster and being able to use floating point where it exists, and it is starting to exist in new CPUs; and a whole load of software that in fact has never worked properly will start working, because of all the weird shit we don't do anymore. As I also mentioned, we get binary compatibility, which allows the use of commercial tools, which is useful to some people. However, there's a huge problem: all this new stuff is incompatible, so we have to change, and how are we going to manage that?

There are various ways we could have done this. You could just rename all the library packages; that's the classic transition mechanism within Debian, whenever the C++ people change their conventions again and we all have to go through a lot of pain. The problem with that is that every single library package has to be renamed, so that you can control the point at which you finally change over, and that takes ages to manage. It was six months for the last C++ transition; if we had to do it for effectively everything that uses C, it might be a couple of years' worth. That means every library package in the whole of Debian would have to be disrupted for a couple of years just to support the ARM people wanting to change their ABI, which I don't think would go down very well. So that wasn't going to fly.

We could define a new architecture. Normally architectures are fundamentally different things: PowerPC, ARM, x86. Old-ABI ARM and new-ABI ARM aren't really different things in that sense; it should be the same architecture. But the problem is GCC doesn't treat it that way: GCC claims it's a new architecture. All your GNU triplets, all your automake, all that stuff says it's different, so there's a lot to be said for following that lead and treating it as a new arch. The advantage is that it doesn't affect anybody else, we can do our transition quietly on our own, and you can have both side by side, which is especially an issue if we're not going to be able to support the old CPUs with the new ABI. The disadvantages are that if you've got an old-ABI ARM machine you can't upgrade it to a new one — you're changing architectures, so that's a reinstall, or, I'll say, some debtakeover magic which might work — and you use up archive space in Debian by having everything twice.

Arguably a better solution would be an ABI field which specified the ABI. This deals with the general problem of ABI transitions, not just ours: C++'s, or libpng's, or whoever's. It has been in the multiarch proposal for a while; unfortunately it's not part of Debian yet, so we couldn't use it, because it's not done. It would have been nice to try.

Finally, you could just make a conflicting libc: a new-style libc which conflicts with the old libc, have everything depend on the new one, and you just install the new one and everything else all in one go. The problem with that is it doesn't actually work: most of the port would have been uninstallable for probably months while we rebuilt everything, before you could actually press the button to change. And that wouldn't have given us a mechanism to support the older CPUs either.

So, as you can tell from the bias in that discussion, we decided to go for a new architecture. There's a bit of a problem with the name: obviously the best name is "arm", but unfortunately we'd already used that, so we had to pick a new one, and "armel", basically little-endian ARM, was picked. It's slightly confusing because there was already an "armeb", which used the old ABI, and there still is; they may just change over and have armeb now using the new ABI, I'm not sure what's going to happen there. The reason armeb came into being was really because there was a binary ethernet driver, and the only way to make it work was to rebuild the whole of everything the other way around. Fortunately that was reverse-engineered, so the problem went away.
Most ARM CPUs can run either big-endian or little-endian; it's more a matter of convention, and because early CPUs only ran little-endian, there aren't hugely important reasons to run big-endian. It's really only network-processor people, who spend so much of their life swapping ethernet packets around that it's worth rebuilding the whole system the other way around to improve their ethernet-packet-shifting functionality.

So, where we're at now. As I said, Riku's buildds have been chuntering away for a month and a half or so, and we now have 74% built, which is pretty good. It's just starting to asymptote; I think we're getting to the stage where it isn't that things aren't built yet, it's that stuff doesn't work, so we'll actually have to start fixing things again. Riku's two buildds are attached to the unofficial buildd network that Andreas manages, which I think is a fine resource. The aim is to have armel released in lenny, which means qualification, the usual release qualification requirements: we've got to get 95% of it working, or at least built, and so on and so on. I don't think that's going to be an issue; we've got a year or something, that should be no problem. There is work to do, but I believe it will be done.

There is this question of whether we're going to support StrongARM or not. The main issue there is that if armel doesn't support StrongARM, and we think there are still enough StrongARMs to matter, then the old arm port will have to stay around, and we'll have to have two for as long as anyone cares about StrongARM, and that could be a while; there are a lot of them out there. Or we can nobble the new compilers into supporting v4. The thing I perhaps didn't make clear earlier is that if we put that extra test in, to not run BX when you're on the wrong sort of CPU, that happens for everything: every function return has two extra instructions, so we lose some efficiency there. Now, I'm not sure how much difference that makes in the real world; it might be 2% slower or something. I think we could live with that for supporting the whole world, and in four years' time we could drop it, because nobody will care about StrongARM anymore. That is probably the major remaining issue, I think.

Oh no, the other one is of course: what are we going to do about all the poor people who've got to transition their existing arm machines to armel? Their stuff's not broken, but we'd quite like to take their port away within a few years, so something will have to happen. I think research is needed on whether we can reasonably reliably automate that process, so people have got a fighting chance of changing everything at once without it all going horribly wrong. I understand debtakeover is not really supported anymore — I don't know if anyone knows any more about it — but in principle it allows this to happen, so it's probably much the same magic and we could try using that.

That's it, that's the end. Questions? Are there any?

Q: Are there any reasons other than StrongARM support why you would want to keep the arm architecture around concurrently with armel?

Apart from giving people a decent period to change over, which is obviously affected by how much pain and aggravation that is, I don't think so. I mean, I have seen a certain amount of "what's wrong with the old port" kind of thing, but to be honest, no, this is supposed to be a complete replacement. Joey has something to say.
Joey: Just to follow up on that, we could for example decide that we want to do the transition from arm to armel by upgrading the old system to the new version and then just replacing every binary with the armel version. That means you have to upgrade all the packages to lenny first, all the arm packages upgraded to lenny, and then you can swap in the armel builds, or something like that. It might be helpful to have all the lenny stuff there. We don't know yet.

I didn't understand that; what was that again, Joey?

Joey: Upgrade the existing arm system to the new Debian release, to lenny, and then you say, OK, I'm going to run some program which just pulls down all the armel debs, all the equivalent debs, and unpacks them, and you have an armel system.

So you mean do a lenny upgrade and then do a debtakeover type thing, however we manage that, which would mean we'd only have to keep the two in parallel for one release.

Q: OK. I would just say that keeping them in parallel for one release seems like a bit much to me, and that scares me. Yes, I understand how that could be an option, but I think it's going to be a hard sell to the FTP masters, from my point of view.

OK, we should probably talk about it some more. I mean, there's an ARM BoF — I just looked it up, 11 o'clock in the morning this week, in two days' time — where we want to sit around and discuss exactly this issue: how are we going to do this, who, what. My question, I think, will be: what are the alternatives to keeping them in parallel for one release? I can't see another way of doing it from where we're at now, but if there's another way of doing it and we have time, then fine. I'm just not convinced that keeping them in parallel — unless you're doing something evil like Joey always plans to do — I don't see there's actually any advantage to keeping them in parallel, other than saying that you're not going to support StrongARM on armel, and StrongARM still has a user base that you want to continue providing for. Yes, that's currently where we're at; that's the main reason.

Q: OK, another question, about the Thumb architectures and the extra instructions. What happens with inlining with this particular patch? Has anybody even looked at what happens if you try to inline a function with Thumb?

I don't know; somebody here must. It seems the compiler takes care of it, but maybe it has to be a real function call; an inlined function can't be the wrong instruction set. Simon?

Simon: The compiler sorts this out. It's just the function prologue and epilogue, and if we're inlining we can leave those out, so it's not a problem. Since it only affects the epilogue, we can inline things with another calling convention. The compiler cannot generate code that switches instruction set in the middle of a function at the moment, I think, so I think the answer is you can't do that. I believe the issue is on the return from the function call, which doesn't apply in the inline case: there is no return, because you've inlined it, so you don't need the check.

Right, but that's not a v4-versus-v4T issue, that's a generic issue, which doesn't apply because the compiler doesn't do that. I understand what you're asking, Steve, but I don't know the answer; ask the GCC people, they'll know. Thumb interworking means that when you're returning from a function you can fall either into Thumb or into ARM code, so you have to be in the same state at that point; if you're inlining, it stays in the ARM state and it's not a problem after that. Anything else?
Q: I wonder if you could show us the buildd graph? It might just be that I'm partially colour-blind, but I wasn't able to figure out which line was armel, and I didn't see one that was rising the way you described.

I appear to have lost the bookmark, so if someone can tell me where it is... The rising light blue line is us. We've just about got to round about there, and round about then we'll probably have to start doing some work on the things that don't build. Fortunately there will be things which just fix themselves, like Mono, which basically doesn't work on old arm — it did for a bit once, but never really has properly — but it does work with the new ABI. That's one of the reasons we may be forced to change: the rest of the world is fixing stuff for the new ABI now and kind of going, oh, we don't care if the old one's bust, tough. So, yes, thanks go to Riku for actually making this happen, rather than the rest of us going "oh, you must do that". I think we're done. Anything else? OK, thank you.