And now let's put our hands together for "Debian Porting Fun for Everyone" by p2 and Steve Langasek.

Hello, I'm p2, and I will start this workshop by giving an overview of all sorts of portability issues. Steve will then proceed with the more practical point of view, and will also run the workshop proper, where people are supposed to help in finding and solving some example bugs. I will first cover some portability issues which are general to C programming and application-level programming on Linux and other Unix-like systems. I will then go on to explain some more hardware-related bits, and the problems with writing portable code which talks to hardware. Then Steve will continue with the more practical bug hunting.

The first thing is obviously: why are we trying to write portable code? There are various reasons to do that. The first is correctness. There are C language and other language standard definitions, and programs are supposed to stay as close to them as possible; porting to a different architecture is a good way of exercising this portability and verifying whether the code actually adheres to the rules. Debian also calls itself the universal operating system, and you obviously can't make that claim if you run on only one architecture, so that reason is rather clear. Debian is also the most-used embedded distribution, although that might sound fairly strange. What's actually meant is that Debian is very often used as a basis by people deriving their own custom distributions for embedded platforms. As you might know, embedded platforms tend to use all sorts of non-Intel, non-IA-32 CPU architectures, so portability is quite important there. Hardware advances will also make Debian feasible on new platforms: think of mobile phones, which in a few years will probably come with something like 256 megabytes of memory and a 1.8-inch hard disk of 40 gigabytes or more, which is more than some people have in their desktops at the moment. And it's also very nice and fun to play with other architectures and systems, because it gives you a completely different view of how you can do computing; I will show you it can be quite different. Sorry, yes: the question was, it's still Debian, how different can it be? But you will see, it can be quite different. There are quite some differences in the underlying hardware which still show up, even for application programmers.

I will start off with the generic C type problems. There is a definition in ANSI C which explains how the types relate. You see that a char always has to be smaller than or equal to a short, an int, and a long; a long is always at least 32 bits, and an int and a short are at least 16 bits. One important thing to note is that a pointer is not necessarily an int. It's mainly older Unix software that assumes this, and it breaks seriously on 64-bit machines, because those tend to use 32-bit ints but 64-bit pointers. Another, less well-known problem is that the signedness of a char is also architecture-dependent. On most systems a char is signed, like on Intel; on PowerPC, for example, a char is unsigned. Now, if you use a char as a counter in a decrementing loop, and you just check that it never goes below zero, you have created an infinite loop. Some tips to avoid this sort of issue: use int as much as possible. The C language standard specifies that int is normally the fastest type for simple integer operations.
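To make that loop-counter pitfall concrete, here is a minimal sketch (the loop bounds are arbitrary); built for x86 it terminates, but built for PowerPC or ARM, where plain char is unsigned, the first loop never ends:

```c
#include <stdio.h>

int main(void)
{
    /* BROKEN: c >= 0 is always true where plain char is unsigned
       (PowerPC, ARM); c-- wraps from 0 to 255 and the loop spins
       forever.  Newer GCCs warn: "comparison is always true due to
       limited range of data type". */
    char c;
    for (c = 9; c >= 0; c--)
        printf("%d\n", c);

    /* PORTABLE: int is always signed, and usually fastest too. */
    int i;
    for (i = 9; i >= 0; i--)
        printf("%d\n", i);

    return 0;
}
```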
The only reason to use something other than int would be if you have to communicate with an external system: you write a file, send a message over a network, or something like that, and the standard for that protocol specifies that you have to use an 8-bit or a 16-bit or whatever entity. For that case, since ISO C99, we have nice fixed-width types which help us there. Never try to abuse char to save memory: it's useless to think that your program will use less memory because you used a char as a loop counter, even if you know that it will never have more than 256 iterations to do. Also nice is that the newer GCC versions give you far more warnings, and sometimes errors, if you violate these rules, so most of these problems can be found at compile time.

Bitfields are another tricky area, as you can see on the slide. There are actually two ways of representing a bitfield. You see that there is a structure with two bitfields: bitfield zero, which is three bits, and bitfield one, which is five bits. On IA-32 it looks like the first line: bitfield zero will be in the lower three bits and bitfield one in the upper five bits. Obviously, the PowerPC guys decided that it should be different, and they put the first bitfield of three bits in the upper three bits. Now guess what happens if you try to port code which doesn't know about this.

Endianness is another common problem. There is obviously more than one way to represent a multi-byte entity, and if there is more than one way, every way will be used. Consider this interesting number, the hex number 0x12345678. In little endian, the lower two nibbles, the lower byte, will be at the lower memory location, so 78 first, and then 56, 34, 12. In big endian, it's just the other way around. Historically there was also PDP endianness, but I don't think it's much used in practice anymore, so you don't really have to care about it, although the Linux kernel still provides macros for it. Endianness mainly matters in external interfaces, again, because protocols, file formats, and things like that typically define in which endianness you have to provide the data; that means you have to do conversions, because otherwise you would not adhere to the spec. The best way is to use macros which convert the data between the CPU endianness, which is obviously CPU-specific, and the endianness you actually want on the wire. Never rely on an int you write always being little endian or big endian.
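A hedged sketch of both tips together, C99 fixed-width types plus explicit byte-order conversion, for a made-up wire format (the message layout is invented for illustration; htons()/htonl() are the standard conversions to big-endian network order):

```c
#include <stdint.h>     /* C99 fixed-width types */
#include <string.h>
#include <arpa/inet.h>  /* htons()/htonl(): CPU order -> big endian */

/* The protocol (hypothetical) defines big-endian fields of exact
   sizes.  Note: no bitfields here, on purpose; as the slide showed,
   their bit order is implementation-defined. */
struct wire_header {
    uint8_t  version;
    uint8_t  flags;
    uint16_t length;
    uint32_t sequence;
};

void pack_header(uint8_t *buf, uint16_t len, uint32_t seq)
{
    struct wire_header h = {
        .version  = 1,
        .flags    = 0,
        .length   = htons(len),  /* never assume the CPU's endianness */
        .sequence = htonl(seq),
    };
    memcpy(buf, &h, sizeof(h)); /* caller provides >= 8 bytes */
}
```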
Alignment is another problem, generally on RISC CPUs. Most RISC CPUs require aligned accesses, and an aligned access means that if you access an entity which is larger than one byte, for example a 16-bit or a 32-bit access, the address has to be a multiple of two bytes or four bytes, or of eight bytes in the case of a 64-bit access. Now, Intel processors, the IA-32 processors and AMD64, typically handle the unaligned cases in hardware, by doing multiple fetches and combining the results in the right way. RISC processors generally don't do that, so they rely on traps: if you do an unaligned access, an exception is generated, a handler in the kernel traps it and does the necessary accesses and combining in software. Obviously this is slow, because you have to take an exception and run quite a few instructions to do the work, and it also is not possible in the kernel on some architectures, so in kernel drivers you can't rely on it. And obviously, if an exception has to be taken, an unaligned access will never be atomic: if you store a four-byte entity, a 32-bit word, to an unaligned address, you cannot be sure that the store will be completed before someone else writes to that location. But even on Intel, on an SMP system, an unaligned access can be non-atomic. So better try not to use unaligned accesses at all. That's not always possible, because some protocols use structures which have unaligned fields, and then, yeah, you have to do it, otherwise you can't use the protocol. In some cases the compiler can help you by generating the right code; in other cases you have to write a special function which does the unaligned access for you, which is generally faster than relying on traps. There's also at least one architecture I know of, ARM, which does not trap on unaligned accesses but just gives you interesting results, so be aware. Actually, ARM processors can generate the exceptions, but due to historic reasons this is not used in Linux; perhaps that changes with the new ABI, I'm not sure about that.

Yeah? So, my name's Dann Frazier. There's a tool that works on at least IA-64, I'm not sure about the other architectures, and it's called prctl. With prctl you can actually tell it to change the behavior of a process when it hits one of these traps, and you can make it, say, crash, and then get a backtrace; it makes it a lot easier to debug. Yes, that's possible on some architectures which do generate traps. On Alpha, for example, you can have the kernel generate a message when an unaligned access happens, so you at least know that there is such an application, and you also know which process is actually triggering the unaligned accesses.

Yeah? My name is Riku Voipio. On ARM it actually is possible to trap them on current systems, but the kernel defaults to the undefined behavior. There's a /proc variable you can set to make it, for example, segfault the application, or fix up the unaligned access. Okay. Any more questions on this part?

If you don't mind, I'd like to follow up on Dann's comment about prctl. I'm over here; it's your co-presenter. Ah, I'm sorry, Steve, yes, I didn't see you. Yes, prctl, a very nice tool by the way, has actually been ported now to HPPA, IA-64, and Alpha; the Alpha port went in in, I think, 2.6.16, the first kernel that supports the necessary interface upstream. So those are the three architectures where we now have support for looking at unaligned traps, which are normally handled by the kernel; but for performance reasons we really don't want them to be happening anyway, so this is a great tool for being able to debug those on those architectures. Of course, if you're running SPARC, you have no choice: it throws you a SIGBUS no matter what, and we'll get to see some of those a little bit later in the presentation.
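Since the strict-alignment architectures punish unaligned accesses one way or another, here is a minimal sketch of doing an unaligned read safely in portable C (the function names are made up):

```c
#include <stdint.h>
#include <string.h>

/* memcpy() lets the compiler emit whatever byte or word accesses are
   legal on the target; where unaligned loads are allowed, it
   typically compiles down to a single load anyway. */
uint32_t get_unaligned_u32(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof(v));
    return v;
}

/* The broken "obvious" version: may trap into a slow kernel fixup
   (Alpha, MIPS), die with SIGBUS (SPARC), or silently return rotated
   garbage (older ARM). */
uint32_t get_unaligned_u32_bad(const void *p)
{
    return *(const uint32_t *)p;
}
```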
This concludes the part on application-level issues you might encounter. I will now give you a quick overview of how systems look in modern machines, or rather how they can look, because there are thousands of variations possible, obviously; I will give you a few hints on how it can be done, and I will then go into further detail on how accessing peripheral hardware works and what sort of pitfalls you have to be aware of.

I will first start with a bit of a classical Intel system. It's not entirely up to date, because the newer-generation Intels don't really look like this anymore, but most systems currently in use, like the Pentium 3s and some of the Pentium 4 systems, have an architecture which is vaguely like this. There is what is generally called a CPU complex, which can be one or more CPUs sharing one bus to the main bridge to the outside world, which is called the north bridge. The north bridge generally has an interface to the memory subsystem, which is mostly SDRAM or DDR or whatever they use these days, and on the other side it also has PCI and AGP. PCI is one of the most important expansion buses in current systems, although it is being replaced by PCI Express; I will show you on the next slide what that architecture looks like. AGP was actually invented for graphics cards, because the graphics people obviously always need more bandwidth than the standard bus can provide. They made a sort of hack on PCI: they added a way to do prefetching, basically, sending commands while the current command is being processed, which allows for faster access because it reduces the latency of the transactions. So the main components, as I said, are: the CPU complex; the south bridge, which is generally a chip attached to the PCI bus, one of those device blocks on the lower part of the slide, and which used to carry the ISA bus; the ISA bus itself is no longer implemented, but there is still stuff like the PS/2 keyboard and mouse controller, the parallel port, the serial port and so on; the memory subsystem, obviously; the front-side bus; PCI; and AGP.

An Opteron-based system is probably one of the biggest changes which happened fairly recently. As you can see, this is also a dual-CPU system, but instead of the CPUs sharing a single bus, they communicate with each other via a HyperTransport link. HyperTransport is a narrow, 8-bit, but very fast point-to-point link; it's not a bus. Every CPU has its own memory, or can at least have its own memory, but it can also access the other memory by communicating over the HyperTransport link. Obviously, accessing non-local memory is slightly slower than accessing local memory. Yes? No, I don't have exact numbers, but I think it's more like perhaps five or so; it's definitely not thousands, it's not that bad. I'm not sure whether Linux actually uses the non-uniform memory architecture characteristics already, for example to make sure that a process which runs on one processor has as much as possible of its code and data in the memory attached to that processor. Yes? This was done in the 2.6 kernel for AMD64. Okay, so it's implemented now; I knew there was talk of doing this, but I didn't know whether it was already implemented or not. It obviously makes sense to do it.

For the I/O part, there is a HyperTransport-to-PCI-Express bridge. PCI Express is basically the successor of PCI, and it is becoming more and more popular, also in Intel-based systems.
It's narrow: it's basically a serial link, 2.5 gigabit per second, bidirectional, and you can bundle lanes to get more bandwidth, so it's a sort of scalable system. And it has the main advantage that, because it's serial, it avoids some problems, which I will explain in a later slide. Behind the PCI Express bridge you can either have PCI Express devices or a PCI-Express-to-PCI bridge, which allows you to use normal, or well, old, whatever you want to call them, PCI cards in a PCI Express system. And obviously, behind the PCI bridge you can still have the south bridge for connecting your PS/2 keyboard. So the main interfaces are, as I already mentioned, the processors, a HyperTransport-to-PCI-Express bridge, and a PCI-Express-to-PCI bridge to connect existing PCI devices.

Now, some trends in system design, which somehow try to explain why we have this move to fast serial links. As you will have noticed, not only PCI Express but also ATA changed from a parallel to a serial interface, and I think there must be more. There are a few observations which explain this strategy. First, CPUs have obviously become much faster than memory. People who have been doing computing longer than a while will probably remember the C64, with its one-megahertz processor and no caches, because the memory was about as fast as the processor, so there was no actual problem. But these days the processor runs at two to three gigahertz, and there's obviously no memory which can cope with that, at least no off-chip memory, so we have caches to hide these latencies and provide the bandwidth.

Yes? May I ask another question: could you go back to the Opteron slide, if that's possible? This one. (The battery died. Okay... oh, it's working.) It's obviously faster to access a local bridge than to access a bridge on the other processor. You should see it as a sort of switched network: the CPU will obviously allow you to access a bridge, and the devices behind a bridge, connected to the other processor, but it will always be faster to access the ones on your local node. This is why it's so scalable: you don't have the single bottleneck of a bus, so you can actually add more processors and keep resources local for high performance.

Okay. Another observation is that bus and memory bandwidth have come up quite a bit. Remember that the ISA bus was like 8 megahertz and PCI was 33; PCI-X is 133 megahertz. The bandwidth has also improved for memory: we now have up to DDR266, which is, I think, double-clocked 133 megahertz or so. But the point is that the latencies have not improved in the same way. Accessing SDRAM is actually quite slow if you only want to access a single byte or word; it only becomes fast if you manage to use the burst modes. Caches obviously help there, because a cache will automatically trigger a burst access to fetch a complete cache line. Other things help too: DMA controllers are very useful in this sort of system, because you can program a long transfer, which will automatically lead to efficient bursts on both the expansion bus and the memory subsystem. And then, parallel buses have problems when the speed goes up.
The problem is that all the lines have to be almost equally long, because you have to wait until all lines have reached their correct state before you can actually sample them; or at least, you can only sample them when they are all stable, obviously. This means that routing fast parallel buses on PCBs is a very hard task, which is why high-speed serial links are a solution. You might wonder why we didn't do that before. The problem is that you need extra logic to implement these high-speed serial links, and it only recently became economically realistic to implement this serializer/deserializer logic on the chips; before that, the logic took too much of the chip space to be economically realistic.

Okay, now I will go into more detail on some of the problems you can see if you try to access hardware device registers. The first thing I want to talk about is out-of-order transactions. As you can see in the small two-line PowerPC assembler example, there are two stores: the value of register r20 is stored at the address r21 plus 0x20, and the same for r22. Now, you would think: these are two instructions in sequence, so they will be executed in sequence, and if you look from the outside, the values will appear in the same sequence as you wrote them. Unfortunately, this is not true. At least some architectures, like PowerPC for example, reorder the memory transactions, because that can sometimes be more efficient. If you are writing to memory, this is not a problem, because it's always cached, and if you read it back the processor will read it from the cache and will make sure that you get the values in the right order; and if it's not in the cache, the load will stall until all the writes have been finished. But if you're writing to hardware, the sequence does actually matter, because there is another device at the other side which might expect writes to happen in a certain sequence. That's why PowerPC has an extra instruction called eieio, which stands for "enforce in-order execution of I/O"; I think they first found the mnemonic and then looked for a nice explanation for it, but still.

Bus bridges can also reorder transactions in certain circumstances, and the way to handle that is again to use a barrier; but if it's a bus, you don't have explicit CPU control over it anymore. In general, though, a read will only complete once all the outstanding stores have been completed. So you can do a store and then a read from some dummy location which is also behind that bus, and because you will only get the read result when the write is finished, you have a sort of barrier: you are sure the writes happen in the order you want them to happen.
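A kernel-style sketch of both tricks, the explicit write barrier and the read-back through the bridge; the device and its register offsets are hypothetical, while writel(), readl(), and wmb() are the real Linux primitives for this:

```c
#include <linux/types.h>
#include <linux/io.h>

#define REG_DMA_ADDR   0x20   /* hypothetical device registers */
#define REG_DMA_START  0x24

static void start_transfer(void __iomem *mmio, u32 bus_addr)
{
    writel(bus_addr, mmio + REG_DMA_ADDR);

    /* Keep the address write ahead of the "go" bit; on PowerPC this
       boils down to an eieio/sync.  (Current kernels order MMIO
       writes inside writel() itself, but the barrier states the
       intent explicitly.) */
    wmb();

    writel(1, mmio + REG_DMA_START);

    /* Read back from the device: the read cannot complete until any
       writes posted in bus bridges have been flushed, which is
       exactly the store-then-read-a-dummy-location barrier described
       above. */
    (void)readl(mmio + REG_DMA_START);
}
```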
Then there is another quite common problem, being non-coherent I/O. You have DMA controllers and the processor, which all access main memory; but the processor has a cache, and not on all systems is that cache coherent with regard to DMA transfers, which means that if a DMA bus master, a DMA-capable device, writes into memory, the processor might not notice. This does not happen on Intel systems, which is why not so many people know about this problem, but it's quite common on ARM and also on the smaller MIPS systems. There are a few things you can do to make it still work even though you don't have the coherency. You can invalidate the cache lines which will be written to: the CPU programs the device, so you know which locations are supposed to be accessed by the device, and you can invalidate those cache lines to be sure the CPU does not have any cached copy left for those locations. This is mainly used in all sorts of streaming I/O. For example, a network packet comes in: you give the device the address of a network buffer, you flush the cache lines for that region, and when you get the interrupt saying the transfer is finished, you read it, and you are sure the processor will go back to memory. The other way is to declare memory to be non-cacheable. That is mainly useful for things you have to read and update fairly often; for things you only have to access once or twice, the flushing approach is fine. For example, microcode which sits in main memory and gets fetched by the device: you only have to load it there once and then start the device, and you seldom have to touch it anymore. Ring buffers are an example of the other case: they are mainly used to store pointers and status information, so you have to update them frequently, and then it becomes rather annoying to have cache flushes all the time.
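The Linux kernel's DMA mapping API wraps exactly these two strategies; here is a hedged sketch (the device pointer, buffer sizes, and wrapper function names are hypothetical, while the dma_* calls themselves are real):

```c
#include <linux/dma-mapping.h>
#include <linux/gfp.h>

#define RING_BYTES 4096
#define PKT_BYTES  1536

/* Descriptor ring: constantly updated by both CPU and device, so
   allocate it coherent (uncached on non-coherent systems); no
   per-access flushing needed afterwards. */
void *ring_setup(struct device *dev, dma_addr_t *ring_bus)
{
    return dma_alloc_coherent(dev, RING_BYTES, ring_bus, GFP_KERNEL);
}

/* Receive buffer: touched once per packet, so use a streaming
   mapping; dma_map_single() does the cache invalidation on ARM or
   the smaller MIPS systems, and is nearly free on coherent x86. */
dma_addr_t rx_buf_map(struct device *dev, void *buf)
{
    return dma_map_single(dev, buf, PKT_BYTES, DMA_FROM_DEVICE);
}

/* After the "transfer finished" interrupt: unmap, then let the CPU
   read the packet; it is now guaranteed to go back to memory. */
void rx_buf_done(struct device *dev, dma_addr_t bus)
{
    dma_unmap_single(dev, bus, PKT_BYTES, DMA_FROM_DEVICE);
}
```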
Addressing is another interesting topic, and one which also leads to a lot of portability problems. There are basically three sorts of addresses in a system. We have the virtual address, which is what kernel and applications normally use; that one gets translated, via the MMU and page tables, to a physical address, and the physical address is the address which actually appears on the front-side bus of the system. Then, obviously, you have bus addresses: if you have multiple peripherals, you also need to indicate which peripheral, and which of its registers, you want to access, and in that case, again, other addresses are used. To give a more concrete example, there are obviously bus addresses on a PCI bus; those are the ones you see if you do an lspci: you will see those base address registers, whose values are mostly the addresses of the register spaces. Now, on an Intel system this is quite simple, because all physical addresses which do not correspond to memory go to PCI: the north bridge sees a physical address, it has been configured to know how much memory is available, and any address beyond that area will automatically be forwarded to the PCI bus. But this is not true on all systems. On Alpha systems or PowerPC, for example, there might be an offset between the bus address and the physical address, meaning that if you want to access PCI bus address 0, you actually have to access, say, physical address 0x80000000. So there are a number of translations possible. Identity mapping is the easy one; that's the Intel case. Fixed offset is what I just explained. It can also be page-based: Alphas, and I think also AMD64, support a sort of page table sitting between the accesses coming from PCI and main memory, and this page table has to be set up before the accesses actually work. Then there are also bus addresses which are not memory-mapped: the I/O ports on a typical Intel x86 processor are probably the most well-known example, because you have to use special instructions to generate those cycles. IBM embedded PowerPCs also have something like that, which they call the device control bus, and you also have to use special move instructions to access it.

The best solution to cope with this problem is to always provide some sort of abstraction function for hardware access: never just dereference the address you have, but try to have an abstraction like write_word, write_byte, read_word, read_byte, and have those do the right thing.

Then, atomicity. You can obviously have multiple processors, and you can also have multiple bus masters on PCI, and in some cases you want to be sure that an access is atomic, because otherwise the other end which needs the information would not always see a consistent image. Reads and writes are generally atomic, but mostly only if they are aligned; if you are doing an unaligned access, it might not be atomic. In some cases you need an atomic read-modify-write, and that's obviously more complicated, because you can't do it in a single ordinary instruction. On Intel you have a special lock prefix to do that, which only works on exchange and, I think, one other instruction; if you use it with another instruction, I think you get an illegal-instruction exception. MIPS, PowerPC, and ARM use a somewhat similar scheme to one another: they have a strategy in which you can retry the access. You do the load, setting a specific flag, and the processor will watch whether someone else tries to access that memory location; then you do the store, and the result of the store will tell you whether someone else accessed the memory location or not. If it fails, because someone else touched the data while you were processing it, you have to start again. ARM also has a swap instruction which is, I believe, very similar to the Intel one, but I have seen it used very seldomly, I think mainly because there are no SMP ARM machines. And obviously you can't rely on this sort of behavior if you have bus bridges in between: don't think you can do atomic transactions in video memory, or another sort of memory living on a PCI card which sits behind three bridges, because those bridges may just not support any form of locking. If you want to rely on CPU locking, you have to do it in main memory.

By now it should be clear that if you try to access hardware from userland, you will run into all sorts of interesting problems; in general, I think it's best not to do it. You can do it: you can mmap /dev/mem, or you can mmap a PCI device and start accessing it, but your code will likely be highly unportable. The best solution, or the best strategy against this problem, I think, is to separate the transport of the commands from the actual logic of the driver. There are a few nice examples in Linux which already implement this idea. For FireWire, for example, there is a library, libraw1394, which allows you to directly access FireWire devices from userland. In that case the application does not have to know how to access PCI or how to talk to the FireWire controller; it only has to know how to speak the FireWire protocol, the protocol implemented by the device. This is used, for example, in dvgrab, to grab the DV frames from a video camera, and it's also used in FreeBoB, for audio over FireWire; I think those are the most common applications. For USB there is something similar, called libusb, and there are quite some packages which actually use it: I think gphoto is one of them, I believe SANE might use it too, and there are various more or less well-known gadgets which only have userland programs, accessed via libusb.
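To make the write_word/read_byte abstraction mentioned a moment ago concrete, here is a minimal userland-flavoured sketch; everything in it is hypothetical (the little-endian device registers, the struct, the mmap()ed base pointer the caller obtained), and the point is only that callers never dereference raw addresses, so porting to a machine with a different mapping or byte order means touching these few lines only:

```c
#include <stdint.h>
#include <endian.h>   /* glibc's __BYTE_ORDER / __BIG_ENDIAN */

/* This imaginary device has little-endian 32-bit registers. */
static inline uint32_t le32(uint32_t v)
{
#if __BYTE_ORDER == __BIG_ENDIAN
    return (v >> 24) | ((v >> 8) & 0x0000ff00U)
         | ((v << 8) & 0x00ff0000U) | (v << 24);
#else
    return v;
#endif
}

struct device_regs {
    volatile uint8_t *base;   /* e.g. from mmap() of the register BAR */
};

static inline uint32_t read_word(struct device_regs *d, uint32_t reg)
{
    return le32(*(volatile uint32_t *)(d->base + reg));
}

static inline void write_word(struct device_regs *d, uint32_t reg,
                              uint32_t val)
{
    *(volatile uint32_t *)(d->base + reg) = le32(val);
}
```

This mirrors what the kernel does with its readl()/writel() family: the bus-specific knowledge lives in one place instead of being scattered through the driver logic.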
SCSI and ATAPI devices actually have something similar to libraw1394 and libusb: there is a generic ioctl which allows you to submit any sort of SCSI or ATAPI command, and the most well-known user is probably cdrecord, or the other programs which write CDs or DVDs. There are probably other examples of this sort of scheme as well. Yeah, you wanted to ask a question? I just want to point out that using libusb, for example, also gives you high portability to other kernels: you can use it on FreeBSD, you can use it on Mac OS X, or even on Windows, so your program will be highly portable. Yes, obviously, if you have a library, its interface can also be implemented on another operating system, even if the back end is different; that's another reason to use these libraries if they are available.

If you really have to do userland hardware access, then I strongly advise you to use an abstraction layer, because sooner or later someone will come to you, like: hey, your driver does not work on my 15-CPU MIPS system with 5 PCI bridges and this strange memory mapping, please fix it. The only way to cope with this sort of architecture problem is to have an abstraction layer, so that you don't have to mess around in all of the code to solve the issue. I think I'm almost at the end of my part; are there any more questions?

There is hardware out there that does things in a specific byte order, so despite the fact that you're using libusb, you find that the hardware is assuming a certain byte order is being used. That's a very good remark: libusb guarantees you that the bytes you have in your userland buffer will be transferred to the device, and vice versa, but the messages themselves may well have an endian-specific layout, and you should obviously make sure that your program copes with that properly. But at least you can write a userland program in a portable way which works with USB devices. And there is another example I didn't mention: for the parallel port there is also something like that, although I don't recall the name anymore, which allows you to access parallel-port hardware from userland in a hardware-independent way. I don't know if there are any more questions... So, Steve, I suggest you continue with your part of the slides.

Okay, thank you very much, Peter. With the change of speaker here we are going to have a little bit of a change of focus. As you can tell, Peter has given you lots of good information about hardware-specific porting issues, many of which are specific to kernel land and which, in fact, you usually will not run into. For my portion of this talk, I'm going to be focusing on things that you are going to run into in the process of maintaining Debian packages: specifically, things you will see in the wild on one or more of our architectures that result in a package failing to build. And just to let you know what the overall format is going to be for the next half hour or 45 minutes or so: I'm going to start out talking a little bit, I have a few more slides, and then I'm going to take about a five-minute break, during which I'd like people to get organized into groups if they want to stick around for the workshop portion, where we will actually be tackling a few build failures that have been handpicked from the build logs on buildd.debian.org. There we can see how some of these build failures need to be addressed, and you get some practical hands-on experience with resolving bugs of those kinds, so that next time they happen to your packages, you don't have to come to a workshop like this to fix them.
So, to let you know what we're going into here: let's see, I do have one URL on the server which I don't have listed on the slides, and it's going to be of use to you if you have a computer here and want to get it loaded up right now, so I'll give you that URL: ftp://homer/workshop-links.html. Did I speak that clearly enough that everybody got the URL? I see neither yes nor no... one yes. Okay, moving on.

Okay. So when we're looking at build failures, they generally fall into a number of categories. First of all, you can have just plain software bugs in your package, where the software does not correctly handle the architecture-specific hardware details, or processor or memory details, whatever they may be. As we went through the current build failures we were able to find for this talk, surprisingly very few of them fall into that category; it seems as though the porters are pretty good about making sure we get rid of all those. And so we get into the different categories of build errors, which are the main ones we're really dealing with today when trying to make sure that Debian is ported to the different architectures.

You can also have architecture-specific build-dependency problems. For example, if your package build-depends on Java, you will see that it fails on some architectures, due to there not being a viable Java implementation today for all of our architectures. It's something that we hope will eventually be resolved, but it's a toolchain issue: we just don't have a solution for Java on all of our architectures at this point. You may have build-dependencies being temporarily unavailable at the time the buildd tried to build the package, in which case you will see build failures that you just need to talk to the buildd maintainers about, in order to get your package retried. I'm sorry, go ahead with the question. The question was whether there is a standard way to contact the buildd maintainers in order to request the requeue of a package. The answer is that there are aliases set up, architecture-name@buildd.debian.org, which are the addresses you can send requests to. Those don't necessarily guarantee that your request will be processed any faster than it would be normally, because the buildd maintainers each have their own approach for handling failed builds, and some of them more or less ignore the mail and process the failures in order regardless; eventually they do get to all of those failures, and packages do eventually get given back. But in practice, IRC tends to be more effective, if you can talk to the maintainer there.

(An aside: just telling everyone, please use as little wireless as possible. We should have quality of service, but for some reason it does not shape your packets, so please use wireless only if you really have to, and use as little bandwidth as possible, because the streams are useless right now.)

So, going on here: you may have a case where your build-dependency is in the archive, but you're missing a version requirement on it. Then your package will succeed in building on the architectures where a newer version of your build-dependency became available before the buildd tried your package, and on the other architectures, where that version of the build-dependency did not become available first, it will fail.
So you'll have to go and see that a different version of the build-dependency was used, and that's the common theme in why it failed on one architecture versus another. You can also find toolchain-specific bugs, where, for example, GCC, in the middle of building your package, decides to throw an internal compiler error. That's almost always a toolchain issue rather than a bug in your package, but that doesn't mean you can ignore it: if you're trying to get your package updated into etch and it's not building, then it is an issue that you have to deal with, rather than just saying, oh well, the architecture doesn't work, or the compiler doesn't work. You can't just ignore those kinds of things.

You can have build failures that are specific to a build environment. In particular, we have three architectures in Debian (alpha, mips, and mipsel) which use dpkg-buildpackage -rsudo instead of fakeroot, as most people are used to using for package builds. That was put in place on those particular architectures at one time or another due to problems with fakeroot that existed on them when it was implemented. As far as I know, fakeroot is now functioning correctly on all those architectures; nevertheless, your packages are supposed to build correctly when using -rsudo as well as -rfakeroot. So you will see some cases, including some that we'll see a little bit later, where alpha and mipsel fail and everything else succeeds, and it's because those buildds use sudo and your package can't cope with that.

And then you can find some build failures which are just plain buildd-specific bugs. The most common case of those we've had recently has been the hppa buildd randomly having bash segfault; that's obviously not anything you can fix, because if bash segfaults, it's not your fault, so you really just have to get the package requeued. We have a question here: another cause of buildd failures, which I actually haven't seen in the past month or two anyway, is that there are some sbuild environments on buildds where a buggy package doesn't uninstall itself cleanly, the sbuild chroot on the buildd gets corrupted, and then your package won't build because a different package had a bug in a previous release. Right, that's a good point; we do actually hope we have those bugs fixed. There was a dpkg bug prior to the sarge release where dpkg would not correctly handle rolling back a failed attempt to purge a package. What it would do is: it would remove the files, because at the postrm stage it has already removed the files, then run the postrm script; the postrm script failed, and it would roll back the state and say, oh, the package is still installed. And you would see this manifesting in build logs as: your build-dependencies are already installed; it goes on to build the package, and then a major file that's supposed to be part of the package you build-depend on just isn't there. But we think that dpkg bug is actually fixed now, so in general, packages failing to purge correctly should not be breaking buildds at this point, as far as I'm aware.

Okay. So those are the various categories of build failures you can run into, and to deal with them, first of all, you have to notice that your package has failed to build.
I'm sorry, go ahead, Joss. There have also been build failures related to packages not purging because of circular dependencies. I'm sorry, can you say that again? There were some build failures because a package in a previous build refused to be removed, because of circular dependencies, and that also broke the build environment. Okay, I can't say that I've seen any of those, but yeah: don't do circular dependencies, they're bad.

So yeah, the first thing you have to do is actually notice that your package isn't building, in order to do anything about it, and there are a number of resources out there, whether you are a porter or a package maintainer, whatever your involvement in Debian is, if you want to make sure that packages are building correctly: there are pages for you to watch that will tell you what's going on. The URLs I have here are also in that link I gave you earlier, so you shouldn't have to cut and paste all of these. A lot of people have been using the page igloo has set up on people.debian.org, status.php, which allows you to look, by maintainer, at how packages are doing across all architectures. One thing that's kind of missing from it is that it doesn't really interface with build logs in the event of failures... okay, that's right, it does show packages as maybe-failed. I got confused because I looked at this page just before, at my own packages, and it was showing several of my packages as building where they had actually failed; but they had failed within an hour or two of me looking, so it had not yet picked up that the builds had failed. So ignore that comment: this page will give you pretty current information on how packages are building on all architectures. After you do an upload, it's a very good idea to look at it a day later and see where things are, whether your packages are actually building or not.

There's also a page directly on buildd.debian.org; Ryan Murray has recently added this, where you can check the wanna-build status of a package, just buildd.debian.org plus the package name, and it will show you its status across all architectures, so you can see whether it's actually been built or not. And Jeroen back there has another page on buildd.debian.org which allows you to check the general status of all packages in wanna-build for an architecture. You can get at this information from some other pages that are part of the main buildd site; what this page specifically lets you do is link directly to the build-failure logs if there's a problem, and it also documents things as maybe-failed, in the event that there's a log that looks like the build failed (well, the build did fail, because it didn't produce any packages), but it has not been marked as failed by the buildd maintainer or anything like that. And whichever one of these pages you start out from, eventually you're going to end up at a build log, which shows you what happened when the buildd tried to build that package and what exactly went wrong. That's where the meat of this process is: looking through that build log, usually at the end, but sometimes way far up from the end, for exactly what happened and went wrong.

So now that we've got this, we know that we have a package that failed.
In this example we're going to look at db4.3 first. This is a known failure, and since the wireless doesn't really allow it at this point, there's no sense in going ahead and looking at it live right now; the exact failure is copied there, exactly what's happening. We see in this case that it's an unmet dependency: java-gcj-compat-dev depends on such-and-such, but it is not going to be installed. This is an error that I'm sure everybody has seen at some point or another, just running their own systems. And what do you do with a build failure like this? Well, this is not a bug in db4.3's upstream code, because it's doing the right thing; it's portable code, it's just not portable to platforms that don't have the correct Java available, and in this case it appears that hppa does not have a Java that's known to be correct and usable with the db4.3 Java bindings. That doesn't make it a bug we can ignore: db4.3 and db4.4 both have this particular bug on hppa right now, where the newest versions are not moving into testing because that build-dependency is not available. But nevertheless, as I say there, it is out of scope for this workshop; we're not going to be trying to fix missing build-dependencies on Java. I'm not asking anybody today to please port Java to hppa or any of the other architectures; that doesn't fit in 45 minutes.

Sorry, let me start again. The question: what is the recommended solution in this case? We have a build-dependency that is available only on a subset of the release architectures. It might not be feasible to port the build-dependency over to those architectures; it might just be too hard to get done in the time. We still want the package, presumably, on the architectures where it is possible to have it. Can't we just decide and say that it never built on that architecture, or that it has been removed from that architecture, so it's only supported on the subset where it does work? Yes, that's a good question. In terms of releasability of a package, the actual requirement is not that your package builds on all architectures; the requirement is that it's supported, available, and current on all architectures where it can reasonably be supported. And the way we normally define "reasonably be supported" is: did a previous version of your package build on that architecture? If it did, well, it was supported at one time, so why can't you support it now? That's usually how the reasoning goes. There are extenuating circumstances where that's not possible, and I think the maintainer should work out with the porters for that architecture what they believe is the appropriate solution: whether the build-dependency is something they need to have ported, and it would be inappropriate to remove the old version of your package; or, if the code that's supposed to be ported isn't going to work on their architecture any time this year, they may say yes, and in that case they would authorize removing it from the archive on that architecture.

All right, if I may follow up on this: what happens in a more complicated situation, in which, say, on some release architectures you need the old version of your package, but on some other architectures you need the new version, because, you know, the world has moved on? Can we release etch, for example, with mismatched versions of package X? (I'm assuming X is a hypothetical, right?) So, can we release with multiple versions of a package across architectures? Conditionally, yes, but I've yet to see a case where I felt it was actually valuable to do that.
I mean, other than things like the toolchain: GCC, where we've currently got four different versions in the archive, at least four, and some of them work better than others on different architectures, and there's currently a discussion about whether we're going to try to move to GCC 4.1 as the default for etch, and on which architectures we'll do that, and what the compiler will be on which architectures, and so on and so forth. That's kind of a special case. If we're talking about a package that used to build, and now the new version doesn't build, can we just ship the old binaries? Well, then we also have to ship the old source; and is the package itself really that valuable? If the new version doesn't build, is it actually so valuable that we want to effectively bloat the archive by carrying two source packages around for it? In my experience the answer is almost always no. I'm getting an FTP assistant shaking his head at me; I'm not sure what that means... Right, that's a good point. Jeroen was pointing out that in the event that not all the architectures you're shipping are built from the current source version, the archive software will not allow you to do security updates on the out-of-date architectures. So the only way to actually do that, if you are going to say, let's ship different versions on different architectures (he didn't have a question, I repeated his comment; you're going to break my train of thought here), is that, yes, we would have to actually create two different source packages for the old versus the new version, and give them different names, in order to allow that to exist in the archive without causing problems for security support and other kinds of updates within a release.

Okay, moving on. So now we've found the build log, and, well, you have to go through and read it to figure out what's going on in this package. One of the things Peter was talking about earlier was people writing software which assumes that pointers and ints are the same size. As I mentioned, it is fortunately increasingly uncommon for this to be an issue in the software we have in Debian, or at least for the issue to stay around very long; the AMD64 porters in particular have taken a machete to all of the packages in the archive that had this problem, and so, now that there is a popular 64-bit architecture running around, there's very little code remaining in the archive which has problems on 64-bit architectures due to this assumption. There are still some, and I've given you an example here, which is swt-gtk. I happened to link to the alpha build log for it, but this failure exists on all of our 64-bit architectures, and what it comes down to, basically, is that this is one we cannot fix today or any time soon, because Java, in its APIs, makes certain assumptions about the sizes of the types it exports. So you cannot just say, oh well, this has to be big enough to store a pointer, so we'll change its type, because then the things that expect the type to be of a different size break instead. That's a hairy one that we've had going on lately.

There's also a class of these pointer-integer bugs that doesn't cause a GCC error, at least if you don't have -Werror turned on, and that is implicit pointer conversions: where GCC doesn't know the prototype of the function that's returning a pointer, it will convert the result through an integer. That will work fine on 32-bit architectures, but on amd64 or ia64 it'll actually cause a segfault at runtime.
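A minimal sketch of that bug, with made-up file and function names; the two fragments are meant to be separate source files:

```c
/* buffer.c */
#include <stdlib.h>
char *make_buffer(void) { return malloc(64); }

/* main.c -- note: nothing here declares make_buffer() */
int main(void)
{
    /* With no prototype in scope, the compiler assumes make_buffer()
       returns int.  On i386, int and pointers are both 32 bits, so
       this happens to work; on amd64 or ia64 the returned pointer is
       truncated to 32 bits on the way through the assumed int, and
       dereferencing it can crash at runtime.  Without -Werror, GCC
       only says: "warning: implicit declaration of function
       'make_buffer'". */
    char *buf = (char *)make_buffer();
    buf[0] = 'x';
    return 0;
}
```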
David Mosberger has written a filter for this, and I run all the buildd logs off these architectures through it. As an aside, I'm looking for additional things that will parse buildd logs and look for classes of problems, that I can add into this; if we can do that, we can catch a lot of stuff that doesn't necessarily cause a build failure but can cause failures at runtime. Yes, thank you, that's a good point as well. That brings to mind another class of failures, pointer failures which are not caught by the buildds automatically, where some brilliant programmer has decided to explicitly cast his pointer to an int before passing it back. There seem to be a lot of examples of this running around among people using GObject, and I do not know why that is. I've asked people if there's some example tutorial out there where people are being taught to program this way, so that I can pick up the manual, print it out, and bludgeon whoever wrote it; they assure me that there's no such manual. But yeah: if the type is a pointer, don't cast it to a gint before you return it. That's not good.

So here's another example of build failures. This one is the freetype package, which fails on alpha and mipsel, because those are the architectures that use -rsudo when running the parts of the build that have to be run as root. We've seen some problems before with packages that don't build the same under fakeroot and sudo, but this is a fun new class of them, resulting from behavior changes in sudo: sudo no longer passes most of your environment variables, including the one that tells you what your current directory is. Some group of you will get to take a look at this particular build log a little bit later and see exactly what's going on there in order to fix it, but here's an example of the lines from the build log which show you that something's a little bit wrong: it's trying to remove files in /debian. Well, I don't usually have a /debian when I'm building packages, or at any other time, to be honest.

And I think this is my final slide, which is step three: once you've identified that there's a failure and started looking at the build log, you have to start hacking. There's no magic formula for fixing all of these build failures. You will learn to recognize certain kinds of build failures so that you can fix them fairly quickly, but I can't, in this session, tell you "follow these steps exactly and you will never again have to think about a build failure", because you will encounter new and different kinds of porting bugs as you go along. But after our little five-minute break, I'd love to have some of you stay around who are interested in doing some porting fixing. The question was: do we need a laptop for that? Well, I'm actually hoping to do this in groups, because I don't have enough material to go around and give everybody their own porting bug to fix. I'd love it if I could just give everybody a bug and say, okay, my goal for this session is to have 70 RC bugs fixed when we leave; but realistically we're going to be working in groups, so yeah, there will need to be a laptop per group, but it's not going to be anything like everybody needing one. So let's take five minutes here, and people can shuffle around; people who don't want to stick around for the workshop can go ahead and get up and leave at this point.
The streaming is going to suffer progressively, because I'm going to have people logging into remote buildds, and I consider that more important than the streaming. So, as much as I love all you viewers out there, we're going to have to make some compromises here, if that's what it takes. Oh, do we have... I thought we had a question, and we do not.