I would like to introduce you to Martin Husemann and Jörg Sonnenberger. Both are long-time NetBSD developers, and they will now present their ongoing quest for modern toolchains in NetBSD. Please, Martin. Thanks, and thanks for the presentation technology. Okay, this all started a long time ago, but the most recent quest we went on began about a year ago when I got set up with the amazingly outdated GCC toolchain on NetBSD/vax. Nothing compiled anymore; there were lots of errors introduced by the buggy compiler, and so we tried to modernize this. Modernizing a compiler in NetBSD is not easy because we have lots of architectures, and we gave up on finding a single one-size-fits-all solution. So we will see what problems we had to solve, what we did in the end, and what options we now provide to the users. We have always been pretty reluctant to use the latest GNU toolchain, because it tended to be buggy in every dot-zero version, of course, and as I said, we have lots of architectures and testing is not easy. But we had to do something, because the only modern toolchain we had was the optional clang/LLVM one, and that did not support many of these architectures. So we did a big GCC modernization campaign and also reorganized lots of other things, which we will see in a minute. Oh, sorry, yes: besides the compiler itself we had to update the supporting cast, everything that the compiler requires, like millicode things and exception handling support and various support utilities. We also made changes to the C startup code and got away from the one provided by the compiler. And until recently, when we applied for this talk, we had a huge list of GCC versions in tree: 4.1 for VAX, 4.5.3 or 4.5.4 I think, and 4.8.something. We still have PCC in tree, why we will see later, and of course clang and LLVM.
GCC 4.1 is now luckily gone, because VAX upgraded to 4.8, and still this is not enough, because we have one oddball architecture that does not work with any of these toolchains. This is work we did not do alone but with a large crew, and I put them on the slide so they can recognize themselves, but we are not going into details here. So let's look at what kind of problem matrix we have to solve. We have something like 70 configurations that we build binaries for. As I mentioned, one is not in the set of things we officially build on the cluster, because it has no in-tree toolchain yet. And even if we ignore this oddball we have, if my script didn't calculate it wrong, 39 architectures. Probably half of them are variants, because we build ARM in little-endian, big-endian, with hardware float, with version 6, version 7 and so on. But besides that there are quite a few differences. For some of these we have options: for example, on i386 or amd64 we can use clang or GCC alternatively. And for some of these machines, none of these compilers can be used at run time, which is a big problem. So this is a list that our build cluster generates for each of the builds we run automatically, which usually happens every few hours. I picked a special one that's all successes, no failed builds. You can see some of these architectures down here, and just to give you an idea what we are talking about: you see lots of ARM stuff here, it ends with VAX, and then there's this strange x68k, which is not a typo but a Motorola 68K based Sharp workstation, probably only used in Japan, I think. Yes, the missing one is the PlayStation 2. Maybe someone remembers it; it's more or less a fun project. We used to run on the PlayStation 2 until NetBSD 3 or something.
Then we started using C99 features in the kernel, and we didn't have a working compiler that could generate PlayStation 2 code, which is a MIPS R5900 CPU, so we had to give up on that port. But recently somebody else came up with patches for GCC current and integrated support for this CPU, which then became GCC 4.9, and that's the point where we said: hey, we can revive this. So we did, and the toolchain is too young to be in base, but we have a package to build it. Since it's external, it cannot run on the automatic build cluster; that's one thing left out. Okay, no serious work is going on there, nobody is waiting for PlayStation 2 support. The other thing I already mentioned is the Sun 2 port. Sun 2 machines have something like 8 megabytes of RAM. They have a Motorola 68K processor, but less than the 68020, so there are strict limitations. They cannot run shared libraries, at least with any of the tools we have, and so we don't manage to build a modern GCC even to the link stage, because it's just too big. They have a virtual address space limit of 16 megabytes, which is way too small to fit GCC inside. Yes, actually we had to fix the VAX port because we could not execute the GCC we built for it: we cross-compiled it but could not execute it natively, and we had to fix the kernel to allow bigger binaries. Okay, Sun 3 is in a similar situation. We can link GCC there, but if you try to run it on a machine with something like 16 MB of RAM, you will have to wait ages, and it will run out of swap no matter what you give it, for anything but hello world. So this is not a feasible option at run time for this port. The solution for this is pretty simple, and that is also the reason we still have PCC in tree. PCC, if you don't know it, is kind of a fun project that Anders Magnusson took on. He revived the ancient Portable C Compiler and made it work. If you know the old compilers from DOS times, it's like Turbo C or Turbo Pascal: you run it, and the code is there.
It's really fast, it does not spend a lot of time optimizing, and the code it generates is not that bad, but it's a toy and there's no C++ front end for it, no usable one at least. There's the beginning of something which might turn into C++; somebody actually mentioned Cfront in that context, I don't want to know. So the idea is not to give somebody an option to compile things like Firefox or GNOME on a Sun 2. That's nonsense, obviously; it wouldn't run anyway. The idea is just that we want to deliver a traditional Unix system, which always used to include a C compiler, so we have to ship something. We will probably end up compiling the release with GCC 4.8 and delivering PCC as the runtime compiler, and everyone who wants to build something can then use PCC to bootstrap whatever and extend it. So no C++, so no Firefox; it's a pain, but I don't care for that platform. One thing that probably is not obvious to non-NetBSD people is that we do allow GPLv3 code in tree, but with restrictions. We are aware of the strange license, we don't like it, and we certainly want to get rid of it, but we also want to have a modern toolchain, and clang and LLVM don't work for all of our architectures yet. So we had a big debate, as always, and we came to a solution: we put this code into a separate directory tree. We restrict GPLv3 stuff to basically toolchain material, meaning things you are likely not going to need at run time. Or if you are an embedded vendor and have problems with this license, you can leave it out of your runtime system: you can use it for your build, but you don't put it on your device. So it's in a special directory, and this worked out pretty well. We extended this scheme to other licenses as well, and we now have a hierarchy of src/external/<license>/ directories where we import all the external things; the base tree reaches over into this tree via Makefile includes and path settings and builds it from there. This is the point where I'll take over.
As we mentioned, we have three different compiler families. We have GCC, which is GPLv3; the last GCC under GPLv2 was removed, because the one person who really required GPLv2 for certain contracts can now use clang, so that's no longer a problem. We have the option of using LLVM and clang, and of course PCC. One of the problems we have is that LLVM and clang currently only fully support x86 and ARM. Fully supported means in this case that you can build a complete release and don't need anything from GCC for it. PCC currently has pretty mature code generation for x86; m68k and VAX are on the way, but not there yet. With clang you can build a mostly usable system on PowerPC. There's one interface in libc which can't be compiled because it uses a union as the argument of a variadic function, and code generation support for that doesn't exist at the moment. But very few things actually use it; it's part of the System V inter-process communication stuff. At the moment, clang-compiled clang on PowerPC crashes on pretty much any input. It's difficult to debug where exactly it goes wrong, but it's mostly a case of a bit of tender loving care needed. SPARC and SPARC64 are in a pretty similar position, so we hope to have clang as a choice for those soon as well. And there are some very good arguments for using LLVM and clang. Compared to GCC, you can build one binary of the compiler and it supports all the platforms, without having to rebuild like 90% of the compiler as you would with GCC. One of our dream goals is to build a toolchain once and be able to compile all the 70 configurations with that one toolchain, because the toolchain build itself is like 30% of the complete compile time for a release, so it's a significant cost, and you have to pay it once for the cross compiler and another time for the native compiler, so we want to avoid that. One of the interesting features of LLVM is the integrated assembler. Basically, you don't need to call a separate assembler.
It's directly integrated into the process of creating object files, and it also allows doing a lot more sanity checking of the input than the GNU assembler currently does. That creates some fun for programs that basically want to compile a file to figure out certain structure sizes and so on; they tend to use inline assembler for that which is syntactically not valid, and this completely falls apart with the integrated assembler. So I have plans to rewrite the corresponding tools to use a more sane approach, but it's not that easy. There's also one problem for the initial platform setup: modern LLVM needs C++11 support, which is not in any of the currently released versions of NetBSD, so if you want to build clang on them you need to install, for example, the gcc48 package to be able to cross-compile it. GCC wants to move in that direction as well, so it's something everyone is going to hit sooner or later. The primary problem with PCC is that the code it generates is not very fast. For simple code it's decent enough, but anything that really depends on inlining and fancy optimizations is going to see serious slowdowns with PCC. On the other hand, GCC and clang are mostly on par when it comes to the performance of the generated code. GCC still has some advantages when it comes to vectorization and OpenMP support, but that's not so relevant for the base system, more for dealing with third-party code in pkgsrc or wherever. So the next big goal for us is to get rid of all the runtime components of GCC. There are mostly three of them. The first is the common startup code which is linked into every binary, be it a program or a shared library, and that's completely done: we have a mix of pretty straightforward assembler for it, and for MIPS and some other platforms that have too many different ABIs, we have a C stub for part of it to make it easier. The second important component is libgcc.
I'm going into more detail on that in a bit, and the third one is the C++ runtime, which we are going to replace with libc++ and libcxxrt; more on that later as well. So, compiler-rt provides routines like software division for hardware support that is sometimes not present. For example, ARMv4 up to ARMv6 don't have a divide instruction, so you have to do that in software, and we have very neatly optimized routines for this which are actually faster than the GCC implementation. So this provided quite a nice performance boost for those older systems. Another example is floating point support. We still have the same softfloat code in libc as the other BSDs have, and we are currently investigating which version is easier to use. There's one functional difference: the compiler-rt version doesn't support non-standard rounding modes, but almost no one is using those anyway, so it shouldn't be a problem. There are some special cases like SuperH and HPPA which require a small code fragment to be linked into every shared library, because those functions can't use the normal PLT mechanism; that's the so-called millicode. We are not completely sure yet how to deal with it. Historically, those are position-independent code in libgcc.a, which is still pulled into the libraries. We will likely introduce a separate library for them, libmilli or whatever, and adjust the compiler drivers accordingly. For SuperH, we have the problem that the division functions have special restrictions on which registers they may clobber, and until recently it wasn't possible to use a normal shared library function for that, because the lazy binding stuff actually clobbers a register the function is not supposed to clobber. We are likely going to just fix the GCC definition and make it use normal code. The only real reason for not having completely switched over yet is that we want to preserve the libc ABI; we don't want to do a major version bump at the moment.
So we have to preserve libgcc_s, and we have to do it in a way that still works with the simple symbol versioning used by it. You can't just create an empty shared library, because everything that tries to link against it would then fail. So what we are likely going to do is use indirect function support, which basically means the dynamic linker calls a function to find out what the real address it should put into the GOT or PLT is, but that still has to be written. For C++ support, we have a very nice implementation from the LLVM project. It's pretty readable, if you understand C++ and if you feel like dealing with very, very heavy template use; it's pretty okay for that. It has a nice comprehensive test suite, which is very easy to run. Again, this is something GCC is horrible at, because you need to install some Tcl-based DejaGnu setup, and it's not properly documented how to do that. Well, once you get it running, it's nice, but getting to that point is difficult. We have the low-level support under a BSD license; that's code originally written by PathScale, and the FreeBSD Foundation and the NetBSD Foundation together basically obtained a re-license so that we can use it. There are some other languages we haven't really decided on yet. For OpenMP, we are most likely going to use the Intel runtime. The big problem with the Intel runtime is that the build system is a mess; it needs Perl and lots of little magic, and that needs to be completely rewritten. The other option would be the GNU OpenMP runtime, but that's going back to LGPLv3, not something we want to do. For Objective-C, there is a free implementation by one of the FreeBSD developers, but at the moment we haven't imported it into base, because outside Apple, Objective-C is basically only used by GNUstep, and that's something not so many people care about. For Fortran, there is currently no plan for shipping a runtime, and compiler support is still somewhat open.
This leaves one big topic, especially for C++, which is exception handling. There are some other languages that can interact with the runtime, and of course one of the nice features you get is that you can create backtraces of optimized C code even if it doesn't have a frame pointer, for example. So, for example, in the kernel, in DDB, we have the problem that AMD64 normally doesn't include a frame pointer; the ABI says you don't have to. On i386 it's pretty much a similar issue with recent GCC versions, because GCC is much more aggressive about dropping it. So getting the unwind handling into the kernel allows us to more easily produce reliable backtraces. The general approach here is: you save the current thread state, you walk the call chain back up the stack, and for the program counter of each stack frame you look into a separate table to see if there's basically a catch block. If there is one, you also match the types, and if you have a match you execute the cleanup handlers and are done. For NetBSD, ideally we want to have the Itanium C++ ABI on all platforms. There are some variations, for example in how type pointers are stored and so on, but those are relatively small. Next, we are using Apple's libunwind, but so heavily modified that you'll likely not want to run a diff against the original version. We only provide the unwind interface as used by libgcc, for example, and not the second interface of HP's libunwind; one of the reasons is namespace pollution, and the other is that it actually adds a significant amount of code which we don't want to have. One huge problem is ARM. ARM has an exception handling ABI which can be nicely summarized as "not invented here". Unlike what everyone else is using, they don't use the normal Itanium C++ ABI with DWARF instructions; instead they had to roll their own thing, and they did it in a way which doesn't even map easily onto the Itanium ABI.
For example, Apple uses a different format called compact unwind, which replaces the virtual machine used by DWARF with a description of the common frame layouts, but it doesn't change any of the other aspects, because it doesn't have to. ARM is different, and one of the issues you have when you want to use their ABI in the kernel is that you get references to some functions nobody really wants, but it's part of the way the unwind data is written; that's annoying. So what we ended up with is patching the compilers to just generate what everyone else is using. In GCC this is something like 20 lines, including a macro to identify that we are not using the standard code, and the platform support in libunwind ultimately is something like 90 lines of C code, most of them copy and paste, and 40 lines of assembler. It's much simpler than the implementation in GCC for the ARM EH ABI, for example. Unwinding on SPARC is somewhat interesting, because SPARC is the only still actively used platform with register windows. Basically, the idea is you have a set of local registers, output registers and input registers, and when you do a function call the output registers of the calling function become the input set of the callee. So you are basically switching the register set, and the old local registers just disappear. The chip itself has a fairly large register file with a current window pointer somewhere in it, and if there is an overflow because there are too many stack frames, there's a trap and some windows are spilled out onto the real stack. That's very nice because it makes function calls pretty cheap; on the other hand, you have to deal with it for unwinding. So there's a special DWARF instruction to say: okay, I'm now entering a new register window. The other specialty of SPARC is that all other architectures save the address of the instruction after the call as the return address. SPARC doesn't; it keeps the address of the call itself, and the return is basically supposed to increment it. So this needs some special logic as well. HPPA has a similar feature.
It stores the privilege bits in the return address, so you can move from higher-privileged code to lower-privileged code and the other way around, and you don't have a fixed value for these bits if you are in the kernel, so you have to mask them off. And then there's the VAX. The VAX is very interesting when it comes to function calls, because it's the very definition of a complex instruction set. What the call instructions on the VAX can do is realign the stack. The entry point of a function carries a mask saying "I'm going to use the following registers, save them on the stack"; the call instruction looks at this mask and handles it appropriately. The return instruction, on the other hand, will clean up the arguments, kind of like the old Pascal ABI on DOS used to do. The funny part is that the return instruction itself doesn't specify how much argument space there is, because it can't know; think about printf, for example. So this is actually saved on the stack, and this is something DWARF can't really represent easily. What is actually done is that you emit a DWARF instruction in the caller to say "okay, I have pushed 20 bytes onto the stack", and during the unwind you modify the stack pointer accordingly once you have basically emulated the return. The other interesting problem we have on VAX is lazy binding. Lazy binding in the old days just left an intermediate stack frame, and that's not something you want to have. One way to deal with it is to instrument the dynamic linker to provide appropriate annotations for unwinding. The other is that we can try to clean up this intermediate stack frame. What we actually do in most cases is basically remove the stack frame, go back to the original caller, and just re-run the original call instruction, now with a fixed address. That's a really nasty hack, but it works surprisingly well. And the other problem we had: both GCC 4.5 and 4.8 created unwind data that's just completely wrong. I have no idea how that could ever have worked.
So with that we are done. Are there any questions? In that case, thanks for your attention.