So, we will now talk about building Debian with another compiler, by Sylvestre Ledru. Be my guest.
Thank you. The weather is not very good, so I'm going to try to speak loudly so that you can hear me. The first thing I have to say, as a disclaimer, is that I'm not paid by Apple to do this work, and I'm not into the GPL versus BSD license discussion. I know that it matters for some people in Debian, but I actually don't care about this issue.
So what do we have currently in Debian? I'm sure most of you are already aware of this, but all the C, C++ and Objective-C code that we've got in the archive is built with GCC on every arch. So why am I trying to rebuild it with another compiler? The main reason is because we can. Something great with Debian is that if I want to do something, I won't have any boss coming at me saying, no, you cannot do that, you are not allowed to do that. So I can try, and nobody told me to stop. The other reason is because it is fun. It is very nice work to do, because Clang behaves the same way GCC does for most of the arguments. So it is not a very hard job, but it is fun and enjoyable to do.
More seriously, I'm doing this work because the more compilers you use, the more errors you can find. I will show you many examples where Clang detects things that GCC cannot. So it really improves the code that we've got in the archive. The more you use different compilers, the more correct the code is. And it's more portable. If you use some specific GCC extension, you can be pretty sure that it won't work with Visual Studio, the Intel compilers and so on. So it's very important with these kinds of tools to make sure that the code will run on many compilers, platforms and operating systems.
Another advantage of this compiler in particular is that there is a lot of investment in it by the various actors in the embedded field. For example, the latest benchmarks show that Clang on ARM is getting close to GCC, or is sometimes better. And finally, one of the things that I like is that in Debian we have been able to decouple from the Linux kernel and replace it with two other kernels, you know, the kFreeBSD kernel and the Hurd. So what I would like to do is also to replace GCC with another compiler. I'm not talking about removing GCC, but about providing an alternative.
So we just lost electricity, so we had to re-initialize the video system.
The tools that I worked with were LLVM and Clang. They did a nice logo. You can see it's Apple, they've got some money so they can buy a nice logo. It started as an academic project by a guy called Chris Lattner. It was designed at the beginning to be a versatile platform that you can use for various research subjects. Actually, he did that while he was doing his PhD. And he also used to be a GCC developer. The funny story is that he proposed LLVM as a new version of GCC. The way he proposed that was not really well accepted in the GCC community, so he failed to do that. But he had a pretty good idea. People from Apple contacted him in 2005, hired him and built a team around LLVM. As you know, Apple is not very open source friendly. They were trying, and they succeeded, to build an alternative to GCC with another license, which is way more convenient for them. So basically LLVM is BSD, Clang is the same, and most of the tools in the ecosystem are under this license too.
Now it has a very strong community. Many academics are now doing their research on LLVM, mostly because GCC is pretty hard to hack on. It's not because GCC is not good, it is mainly because GCC is old software with a lot of legacy code.
It is way easier for new students to get into LLVM because it is 2000s code. There are also many individuals who are involved in LLVM because it's fun. And many corporations are also involved in this work: for example, Google is investing a lot, and ARM, MIPS, Nvidia, etc.
So what is Clang? You can pronounce it in many different ways, as you wish. It is a C, C++ and Objective-C compiler. I know that some people are planning to build a Fortran compiler with Clang, but I don't know how far they went. It's fully based on LLVM. That means that they are two different packages, but Clang uses LLVM a lot behind the scenes to build the binaries and do some checking. It is now the default compiler for Mac OS X in Xcode, and the folks from FreeBSD switched in the last release. You have the sources at the bottom.
It has some advantages. One of them is that the code base is more recent. As you know, writing code now is easier than handling code which was written 30 years ago. As I said previously, there is strong interest from manufacturers because the code is easy to hack. It is easy to provide new backends. There are plenty of tests. On the mailing list, you can see that not only Apple is contributing, but many other actors. And it is supposed to build code faster than GCC. As an example, in my day job I'm working on Scilab, a free numerical computing software. This is a figure from the Jenkins that we are using: with GCC, it is 24 minutes. I have to explain that in these 24 minutes we are also building the documentation, some tests and so on, so it is not only about building C or C++. And with Clang it is 20 minutes. So we spend about four minutes less here.
I'm going to present a lot of source code. I hope you can read it correctly. Another advantage is that it does very clever checks.
So in this code, GCC, as you can see on the top right, is not able to detect that there is a programming mistake. Here it's pretty obvious to see what is wrong, but if you are reading a full code base, it can be very hard to spot. Clang is able to see this very easily. In this case, it is triggering a warning, which is turned into an error with -Werror, but it really helps when you are writing code. And I will show plenty of examples.
It also has a side effect which is very interesting: there is new competition among compilers. This one is a URL that was posted on the GCC wiki a few weeks ago. Basically, they are upset that Clang people are saying that their warnings are way better, and they are way better. So they're trying to show that in the next release of GCC, 4.8, the warnings will be better. And actually, this page is an answer to this one, which was a list of all the advantages of Clang over GCC. So even if Debian sticks with GCC, which will be the case for years still, it is interesting because it really improves GCC too. GCC developers are taking some ideas from the LLVM community and so on. So it's very good for the free software ecosystem.
Now I'm going to present more closely what I've done for Debian in this field. I tried to rebuild the Debian archive with Clang. The method that I used, which has been published on clang.debian.net, is this one. I agree it is crappy. It was just an experiment. Basically I'm replacing the GCC command with Clang. Since Clang behaves the same way GCC does, all the arguments, or almost all of them, are correctly understood and processed by Clang. So it's very straightforward to switch from GCC to Clang for C, C++ and Objective-C. I must say that in this case, I was only interested in rebuilding the packages. I haven't tested the quality of the generated binaries. I haven't tested their size, nor their performance.
It is not what matters to me at the moment. I think we will do that afterwards, but for now, if we are able to rebuild, it is already a good thing. I published the results last February. In the archive that we had at that time, 8.8% of the packages failed. It is quite a good number and quite unexpected; I was expecting way more failures. With 2.9, we had 15% of the packages failing. So the numbers are pretty good, but you also have to take into account that the Python modules, the Java modules and the Perl ones are included, so many packages are not built with C or C++ at all. With 3.1, we did a rebuild a few weeks ago, and the number increased: 12% of the packages failed. I'm going to explain why this changed.
I published the results yesterday evening on my website, clang.debian.net, so we have all the failures. We did the rebuild with Lucas Nussbaum's new system for rebuilding the archive. In the past, we were using a French grid computing system called Grid'5000. Now we have access to the Amazon cloud. The goal for Lucas here is to allow all the DDs to access the infrastructure to rebuild the whole archive without him being the bottleneck. So if you want to do some experiments, you can ask Lucas, he will be very happy. We have enough money to do something like 60 rebuilds per year on the Amazon cloud. So I used this one. I've got the permission and now I'm able to relaunch it myself. We are pretty happy about the result.
Now I'm going to talk a bit more about the difference between Clang 3.0 and 3.1. As you saw, the failure rate increased by roughly three points, from 8.8% to 12%. One of the most recurrent failures is this one. In this case, there is an argument, --param ssp-buffer-size, blah, blah, blah, which is a GCC argument used in many programs in the archive, and it generates a warning.
And many packages in the archive are built with the -Werror flag. I was very surprised by that, but many packages are using this flag. So since Clang detects that this argument is unused during compilation, it triggers a warning, and because of -Werror, the warning turns into an error. Another thing that caused many issues in the archive is here: 20, so it's not that much, but 20 plus 20 is a lot if you count every check that it is doing. In this case, it is just some security check. I know that GCC did the same in the last release, and we've done some work with the hardening and all the format security. So this one was not enabled by GCC by default. Clang enables way more arguments and checks than GCC by default.
I'm going to give a quick overview of the various errors that I found while rebuilding the archive. This one is funny, it is one of my favorites. Developers think that if you use -O and put a number after it, -O9 will optimize nine times better than -O1. This one appears 48 times in the archive. One of the most used is also -O6. Some old guy involved in the compiler world explained to me that in the past it had some meaning, but that was 20 years ago. Still, much software in the archive expects some performance improvement from -O9, which obviously is not the case.
Well, the usual meaning is that it will use the highest optimization level. In GCC it is supported to supply any number, which will select the highest available optimization, just in case somebody introduces a higher one than three or four or whatever in the future. Which is not always a good idea, as you know. Well, ideally Clang should follow and accept any number, just the way GCC does.
But it doesn't have any meaning. Yeah, I understand your point. Yeah, of course. So there is a difference in behavior.
I'm being a bit mean here because I haven't put -Wall for GCC, and GCC obviously is able to detect this kind of error. But in the archive we still have 132 packages which do this kind of crap. In this function, as you can see, we are expecting a result and we return nothing. First, that means we are not able to check the return value of this function. And it also means that we can get garbage if we use the result. So it is bad code, and I think this kind of thing helps to improve the code that we've got in the archive. And not only for us: if we publish all the results, upstream will be able to see that this code is crap and should be fixed. Just to explain, here Clang detects the programming error out of the box and fails, where GCC accepts this code.
And this is the other way around. Here we are returning a value from a void function. GCC here accepts it out of the box, without any error. But Clang considers this programming mistake an error and stops the build.
This one is pretty funny because it shows an interesting difference in how the GCC folks and the Clang folks read the C++ standard. I won't go into the details, but basically, with a friend class declaration coupled with a static declaration, GCC will propagate the friend class property to all the parent classes. You could expect that only the children get the property, but the parent gets it too, because of the static. So I reported the bug to GCC and they said it's not a bug, it's a feature. And I went to LLVM and said, okay, GCC is doing that, and they said it's not a bug, it's a feature. So basically we have two different behaviors here. I forgot to put the reference, but it only affects four packages in the archive. It was pretty tough to extract this piece of code, though.
Another thing is -Wall. With -Wall, Clang triggers way more warnings than GCC. This code is not wrong.
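
The slides are not part of this transcript. As a hedged reconstruction of the three situations just described (a non-void function that returns nothing, a void function that returns a value, and an equality test wrapped in extra parentheses), a C++ sketch with hypothetical function names might look like this. The faulty forms appear only in comments so that the block itself compiles cleanly:

```cpp
// Illustrative only: the slide code is not in the transcript, so every
// function name below is hypothetical.

// Case 1: a non-void function must return a value on every path.
// A faulty form falls off the end when b == 0, so callers read garbage:
//     int checked_div(int a, int b) { if (b != 0) return a / b; }
int checked_div(int a, int b) {
    if (b != 0) {
        return a / b;
    }
    return 0;  // explicit fallback instead of falling off the end
}

// Case 2: a void function must not return a value.
// A faulty form would be:
//     void reset(int &x) { x = 0; return 0; }
void reset(int &x) {
    x = 0;
}

// Case 3: redundant parentheses around an equality test.
// Clang warns on `if ((x == 1))` because the doubled parentheses are the
// usual way to mark an intentional assignment, as in `if ((x = 1))`.
// A single pair avoids the warning.
bool is_one(int x) {
    if (x == 1) {
        return true;
    }
    return false;
}
```

Reinstating any of the commented-out forms and building with `-Wall` should bring back the corresponding diagnostic. Whether it is a warning or a hard error depends on the language mode and on the compiler version, which is the point raised later in the questions.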
It is simply that people from the LLVM and Clang community think that we should not write this kind of code, because it is confusing: the extra parentheses are useless in the conditional. So they think that we should remove them. When you enable -Wall and -Werror, it triggers an error. And as I said previously, I'm very surprised by the number of packages in the archive which use the -Werror flag. It is a huge number, and it causes a lot of failures in the rebuild. Another kind of error that we've got...
On the previous slide: isn't this code common with the use of preprocessor definitions? If you have a preprocessor definition which has the equality in parentheses, and then you use if with that macro, wouldn't that be something that's common out there, and even valid?
A macro or what?
A preprocessor definition. If you write #define, parenthesis, a equals one, parenthesis, and then you do if.
You mean whether the preprocessor is able to detect that?
No, I mean this isn't actually invalid.
No, it's not invalid.
It would actually be a good thing to put it in parentheses.
Ah okay, if the macro is being substituted, yeah. That's right.
But you would get the warning anyway.
You will get the error also, yeah, right. But one of the interesting things with Clang, and GCC does it in the last release too, is that it expands the macro. So when you have an error which is triggered by a macro, you see the original code from the #define. It is way easier to debug when you are dealing with macros, but GCC does that too now.
As you know, in GCC there are some extensions to the C standard, C++ here. This code is invalid according to the C++ specification because the size in an array declaration must be static, I mean it must be explicit and not come from a variable or a function.
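
As an illustration of the array-size extension mentioned above, and assuming the slide showed a stack array sized from a runtime value (the transcript does not include the code), one portable way to write the same thing in standard C++ is a `std::vector`. The function name `sum_first` is hypothetical:

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// The GNU-extension form sizes a stack array from a runtime value:
//     int buf[n];   // variable-length array, not standard C++
// A std::vector sized at run time expresses the same intent portably.
int sum_first(std::size_t n) {
    std::vector<int> buf(n);
    std::iota(buf.begin(), buf.end(), 1);  // fill with 1, 2, ..., n
    return std::accumulate(buf.begin(), buf.end(), 0);
}
```

The trade-off is a heap allocation instead of stack storage, in exchange for code that does not depend on one compiler's extension.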
Clang on purpose does not handle this case, because they think it is an extension and they should not support it. It is a point of view. Yeah, it's possible to read.
GCC also accepts code which should be refused. This one is about unqualified lookup. Basically, if we go here, the A function is calling this template, which then calls the B function, but the B function is declared here. In C and C++ it's not valid, because the declaration must be known before the call. So here GCC accepts this code while it is invalid. They consider it a bug, but it's very tricky to fix, so they are not planning to fix it. Whether it's 4.7 or 4.8, I don't know. I know that in 4.8 they are doing more checks, but I checked the bug and it is still open.
Because I have a day job and I'm doing many things in Debian, I have been lucky and I found two great students in the Google Summer of Code. You know that it is not very easy to find good students, but I'm very happy that I found two amazing ones. So, the objective of Alexander, mentored by Paul, who is in theory watching here, on the camera...
Yeah. Just one quick question: how well did Clang do with Boost? Did you try to compile the Boost library?
So Boost, yeah, it's working. It was one of the proofs of concept of the Clang developers.
They got it all the way through?
Yeah, Boost was their proof of concept.
Okay.
It is well known as a big C++ code base. So as I said, Paul and I are working with Alexander, our Google Summer of Code student. We are improving the Debian infrastructure to be able to have Debian built transparently, automatically building the packages with Clang without anything to be done by the maintainer or uploader. The first thing I would mention, because we needed it for the project, is that we wrote documentation for wanna-build.
I know it is surprising because it is one of the tools that we use on a daily basis, but we didn't have any tutorial to install it. So we've got one on the wiki. You can have a look. Some information is missing, but I was able to install wanna-build without any knowledge of the tool, and everything works out of the box. So I think it is a great improvement.
What Alexander has also been doing is working on how to replace the compiler within our infrastructure. We iterated, and one of the solutions that we considered at first was to provide new packages which could be called default C compiler, default C++ compiler, Fortran and Objective-C. After some discussion with various DDs, we came to the conclusion that this one might be the better way to go. It is a lot more work in the archive, but I think if you want to decouple GCC from the Debian infrastructure, this could be the way to go. If you have a look at many C and C++ packages in the archive currently, we are explicitly doing what I wrote on the line in the middle, CC=gcc and CXX=g++. So we are really hard-coding the fact that the package will be built with this compiler. Instead we could use cc and c++, which are just symlinks to the others, which are already managed by Clang, GCC and G++, and which could also at some point be used by other compilers for a proof of concept. Or if you want to develop a new C++ compiler on your own, you can easily use Debian to do a whole rebuild to make sure that what you have done is working.
What we did for now is to write... actually, to be fair, Alexander wrote those patches. He wrote patches for dpkg, sbuild and wanna-build to decouple those tools from GCC and use cc and c++ more. The midterm is right now, so we still have a month with the student to improve things. What we are planning to do next is first to get the patches applied.
Obviously, I'm sure that the maintainers will ask us for many things and many improvements to the patches, which is normal, some tests, and I will probably have to explain to them why we want to do that, but we hope that the patches will be applied. We are currently creating buildd services. For now we are only working on those processors. ARM will come pretty quickly, I guess. For the other architectures, I don't know if people want to get involved. For now it's not my concern. I'm mainly working on the classical processors plus ARM because it is what Clang supports really well. We'll see for the future. Official: you know what official means in Debian. This is why it is in italics.
Yeah, no, I understand what the word official means. I mean, what do you mean by official? Do you mean you intend to replace the binaries in the archive with some other binaries?
No, basically, by official I mean this. You see, it is the build service.
See, to me official is things that get installed into the archive.
Sorry?
To me official is things that get installed into the archive.
It's not that. I want to do a parallel, automatic build with the service.
Then clearly not official. And when you claim upstream supports ARM really well, I will tell you that that's a filthy lie, and I actually had to land a bunch of patches for armhf. So just so you know, it's not supported as well as you think it is.
Okay, I haven't tried, but it is what they claim at least. So, I would like to create...
Sylvestre?
Yeah, sorry.
Before, you said you wanted to use cc and c++ binary files rather than just setting variable names. Why do you prefer to do it that way?
Because I'm lazy and it is the easy way to go. I don't have to hack any code. I'm just editing the virtual machines that I'm providing for the rebuild, and I'm hacking it out of the box.
Okay, so you're not suggesting that that's what we should do in...
No, no, it is what I did for the rebuild when I started to work on this six months ago.
But no, what I'm proposing is to use cc and c++, as scripts or binaries or links, instead of the crappy hacks that I've done. What we are going to do next is to have a new suite, not to make it official, but to have a parallel build of the archive. I would like to have a new suite with automatic tests, and for maintainers to be able to see the results on their PTS page. On the PTS, they would see: okay, your package is failing with Clang. If you want to have a look, that's cool, otherwise it's no big deal. We are also planning to add a lintian warning, so that when you are building and explicitly calling gcc or g++, it asks you to change that to cc or c++ instead.
Here's a mic for...
Please, no, not /usr/bin/cc or /usr/bin/c++. That will break every cross build ever.
Why is that?
Because the cross compilers are not going to be in that path. We don't want people hard-coding paths to compiler binaries, ever.
So we'll have to discuss a way to do that properly. And in the Clang build services that we are going to provide, we are also considering failing out of the box when packages use gcc, g++ or cpp and not Clang. As I said previously, I'm not trying to push this to replace what we already have.
Okay. Well, on IRC, I think following Steve's question, KiBi says that we have a CMake already for this.
CMake? It's something different, a build system, but I'm not... KiBi knows what it is. Sorry.
And in parallel, with the results of the rebuild, I'm trying to set up a first repository of packages built with Clang. In the long term, this will open up some further possibilities. Clang is very helpful for creating plugins; it is way easier than with GCC. For example, there is a research project, Polly, which does polyhedral optimization. Basically, you go back to the algorithm to optimize your functions and your loops. And you can get some huge performance improvements, especially on matrix computations.
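
To make the last point concrete: the loop nests that such polyhedral optimizers typically target look like the naive matrix multiplication below. This is plain C++ written for illustration, with a hypothetical `multiply` helper. Nothing in it is specific to Polly, and how the tool is invoked or how much it gains on a given loop is not covered by the talk.

```cpp
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Naive triple loop: c = a * b.
// Assumes every row of `a` has b.size() entries and every row of `b`
// has the same length; no shape checking is done in this sketch.
Matrix multiply(const Matrix &a, const Matrix &b) {
    const std::size_t n = a.size();
    const std::size_t k = b.size();
    const std::size_t m = (k != 0) ? b[0].size() : 0;
    Matrix c(n, std::vector<double>(m, 0.0));
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < m; ++j) {
            for (std::size_t p = 0; p < k; ++p) {
                c[i][j] += a[i][p] * b[p][j];
            }
        }
    }
    return c;
}
```

The loop bounds and array subscripts here are all simple linear expressions of the loop counters, which is the property that lets a loop optimizer reorder or tile the iterations without changing the result.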
There is also one tool which is provided with Clang, and one of the things that I'm going to do soon is to rebuild the archive with this tool. It is a static analysis tool, which means that you can find some very tricky bugs in some cases: if you go through this loop and after that into this conditional, you can find some memory leaks, some null pointer assignments, and so on. And I won't do that myself, but if some people are interested, we could also consider rebuilding Debian with the Intel compiler, which, at least in the HPC world, is considered a reference for building software.
The other student that we've got, and I'm going to be quick, is André. He worked basically on the packaging of two libraries which are also part of the LLVM community, libc++ and libc++abi. They basically provide a replacement for libstdc++, for both the STL, the standard template library, and the ABI. So what we've got here is just a small C++ program. If we build it currently with clang++, we've got a dependency on libstdc++. And if we use the other library, we can see that we no longer have any dependency on GCC's C++ standard library, only on the ones that we packaged. We are also considering doing the same with libgcc. As I said previously, I'm not asking to change Debian. It's a proof of concept. You never know what the future might be in four or five years. And having this might allow us at some point to say, okay, we want to produce a version of Debian with another compiler or a new C++ library. I know we are far from it because there are plenty of things to check, but our friends from FreeBSD did it, and they will also switch to libc++ in the next release.
So, Matthias, if you want to troll a little.
So with the packages that you're uploading, do they have any dependencies on packages that are licensed under the GPL?
Did you miss the beginning? I said that I don't care about the BSD and GPL issue. It's not my concern in this field.
Do you think that that would impact the primary driver behind Clang?
So?
The primary motivation for doing Clang?
I know what you are referring to. Basically, Tom is referring to the fact that libstdc++ is under the GPL. I guess; I don't know, I haven't checked. But what I just showed is basically a replacement for this library. What we are trying to do with Paul and our two students is basically to make a full rebuild of Debian without GCC or G++, using some third-party tools, which are libc++, libc++abi and Clang. So basically, the licenses will all be BSD. And to be fair, it is because Apple is investing a lot to have a non-GPL Mac OS X platform. That is the reason why they are producing that much code. And since it is very expensive, many other people are involved.
So, well, I have just checked one of your examples for the warnings. And we found out that the warning for the return from a function actually is given by GCC.
Is?
Is printed out. And you claimed that it's not.
Okay. So, really. I checked between... I use 4.6 and 4.7, so maybe I used 4.6 at that moment.
So please do not continue the tradition of the Clang developers of comparing apples and oranges and never saying what version of...
I haven't said a word about benchmarks, but all the benchmarks about Clang and GCC suck. I haven't seen any good benchmark about that, because they're not comparing the latest versions.
Right. I really like the comparison. But please give version numbers, so that everybody can check for that and...
Yeah.
In the case of that particular warning, it may have just been that it's a warning that's on by default in Clang and requires -Wall, or turning that warning on specifically, in GCC. That doesn't mean the warning wasn't there.
You mean this one?
That one right there.
Yeah, yeah. I said that with -Wall it triggers this one.
Oh, okay. I mean by default.
By default. It's not on the slide.
No, no, I said it, you can watch it on the video, but with -Wall it will show. It is just the default behavior here.
I recommend people use -Wall anyway.
Look at the title: it's just different default behavior. I'm using GCC every day. I'm using GCC more than Clang, actually. So I'm not against GCC and trying to push Clang.
If you know that you have to add -Wall to get this warning, then do it and, well.
Yeah, but actually if I show this slide, if I show this...
Take the mic. Use the mic.
He's just telling us what caused the failures that he put in slide three, really, yeah? He's not trying to sell an agenda here. He was just giving us information about which compiler generated which errors in his rebuild of the archive. I don't think that's an unreasonable comparison.
Thank you.
Hello. Why did you replace GCC by symlinking it to Clang? Couldn't you use the DragonEgg plugin? You were talking about that?
Yes. So this one was the first version that I did. When we were rebuilding the archive with Grid'5000, I had to produce a virtual image, an sbuild image, a tarball for Lucas, already set up for the run. So I instantiated the sbuild on my laptop and did the hack inside. Basically, I just copied and pasted this, saved the virtual image and uploaded it somewhere for Lucas. But in the new system that he has set up with the Amazon cloud, you have to provide him with a setup script which will do that automatically. So it's the same code, but I am providing him with a script, and the virtual machine launches the setup script at the beginning.
Good now. Have you tried DragonEgg?
DragonEgg is not exactly that. DragonEgg allows you to use LLVM optimizations from GCC. It's a GCC plugin. And no, because it is another step. But I know the DragonEgg developer and he would be very happy if I could do that. And one of the great things about DragonEgg is that you can also build other code with it, Fortran code and many other things.
So DragonEgg just combines the worst diagnostics, from GCC, with the worst code generation, from Clang. Well, GCC has less good diagnostics, as you showed in some cases. And the DragonEgg plugin just takes the GCC front end and uses the Clang code generation, the back end, to generate the code. And that is known to be worse than GCC's. So I think it doesn't make sense to use DragonEgg for real-world development.
Or if I do that, it is only to help the developer of DragonEgg. But actually some people are thinking of doing the same in the opposite direction, I mean using Clang as a front end and GCC as a back end, as a proof of concept.
So, well, one of the things that you're proposing is essentially that Debian should change the way all the packages refer to the compiler, and refer to cc or c++ and not gcc and g++. And I'm not opposed to that as an idea, but it opens a small can of worms, which is: if we invoke gcc, we can have some reasonable expectations about what options we can pass and what the behavior will be. If we instead invoke cc, somebody is going to have to decide what exactly that means. You can't just change a package from gcc to cc, because maybe the package, as you see, doesn't build with Clang. And then is it now a bug in the package that it uses cc? It's like using a bashism without declaring bash, maybe. This needs to be thought about, I think.
For the arguments, it's not an issue, because they consider that they should respect and provide all the arguments of GCC. So for Clang, arguments are not really an issue. Code and GCC extensions, yeah, that is an issue. But it would be a huge issue, for example, if someone tried to build the archive with Intel, because I don't think Intel follows the same arguments as GCC.
Right, so your answer to that question is that cc should have the options of GCC.
Since Clang follows the same options, yeah.
Well, you're proposing that as a general rule in Debian.
Yeah, okay, but GCC is a standard in terms of arguments. And I'm not trying to standardize all GCC or C compiler arguments, I'm not crazy.
Sort of as a follow-up to that.
Last question, Tom.
Okay, sorry. In the Java world, Gentoo had hooks for different JVM argument strings. We never did that in Debian. And it just turns out that all of the JVMs that we're really running now accept basically Sun options. And unless more foreign JVMs come into the free world, I suspect that the Sun options are the ones that people are going to use, even though there could be multiple different implementations of JVMs. I'm not saying that's a great precedent, but that's what we've done before. I don't know, doko, do you have any comments about that?
So it is a real last question this time. And you can ask it in French if you want.
I have some experience with the Intel compiler, and the options are quite different there. We're trying to use Clang and GCC and ICC at the same time, so we've actually separated out the options completely.
I don't have the same experience with Scilab. Every commit is built with GCC, Intel and Clang, and the same arguments are working.
You use the same?
Yeah, I use the same, but maybe it is just that I'm only using normal arguments. Thank you very much.