Welcome, everybody. So today I'm going to talk about Clang and how we can use it on embedded Linux. You might have heard of it being used on Android and iOS. So today I'm going to present what I have done for it to work on embedded Linux, what the limitations are, and what the advantages and disadvantages are. So this is essentially the agenda. And feel free to ask questions or make comments in between. I would love this to be a more interactive session. So we'll talk about what Clang is, what the project's primary goals are, and how we can do cross-compiles using Clang. And how far we can take it as a system compiler for embedded Linux. I've primarily worked with Yocto as my framework, so I'll also talk about how Yocto can be used to generate Clang-based SDKs and use them. And then I'll go over some of the additional tooling that Clang provides in addition to the static compiler. And of course it also has a runtime, the compiler runtime. So I'll also go over how that runtime can be used instead of the GCC runtime in your applications if you wish to use it.
So what is Clang? Clang is actually a front-end for C and C++. And it uses the LLVM infrastructure for the back-end and the code generation. Primarily it is C, C++, and Objective-C. So there aren't many other front-ends. And LLVM is actually a long-lived project. Clang was added afterwards. So it's a reusable infrastructure for various compilation technologies. People have used it for dynamic compilation and for various other use cases. The latest release is 3.9. Most of my work is based on 3.8. And some people pronounce it as C-Lang. It is actually Clang. So I've included how it is pronounced.
So the goals of the project are to be GCC-compatible. In fact, it implements all the extensions that are documented by GCC. And some extensions which are used in software and are undocumented, they were missing.
But some of them are not implemented because the project thinks that they are not used as commonly and they are hard to implement. So we'll go over a couple of those extensions which actually came up in the Linux kernel. And it intends to be IDE friendly. So one of its goals is to integrate into different IDEs like Eclipse, Qt Creator, and others. And it uses a BSD-like license, the LLVM BSD license. Nowadays, there's a discussion going on about relicensing it under Apache 2. So if you are interested in that discussion, it is going on on the mailing list these days. And it conforms to ISO C and C++ primarily. So it doesn't support Fortran or any other language front-end, if you wish for one.
So it's actually a library-based architecture. The whole compiler consists of several libraries. And the reason is that they consider themselves to be embeddable and pluggable. So you can write tools and invoke all those libraries in your code, if you wish to. So pluggability is at the core of its architecture. If you want to design more tools for code analysis or any other tools, then it's very easy to use it and extend it. So you will see that everything is lib-something. So the parser is libparser, and you have the front-end as libcfe, and everything is a library there.
And user-friendly diagnostics. So this is one of the goals, that they always want to be very friendly. Many of you who do C++ will know that you get very cryptic messages sometimes. The error is elsewhere, and the messages that come out are very cryptic, very difficult for a user. So it tries to be very friendly and pinpoint the error when you are compiling the code. And it also offers fix-it hints when it can. So it will actually point out what you could do to fix the code, which I find pretty cool. But one of the side effects of this goal has been that GCC also has improved its diagnostics quite a lot.
Starting with 4.8 and 4.9, the whole diagnostic framework was rewritten. And a lot of new diagnostics were added. And you can even have color diagnostics in GCC now, which was a big taboo in the past. So even today, actually, I compiled a program not long ago. And I see that Clang still has pretty good diagnostic support. So if you can see, I have a small test case there. It's intentionally written wrong. There is a semicolon missing after the structure declaration. But now, if you compile this using Clang, for example, then you get the pinpointed error. So let's try what G++ tells us. So the error is there somewhere, I know. But right now, I cannot scroll it. Oh, you want to know the version? I don't know. It's not 2.96, for sure. So I'm finding it a little challenging to see. I think the Clang is 4.0, which is the upcoming release. And GCC is 6.0. 6.2. So it's actually Arch Linux. So whatever Arch Linux provides me. So they are fairly recent, both of them.
So one of the goals they have is to use fewer resources and compile quickly. And one of the tasks I did with that is I used WebKit. Actually, there is a WebKit for Wayland project. So I compiled it using Clang and GCC, Clang 3.9 and GCC 6.2, again using the Yocto framework. And I didn't draw the graphs and percentages, but you can do that yourself. So Clang spends 2,297 seconds to compile it. And GCC spends 2,838 seconds to compile it. So that's roughly, I don't know, maybe around 20% or something like that. It has also got options to split your debug information so that it can speed up the link time. Obviously, I didn't use that option to compile WebKit there. But if you use that option, I guess the link time would reduce even further. So you can get it even better than 2,297. So -gsplit-dwarf is the option that you would use if you want your link step to be a little faster.
So who is using Clang today?
Debian Experimental has it as an optional compiler. So around 90% of the packages in the Debian archives can be compiled using Clang. The LLVMLinux project was an initiative to get the Linux kernel compiled with Clang. And they have done a lot of work on both sides: they made the kernel compile with LLVM Clang, and there were also patches that were required for Clang. FreeBSD uses it as the system compiler now. And OpenMandriva is one Linux distribution which has done a lot of work. They switched with the 3.7 version of Clang, so now Clang is their primary system compiler. And we'll cover some of the OpenEmbedded work that we have done here. And it's being added there as an external layer as well.
So the current norm: embedded Linux is primarily cross-compiled. And GCC is the primary compiler. It supports many architectures, and that's a very strong point for GCC. If you use one of those architectures, it's the go-to compiler for all of them. There's a long list if you go there. I listed a few of them here. But this list is ever-growing. So whenever there's a new architecture, it shows up. So if you have a wide variety of CPUs to worry about, GCC is a pretty good choice for that. Clang right now has a limited set of supported architectures. Primarily, it has ARM, ARM64, x86, x86-64, PowerPC64, MIPS and MIPS64, ColdFire. And there are a few others that I might have missed. But it's not as extensive as what you will find in GCC. So be aware of that fact.
So how are the toolchains organized? Just the sequence. What are the components involved? You've got binutils, which is your binary utilities. And then you've got the cross-compiler, which provides you the C, C++, or any other compilers. And then you have a standard system library. You have a choice between glibc, uClibc, and musl. And then you have a debugger. Pretty much GDB is the only game there. And the build sequence looks something like this.
So you have to have all the prerequisites. The prerequisites include all the support libraries that you need to build the compilers and the supporting tools. Once you have that, you build your binutils. Then you bootstrap GCC. Then you need some portion of the kernel, essentially the API headers. And then you have the libc headers, because there is a bit of a catch-22 to solve. So you fake some startup files and create some dummy .so files so you can build GCC. That's where you do a full cross-build, so GCC can do shared libraries and stuff. And all this is done to make GCC's build system happy. So it is faking all those things underneath to enable all that support. And then you build the full libc. And then, of course, you build the runtime and stuff. So that's the whole toolchain, explained in two minutes. But it's a complex process to put together a cross-toolchain based on GCC.
Clang actually comes as a cross-compiler. So you use a single compiler to compile everything. It's so cool. I've been doing cross-compiling for a long time. So I like the fact that I can just create a symlink and call it with the triplet that I have. And it knows that it has to do cross-compiling. So I don't have to do all the cross-compiler builds and everything. In fact, the compiler I just showed you on my Arch Linux, I can turn it into a cross-compiler without any issues, just by creating symlinks, which are like arm-linux-clang or whatever. So in the grand scheme of things, if you look at the build sequence I just described, it still remains the same. But it's actually going to simplify the build because we don't have to do the full two-stage GCC bootstrapping and stuff. So finally, it looks something very simple where you build binutils, you build Clang, you build the headers and libraries, and then actually this libc headers step should go away. And then the full libc, and you have everything. So it simplifies the toolchains themselves, essentially.
And if you care about more than one architecture, then you don't have to have two different cross-compilers. You can just have one and use it. So if you want to do cross-compilation on your desktop for embedded Linux distributions, you can install Clang on any distribution you use. And then you could download any of the pre-built toolchains from either the Yocto Project or Linaro, which put together the cross tools for the architectures that Clang supports. Linaro does it, of course, for ARM. And you would then just install these tools. And yeah, I have it in a very small font, but I hope you can read it. And then you just specify the target and a GCC name. So there is a -target option to Clang, and you tell it what your triplet is, and that's it. And you are able to cross-compile applications using Clang.
Why you install the cross toolchain, we'll cover slowly. But there are certain components which are still missing in the whole sequence. For example, it does have a built-in assembler, but it can't do everything right now. So many times, many applications wouldn't work, and you tend to use the assembler provided by binutils. The linker is the same way. There is the LLD project, which is still under development. So not everything would work, but you can fall back to the GNU linker. And that's why you use the cross binutils to complete the gamut of things. And of course, in this case, we are also using the compiler runtime from the GCC toolchain. That's why the cross toolchain provides that when you're building the application. So this is just for applications. And as I just said, you will use GNU binutils. And you could actually use the same setup for more. Now you actually have a cross toolchain, so essentially you can build anything with it if you want. You can build the kernel. Most of the kernel work can be done using this cross toolchain as well, where you deploy Clang to compile the kernel.
And so it's very easy to use it in this way to do cross development. So not everything is compilable yet, as I mentioned. For the Linux kernel, there is a lot of work done by Behan and others from the Linux Foundation. And I think most of those patches have been merged. There are certain patches, I guess, which are still remaining. But I think they are able to compile most of the subsystems using Clang. And the x86 kernel, I think they ended up booting as well. But there is still some work left there. The system C library, glibc: there is work in progress to use Clang to compile glibc. But it's not complete yet either. There is a wiki page if you want to contribute or look at the status, you are welcome. But it's still a work in progress. The glibc community is interested in getting that option to compile glibc using Clang.
So a hybrid approach is needed. As of now, you will need two compilers if you want to create a full embedded Linux platform. Chromium OS has done similar things. They have a Clang overlay. And it is very similar to what we will see, what I did in OpenEmbedded, where you can choose which compiler compiles what. And that works during the porting period, when not everything can be compiled. So you can effectively choose which applications use one compiler or the other.
So all this work for OpenEmbedded is in an external layer. Some of you know OpenEmbedded. There can be extra layers that you can layer on top. And this one is called meta-clang. All the work is housed there. It has added a variable you can set called TOOLCHAIN. And you can choose if you want your application or your package to be compiled using GCC or Clang. And then there is a global default. If you don't specify anything per package, you can also set this globally, and it defaults to that. So you can also say, I want everything to compile with GCC, but just this application, I would like to use Clang.
Or you can do it vice versa, where you say compile everything with Clang and just stick to GCC for these few packages. Then there are certain packages which don't compile with Clang. So they hard-code the TOOLCHAIN variable in there. They don't offer you the option to compile them with one or the other. So that is the current setup at a high level.
So building it is easy. You can just clone the Yocto Project repo here. I've used Poky, but you can use OpenEmbedded-Core or any other distribution which is based on OpenEmbedded. You just have to add the meta-clang layer. And that basically brings Clang into your build. So you can then easily build it.
So there are some things that are not clangable yet. And there is a list towards the end of my presentation that I'll show, the list of packages that are not yet Clang ready. But there are a few common cases that you find which are not supported in Clang. And you might run into those. So some specific GCC extensions are not implemented in Clang. VLAIS, variable-length arrays in structs, is one of those. And there is an explanation of why they don't support it. As I said earlier, they think that it's very rarely used, but it's very complex to implement in the compiler. So that's the reason why they don't implement it. Nested functions, they were always bad. They were not specified in the standards either. So Clang doesn't have them. So most of the time, people ended up cleaning up their code to not use nested functions.
And it also has one problem where it pretends to be an older GCC: it defines __GNUC__ to be 4.2. And that's actually a very old compiler. And it supports way beyond that. So if you define __GNUC__ to be something newer, like 4.9 or something, then it behaves a lot better when compiling your applications. But the point is not that it should do it this way. Probably the applications should look for the features they need rather than checking the __GNUC__ version and deciding what's supported or not.
But yeah, when you see an application suddenly disabling something, or thinking that the compiler doesn't support a feature, the most probable reason is that it is checking this __GNUC__ macro and making that decision. So the right fix is really to check for the feature you're looking for. Many of the fixes we have done have been submitted directly to the upstream packages, and they have been accepted. And some we have hosted in the different OpenEmbedded layers, because either there is no interest upstream or they were not accepted upstream. So we keep following up with upstream on a possible solution for those kinds of portability issues. Some of them have been fixed. Some are upstreamable but haven't been upstreamed, just because of laziness.
So you can build images. What you can do today, if you don't do anything special, is build graphical images as well as console images, fairly complex images, using Clang. Although it won't be 100% Clang compiling everything in there. It will use a different compiler for glibc. It will use GCC. And then there are other packages that will be compiled using GCC. But somewhere around 80% to 90% of the packages you can compile using Clang when you consider the whole system.
You can also generate SDKs using Clang in OpenEmbedded. What it will do is generate the SDK with both compilers. It will have GCC as well as Clang in there. And it exports two new variables called CLANGCC and CLANGCXX. So if you want to use that SDK and you want to keep it portable, which means you want to use one or the other and you want to test it, then you can just set CC in your environment to CLANGCC, and vice versa, so you can switch between the compilers easily. So that's the reason for this to be there. All of these are in the SDK environment. When you install the SDK, you will get all those variables in your environment already. So I've done a little example here, GNU Hello World.
So it's fairly simple to compile it using the SDK and setting CC to CLANGCC. And you can then easily compile using a cross-compiler based on Clang. You can also use the same to build a kernel. So I tried the LLVMLinux kernel, which basically has all the patches. And I tried to build it for ARM64. But I ran into some compiler errors when I tried. Maybe it is fixed now. I'm not aware. But there you go. You can start contributing patches to the kernel right there. I didn't bother to submit patches myself. But I think it's a good opportunity for some of the enthusiasts here who might want to use Clang.
So talking of Clang, we actually have more tools there. That's one of the strong points that you get with Clang and LLVM. One of them is that Clang has a static analyzer. And you can easily build it along with the rest of the Clang tool suite. And I've just shown here how you would run it on the musl C library package. It actually produces very nice HTML output. And you can go through it nicely and look at all the issues it reports. You will get a lot of false positives. So I have actually uploaded a result there. So offline, when I upload my slides, you can go and look at those. The musl scan, I did report to the musl community. And at least two fixes came out of it. So that was not bad. These fixes were applied to musl upstream. More analysis and more tweaking can be done, and probably more issues can be found. But it's easy to configure for any application that you might be interested in. At least from the static analysis point of view, it's worth it.
So it has more tools. Given the API nature of Clang, there has been a slew of tools written around the infrastructure. So there is clang-check, actually. In many places that I've worked, and maybe you are already doing that as well, people have syntax checking. So people will always argue tabs versus spaces and all that kind of stuff.
So clang-check actually helps in those sorts of scenarios. You can deploy clang-check and get rid of all those problems. And it can be easily integrated into an IDE, if you have IDEs. And it's very prompt. It also uses the fix-it mode that I just explained, so it can warn about those things. So it integrates into that infrastructure. And it nicely tells you, hey, you've got a space here where you have defined it to be a tab, or something like that. So you can easily fix your code as per your coding policy, whatever you have. So you can encode that.
So clang-format does a little bit more. It reformats your source code files. If you have, again, commit policies that you would like to enforce, and you want to convert a large project over, clang-format helps with that. You can reformat your whole project and start from there, and then deploy clang-check on every commit. And it also has a lint tool called clang-tidy. So you can run clang-tidy during your development. There are more tools, so I'll not cover all of them.
So it's also providing some runtime. And we'll talk about the runtime a little bit more here. And there is some sanitizer work that is pretty cool as well. GCC has also got sanitizers now, but Clang has got more. And they started with this long before. So they are now actually getting to much more advanced stages with the sanitizers. And they are becoming quite useful.
So libc++ is a C++ runtime implementation. You know libstdc++. So it's a direct replacement for that, implemented by the project. And it comes in two different libraries. One of them is libc++abi and the other is libc++, which implements the STL. And there is another library for unwinding support called LLVM libunwind. So as you can see, you can specify the -stdlib option to Clang to choose which C++ runtime you would like to link to. And you can choose -stdlib to be libc++ or libstdc++. So it's a nice way to toggle between those two.
One thing you have to be careful about is that it's not ABI compatible with libstdc++. So if you have an application which then loads another library which depends upon libstdc++, then you might not get them working together. So if your application needs the standard C++ library, it can be either libc++ or libstdc++. It can't be both. So given that you do that analysis internally, you can get it to work. The good thing about libc++ is that it's almost 50% smaller and 30% faster in execution. And GCC changed the libstdc++ ABI in GCC 5. That created more problems for Clang, which they fixed in Clang 3.9, so now it can deal with the GCC 5 transition from the old libstdc++ ABI to the newer ABI. In order to support the newer C++ standards, GCC had to do that.
So the other component it provides is called compiler-rt, which is the C runtime. It provides all the compiler built-ins, and it has full support for the libgcc interfaces. And then it also has the runtime for the sanitizers, which I'll cover in the next slide. And also for profiling: there is support for profiling and coverage collection.
So there are several sanitizers here. Address Sanitizer is pretty good. And Thread Sanitizer. So look through those. They're easy to configure. Basically, they are just options that you pass to Clang, and you can utilize them. But many of them don't work on all architectures. So that is a work in progress for many of them. If you use ARM and x86, you are pretty much covered most of the time. But if you are on MIPS and PowerPC and others, then they lag a little bit in getting all those implementations right now.
And libunwind is a replacement that you can use for the libgcc exception handling library that you have. So you could use LLVM libunwind as a replacement for that. It implements the interfaces of the HP libunwind that we all know, all those interface functions.
So what I did is I tried to create a binary that would have no libstdc++ or any of the C runtime from the GCC compiler suite. And I managed to get a C++ application with exception handling and unwinding in there to work by linking to libc++ for the standard library, compiler-rt for the C runtime, and libunwind for the exception handling and unwinding part. So I know the fonts are a little small there, but what I've shown is that I've linked the application and dumped the NEEDED entries of the binary. And you can see that it lists neither libgcc nor libstdc++.so in there. So it can essentially replace the whole runtime. You wouldn't require that in there.
So the limitations. Not all packages are clangable. We talked about that. So I have a link here, at least for OpenEmbedded. There is a certain list of packages that are non-clangable right now for various reasons. And the integration into the IDEs that we use in embedded Linux in general is a work in progress. It works with Eclipse to a certain extent, but then there are others that it could get integrated into. And the upstream kernel doesn't compile yet. Those are a few of the ones that I know of. But every day you see that more packages are getting compiled. So this list is shrinking as we move into the future. So that's all I had. So if anybody has questions or rants, yeah.
There is a kernel address sanitizer, which is the user-space address sanitizer used in the kernel. And now GCC allows that.
Yes.
What about Clang? As I see it, this feature is an advantage of GCC over Clang.
That is true. I think, as I said, it doesn't provide that for the kernel yet. There has been, I think, one discussion on that. I'm hoping that it will get implemented soon. But I don't have any updates on whether this will get in there sooner or later. But good point. Yeah.
So you said that some patches sent upstream were refused by upstream packages. If I understood correctly, you used some patches to fix compilation with Clang that were refused by some packages.
They are still under discussion. You can put it that way.
Why do they discuss?
Well, most of the time, because it works, right? So it works, and if you don't have any other use case, then why do we make the maintainer's life hell by changing working code? So you have to understand that he may not have the means and stuff to test it out. So he doesn't feel comfortable taking such patches. Any more questions? Yeah.
So I plan to do that work when I go home. I don't have much experience with doing runtime performance analysis. But I would encourage you to look at what Phoronix does. Whenever a newer version of either Clang or GCC comes out, they run all these kinds of benchmarks. So you can see the progress that has been made there. So I guess right now, if you believe those benchmarks, and whether they are done correctly is another subject, I can just talk about the numbers. The numbers have been improving for Clang over time. And right now it is in a position where it doesn't matter. I think it outperforms other compilers in certain benchmarks. It doesn't in others. But if you look at the progress it has made from 3.0 onwards, you will see that it has been getting better and better at a much faster pace. Obviously, being a new compiler, I think that always happens. Yes?
Why do you have to invest in another compiler toolchain, and not put the effort into making GCC better? If you ask me. So actually, redundancy and that is one thing that can do better. And you see it does not compile as a source to be added. And so if you take the best of the other one to GCC, there is a lot of effort based on that.
Yeah, so you have more choices, right? So essentially, it's also about the fact that I like tools.
So for me, I like more compiler technologies. I like GCC. I like Clang. If there is a third one, I'll look at that too. So I think essentially there are certain advantages, as we talked about, right? It's embeddable. So many people like it because they want to embed it into their own tools. And there were certain restrictions or stuff like that that a lot of users had felt in the past. And essentially, the GCC code base is a little older. So rewriting those parts, I mean, you could have done that, obviously, but it's a similar amount of work to going out and doing it in a new environment. But I believe that it's a good thing to have multiple tools. So you don't have a single point of failure. And it actually also improves certain packages. It improves portability. So if you take a package and you compile it with both of them, I didn't cover it here, but there are several cases where Clang found errors. Basically, people were using characters as subscripts. So they will say 'A', and then they will expect it to convert that character to whatever the numerical value is. And all those kinds of things. It found so many hidden errors that were there forever. So in a way, it's a good thing. I think it's a better place with Clang on the side. Not so bad. If you look at it, it also provides some level of competition, so other tools improve. And overall, the whole community benefits. That's my perspective. You may not agree. More questions? So I guess, no more questions.