Hello everyone. So today we will go through LLVM and its place within the FreeBSD ecosystem. I'm David Carlier, a contributor to various open source projects, more or less related to the business world: it can be video games, it can be enterprise-oriented software. More related to the topic of the day, I have been a committer since May 2018.

Ok. So what is LLVM? LLVM is a compound of a toolset and frontends, frontends which are able to generate what we call LLVM IR. IR stands for intermediate representation; it's a kind of high-level assembly, but much more architecture-independent. If we take as an example the clang or clang++ frontends: from your source code, the lexer generates tokens which are sent to the parser to generate the AST, the abstract syntax tree, and then to the code generator to produce the LLVM IR.

With LLVM we also have tools. You can build nice just-in-time compilers. You can extend LLVM itself: for instance, you can write what we call a module pass, which is an extension from the compilation-unit perspective, and then function passes, basic-block passes and so on, where you can do some checking, you can add some instructions, remove some instructions if you like; these are compile-time passes. Right? In addition, you have the possibility to do some static code analysis, and we have what we call sanitizers, which we will go through later.

All of this has been available in FreeBSD since 9, 10-ish. First, it was just an option in parallel with the old GCC 4.2. There was a need to replace that old 4.2, which was the last GPLv2 version. It was quite a blocker, because with it you could only do C99, not C++11 and so forth. So it was time to replace it for the system, and it became a full part of the system since FreeBSD 10. I mean, it's used to build the kernel, the userland, most of the ports. So yes, the FreeBSD code base needed a lot of changes to fit clang's criteria, a lot of changes to fit a more modern style and so on.
So then, as time went on, more and more architectures were supported: AMD64, ARM, and now maybe only SPARC64 remains a bit behind the rest, but that's already nice progress.

So, yes, here we go: sanitizers, what are these? It's not really about cleaning things up, as the name might suggest. It's more about detecting certain types of bugs at runtime, so it complements the static analysis part pretty well. The sanitizers provide different runtimes to detect particular kinds of bugs: memory errors, race conditions, several kinds of overflow. For instance, we have MemorySanitizer, which is mainly about uninitialized variables. AddressSanitizer is more for double frees, heap and stack overflows. Right, and on NetBSD there is in addition the LeakSanitizer. These are pretty effective and also have a much smaller performance drop compared to tools like Valgrind, for example, which can be 20 times slower, whereas with AddressSanitizer it's maybe 5 times slower sometimes.

We have UndefinedBehaviorSanitizer, which is a kind of Swiss Army knife sanitizer. I mean, there is no shadow memory mapping like AddressSanitizer or MemorySanitizer, so it was possible, for instance, to port it to OpenBSD. It's a small sanitizer in that sense: it catches things like integer overflow and misaligned pointers, and the performance drop is pretty small. Right, and you can combine it with other sanitizers, whereas MemorySanitizer and AddressSanitizer can't be used at the same time; they are mutually exclusive. We also have a nice race condition detector called ThreadSanitizer. So, all of them are supported on FreeBSD. In addition, we have components like libFuzzer to do some fuzzing, and X-Ray instrumentation to do some performance benchmarking.

Right. So, for example, with a very basic piece of code, AddressSanitizer is perfectly capable of catching the first error, the double free, and the use-after-free as well.
As you can see, it detects the heap overflow, displaying the line. MemorySanitizer, as well, is capable of catching this uninitialized variable, which can fly under the radar very easily; it may work in production without visible error, but it's not correct code, obviously. ThreadSanitizer is perfectly capable of catching this obvious race condition, like this, and again it shows you where the problem lies. And UndefinedBehaviorSanitizer is capable of catching these two obvious errors: the alignment issue, and then the integer overflow just below, as you can see.

So, here are the flags to pass to the frontend: for MemorySanitizer, -fsanitize=memory; for AddressSanitizer, -fsanitize=address; for ThreadSanitizer, -fsanitize=thread; and -fsanitize=undefined for UndefinedBehaviorSanitizer.

So, we mentioned earlier the libFuzzer component. But what is fuzzing all about in the first place? It's a testing technique to catch certain types of bugs in software, mainly libraries I might say, which rely on external input. It can be just reading a config file, it can be listening on a socket, whatever you like. If we take as an example an image-processing library, you might want to fuzz the picture format detection, how it detects PNG, JPEG, and so on. So, libFuzzer will use inputs which we call a corpus in the fuzzing vocabulary. Those corpus files don't have to be full pictures; they can be just the first bytes of the picture format. Then libFuzzer takes those inputs and proceeds to do some transformations, which we call mutations: it will insert some random bytes at some random offset, remove some other bytes eventually, in order to trigger a segmentation fault or whatever. Those mutations will then be stored so they can be reused once you fix your bugs. So, fuzzing is meant to be run long enough, I mean hours at least, if not days, if not weeks if necessary, in order to cover as much as possible.
So, as you can see, that complements pretty well the unit tests we all know. But fuzzing, nice as it is, has some caveats. I mean, as I said, it fits better with libraries, because with monolithic applications, for instance if you want to fuzz nginx, that can become very difficult: it's software relying on events, and libFuzzer runs the code many times over, which can contradict the application's workflow pretty badly in that case. You might need to make a lot of code changes in order to fit libFuzzer's needs.

So, to show how libFuzzer works: you have your fuzz binary, you have one or several corpus inputs. As an option, it also supports dictionaries. A dictionary is a sort of way to guide the fuzzing; sometimes you may want to avoid too much pointless randomness. Let's take an example: you want to fuzz an HTTP server. You may not want to fuzz keywords like GET, PUT and so on, just maybe some parts of the client request. So a dictionary is a good way to guide the fuzzing a little bit, to make more sense of it. And then, with the corpus, and eventually the dictionary, the input undergoes some mutations. Those mutations will then be stored in the same place as the original input.

So, how does libFuzzer work in practice? You need to at least implement LLVMFuzzerTestOneInput, which takes as arguments the mutated data and its size. It's a C function, right? And then you do what you have to do with this data. So, for instance, there is an obvious overflow here; that's why I recommend combining the fuzzer with at least one sanitizer, AddressSanitizer for instance. Once you compile your fuzz binary, it comes with several options: you can choose how many runs it will do, the memory usage limit, whether you want to do some parallel jobs, the maximum length of inputs, the initial seed for randomness.
It has plenty of options; again, on FreeBSD the leak detection is not supported, but there are many others. So then, you have to create a meaningful corpus folder. For instance, I ask it to run 200 times with this input folder, and it was able to catch the overflow, so it created a crash file. Normally, it stores the mutated data under a hash-named file, the transformed inputs. Here it is.

So, we have now X-Ray instrumentation. As I said earlier, it's for doing performance benchmarking. For example, you have just released a new version of your software for your company, and then a customer of yours calls you to tell you that this new release has a severe performance drop compared to the previous version. X-Ray instrumentation, at least, will help you find out where the bottleneck really lies. When you compile your binary with X-Ray instrumentation, it will put an instrumentation hook in each function, or rather in each function selected for instrumentation, because you can choose which functions you want to instrument and which you don't. The more functions you instrument, the slower the binary will get, so you have to choose carefully which parts of the code you want to instrument.

In order to do this, you add attributes: one to say you always want to instrument these two functions, and another to mark the functions you want to exclude from instrumentation. By default, once you run your instrumented binary, it will create a log file; you can change the file naming, but by default it's xray-log, then the name of the application, and then a hash. And then, with llvm-xray, you can do some accounting, that means showing where your application spends most of its time. You can sort it, because it generates a kind of CSV-like presentation.
So, it says here that the first version of the Fibonacci function is the bottleneck in this case. That's nice, adding some attributes, but in practice, I might say, you might not want to touch your code to that degree, at least when it's your corporate codebase. So, fortunately, there is another solution: an external configuration file. You can say: please always instrument those two, and never instrument this one. And then you pass this config file as follows; it generates the same binary.

So, to summarize: your binary is compiled with X-Ray instrumentation. I mentioned instrumentation hooks, but they are empty until you run the binary; then they are filled with the timer calls, at the beginning and the exit point of each function, in order to generate the delta. With llvm-xray graph, you can generate the call graph, and from this you can generate an SVG, for example. X-Ray instrumentation works well in the multi-threaded case, and you have the possibility to aggregate the data, because it can become very verbose, obviously; you can tell it to aggregate the data into one point.

So, yes, that will be all; unfortunately, my FreeBSD machine crashed yesterday, so I could not do a live demo. If you have any questions or concerns.
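The external configuration file mentioned above might look like this (a sketch, assuming clang's `-fxray-attr-list=` special-case-list format; the function names are made up):

```
# xray-attr.txt -- pass with:
#   clang -fxray-instrument -fxray-attr-list=xray-attr.txt demo.c -o demo
[always]
fun:fib_slow
fun:render_frame
[never]
fun:banner
```

The same instrumented binary comes out as with the in-source attributes, but the selection lives outside the code, which is easier to apply to a codebase you are not allowed to modify.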