Yeah, so with that out of the way, let's get started. Like I said, in the information security community the focus of security testing is usually on things like network switches, which process external data and provide access to private infrastructure. The compiler is not, let's say, a popular target. So let me be clear about our threat model: the attacker is not the user, who is a developer. We don't assume that the person using the compiler is malicious; rather, we focus on whether the code generated by the optimizer is correct.

For example, take this Yul-style program. Essentially it defines a function f which returns x, sets that x to 2, and in the end stores the value returned by f at memory location zero. So you expect this to store the value 2 at location 0. And that's what you get on the right-hand side after optimization, which is simply a single store of 2 at location 0. Essentially we want to make sure that the optimizer emits correct code and that the compiler behaves as expected.

For those of you who are not familiar with fuzz testing: fuzzing in a nutshell is essentially this. In a loop, you generate an input and feed that input to the program under test. You can do this for as long as you want, but typically you won't have any gains beyond a certain threshold.
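That generate-and-run loop can be sketched in a few lines of Python. Everything here is illustrative: `generate_input`, `program_under_test`, and the "bug" it contains are stand-ins, not part of any real fuzzer.

```python
import random

def generate_input():
    # Hypothetical generator: here just a random byte string.
    return bytes(random.randrange(256) for _ in range(random.randrange(1, 64)))

def program_under_test(data):
    # Stand-in for the real target; pretend a specific prefix triggers a bug.
    if data.startswith(b"\xff\xff"):
        raise RuntimeError("crash")

def fuzz(iterations):
    crashes = []
    for _ in range(iterations):
        data = generate_input()
        try:
            program_under_test(data)
        except RuntimeError:
            crashes.append(data)  # keep the crashing input for later triage
    return crashes
```

In practice the loop runs until you stop it; the "threshold" mentioned above is the point where new iterations stop exercising new behavior.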
So at some point you hit Ctrl-C and end the whole process. Fuzzing has been shown to be surprisingly effective at finding bugs, because unit tests most often capture only a subset of program behavior, while here we randomly generate inputs that are usually corner cases of some sort and break the program.

However, applying traditional fuzz testing is of limited use, because our use case is testing a compiler, and for a compiler it's very important to generate valid programs. Let's say you have the valid program on the left, which defines a contract and a function foo in that contract that does something, and assume the fuzzer applies a mutation. A mutation is essentially any operation which tweaks bytes, adds bytes, or removes bytes from the byte stream on the left; the fuzzer sees the input as a stream of bytes and just tweaks, adds, or removes some of them. So it could create a mutation like the code shown on the right, which mangles the keywords "function" and "public", because it's totally random.
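To see why random byte tweaks rarely survive parsing, here is a minimal structure-unaware mutator; the contract snippet and the three-byte-overwrite strategy are purely illustrative.

```python
import random

def mutate(data: bytes) -> bytes:
    # Structure-unaware mutation: overwrite a few random bytes,
    # the way a traditional fuzzer treats any input.
    buf = bytearray(data)
    for _ in range(3):
        buf[random.randrange(len(buf))] = random.randrange(256)
    return bytes(buf)

source = b"contract C { function foo() public {} }"
mutated = mutate(source)
# With high likelihood a keyword or brace is now mangled,
# so the parser rejects the result before the optimizer ever runs.
```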
So you can imagine that, with very high likelihood, a random mutation at the input level is going to be simply rejected by the parser, and we don't want that, because we need the program to be parsed and then optimized, so that we can test the optimizer. So this clearly won't work; or it works, but it's not very efficient. Basically the lesson is that fuzzing a compiler requires generating valid programs, and generating valid programs requires some sort of structure awareness.

So let me talk about what structure awareness is and how we approach this problem. Essentially we start with a high-level specification. The specification is written in an interface description language called protobuf, or protocol buffers. It was originally developed at Google, but it's used for various purposes. In the protobuf language you can define units of data as messages, and each message contains one or more fields, which may themselves be other messages. It's useful to talk about the specification of the Yul language that we're testing in a top-down fashion. At the very top you have a program, so a message Program, which contains a repeated sequence of a message called Block,
which in turn we define as a repeated sequence of a message called Statement, and so on and so forth. Although it's not shown on this slide, you could define a message for an if statement, a for statement, and so on, each containing other fields, and then make Statement a union of all of these statements; you can use the keyword oneof to build that union. So you essentially build the specification in a top-down fashion until you reach the leaf nodes, which are typically literals or constants and things like that, and you try to cover as many aspects of the programming language as possible. Bear in mind that this spec is fully handwritten, so it's not exhaustive or complete, but for the purpose of testing the hope is that it covers sufficient language features for us to get a sufficient degree of assurance that things work as expected. You can find the full spec at the link below if you're interested.

The next thing is input generation. We have the spec; how do we turn it into a valid input? We don't generate the input ourselves. Fortunately there is a library called libprotobuf-mutator, also developed at Google, which takes the specification shown on the previous slide and converts it into a valid input, which is an instantiation of the spec. Each input is essentially a tree. For example, it can look like what is shown here: it defines blocks, a block contains a statement, which in this case is an if statement, and the if statement has a condition containing a binary operation, an equality. The first operand of that equality is a variable reference whose ID is zero, and the constant it is being compared against is zero. This is the textual form of a protobuf message, shown for clarity. But of course it doesn't make sense yet to feed this to the compiler, because the Yul optimizer does not recognize protobuf.
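For concreteness, a top-down spec of the shape just described might look roughly like the fragment below. The message and field names here are illustrative, not the actual spec (which is at the link mentioned above).

```protobuf
syntax = "proto2";

// Top level: a program is a sequence of blocks.
message Program {
  repeated Block blocks = 1;
}

// A block is a sequence of statements.
message Block {
  repeated Statement statements = 1;
}

// A statement is a union (oneof) of the possible statement kinds.
message Statement {
  oneof stmt_oneof {
    IfStmt ifstmt = 1;
    ForStmt forstmt = 2;
    // ... further statement kinds
  }
}

// ... plus definitions of IfStmt, ForStmt, expressions, and leaf
// messages for literals and constants, down to the leaf nodes.
```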
So we need a program that converts an instance of the protobuf message into a valid Yul program, and this is where the converter comes in. This is something that we had to write ourselves, but fortunately it's not too complex; it's a relatively small amount of code. Essentially the converter is a source-to-source translator: the input is the protobuf serialization format and the output is a Yul program. We saw the protobuf message on the previous slide; what it looks like when converted to Yul code is shown at the bottom. Essentially it's an if statement on a variable called x_0, checking that it's equal to zero. Of course this is a snippet of a larger program, so it doesn't make sense to feed it to the compiler yet, but it gives you an idea of what the input and the output of the conversion process look like. In reality we have a complete valid program which compiles and does something.

To put these pieces together: we have the specification that we wrote right at the beginning, and we use the libprotobuf-mutator library to generate inputs. But such an input is not ready to be fed to the compiler yet, because it's in the protobuf language, so we write a program, the protobuf converter, which converts from this language to a valid test program that can then be fed to the compiler. So finally we have an input that can be used to test the compiler.

But then testing the compiler actually requires encoding an expectation somehow. Imagine that you randomly create a test program, but you don't know what it's supposed to be doing or what side effects it has. So how do you encode an expectation? What do you check, and how do you check that the compiler is doing the right thing?
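Coming back to the converter for a moment: it is essentially a recursive walk over the message tree. A toy version, using nested dicts in place of real protobuf messages and covering only the if/equality fragment from the slide, might look like this.

```python
def convert_expr(e):
    # Translate an expression node of the (toy) message tree to Yul text.
    if e["kind"] == "binop" and e["op"] == "eq":
        return f"eq({convert_expr(e['left'])}, {convert_expr(e['right'])})"
    if e["kind"] == "varref":
        return f"x_{e['id']}"  # variable IDs become names like x_0
    if e["kind"] == "const":
        return str(e["value"])
    raise ValueError("unknown expression kind")

def convert_stmt(s):
    if s["kind"] == "if":
        # Empty body for brevity; the real converter recurses into blocks.
        return f"if {convert_expr(s['cond'])} {{ }}"
    raise ValueError("unknown statement kind")

# The if-statement instance from the slide, as a nested dict.
msg = {"kind": "if",
       "cond": {"kind": "binop", "op": "eq",
                "left": {"kind": "varref", "id": 0},
                "right": {"kind": "const", "value": 0}}}
```

Here `convert_stmt(msg)` yields the Yul snippet `if eq(x_0, 0) { }`.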
The approach that we use is differential fuzz testing. Essentially it involves tracking the side effects of a program using an execution trace while running the program, then running the optimized version of the same program and comparing the side effects. We use the original program as a baseline and compare it against the optimized program. So it's not complicated.

However, before we can do that, we need an execution trace which captures the side effects of executing the program; we need to know somehow what is happening, in order to check whether it is still happening correctly post-optimization. This is where the Yul interpreter comes in. The Yul interpreter is essentially an interpreter for Yul programs that was written by Chris; it interprets arbitrary Yul programs. Apart from interpretation, it additionally outputs the side effects of the program as it runs, and that trace can be thought of as a string. For example, you have the test program on the left; you feed it to the Yul interpreter; it executes it step by step and creates the execution trace shown on the right, which can look like: load something from memory at some address, then store some value, do a data copy, and so on and so forth. So we build the execution trace of the test program using the Yul interpreter.

Finally, we are ready to put all of these blocks together and test the optimizer. We start by generating the program and feeding it to the interpreter; then we optimize the same program and feed the optimized version to the interpreter as well. We get two execution traces, which are essentially strings, so we can simply do a string equality check. If the execution traces of the two versions are equal, everything's fine, as we expect. If they're not equal, that's a bug, most likely in the optimizer.
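The final check really is just string equality of two traces. A minimal sketch, where `run_interpreter`, `optimize`, and the trace strings are all stand-ins for the real Yul interpreter and optimizer:

```python
def optimize(program):
    # Stand-in for the optimizer (here just a renaming).
    return program + "_opt"

# Pretend traces produced by the interpreter for each program version.
# The opcode names in the trace strings are illustrative.
TRACES = {
    "good": "MLOAD(0) MSTORE(0, 2)",
    "good_opt": "MLOAD(0) MSTORE(0, 2)",  # semantics preserved
    "bad": "MSTORE(0, 2)",
    "bad_opt": "MSTORE(0, 3)",            # a miscompilation
}

def run_interpreter(program):
    # Stand-in for the interpreter: side effects come out as one string.
    return TRACES[program]

def traces_match(program):
    baseline = run_interpreter(program)         # unoptimized trace
    after = run_interpreter(optimize(program))  # optimized trace
    return baseline == after  # False signals a likely optimizer bug
```

A mismatch between the baseline trace and the post-optimization trace is exactly what gets flagged as a likely optimizer bug.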
In practice we've also had situations where the bug is somewhere else, but with high likelihood it's in the optimizer.

Like I said, we found this to be pretty effective in practice. We found about seven bugs in the optimization rules used to optimize programs: five in the experimental Yul optimizer, which is not supposed to be used by users right now anyway (that's the purpose of testing it), and two bugs in the EVM optimizer that is used in production. Fortunately those were of very low severity, mostly because the buggy rule in question was optimizing constants. It was a very specific pattern that was going wrong, and that pattern could be detected visually, because it involves a compile-time constant. That's essentially why the severity was low. The others were, of course, in the experimental version, which is exactly why we're testing it.

So that's what we've done; now for the challenges going forward. The main thing is that we would like to find high-severity bugs before the compiler ships, because that's what matters. The main problem with fuzzing the compiler for correctness is that it's usually a slow process. Typically in fuzzing you select a small piece of security-critical code and fuzz it, which means you would like execution speeds of over 100 runs per second. But take a component like the ABIv2 encoder inside Solidity, which we started to test: the problem is that compilation is slow. That's perfectly fine for the normal use case, because developers can happily spend an additional second or so to save gas.
The only problem is that if you apply fuzz testing to it, it can become a bottleneck. So finding ways to make the compiler more suitable for fuzzing is a challenge that we are currently working on.

In conclusion: we have started doing continuous structure-aware fuzzing to detect problems with the optimizer and alert us whenever there's a bug in the code base. So far it has mainly been used for testing the optimizer and the ABI encoding and decoding. It gives decent assurance, but bear in mind that testing is not formal verification and doesn't give you any formal guarantees, so take it with a grain of salt. That's about it. Thank you.

Q: How much do you know about the coverage that you're able to get with, say, one day of fuzzing?

With one day it's hard to say, but, as you said, we run it every day, and it builds a corpus over time. We started it, it builds a corpus, and the same corpus is reused, so you can think of it as a cumulative curve: it improves over time and keeps increasing. Right now, as it stands, just for the optimizer, it's pretty good, about 90 to 93 percent edge coverage. We keep an eye on it and see if we can improve it somehow.

Q: Do you have an idea of what's missing? Is it syntactic features of the input language that aren't in the protocol buffer specification?

I believe the syntactic issues are not a problem right now. The main challenge is to keep up with language improvements: new language features get added, and that has to be reflected in the protobuf spec. So we have to keep up with it and make sure that when there's some change, that change is covered by the fuzzer. Maybe what is missing is to check whether all branch statements inside the optimizer steps are covered; that's at a very low level, but at a high level it's hard to say.

Q: How did you decide...?

Regarding gas costs: the Yul interpreter is oblivious to gas; there's no notion of gas.
It just runs the code. Regarding how we decide: basically, a bug is filed by the fuzzing program and it comes with a minimized input. We rerun the input and check the side effects. It's a pretty simple workflow, mainly because of the interpreter: you get an execution trace which is reasonably readable, and you compare the traces pre- and post-optimization. It's pretty straightforward, so it can quickly tell you whether it's a bug in the optimizer or in some other code. Of course, like I said, we introduced code of our own, and there could be a bug in the code that we introduced. But typically it doesn't take longer than a few minutes to decide which bucket it falls into.