If you cannot see, please go out; we cannot allow people to stand in the room. I see there might be one free place if you shift, but not for everyone, or three places up there. So, it's time. Please welcome our next speaker, Oleksii, and his talk about SpecFuzz and bringing Spectre-type vulnerabilities to the surface.

Hello, everyone. My name is Oleksii Oleksenko, as you said, and today I'm going to talk about how we can detect speculative attacks with common, conventional dynamic testing techniques, such as fuzzing.

Let's start with the motivation. I noticed recently a pattern in information security talks: the speaker usually begins the presentation by trying to scare the audience. So let me give it a shot as well. Are you ready? One, two, three: boo. What do you say, it's not scary? You think it's actually kind of cute and cuddly? Well, let me tell you something: you are wrong, because this is a vicious beast. This is Spectre, a speculative vulnerability that was found a few years ago, and it allows attackers to bypass common memory safety techniques that have been deployed for many years.

To remind you how it works: here we have a normal buffer overflow. We have an access to an array with an index. If this index is larger than the size of the array, we have an overflow: we read memory outside the bounds of the array. How do we normally fix it? We add a bounds check: we check whether the index is within the bounds of the array, and that is supposed to fix the problem. That's how it has been done for many years. It's what memory-safe languages do, well, most of them, and it's how we patch memory vulnerabilities in C and C++. And it's supposed to solve the problem, right? Right? Well, no. Because now we have Spectre.
Spectre works by bypassing this bounds check, as follows. Say we have the same snippet of code, with an array of size 10, and we execute it repeatedly. The first time, we pass an index of one: the bounds check succeeds, and we execute the memory access. Then we execute it again; now the index is three. Again true, we execute the memory access. Now it's two, same story. But suddenly the index is one gazillion. It's literally one gazillion, way out of the bounds of the array, somewhere far, far away from it. So what happens here?

Well, the thing is that this comparison takes some time to compute. For example, we might have a cache miss while doing the comparison, or some hardware fault, so it takes time. And the architects of modern CPUs try to squeeze as much performance as possible out of them; we really don't want to stall the CPU for this long period. Because until the comparison is resolved, the CPU doesn't know where to go next: into the true branch or the false branch of this code.

So how do they normally deal with this? They implement a so-called branch predictor. Here we have the situation: the CPU asks this module, the branch predictor, "What will happen to us?" The branch predictor looks at the history of this branch, or of similar branches, and sees that in our specific example it passed once, it passed again, it passed once again; we had three passes. The fourth time, probably the same thing will happen: you will pass. The CPU, being naive, believes the branch predictor, and it starts to execute this memory access and all the following code as well. Although it does not execute it as normal; it executes it speculatively. Speculatively means that it does all the computations necessary and loads all the data necessary, but it does not commit the results.
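The pattern described above can be sketched in C. This is only an illustration: the names, the array sizes, and the `probe` array (which stands in for the attacker-observable cache side channel) are all invented here, not taken from the talk's slides.

```c
#include <stddef.h>
#include <stdint.h>

#define ARRAY_SIZE 10

/* Illustrative names: "array" holds the bounds-checked data; "probe"
 * stands in for the memory an attacker later probes via cache timing. */
static uint8_t array[ARRAY_SIZE];
static uint8_t probe[256 * 512];

/* Architecturally safe: the bounds check prevents any out-of-range
 * read. Under speculation, however, the branch predictor may guess
 * "taken" even for x = one gazillion, and both loads execute before
 * the comparison resolves, leaving a trace of array[x] in the cache. */
uint8_t victim(size_t x) {
    if (x < ARRAY_SIZE)                 /* the bounds check */
        return probe[array[x] * 512];   /* secret-dependent access */
    return 0;                           /* architectural result */
}
```

Architecturally, the function is perfectly well-behaved, which is exactly why conventional memory-safety checks never flag it.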
All the temporary results produced during speculative execution are stored in internal buffers of the CPU and are not visible to the software. Eventually, when the CPU detects that the prediction was wrong, it just discards the results and starts over from the correct branch. However, the problem is that even though speculative execution is not visible to the software, it still leaves certain traces on the micro-architectural level. For example, it might leave traces in the caches, or it might start up some execution units in the CPU; the AVX unit, for example, is (or could be) fired up during speculative execution. And attackers who know about speculative execution can derive its results, for example the result of this memory access, from these traces. It is possible to do.

However, you might ask: isn't this a CPU bug, and shouldn't it be fixed in the hardware? It is a valid question, because effectively we are giving the CPU valid code which does not have a vulnerability, and the CPU itself is executing this code in a way we did not expect. So it is kind of supposed to be fixed at the hardware level. However, let's look at what the CPU vendors are planning to do with these vulnerabilities. Here, for example, is a table from Intel's website. It lists, as rows, future models of Intel CPUs, and as columns, different variants of micro-architectural attacks: not only Spectre, but also Meltdown, Foreshadow, and some others. In most cases, they will in fact be patched at the hardware level. Meltdown, for example, is fixed mainly in hardware; the same for several others. MCU here stands for a microcode update, a firmware patch.
However, the story is different for Spectre, because here everything is supposed to be patched in software. So effectively what they are saying is: even though it is a flaw in our products, we are not going to deal with it directly at the hardware level. It is effectively our responsibility as software developers to deal with this hardware issue.

So what can we do about it? One of the first proposed solutions was to add serialization points everywhere. There are instructions in x86 CPUs that do serialization, which means they effectively stop speculative execution. For example, Intel has LFENCE, and the suggestion was to put an LFENCE after every conditional branch. This does fix the problem; it does prevent Spectre. However, there is one big issue: it is very slow. On our benchmarks, the programs became somewhere around five times slower. There is another, more sophisticated approach: we could add data dependencies between the conditional branch and the memory access. This is somewhat better, because it does not completely stop speculative execution but instead only delays the memory accesses that could be vulnerable. But it is still pretty bad, around 50% overhead.

From this we can conclude that we need more precision. We want to patch not the whole program but only those parts of it that are actually vulnerable to Spectre. How do we do that? When we came to this conclusion, we met in our group and asked ourselves: how do we find Spectre? Can we draw a parallel with other vulnerabilities? What are the mechanisms and tools used to detect other types of vulnerabilities? And one of the trending techniques we came up with is fuzzing. How does fuzzing work? Just to remind you, it is a conceptually simple technique.
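The two mitigations just mentioned can be sketched like this. The function names are hypothetical, and the masking variant is in the spirit of speculative load hardening; this is a sketch, not the exact code any of those tools emit.

```c
#include <stddef.h>
#include <stdint.h>

#define ARRAY_SIZE 10
static uint8_t array[ARRAY_SIZE] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};

/* Mitigation 1: serialization. An LFENCE right after the branch stops
 * speculation entirely (roughly 5x slowdown when applied to every
 * conditional branch, per the talk's benchmarks). */
uint8_t read_fenced(size_t x) {
    if (x < ARRAY_SIZE) {
#if defined(__x86_64__) || defined(__i386__)
        __asm__ volatile("lfence" ::: "memory");
#endif
        return array[x];
    }
    return 0;
}

/* Mitigation 2: a data dependency between the comparison and the
 * access. The mask is all-ones only when x is in bounds; on a
 * mispredicted path the recomputed comparison yields 0, so the load
 * reads index 0 instead of out-of-bounds memory (~50% overhead). */
uint8_t read_masked(size_t x) {
    if (x < ARRAY_SIZE) {
        size_t mask = (size_t)0 - (size_t)(x < ARRAY_SIZE);
        return array[x & mask];
    }
    return 0;
}
```

Both versions return the same architectural results; they differ only in what a mispredicting CPU is allowed to do before the branch resolves.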
We just test the application with randomly (or pseudo-randomly) generated inputs. In this specific case, we would generate a random integer, assign it as the index, and execute this code repeatedly. To actually detect a buffer overflow, we might want to use some memory safety technique; AddressSanitizer, for example, verifies before every memory access that the pointer is valid, that it points to a valid object and not to some random place in memory. However, the issue is that even though this technique detects memory safety violations, it will not detect Spectre. As I said before, Spectre, or speculative execution in general, is not visible to the software. Here, the check will always pass, and AddressSanitizer will not detect anything.

So now we ask ourselves: how do we make it visible? How do we make speculative execution, which is a hardware feature, visible to the software layer? Well, let's just simulate it. Let's actually execute the code that would run speculatively. That's the idea behind SpecFuzz: we simulate speculative execution, and then we fuzz.

Here is the concept. Say we have this control-flow graph. We have node A as the entry point; it could be, for example, an if-else statement, and it goes either into node B or node C. C could be the if branch, B the else branch. Our simulation inserts an additional node, a simulated A, which is identical to the original A but has the opposite exit condition: if A enters B, the simulated A enters C, and vice versa. However, speculative execution does not last indefinitely; it has boundaries. On Intel CPUs, for example, it lasts at most around 250 instructions, so it is rather short.
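The fuzzing-plus-AddressSanitizer setup described at the start of this section might look like the following libFuzzer-style harness for the toy victim. All names here are illustrative; the point is that, built with -fsanitize=address,fuzzer, this would catch architectural overflows but never a speculative one, because the check below always holds architecturally.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ARRAY_SIZE 10
static uint8_t array[ARRAY_SIZE];

/* libFuzzer entry point: derive a pseudo-random index from the fuzzer
 * input and run the bounds-checked access. AddressSanitizer only ever
 * sees the in-bounds access, so it reports nothing, even though the
 * code is Spectre-vulnerable. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    size_t x = 0;
    if (size < sizeof(x))
        return 0;
    memcpy(&x, data, sizeof(x));    /* index chosen by the fuzzer */
    if (x < ARRAY_SIZE)             /* the check ASan deems valid */
        (void)array[x];
    return 0;
}
```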
To simulate that, we additionally take a checkpoint before entering the simulation: we store the state of the process, and periodically we check whether we have reached this boundary of 250 instructions, at which point we exit the simulation and roll back. So how it works: we take a checkpoint, execute the simulated path, and eventually, when we reach 250 instructions, we roll back; notice that we roll back not to the simulated node but to the original one, and then we execute the normal code. We repeat this procedure for every conditional branch in the application under test.

Coming back to the example, here is how it works in the code. Again, our victim function: as previously, we have the condition, the memory access, and we store the result. Now bear with me, there will be assembly, because SpecFuzz is implemented as a backend pass in LLVM; it is effectively compile-time instrumentation at the assembly level, so we have to go into assembly. This is what the victim function compiles to: a comparison, a jump-if-less-than, then we do the memory access and store the result. And this is the instrumentation. As I showed before, before starting the simulation we take a checkpoint: we call the SpecFuzz checkpoint function, which stores the state of the process. Then we insert a sequence of instructions that reverses the direction of the branch: the original jump was jump-if-less-than, and we insert jump-if-greater-or-equal, the opposite direction. At the end of every basic block we call a function that checks whether we have reached the limit of 250 instructions (it actually depends on the CPU, but for current Intel CPUs it is about 250), and if we did, it rolls back to the checkpoint. We also count instructions here. And finally, we want to actually detect those memory violations.
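The checkpoint-and-rollback mechanics can be imitated at the C level with setjmp/longjmp. This is only an analogy for illustration: the real tool is an LLVM backend pass instrumenting assembly, and all names below are made up.

```c
#include <setjmp.h>
#include <stddef.h>
#include <stdint.h>

/* checkpoint = saved process state; rollback = jump back to it once
 * the simulated speculation window (~250 instructions on recent
 * Intel CPUs) is exhausted. */
static jmp_buf checkpoint;
static int simulated_instructions;

/* Called at the end of every "basic block" on the simulated path. */
static void count_and_maybe_rollback(int block_cost) {
    simulated_instructions += block_cost;
    if (simulated_instructions >= 250)
        longjmp(checkpoint, 1);  /* discard the simulated path */
}

int victim(size_t x, const uint8_t *buf, size_t len, uint8_t *out) {
    if (setjmp(checkpoint) == 0) {
        /* Simulation: enter the branch body regardless of the bounds
         * check, as if the predictor had said "taken". */
        simulated_instructions = 0;
        for (;;) {
            count_and_maybe_rollback(10);
            /* the instrumented binary would access buf[x] here, and
             * an AddressSanitizer-style check would report x >= len
             * as a speculative overflow */
        }
    }
    /* Rolled back: execute the original branch with the real check. */
    if (x < len) {
        *out = buf[x];
        return 1;
    }
    return 0;
}
```

Note that the rollback lands after the setjmp, i.e. at the original node, and normal execution then proceeds with the architecturally correct branch, mirroring the control-flow-graph picture above.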
To detect the memory violations, we just use AddressSanitizer: the call here is the AddressSanitizer check, which verifies that the pointer is within the bounds of the object. So that is the idea, the concept of SpecFuzz and its high-level implementation.

Now let me show you a quick demo of what you can get out of it. This will be a demo of fuzzing OpenSSL. Here we have OpenSSL, already built with SpecFuzz, and we have a driver for it. Here is the driver: the server fuzz driver shipped with OpenSSL. That is the application that will be fuzzed. If we run it with even a single input, this is the output: a whole bunch of lines, where every line is a single vulnerability, a single speculative overflow, detected by SpecFuzz. You can immediately see the difference between normal testing of an application and testing for speculative vulnerabilities, because here we have many more results than usual. Normally, when you test an application for buffer overflows just with AddressSanitizer, you might find maybe one vulnerability in a day, or you might spend a whole week fuzzing and find nothing, if it is a long-established project. Here, in a single second, you might find hundreds or even thousands of these speculative overflows; even with a single input we found a lot already. So we have to deal with this.

Now, how fuzzing looks here; let me move this away so it doesn't cover the screen. Now we can fuzz this thing. For fuzzing we are using honggfuzz, one of the available fuzzing tools, this one from Google. We fuzz the application with honggfuzz, doing ten runs here on a single thread.
We use the same fuzz driver as previously, the server driver from OpenSSL, and as starting inputs we use the corpus of inputs from OpenSSL again. Notice here the difference from normal fuzzing: we are piping the output, this long list of vulnerabilities, into a script that collects them and aggregates them into a JSON file. So let's run the fuzzing. It runs; of course it takes some time, because with this instrumentation the program becomes quite a bit slower. Still usable, but slower. So it will take a while; we can skip ahead.

After fuzzing we get this file, analyzer.json, with all the aggregated results. On its own it is not very useful: the hex values here are the addresses of the offending instructions in the binary. It is more convenient to work with code lines, so after the aggregation we also symbolize the results: here we have another script that symbolizes them and turns the binary addresses into locations in the code.

To show you one example of what we can achieve, here is one vulnerability that we found in the crypto part of OpenSSL. Here we go: the vulnerability is here, and we have the same pattern as I showed you before. SpecFuzz automatically found a speculative overflow: we have a memory access, an array access with a controlled index, and the index is bounds-checked here and here. Almost exactly the pattern I showed you at the very beginning, and it is in OpenSSL.

So that was the demo. But now what? We found vulnerabilities, and we found many of them. If you have a list of, say, 100 or even 200 vulnerabilities in your software, you probably do not want to go over all of them and try to fix them manually. So to deal with this issue, we implemented another tool.
It is more minor but still quite useful. We took a few of the existing tools that automatically patch programs by hardening all the conditional branches; for example, as I showed in the beginning, there are tools that add LFENCEs after every conditional branch, or add data dependencies after all conditional branches. We modified them to instrument not the whole program but only those parts that we either did not test or where we found vulnerabilities. It is a whitelist approach.

Let's see what we got. Here is a plot of the speedup of several applications and libraries that we tested. We have OpenSSL, we have a few parsers, for example a JSON parser and a YAML parser, and we have a compression algorithm and an HTTP parser. On the Y axis we have the speedup with respect to full instrumentation: the ratio between the runtime with only partial instrumentation (only the untested parts and the parts where we found vulnerabilities are instrumented) and the runtime with full instrumentation, where all conditional branches are instrumented.

The results vary quite a bit. In some cases we found very few vulnerabilities; in the JSON parser, I believe, we found only two, and accordingly the improvement is huge, because almost nothing requires instrumentation anymore. In OpenSSL we found quite many, so the improvement is not as large, but it is still considerable, around 25% already. And mind that this tool is at a rather early stage; I believe we can do much better. But it is already a good start, I would say.

So, that was it. I gave the presentation at a rather high level; there is quite a lot more to it, for example how we deal with nested mispredictions (several branches can be mispredicted at the same time), how we collect the results, and so on. You can read about it in our paper.
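The whitelisting idea can be sketched as a simple filter over per-branch records. All types and names below are invented for illustration; the actual tool works on the patching tools' instrumentation lists, not on a structure like this.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Invented record type: one entry per conditional branch. */
typedef struct {
    uintptr_t addr;    /* branch address in the binary */
    bool covered;      /* was this branch reached during fuzzing? */
    bool vulnerable;   /* did fuzzing report an overflow behind it? */
} branch_info;

/* A branch is hardened (LFENCE or masking) only if fuzzing never
 * covered it, or covered it and found a speculative overflow;
 * branches that were tested and came back clean are left alone. */
bool needs_hardening(const branch_info *b) {
    return !b->covered || b->vulnerable;
}

/* Count how many of n branches still need instrumentation. */
size_t count_hardened(const branch_info *branches, size_t n) {
    size_t k = 0;
    for (size_t i = 0; i < n; i++)
        if (needs_hardening(&branches[i]))
            k++;
    return k;
}
```

The fewer vulnerabilities the fuzzer reports, the fewer branches survive this filter, which is why the speedup over full instrumentation is largest for the nearly-clean libraries.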
It is openly published on arXiv; you can find it there. We also published the tool on GitHub, so you can try it out; here is the link. But please be gentle, because it is academic code, a first prototype, and we have recently refactored it, so there might be issues. If you find any, just write to me or open an issue on GitHub. That was it from my side. I am happy to take your questions, and if you want to talk to me, just find me at the conference or write to me. Thank you.

[Audience question, partly inaudible: do you detect that instructions were executed speculatively by measuring whether they take a certain amount of time?] No, not really. We actually execute the speculative path; we are forcing the program. We have a conditional branch, and the branch the program should take given an input. For example, in the example here: say the size of the array is 10 and the value of x is 11. It is out of bounds, and we are not supposed to enter this branch. We insert a snippet of code that forces the program to actually enter this branch, execute for some time, and then roll back and execute the correct code. That is how it works.

[Another question, inaudible.] Yes, you could; we just found it more convenient to implement it this way. I could not hear the question well, sorry. Oh, okay: the question was whether it is possible to implement this without source code. My answer is yes, you can do binary instrumentation; we just found it more convenient to do it at the LLVM level. And again, it is a first prototype, so you could reimplement it with binary instrumentation. Could you please repeat?

[Audience:] Nice talk, thank you. You instrument the program to add the speculative branch, basically, but then you do fuzzing. Have you thought about doing static analysis instead of fuzzing, to build a sound analysis? There are actually approaches for static analysis. Quite many.
There is Respectre from grsecurity, there is a tool from Red Hat, and so on. But from what I saw in my experiments, the problem with static analysis depends on which kind you use. If you use symbolic execution as your static analysis, it will be sound, but the problem is overhead: it takes a lot of time to analyze. There are tools that try that too, Spectector for example. With more classical static analysis, which is essentially pattern matching, the problem is false negatives: these tools normally find only those instances of Spectre that were envisioned by the developer, and Spectre can have different variations, which will not be caught. [Audience:] Great, thanks. And actually, it is in the paper; we have some evaluation of them too, a little bit.

[Audience:] Sorry, I speak only a little English. I would like to ask, could you be so kind as to share LibreSSL results with us? Because it is a fork of OpenSSL, and it is security-focused. And another question: what happens if I disable multithreading in the BIOS? I can barely hear you; could you repeat? Multithreading? Yes, if you disable multithreading it becomes better, but it is not sufficient. It is usually a first recommendation to disable hyperthreading, because with hyperthreading enabled the attack is easier to launch. With it disabled, the attack is not impossible, just harder; it raises the bar. The thing with hyperthreading (do I still have time? yes) is this: if you want to start speculative execution, you first have to train the branch predictor to go into the path you want it to take. You can do that from inside the program, as I showed in this example, but it is not always possible.
Sometimes you cannot control the direction of the branch with the input you feed to the program. With hyperthreading enabled, you can train the branch predictor from a different hyperthread, because it is shared between hyperthreads. Otherwise, well, it depends on the architecture. [Audience:] Did you already test LibreSSL? Sorry? [Audience:] Did you already test LibreSSL? No, we didn't. [Audience:] Could you be so kind as to do it and share the results with us? We can talk afterwards, but yes, I will try to do it.