Hello, and thanks for being here with us today. I'm Stéphane Lesimple, I work for OVHcloud, and I will be presenting with Agata Gruza from Intel. Before getting started, a few words about myself. I joined OVHcloud 10 years ago. I did a few different things during that time, and I'm currently part of the security team there. Today, we're going to talk about a little side project of mine, started in early 2018, named Spectre and Meltdown Checker. A quick summary of what we're going to cover in this presentation. First, I'm going to talk about security vulnerabilities and what changed when Spectre and Meltdown came out, then of course how to fix them. Then we'll talk about the script itself and show you how it can help you in ensuring you're secure. We'll talk about a few governing principles I have when developing the script and, of course, I will do a short demo. Then I will leave the mic to Agata, who will be talking about her experience working with the script and contributing to it, namely adding new vulnerabilities to check for, with some explanation about what to expect from the script and what not to expect from it, which is of course also very important. So let's go. The first class of security bugs I'd like to talk about is implementation flaws. This is really the most common category of security bugs, and those are flaws in the code written by the developer. You have a lot of categories you have probably already heard about, such as buffer overflows and insufficient input checking, which can then lead to different problems such as injection, control of program behavior, et cetera. Then you have all the race condition kinds of bugs with their own subcategories, et cetera. This is not exhaustive at all. It's just to show you that those kinds of bugs are extremely common, and there are dozens and dozens of new bugs every day in this category. In that case, who is impacted?
Well, it's somewhat limited, because it's everybody using a specific version of a specific piece of software. I took an example here in OpenSSL, a CVE from four years ago, and in that case it was a bug in the DTLS implementation in OpenSSL between versions 1.1.0 and 1.1.0a, which is, as you can see, pretty specific. And what's important to understand here is that other implementations of DTLS were not impacted. For example, GnuTLS was not impacted. How can you fix those kinds of bugs? Well, it's pretty easy. You patch the problematic code, then you release a new version and, of course, you tell your users to update the software. I took the example of the portion of code that was written to actually patch the vulnerability I'm talking about and, as you can see, it's fairly easy. It was just a missing check in an if block. So those kinds of flaws are pretty common but also pretty easy to fix. Then you have the design flaws. In that case, it's still somewhat common, but it's a flaw in the concept the code is implementing, not in the code itself. This means that as a developer, your code may be completely flawless, perfect, you've done all your tests and everything. Yeah, it's very cool, but you're still vulnerable, because the problem is in the concept you've been implementing, not in the code itself. So in that case the impact is a bit broader, of course, because anybody using software that implements this concept is potentially impacted. I took another example in the TLS world, which is the BEAST attack. All the TLS 1.0 implementations were vulnerable because the problem was in the specification. And so, of course, what's important to understand is that OpenSSL was vulnerable, GnuTLS was vulnerable, and all software using those libraries was also, of course, vulnerable. To fix those kinds of bugs, you publish a new version of the concept, a new RFC. Here, we have TLS 1.1. And then you wait for all the developers to implement the new concept in their software.
And then we can hope that the old version of the concept will be phased out rapidly. The thing is that as of today, almost 10 years after the BEAST TLS attack came out, we still have one-third of web servers accepting TLS 1.0. So those servers are still vulnerable to the BEAST attack. As you can see, it's a bit more complicated than just patching code and releasing a new version of your software. Then, of course, you have the third and last category, the hardware flaws. Before 2018, most people thought that exploiting those kinds of flaws required physical access to the chip, for example by playing with it, undervolting it, overvolting it, and seeing how it reacts. Of course, this is only possible in very controlled environments such as an R&D lab. And so even if the proofs of concept were still interesting, most people, not to say everybody, thought that it was impossible to really exploit them in the wild. So even if it was interesting, it was not really a big threat in real life. Yeah, and then 2018 happened. And the world collapsed, you know. Actually, the security community understood that it was possible to exploit those in the wild, and in a way easier manner than what was previously thought. So in that case, who is impacted? Well, pretty much everything: all software, including the operating system itself, because the operating system is just another kind of software, as long as it's running on the vulnerable hardware. I've put a few examples there. Of course, Spectre and Meltdown were the first, but then, as everybody understood that it was possible, a lot of research was done and, sure enough, new vulnerabilities were found, with cool names, because vulnerabilities have to have cool names now. Of course, we had Foreshadow, Fallout, ZombieLoad, you name it. How can you fix that?
Well, of course, you can buy new hardware and hope that the new hardware doesn't have the flaw, but you can't always do that. And if you don't want to do that, well, the answer is complicated. Let's see in more detail. First, you have the CPU microcode. You can see that as the firmware of your CPU because nowadays, as you know, CPUs are no longer just a bunch of transistors and logic gates. They are way more complicated, with a lot of complex blocks working together, and the microcode is there to ensure everything is working correctly, and it can be updated. This microcode might expose new features, new switches, new knobs, that can be used by the OS kernel to mitigate a vulnerability because, of course, you can't patch the vulnerability, you can just mitigate it. So the kernel also needs to be updated to be able to use the new microcode features. If you're running in a virtualized environment, you might also have to update the hypervisor, because the hypervisor is responsible for exposing those features to the VM's OS kernel, which also needs to be updated: it might be the same kernel as the host, but it's still a separate instance of a kernel that might need to fiddle with what is exposed by the microcode and passed through by the hypervisor. That's why all those layers have to be updated and have to know about the new microcode features to be able to correctly mitigate the vulnerabilities. Also, in some cases, for example with Spectre Variant 1, you might have to update all the other software, such as your Firefox, your Bash, you name it, because for some kinds of vulnerabilities you have to have a new version of the compiler and recompile the software to ensure that it doesn't produce the bad sequence of opcodes that makes the vulnerability easier to exploit. So, as you can see, in the worst case you might have to update five different layers of your system just to mitigate one vulnerability. It's really complicated.
So now let's talk about D-Day. A few days before, there were starting to be clues everywhere. For example, KPTI was merged into the Linux kernel on the last day of the year, which is pretty odd, and very, very late in the release cycle of the kernel. Usually, a feature as big as KPTI will never be merged during a release cycle. It will be merged at the beginning of the merge window, and then the release candidates are there to ensure that it's stable enough to be released. So merging so big a patch at rc6, let's say this really never happens. Linus also explicitly asked the maintainers to backport this patch to the other stable kernel versions, which was also pretty odd. Another very strange thing was that the feature was enabled by default. This never happens. A feature as big as KPTI, which is a complete design change, usually takes years to be enabled by default because, of course, there are a lot of different systems out there, and it has to be thoroughly tested by a lot of people before we are sure enough that it can be enabled by default. So those things were really red flags for the security community. Some people also noticed that Microsoft was working on patches similar to KPTI for Windows at that same time. So, at that point, we knew that something big was coming for us, and we were just waiting in fear, you know. And sure enough, it happened. Spectre and Meltdown became public on January the 3rd. And that's when we learned that, okay, all systems are potentially vulnerable. Your microwave oven, your fridge, your phone, your computer of course, everything running some kind of CPU and some kind of operating system or software might be vulnerable, maybe. It became public a few days or weeks before the planned end of the embargo. So the microcode updates were not completely ready for all the CPUs, and the kernel patches were moving a lot.
I also noticed, at that time, silent backports from the Linux vendor community, where some vendors were taking patches from the Linux kernel mailing list. Those patches were not stable. They were not yet committed to the vanilla kernel, but still the vendors were integrating those patches into their own kernels, trying to protect their users. Those backports were silent most of the time. So, four days later, it's January the 7th. It's a cold, snowy Sunday. I'm on my couch. And that was a pretty hectic week. I had read a lot of things about the vulnerabilities in the few days before, and the thing is, the facts were evolving literally every hour because, of course, the vulnerabilities were completely new. The vendors were sometimes still trying to understand which models of which CPUs were vulnerable or not, et cetera. So it was really developing and changing every hour. So I thought, okay, as a sysadmin, as somebody who is managing infrastructures, I just have two simple questions. First, is my hardware affected? And then, if the answer is yes, are the mitigations in place on my system? And the answers to those questions were very complicated and evolving all the time. That's why I wanted to write a script to try to get simple yes/no answers. And, of course, the script would evolve every hour if needed. Now, a few governing principles I have when developing the script. First, when information about a vulnerability is not yet completely available, I consider a system to be vulnerable by default. This is a legacy from the Spectre and Meltdown era, but it's still true today. When a new vulnerability comes out, it's rarely clear right away which model of which CPU from which vendor is vulnerable or not. So, by default, I consider every CPU as vulnerable and then, as the information comes out, I update the script and say, okay, this model is known to be safe, et cetera. This way, I'm sure that the script never tells you that you're safe when actually you're not.
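The vulnerable-by-default principle just described can be sketched in a few lines. This is only an illustration in Python, not the script's actual code (the script itself is a shell script); the `KNOWN_SAFE` list, its placeholder entry and the `is_affected` helper are hypothetical names made up for this example.

```python
# Hypothetical allowlist of CPUs confirmed NOT affected by some vulnerability.
# The entry below is a placeholder for illustration, not real vendor data.
KNOWN_SAFE = {
    ("ExampleVendor", 6, 0x01),  # (vendor, family, model)
}


def is_affected(vendor: str, family: int, model: int,
                known_safe: set = KNOWN_SAFE) -> bool:
    """Return True unless this exact CPU is on the known-safe list.

    Unknown CPUs are reported as affected, so the check can produce a
    false alarm but should never report a false 'safe'.
    """
    return (vendor, family, model) not in known_safe


if __name__ == "__main__":
    # A CPU nobody has published information about yet: assumed affected.
    print(is_affected("SomeOtherVendor", 15, 0x42))
    # A CPU that was later confirmed safe and added to the list.
    print(is_affected("ExampleVendor", 6, 0x01))
```

The design choice is that new information only ever shrinks the set of CPUs reported as affected, never the other way around.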
I also never hardcode kernel or microcode versions in the script because, of course, this doesn't work with backports. I prefer to look directly into the kernel image and see, deep inside it, whether mitigations are compiled in. This way, it works with silent backports, and it also works if you have a kernel image you don't have the source or configuration for. The script will be able to tell you. Same thing for microcode. I directly query the different flags and tell you whether it's okay or not for each vulnerability, without hardcoding a microcode version anywhere. I also don't trust what the kernel says too much, because you might have an old kernel, and for some vulnerabilities things evolve over time. So your kernel might tell you that you're okay when actually you're not. So the script doesn't trust it too much and has its own logic implemented. I will also never attempt to run exploits or proofs of concept to gather information on your system, because for this kind of vulnerability proofs of concept are really error-prone and it would create a lot of false positives and false negatives, so I don't do that. I also never modify the system I'm running on, so that you can run the script on your production system without any problem. It should also be possible to inspect a kernel without actually running it. You can point the script to an image and it will tell you whether the mitigations are complete or not. The script is also POSIX compliant, so that it can run on old Linux flavors and on BSD as well. And of course the fifth layer we've seen before, the other software, is out of scope. I'm only looking at your CPU, your microcode, your kernel and maybe your hypervisor if you're running the script inside a VM. Okay, so now it's demo time. Let's first launch the script in hardware-only mode. In that mode the script tells you facts about your CPU and microcode.
So the first part here is showing a couple of flags; these are the switches or knobs I was talking about earlier. You can see that even though I have a somewhat old CPU, I have a fairly recent microcode, so the IBRS and IBPB features are supported by my microcode. These can be used to help, this is one of the ways to mitigate, for example, Spectre Variant 2. However, I don't have enhanced IBRS. This is a feature that actually requires hardware support in the CPU, so of course I'm not going to patch enhanced IBRS support onto my CPU by using my soldering iron. This is a bad idea. As my CPU was released before 2018, I just don't have it. Then here you have a lot of flags for newer CPUs that tell your operating system that they are completely unaffected by some vulnerability. For example, if I take Meltdown here, there is a special flag which is named RDCL_NO, and if this flag is set you will see a yes here, and it will tell your system that the CPU is completely unaffected. This is the case for CPUs that were released after a vulnerability. Of course, the design was changed in the CPU so that the vulnerability just no longer exists. It means that the operating system doesn't need to put in place any mitigation for this vulnerability. As my CPU is old, I have "no" everywhere here. There is also a special line here telling me that my microcode is not known to cause stability problems. I think this is the only place where I did hardcode some microcode versions, because at some point there was a batch of microcodes released that ended up being buggy, and after some days of uptime your system might freeze or reboot. Of course they have been updated since then, but I put the versions in the script so that if you see a yes instead of a no, please stop what you're doing and upgrade your microcode, but please don't keep this one.
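To make the "unaffected" flags above a bit more concrete: on Intel CPUs, flags such as RDCL_NO are individual bits of the IA32_ARCH_CAPABILITIES register. The sketch below, in Python rather than the script's own shell, decodes a raw register value into a few of those flags. It is a minimal illustration assuming the value has already been read by some other means; the helper name `decode_arch_capabilities` is made up here and only a subset of bits is listed.

```python
# A few IA32_ARCH_CAPABILITIES bit positions (subset, for illustration only).
ARCH_CAP_BITS = {
    "RDCL_NO": 0,   # CPU is not affected by Meltdown
    "IBRS_ALL": 1,  # enhanced IBRS is supported
    "SSB_NO": 4,    # CPU is not affected by Speculative Store Bypass
    "MDS_NO": 5,    # CPU is not affected by MDS
    "TAA_NO": 8,    # CPU is not affected by TAA
}


def decode_arch_capabilities(value: int) -> dict:
    """Map each known flag name to True/False for a raw register value."""
    return {name: bool((value >> bit) & 1)
            for name, bit in ARCH_CAP_BITS.items()}


if __name__ == "__main__":
    # Hypothetical register value, chosen only to exercise the decoder.
    for name, is_set in decode_arch_capabilities(0x21).items():
        print(f"{name}: {'YES' if is_set else 'NO'}")
```

On an older CPU like the one in the demo, every flag would decode to "NO", which matches the column of "no" answers described above.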
Then there is a line that tells me whether the microcode I have is the latest known version for my CPU model, which is the case here. The script has its own database of microcode versions, and I will tell you more about this later. Then there is the portion where there is one line per vulnerability, and it tells you whether your CPU is affected by that vulnerability or not, regardless of the mitigations that may be in place on your system. As you can see, my CPU is affected by most of the vulnerabilities, except for Foreshadow SGX, because my CPU doesn't have the SGX extensions so it's okay, and it's not affected by ZombieLoad variant 2. So now let's relaunch the script with the full output. As you can see, the first part is still the hardware check, then you have one paragraph per CVE, per vulnerability. I'm of course not going to detail all those lines, we don't have time for that, but as you can see it goes into great detail in showing you each precondition that has to be met to ensure that, in the end, your system is not vulnerable, either because it's not affected or because it correctly mitigates the vulnerability. And for some kinds of vulnerabilities you can see that several mitigations are possible, so it shows you which option is used, and there is a little conclusion at the end of each paragraph telling you that you're not vulnerable or that you're vulnerable. As you can see, it's pretty much green. I've disabled one of the mitigations for this demo to show you some red flags: I've disabled the MDS class of mitigations on my kernel, then rebooted, and as you can see the script correctly sees that the kernel I have supports the mitigation, so it's clearly compiled in. Also, on the hardware part, I do have the, where is it?
I do have the MD_CLEAR bit, I don't see it but it's there, yeah, here, so the microcode is up to date also, but the mitigation is disabled at runtime, so in the end I'm vulnerable, and it tells me that. It correctly tells me that my microcode and kernel are up to date but the mitigation is disabled. Then, at the end of the output of the script, you also have a quick summary to tell you whether you're okay or not okay for each mitigation. You can also modify the output to get a shorter output, which is easier to parse in your scripts or monitoring systems. As you can see, you can also use JSON, et cetera. All those options are available, I will let you test them. I will now show you how to inspect a kernel image which is different from the kernel that I'm currently running. So let's point the script to a kernel image. You have to specify the image; if you have it, you can also specify the configuration, but you don't have to if it's not available; and you can also specify the map file if it happens to be available on your system. This is a typo. So the script runs again, and this time it's not checking the currently running kernel, it's checking the image I've pointed it to on the command line. As you can see, the image is somewhat old. Sorry, my cat is making some noise, sorry about that. So, as you can see, this is a fairly old kernel: it doesn't have the Spectre mitigations in place, and the same goes for all the other mitigations. However, I do have KPTI, which is what's needed for Meltdown. So if you compile a new kernel, this can be interesting to test. You can also test without specifying the configuration or the map file if you don't have those. For example, if it's a kernel that you didn't compile yourself, you can point the script at it: it will tell you that accuracy might be reduced, but it will still try to do its best to tell whether the mitigations are there or not. What you can also do is mocking a CPU. That can be interesting if you have a special machine that has some exotic CPU on your
production, for example, and you can't or don't want to directly reboot on a new kernel and check. You might want to actually mock that CPU on some development machine. So you run the script, and then it dumps a bunch of environment variables that you can set, and if you do, the script will run in mocking mode: it's not looking at the current CPU you have, it's looking at the mocked CPU. Of course, in my case, I've just mocked the same CPU that I have, so the result is the same. Some words about the database of microcodes that is built into the script. The sources are the MCExtractor project on GitHub, which has a lot of microcode versions for a lot of different CPU vendors, and it also queries the Intel firmware GitHub repository, which is quite neat. So the source of information is very good, and that's how it can tell you whether your microcode is up to date or not. Of course, you can update the database yourself. Okay, that's it for the demo, and so, Agata, the virtual stage is yours.
Thank you, Stéphane. Welcome, everyone, I hope you are enjoying Open Source Summit so far. Today I would like to talk about my journey with Spectre and Meltdown Checker and how you can start contributing to the checker tool. Briefly about me: I have worked for Intel for over 5 years as a performance engineer. Currently I am mainly focusing on investigating the performance impact of Linux kernel security patches and Intel microcode. A few notes by way of disclaimer: the views and opinions expressed here are solely my own and don't represent the views or opinions of Intel or any of its subsidiaries and affiliates. Also, Intel doesn't control the content of the tool and, like any open source tool, it might not always be up to date. You should not rely on it as the only way to identify potential vulnerabilities that might affect the system on which it's running. With that in mind, before I go any further, I would like to set the tone and explain the terminology I will be using throughout the talk. First, I want to distinguish
between kernel patches and script patches. During this presentation I will call a kernel patch a kernel update, and a script patch I will just call a script patch. I will also refer to "affected" and "vulnerable" during this presentation, so I want to make clear what I'm talking about. Affected means a given vulnerability applies to a system. Vulnerable means the system is affected by the vulnerability but also not secure, and might need a security patch as a fix. After a fix is in place, the system is not vulnerable anymore. Finally, transient (speculative) execution methods exploit microarchitectural side effects of transient execution, thereby allowing a malicious actor to access information that would generally be prohibited by architectural access control mechanisms. As Stéphane already mentioned, Spectre and Meltdown Checker is a widely used open source hardware vulnerability checker tool. My journey with the script started almost two years ago. So far I have built script patches for MDS, TAA, enhanced IBRS and SRBDS. You might ask, why do I contribute to the script? Well, that's a very good question. There are several reasons. First and foremost, the script has gained popularity over time. More and more Intel internal and external customers use the script to quickly check the status of their systems. Moreover, our team builds and upstreams the patches that enable Intel hardware mitigations from the OS, therefore I'm able to have early access to the kernel fix and the microcode before their actual release. What is more, new script patches are complex; I will show more details later. Given the type of work I do, it has given me the opportunity to have a clear understanding of how to detect the status of any given vulnerability and implement this in the script. Since the script is growing in complexity, being already familiar with it helps me achieve this task. Also, by being involved in the script, I try to make sure that by the disclosure date the tool is updated, so that end users can see the status of their systems regarding new vulnerabilities as
soon as they know about them. In addition, the script is now part of our mitigation process: every time there is a new kernel update, the changes are incorporated into the checker script. And last but not least, there is giving back to the community, collaborating with engineers around the globe and building my expertise as an open source contributor. The next point I would like to address is script capabilities: what the script does and what it doesn't do. In updated kernels there is a vulnerability file inside the sysfs file system that shows a human-readable vulnerability status for transient execution attacks. The script reads this information from sysfs. However, the script is so much more than just reading output from the sysfs file system. Before I go there, let me first explain what the script won't do for you. If your system is vulnerable, the script won't magically fix it for you. It won't install any extra packages, microcode or dependencies. Think of the script as a read-only tool: it won't change any parameters or configuration files on your system. The script does its best to establish whether your system is vulnerable to speculative execution attacks, but it doesn't guarantee that your system is 100% secure. In contrast, what does the script do? Firstly, regardless of the kernel and microcode version, the script helps verify whether your machine is affected by speculative execution vulnerabilities and whether the system has known CPU security mitigations in place. Moreover, you can use the script in multiple environments, from bare metal through virtual environments to containers. Now that we know what the script's capabilities are, let's take a look at the installation process. Ideally you should run the script with admin privileges. If you don't, the script does its best to run the assessment, but it cannot get access to all the available information. If you are not root the script will still run, but you will see warnings indicating that you are not root. To download the script you can use any Linux command that transfers data: git clone will work, you can use the curl command as well, or the wget command. Before
installing critical files from the internet, such files should pass checksum and identity verification. The next step is to make sure the script is executable; the chmod command will do the trick. As mentioned earlier, to get the most out of the script, run it as root. Now that you have a good understanding of what the script does and doesn't do, and how to download and install the tool, let's identify what needs to be done between discovering a CVE vulnerability and pushing a script patch. First, we should find a system that is affected by the given vulnerability. The set of CPUs will vary depending on the vulnerability; the list of affected CPUs is available on the main kernel website. Next, after understanding what a given kernel update does, develop a script patch. When the patch is ready, go through rigorous test cycles to check that the system doesn't crash when the script is running. It actually happened to us before: the logic was incorrectly ordered. First it was writing to the new MSR, and only then checking whether that MSR exists. It was crashing because it was trying to update a non-existing MSR. Other questions: does the script do what it's supposed to do after you enable and disable the kernel update? Does the script show correct output on affected and unaffected systems? Does the sysfs output correspond to the script's behavior? After changing kernel security mitigation parameters, like mds=off, does the status on the screen reflect that? Is the output correct in all possible scenarios, including bare metal, virtualization, et cetera? As a result of this validation we might need to debug the patch; more on debugging on the next slide. Once debugging is done, it's equally important to catch and address new issues. After all is done, we are ready to release the script patch to the open source community. As mentioned, the script partially relies on the output of the vulnerability file in sysfs. You can check it by running the first command. Besides checking CPU vulnerabilities, the script prints and checks many other things. For example, it will tell you if you need to update
kernel or microcode, which hardware flags are enabled, and whether you are using the most current version of the microcode. It will check your kernel version, CPUID and MSRs; it will also parse CPU details. You can also check a kernel without actually running it: that's why the script has an offline mode that allows you to inspect a non-running kernel. This mode is automatically enabled when you specify the location of the kernel file, config file and system map file. Live mode, which checks the currently running kernel, is the default configuration. If you need to debug the script, then use -v twice as a parameter, as shown on the slide. The main difference between relying solely on the output of the vulnerability file and using the script is that older systems with older kernels might not have the sysfs files at all. Also, a CPU security patch is usually backported, but not to all of the prior kernels. Therefore you might have a situation where you have a sysfs file but no information about the specific vulnerability you are interested in. The Spectre and Meltdown script solves all of those issues, as it checks not only sysfs but also MSRs and many other things. I mentioned earlier that a script patch is complicated. Let me show you how complicated. On this slide and the next few, I will be focusing on a recent CPU vulnerability called SRBDS, also known as Special Register Buffer Data Sampling. We will break the provided flowchart down into details on the next few slides, just bear with me. The full PDF will be available on the conference site itself. When writing a script patch, you want to make sure to identify the list of affected CPUs. The full list is available at the first link here at the bottom, number one. In this example you can see which Intel processors are affected by SRBDS. Most of them are family 6, but it's not the full list. So the first step is to convert this into code. For this case I needed to create a list of affected processors. As shown in this image, you can see how the code checks the CPU family and model of my CPU and compares it against the list in the form of a set of
if statements, which is here at the top. For SRBDS, having the list of affected processors wasn't enough. There was one caveat, described at the second link; that's the caveat in yellow, and at the bottom is the full description. For the case of Kaby Lake L and Kaby Lake systems, I had to check the stepping of the CPU and then check some of the capabilities of the system, MDS and TSX to be precise. Only those Kaby Lake and Kaby Lake L systems of a particular stepping that also have TSX enabled might be affected by this vulnerability. In the code this is a set of if statements where I check the CPU model, the stepping and some of the capabilities of the current CPU. Along with the microcode update, Intel added new hardware capabilities that can control the mitigation. The script will check the presence of a specific CPUID bit, which is here. When the value of the mentioned CPUID bit is one, which is here, that means there is an MSR that controls enabling and disabling the SRBDS mitigation. When the value is zero, just here at the bottom, that means the system doesn't have this capability, and support for enabling and disabling the SRBDS mitigation doesn't exist, therefore the system is vulnerable. The new MSR that controls enabling and disabling the SRBDS mitigation is MSR 123h, where h stands for hexadecimal, which is here. If MSR 123h exists on your system, you can enable or disable the mitigation by changing the value of that MSR. When the value is one, which is here, that means the SRBDS mitigation is disabled, which indicates your system is vulnerable. When the MSR 123h value is zero, which is here, that means the mitigation is enabled and your system has the mitigation in place, therefore the system is not vulnerable to SRBDS. And this is how it looks in the code. First you read the relevant bit from CPUID, as you can see in the code, through the read CPUID function. This function reads a particular bit and compares it with a given value passed to the function, which is this red value here. If it returns zero, which is equivalent to true in Bash, in this case it means that bit 9 had the value one. With this we know if the system has
microcode that supports the mitigation for this vulnerability. If bit 9 is set to zero, it means there is no such microcode. If it's one, the microcode that mitigates the issue is installed on the system. Next, if this bit 9 is one, we must read MSR 123h to check the status of the mitigation. If this MSR is zero, it means that the mitigation is enabled; however, if it's one, the mitigation is disabled, which means the system is vulnerable. The script is done, you run it and you realize your system is vulnerable. What do you do? Well, that's a tricky question, because the answer is complicated. You might need to update the microcode or the kernel, or you might not be able to do anything, as a fix for your specific system might not exist. That being said, the script will do its best to explain what is wrong and what is missing, but it won't fix the vulnerability. Moreover, you should follow the instructions provided by your OS vendor to install microcode updates. Also, there is a user guide available that will help you manually update the microcode, here, the link in pink. Please reach out to me if you have any questions, I am the owner of that piece. Lastly, from the OS you can also change kernel security parameters such as mds=off. Instructions with the available configurations are provided here, at the green link. Before you make any changes, make sure you understand the trade-offs and implications. And that brings us to the end of our talk. There are a few key points that I would like you to remember from the talk. Firstly, Spectre and Meltdown Checker is a read-only Bash script that does its best to determine if your system is secure; however, don't jump too quickly to any conclusions. Number two, due to its simplicity and the fact that it's open source, the script has gained popularity and it's used by many. Number three, it's important to continue contributing to the script, as that's the easiest way to check if your system is impacted by the newest vulnerabilities. There are, however, some drawbacks. Some of them: number one, with each mitigation the script grows in size, making it harder to
maintain as well as to develop advanced features. Number two, code cleanup and optimization: some of the code is redundant, and the list of affected CPUs is still hardcoded. We can do better than that. And last but not least, the best way to protect your system is keeping it up to date with the newest patches and software. With that in mind, thank you very much for your attention. There will be a Q&A later on. Please feel free to reach out to us with any questions.
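As a recap of the SRBDS walkthrough earlier in the talk, the decision flow (affected CPU list, then CPUID bit 9, then MSR 123h) can be sketched as follows. This is a simplified Python illustration, not the script's code, which is written in shell. The function name, its parameters and the returned strings are hypothetical, and it assumes the "mitigation disabled" flag is bit 0 of the MSR and that the inputs were already read from the hardware by other means.

```python
from typing import Optional


def srbds_status(cpu_affected: bool,
                 cpuid_bit9_set: bool,
                 msr_123h: Optional[int]) -> str:
    """Walk the SRBDS flow: affected list -> CPUID bit 9 -> MSR 123h."""
    if not cpu_affected:
        return "not affected"
    if not cpuid_bit9_set:
        # No control bit advertised: the mitigating microcode is missing.
        return "vulnerable: microcode does not support the mitigation"
    if msr_123h is None:
        # Assumption for this sketch: an unreadable MSR is treated as
        # vulnerable, following the vulnerable-by-default principle.
        return "vulnerable: mitigation status could not be read"
    if msr_123h & 1:
        # Assumed layout: bit 0 set means the mitigation is turned off.
        return "vulnerable: mitigation disabled"
    return "not vulnerable: mitigation enabled"


if __name__ == "__main__":
    print(srbds_status(cpu_affected=True, cpuid_bit9_set=True, msr_123h=0))
    print(srbds_status(cpu_affected=True, cpuid_bit9_set=True, msr_123h=1))
    print(srbds_status(cpu_affected=True, cpuid_bit9_set=False, msr_123h=None))
```

Note that the CPUID check comes before any MSR access; the talk mentions a real crash caused by touching an MSR before checking that it exists.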