 Vulnerabilities to improving low-level defenses. Please give a warm 34C3 welcome to Benjamin and Philip.

Thanks for the intro. Thanks for coming. Today we present our research on microcode. This is joint work with our colleagues and our supervisors from Ruhr University Bochum.

First of all, I want to give a small disclaimer: the technical details in this talk are quite specific to AMD K8 and K10 processors. The newest processor that our stuff works with was released in 2013. Also, most of the findings are from reverse engineering, so if you want to replicate the experiments at home, please proceed at your own risk, because you may trigger unintended behavior in the CPU.

So let's get started. First of all, we want to grasp a general idea of what microcode really is. We want to give an architectural crash course. We want to thoroughly explore the question of whether it's hackable. And in the end there will be some demo time.

So first of all, what's microcode? If this is your CPU, you can imagine microcode to be a small computer within your CPU, which does all sorts of complex things that we will cover in this talk, and we will see how we can deal with it.

I want to start with some previous work. First of all, there are AMD patents that are publicly available, and they give a good general overview of the microarchitecture details. Then there's the website Chip Architect, with a rather detailed article about the physical placement of specific components on the die of the chip. Then there's the anonymous blog post from 2004, "Opteron Exposed". It revealed for the first time that microcode is updatable and that some AMD CPUs even accept microcode updates if they are modified and the checksum is corrected. So this basically was the initial idea of our research. Then there's a paper, "Security Analysis of x86 Processor Microcode".
They mostly cover security aspects of microcode updates for Intel and AMD processors and give a general overview. Then there's the work of Arrigo. He probably has some internal knowledge and is also doing really cool stuff with AMD microcode.

Now that we know what microcode is and what the related work is, let's have a look at what it's actually used for. First of all, it's used for instruction decoding. It's used to fix bugs in CPUs that are already in the field, so they are already rolled out. It's used for exception handling on the architectural level: if there is a division by zero, for example, the CPU somehow needs to detect that and pass the exception and information to the operating system, and this is handled in microcode. Microcode is also used for power management within the CPU, and it can be used by the vendors to implement complex CPU features like Intel SGX, for example.

So we heard that microcode is used for instruction decoding, and we now want to see why this is. The x86 instruction set architecture is quite complex. It is a variable-length instruction set. As you can see here, there's an instruction with one byte and then an instruction with several bytes, and usually the first byte indicates how long such an instruction is, which helps during the decoding process. But of course there's an exception to this: there are instruction prefixes, which delay the decision of how long the instruction is going to be by one further byte. On top of that, there are several of those, and they can also be combined to form all kinds of complex instructions. There are also instruction set extensions. Here we have a vector floating-point addition and subtraction on packed single-precision floating-point values. This instruction has several operands, and it's quite complex to decode and, later on, also quite complex to execute within the CPU. Due to this
complexity, we need a small computer within our CPU to actually decode this.

Now we are going to have a quick look at what this decoding looks like. We have an x86 instruction on the left, pop [ebx], so ebx dereferenced, and this instruction gets decoded to several microcode instructions, or micro-ops. On the right you can see those. At first we read the top of the stack and load this value into a temporary register. Afterwards this register gets stored to the location indicated by ebx, and afterwards the stack pointer gets incremented.

We also just heard that microcode is used to update CPUs that are erroneous and already in the field. This behavior is probably motivated by the infamous Intel Pentium FDIV bug in 1994, where certain Intel Pentium processors would produce slightly off results for certain floating-point operations. This was quite a mess, and Intel had to pay a ton to replace processors in the field. So both Intel and AMD wanted to avoid this in the future and added updateability. One example: this year there was the Intel Kaby Lake bug, where certain hyper-threading conditions would lead to unstable system behavior, and this bug was fixed with a microcode update. On the AMD side we also had some bugs. There was an AMD Phenom bug in 2008, where TLB entries could not be cached reliably, and this bug was also fixed with microcode. And again, quite recently this year, there was an AMD Ryzen bug, which also got fixed by microcode.

So let's now have a look at the inner workings of the microarchitecture and how microcode is embedded within that.
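
As a short aside before the architecture tour, the three-step decoding of pop [ebx] described a moment ago can be sketched in Python. This is only an illustrative model: the function name, the `tmp` register name, and the dictionary-based memory are assumptions made for the sketch and say nothing about how AMD actually encodes these micro-ops.

```python
from typing import Dict


def pop_to_mem_ebx(regs: Dict[str, int], mem: Dict[int, int]) -> None:
    """Model `pop [ebx]` as three micro-op-like steps (illustrative only)."""
    # Step 1: read the top of the stack into a temporary register.
    regs["tmp"] = mem[regs["esp"]]
    # Step 2: store the temporary register to the location ebx points at.
    mem[regs["ebx"]] = regs["tmp"]
    # Step 3: increment the stack pointer by the 32-bit operand size.
    regs["esp"] = (regs["esp"] + 4) & 0xFFFFFFFF


# Hypothetical starting state: one 32-bit word on the stack.
regs = {"esp": 0x1000, "ebx": 0x2000, "tmp": 0}
mem = {0x1000: 0xDEADBEEF}
pop_to_mem_ebx(regs, mem)
```

After the call, the word that was on top of the stack sits at the address held in ebx, and esp has moved up by four bytes.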
So an instruction gets executed on the CPU. First it gets from main memory to the caches, and at some point it gets to the decode engine and gets decoded to micro-ops there. The micro-ops are then scheduled, possibly reordered, to the pipeline, and the pipeline then leverages the numerous functional units, like arithmetic logic units or address generation units, in parallel to actually execute the micro-instructions.

We now zoom into the decoder and have a closer look there. We have the instruction buffer that stores the current stream of bytes, and several kinds of decoders. We have short decoders that can each translate one simple x86 instruction to one micro-op, pack them together, and put this packed-together thing into the scheduler. Then we have the long decoder, which is able to translate more complex x86 instructions to several micro-ops. Now we get to the most interesting decoder, which is the vector decoder on the very right. This decoder contains the microcode engine, and an x86 instruction that gets decoded by the vector decoder actually triggers a small microcode program to be run within the CPU. This microcode program may generate an arbitrary number of micro-ops that then get scheduled to the pipeline and executed.

So we just heard that within the CPU certain microprograms are run. They need to be stored somewhere, and there's an on-chip ROM, the microcode ROM, which stores the microcode and basically also the microprograms. Then we have the microcode RAM. During runtime the CPU can get a microcode update, and the microcode update gets stored in the microcode RAM that you can see there. Then there's all kinds of circuitry around it to make the whole thing work, for example the addition unit that increments the program counter for the microcode engine, and, most importantly for us, the match registers. They basically provide us with breakpoints in the microcode ROM, and those breakpoints can be set at certain microcode ROM
addresses, and once such an address gets executed, control is redirected to the microcode RAM, where the microcode update is stored. This is a very important mechanism for microcode updates to actually get control of things happening in the CPU and to change behavior, like sanitizing inputs for instructions, for example.

So how do we actually update the microcode for a CPU core? Well, we need to be in kernel mode. We need to load the microcode update into RAM. We need to write the virtual address of the update into the given model-specific register. Then the microcode update gets loaded into the microcode RAM. Those updates are not persistent: once you reboot or reset the CPU, it's in its initial state again.

You can see the microcode update file format here. It contains a header with several fields such as date, patch ID, and a checksum. This is followed by the match registers, which contain the contents for the breakpoints, and afterwards the microcode follows. The microcode is organized in so-called triads, and one triad contains three micro-ops and one sequence word. The micro-ops are the actual microcode instructions that execute code, and the sequence word is for control flow redirection.

So now we want to answer the question of whether this is hackable. We learned it's updatable. We have update drivers in different BIOSes and also in the Linux kernel, so we know the procedure for how to update. We have original microcode updates by the vendors. They are distributed through BIOS updates; coreboot also has quite a huge collection of them. We know the update file format, and there are hints that there's no strong cryptography protecting the integrity of the microcode updates.

This is a hex dump of one of the microcode updates. Let's just have a quick glance. We can see here a repeating value over and over, way too often. Let's color more of the values that repeat.
We can basically see patterns emerging, meaning we have no strong crypto applied there. And we've learned that the CPU accepts modified updates if the checksum is corrected. So yes, it's hackable.

At this point we knew that we had to generate a lot of microcode updates in an automated manner to be able to trigger behavior changes and, from those behavior changes, learn about the inner workings. So we built a framework. The framework consists of nodes, and the nodes run our own custom-written x86 operating system. That's a very low-noise environment, so we control all instructions that get executed. The operating system runs on computers powered by AMD processors. The nodes are connected to a Raspberry Pi via serial for data communication, and the Pi's GPIO pins are connected to the reset and power switches on the mainboards to automatically power nodes up and down and reset them, because when the nodes execute random microcode, they often hang and are not recoverable. The whole setup is reachable from the internet, to have remote access, just as a convenience feature.

So this is what it looked like in the very beginning at our home. This is what it looked like later at university. Now we have the tools for automated testing, and we used them for generating heat maps. The name is a little bit misleading.
Let me just explain what we refer to as a heat map. Basically, it's a mapping from microcode ROM addresses to the corresponding x86 instructions. An x86 instruction is implemented in microcode, the microcode is located at certain ROM addresses, and the map basically says which ROM address implements which x86 instruction. We generated the heat maps by iteratively hooking all microcode ROM addresses with the breakpoint registers and then executing all x86 instructions.

Once we have the heat maps, we can reliably execute our own bits as microcode by just setting the breakpoint register to a known location and executing the corresponding x86 instruction. Then control gets redirected to our own microcode in the update, and we can just put random bits there, and the CPU will interpret these random bits as microcode and execute them.

Because there is no documentation on the microcode instruction set, we basically had to conduct an unknown instruction set analysis with a black-box model, because there's no publicly available assembler, disassembler, compiler, or any documentation. But luckily we had an oracle: the CPU itself. We can just feed it inputs and observe the outputs, and from the differences in behavior we can infer structure, encoding, and meaning.

So we now have a quick look at what the CPU oracle looks like. First we feed it an x86 instruction, and we feed it an initial state, which basically consists of values in x86 registers. And we feed it a microcode update that we've generated on our own. The match register is set corresponding to the x86 instruction that we execute, so that the micro-ops contained in the microcode update, which we fill with random bits, get executed by the CPU. Afterwards we get an output state that basically mirrors the state the CPU is in after it executed our random bits as microcode. Very often when the CPU executes random bits as microcode, the CPU would just crash, and sometimes we would see no
difference between the input state and the output state. After some weeks of brute forcing, pretty much, we finally got to a bit string that would not crash the CPU and that would yield a difference between input and output state. This basically was our initial step for further analysis.

Now of course you want to know what this bit string is actually doing, so what operation it is executing. To get that, we started to toggle one bit after another in the bit string that we found randomly. Here we changed bits on the very right, and we saw that the output changed again. After changing the bits on the very right several times and looking at the outputs and inputs, we finally concluded that this bit string represents an "add immediate to EAX" instruction on the microcode level. We then used that knowledge to build a small database of opcodes, operands, and opcode fields, and here you can see that we determined the length of the immediate field and put it there.

If you change other bits, like some bits more to the left, we get other outputs. We already know the immediate field on the very right. If you change the immediate, the EAX output changes again, but it changes differently, and if you look at the binary representation of the inputs and outputs, we can, after several attempts, infer that we found an XOR. After some more testing we found that the operation field has a certain length, which can be seen on the slides.

This was basically our starting point, and we leveraged the framework to conduct a lot of automated testing where we, sometimes more randomly, sometimes less randomly, toggled bits and selected bits, and we filtered the outputs for interesting ones so that we only had to look at a small set of outputs. Interesting outputs are those where the CPU didn't crash and where there are changes in behavior.
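
The bit-toggling and filtering procedure just described can be sketched in Python against a toy stand-in for the CPU oracle. Everything about the encoding here is made up for illustration (a 24-bit word with an immediate field, an operation field, two bits that hang the "CPU", and two ignored bits); it is not the AMD micro-op format, and `toy_oracle` and `classify_bit_flips` are hypothetical names. In the real setup the oracle would be a test node running a generated microcode update.

```python
from typing import Callable, Dict, List, Optional

State = Dict[str, int]
Oracle = Callable[[int, State], Optional[State]]


def toy_oracle(bits: int, state: State) -> Optional[State]:
    """Toy stand-in for the CPU: returns the output state, or None on a 'crash'.

    Made-up layout: bits 0-15 immediate, bits 16-19 operation,
    bits 20-21 must be zero (otherwise crash), bits 22-23 ignored.
    """
    if (bits >> 20) & 0x3:
        return None
    imm = bits & 0xFFFF
    op = (bits >> 16) & 0xF
    out = dict(state)
    if op == 0:    # add immediate to eax
        out["eax"] = (state["eax"] + imm) & 0xFFFFFFFF
    elif op == 1:  # xor immediate into eax
        out["eax"] = state["eax"] ^ imm
    # Any other operation value leaves the visible state untouched.
    return out


def classify_bit_flips(base: int, width: int, state: State,
                       oracle: Oracle) -> Dict[str, List[int]]:
    """Flip each bit of a known-good bit string and bucket the outcomes."""
    baseline = oracle(base, state)
    results: Dict[str, List[int]] = {"crash": [], "same": [], "changed": []}
    for bit in range(width):
        out = oracle(base ^ (1 << bit), state)
        if out is None:
            results["crash"].append(bit)
        elif out == baseline:
            results["same"].append(bit)
        else:
            results["changed"].append(bit)
    return results


# Known-good starting point in the toy encoding: "add eax, 1".
report = classify_bit_flips(0x000001, 24, {"eax": 1}, toy_oracle)
```

Only the `changed` bucket needs a closer look by hand; contiguous runs of bit positions in it hint at field boundaries, which is the kind of evidence the immediate and operation field lengths were inferred from.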
We also had to manually filter out some random noise that sometimes occurs. And then, after a lot of work (don't worry if you can't read it), we got to this. So we have quite an exhaustive list of operations, fields that are sometimes immediates and sometimes registers, size fields, whether flags get propagated or not, and so on.

One other thing that we wanted to do is to infer the logic of microcode ROM triads, so triads that are stored in the microcode ROM. We can't read them, because they are in the microcode ROM areas on the die itself, so we wanted to indirectly infer the behavior of some of those. We first used a breakpoint register at a known address to get initial control. We would then execute the corresponding x86 instruction to get control in our first microcode RAM stage. We would then write microcode in our update to jump back to microcode ROM, to exactly the ROM address that we want to analyze. We then add one to this address and put that address into another match register to jump back to microcode RAM, so our second microcode RAM stage, and this would then output the results so that you can compare input and output state again. But this time we didn't execute our own random or less random bit string; we executed a microcode triad that's stored in the ROM.

This approach has several drawbacks. For one, the microcode ROM triad might just modify some internal CPU state that we don't know, so we can't really observe or grasp it. Another disadvantage is that the microcode ROM triad may not continue with the triad one address further, but may just jump somewhere else in microcode ROM. In this case we would lose control, but we designed our setup to notice this, and we would just try another microcode ROM triad in this case.

Quite late in our project we decided to give it a try and read the microcode ROM from the CPU die itself. So we decapped the chip, delayered the chip, and took a die shot. It looks like this. Maybe you can
see in green the microcode ROM areas. They take up quite a bit of the CPU surface. If you zoom in a little bit more with a scanning electron microscope, you can start to see bits and patterns, and if you zoom in even more you can see white dots that are either a little bit more to the left or to the right. That means they are either connected to ground or to VCC, and that in turn means that this specific ROM cell represents either a one or a zero. We used optical character recognition to get, let's say, most of the bits out, and after some rearranging, because the physical layout is a little bit strange, to us software guys at least, we could actually find microcode instructions in there. It's still a challenge for us to get the exact mapping from the physical layout to the microcode ROM addresses.

So let's recap our reverse engineering results real quick. We generated those heat maps. We found 29 micro-ops in total. These are logic and arithmetic operations, memory loads and stores, we can write the x86 program counter, and we have a microcode conditional branch. We also reverse engineered the features of the sequence word. For example, it can be used to just execute the next triad. It can be used to signal sequence complete, which means that the decoding of the current x86 instruction is completed and that the next x86 instruction will be executed. And the sequence word can also be used to branch within microcode unconditionally. Then we also found the substitution engine. The substitution engine can be used to automatically put operands that are in the x86 instruction into the microcode instruction, so that an eax in the x86 instruction would automatically be eax in the microcode instruction as well. This heavily simplifies the implementation of x86 instructions in microcode.

So we also want to augment x86 instructions. Until now,
we can just replace the logic of an x86 instruction by setting a breakpoint register to the entry point and writing our own microcode there. But we also wanted to extend existing logic and preserve the original semantics, and we can do that by either jumping back to ROM to execute the original triads or, for simpler x86 instructions, emulating the instruction logic ourselves. There are some examples.

Once we had reverse engineered the microcode instructions, or some of them, we started to implement our own microcode programs. The first microprogram that we wrote is a simple instrumentation, which is more or less just a proof of concept. It pretty much just counts, in microcode, how often a certain x86 instruction was executed. Another instrumentation that we implemented is a small framework that allows you to hook x86 instructions and redirect control to an arbitrary function that's implemented in C, for example.

We also implemented some remote microcode attacks. Given a backdoored CPU that has the backdoor implemented in microcode, if the computer visits a website, it may trigger the backdoor. We implemented sample websites that can trigger such a microcode backdoor, and we have two versions available: one is implemented with asm.js and one with WebAssembly. We also implemented some microprograms that contain cryptographic Trojans, so they are harder to detect. The cryptographic microcode Trojans either introduce a timing side channel in a constant-time ECC implementation, or they allow you to inject faults to enable fault attacks on cryptographic primitives.

So now we want to have a quick look at what such a microprogram looks like. The microprogram is given in our own register transfer level language that we developed to implement microprograms efficiently, so that we don't have to put together long streams of bits on our own. The first instruction you can see is a subtraction that's actually used as a compare. So the value in T1D, which is a
microcode-internal register, is compared to the value in EAX. Afterwards follows a conditional jump, and we want to first consider the jump-not-taken path. That's here: this is just a couple of instructions that set up an integer division, and here we see in the very last instruction that we jump back to the microcode ROM. So what's happening in this case is that we mimic the setup of an integer division, and then we jump back to the original triads in ROM to just continue the normal integer division process.

Now we consider the jump-taken path, which only gets executed if the value in EAX matches the value in T1D. In this specific case, we add one to the program counter and then we write this value to the x86 program counter. So this basically means this microcode vector, if you will, would compare EAX to a certain magic value, and if that magic value is found, then the x86 program counter would be incremented by one, which means that all successive x86 instructions will be executed out of alignment, or disaligned. This is a very useful primitive that can be used for JavaScript exploits, for example, without the presence of a browser bug.

So now it's demo time, and I'm handing the stage to Benjamin.

Okay. First of all, I have an EPC with me that is running an old AMD CPU, and it is actually a CPU we can target with our microcode programs. At first I'm going to quickly walk you through the program that's currently loaded inside the CPU. I booted your standard Linux on it. And yes, it's just some microcode vector. First you load the values that you want to trigger on, and then you perform some magic to condense it down to one bit: whether you actually trigger on this value. The actual magic happens here. You read out what is written inside a buffer in memory, and why is it important?
You'll see in a second when I show you the actual JavaScript payload we're going to run, but essentially you read out some memory, place it inside a temporary register, and perform the arithmetic on it. Then you implement the actual semantics of shift right double, which is an x86 opcode. You do this because if your trigger value didn't match, you don't want to cause any trouble, because sometimes the kernel might use this instruction, or some other application on your host system might use this instruction. So it is important to preserve the semantics if you don't want to trigger a vector.

So at this point we know whether or not to trigger, and we can now conditionally add some values to registers or otherwise modify them, and we do exactly that. At first we conditionally set eax to 11, which happens to be the execve system call on Linux. Then you set some other registers that we need to actually launch the system call with the arguments we control, and you do this for multiple registers. In the end you conditionally write the program counter to a specific value, given in the JavaScript payload. So in the JavaScript payload you can choose where you want to go. This all happens conditionally; it only triggers if an input to this instruction matches our magic constant.

And how did we achieve it? This is a WebAssembly module, and I carefully crafted some calculations that contain disaligned x86 instructions, and as you can see we end with an interrupt into the kernel.
So we trigger and execute a system call, and the buffer we're reading from is shown here. Essentially, we just say: please pop us a calc. This is an unmodified Firefox. I haven't added any code to it, and it's not running a vulnerable version as far as I know. So I just press calculate, and because I attached a debugger, we're going to break here in a second. We need to wait a bit, because this EPC is actually really slow, especially if you're debugging Firefox. And there we go.

Now I'm just going to step through some instructions that are part of the hook I needed to set in order to get the address I want to go to, and now we are inside the WebAssembly module I showed you earlier. This is all code that is emitted by Firefox during runtime, so an attacker is free to choose what code should be emitted by providing the appropriate WebAssembly module. As you can see, this is the execution of our shrd opcode, and we actually backdoored this one. What I took special care of is placing our constant inside an argument register, so our backdoor will actually trigger. Because GDB is going to lose control in a second, I'm going to cheat a bit. I know that we're going to jump six bytes further, so I just add six and press continue. And we are disaligned inside instructions, and as you can see, we perform all the things we actually coded into the microcode to prepare the argument registers. If you compare it with the WebAssembly output: I said please emit this opcode followed by a jump five, and please emit lastly these two opcodes, and exactly these opcodes we have. So let's just continue and hope it works this time. And of course it didn't. Let me just quickly run it outside of GDB, and there you go.
It basically starts.

As already said, we can also implement some cryptographic backdoors, and because I need to reboot for this to work, I'm going to show you the non-triggering case first. So we have a standard signature verification, and it just says "valid signature" because everything is fine. But now we are going to quickly apply the backdoor and reboot into our custom kernel, because we need to somehow apply the update.

While this actually reboots, I'm going to show you a different demo. This console is going to push a script to a Raspberry Pi currently sitting in the binary security assembly in the CCL, and this is connected via serial to an AMD node, and this node is going to run the microcode we tell it to run. First I'm going to show you what happens if I just say don't hook anything in the microcode. So you just go ahead and say run it. What you should pay attention to is ebp, currently at zero, because I didn't change anything. Now I'm going to say, okay, let's hook it, and suddenly something changed. What changed is this: this is the microcode I pushed to the CPU, and all it does is take the next x86 instruction to be executed, store it on the stack, and then just jump to a location I predefined, and this location ends up all the way over here. So we actually get control in x86 without having to write any program or rewrite stuff. It just works. So this is also something we can do: we can introduce lightweight microcode hooks.

And actually the EPC finished booting.
So let me quickly set up VNC again. Okay, so again, we're on the EPC, but this time we booted a different update. This update wouldn't trigger the backdoor we placed in Firefox; instead, we're going to run the crypto demo you saw before again. And suddenly it says "invalid signature", because we introduced an error into the calculation that is performed during the elliptic curve calculation, and using this you can actually perform the cryptographic attacks. Just a bit of theory: I introduced a cryptographic weakness into an otherwise secure crypto implementation by introducing an arbitrary error, and using this you can reconstruct key material. All of this is done in microcode alone. I didn't need to modify the binary that is running.

So that brings us to the security issues that actually arise from this setup. You can push any update, as you saw. I mean, I modified it live, pushed it to the CPU, and it accepted it. You can backdoor it, as you saw. And you can't really fix it, because you would need to hot-fix the CPU and introduce some proper cryptographic checking. Of course, you can hacky-fix it: you just break the update mechanism. But in the end it isn't that bad, because you already need a really strong attacker model. Someone has to actually modify your BIOS to apply this on every boot. So usually this isn't a problem for you.

In the end, yes, microcode can be reversed and you can change it. If you want to talk a bit more about it or even try your hand at modifying microcode yourself, visit us at CCL 0. We are in the binary security assembly, and we have a setup with us, including the EPC and an old K10 that you can push updates to live. It runs our Angry OS. We also pushed our sample updates and trigger HTML files to GitHub, including an update driver with which you can patch your own CPU. But please be careful: this can actually break your CPU.
We didn't manage to break ours, but we managed to reset our BIOS settings at some point. So, please be careful.

Now we have time for questions from the audience. You see microphones around the arena with numbers stuck on them. On the internet we have a Signal Angel who is already telling me that we have questions from the internet. Before we get to the questions: I said in advance that you may be asked to leave the hall entirely when going back, and now I'm asking you that. When you leave, please leave entirely and re-enter the hall through the main entrance; not just the hall, but the whole Messe here. So now, from the Signal Angel and the internet.

Thank you. We had a question that came up quite early in the talk: is it possible to cause physical damage to a CPU using malicious microcode? Have either of you, you know, ever broken a CPU?

Yeah, we didn't break any CPUs apart from the ones we decapped to put in the electron microscope. Maybe it is possible; we haven't managed it yet. But the design of at least the AMDs we analyzed doesn't look like you can, unless there is some feature inside microcode that you can trigger. We haven't found it yet. Maybe it's possible, but most likely it isn't.

Microphone six, yes.

Is it possible to fix performance issues with microcode?

So we've thought about that as well, for example for binary instrumentation. There are several different approaches to that. You can emulate the code, like Valgrind is doing, for example, or you can instrument the code statically. But they all have drawbacks: either they are slow or they are not complete, like static rewriting, for example, at least not complete in general. So microcode can actually be a quite performant and complete basis that an instrumentation framework could be built on. The problem with microcode, though, is that at the moment
it's quite limited, because for newer AMD CPUs and pretty much for all Intel CPUs microcode is closed. But maybe the CPU vendors will open it up a little bit, and then it could be used to increase performance for instrumentation frameworks. It can also be used to implement new instructions, but I doubt that it's going to be much faster than if you just implemented them in x86 instructions.

Microphone one, yes.

Have you looked at the older Intel microcode updates as well? Because if you look at them, they are, like, encrypted, but it's clearly not strong crypto. It's either a constant stream cipher or a small block cipher. Also, I have previously succeeded in actually loading patched microcode updates. I haven't been able to control the contents, but clearly the verification mechanism isn't very strong either.

Okay, we didn't have a close look at all the Intel updates. But that's a very good question, and we should probably have a discussion offline. So feel free to visit us at the assembly, and then we can have a look together. That would be cool.

Microphone two.

How accurate is it to refer to microcode as a RISC architecture rather than CISC? I mean, are there any instructions in there you wouldn't typically find on a regular RISC architecture?

So the question was whether it's a reduced-instruction-set-like architecture. I can agree on that. The microcode instructions are mostly quite simple. One quite interesting feature actually is that you can have a three-operand mode, where you can give three registers: one destination register and two source registers. That is not possible on x86, as you all probably know, but it's possible at the microcode level. So it's quite interesting.

Back to the Signal Angel.

Thank you. How do you know the microcode entry point for a given instruction?
That's a very good question. We showed you the heat maps, and we generate them by basically setting a breakpoint at each possible microcode ROM address and then just trying all x86 instructions. Obviously, sometimes we would have multiple hits for one x86 instruction, because a single ROM address is not enough to implement the whole x86 instruction. What we can do then: we have eight match registers there, so we can set eight breakpoints at once. We would just set eight breakpoints at those different locations and see which one gets triggered first. That way we could debug, step by step, in which order the microcode ROM addresses get executed to implement an x86 instruction, and in this way we also find the entry point.

Microphone 3.

You said that those microcode updates are not persistent across a reboot, right, across a processor reset. Does that mean that when a vendor issues a microcode update, it is actually a program, you know, a patch to the BIOS that installs the update on every boot, in every boot sequence?

Right, yeah. So the CPU vendors don't publish those microcode updates, but they give them to the mainboard vendors, and the mainboard vendors usually put them into BIOS or EFI updates. Those updates then contain the microcode updates, and during boot they get installed. Also, on Linux there actually is a microcode package, and this performs a microcode update on every boot. I actually hijacked its driver to perform our custom update. So depending on the OS you are running, you either get it from your BIOS or you get it from your operating system.

Microphone 4.

I wanted to ask: you mentioned it might be possible to use a crafted microcode update to actually patch out the update mechanism. How hard do you think it would be to implement that inside, say, Libreboot, and make sure that the microcode update mechanism is patched out very early on in the boot process?
So this would actually be quite easy to do. It still requires some reverse engineering work on our side, but through the heat maps we already have quite a good understanding of where the microcode update mechanism is probably going to be, because once you set breakpoints there, you can't apply microcode updates anymore. So it would require some work there, but then it would be quite easy to have a microcode update in Libreboot that disables the update mechanism very early in the boot.

Thank you very much.

Signal angel.

Can you use this trick so that the CPU reveals any secrets, e.g. keys, out of secure enclaves or TPM-like entities in the CPU?

Probably not, at least not on AMD, but we can't really give you a definitive answer to that, because the CPUs we can control don't have a secure enclave, so we didn't test it. What we actually observed is that the microcode follows the same permissions as the code that triggered the microcode. For example, if you have an MMU, microcode actually follows the memory protection that you set. So most likely no, but maybe. We can't tell you yet.

Microphone 2.

So you said you have 29 micro-ops. Do you know how much of the total supported space that is? Do you think there are some microcode instructions you didn't find yet, or is it all there?

So it's very, very likely that we didn't find many of those instructions. There are still certain regions within the microcode instructions that we don't understand, and if you toggle them, it crashes or the behavior is different, and we can't tell what operation it's doing. Also, we can only observe some of the registers; we can't observe all the internal workings of the CPU. So there are a lot of instructions that modify the internal state of the CPU.
Let's say they enable or disable certain features, or enable and disable certain slow or fast paths for some features: we don't notice that, and due to that we don't know what the instruction is doing, what the bit stream that we are currently testing is doing. So there are probably many instructions we don't know.

Microphone 1.

Have you considered ways to detect whether my microcode is being backdoored? Can you help me in that situation?

Okay, so during our research we didn't have a look into this yet, but we already had some internal discussions regarding this point. It's a really good question. If you use microcode to hook an x86 instruction, you introduce a small timing overhead, and if you carefully measure that instruction, you can probably detect that this x86 instruction is suddenly doing more than it should actually do. So you can detect most of it. But it also depends: microcode is doing not only decoding, as we've seen in the talk, but also other things in the CPU. So you might be able to hide Trojans in there that are not detectable that easily.

Microphone 2 again.

Okay. Considering that you can basically re-implement instructions, could you implement a completely different instruction set?

Yeah, you could do that. I showed you in the related work the Troopers talk from Arrigo, and he's actually doing something like this. So yes, it's possible. We are still not far enough in our knowledge to do that, but he apparently has some internal insights.

Microphone 2.

Okay, two questions, actually. The first is: have you found any ways to trigger the co-processors that x86 has plenty of, let's say the floating-point unit and other extensions, right?

We didn't look into that, but from the AMD patents we know that there must be some way to do it.

Okay, and the second question is about the decoding.
Those short operations you showed: are they not going to the microcode and through this micro-op decoding, but directly emitting those instructions? So are they safe from microcode and cannot be altered?

So the short decoders translate simple x86 instructions, and we know that there is also a way to hook non-microcoded instructions. So they are not safe. We know that this mechanism exists, but we didn't find it yet.

Microphone 1.

Hi. Maybe you said it and I missed it, but did you contact Intel? Did you get any feedback?

So, we did not contact Intel. We contacted AMD. This work is also published at USENIX, and more than 90 days before that, we sent them the results and the findings and asked for feedback. They didn't communicate much with us, probably because they already have an up-to-date, more or less secure (we don't really know), but strong crypto protection mechanism for microcode in place for their latest CPU architectures. So they were probably not that interested in this.

All right, great. Let's give our speakers another big round of applause.
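
The answer about finding the microcode entry point describes a search: probe ROM addresses with up to eight breakpoints at a time, collect every address an instruction hits, then repeatedly ask which of the hit addresses triggers first. A minimal sketch of that idea follows. It does not talk to real hardware: `make_simulated_cpu`, the instruction label `"example_instr"`, and all ROM addresses are hypothetical stand-ins invented for illustration, and the batching strategy is one possible reading of the description, not necessarily the speakers' actual tooling.

```python
MAX_BREAKPOINTS = 8  # the talk mentions eight match registers


def make_simulated_cpu(traces):
    """Hypothetical stand-in for real hardware.

    `traces` maps an instruction label to the ordered list of microcode
    ROM addresses it executes. The returned function runs one instruction
    with a set of breakpoints and reports the first breakpoint that fires
    (or None if none fires).
    """
    def run(instruction, breakpoints):
        assert len(breakpoints) <= MAX_BREAKPOINTS
        for addr in traces.get(instruction, []):
            if addr in breakpoints:
                return addr
        return None
    return run


def find_hit_addresses(run, instruction, rom_addresses):
    """Heat-map step: find every ROM address the instruction touches."""
    hits = set()
    pending = list(rom_addresses)  # assumed to contain no duplicates
    while pending:
        batch = set(pending[:MAX_BREAKPOINTS])
        hit = run(instruction, batch)
        if hit is None:
            # Nothing in this batch is executed; discard the whole batch.
            pending = [a for a in pending if a not in batch]
        else:
            # Record the hit and re-probe the rest of the batch without it.
            hits.add(hit)
            pending.remove(hit)
    return hits


def first_executed(run, instruction, candidates):
    """Return the candidate address that is executed earliest."""
    remaining = list(candidates)
    while len(remaining) > 1:
        winners = []
        for i in range(0, len(remaining), MAX_BREAKPOINTS):
            hit = run(instruction, set(remaining[i:i + MAX_BREAKPOINTS]))
            if hit is not None:
                winners.append(hit)
        remaining = winners
    return remaining[0] if remaining else None


def execution_order(run, instruction, hits):
    """Order the hit addresses by when they are first executed."""
    remaining = set(hits)
    order = []
    while remaining:
        first = first_executed(run, instruction, sorted(remaining))
        if first is None:
            break
        order.append(first)
        remaining.remove(first)
    return order


# Made-up trace for a made-up instruction; real addresses will differ.
run = make_simulated_cpu({"example_instr": [0x120, 0x121, 0x340, 0x121, 0x341]})
hits = find_hit_addresses(run, "example_instr", range(0x100, 0x400))
order = execution_order(run, "example_instr", hits)
entry_point = order[0]
print([hex(a) for a in order], "entry point:", hex(entry_point))
```

In this sketch the entry point is simply the first element of the recovered order. On real hardware each probe would mean loading a microcode update that sets the match registers and then executing the x86 instruction under test, which is far slower and can crash the machine.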