Hi, and welcome to our talk, Backdoor in the Core. My name is Alexander Kroh, and I'm a vulnerability researcher at Vectorize. Besides that, I'm also a CTF player, and that community is where I met this wonderful guy. We've been playing together for a few years now. He's a grad student at Graz Technical University, and this is our talk about microcode and hacking CPUs.

But first, before we dig into the microcode itself, we need a little crash course on computer architecture. What we see up here is a CPU core from the Skylake processors. Inside it we have the front end up top, the back end beneath it, and the memory subsystem at the bottom. The front end is mostly in charge of fetching new instructions. I'm saying "high-level" instructions here: those are the x86 instructions going into the fetch queue. They get decoded and translated into micro-operations, and micro-operations are what actually runs on the CPU. The micro-operations are then scheduled by the back end. The back end has the ALU, so the arithmetic logic unit, the floating-point unit, and so on, and from there results can go to memory through the memory subsystem.

We're only going to take a very tiny section of this CPU and dive into it today: the instruction queue and the decoding phase. When an instruction comes in, it goes through the queue. Take a very simple instruction, like a move, an addition, or an XOR.
It goes straight through the simple decoders, which have a sort of one-to-one mapping between the high-level macro-operations and the micro-operations. We call them macro-operations, as opposed to the micro-operations that go deep into the CPU. More complex instructions go through the complex decoder. What we're looking at today is the really heavy instructions, something like returning from a system call or entering a hypervisor. Some of these instructions are several hundred micro-operations long, so they can't be decoded directly into micro-operations. They go to the ROM. It's kind of like a normal CPU: we have microcode, with an instruction pointer running through it and executing code. Today we're looking at that space.

So why do we need this complex set of microcode? First of all, we have the complex instructions, which cannot be implemented directly in transistors. But bugs also occur in CPUs. We had Spectre and Meltdown a few years back, setting the scene for all of this, and just a few weeks ago we saw another one of these bugs. More and more CPU bugs are popping up here and there, and to fix them post-production we need microcode updates. So besides the ROM area where the microcode is stored, we also have a small RAM address space that is writable, and that's what we're going to look at today.

Normally these updates are signed and encrypted by Intel, so you can't really take a deep dive and look at what Intel actually pushes to a CPU. There hasn't really been a way to inspect that so far. We'll see what kind of damage we can do once we actually get access.

This is a table showing how the address space is laid out inside these CPUs. It's a very simple address space. It's totally linear.
It goes from zero down to 0x8000. Somewhere in the middle we see a split: the read-only part ends at 0x7C00, and that's where the writable address space starts. That's where we can inject microcode. Across the columns we see the instructions themselves. We have grouped them in this table because that's how they're grouped in the CPU. The groups are called triads. One triad is three instructions and one implicit NOP instruction, so every fourth instruction is just an implicit NOP. You can't address them and you can't jump to them, and they don't really execute anything. The last thing is the sequence word. It's stored separately from the instructions, and the sequence words are what tie these triads together. They are in charge of flow control, so they can do branching based on test instructions and things like that.

Let's take a first example of how an instruction can be implemented in microcode. This is the XADD instruction: an exchange and an addition at the same time. Normally we think of x86 instructions as one atomic thing, but they're actually not. XADD is composed of these three micro-operations, so let's read through it. On the first line we take the source register and OR it with zero. The source register is RBX in this case, and RAX is the destination register. ORing with zero is basically just a move, so we move it into tmp0. tmp0 is a physical temporary register inside the CPU that you normally can't access. Then we do a zero-extend, but in this case it's 64 bits wide, so it's again basically just a move. We move it into the r64 source register.
That's basically taking RAX and putting it into RBX, so that's the exchange part of this instruction. The final step is the addition: we take the saved temporary register, add it to the destination register, and store the result in the destination register. And at last we see the final part, where the sequence word comes into play. In this case it's a simple UEND, which means that the macro-instruction, the XADD, ends at this point. We go and update the instruction pointer and fetch a new instruction.

That was a simple example. But what was that tmp0 register you saw? That's new to most of us. It's a temporary register that is only used inside microcode and is not visible to macrocode. We've got 16 of these, called tmp0 through tmp15. Of course microcode can also access all the normal registers, like RAX and RBX. Besides that, we've got eight floating-point registers, the same idea as the temp registers: eight temporary registers for XMM instructions. Then we can access system registers, for example for memory segmentation; they're the ones that hold that state. And last we have the ustate register, which holds what mode the CPU is running in and all the critical stuff. Are we in 32-bit mode? Are we in 64-bit mode? Are we in a hypervisor? The CPU often checks those flags and does conditional jumps in microcode based on what state we are in. Are we in privileged mode? Are we in kernel space? Things like that.

Of course, from microcode we can also access memory.
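
As an aside, the three XADD micro-operations walked through above can be modeled in a few lines of Python. This is only an illustrative sketch, not real microcode and not the speakers' tooling: the function name `xadd64`, the dictionary used as a register file, and the omission of flags are all assumptions made for the example.

```python
MASK64 = (1 << 64) - 1  # keep values within 64 bits


def xadd64(regs, dst="rax", src="rbx"):
    """Model XADD dst, src as the three micro-operations from the talk.

    `regs` maps register names to integers. Flags are not modeled.
    """
    regs = dict(regs)                        # work on a copy of the register file
    tmp0 = regs[src] | 0                     # uop 1: OR with zero, a move into tmp0
    regs[src] = regs[dst] & MASK64           # uop 2: 64-bit zero-extend, a move dst -> src
    regs[dst] = (tmp0 + regs[dst]) & MASK64  # uop 3: dst = tmp0 + dst
    return regs                              # sequence word: UEND ends the instruction


print(xadd64({"rax": 5, "rbx": 7}))
```

With RAX = 5 and RBX = 7 going in, the model ends with RAX = 12 and RBX = 5: the old destination lands in the source, and the sum lands in the destination.
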
There are multiple ways of doing this. One is the virtual address space, which is what we usually use from macro-operations, so we can store and read memory just like that. We can also access the physical address space directly, going around all the page tables. Besides that, we have a very tiny memory dedicated to microcode inside the CPU: the ucode memory. It's a separate address space from where the actual instructions are stored, so data and instructions are separated completely. Here we have around 0x100 entries that we can also use as temporary storage. Each entry is 64 bits wide. The difference between these and the temporary registers is that these can be used across multiple macro-instructions. Temporary registers are for temporary storage inside one macro-instruction, while this memory can, for example, save control registers when switching modes so we can fetch them later. Each entry has a dedicated purpose.

Other than that, we also have two buses, or rather a bus and a fabric, used for communication within the CPU. A very interesting one is the control register bus. It can talk to all components inside a single CPU core. That could be the caches on the core, for example the L1 cache and maybe an L2. Besides that, there is a very important one: the microcode sequencer, which is in charge of scheduling instructions and sending them through the pipeline. So if we can get onto the control register bus, we can access the microcode as well. Then we have the Intel On-Chip System Fabric, which is mostly used for external communication: HDDs, USB, and other components that are shared between cores, outside of the main core.

Okay, so we've talked a bit about that. Now, how do we place these updates in microcode?
We said that it's a ROM area and that it is read-only. So when Intel pushes an update, how do they actually update a ROM area? That's where the match-and-patch registers come into play. What we see here is the bit fields, how a match-and-patch register is laid out. We have 32 of these, and each time we fetch a new micro-instruction we check these fields, and if we get a hit we jump to another address instead. The present bit tells whether this match-and-patch register is enabled. The source field says: if we hit this address, jump somewhere else. Where to is stored in the destination field. As you see up here, both source and destination are microcode addresses, but they're shifted by one, so we lose one bit of precision. We'll talk a bit about how that works.

Here we have an example of a match-and-patch register. Let's say our CPU is running and the microcode instruction pointer hits 0x3C8. Then we have a match-and-patch hit, because there is an entry in the match-and-patch table with the same value. That means we instead jump directly to 0x7C00 and start executing from there. This is what Intel programs into the CPU every time a microcode update is applied, which happens on every reboot. But we also found out through trial and error that this loss of precision means that if we hit 0x3C9, we also get a hit, and then we jump to the red area instead, one micro-instruction further on.

So how can we use this, or should I say abuse this, when researching CPUs? A lot of what we have been doing is dynamic inspection of state, to figure out what this code is doing but also what an instruction is doing. Here we have a simple example taken from the ROM. It first reads from the RAM address space into tmp3 and tmp2. But what if you want to inspect the state?
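
Going back to the match-and-patch lookup for a moment, the behavior described above can be sketched as a small Python model. It is a simplification under stated assumptions: the names `MatchPatch`, `make_entry`, and `redirect` are made up for the example, the real bit widths are not modeled, and letting the dropped low bit carry over to the destination is just one reading of the 0x3C9 observation.

```python
from dataclasses import dataclass

NUM_MATCH_PATCH = 32  # number of match-and-patch registers mentioned in the talk


@dataclass
class MatchPatch:
    present: bool  # is this entry enabled?
    src: int       # source micro-address, stored shifted right by one
    dst: int       # destination micro-address, stored shifted right by one


def make_entry(src_addr, dst_addr):
    """Build an enabled entry; the shift drops the lowest address bit."""
    return MatchPatch(True, src_addr >> 1, dst_addr >> 1)


def redirect(uip, table):
    """Return the micro-address that actually executes when `uip` is fetched."""
    assert len(table) <= NUM_MATCH_PATCH
    for entry in table:
        if entry.present and entry.src == (uip >> 1):
            # Assumption: the low bit that the shift dropped carries over,
            # so an odd source address lands one micro-instruction further on.
            return (entry.dst << 1) | (uip & 1)
    return uip  # no hit: keep executing from ROM


table = [make_entry(0x3C8, 0x7C00)]
for uip in (0x3C8, 0x3C9, 0x3CA):
    print(hex(uip), "->", hex(redirect(uip, table)))
```

In this model 0x3C8 goes to 0x7C00, 0x3C9 goes to 0x7C01, and 0x3CA is left alone, which matches the example from the slide.
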
Well, one thing we can do is tell the match-and-patch registers to jump somewhere else. The crossed-out red line is a test of the ustate. Normally the sequence word would tell it to jump to the general protection fault and fault, because this is a privileged instruction. But we can change that. We swap it out and say: go to the RAM area. In the RAM we have put our own code, and we flip the test case. When we do that, we can tell it to either take the left path and move these temporary registers into RAX and RBX, which are registers we can inspect from a debugger and from normal user space. Or, if we are in privileged mode, we don't want to screw up this instruction, so we send it down its normal path, back the way we came from. That way we don't mess up the instruction, and the CPU continues to function and runs happily ever after.

Okay, so how do we manage to make these changes? How can we access the microcode sequencer when it's all locked down by Intel? Well, luckily the people at Positive Technologies found a vulnerability in the Intel Management Engine, which runs software called the Trusted Execution Engine. They found a buffer overflow, and that engine runs at a very highly privileged level. Through that we can enable debug features that only Intel is supposed to have access to, and that unlocks hidden, undocumented instructions. We took the PoC and extended it to these dev boards.
That's actually the board lying up here that we are presenting from. We have of course prepared a flash image that you can flash onto it, so if you buy one of these you can go and replicate some of our findings. The two instructions we mostly use are the udebug read and udebug write instructions (udbgrd and udbgwr). These two instructions can read and write microcode, and they can read and write the data areas we talked about before. The write instruction also has a special feature where you put an address in the RAX register and tell it to jump straight to that address in the microcode area. It can also talk to the control register bus we talked about before. So that's how we access all of these microcode features. Really good findings over at Positive Technologies.

So we tried all of this. We made our ROP chain and got access to these instructions. And we found this repo called CustomProcessingUnit. It's an assembler that someone wrote for assembling microcode. It works from UEFI, so you can compile microcode, put it on a flash drive, and from there apply the patches to a CPU. We did that and hoped it would work. But as soon as we booted the Linux system we were running, all of our microcode patches were gone. Our initial thought was: oh, we can't read their code, we don't understand it, it must be wrong, it can't be anything else. So of course we started writing our own assembler. We turned out to be very wrong: their code was running perfectly. But in doing this we took a rather different approach. What you have seen so far, in the example we had before, has been their syntax. But we made a more dynamic library.
We actually made a Linux dynamically shared library that you can link into any C project and that can change microcode on the fly. So the assembly code looks a bit more like this. We get more dynamic tooling, but at the cost of syntax. What we see here is our function, yolo. The first array here is the ucode patch. That is the one we are going to apply. It moves 0x1337 into one of the temp registers, then it moves 0xdeadbeef into RBX, and then it takes tmp0 and moves it into RAX. Nothing crazy going on here. Then it ends the instruction. Then we use the patch ucode function to put these instructions in the RAM area, and after that we hook the SYSEXIT instruction. We use that one because it's a nice playground instruction: it's never really used by the Linux kernel or anyone else, because it's an Intel-only instruction and doesn't work on AMD. When you get a Debian or Ubuntu or whatever, it has to work on both vendors' hardware. So this is a privileged instruction that is rarely used, and we snatch it and repurpose it as we like.

So we did that and started to play around with the microcode. With this more dynamic approach, because we do it from the C language, we also have the power of all of C: we can program microcode changes from C, versus putting them on a flash drive and being more static. We wrote microcode that traced where the current macrocode instruction pointer is and where the current microcode instruction pointer is. We stored that to the normal RAM address space and kept doing that. At some point, when we lost our microcode changes, we knew exactly at which instruction we were losing them. So we could
backtrace and work out how to fix it. That's how we discovered the IN instruction. That's an instruction that reads from the port I/O space, so it reads from hardware. We found that it has a lot of side effects. When we took a look at its opposite, the write instruction, it was about three instructions long. This one was a couple of hundred long, way bigger than it should be. It turns out that the IN instruction has hidden side effects inside it, where it checks the microarchitectural state and reapplies microcode patches if something seems odd. That's at least our interpretation.

At first we just tried to nuke the entire instruction, to make it one big NOP, and see if that would work. Well, it did work, and we started keeping our microcode changes. But apparently the kernel needs this instruction, so now our kernel didn't work. There's a solution for that: we just go ten instructions down and try to disable it from there. Not working. Okay.
What about five instructions further down? And sure enough, when we chopped off that half of the instruction, we started getting stable microcode changes. So now we can update microcode from a Linux user-space environment instead of only through UEFI. That's nice.

So I think we're going to do a demo of this. For the first demo, we're going to show the yolo function we showed before. We're running this on the UP Squared board, which is already red-unlocked and exploited, so it's ready to run. If we look at the code again: this was the yolo function. It sets 0x1337 in RAX and 0xdead in RBX. Then we have this wrapper around it, with some inline assembly that just executes the SYSEXIT instruction. If we open it up in GDB, we're now right before the yolo function, and if we step over it, our patch should be applied. Now we're right at the SYSEXIT instruction. If we step now, what should normally happen is a general protection fault, because this is a privileged instruction and we are in user space. But what actually happens, if you look up at the top, is that RAX is set to 0x1337 and RBX is set to 0xdead. So our patch was successfully applied.

Now for probably the demonstration you've all been waiting for: the backdoor. To show it, we're going to open a browser, and here we have a site hosted with a sweet little link: click to get a calc. Let's press it, and a calc pops. That's very neat, but our exploit actually doesn't have anything to do with the JavaScript engine. So maybe if we open up Firefox, we'll get another calc. But we wanted something even more general than that. If we just copy the image link, pop it into our shell, and wget it, another calc pops. So apparently just getting this image pops a calc.
So how the hell does that even work? Okay, thanks. You're probably wondering how this works. When we were engineering this, we were thinking: what kind of instruction can we target? Where should we put this backdoor? The RAM is very limited, so we have little space, and we needed a good instruction. What makes a good instruction? It needs to run very frequently, so every program should be using it. Furthermore, the instruction we hook needs to touch some kind of user-controlled data, because we can't store the entire backdoor inside this tiny microcode space. We need something that has user input.

So our attention was drawn to the SYSCALL instruction. SYSCALL is the instruction in charge of handling kernel requests. When you make a syscall, you put a number in the RAX register specifying what you would like the kernel to do, and the CPU puts the return address in the RCX register. That is where the kernel should return to after the syscall has been handled: it sends execution back to user space at the address in RCX. Another nice feature is that a lot of syscalls take parameters, and some of those can be the user data we're looking for.

Let's jump into some pseudocode. We can't show the entire microcode change because it's very big, so we'll use pseudo-C to show how this backdoor works. We hook the SYSCALL instruction right at its beginning. Then we first check: is RAX set to the write syscall? If it's a write, it either writes to something like standard out or it's doing file manipulation. For example, if Chrome or Firefox is going to cache an image it needs later, it does a write to the file system. So that's pretty neat.
The next check we make is for a magic value, because we don't want to execute every image pulled from the internet. If either of these checks fails, we just do the normal syscall. But if both conditions hit, the first thing we do is save the current state of the CPU. We take all the registers that our actual shellcode touches and store them to memory, so we can exit cleanly. After that, as we discussed, we set RCX. We set it to RSI, and the reason we take RSI is that the RSI register contains the address of the image data itself. We add a small offset, and now we are saying: okay, kernel, when you're done with your request, this is the new address you should return to. So when you return to user space, you actually execute our image data.

But now you may say: wait, hold on a minute, CPUs nowadays have MMUs, memory management units, so you can't just execute arbitrary data anywhere you like. That is the neat trick about the SYSCALL instruction, because Linux has a syscall called mprotect. So instead of saving the image with the write syscall, we put the mprotect syscall number into RAX and change the request we make to the kernel. We say: hey, kernel, this address space in RSI, can you please make it readable, writable, and executable at the same time? And when the kernel jumps out of kernel space, it jumps happily into executable code that we control. That is our image. More precisely, it is stored in the metadata of the image.
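
The hook logic described above can be sketched in Python as follows. This is a rough model of the control flow only, not the actual microcode: the magic value, its position at the start of the buffer, the shellcode offset, and the way the mprotect arguments are filled in (page-aligned address, one page, read/write/execute) are all assumptions made for illustration. The syscall numbers are the x86-64 Linux numbers for write and mprotect.

```python
SYS_WRITE = 1                 # x86-64 Linux syscall number for write
SYS_MPROTECT = 10             # x86-64 Linux syscall number for mprotect
PROT_RWX = 0x1 | 0x2 | 0x4    # PROT_READ | PROT_WRITE | PROT_EXEC

MAGIC = b"\x13\x37\xc0\xde"   # hypothetical trigger value
SHELLCODE_OFFSET = 0x10       # hypothetical offset of the code inside the buffer
PAGE_SIZE = 0x1000


def hooked_syscall(regs, read_mem):
    """Model of the hook at the start of SYSCALL.

    Returns (new_regs, saved_state). saved_state is None when the hook
    leaves the syscall untouched.
    """
    if regs["rax"] != SYS_WRITE:                  # check 1: is this a write?
        return dict(regs), None
    buf = regs["rsi"]                             # write(fd, buf, count): buf is in RSI
    if read_mem(buf, len(MAGIC)) != MAGIC:        # check 2: magic value (assumed at buf start)
        return dict(regs), None

    saved = dict(regs)                            # save state so the shellcode can exit cleanly
    new = dict(regs)
    new["rcx"] = buf + SHELLCODE_OFFSET           # kernel returns into the image data
    new["rax"] = SYS_MPROTECT                     # turn the request into mprotect
    new["rdi"] = buf & ~(PAGE_SIZE - 1)           # assumption: page-aligned start address
    new["rsi"] = PAGE_SIZE                        # assumption: one page is enough
    new["rdx"] = PROT_RWX                         # readable, writable, executable
    return new, saved
```

A caller would pass the register state at the moment of the syscall and a function that reads bytes from the process's memory; any write whose buffer does not start with the magic value passes through unchanged.
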
So you don't see anything in the image itself.

Let's briefly go over what shellcode we actually put in. It's super simple. The first thing we do is a fork, and then we have two processes running in parallel. The parent process is now the main thread actually handling the browser, the Chrome stuff. There we restore the saved context, from the step where we stored all the registers into the image data itself. We pull that back out and put it back in place in the CPU in that thread, and then Chrome can continue its execution happily and won't notice that we have spawned a child. The child can go and do all the evil stuff it wants. In this case, it's popping a calculator.

Before we finish, let's look a bit at how we did some of this reversing. Of course, there have been pioneers in the field who started this: Positive Technologies, CustomProcessingUnit, et cetera. But there are still a lot of unknown instructions in there. So how do you reverse instructions in a new microcode assembly language where half of the opcodes do unknown things? One trick is what we call side-by-side reversing. On one half of the screen we have the pseudocode for the instruction, and on the other half the actual implementation. Then we can go back and forth and look for patterns: on this side we have three compares or three writes in the pseudocode, and we see the same three writes over in the actual implementation. And right after that we have the unknown opcode. What is it in the pseudocode, and can we infer it from that?
That is one thing. Another is tracing, like the dynamic tracing we saw before: hooking in the middle of instructions and pulling out the data, for example into GDB as you saw before. And also just copy-pasting: if we have a big instruction, we copy the whole thing into the RAM address space and redirect to it. Now we have a clone, and we can start flipping one bit at a time and seeing what slight change that made. Most of the time it's the CPU-crash bit, but sometimes we can catalog it and see different values in our registers after that instruction.

We talked a bit about playground instructions like SYSEXIT, because it's never used, but there are actually more of these. Some of the virtualization instructions for virtual machines, like VMWRITE and VM enter and some of the hypervisor instructions, are never used unless you are running virtual machines. Some of those take arguments in the form of registers, so there you get playground instructions that also happen to decode registers.

And finally, before heading off, we want to talk about future work that we, or even you, might do, because we are uploading everything right after this and making everything public. We really want to find CPU bugs, or should we say exploit CPU bugs, because we actually get a little bit of help from Intel here. If we apply the official microcode patches from Intel, we can look inside the match-and-patch registers we saw before. Now we know where Intel places its hooks.
We can see where they hook from and to in the ROM and RAM address space, so we can look at which bugs they have already discovered and patched that we have never seen. Our goal, or hope, is that we can red-unlock the CPU, so put it into this Intel debug mode, without having physical access beforehand. If we can abuse some of these bugs, we think it might be possible to take control of the micro-instruction pointer and point it at some of these debug instructions. That would be a very cool thing to do, and hopefully we get the time to do it.

Last, I want to say thank you, especially to Kalmarunionen. That's our CTF team. They have been a great help, both sponsoring and having a hundred-plus hackers around, always available to answer any stupid question you might have about a CPU. Really, thank you, guys. We also acknowledge the work that Mark Ermolov, Positive Technologies, and CustomProcessingUnit have done to lay the ground for this talk. Thank you for listening. Do you have any questions?

Yeah, that's one. So the question is: what primitives could you imagine building from this? In general, I think this is a very strong primitive. Putting a backdoor so hidden away is basically completely undetectable, because the OS can't see it. I could imagine, for example, game hacking: you can hide it away from any cheat-engine check or things like that. I could imagine it being useful there. Other than that, I don't know.

Yeah, that's one. Where can you buy the dev board? Okay, the dev board is a UP Squared. On the slides you will find a link to our documentation, and it lists the exact dev board we use. You can order it online, and we have a flash image that you can just flash onto it, and then you're ready to play with this.

Okay, I think there are no more. Thank you for listening.