Hello, can you hear me at the back? Thank you. Welcome to DevConf.US; this is the systems engineering track. My name is Yash, and I'll be your moderator for this track. The next talk is by Waiman Long, and the title of the talk is "A Spectre and Meltdown Primer." I'll let him take over. Thanks.

Okay, good afternoon, ladies and gentlemen. I think you have all heard the names Spectre and Meltdown before; if you haven't, then this may not be the right talk for you. Basically, in this presentation I'm going to talk about how these Spectre and Meltdown security bugs came about, what kind of vulnerabilities were actually discovered, and what we are doing on the open source side to mitigate the impact.

The reason we have this kind of security vulnerability is our quest for ever higher performance in computer chips. In pursuing that, the chip makers took some shortcuts that in the end led to the security vulnerabilities we are talking about; some people would say they did things too aggressively. Finally, I will talk a little bit about what the chip makers are doing to try to mitigate, in future products, some of the problems we have in the current generation of CPU chips.

So first of all, what are Spectre and Meltdown? They are a new class of CPU vulnerabilities arising from a capability in computer hardware called speculative execution. Up to now, a number of different security vulnerabilities of this kind have been discovered; the most relevant ones are as follows. We have Spectre variant 1, which is called bounds check bypass. We will talk about each of them in a bit more detail in the following slides; this slide just gives you the set that I'm going to cover. Spectre variant 2 is branch target injection, and then we have Meltdown, which we internally refer to as variant 3.
These vulnerabilities all have assigned variant numbers, one through five. After Meltdown I'll talk about speculative store bypass, which we internally call variant 4, and then I'll say a little bit about a new one called L1 Terminal Fault, which just came out of embargo three days ago; this is the new security bug people have been talking about over the past few days. Beyond the main ones, there are also a couple of other variants that I am not going to talk about, because they would take too much time. There is a Spectre variant 1.1, called bounds check bypass on stores, or speculative buffer overflow. And there is another one called NetSpectre, which allows you to carry out the attack from the network side: you don't have to actually run code on the computer itself; you just send some packets to the computer and you can extract information from it. The thing with NetSpectre, though, is that the bandwidth is very low, so you can extract maybe a few bits per hour. So although this is theoretically possible, the amount of data you can extract is minimal; unless you can attack continuously for a few weeks, you won't be able to get much information out of it.

Okay, why do we have this kind of security bug? First of all, I would like to talk about the different levels of memory in the computer hierarchy. At the top, inside the CPU chip itself, you have the registers; you can usually access data in a register within one clock cycle. Beyond that, you have to access data from the caches. Modern computer chips usually have a few levels of cache, the L1 cache, the L2 cache, and the L3 cache, and beyond that you have to access the data from main memory. This table shows the latency, the amount of time you need to access data, at the different levels of the memory hierarchy: from a register you need one cycle, and from the L1 cache about four cycles. All these timings depend on the CPU itself.
Different CPUs may have different timings for the different cache levels; this is just one example, using a Haswell i7-4770 CPU. For that CPU, the time needed to access data in the L2 cache is 12 cycles. For the L3 cache it depends on where the data is: the L3 is a single cache shared by all the CPU cores, so depending on which core is doing the access and which cache slice holds the data, the timing can vary a little bit. That's why the table lists a minimum and a maximum; the further the data is from the requesting CPU core, the more time you need to access it. And for physical memory you need quite a lot of time: we are talking about roughly 100 nanoseconds to fetch one piece of data from memory. With a 3.4 GHz CPU, one clock cycle is about 0.3 nanoseconds. Compare 0.3 nanoseconds with 100 nanoseconds and you're talking about a factor of roughly 300. So you can do one operation in one cycle, but if the next operation needs data from memory, you have to wait about 300 cycles before the data actually arrives, and if there is nothing else to do in between, the CPU just sits idle.

This is why modern CPUs have a lot of ways to speed up execution. If you listened to the talk this morning about the life of a CPU in a millisecond, the speaker talked a lot about the internal operation of the CPU and the kinds of optimizations done to make it faster. To hide all this memory latency, there are a lot of things the CPU can do. The easiest is pipelining: you break an instruction up into smaller operations called micro-ops, maybe four or five of them or even more, and in each cycle you execute one micro-op, in the next cycle the second one, and so on, pipelining the whole thing so that instructions proceed in parallel.
In essence, you extend the time each individual instruction takes to complete, but at the same time you still maintain a very high clock speed. But there is only so much you can do with pipelining. The second way the CPU speeds up execution is out-of-order execution. In the instruction stream you have the first instruction, the second instruction, and so on. When they run on the CPU, they are not necessarily run in the order of the instruction stream: the chip analyzes the dependencies between instructions, and if it finds instructions with no dependency between them, it may execute them out of order, so some instructions get executed ahead of others. In this way the CPU can extract parallelism from your instruction stream.

Beyond out-of-order execution you also have branch prediction. Modern computer software is usually very branchy, especially general purpose code; you hit a branch instruction every six or seven instructions or so. When you hit a branch, the CPU has to evaluate the condition to decide whether you take one path or the other, and depending on what instructions come before the branch, it can take a while to determine which path will be taken. So most computer hardware has some sophisticated logic to do branch prediction: it predicts ahead of time which branch you are most likely to take, and executes those instructions ahead of the decision.

And that brings us to speculative execution. It's called speculative execution because it usually follows from branch prediction: if the prediction hardware predicted correctly, the instructions executed after the branch can be retired immediately.
If the prediction is wrong, the CPU has to throw away all the instructions it executed past the branch, along with all the intermediate data, then go to the other branch target and do the execution again. This creates a pipeline stall and slows things down. It usually doesn't happen that often, but when it does, it slows down the chip.

So what happens when instructions are incorrectly predicted? The architectural state, meaning the contents of the registers, the contents of memory, and the condition flags, is rolled back to its original form, so you won't see any side effects from the mispredicted branch. But the microarchitectural state, such as the contents of the caches, is not rolled back. During speculative execution it is possible to load some data from memory, and that data gets stored in the cache. Even after the CPU finds out that the branch was mispredicted, the information in the cache is not thrown away; it stays there. And it is this change in microarchitectural state that enables the speculative execution attacks we are talking about.

Now, what is a side channel? A side-channel attack is an attack based on information gained from the actual implementation of a CPU. As I said before, microarchitectural state like the contents of a cache will not be rolled back.
So that information remains in the cache, and if a piece of data is in the cache, what you can observe is that accessing it is much faster than accessing data that is only in memory; in that case you have to load it into the cache first and then from the cache into a register, which takes longer. For this kind of side channel, the most commonly used signal is timing information: the time you need to access a particular piece of data. Other side channels are possible too. You can monitor the power consumption of a CPU or of the whole system and infer whether the system is busy or idle. With external equipment you can even monitor the electromagnetic radiation, or the sound, coming out of a computer and infer some information about what it is actually doing.

For the Spectre and Meltdown attacks, the side channel used is timing. What the attacker does to carry out the attack is this: before executing an instruction, read the current time from the TSC counter; after executing the instruction, read the TSC counter again and see how much time elapsed. From the actual execution time of the instruction, you can infer whether the piece of data you accessed was in the cache or in memory. This cache timing channel is the most common one used in all these attacks.

Other side channels are possible as well. In the NetSpectre paper, for instance, they talk about another side channel based on the time you need to execute an AVX-512 instruction; those are the new SIMD instructions in recent Intel CPUs. Those instructions are very resource intensive, and when you run them, the CPU actually slows down.
It reduces the clock speed because those instructions consume so much power. Also, when you haven't used the AVX instructions for a while, the CPU turns off the circuitry of the execution units they require. That means the first time you execute an AVX instruction, the CPU has to turn that circuitry back on and wait a bit before it can actually run the instruction, so there is extra latency. What the NetSpectre authors do is look at the latency of executing those instructions to see whether the execution unit had already been powered on or not. That is another piece of information you can use to infer some of the state inside the CPU chip.

Okay, Spectre variant 1. This is the first of the Spectre attacks disclosed this January, I think on January 4th. Basically it is just a simple branch, an if statement, and if the condition is true, you do a memory access. The thing is, this kind of simple code pattern is what we call a Spectre gadget, and you can make use of such a gadget to access secret information through a second array access. How does it work? The way the CPU trains its branch predictor is with some kind of internal state table that records whether the branch has been taken in the past. If the branch has been taken many times, the CPU assumes that the next time it hits the same instruction, it will take the same branch.
It uses past history to predict the future outcome, so you can actually train the predictor to believe that one branch is more likely than the other. The attacker runs a piece of software that trains that particular branch predictor entry toward a certain branch, and then, after training the predictor, executes the gadget.

What the attacker does first is flush all the data items of the array out of the cache, so none of that data is cached. Now you pass in a value x, say a very large value, far outside the bounds that the check enforces; the actual x can be much larger than the array length. If the length parameter itself happens to need to be fetched from memory, then before the actual length can be acquired there is a period of time in which the CPU has nothing to do, and speculative execution kicks in. The trained predictor assumes the branch will be taken, so the CPU goes ahead and speculatively reads the array element, gets the secret byte, and then uses that secret value as an index into another array, a piece of data that you know will be brought into the cache. After the branch finally resolves, you can load each value of that probe array one by one and see how much time each access takes.

Say the secret value is 1. During speculation the CPU accessed the probe array at index 1 times 512, which means the cache line holding entry 512 of the array got fetched into the cache. After the speculative execution, you access each item, entry 0, entry 512, entry 1024, and so on, and determine which one comes back in the shortest time.
That index tells you the secret value that was actually in the array you wanted to read. This is how they infer the secret using speculative execution: by looking at how much time you need to access each probe array entry, you see which secret value is the most likely one. And this is the reason it's called bounds check bypass: you actually bypass the bounds check and speculatively execute the instructions that read the secret value.

How do you fix this issue? One way is to insert an lfence between the branch instruction and the array access to block speculation. lfence is a serializing instruction that stops any speculative execution going forward until the results of all the previous instructions have been resolved; in this way, no speculative execution is allowed past it. Another way to fix the issue is what is called data-dependent index masking. In the Linux kernel there is a special macro called array_index_nospec that allows you to dereference an array index safely. The way it works is that it uses the bounds check itself in the computation of a new index value. Because that computation depends on the data involved in the bounds check, there is a data dependency, and the subsequent load will not be issued until the bounds check has been resolved. That blocks the speculative execution from happening, and yes, it slows things down a bit.

I see, ten minutes left. Okay.

The next one is Spectre variant 2, branch target injection. A lot of CPUs have some kind of indirect branch instruction, where you get the target address of the branch from a register or from a memory location instead of directly branching to a fixed instruction. There is also circuitry in the CPU to speed up indirect branches, by speculatively predicting where
the branch target address will be. This is stored in the BTB, the branch target buffer. Like the branch prediction unit, this structure can be trained, so you can train it to assume a certain branch target. If that target happens to be an address the attacker controls in user space, the CPU may speculatively execute instructions at that user-space address, which is very dangerous if it happens while running in the kernel, because the kernel has the ability to see every piece of data in memory.

There are two ways to fix this vulnerability. One is called retpoline, the return trampoline, which is a software technique that replaces an indirect branch with a return instruction instead; it is a dance around the stack, manipulating the stack so that the branch is performed via a ret. The other way is to use special MSRs, machine specific registers, provided by a microcode update to limit indirect branch prediction. One of them is IBRS, indirect branch restricted speculation: when you turn this mode on, the indirect branch prediction unit becomes aware of privilege modes, so if you were in user mode doing some indirect branches and you then enter kernel mode, it will ignore all the predictor training data from user space and rely only on what has been done in kernel space. Sometimes you also flush the indirect branch prediction data out entirely, so that it won't be used within kernel space.

Okay, let's go to the next one: Meltdown. Meltdown is a slightly different kind of speculative attack; it is really more of a bug in the CPU. On x86, Meltdown affects Intel CPUs but not AMD, and the reason is that Intel CPUs are a lot more aggressive in terms of speculative execution. Before a memory access, the CPU is supposed to check whether you are
allowed to access that particular memory address before moving ahead to load the data. In the case of Intel CPUs, though, the protection check was done after the CPU had already started speculative execution, after it had already speculatively loaded the memory data. AMD CPUs do the check beforehand, so they don't have this problem; Intel apparently did it in a different way.

The way to mitigate this problem is to limit what you can see while in user mode, because in order to execute the Meltdown attack you have to be able to see the kernel address range from user space. The fix, at least in the Linux kernel, and I think the others like Windows have done something similar, is called KPTI, kernel page table isolation. When you are running in user mode you only see a very limited amount of kernel data, and when you call into the kernel, it switches the page table so that the whole kernel address space becomes visible. But that switching of page tables costs time, and that is why all these mitigations slow things down. Yes, we have done some benchmarks; it slows down performance by a few percent, depending on exactly what the workload is doing.

Then there is variant 4, called speculative store bypass. Actually, okay, let me skip this one.

This is the latest one, called L1 Terminal Fault. It is actually quite similar to Meltdown, again due to Intel doing too aggressive a job in speculative execution. You know about virtual memory: there is a page table to translate virtual addresses to physical addresses, and there is a bit in each page table entry that defines whether the entry is present. If the page table entry is not present, an access is supposed to generate a simple page fault. But what Intel does is this: even if the page is not present, the CPU assumes that the address portion of the page table entry still corresponds to the
physical address of the data page you want to access, and it actually speculatively loads the data. Once the data is loaded into the cache, you can use the cache timing side channel to retrieve it. That only happens when the data is already in the L1 cache, though; if the data isn't in the L1 cache, the CPU just won't do anything. It is a kind of shortcut they use to speed up performance, presumably because they thought that was the most likely case.

For this one, there are a number of possible mitigations. In the kernel we use a technique called page table entry inversion. For those entries that are not present, what is stored in the page table entry itself is usually some kind of metadata, and those values usually start from zero. With PTE inversion, instead of starting from zero, we start from the very top, the largest possible value, going down. In most cases you won't have a system with that much memory: most modern x86 CPUs allow a 46-bit physical address, which corresponds to 64 terabytes, and you won't find systems like that; the most common systems you will find have a few terabytes at most. So if you start from the top, the address stored in the page table entry won't match anything in the system's physical memory, and the CPU won't be able to do speculative execution on those addresses.

But there is still a problem within a VM. With a VM you have a host and a guest virtual machine, and you don't trust what the VM is running; the VM may contain a malicious kernel that uses this vulnerability to attack the host.
To mitigate this issue, there are a couple of things you can do. When the hypervisor is about to enter the VM, it can first flush the contents of the L1 cache. But even that is not enough, because if the CPU supports SMT, you can have two hardware threads running on the same core, and they share the same L1 cache. So if one thread is running in the VM and another thread is running in the host, the one in the VM can spy on whatever the other thread in the host reads into the L1 data cache. The only sure way to avoid the problem is actually to disable SMT.

Okay, I'll stop here, since time is up. Any questions? I think we have five minutes for questions.

[Moderator:] You have three minutes; is there something you want to explain for three minutes? Okay.

This is actually the last slide: looking forward to the CPU manufacturers actually trying to fix some of these issues in silicon. What we have done so far is in software, which is kind of a workaround; it is not a permanent solution, because it slows things down and makes the operating system itself more complicated. At least for Meltdown and Spectre v2, I think Intel's next generation CPUs, Cascade Lake, are going to have silicon fixes for those two, and also for the L1TF one, and probably SSB as well. The one that is harder to fix is Spectre v1, because branch prediction is very fundamental to how all this speculation works, fundamental to improving the performance of the CPU, and it is pretty hard to fix in silicon.
They may provide some ways to make the vulnerability harder to exploit, but we will still probably need to find all the problematic Spectre gadgets in the code and try to minimize their use. Also, one thing to note is that this is a new class of CPU vulnerability, and a lot of researchers are actively working in this area, so we expect more of these to come in the near future.

Okay, that's the end of my presentation. Thank you for your time.