Okay, good afternoon everyone, can you hear me well? Not quite? I'll try to speak louder. Is this better? Okay, I'm not very good at talking loudly, but I'll try. I hope everyone has had an enjoyable and delicious lunch; it's always a challenge to give a talk right after lunch because everyone starts falling asleep, so I'll try to make it entertaining. This talk is the result of me, somewhere around last autumn, trying to research this topic. So: the Linux kernel stack. It has been a target of various attacks for a really long while, and since some notable countermeasures have been merged to combat these attacks, I was trying to understand the current status. Are we good now? What's missing? Do we still need to do something, or can we just say this is done? That is basically what this talk is about. To answer those questions I of course had to go and study all the existing attacks and countermeasures, and not just the details of each attack, because I could easily spend an hour explaining any one of them, and that's not my goal here. My goal was to understand the patterns, the common things that let attackers succeed in this area again and again. Then I did some gap analysis, and there is also a proposed protection which we have been trying to merge to mainline since maybe the new-year timeframe. I'm not very optimistic now about when it will finally get merged, but I'll get there. I guess here I don't have to explain to anyone why this is important, why attackers really go after the Linux kernel, why it's such an attractive target.
If you are for some reason not aware of that, I encourage you to go listen to Kees Cook's talk on Wednesday on the Kernel Self Protection Project; he'll explain in great detail why the kernel is such a nice target. What I started asking myself is: why the Linux kernel stack in particular? To answer that, there are two main things to remember if you are an exploit writer trying to write a working exploit against a kernel. Not just the Linux kernel, any operating system kernel, though I concentrate on Linux here because I don't actually know anything else. The first is that in user-space exploitation, if you crash the process you are about to exploit, it's usually not so bad: you can recover, the process usually gets restarted, and unless there is a highly sensitive intrusion detection system in place it probably won't get noticed, because user-space processes tend to crash and get restarted all the time. If you are in kernel exploitation territory and something goes wrong, say you got some offsets wrong, you are going to crash the kernel, and that's much worse, for many reasons. First, there is no automatic restart: somebody has to power the machine on again and give it back to you. Second, if you are targeting any production system, with high odds it will raise alarms, because a kernel crash is not a standard event; we don't expect our kernels to crash randomly.
And then, even if everything else is okay and the machine gets restarted, you most likely have to start from zero, because things have changed: if randomization is in place, the layouts have changed, and you'll have to figure it all out over again. If you read books on kernel exploitation, every second page repeats: do not crash the target, do not crash the target. This is very important for attackers. The second thing that is different when you write an exploit against the kernel is that you have much less understanding of the memory layout you are playing with. In user space, attacking a particular process, you usually have quite some understanding of what libraries get loaded and what is happening. The kernel is totally different: on a particular target you might have a different set of processes running, doing different system calls, interrupts happening; everything affects the picture, and you have much less visibility and control over it. With that in mind, what is so special about the Linux kernel stack that makes it so attractive? I have named three things which I think are important. First of all, it has a deterministic structure. Here I have an example of the kernel stack for x86_64: it is four pages in size and grows downwards, because of how the stack is positioned in memory, with the high address at the top and the low address at the bottom. What is this stack used for? I think everyone knows, but just in case.
So for example, if you are a thread running in user space and you issue a syscall, at that point you transition to kernel space, and the kernel needs to start serving that syscall on your behalf. You can't allow the kernel to execute on the user-space stack, because that's unsafe, so it uses this nice, small and compact kernel stack to process your syscall, and it's very deterministic how the whole thing gets laid out. You will have a pt_regs structure at the beginning, which stores some of your syscall parameters and some very important state. Then, as the syscall is processed and functions get called, you start piling up stack frames, similar to what happens in user space. Some part of the stack stays unused. It also used to be that a special thread_info structure was located at the bottom; that's not the case anymore if you have the relevant config option on. But as you can see, it is very deterministic: these pages are allocated only once for each thread, so on every subsequent syscall you enter, the stack starts growing from the same place downwards, and on exit you get the same picture. Of course the stack frames may differ depending on which syscall you are processing, but there is still a lot of determinism in the usage. It is highly used, because this is always what is used for processing syscalls; it is predictable, because this is the way it is always used; and, as many attacks have shown, stack address leaks are very common, so it is quite easy for attackers to locate where the stack actually sits in memory, which compensates for not having good visibility of the memory layout.
So these are the three things which I believe have been key: the attacks can be very different, but these three properties have been central to what attackers rely on. Now, before I explain the attacks, I want to make one distinction, because the terminology is a bit confusing and it has confused me in the past too. There are two different types of attack. Everyone usually knows what a buffer overflow is. I showed you an example of a kernel stack, but a user-space stack looks similar, and that is where this old attack timeline started, back in 1996, in user space. Typically you have your stack frame: the arguments for the function, a return address, and then your local variables piled on top. If there is a local buffer and the developer didn't do due diligence, didn't check the boundaries properly, you can overflow it; and since the stack grows downwards, you overflow the other way around and overwrite the return address. This is the classical buffer overflow, and there is a long history of attacks and protections against it, but it's not the kind of attack I'm going to talk about now. What I'm going to talk about is mostly what we call stack overflows, and I stole the definition from a researcher, Jon Oberheide, who has actually been behind some of the attacks I'm going to show you. His definition: a stack overflow is when the stack pointer gets decremented beyond the intended bounds of the memory allocated for the stack. I really like this definition; it's very descriptive. When you execute on a stack you always keep track of where your stack
pointer is pointing, and if by doing something you manage to get your stack pointer out of those bounds, that is a stack overflow. This is very important to understand so we don't get confused. Now I'm going to walk you quickly through some of the attacks. There are many more; I tried to select them based on the class of attack each represents, to show you one particular interesting pattern each one uses, and that will help us understand the mitigations that were developed later. As I said, I don't have time to go into details, but I will try to highlight the most important things about each attack. I have references at the end of the slides for each of them, and I encourage you, if you're interested, to go and check them, and we can also talk after. They are really interesting even in the details; many of them are pieces of art, to a certain extent. The first attack I'm going to show is from around 2011: the uninitialized-variable attack, and it's actually Kees's attack. It used the fact that you might have an uninitialized variable located on the stack: usually some object, a structure or a union (a union in this particular case), where on a certain code path the developer forgot to explicitly initialize some part of it at the beginning. If you manage to find an overlapping path (a preceding syscall; in this case it was actually the same syscall, just with different parameters), your strategy is to use the first syscall to pre-fill some data on the stack, exactly the data that will later be used as part of that struct. You pre-fill the data with an attacker-controlled value, and then you issue your
subsequent syscall, where this uninitialized, pre-filled data gets used, for you to profit. This is possible because, as I said, the kernel stack works so that each subsequent syscall starts from the same place, grows down, builds its frames, and when it exits, nobody cleans the stack after you; it keeps whatever it had from the previous syscall. What is also commonly used in these cases are the copy_to_user and copy_from_user primitives, the primitives used to copy data between user space and the kernel. Here it was copy_from_user. What you do is find a copy_from_user call whose destination pointer you can pre-fill using that preceding syscall, so you can influence the destination the data is copied to. What you are creating is what exploit writers call an arbitrary write primitive: the ability to overwrite kernel memory at addresses of your choosing, which is very powerful. How you take it from there is, in many ways, bound only by your imagination; in this particular case the attack used the primitive to overwrite a function pointer, a socket destructor function pointer, with an attacker-controlled value, so when the attacker closes the socket, the overwritten function pointer gets invoked and executes the traditional payload to raise your privileges. That is what you can do with copy_from_user; if you instead control a copy_to_user source pointer, that creates an arbitrary read primitive, because you are basically saying: copy the contents of this kernel address out to my user space. You can just read data straight out of the
kernel. So, to summarize the key things used here: you had to find an uninitialized variable and an overlapping path, and the copy primitives help do the job, but all of it rests on the fact that you can precisely overwrite certain offsets, pre-fill exactly the needed data, and have the subsequent syscall consume it. If you are not precise, you crash the kernel, and that's what you don't want. So this deterministic structure is, again, very important here. The next attack is called stackjacking, from the same year, and it starts from an assumption: it assumes the attacker already has an arbitrary write primitive to begin with. How do you get one? I've just shown one way, with the uninitialized-variable attack; there are other ways too. It just makes this assumption. Now, in the previous attack Kees used the arbitrary write to overwrite a function pointer, but you don't always have that luxury: you might not find a suitable function pointer, and function pointer tables might be protected. So this attack tries to be more generic and asks: what can we do if we have an arbitrary write but want to gain privileges? What it shows is how to get from an arbitrary write to an arbitrary read, and together they are very powerful: if you have both, you can read memory to find the data you need, say the location of a credential structure, and then overwrite it to get root privileges. How do you get from write to read, and why use the stack? Again, the first answer is that the stack is easy to find. The attack used what it calls self-discovery of the stack location, either for your own process or for a child process, depending on which of its techniques is used. For that, it also assumes that you
managed to leak some pointer into the stack, for example a pointer to some local variable located on the stack. Then, using simple arithmetic, because the stack is fixed-size and aligned, you can calculate the stack base address, so you know where your stack starts and ends. Suddenly that is enough: you have a write primitive, plus you know exactly where the stack pages are, and you can try to proceed from there. The attack proceeded using two different techniques. The first used the fact that we used to have that special structure at the bottom of the stack, thread_info: located at a fixed offset at the bottom, easy to find. The goal of this first approach is to use the arbitrary write to overwrite something in that structure which lets you raise privileges. A classical approach at some point in time was to overwrite the addr_limit variable located in thread_info: if you can put the KERNEL_DS value there, you basically allow copy_to_user and copy_from_user to stop verifying their arguments and to copy data kernel-to-kernel, which essentially gives you the read (and write) primitive straight away. This first technique is not actually that interesting, because it relies on thread_info being there, and it does quite a few other dances to get there. I like the second one more, because it's more generic: it doesn't assume thread_info at the bottom. What it does, essentially, is spawn a child process; the child process uses the stack
self-discovery trick to find where its own stack is located and passes this info to the parent, so the parent now knows where it is. One important thing I forgot to mention: the attack did have to find a copy_to_user call that would be executed in the context of the child. But since it already has an arbitrary write primitive, it doesn't need any uninitialized-variable pre-filling; it can just go ahead and overwrite that copy_to_user source pointer directly. So the parent puts the child to sleep, overwrites the pointer, resumes the child, and the child then performs the copy_to_user call with the address of the memory you want to read. This way you can walk the whole chain: from thread_info you learn where the task_struct is, from the task_struct where the cred struct is, and so on, until you find the right location; and since you already have an arbitrary write, you just overwrite it. Again, for both of these techniques, different as they are, the important part is that you need to precisely overwrite, at a precise point in time, a precise location on the stack. This is very sensitive: if you get it wrong, if you overwrite the wrong thing, you crash the target, and I can almost feel myself as those exploit-writing books: we do not want to crash the target. Exploit writers want to create reliable exploits; they would never go for an approach that works only once in fifty times. So, the next attack. The previous two attacks were about things happening within the stack itself; now we are getting
into what we call inter-stack exploitation. The subsequent year there was a talk called "The Stack is Back". We again use the child-process scenario, but here we spawn child processes which use the same trick to discover their stacks and pass the information to the parent, and we keep doing it until we find ourselves in a situation where the parent's and a child's stacks are adjacent. It used to be that there were no guard pages in between, so it would really look like this: the parent's stack, and the child's stack directly underneath. At that moment you put the child to sleep, and what you internally want is to overwrite the return address of the child, but you can't do it unless you have a way to extend your own stack frame. You basically need to go far enough down from your own stack frame to reach past the end of your stack and into the stack of your child process. In user space there is actually old work from 2005 showing how to do this, and there are a number of constructs, like alloca calls, which allow similar things. But in the kernel, alloca, for example, is banned, and if you allocate a big local stack variable, the compiler is clever enough to move it off the kernel stack for you, because it really doesn't want you to do that, which is great. What you used to have in the kernel, though, is variable-length arrays. The example I've shown here is basically an array whose length is not known to the compiler at compilation time; it might depend on some runtime value. If you find such an array defined on a path you can trigger, and you can control the size, that means you can make it as big as you want. Actually, not as big as you want: you want it
exactly as big as needed, so that you go down precisely far enough and stop exactly at that return address. So you create this variable-length array, safely step over the things you don't want to overwrite, overwrite the return address of your child, and then resume the child; the child returns to an attacker-controlled address, and I guess you can take it from there down whichever path you want. Again, this one is a different approach: it uses the fact that you can find a variable-length array, and the fact that you can get this close co-location, which actually wasn't hard at all at the time, since there was nothing in between. But again you need to be able to do it safely, and you need to know exactly how long your variable-length array must be to overwrite one precise place. So this determinism and predictability is extremely important. A more recent example is from 2016; the person behind this attack is Jann Horn, and I personally really like this one: in my view it is pretty. It's again inter-stack exploitation, but we don't have a child process underneath here. What Jann did is create data pipes, where each pipe gets a page of data allocated for it, and he keeps creating these pipes until he gets into the picture shown here: a pipe page right under his process's stack. It used to be that both were allocated from the same buddy allocator, so you could get into this scenario where they are co-located. And of course he used a trick here: he found an arbitrary recursion bug, which was in eCryptfs, and there are many details to that, but that arbitrary recursion bug
allowed him to basically keep piling up stack frames, to recurse past the end of his own stack and into this pipe page that he controls. Once the recursion has gone far enough, he pauses the process, overwrites a return address by writing into the pipe at exactly the right spot, then resumes the process; the process returns to his address, and the whole thing continues from there. So again different: you had to find the recursion bug, and he actually said, in some conversations, that it is pretty difficult to precisely align all of this, to make sure the stack frames line up so that you overwrite the right place. But the deterministic structure is also very important here, because you want the whole thing to be stable and you want to know where you need to overwrite. The last attack I'm going to show is there for the sake of an example. You were probably already thinking, when I started talking about inter-stack exploitation, that the pictures I showed had no guard pages in between; so are guard pages the solution to it all? Well, in 2017 we had Stack Clash, which basically showed: not quite. Attackers can use, again, these variable-length arrays to jump over the guard page if one is present. The limitation, of course, is that you have to find a VLA, and it must not be fully written, because if you start writing through all of it, you are going to hit the guard page, which is what you don't want. But that talk showed how to use a variable-length array to perform a jump over a guard page, and after the jump you are again in some other memory allocation, which can be a child's stack like we saw in the
previous example, or some other memory allocation. And you basically have the two scenarios we've already seen: you can either start overwriting the return address of the child, or start overwriting the data on the stack you jumped into; you are free to choose what to do afterwards. The most important thing is that you are able to escape the stack: your stack pointer is now outside the bounds, the stack overflow we discussed at the beginning, and you can enjoy yourself there. Now let's talk about the countermeasures. Of course, each of these attacks led to countermeasures in the Linux kernel. The very early ones are not specific to the attacks I showed; I list them just to show this has been a long-standing topic. There was the initial randomization put into the kernel, to make it harder for attackers to figure out what is loaded where; the stack protector for simple buffer overflows, where you have canaries; the non-executable bit, so it is not so easy for attackers to place payloads and just execute them; we even have DEBUG_WX, which warns you if you have any writable-and-executable areas, trying to tell you: don't do that. But those are all the early ones; the interesting ones are much more recent. Right after Jann published and demonstrated the attack with the eCryptfs bug, we got the vmalloc-based stacks: a set of patches merged by Andy which moved the stack allocation from the buddy allocator into the vmalloc region. That region is much bigger, so we could afford to start having guard pages, so we got guard pages in between. And in the same timeframe, formally as a different feature but developed around the same patches, thread_info was moved off the stack, so you start to have
the stack looking something like this, which is a much nicer view. Now, we already talked in two places about variable-length arrays and why they are bad, so there was also a big effort, with Kees Cook leading it within the kernel hardening project, to remove VLAs from the whole kernel, and I think in 2018 he declared the kernel to be VLA-free. That is a great thing, because we basically removed the attacker's jump primitive for all kinds of attacks, including the stack ones. A more recent thing was the merging of STACKLEAK. It is a GCC plugin, and its main goal is to fight the uninitialized-stack scenario: when your syscall has executed and before the exit to user space happens, it goes ahead and cleans your used stack frames, basically poisoning them with a fixed value, to make sure nothing is left there from the previous syscall that could be reused by attackers. This is an important thing to have. Okay, so after we've looked at all these attacks and especially the recent protections, the open question is: what do we have left, what is the state now? What I'm trying to claim is that, despite the fact that we have merged protections targeting these particular things (VLAs, the uninitialized stack, the absence of guard pages), what remains are exactly those core properties of the stack which have been central to all these attacks: the deterministic structure, the predictability, the ease of locating it. And there is another angle to it: some of the existing countermeasures might not be enabled in all the distros, because they can be performance-impacting, and quite a lot so, depending on the load. So I'm giving here an example of
STACKLEAK's overhead, measured with one of the micro-benchmarks I was actually asked to run by Ingo and Andy. Under that benchmark, STACKLEAK shows about 80 percent overhead. It is already merged, but this indicates it might not be enabled for everyone; and it is a GCC plugin, so you also have to enable the whole GCC plugin infrastructure to use it. Also, all the protections I have talked about existed for a long while outside the upstream kernel, but they were merged upstream only after somebody showed a working exploit. That points at, maybe not an issue, but a question: can we be more proactive and see what we can do even before that? If everything I talked about is enabled, I don't have a ready-made exploit I can show you and say "okay, we can break it all", but can we be more proactive anyway? The feature I started working on is not my idea, and not a new idea: it is a very old idea, originally developed by the PaX team in 2003, called RANDKSTACK. It was part of grsecurity, which, as security people know, is a non-mainline set of security hardening patches, and the main idea was simple: let's remove this determinism and predictability by adding a small random offset to the beginning of the stack on each syscall. Every time a process issues a syscall and starts building its kernel stack, the stack is offset by some random value, so you never know exactly where the stack starts, where pt_regs is, where the stack frames are; every time it is a bit different. When we started thinking about it, we also considered where to put this random offset: you can either do it the PaX way, at the very beginning, or you can do it after the pt_regs
structure, so that you basically randomize only where the normal stack frames go. We considered both options; both had been suggested to me. Option one has some benefit against certain attacks which use the fact that the attacker can reliably store controlled data in pt_regs; I was given an example of such an attack by one of the exploit writers. But the problem with this first approach, as I also came to understand while talking to people, is that in scenarios where pt_regs is reachable, if the attacker can do some cache-probing attacks, it is quite easy to figure out where pt_regs is even with this small random offset. And we can't do a big offset: the stack is so limited, only four pages long, and if we start allocating more pages for the kernel stack, we will have an even longer discussion with the maintainers. So the offset has to be small, and the pt_regs location might be very easy to figure out. The eventual result was to go with option two, which leaves pt_regs in place and applies the random offset after pt_regs, and from there on you start building your stack frames. We made it a config option, and the feature was posted first, I think, to kernel-hardening, and then we had a long discussion about it on LKML; you can find that discussion any time. The feature is called CONFIG_RANDOMIZE_KSTACK_OFFSET. I have rewritten the implementation fully a number of times based on the maintainers' feedback, and it turned out to be very different from the PaX version in grsecurity; only the initial inspiration and idea are the same. It randomizes on syscall entry, whereas PaX was doing it on exit, and it has, adjustable of course, but in this example, about five bits of entropy for the randomization. The implementation itself is actually
very tiny; you can see the macro here, and that is basically all it takes, because it uses an alloca call, which is normally banned in the kernel, but it was actually suggested by Andy himself, I think, so for this case we were okay using it, and it made the implementation really beautiful and small. But of course, the devil is in the details. Everything went really nicely through the discussion and the adjustments until we ran into the problem: okay, where do we get the randomness from? We need very little randomness here, we are talking about five bits being enough for this feature, but the problem is that we sit on the syscall path, so we need fast randomness, really, really fast. So I started looking into what we have in the kernel, and I tried to summarize the most common options for you in the small table here.

The first one, and this is what PaX for example was using, is RDTSC, reading the timestamp counter. It is very fast, but unfortunately pretty weak: at least theoretically it is considered prone to timing attacks, so even if you use only the lower bits of the value it returns, one can still argue that it is not very secure. But it was clearly the fastest. Then I looked at the other side of the spectrum: for x86 we have the RDRAND instruction, a CPU instruction to get randomness from the CPU. The randomness quality is very good, a properly cryptographically secure random generator sits behind it, but unfortunately it is very slow, so it was ruled out purely by the measured numbers. Then you have things like prandom, the in-kernel pseudo-random number generator, which is also very fast, not as fast as timestamp-counter reading, but still very fast. At first I was very happy to find it: okay, we are going to use that. But then it turned out that when you look inside, it is basically a linear combination of a couple of LFSRs, the whole design is about 20 years old, and it is somewhat breakable: if you assume the attacker can get access to some of its output bits, it does not need to collect that many of them before it recovers the state, because the whole thing is fully linear. So I judged it to be even weaker than the lower bits of the timestamp counter and ruled it out of the design. The middle ground, as I called it, was the get_random_bytes interface, which is the interface to the in-kernel cryptographically secure random number generator. It provides very good, even crypto-level randomness (it is based on ChaCha20), and I found its performance to be a middle ground and acceptable. But of course, when you start talking with maintainers, you have to take their numbers into account. I had a benchmark which I was given, I believe, by Andy, and Ingo said he would be sad if he saw anything more than a two to three percent degradation; as this table nicely shows, that goal is not reachable. By the way, he told me to measure everything with page table isolation off. For those of you who do not know (it has nothing to do with this feature), that is the isolation between kernel and user page tables which was introduced because of speculative execution, and its main cost is that it slows down syscall performance. So when I was asked to measure this feature, I was told: please measure against the good old days, not against the already-slowed-down baseline. So this is the bottom line, and as you can see, if you measure even the timestamp counter
against that baseline, it is six percent; and get_random_bytes, of course, does not come anywhere close: it is four percent in the first case and fourteen in the other, and that seemed to be too much for the maintainers to consider. But again, this is micro-benchmarking, which tells you nothing about a real-world case. If you use a real load instead, for example kernel compile time, you see that the percentage increase we are talking about is really tiny, so it is really not going to affect your real workload. But if you are stuck with micro-benchmarking, this is where you can get stuck, and this is exactly where this patch got stuck: on micro-benchmark performance. So I looked into the different options here. I looked into whether we could speed up get_random_bytes; I performance-profiled it, and there is some overhead around the core in certain cases which slows it down, but most of the cost is of course the ChaCha20 permutation itself, which cannot be touched, so I did not see any secure way to speed it up. Many insecure ways, but we do not want to go there.

In principle, my point was that we are not in a case where we need crypto-level randomness, because cryptographically secure random number generators provide a number of properties we do not need. Take backtracking resistance, for example: the guarantee that if you see certain random numbers produced by the generator, you are not able to recover the previous ones. We do not care about that here, because each offset is used only once, on a syscall, and is never reused at run time, so this property makes no sense for us. Prediction resistance, on the other hand, meaning that you cannot predict future output, is important. So at some point I looked at whether we can, for example, substitute this super-old prandom in the kernel, or start using a somewhat more modern, up-to-date random number generator, like the PCG generator. It is not cryptographically secure, but it does provide better properties, not just good statistical randomness: it is also supposed to be not so easy to invert its state. Again, it is not cryptographically secure, so I am not proposing to use it for crypto, but it might have been possible to use it for this case. I made this proposal on the mailing list, and I do not think anyone was much interested in the whole idea, so I did not go that way.

Okay, so, to summarize. Suppose we even get this merged (at this point I do not think we will): are we good now, is it done? I think the important thing to remember about randomization is that it is always just a way to raise the bar for attackers. No randomization can be a panacea; you usually need to couple it with additional measures to make it robust. For example, something that would be really nice to have in the kernel is control flow integrity, but I think we are still not close to getting that usable in the upstream kernel, although it would be pretty nice. Here are the references which I promised: these are the references for all the attacks I talked about, and they are really great if you want the details. I really encourage you to go study them; it was a fun process for me, I learned a lot of things, and I am grateful to every single exploit writer on this list who taught me enormous things. I do not know if we have time for Q&A or if I am fully out. Okay, so, questions.

[Audience] This is maybe because I have a hardware background, but this feels like something where hardware in the CPU
could help, and I cannot go any further than just this feeling. Are people looking at that at all?

[Elena] There is the RDRAND instruction which I mentioned, the CPU instruction to obtain randomness; the problem, as I said, is that it is very slow. I have also been pointed, for example, at some of the performance-monitoring instructions, which are not security instructions but relate to obtaining something variable from physical hardware. But I wanted this to be generic enough, so I did not want to tie it to anything specific: we are talking about the upstream kernel here, not a particular product, so it has to work on any hardware. You would anyway need some reasonable fallback for when you do not have the instruction or you are on a different architecture. Of course, this code would only be executed in the x86 case, so in principle I could tie myself to x86 and say, okay, we use RDRAND or something else, but ideally I was hoping for something generic enough, and since you need the generic fallback anyway, that generic option is what turned out to be very difficult.

[Audience] This is a question regarding hardware support as well. If there was a register that could maintain the limits of the stack, such checking could be done cheaply by the hardware itself. Wouldn't that be another option, a boundary saying: this is the stack pointer, and this is how far it is allowed to go for this particular context?
[Elena] You are talking about stack overflow. I think that is partly done now: with the vmalloc-based stack allocation and guard pages, there is detection where, if you get a double fault, you verify that the faulting address is close to the end of the stack, and from that you understand that you overflowed the stack. That is software logic, not hardware; it is the logic Andy put in, which tries to verify this for you. But that logic is basically only invoked when you double-fault, and you have to actually arrive at the double fault: if you manage to trick things into getting past the guard page, by jumping over it or something, then what hardware logic are you going to use? So, yes.

[Audience] Thanks for the talk, Elena. I would like to ask your opinion, talking about hardware: for example, Intel CET, which uses shadow stacks and should be quite efficient, and also Arm's pointer authentication and so on. How do you see using these technologies, for example, for protecting the kernel stack?
[Elena] Again, it can be a usable technology. The problem, again, is that it is hardware-specific: you would have one Intel technology for this, and Arm would have something else, all different things. I was trying, once more, to think in terms of the upstream kernel, not a particular product, so you try to make something generic, and in that light these are too specific. But if you have a concrete product you are working on, and you are not limited by the upstream requirements, you can go and enable it, provided you are able to take whatever the performance hit from using it turns out to be. Even RDRAND, a CPU instruction, which I assumed would be fast, turned out to be horrid when I looked at the numbers. I do not know what the overhead of those technologies is, but if you are not performance-sensitive, I think you have a lot of options here, and you can make it really easy.

[Audience] I saw a tweet this morning that LLVM support for Control Flow Guard has been submitted for review, and LLVM can build the Linux kernel.

[Elena] Okay, but GCC unfortunately does not have it yet; maybe we should switch to something else.

[Audience] Was it the VLAs that had to be removed to get LLVM support? Was that related?

[Moderator] Okay, I think we are out of time in this session. Thanks, Elena.

[Elena] Thank you.