Okay, so it looks like it's time to start. Hi everyone. Since this is video recorded and I'm also presenting online, we have to start exactly on time. So hi everyone, I'm Elena, and the topic I'm going to talk about today is work we have been doing for about one and a half years now. You can see a lot of people who have contributed here; I just wanted to call out everyone who contributed even a bit to this project. I gave another talk on the same topic last year, a virtual talk at the Linux Security Summit, but that was a year ago and we were still pretty early, it was the same project, but early in establishing our methodology and fuzzing setup, and we didn't have many results yet. So now is really the time to present the advances we have made in the past year. Basically, what I want to go over is this: first, a reminder of why we are doing this project, why it is needed, what we are trying to do, and how we approach the problem. Then I'll walk you through the different aspects of the kernel which we believe will need hardening under this confidential cloud computing threat model. And I'm sure I don't have nearly the full list here, so one of the points of this talk is also to initiate a discussion, to get the Linux security community thinking about these things if you're interested in this use case and threat model, and to start discussing together: what else is missing, what else should we be looking into, and so on. And I'm going to show some of the results at the end. So let's start: why do we need to harden the Linux kernel for the confidential cloud computing guest?
Traditionally, in the legacy VM scenario you can see here, the trusted computing base of a guest of course includes everything that runs inside the guest, but it also had a full dependency on the host side. The hypervisor, and if we're talking about a Linux-based host that means KVM, QEMU, and the host user space, all used to be trusted. KVM literally had full introspection into guest memory and registers; it could get anything it wanted, and this has been the model for a decade or so. Now, when we think of confidential cloud computing and we start to have these protected VM guests, the threat model changes. The whole point of these confidential cloud computing technologies, Intel has Intel TDX, AMD has AMD SEV, and I'm sure there is more coming, is that you no longer want to trust the hypervisor. You want to make sure your guests are protected, and you have hardware technology for that. They are implemented in different ways, and it actually doesn't matter how it's done, but what it guarantees you is that the guest is protected from the hypervisor: memory protection, protection of registers, and so on and so forth. I know the Intel architecture much better, so in Intel's case there is special software called the TDX module, a software module which plays the role of a shim providing the secure interface between the protected VM guest and the untrusted hypervisor, and it makes sure the hypervisor can't really do a lot of bad things to the guest. I'm not going to go into the details of how TDX works; that is really a separate topic. But when you think of it: okay, these protection technologies give us this nice protection of the VM guest, so why would we need to think about hardening?
But unfortunately, the problem is that even if we have it all protected, and let's assume it all works absolutely perfectly, so let's not even try to think about what the problems of these technologies themselves could be, the guest still needs to talk to the host. It still needs to talk to the hypervisor for actually quite many things. We still have a lot of paravirtualized operations happening: whenever the guest needs to read an MSR, or perform MMIO or port I/O, or anything like that, in many of these cases it needs to go and request this operation to be done by the host. In the TDX-specific case this is done with one TDX-specific hypercall, but that is just an implementation detail; it doesn't matter how it's done at the high level. At the end, you start to get input this way. All of these MSR reads, CPUID reads, PCI config space reads, everything you consume there now suddenly becomes untrusted. It can be anything; it is malicious input. And the same applies for shared memory, because at least currently both AMD SEV and TDX set up shared memory between the protected guest and the untrusted host, and anything in that shared memory, which usually carries a lot of DMA, is also unprotected.
So it is untrusted; it can contain any input, because it is host- and hypervisor-controlled. And if you start looking into the kernel source code with this in mind, this is a huge attack surface. I have some numbers for 5.11; they are old numbers, but they don't actually change that much for current versions. There are a lot of kernel locations where you perform these different kinds of MMIO reads, port I/O reads, or PCI config space reads. Most of them are in drivers, which is to some degree good, because, as I'll talk about a bit later, there is a way to limit which drivers can run in the guest. But even if you drop the drivers, there is still a huge attack surface in the core: the core x86 architecture code and the core kernel code perform a lot of these operations that can now be untrusted. And when we looked into the code, the complexity of handling these untrusted inputs varies a lot. Sometimes it's just simple bit reads and masking, with really nothing to worry about, but sometimes it is very complex.
There is a lot of struct parsing going on, pointers being passed, and things like that. And the main point here is that this code in the kernel, the standard mainline code, has never been written with the idea in mind that these inputs can be malicious. Maybe with the exception of things like USB, which has seen some level of fuzzing, and networking, where network drivers have been taken into account to some degree, but not the low-level things. We never thought that, you know, an MSR read or a PCI config space read could be untrusted, and now, from the point of view of the guest, it really can be. And a single bug, if you have a buffer overflow or an out-of-bounds array index or something exploitable, can render these fancy protected-guest hardware technologies useless. Yes, you have protected all the memory and registers and everything, but you have just consumed this malicious input, gotten exploited, and now the attacker is running inside the guest with full control; they see unencrypted memory because they are inside, and everything. So that's why it is actually very important to think about this. It is really a model change, a new threat model, and we have to start thinking about it if we want to run secure Linux kernel guests in confidential cloud computing. So what we have been doing about it: we developed this approach, and of course our focus has been TDX, but it literally applies to any confidential cloud computing and to the kernel in general. We try to take an iterative approach with different steps. It's really iterative.
It's not like you do step one, step two, step three and you're done; we kept doing it in a loop. We try to minimize the amount of things we have active in a guest: we minimized the drivers, only enabling a small set of virtio drivers; we disabled all the subsystems we don't need in a virtual guest, and there are a lot of subsystems we would never use there; we minimized open ports; and so on. Whatever we can, we try to minimize. Then, for whatever has to stay enabled for the guest to run and interact, we have to go through code auditing. We have developed static analysis tools for doing this and tracking this information using Smatch, and I gave a talk last year, as I said, which goes into more detail on the methodology we have. I have pointers to both our documentation and tooling, which is all open-sourced, both for the audit and the fuzzing.

And after the code audit, or rather at the same time as the code audit, we knew we needed to do fuzzing, because code auditing, even when it's static-analyzer driven, is very error-prone: we might not find all the places, humans might miss things and make mistakes, and so on. So we actually developed an extensive fuzzing setup for reaching all these locations. Most of this code is only active during boot, so it's actually not very easy to reach. You can't use any existing fuzzing tool like syzkaller, because they are really not designed for this: syzkaller is built around the syscall path and injecting inputs that way, and here we have to reach all these code locations, like PCI config space reads and such, scattered through the kernel source code. And we don't want to add instrumentation at each of them, because that would be way too much work and very intrusive for the kernel, so we wanted a generic setup where the fuzzing input can be injected.

That's the overall thing we have been doing, and what I want to do now is go through these different aspects and how we approached them. As I said, we have this bunch of things which the guest kernel can read or obtain; we can do reads for all of these things, and the input in that case will come from the host. (The other direction we don't care about, because it's the other way around.) I'm going to go through them one by one. In principle, in our case, they are all done using the TDX-specific hypercall, but that's just a transportation method; it isn't actually doing anything itself. And if you think you could use that hypercall to do any checks, you actually cannot at that point, because you literally don't know anything about the thing you're consuming. You can see, yes, this is a port I/O read and here is the value, but you don't have the context at that point; the context only exists at the source code location that actually performs this port I/O read. So it's just a transportation method.
It doesn't matter how it's done. So, going through these one by one, starting with MSRs. For us, they are separated into groups. We have a bunch of MSRs which are trusted: they are controlled in our case by the TDX module. Some of them simply are not allowed; you can't read them in the guest because the features have been disabled. There are some which are context-switched, and they are also fine, because the value is provided to the guest natively upon the read, and you don't need to ask the host for it. But then, of course, we also have a bunch of untrusted ones. In our case, if the kernel tries to read such an MSR from any point in the code, the guest kernel gets a special event, a virtualization exception, inserted into it. We have a handler there which looks and says: oh, this was an MSR read, let's ask the host, because obviously we don't get it natively. And this is where the host will supply your MSR value, and this is where it can provide a malicious input.
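To illustrate the shape of that handler path, here is a minimal user-space sketch, not the actual kernel code (which lives in the TDX guest support in arch/x86); the names `msr_is_host_provided`, `ask_host_for_msr`, and `sanitize_msr_value`, as well as the MSR number and mask, are all hypothetical:

```c
#include <stdint.h>
#include <stdbool.h>

/* Illustration only: on an untrusted MSR read the guest takes a
 * virtualization exception, asks the host for the value, and must
 * treat the reply as attacker-controlled before returning it. */

#define MSR_HOST_PROVIDED_EXAMPLE 0x1234  /* made-up MSR number */

static bool msr_is_host_provided(uint32_t msr)
{
    /* In a real guest this classification comes from the TDX
     * module / policy, not from a hardcoded list. */
    return msr == MSR_HOST_PROVIDED_EXAMPLE;
}

static uint64_t ask_host_for_msr(uint32_t msr)
{
    /* Stand-in for the TDX-specific hypercall; here the "host"
     * returns a worst-case, fully attacker-chosen value. */
    (void)msr;
    return 0xffffffffffffffffULL;
}

/* Mask the value down to the bits this MSR is documented to have,
 * so unexpected feature bits can't be smuggled into the guest. */
static uint64_t sanitize_msr_value(uint64_t raw, uint64_t valid_mask)
{
    return raw & valid_mask;
}

/* Returns 0 and stores a sanitized value on success; -1 if this MSR
 * is not one the guest is willing to take from the host at all. */
static int ve_handle_msr_read(uint32_t msr, uint64_t valid_mask,
                              uint64_t *out)
{
    if (!msr_is_host_provided(msr))
        return -1;
    *out = sanitize_msr_value(ask_host_for_msr(msr), valid_mask);
    return 0;
}
```

The point of the sketch is the last step: even a "simple" MSR read needs an explicit masking step once the value is host-supplied, which matches the bit-masking style of handling described above.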
So MSRs are actually pretty easy. We have done the audit and the fuzzing, and because the handling of MSRs is usually so simple, it's bit masking and the like, we haven't really been seeing, actually haven't seen any issues with regard to MSR reads. We did see that you always have to be careful, because they might enable certain features which you might not want enabled in the guest, but otherwise there were no out-of-bounds indexes or anything like that for MSRs. Still, we fuzz them and treat them as untrusted. CPUID reads are very similar: again, we have trusted and untrusted cases, and again we try to disable as much as we can here. That was the first strategy, but more recently this has changed so that we don't even allow most CPUID reads to go out of the protected guest to the host. We only allow the small range at the end which is intended for software, the KVM software-controlled CPUID leaves, and that removes most of our problems with CPUID, because this range is actually very little used in the core kernel code. Port I/O is slightly more interesting.
We don't support port I/O from user space; if user space tries to do a port I/O read, it will simply not be allowed. But of course, for all the port I/O done in the kernel code, what we did is define a small allow list of ports which we allow to be read from the host, as part of this attack surface minimization. You have this example of code snippets here, showing what kind of ports we leave open for the protected guest. This port filter applies to both early and normal port I/O operations, but it is not active in the decompression code; so far we think that is a reasonable compromise, because the attack surface in that mode is not very big, and we audit and fuzz it regardless, of course taking into account that only these opened ports matter. Port I/O is also pretty simple, and we haven't had any interesting findings here, which I guess many people could attribute to luck, but this seems to just be the case. Regardless, we also have to remember that all our findings are limited to the core kernel and the set of drivers we enable; we are not going to go and analyze absolutely all the drivers, there are a lot of drivers in the kernel. So of course, if you run your Linux guest with some drivers outside the virtio set and they do some port I/O, you might need to add those ports to this filter list, but then you will also have to perform the same kind of fuzzing rounds, and you can use our methodology to do that. We haven't done it for the drivers beyond this set.
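Going back to the port filter just described, the idea can be sketched as a simple range allow list; the ranges below (serial-console style ports) are purely illustrative and are not the actual list from the patch set:

```c
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative port I/O allow list: only a few ranges the guest
 * genuinely needs (e.g. a serial console) stay open; everything
 * else is refused before the request ever reaches the host.
 * These ranges are examples, not the real filter. */
struct port_range {
    uint16_t start;
    uint16_t end;   /* inclusive */
};

static const struct port_range allowed_ports[] = {
    { 0x3f8, 0x3ff },   /* COM1 serial, as an example */
    { 0x2f8, 0x2ff },   /* COM2 serial, as an example */
};

static bool port_allowed(uint16_t port)
{
    size_t n = sizeof(allowed_ports) / sizeof(allowed_ports[0]);

    for (size_t i = 0; i < n; i++) {
        if (port >= allowed_ports[i].start &&
            port <= allowed_ports[i].end)
            return true;
    }
    return false;   /* default deny: minimize the attack surface */
}
```

The key design choice is default deny: a port I/O read is only forwarded to the untrusted host when the port is explicitly on the list, which is the same "minimize first, then audit and fuzz what's left" approach used everywhere else in this work.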
It's way too big a thing to cover. So every time I say that we haven't made any findings, it is always limited to the subset of drivers and the core x86 code we tested; it is of course not an allyesconfig with all the drivers and such. Moving on to MMIO: it's a very similar situation to port I/O. We don't enable it from user space, and we have this patch, or rather set of patches, which makes MMIO sharing opt-in, so that you have to explicitly indicate that you want to share an MMIO region with the host. I think early on the implementation was the other way around: all MMIO automatically became shared, and that is very dangerous, because you suddenly don't have any control over what kind of regions you're sharing with the host from the guest, and you don't know what your attack surface is. Now it's all opt-in, and it is wrapped nicely for PCI devices: if your PCI device is authorized, that is, the device filter is set and the device is on the allow list, then all of its MMIO mappings automatically become shared. But for these devices we already do the fuzzing and auditing, so we know we went through that attack surface and we know it's okay. It's similar here for MMIO: every time you add a new device to your allow list, you will have to do the additional hardening and basically follow the same process. For the virtio drivers and the core kernel, again, we haven't had many findings on MMIO yet. But the next one is actually the one that has been among the most problematic: the PCI config space.
I don't know how many people know it; it is pretty low-level kernel code. It is used a lot in device probing stages and the like to obtain information about device configuration. And it used to be backed by real hardware, so nobody thought of it as untrusted; but now all of this PCI config space is host-controlled, so the host can of course insert anything. We limit it by only allowing the traditional early probing through CF8; we don't allow the MCFG-based type of PCI config space access. But you can still do PCI config space accesses that are going to be host-controlled, and we have actually found quite a few issues here, even with just the drivers we have enabled and the core code. I have an example here, which is from 5.15. This is code from the virtio core, vp_modern_map_capability; it's one piece of virtio core code. You can see it performs a bunch of PCI config space reads, and this is actually a very typical scenario for a lot of PCI config space handling. All of these reads, I've highlighted just the one where it gets the bar value, are essentially host-controlled. So it can be anything up to the maximum value of that variable; it is a u8 variable. And of course, unlike cases where you can try to determine things at compile time, nothing is known at compile time here; it is fully runtime, and it only shows up if you actually start fuzzing and reach it with the fuzzer. What happens later, if you ignore this check for now, is that it goes and uses this bar value to perform different calculations for resource allocation.
And it uses it as an index into the resource array, and you can easily see that this resource array has something like six entries. So you can easily have out-of-bounds accesses right there. A simple fix, for example for 5.15, is to add a check that the bar value you got is valid: a very trivial value check before you start passing it onward. Sometimes it's not even that trivial, because the consumed value can go through some operations and reassignments, and in the end all this host input propagates until you do some indexing based on that value. This doesn't apply to 5.17, I think; there the code was changed and the check was already there. So this slowly gets fixed in mainline, but there are a lot of issues of this style, especially with PCI config space, from what we saw. We do audit and fuzz this, and this is how we found these things with our methodology, but again, this is such a huge attack surface that we are trying to minimize it as much as we can. We have a patch, I have a link here, which tries to block PCI config space access, similarly to MMIO, from devices which are not on the allow list. But there are actually a bunch of problems with this patch, and we still have to figure out how to make it work correctly; currently it causes some issues, along with some very weird side effects we are not yet sure about. It certainly works from the security point of view, it blocks the access, but the side effects are currently not very desirable. So yeah, that's the PCI config space.
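To make the bar example above concrete, here is a minimal user-space sketch of the pattern and its fix. `PCI_STD_NUM_BARS` is the kernel's real constant for the six standard BARs; the surrounding function and array are simplified stand-ins, not the actual vp_modern_map_capability code:

```c
#include <stdint.h>

#define PCI_STD_NUM_BARS 6   /* a PCI device has at most six standard BARs */

/* Simplified stand-in for the kernel's per-device resource array,
 * which has PCI_STD_NUM_BARS entries. */
static uint64_t resources[PCI_STD_NUM_BARS];

/* 'bar' comes straight from a PCI config space read, so under the
 * confidential-computing threat model it is host-controlled: any
 * value 0..255 is possible even though only 0..5 are valid. */
static int lookup_bar_resource(uint8_t bar, uint64_t *out)
{
    if (bar >= PCI_STD_NUM_BARS)
        return -1;          /* reject before indexing: no OOB access */
    *out = resources[bar];
    return 0;
}
```

Without the range check, `resources[bar]` with a host-chosen `bar` of, say, 200 is exactly the out-of-bounds access described above; the trivial bounds check is the whole fix.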
So definitely problematic and are very actively used by a lot of drivers in probing stage So the I can imagine many drivers having kind of the similar issues because again from the driver perspective from that code has been written like the Person who wrote the driver never thought but lucky not can get malicious input from PCI config space read So it's this code has never been written with that in mind and the last the last one from this The list of different kind of types of inputs you can get is it's KVM specific KVM has its own hypercodes and CPU IDs And and this are kind of they were very kind of drastic about it because they all entrusted obviously call comes from KVM So we just disabled pretty much everything you can and left only the things whichever like trivially secure or Essentially trivially secure so we Kind of see if you need it to to think about it later if it functional is actually needed We haven't needed it for now, but it's definitely one of the things also to think about if you think of securing your your guests So moving on into shared memories. 
As I said, this is another big area. The way you typically use your protected guest is that you set up shared memory, because the memory is otherwise all protected: it is private to the guest, and the host and hypervisor cannot see it. So if you want to do DMA or anything like that, you need to establish shared pages, which are essentially unprotected, unencrypted, or maybe encrypted under a different key, but accessible to the host. And a lot of DMA happens there; as I said, all the virtio traffic, at least in our case, is going to happen over these shared memory pages. We have this list of virtio drivers which we enable, you can see the list here, and maybe we'll have some more in the future. For the core and all of these drivers, we had to inspect how they handle all the data structures they keep in shared memory, because again, all of it can now be untrusted, and we fuzzed it. Virtio is complex in many ways: it has many modes, features, and queue types. From one side we tried, again, to disable as much as possible. We said, okay, we don't need the virtio PCI legacy mode, we can disable virtio MMIO, and so on; it has these different types of virtqueues and modes it supports, so we disabled everything we don't need and that is not commonly used, and then, again, we audited and fuzzed the rest.
We did this for the drivers we allow, not for all of them; there are more virtio drivers, we just don't need them yet. But then again, the catch to remember is that we do this for the core driver code, for how the driver handles things. In the end, the driver is going to obtain some payload data from the host and send it somewhere up the stack to be processed, and that payload data is untrusted too; it has to be verified separately. From the threat-model point of view it is considered application data: you can protect it in different ways if you want, like any application data payload, but we can't do anything about it; it is out of even the kernel's hands after it has been delivered.

Another aspect you have to think about: of course, everyone wants to run secure computations inside this protected guest, and we need randomness; crypto needs secure randomness. We don't want our randomness to be controllable or observable by the host. For Linux, we have the Linux RNG; this is the primary source of cryptographic randomness that a lot of user space and the kernel use. It has a number of entropy sources, but if you don't enable any hardware support, the main source is going to be just interrupts, and interrupts are host-observable now, so you can't really rely on that. It becomes kind of scary to think that the entropy inside your guest can be observed, if not controlled, by the host and is predictable to some degree. But fortunately, for /dev/urandom, which gets its output from the ChaCha20-based DRNG, there is an option implemented ages ago: through the config option RANDOM_TRUST_CPU you can say that you trust the CPU hardware random number generator, which is present on different types of CPUs; Intel has it through the RDRAND and RDSEED instructions. Then, on every iteration, every time you ask for random bits, it will inject fresh entropy from these instructions, which in our case is not a host-controlled thing. It will also do an early seeding of the ChaCha-based DRNG using RDRAND/RDSEED; again, not host-controlled. So at least by forcing these options, basically forcing trust in the hardware random number generator, we have some way of arguing that we have an independent source of entropy which is not under host control. It's not perfect, because the host can still observe the interrupts and such, but it's already better than just accepting that the host can observe all our entropy. Then we also enforce things like retry looping to make sure that we always require these instructions to succeed, and we don't fall back to sources which are untrusted. Going further, another thing is timers, or in one word, secure time. For us there is only a limited trusted clock inside the protected guest; it is based on the TSC, and it is limited.
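Returning for a moment to the RNG seeding just described, the "keep retrying, never fall back" policy can be sketched like this; `arch_rdseed` here is a deterministic stub standing in for the RDSEED instruction, which, like the real one, can fail transiently:

```c
#include <stdint.h>
#include <stdbool.h>

/* Stub for the RDSEED instruction: like the real instruction it may
 * fail transiently (returns false) and the caller must retry. Here
 * it succeeds on every third attempt, purely for illustration. */
static int stub_calls;

static bool arch_rdseed(uint64_t *out)
{
    if (++stub_calls % 3 != 0)
        return false;
    *out = 0x1122334455667788ULL;  /* pretend hardware entropy */
    return true;
}

/* Policy from the talk: loop until the hardware source succeeds
 * instead of silently falling back to host-observable entropy
 * (interrupt timing). Returns 0 on success, -1 if the hardware
 * source never delivers within the retry budget. */
static int get_trusted_seed(uint64_t *seed, int max_retries)
{
    for (int i = 0; i < max_retries; i++) {
        if (arch_rdseed(seed))
            return 0;
    }
    return -1;  /* better to fail hard than to seed from the host */
}
```

The design choice being illustrated is the error path: when the trusted hardware source is exhausted, the code refuses rather than degrading to an entropy source the host can observe.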
It's not a full clock: it just has certain properties, like being synchronized and monotonic, and there are no guarantees that it will match real time. But if you want real time, you can do your own setup inside the guest: use some time server and, with this local TSC, establish a process for obtaining the real time. What we need to make sure of is that we disable the other clock sources, which all in one way or another fall back to the host: the kvmclock, which is obviously KVM-controlled, the ACPI PM timer, HPET, and so on. So we went and disabled a bunch of these to make sure we only ever use the TSC, because this is the one thing we can reason about, where we can provide some guarantees about what the time is.

Another interesting aspect you have to think about, if you try to think of the whole picture of protecting the secure guest, is ACPI. There are a lot of ACPI tables which the virtual firmware passes to the guest kernel upon startup. In our case the virtual firmware is TDVF; we call it TDVF, but it is basically a special breed of EDK2, and it will pass all these tables. Luckily for us, these tables are part of our remote attestation, so the host can't just pass garbage tables and see what happens to your ACPI and AML interpreter and your drivers. And we do an additional thing: again, we define an allow list of tables which we are even willing to consider in the guest. The virtual firmware can try to pass as many as it likes, but this is the only list we are actually going to allow to run, just to minimize the attack surface. But even with all of that said, some of these tables are very complex, with a lot of features, and the guests probably don't even need all these features. And even if these tables are, let's say, attested, it's not clear how much of all these details the attesting party would really understand, and who is going to be reviewing these tables; you can have vulnerabilities in the interpreter and so on. So this is an area we haven't really dived into. We understand the risk there somewhat, but we haven't had the bandwidth to start, because the AML interpreter is complex, there's a lot going on there, and all these ACPI table structures are complex enough. We also need to think in the future about how to properly harden this, because I expect that not all confidential computing setups will have this arrangement where the ACPI tables are attested or trusted to some degree. Especially if they are not, and arbitrary garbage can be fed in, this becomes actually a very big problem.

Going next: you have to think about what changes in things like interrupts and panics. For us, we went with x2APIC for interrupt handling, and we have some additional limitations on what kinds of interrupts, NMIs included, are allowed to be injected. But still, the host, through KVM, has some control over what it can inject with respect to interrupt mechanisms. The bigger problem is that all the IPIs, the communication between the processors, virtual CPUs in our case since we're running virtualized, are still fully host-controlled. KVM on the host can just drop them; you can never assume they will be delivered.
They can't like you will never think that they can deliver And and and this is what is used during panic So if you want if you and maybe let's say you you you reach some security issue and you want to be understand Now you have to panic or kernel the panic is kind of complex operation It's not atomic it issues with different IP eyes to different error in our case virtual CPUs to notify that this happened and the host can just drop them It's just they would never be delivered and and where is this it would be nice to make it kind of reliable We've had some discussions in it. It's not clear how to do it Like properly and and we don't really have any kind of concrete Case we can show we found where it could be a problem So we because denial of service for us is is not this outside of threat models So we host anyway is in control of kind of you know starting and stopping this protected guess We just can refuse to start it So we don't care about denial of service, but we haven't we try to think of where is the case We can think here where the consequences would be something worse than denial of service So but this is on the fact that we have kind of we're panicked in one CPU virtual CPU And the rest has never kind of keep going and we never got this notification If we can actually end up in security scenario where this is gonna be kind of we're gonna actually have a security problem So it feels bad. So that's why we want to try to think about it but then kind of ask put people kind of if people think of they'll find a way to kind of to see how we can we can make this reliable or Provide a kind of good use case For for judging why this this should be okay or not. Okay. Like like what not, okay? a Bit in the private memory management. 
This is probably the only part that is really TDX-specific, although AMD has a somewhat similar concept. For these protected guests, the private memory pages have to be accepted before they can be used — it is a security property of the architecture. Typically a lot of the memory is accepted by the virtual firmware when it builds the initial memory layout for the guest, but not all of it, because accepting a lot of memory in advance is bad for performance. So some of it has to be accepted by the Linux kernel at runtime. And it is important not to accept a page twice, because what the hardware does on a re-accept is this: the page will be fine, but it will come back zeroed. So imagine you are running some application, and a page holds secrets — say your keys — and the host tricks your guest into re-accepting that page. Suddenly the page is wiped, and you are running your AES with zeroed keys. That is not going to be very secure, and the process using the page has no way to even detect it. So there is logic implemented in the kernel that takes care of this.

Another angle is the virtualization exception, #VE. We constantly get #VE events when something is not behaving as it should inside the guest kernel, and there are crucial code sections — the syscall entry path and such — where getting a #VE can be a security problem. So we are trying to block it there. There are a couple of cases where the host — KVM, or a malicious hypervisor component — can trick the guest into an access that generates a #VE, and it can try to time this so the guest is executing in a particular place: say the guest is entering the syscall path and is just about to do the stack switch, so it is not yet running on a proper kernel stack, and all kinds of bad things can happen. So we are also trying to fix that, to make sure we do not get these events in those sections. This one is more specific to the memory management of the hardware, but it is another thing you have to think about.

And then the last topic before going to results. Everyone always thinks about transient execution attacks — all these speculative side channels. It is a very complex topic and I am not even fully qualified to talk about it in depth, but we started to think about what it means for our guests.
As protected guests, we do have responsibility for mitigating some of these attacks. Exactly which ones depends on the type of attack, on what mitigations are done by the hardware, and in our case on what is done by the TDX module — so it is complex. But there is one group, Spectre v1, that we definitely have to protect against ourselves: no additional mitigation for it happens anywhere else. A lot of work has been done in the past to study this attack surface between user space and the kernel, to find all these gadgets and secure the kernel. Now we have exactly the same problem, just with a different attack surface: we have to find Spectre v1 gadgets not on the user-space-to-kernel path but between the VMM — the hypervisor — and the guest.

What we did is extend Smatch, the static analyzer. It has had a pattern called check_spectre for a while, which you can use to find potential Spectre v1 gadgets between user space and the kernel. It uses Smatch's internal structures to track all the inputs you can get from user space and how they propagate through the kernel — to the extent Smatch can manage, of course — and then it asks: has this index, the one being bounds-checked, been influenced by user-space input? We have now implemented the same kind of tracking for host input, so we can ask whether a variable in the kernel has ever been tainted by host input — all the MSR reads, MMIO reads and so on. This is already merged into Smatch: you can tell it to run the Spectre pattern with host-input analysis on, over the whole kernel source tree, and it will show you potential gadgets for this new attack surface.

And there are a lot of gadgets there — potential gadgets; I am not saying every one of them is an actual problem, and of course there are false positives. But we have to start looking into it. Luckily most of the findings are again in drivers, and we have disabled most of the drivers, but there are also findings in the core code, and we have just started looking into those. I keep repeating this, but it is a big attack surface that nobody has looked at before, with essentially the same set of problems. There is much less going on than with user space, of course — there is no huge user space here, just the hypervisor-guest interaction — but it is still a big attack surface with all the kinds of problems we have been working on for years on the user-space side. And it is not TDX-specific: it is a Linux problem under this threat model.

Let me see how much time I have. Very quickly, this is the overall hardening status for the code we actually have enabled, what we call the TDX guest code. Again, this is just Linux mainline; we did most of the work on 5.15, but it is all transferable.
As I said, we do rounds of auditing. We produce an automated list of findings — all the code locations where host input is consumed — then filter it with manual analysis and classify the locations with different tags: if some code looks safe and the handling easy, we mark it safe; if some code looks concerning, we mark it as a concern. But all of that is just manual audit, so then we do the actual fuzzing. Our fuzzing setup is fairly elaborate: we use the kAFL fuzzer for most of the fuzzing, and we also use the KF/x fuzzer, though in a more limited way. We do fuzzing runs over all of this attack surface. Now, fuzzers normally just give you code coverage, and the Linux source code is huge — if you just collect code coverage you get some number, say 40 percent, which tells you nothing about how many of the points you actually care about were reached, the ones we labeled safe or concern. We do not want to burn effort on reaching excluded code or code we trust; we want precise information about how much of the code of interest we reached. So we cross-reference and match the fuzzing coverage against the auditing results, and at the end we get the coverage information we need: how much of the code of our interest we reached with our fuzzers. Then we can start tuning the fuzzers and getting better at it.

I do not have time to go through all of this, just to show an example of what we end up with: for boot alone we have about 24 harnesses, and we have started to create user-mode harnesses as well, covering all the different parts of boot. Sometimes we use very broad harnesses, when not much of interest is happening in an area; for something like virtio, where a lot of host input is consumed, we want much more precise harnesses so we can fuzz it faster. That is how the harness logic is organized, and the user-mode harnesses are also quite important for us.

Some results, since I have to wrap up. This is for 5.15, just to give you an idea of what we are seeing. We have our audit tag distribution: a lot of code we can likely exclude — and this is with most drivers already dropped and non-x86 code dropped, which I am not even counting in these numbers. Even in the core x86 code that is left, plus the drivers we enable, we get this distribution of how many concerning places we found, and so on. We have the fuzzing coverage, already adjusted as described above — how many of the safe and concern code locations the fuzzers can actually reach. You can see it is not a hundred percent, but we are working on it; especially the concern items are sometimes very corner cases, where you need certain things enabled to reach a particular location. And we have found a bunch of bugs: the concern tag from the audit means we found some issue in the code, and we have found a bunch of things by fuzzing too. We are finding things quickly whenever we are able to run and process the results, simply because nobody has been looking into this before; we mostly chase KASAN warnings and things like that. We have patches, and the patches are public — some with visible line numbers against 5.15, and as I said, some of these things already got fixed in later releases.
So the exact details might not apply anymore, but it is an example of the things we have had time to look into. And for many of these fuzzing findings, you actually need to fix the problem before the fuzzer can reach any deeper, so to some degree we are still scratching the surface.

This is where I want to get to the discussion point. I showed a bunch of people on the first slide — people who have contributed in the past — but by now we are a very small team, probably about four people, and not even working on this full time. Going forward, this is a big effort if we want this attack surface secured — if we want the Linux kernel to be secure under this confidential cloud computing threat model. We really need to collaborate with the community and establish some kind of community project; I do not know exactly what to call it, but we need continuous fuzzing of this attack surface. We have tools now that work — existing tools like syzkaller were not applicable here, so we had to develop new ones, and we needed a lot of boot-time snapshotting and similar things to make fuzzing efficient. And we really need to start doing this for the mainline kernel, because code is being checked in all the time and every change could introduce these problems; people should have an easy way to find them. And there are probably many more hardening aspects that we have not even had time to think about.
I repeat: even though we have been working on this for a while, the attack surface is so big that we are still very early in being able to claim it is secured. So what I am hoping for today: in the afternoon I think we have a BoF session, and I would like to invite anyone who is interested to come discuss this at the BoF on securing confidential cloud computing guests.

I have a bunch of references here. As I said, our documentation is public: the full methodology and everything about all these different hardening aspects of the kernel is written down there, which is easier to follow than slides. All our tools are there too — it is one repo, but it pulls in many other repos for the fuzzers and other pieces. It is a nice setup and you can try it; it does not require any special hardware. We have a minimal emulation setup for our TDX case, so you can run it on essentially any machine you have — nothing super old, but nothing exotic either. And then we have the TDX guest kernel tree, where we keep the security patches that have not been sent to mainline yet.

So that is all — I think we are somewhat out of time. We have the BoF session as I said, and you can catch me during the breaks. Thank you.