Hey guys, thank you. Welcome to my talk. This is my first time here at NorthSec and I'm very proud to be here today. This morning we've had amazing talks; I learned a lot of new things, it was pretty cool.

In my talk today I have something pretty cool that I want to show you. Well, at least I hope you'll find it cool too. This is a project I've been working on, on and off, for the past year or so, and every time I work on this thing I keep learning new stuff, new tricks, new ways of doing things. As a matter of fact, I have something big that is about to come out, something that uses exactly the technology I'm describing today, but unfortunately I wasn't able to pull it off for NorthSec, because I need a couple more months of technical implementation. So if you follow me, you'll see what I mean.

All right. So, in a nutshell, I'm about to show you guys how to set up a cache timing covert channel, so that two virtual machines co-located on the same physical box can talk together. Basically, they can modulate a signal through the last-level cache, the L3 cache, using a technique called flush+reload. What I'm going to show you is a practical implementation; this is not just a proof of concept or theory stuff. It's something that works for real, and when I say it works for real, I mean it's stealthy. Say you have your backdoor installed in one of the VMs: no one is ever going to detect it, because it doesn't take much CPU resources at all.

Just one thing to note before we start: for this trick to work, your virtual machines have to be co-located on the same physical socket. They can be floating on any core, it doesn't matter, as long as they are on the same physical socket, the reason being that the L3 cache is shared across all the cores of a socket.
It doesn't span across multiple sockets. Now, if your virtual machines happen to be co-located on hyper-threaded siblings of the same core, there are other tricks you can use to modulate the signal across the two VMs. For instance, as in this picture, you can use the L2 cache, you can use the L1 cache, or you can use the CPU resources that are shared between the two hyper-threaded siblings. And actually, that last one is the first example I'm about to show you.

All right. Okay, so before we begin, the usual disclaimer: this is research done on my own time, on my own network. This talk reflects my opinion, not my employer's. The information, the code and so on are for educational purposes only.

All right, so let me introduce myself, and also give you some context on how I ended up working on this project in the first place. My name is Etienne Martineau, I'm from Ottawa, Canada, and I work as a Linux kernel engineer for Cisco Systems. When I was a kid I was really fascinated by radio and electronics, the concept of modulation and things like that. Only much later, during my electrical engineering studies, did I finally understand what was going on. After school I got a job working on the Linux kernel, hacking around and things like that, so I totally forgot about this modulation thing up until very recently, about a year and a half ago. As part of my work I was doing some low-level performance analysis on KVM, and I noticed something strange once in a while.
Basically, I saw some sort of crosstalk going on between two VMs. So I dug deeper into that problem back then, and I realized that what I was observing only manifested when the two VMs were co-located on the same physical core, but on the two different hyper-threaded siblings.

So then I did some more research and I found this nice picture from Intel. Basically it shows that when you're running with hyper-threading enabled, some operations, instead of running in parallel, have to be serialized one after the other. That's what the picture is showing, and that explained the results I had.

And that's exactly when I got the idea. I said to myself: well, what if, from VM number one, I modulate a contention pattern over the execution pipeline? Let's say a long instruction, a multiply, is a one, and a short instruction is a zero. And now, what if on the other VM I try to detect that contention over the execution pipeline, by measuring the time it takes to execute a certain instruction? If it takes long, it's a one; if it's fast, it's a zero.

That's exactly when I realized it should be possible to modulate a signal across two VMs using this contention-pattern trick, and to be honest, at that point I was hooked. I ended up spending quite a few nights playing around with this pipeline contention, the phase-locked loop, and a bunch of things that come with it. I was really curious to see the nature of the communication channel, so naturally I decided to try to transmit an image, so that I could see the resulting output on the other side, right? Obviously this was not the first image
I tried to transmit, but I repeated that exact same experiment for us here at NorthSec. So this image is a 640x480, VGA-quality, one-bit-per-pixel thing, and when I sent that image to the other VM, this is the result I got. It's all blurry, but we can still see some pattern, we can see the logo there, but you see that there's a lot of noise.

So basically, at that point way back then, I realized there was a big security problem with this whole technology, because if I can do that, it means I can probably send data, no problem, right?

And by the way, before we move on to the core of my talk, I made a video. Basically I'm repeating this experiment, but instead of sending just one picture, I'm streaming them in real time at 15 frames per second, and then we see the effect of what's going on on the system. So let's take a look at the video.

All right, just before we begin: on the bottom left I have what I call my noise generator, okay? It's a compilation of the Linux kernel on all the processors. This triggers lots of noise; lots and lots of crap is going on in the execution pipeline, and we will see the effect on the stream. The other thing I want to mention is that I'm running the MP4 encoding software, in order to render this video, on the same machine where I'm doing the experiment. That thing is causing lots of noise on its own; if you don't run the encoding software, it's really much cleaner than this. All right, let's take a look.

So the stream is going on in real time right now, 15 frames per second, and now I'm starting and stopping the compilation of the Linux kernel. You see that? So remember, this streaming technology is using pipeline contention.
It has nothing to do with the cache; it's just pipeline contention happening between two hyper-threaded siblings. All right, so let's go back to the talk. Thank you.

Okay, so now you understand where I'm coming from with this whole thing and design, and it's time for us to take a step back. My goal was to come up with a practical implementation, not just some research stuff. Why? Because I wanted to prove that this is a real issue, so that we have to fix it. And by the way, I'm using virtual machines here, and it works on ESXi and so on, but with containers it's even easier. VMs are the hardest case; with containers it's super easy to do.

So in my talk today, I'm going to show you all the challenges involved in the design of such a communication channel. We will go over the x86 shared resources: I'm going to show you what they are. We will go over the fundamental concepts behind the cache line encoding and decoding. We have to get around the hardware prefetching logic, and no, I'm not disabling it from the BIOS. Then there is something new that I found recently: basically, there is a pretty nice way to abuse the CLFLUSH instruction to create a bidirectional handshake for free. Then we will look at the persistency of data, and the noise that is present on the host, in the guest, in the VM and so on. Then we will also take a look at the guest-to-host page table obfuscation; it sounds complex, but I'll show you what it is, it's not rocket science. At the very end, I'll show you my implementation of the phase-locked loop and the high-precision timer. This is the key for this whole thing to work.
We will go over some detection and mitigation, and I've got a video at the end of the talk that basically demonstrates a reverse shell, you'll see.

All right, let's take a look at those shared resources and how they are isolated. With hyper-threading enabled, like I said at the very beginning, there are lots of possibilities for modulation: you can do pipeline contention, which is the first example I showed you, you can do modulation in L1, modulation in L2. But hyper-threading is typically disabled in production environments; at the bottom here there is a link that discussed that, way back in 2005. For that reason, you can use the last-level cache, the L3 cache, to do your modulation.

So now the problem is, of course, that if your VMs are assigned to different sockets on a large system, this whole L3 cache modulation technique isn't going to work, because the L3 cache is not shared across multiple sockets. But then there are the buses that connect all the sockets together in a certain way, and there is a cache coherency module in there, so you can actually do modulation using the cache coherency module. I've done it.
It works, but that's not my talk today; I can take questions offline after the talk if you want.

All right, so now it's time to do a deep dive on the way we encode data inside the cache. A cache line is typically 64 bytes, okay? And the way we do it here, one bit corresponds to one cache line. When you read a byte that is not in the cache, it's the whole cache line that is brought in from memory, all the way up to the first-level cache, so that you can read it and get access to it.

The basis of cache timing modulation relies on the fact that when we read memory, we can measure very accurately, using the TSC counter, the time it takes to read it. If it's in L1, it's going to be pretty quick; L2, a bit slower; L3, a bit slower; and main RAM, way slower.

So now the trick to do the encoding, again assuming one bit is one cache line: you take a specific cache line and you either load it or flush it. If you load it, we assume it's a one; if you flush it, we assume it's a zero, right? And now for the decoding part, what you have to do is access the memory that cache line belongs to, and you read it. If it comes back real fast, it means it's a one, because it was already loaded; if it's slow, then it's a zero.

Okay, so I guess it sounds simple, but let's dive into the core of it. The first thing I have here is a very simple client and server test program. There is no virtual machine in the picture at this point, okay? It's bare metal, running directly on the host, and the cache lines I'm using come directly from shared memory, so there is no address space to worry about: shared memory cache lines. The client encodes a pattern.
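The load-or-flush encoding and the timed decode just described can be sketched like this. This is my own minimal distillation for x86-64, not the speaker's actual tool; names like `probe_cycles` are mine, and a real implementation would calibrate the threshold at startup.

```c
#include <stdint.h>
#include <emmintrin.h>   /* _mm_clflush, _mm_lfence, _mm_mfence */
#include <x86intrin.h>   /* __rdtsc */

/* Time, in TSC cycles, one load from the given address. */
static inline uint64_t probe_cycles(const uint8_t *addr)
{
    _mm_mfence();                          /* order against earlier accesses */
    uint64_t t0 = __rdtsc();
    (void)*(volatile const uint8_t *)addr; /* the timed load */
    _mm_lfence();                          /* make sure the load retired */
    return __rdtsc() - t0;
}

/* TX side: load the line to send a 1, flush it to send a 0. */
static inline void encode_bit(uint8_t *line, int bit)
{
    if (bit)
        (void)*(volatile uint8_t *)line;   /* bring the cache line in */
    else
        _mm_clflush(line);                 /* evict it back to memory */
    _mm_mfence();
}

/* RX side: a fast read means the line was resident, i.e. a 1. */
static inline int decode_bit(uint64_t cycles, uint64_t threshold)
{
    return cycles < threshold;
}
```

In practice the threshold sits between the L3 hit time and the main-memory access time, which is why the latency classes mentioned above (L1, L2, L3, RAM) matter so much.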
That's the pattern you see at the bottom here. Once the encoding is done, the client just signals on the mutex, the server wakes up and does the decoding. And here, the decoding is all messed up. There is a clear pattern in it, but there is something weird going on. Okay, so I had to take a step back, and I wrote a simple test program that flushes a bunch of cache lines, let's say from zero to 100, reads them back, and measures the latency. I'm expecting a long latency for all of the cache lines. Now you see the result: some of those cache lines do show a long latency, 200 to 240-plus cycles is a long latency, but lots of the other cache lines in the picture are already loaded. So there was something else going on in the background.

This is really when I learned about prefetching. Prefetching means bringing data or instructions from memory into the cache before they are needed. The system I was using had some fancy algorithms, different ones for L1 and L2, but at the end of the day, prefetching means those algorithms are trying to predict in advance which addresses will be needed in the future. Ahead of the actual operations, the processor just fetches those addresses, so that it speeds up execution, hopefully, if the algorithms are good. So I just took a picture here.

By the way, the hardware prefetcher is one of those things you can normally disable at the BIOS level. Obviously this is not what I'm doing here, because with VMs we don't have access to the BIOS, and that would be cheating, right? So we have to find a way to work around it. I experimented with a couple of things, and the trick that did it was essentially to randomize the cache line accesses.
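That randomized sweep might look something like this. It's a sketch under my own naming: shuffle both the page visit order and the line order within each page, so the stride predictors never see a sequential pattern.

```c
#include <stdint.h>
#include <stdlib.h>

#define PAGE_SIZE      4096
#define LINE_SIZE      64
#define LINES_PER_PAGE (PAGE_SIZE / LINE_SIZE)   /* 64 lines per page */

/* Fisher-Yates shuffle of an index array. */
static void shuffle(unsigned *idx, unsigned n)
{
    for (unsigned i = n - 1; i > 0; i--) {
        unsigned j = (unsigned)rand() % (i + 1);
        unsigned tmp = idx[i]; idx[i] = idx[j]; idx[j] = tmp;
    }
}

/* Fill out[] with the byte offset of every cache line in `npages`
 * pages, in an order that is random at both the page level and the
 * line-within-page level. out[] must hold npages * LINES_PER_PAGE
 * entries. */
static void build_sweep_order(uint64_t *out, unsigned npages)
{
    unsigned *pg = malloc(npages * sizeof *pg);
    unsigned ln[LINES_PER_PAGE];
    for (unsigned i = 0; i < npages; i++) pg[i] = i;
    shuffle(pg, npages);                      /* random page order */
    uint64_t k = 0;
    for (unsigned p = 0; p < npages; p++) {
        for (unsigned i = 0; i < LINES_PER_PAGE; i++) ln[i] = i;
        shuffle(ln, LINES_PER_PAGE);          /* fresh line order per page */
        for (unsigned i = 0; i < LINES_PER_PAGE; i++)
            out[k++] = (uint64_t)pg[p] * PAGE_SIZE
                     + (uint64_t)ln[i] * LINE_SIZE;
    }
    free(pg);
}
```

The encoder and decoder then both walk their cache lines in an order like this instead of sequentially, which is enough to starve the stride-based prefetch algorithms of any usable pattern.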
So I had to do the randomization within a page, and I also had to do the randomization at the page level. When I did that, it just confused the hardware prefetcher entirely, and at that point I got a clean picture.

So now there is another problem I faced, okay? What happens if you wait a little bit longer before doing the decoding? Let's say you wait, and you wait, and you wait. Well, we clearly see that the time from when you encode the data to when you decode it has to be very small, otherwise the other stuff running on the system will pollute the cache and erase your data entirely. In other words, the encoded data that you put in the cache evaporates pretty quickly. This is even more important for us when we are running in VMs, because with VMs there is lots of noise. And believe me, I'm about to show you.

So here I've done a couple of experiments. I'm using a calibrated software loop that takes exactly two CPU cycles per iteration, and my test consists of running that loop 100,000 times, so I'm expecting 200,000 cycles to run it. And I'm repeating that test one thousand times. Okay, so now, if you are on bare metal, inside the kernel, with all interrupts disabled, this loop is going to take 200,000 cycles over and over and over again, because there is nothing that disturbs it; there is no one else running.
Okay. So now, if you are in user space, just above the kernel, directly on the host, there is some noise going on, and that's expected. The small spikes you see there are the timer interrupts on a per-CPU basis, and the bigger spikes are the network interrupts coming in on CPU zero. Now, if we go into the kernel that is running inside the VM, and in that kernel we disable all the interrupts (remember, we can't disable the interrupts on the host, only inside the VM), well, there is slightly more noise, because there is a hypervisor layer that is sucking out cycles here and there. And finally, if we are in user space inside the VM, you see that there is quite a bit of noise, because it comes from everywhere: from the guest kernel, the kernel running in the VM, and from the kernel on the host, with all the interrupts. The guest kernel has its own set of interrupts, the host has its own set, and it all comes together, plus the hypervisor and everything. By the way, if you do the math here it looks pretty bad, but it boils down to a two percent degradation for a compute load. So it's not too bad, but still, there is lots of noise.
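The kind of measurement those noise histograms come from can be sketched in a few lines. This is my own reconstruction, x86 only: time many runs of a fixed busy loop with RDTSC and look at how the samples spread out above the minimum.

```c
#include <stdint.h>
#include <x86intrin.h>   /* __rdtsc */

#define RUNS  1000       /* repeat the test a thousand times */
#define ITERS 100000     /* iterations of the calibrated loop */

/* Time one execution of the calibrated busy loop, in TSC cycles. */
static uint64_t timed_loop(void)
{
    uint64_t t0 = __rdtsc();
    for (volatile uint64_t i = 0; i < ITERS; i++)
        ;                /* nothing: the loop itself is the payload */
    return __rdtsc() - t0;
}

/* Fill samples[RUNS] and return the minimum observed time, the
 * undisturbed baseline. Everything above the minimum is interrupts,
 * the hypervisor, other guests: the "noise" in the histograms. */
static uint64_t sample_noise(uint64_t *samples)
{
    uint64_t min = UINT64_MAX;
    for (int r = 0; r < RUNS; r++) {
        samples[r] = timed_loop();
        if (samples[r] < min)
            min = samples[r];
    }
    return min;
}
```

Run in the four environments described above (bare-metal kernel, host user space, guest kernel, guest user space), the distribution of `samples[]` around the minimum widens at each step.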
You know what it is, and you know what the hardware prefetcher is; so let's go back to our original test program, but this time we put the client and the server in different VMs. And then I realized there is another problem. The cache lines we were using before to do modulation, in L2 or in L3, are tagged by the physical address. But in the VM, the physical address that you see has nothing to do with the host; you cannot see what's going on in the host. There is another layer of translation, and as this picture shows, you don't have access to that information, so you don't know what is what from a physical point of view. Trying to decipher the page tables from inside a VM is a complex problem to solve. I don't think it's impossible; I've got a couple of implementations that almost got it, but that's not the talk today, because with KSM we don't even have to worry about this problem. That's the beauty of it.

So what is KSM? KSM is a feature in the Linux kernel, and it's used with KVM: Kernel Same-page Merging. Basically, KSM enables the kernel on the host to scan running programs and compare their memory. If there are identical pages in different programs, KSM detects that condition and merges them into just one. Of course, if a program down the road wants to modify one of those shared pages, KSM will kick in and do the un-merging. This feature is pretty useful with virtual machines, because the guest operating system image can be shared across other guest operating system images, so you end up saving lots of memory.

All right, now going back to our test program. Remember, I now have to find shared cache lines across two different VMs. So I'm thinking about KSM.
So what I decided to do here is that the client and the server each create, in their own memory, a per-page unique pattern that is the same across the client and the server. The idea is that after some time (KSM takes some time before it kicks in to do the merging), KSM will detect those pages, see that they are the same, and do the page deduplication for us. At the end of it, those pages are going to point to the same physical address on the host, and so the cache lines are shared.

One more thing before I go to the next slide, a side comment: with KSM you can do pretty cool stuff, such as identifying the operating system or the applications that are running beside you, in the other VM. All you need to do, really, is load into your own memory the image of what you think is running beside you, wait a bit, and after that you write to that memory and measure the time it takes for the write to go through. If it's slow, if it takes time, it means that you got KSM involved on the host, and then you have a match for your identification.

Okay, so the other problem I realized with this scheme is that there is no synchronization primitive that spans across different VMs, right? Well, in reality there are mechanisms out there; ivshmem is just one example, sorry, but it's not enabled in production environments.
With ivshmem you can signal a semaphore across VMs and so on. So we basically need something to replace the mutex that I was using to signal between the server and the client. Why? Because we want the server to run right after the client: if we wait too long, the data that we put in the cache is going to be completely gone. Ideally we want the client to run and the server to run right after; if we wait too long, all the data is gone.

Okay, so there are a couple of options that I thought through. One of them is basically: I'm going to forget all about the synchronization aspect, because it looks pretty complex to solve, and with error correction cranked to the roof I can probably achieve some transmission. The idea is that the client and the server are floating around, each in their own time space, and at some point in time they will kind of overlap and there will be some transmission. This gives a very low bit rate, obviously, but the CPU consumption is also low. That's good, because that's what we want: we don't want to be detected by anyone monitoring processes on the system, so we don't want to burn CPU.

Option two is basically: I'm just going to have a while(1) loop on the client side, a while(1) loop on the server side, and I'm just going to adjust the client to run slightly faster than the server, so that at some point in time those guys will kind of overlap. When they do, it provides an okay bit rate. The problem with this implementation is the super high CPU utilization: anybody monitoring the system will detect that something wrong is going on, right? Ideally we would like to stay below one percent of CPU usage. And I'd say that one percent is still high; with the new technique I'm using it's more like 0.1 percent.
I'm using it's like more like point one percent Okay, option number three is really the the option that I'm using so this one it basically implies that client and server they agree on on a common period okay tea and We we set up we set it up so that the client and the server they lock into phase using a phase lock loop algorithm So the idea is you have a server. It's in your backdoor VM It's running and it send a sync pattern at regular interval that you have predefined and then This sync pattern is really like what you find in vertical TV like vertical sync for those of you who have seen that before And now I'm just gonna make a deep dive on to that sync pattern because that's news new stuff. I found lately so Remember we want things to be undetected so our server that is sending that sync pattern We don't want that guy to suck CPU right? We want our sync pattern to be pretty efficient Ideally point one percent will be the best thing But how the client will detect that sync pattern in that whole noise and stuff like that is a challenge So this is where I bring the magic of CL flush I'm implementing a bidirectional handshake with this one CL flush Instruction is a it's an exit the six instruction It this thing is normally used to flush a specific cash line So that the other guy that is running on the other side can measure if that cash line is loaded or not So here the idea I had is I'm using CL flush In a different fashion instead. I'm measuring the time it takes To run the CL flush instruction and then if you look if you think about the microcode This is just pure speculation here, but I mean it it works with that with the model The microcode of the CL flush instruction probably the first thing it does it basically. Oh, is this page valid? Yes, or no. Yes, it is valid. All right. Next thing is this cash line loaded or not. Yes, or no Well, no return right away. That's the fast path. Yes Run the eviction algorithm. That's the slow path. 
Okay, so there is a timing difference. The trick here for the synchronization I'm doing is basically, on the server side, I run CLFLUSH, CLFLUSH, CLFLUSH, CLFLUSH, four times, let's say. Every time I run that instruction, I expect its execution time to be small, the fast path, right? And on the RX side, what I do is basically: load this cache line, load this cache line, load this cache line, and load this cache line. I expect those to be fast too. The only condition where this doesn't hold true is when the two guys are perfectly in sync, one on top of the other. And there is no way that the noise on the system can generate such a pattern, because the TX side is basically running four CLFLUSH instructions in a row, spaced, let's say, 200 cycles apart. There is no way that some other program out there accesses that same cache line and flushes it and loads it four times in a row; it's almost impossible. Only the RX side can do that. So that's why it's a pretty efficient sync pattern.

Okay, so like I was saying, now we have the sync pattern, and it's pretty efficient. So what's going on is that the client side, which is the machine that we control, the VM that belongs to us, holding the backdoor on the other side, is doing a sweep scan, basically looking at the noise, seeking for this sync pattern, over and over again. I don't care how much CPU that guy burns: it's my VM, I control it, whatever, okay? Once the client finds the sync pattern the server is using, it basically locks on the phase, and this is where we can do our transmission.

But the big problem I had, and that's the key aspect of my design, is that for this whole thing to work, we have to have a monotonic pulse.
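Either side's half of that handshake boils down to a tiny detector: flag the moment when four consecutive timed probes land on the slow path. This is my own distillation of the logic just described, shown against synthetic latency samples.

```c
#include <stdint.h>

#define SYNC_RUN 4   /* four CLFLUSHes (TX) / four loads (RX) in a row */

/* Scan a stream of probe latencies. Return the index where a run of
 * SYNC_RUN consecutive slow samples completes, meaning the peer is
 * hitting our cache line in lockstep, or -1 if no sync was seen. */
static int find_sync(const uint64_t *lat, int n, uint64_t slow_threshold)
{
    int run = 0;
    for (int i = 0; i < n; i++) {
        run = (lat[i] >= slow_threshold) ? run + 1 : 0;
        if (run == SYNC_RUN)
            return i;          /* locked: start phase tracking here */
    }
    return -1;
}
```

Isolated slow samples from ordinary system noise reset the run counter, which is exactly why a run of four in a row is such a strong signal.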
Okay, what that means is that the server and the client have to stay in phase. If you are using a timer, you have to come back at the same time, always, in order to keep the sync. You can tolerate some jitter, but not too much, because remember, all the data that we put in the cache goes away very quickly.

So the problem is: how can we achieve a monotonic pulse with Linux, say, or Windows? The first thing that comes to mind is to use timers. Timers are good because we need to sleep, right? We can define a period such that over that period of time we burn just one percent of CPU. But there is a big problem with timers: jitter. There is lots of jitter. This is a frequency distribution graph, log-log scale, and as you can see, with timers there is lots of jitter, on the order of 100 microseconds, for example. Actually, to be fair, this is pretty good data; I'm running this test in a VM, with Linux running in the guest and on the host and so on, so I'm really amazed how small it is. But for the stuff we are trying to do here, that's way too much. And on top of this, both client and server are subject to that same jitter, so combined it's going to be way too much; they're going to go all over the place.

So the idea I had is to compensate those timers in software: undershoot the period by some value that goes above what I define as the maximum jitter, and make up the difference precisely. In theory this should give us a nice monotonic signal. There is one catch: you have to be very careful with this, because the compensation, whatever type of compensation you do, is itself subject to noise. In other words, the more time you spend trying to immunize yourself against noise, the more noise you end up accumulating. And obviously the compensation on its own burns CPU, but that's fine.
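A minimal sketch of that compensated timer, assuming an invariant TSC on x86 (the structure and names are mine, not the talk's source code): sleep for most of the period, then spin on RDTSC up to an absolute deadline, so OS timer jitter never leaks into the phase.

```c
#include <stdint.h>
#include <time.h>
#include <x86intrin.h>   /* __rdtsc, _mm_pause */

/* Spin until the TSC reaches `deadline`. */
static void spin_until(uint64_t deadline)
{
    while (__rdtsc() < deadline)
        _mm_pause();           /* be polite to the sibling thread */
}

/* Emit n pulses, `period` TSC cycles apart. `coarse_ns` is how long
 * we dare to sleep before the spin takes over; it must undershoot
 * the period by more than the worst-case timer jitter. Absolute
 * deadlines mean jitter never accumulates into drift. */
static void pulse_train(uint64_t period, long coarse_ns, int n,
                        void (*pulse)(void))
{
    uint64_t deadline = __rdtsc() + period;
    struct timespec ts = { 0, coarse_ns };
    for (int i = 0; i < n; i++) {
        nanosleep(&ts, NULL);  /* cheap, jittery part */
        spin_until(deadline);  /* precise part */
        pulse();
        deadline += period;    /* next absolute deadline */
    }
}
```

The ratio of the coarse sleep to the period is what keeps CPU usage down: the longer you dare to sleep, the less you spin, at the cost of needing a bigger jitter margin.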
All we have to do is stretch the timer period to something bigger, and then we can compensate within it. It's a tricky problem; in the end I got it right. So in short, the compensation algorithm uses a calibrated software loop that is kept in check, at every single point in time, against the TSC.

This is the result I got. That's on my 2.4 GHz machine: on an idle system, the jitter I have on my timers is on the order of 50 cycles; on a loaded system it goes to 300 cycles. Compare that with the stock implementation, which was giving 240,000 cycles, best case 24,000 cycles; that was way out of bounds. This 300 cycles, 50 cycles, is the type of thing we need to be able to transmit data over the cache. I'm putting the data in perspective against the original latency graph here: as you can see, it doesn't even show up on the scale. And as you may already understand, this synchronization aspect is really the key behind this design. It enables communication to happen with very low noise, and at the same time it consumes just a fraction of the CPU.

Okay, so now I'm going to recap what we have so far. We have an encoding and a decoding scheme based on memory access time: one is fast, zero is slow. We managed to get rid of the hardware prefetching logic without fooling around with the BIOS. We found some physical cache lines that are shared across VMs, thanks to KSM. Now, I'm sure some people here are asking themselves: well, what if KSM is not enabled, and so on? No worries. The cache has an associativity level.
If we don't use KSM, we can still rely on the cache associativity to do the encoding. It takes a bit more CPU to do, but really, from the results I have, it doesn't make a whole lot of difference. We have a very efficient way to synchronize the client and the server, and we have a phase-locked loop that maintains the phase, with inter-VM synchronization less than 120 nanoseconds apart. Okay. All right, now it's time for a demo.

Okay, so let me just look at this one for a second. This is the original experiment, the one I did at the very beginning using pipeline contention. You remember all the noise that was coming out on the other side; it was pretty ugly. Now, with this last-level cache modulation, I can achieve a much better result. You can obviously see that there is still noise on the channel, but this is expected: look, I'm not doing any error correction here whatsoever. Still, the noise level is pretty low, I would say.

Then, what I did in this video is I repeated the streaming experiment, but this time with the cache technique. Okay. So the first thing I want to mention, and it's kind of cool, is that when the transmitter (the transmitter is on the left-hand side) is not running, the receiver is picking up whatever is running on the system; it's basically picking up the noise of the underlying system, and to me this is pretty cool. So right now, in this picture, the right-hand side looks crappy and noisy, but that's because of the MP4 encoding software that is running; that guy is generating lots of noise, and if you take that out it gets much cleaner.
I should have taken a video from my phone: you see that the pattern coming out of this noise is pretty clear. And as a matter of fact, if you do a frequency analysis of that noise, it basically gives you a spectral signature of the underlying operating system that is running at the host level. So, assuming you have a database to compare against, you can find out who is running on the host with a very, very high level of accuracy. This is the type of stuff I'm working on these days, and I wanted to pull it off for today, but unfortunately I haven't had the chance.

All right, so let's look at the video. One last thing: there is no compression, no error correction; it's raw data that is going over. So again, I've got that noise generator running in the background, the compilation of the Linux kernel in the bottom-left window. Let's take a look.

All right, so you see the noise; that's the video encoding software generating so much of it. Now, when I start the Linux kernel compilation, you see it becomes white, because there is so much noise in the cache: all of the cache is being evicted all the time. I mean, lots of file accesses, lots of CPU operations; it's obvious. And now, if I start the transmitter (the transmitter is on the left-hand side), you see what is being received on the other side. You see that there is noise, and again, this noise is triggered by the MP4 encoding software; but if you look when I start the compilation of the Linux kernel, you see lots of background noise going on. All right, so let's go back to the slide deck. Thank you.
All right. If you do the math on the bandwidth here: this video was transmitted at VGA quality, one bit per pixel, interlaced four times, at 15 full frames per second. That boils down to roughly 4.5 megabits per second. Both sides are using 50% of a CPU, so I could crank it up and essentially double that.

Now it's time to look at something a bit more useful than streaming data out. This is the reverse shell example I was describing to you. The two windows at the top are the two different VMs, obviously. The first thing I'm doing in this experiment is running the server in loopback mode. What does that mean? It means the server doesn't process commands: it just dumps whatever the client is sending. The client happens to be sending a single test symbol, so you'll see what's going on here. Okay, I'm launching.

So the server is running in loopback mode, and the client is also running in loopback mode. The client searches for the sync, then there is a lock, and the client starts sending. Now the server receives the data, but you can see there is garbage mixed into what it receives. That's expected: a couple of bits get flipped, and the compilation of the Linux kernel triggers even more noise. Again, that's expected.

So now I'm repeating the same experiment, but my program has what we call forward error correction, ECC. I'm turning on ECC with -F, and with ECC things should be corrected, right? The server is running on the left-hand side; I'm going to start the client with forward error correction enabled, and we should see that things come out clean. Yeah, obviously things are clean, because errors are auto-corrected by the forward error correction, which is Reed-Solomon. Obviously, if there is more noise than what
the error correction can handle, the program displays '***', because it cannot auto-correct that many bits. For parity, I think I'm actually just using 16 bits: 16 bits of ECC for a 240-bit payload. In theory, I could crank that through the roof, and that would make it more resilient to the noise.

Now I'm removing the loopback, and I'm going to run the server in reverse shell mode, so that the client can execute commands on the other side. While this happens, the server does not burn CPU. I'm running commands on the right-hand side, and the reverse shell on the other side is responding to me. I'm looking at the processes that are running on the other VM right now: that's the timing channel running on the other VM; it's running the reverse shell. Now I'm going to dump a file: that's part of the source code for this timing channel, one of the files. So yeah, basically, that's the reverse shell in action. Now I'm doing 'exit', and the exit actually happens on the server side, obviously. All right, that was the reverse shell example. Let's go back to the slides.

Okay, so now the question is: what can we do to avoid this type of stuff? First thing: disable KSM. Actually, newer implementations of the kernel don't do page dedup across VMs; it's a per-VM policy essentially. So you get rid of the inter-VM shared pages. Good stuff: flush and reload won't work, not for free at least, and you won't be able to do OS and application fingerprinting. But you will pay for it in memory cost. Now, my answer to that defense: if someone does that, well, okay, forget KSM; we will work at the cache set level instead, so I don't need page dedup.
Working at the set level is a bit more expensive: you have to run a couple of algorithms to find your eviction sets, merge the sets, and so on. But then you typically have 16 sets to work with instead of just one.

Another thing that needs to be done, in my opinion, is about CLFLUSH: it leaks all over the place. Measuring the time it takes to execute the CLFLUSH instruction and deriving data out of that is pretty bad. It's pretty efficient for me, but it's pretty bad for the system, and for fingerprinting it's no good.

For a very secure system, you should think about your co-location policy. I was talking to someone at lunchtime about that: you have containers and VMs, and you're co-located with other vendors, other folks you don't know anything about. So if you really want a secure environment, think twice about that; I suggest you basically get the whole box to yourself.

There are a couple of things we can do from the detection point of view. One of them: there are hardware counters on the chip, so you can do pattern and noise analysis. I've got a prototype; it's okay, but it eats CPU on the host. Another one: for this to work, those processes on the host are always running at exactly the same time, so in theory we can set up heuristics in the kernel to detect that kind of scheduling pattern. Abnormal TSC usage is another one: my implementation uses lots of RDTSC to set up the timing and so on, so if there is abnormal TSC usage, red flag.

All right, that's all I had, guys. Thank you very much for attending.