Alright, so once again, thank you all for coming out. Let's give Anish a big round of applause and welcome him to the stage. Thanks for the introduction, Jurist. Today I'm going to talk to you about how we can build more secure systems by applying the age-old wisdom of turning it off and on again, and how we applied this to the design of a hardware cryptocurrency wallet. Quick show of hands: how many of you own one of these security devices, these security dongles, things like YubiKeys or hardware cryptocurrency wallets? Ah, a good number of you. Look to your left, look to your right: those who didn't raise their hands are probably going to get hacked. So we believe in the value of these external devices to augment our PC in order to increase security, and it's not like all these devices are created equal; they're actually getting better over time. Some are maybe not the best idea, like SMS-based two-factor auth is kind of a mess, and I don't know about you, but at least for me it doesn't really work at DEF CON, because my phone's been on airplane mode since Thursday. But other devices do a pretty good job of preventing certain kinds of attacks. For example, U2F tokens have more or less solved phishing, and that's pretty cool. But this is not just devices becoming better and better, decreasing the size of the TCB in a straightforward way; it's not just a continuous improvement of two-factor authentication devices. There's something a little bit more interesting going on here. I think there's been a paradigm shift. On one hand, we have two-factor auth devices, which are all about doing more secure login on your PC. This protects against things like stolen passwords or phishing attacks. But on the other hand, we have this new type of device, a hardware cryptocurrency wallet, or in a more general sense, an external transaction approval device. And this device removes the PC from the trusted computing base entirely.
There are scenarios where two-factor auth won't save you but transaction authorization will, namely if your PC is compromised. And these may seem like they're solving different problems, because two-factor auth is traditionally associated with websites, whereas hardware wallets are obviously associated with cryptocurrencies. But this idea of factoring out the most sensitive approval decision onto a separate secure device, well, this idea is universal, and it could be used by websites too. And so this is a way to improve the security of a large class of applications. Now you might ask, can we just make the PC secure instead? Well, the PC is kind of a disaster; everything's terrible. It's just this tower of complex technologies where every one of them has major security flaws. And while we have mitigations and workarounds for a lot of these, it's just this cat-and-mouse game: new attacks come out, then developers patch those exploits, and then the cycle repeats. And we have no strong security in any principled sense. We have applications that are millions of lines of JavaScript code, using who knows what dependencies from NPM, probably running inside an Electron container. And this is running on top of a complex operating system that's at least tens of millions of lines of code, probably filled with bugs. And this operating system is running on a CPU that's a leaky abstraction for the abstract machine it's implementing. And so our CPUs are susceptible to all sorts of things, including these microarchitectural bugs that have been getting a lot of attention recently, things like Spectre and Meltdown and Foreshadow and ZombieLoad. And then on top of that, we have hardware that's also a leaky abstraction, because physics is kind of weird. And so we have attacks like Rowhammer and RAMBleed, where you can do things like repeatedly write to a row in DRAM and cause bit flips in adjacent regions, and this breaks isolation.
And so everything's a disaster; it's too hard to build secure applications on the modern PC. But if we can't make the PC secure, maybe that's okay. Maybe we can use this transaction approval paradigm instead, where critical approval decisions in applications are factored out and go through a separate secure device, and we can use this as a way to bootstrap security. And these hardware devices, as opposed to our PCs, just might be simple enough that we can get them right, and have strong confidence that we've gotten them right. Maybe with these devices it's possible to move beyond "it just hasn't been hacked yet" and towards having a more principled reason for believing that such a device is secure. And so for those of you who are not super familiar with these: how do these devices remove the PC from the trusted computing base? Well, it's really natural for applications like cryptocurrencies, because they're already structured to require a private-key signature in order to do the sensitive operation. And so when using a hardware cryptocurrency wallet, the private key never leaves the hardware device. The way it works is you pair it with an untrusted PC, and if you want to do anything like send a transaction, you craft that transaction on the PC. You can say, like, I want to send 0.1 Bitcoin to address XYZ, but you can't actually sign and broadcast the transaction, because the PC does not have the private key. Instead, all you can do is send the transaction from the computer to this hardware device, which can then parse the transaction, display it to the human in a human-readable way, like "0.1 Bitcoin to address XYZ", and then only if the human confirms the transaction on the device by pushing the button does the device sign it with the private key and send it back to the computer so it can be broadcast. And so if something goes wrong here, like if the PC is compromised, it might do something evil, right?
It might do something like swap out the recipient address with the attacker's, or it might change the amount that's being sent. And if there's malware running on the PC, it could do all sorts of wacky stuff; it could even fabricate the contents of the screen, so it could display, like, oh, you're sending a Bitcoin to your friend, when in fact what the computer is actually doing is sending all your money to the attacker. But even if the PC is compromised, the hardware wallet user is safe, because even if the PC's screen lies to the user, the device will show the actual recipient address. And so we can think of these devices as providing a secure I/O path to the user, where the PC was previously failing us. And it's good to remove the PC from the trusted computing base, because things indeed go wrong. There are tons of attacks I could talk about where people steal cryptocurrency. Here's one that I think is kind of clever. In December 2018, there was a phishing attack on users of the Electrum Bitcoin wallet. The way the Electrum Bitcoin wallet works is that it's an SPV wallet, and that's like most other Bitcoin wallets people use today. These wallets don't download the full blockchain onto your computer, because that would be kind of impractical, like having to download hundreds of gigabytes. Instead, what these wallets do is rely on untrusted servers in order to do payment verification. So you rely on this untrusted server, and it can prove to you that a particular payment is on the Bitcoin blockchain, without you having to download the blockchain, and also without you having to trust the server. And this is pretty cool. In Electrum's implementation, these untrusted servers could return error messages to clients. And that's also fine. But in Electrum versions before 3.3.3, these server error messages could be arbitrary text. And not just arbitrary text, actually, but HTML.
And so the screenshot here shows what happens if you take an old version of Electrum and connect to one of these compromised servers, which anybody could set up. So it's like, hey, required security update, version 4.0.0, go download the software here, nothing bad will happen, I promise. And by getting people to run bad software on their PC, attackers made off with hundreds of Bitcoin. And this is just one example; there are tons of examples of this kind of thing out there. If you have bad software running on your PC, it can steal your private key. Even if your private key is encrypted on your device, the malware can just wait until you decrypt it when you're about to send a legitimate transaction, and then once it has the private key, it can do whatever it wants. And in this scenario, and most of the others, a hardware wallet could have saved you. So why shouldn't every application behave like cryptocurrencies, in the sense that these sensitive approval decisions are factored out and go through a separate secure device? Well, right now, for sensitive websites, I use two-factor auth. And if my computer is hacked, I'm in trouble: two-factor auth doesn't guarantee anything if your computer is compromised. And these websites already have confirmation prompts for doing sensitive operations. Like, if I look at my OpenStack, if I try to delete a VM, it's like, oh, confirm terminate instance, are you sure you want to do this? Or if I'm doing something with my domain name and trying to delete a DNS record, it's like, oh, are you sure you actually want to remove this DNS record? And these are not for security. These screenshots you see on the left, those are just confirmation prompts to make sure you don't click on the wrong thing and accidentally do something you didn't mean to do. But in theory, this approval decision could be easily factored out.
At signup time, a user gives a website their public key, and then for sensitive operations, the user must sign a human-readable description of the operation with their private key. And the signing could happen on an external device, on a hardware cryptocurrency wallet, just like you do with cryptocurrencies. And this could be really useful for things like bank transfers, and stock market trades, and manipulating VMs and infrastructure, and manipulating DNS records, and deleting backups. Basically, any sort of sensitive thing could go through such a device. And there's actually some work on standardizing this kind of behavior. So Web Authentication is this web standard that's kind of like a successor to U2F, and as it says on the box, it's an API for accessing public-key credentials. And a large part of the standard focuses on doing better login, basically doing better U2F, but a small part talks about doing transaction authorization on an authentication device. So I think that's pretty cool. But this is pretty recent stuff, and even though some people have started supporting Web Authentication as a second factor for login, as far as I know, nobody's really using this transaction authorization stuff. But I hope it takes off in the future. So now that we're convinced that transaction approval is a great idea, this approval of sensitive transactions happening on a separate secure device, well, of course we'd want to use it everywhere. With all our cryptocurrencies, like Bitcoin and Ethereum and Monero and all the sketchy altcoins I didn't bother to list here. And even websites where we do sensitive stuff, like where we manage our virtual machines and our DNS. And for practicality, we'd want this to all happen with a single hardware device. But if we do that, we lose some of the simplicity of other security tokens, things like the YubiKey or U2F keys, where those were basically fixed-function.
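The signup-plus-signed-approval flow just described can be sketched in a few lines. This is a toy model, not any standard's actual API: a real deployment would use asymmetric signatures (as WebAuthn does), so the website would hold only the public key; here HMAC is a stand-in so the sketch runs with just the standard library, which means the verifier shares a key with the device. All the names and the example operation are made up.

```python
import hashlib
import hmac

# Toy model of transaction approval on a separate secure device.
# HMAC is a stand-in for a real public-key signature scheme, so the
# "website" here shares a key with the device; with ECDSA/Ed25519 it
# would hold only the public key registered at signup.

class ApprovalDevice:
    def __init__(self, key):
        self._key = key  # on real hardware, this never leaves the device

    def approve(self, description, human_confirms):
        # The device shows `description` on its own trusted screen and
        # signs only if the human presses the confirm button.
        if not human_confirms:
            return None
        return hmac.new(self._key, description.encode(), hashlib.sha256).digest()

def website_verify(key, description, signature):
    expected = hmac.new(key, description.encode(), hashlib.sha256).digest()
    return hmac.compare_digest(expected, signature)

# A (possibly compromised) PC crafts the request, but only the device can sign.
device = ApprovalDevice(b"device-secret")
sig = device.approve("delete DNS record www.example.com", human_confirms=True)
assert website_verify(b"device-secret", "delete DNS record www.example.com", sig)
# If malware on the PC swaps the description, verification fails:
assert not website_verify(b"device-secret", "delete DNS record api.example.com", sig)
```

The point of the sketch is the trust boundary: the PC only relays bytes, and the website accepts an operation only when it carries a signature that could only have come from the device after a human confirmed it.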
Here, Web Authentication support could maybe be built into the device, but supporting arbitrary cryptocurrencies would for sure require running different applications, because these different cryptocurrencies all use different transaction formats and different cryptosystems and so on. And so now this device that we're imagining needs to run arbitrary third-party code, while sandboxing all the applications from each other. Like, I really want to be able to install a sketchy altcoin wallet without having my Bitcoin stolen. Is that too much to ask for? And so this device must provide strong isolation between these mutually untrusting applications. And now our device is starting to look complicated. Earlier I was saying, oh, the PC is this complex beast, and it's going to be impossible to get it right, and the hardware wallet, well, that can be simple and therefore maybe secure. But now the hardware wallet is starting to sound kind of complicated too. Now it needs an operating system. It needs to multiplex shared resources like the CPU and RAM and USB and storage and display and buttons between applications. And sharing leads to complexity, and complexity leads to bugs and vulnerabilities. And hardware wallets have indeed had bugs in the past related to their operating systems. For example, the Ledger wallet, one of the most popular cryptocurrency wallets, had a series of bugs related to bad argument validation in system calls. One example there is that there was a SHA-256 system call, and you could pass a pointer to it, and it should do some amount of validation on that pointer, so you don't read memory you're not supposed to. But it turns out you could pass the null pointer, and it would gladly compute a SHA over that. So you could call SHA-256 with the null pointer and varying length arguments and get hashes of prefixes of memory, so you could dump the contents of flash. Like, you could call SHA-256 n times.
And then compare that with independently computing 256 times n hashes, and you could use this to dump the first n bytes of memory. And in practice, this could be used to dump the first 8K of flash. Isolation bug. And there are other bugs in this style in the Ledger wallet; there was some interesting security research done by Riscure and presented at Black Hat in 2018 that talks more about this. And both Ledger and Trezor, two of the most popular cryptocurrency wallets, have had bugs where the ARM memory protection unit in the device was misconfigured by the operating system. In Ledger's case, it allowed read accesses to memory regions which applications are not supposed to read, and in Trezor's case, it allowed write accesses. And so at a high level, these bugs are because of sharing. Because if you think about it, if you had no sharing, you wouldn't need a sophisticated operating system with a complicated memory protection policy; you wouldn't need the memory protection unit, and so on. Okay, now maybe I'm sending you mixed messages. I seem to be hinting at security through physical separation. And this is an easy solution in some sense: one device per app. In this picture, mutually untrusting applications are just air-gapped from each other. And physical separation gets you a lot of nice properties. So why don't we do this? Well, it's impractical, especially to support a lot of applications. And practicality is super important for adoption. And so there are these competing tensions here: sharing versus security. And the question is, can we have the best of both worlds? Well, here's an idea: what happens if we try to simulate physical separation? I'm going to give you a very high-level overview of it, and then we'll talk about it in more detail. But what if we could have this device where we run an application, and whenever we want to switch applications, we just blow away all the state on the device?
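The SHA-256 prefix-dumping trick from the Ledger example a moment ago can be modeled in a few lines of Python. This is a toy: the "syscall" interface here is invented for illustration, not Ledger's actual API, and the secret contents are made up. The attack structure is as described above, though: ask the buggy syscall for hashes of length-1, length-2, ... prefixes, then brute-force one byte at a time, about 256 times n local hashes in total.

```python
import hashlib

# Toy model of a SHA-256 syscall with missing argument validation:
# an "app" can request hashes of arbitrary-length prefixes of memory
# it shouldn't be able to read, then recover that memory byte by byte.

SECRET_MEMORY = b"wallet seed bytes live here"  # hidden from the app

def sha256_syscall(length):
    # Buggy syscall: no pointer/bounds check, happily hashes a prefix.
    return hashlib.sha256(SECRET_MEMORY[:length]).digest()

def dump_memory(n):
    known = b""
    for i in range(n):
        target = sha256_syscall(i + 1)  # hash of the (i+1)-byte prefix
        for guess in range(256):        # brute-force the next byte
            if hashlib.sha256(known + bytes([guess])).digest() == target:
                known += bytes([guess])
                break
    return known

assert dump_memory(len(SECRET_MEMORY)) == SECRET_MEMORY
```

The isolation failure is that the hash oracle, which looks harmless, leaks the contents of memory the caller was never supposed to read.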
And in a sense, logically have a fresh, factory-new device to run the next application on. Well, for whatever vague notion of "fresh device" you might have in mind: if this is done correctly, maybe every application that runs could be isolated from all the others. Because if you clear everything on the device away before running the next application, well, whatever ran before can't accidentally leak secrets to whatever ran afterward. And so maybe this sounds reasonable at a high level, like, let's simulate physical separation through reset. But obviously there are some details that need to be worked out here. Like, what would the hardware architecture look like for something that's built around this idea? And also, what does it mean to reset hardware? Well, here's a proposed design. What if we run third-party code on its own computer, a stateless computer, and we reset it between task switches? So on the left, we have this thing we call the application core. It's basically a computer, and it's stateless. We load an application onto it just as we want to run it, we run it to completion, and then before we run the next application, we somehow magically clear all the state from this application core. And in this picture, we have another computer, basically, for managing the computer on the left. The computer on the left is stateless but runs third-party code; the computer on the right is stateful, it can write stuff to flash memory, but it never runs third-party code. And so this has some other nice properties. You actually don't need an operating system with a structure like this, basically. Or maybe you can think of it as splitting kernel and user space up in this way: rather than being hardware protection on a single CPU, kernel space and user space are just two separate computers talking to each other.
But in this picture, whatever code runs on this application computer we have on the left, well, it doesn't need an operating system. We can just give it access to bare hardware. Because operating systems are all about mediating access to shared hardware, but in our picture, we have this idea of resetting all the state before letting the next application run. And so we don't really need to worry about what kinds of wacky things some malicious third-party code might do, as long as we can get reset correct. And so we need no operating system or hardware protection mechanisms, and that gives us a lot of simplicity, so it's easier to reason about the correctness of such a system. And in this picture, we also don't really need things like system calls. Basically, we need no communication back from potentially sketchy third-party application code to whatever management code is managing this device. This communication can happen over a really simple interface, like a single-wire UART. And maybe the only system call you need is for the application to be able to terminate itself, and maybe save some state for the next time it runs. And maybe this seems like a really wacky design. And this wouldn't work for things like your desktop, where you want to run multiple applications simultaneously, or you want to run sophisticated applications. But for the class of applications we're targeting, this restricted model does work. And so this is what the architecture looks like. Now, what do I mean by reset? We still need to talk about reset in more detail. So what state is there in this application core? Well, there's state in many places. Obviously the CPU and RAM are stateful devices, but even things like the USB peripheral or the display can maintain state. And in order to have strong isolation, we really need to clear all the state that may be present in such a device.
Basically everything is stateful, except for things like the buttons, which are just switches. And to have strong isolation, these all need to be reset in some meaningful way, clearing away whatever was there before, so that any application that runs is not influenced by whatever was running on the device before. And so again, what do we mean by reset? Well, we'll formalize this later, but for now, you can just think of it as factory-new, or untouched, or untainted. Get everything back into an untainted state: any two pre-reset states should be indistinguishable post-reset. Well, so how do we implement this? For this talk, we'll focus on the CPU, because it's the most interesting and challenging component, and all the other components are kind of analogous. So, first attempt: turn it off and on again. What happens if we just take the CPU, cut the power to it, and then plug it back in? Is this sufficient for clearing all internal state? It might seem like a good idea. Well, it turns out that removing power from these chips doesn't actually clear all the internal state. Chip state, such as SRAM contents, can actually persist for tens of minutes in these devices. And so power cycling isn't really going to achieve anything, at least not on a reasonable time horizon. And in a sense, we can think of power cycling as roughly equivalent to just asserting the reset line. Speaking of which, these processors have a reset line: there's a pin on the CPU labeled reset. Does this do the right thing? Well, to answer that question, to understand what this reset line really means, the first place we might look is the ISA manual. So here I have a screenshot from the RISC-V instruction set manual. Other instruction sets are kind of similar; RISC-V had a really clean description of this, so that's why I chose this one. And looking at this, we can see that the reset specification of RISC-V is really weak.
All that the manual really tells us is that after reset, the program counter is set to an implementation-defined reset vector, and all other state is undefined. And this is kind of like undefined behavior in C, where something specific does happen; it's just that the standard doesn't guarantee that any particular thing will happen. So what happens in practice? Well, all state isn't reset. For example, registers may retain their pre-reset values. And so just asserting the reset line of a CPU isn't really enough to achieve the stronger notion of reset, maybe we'll call it deterministic reset, that we were thinking about. So okay, here's another attempt: what happens if we run code post-reset? We might be inspired by the part of the manual that said the program counter is set to an implementation-defined reset vector. That means that once we reset a CPU, we can return control to a known location. And maybe there we can have a read-only region of memory with some code that cleans up the rest of the state in the CPU. Reset guarantees that the program counter will go there, and we can run special code that does things like clearing the registers of the CPU. So what needs to be cleared? Well, maybe we can read carefully through the ISA manual and identify all the registers and everything internal to the CPU, things like the flags registers and all that. And if we think about it really carefully, maybe we can write code that clears it all. Well, it turns out that that still isn't enough. There's actually more state inside a CPU beyond what is even named in the ISA manual, because at the abstract machine level, there's no notion of a lot of the details of what's inside a CPU. In the ISA, there's no notion of a branch predictor; there's no notion of caches and things like that. These are all implementation details designed to make a processor go faster. But oftentimes these details end up causing problems.
As we've seen in microarchitectural side-channel attacks, in things like Spectre and Meltdown and so on. Because this microarchitectural state, which can't be named at the architectural level (you can't write code that directly talks about it), well, this microarchitectural state can end up leaking into architectural state, for example through timing, which is exactly what's going on in attacks like Spectre and Meltdown. And this is problematic. So are we stuck? Because code can only talk about architectural state, and so instructions can't even name things like the branch target buffer; those just don't exist at the architectural level. And also, every CPU's microarchitectural details are going to be different, so it seems like it'll be hard to support many CPUs. Well, I think one step we could take is to just minimize complexity. In our relentless pursuit of simplicity, we should just use a processor that doesn't have all these fancy features. We're not trying to do anything super complex. We're not running Fortnite on this thing; we're just signing small blobs of data and displaying some small text on the screen. And so we don't need a fancy processor. We can just use something that doesn't speculate and has no branch predictor and so on, to minimize microarchitectural state. If there's no branch predictor, we don't need to worry about clearing it, right? But even the simplest of CPUs is going to have some microarchitectural state, some implementation details that are not named at the ISA level. There's still state inside the CPU, and it's really hard to reason about whether or not this could be problematic. So we'd better figure out how to clear it. And so here's our final idea. It actually turns out that it's possible to write code that resets a CPU.
Using this idea of asserting the reset line to return control to a particular point we control, and then letting the CPU run for several cycles to execute some code that we wrote. Now, you might say, hey, wait a minute, I thought you said that code can't talk about microarchitectural state. And yeah, that's true. But running code on a CPU affects microarchitectural state. It's not described in the ISA manual how it does so, because it's specific to a CPU. But for any given CPU, any particular CPU, the implementation is deterministic: it does some particular thing, even if it's not defined by the ISA manual. And so what if we wrote code to very carefully have the desired effect on a specific CPU? For example, you might imagine that if a CPU is a pipelined CPU, and you just execute a bunch of instructions, they'll end up clearing away a lot of the state that was in the pipeline. And some CPUs are open source, so we can actually consult the gate-level implementation of the CPU and understand the effects of code at the microarchitectural level. Now, what if we got it wrong? Well, then we just have no more isolation. This is the tool we're using to achieve isolation, and so it's really important that we get it right. But this process seems very challenging and error-prone. This code itself seems hard to write: how do you write code that has the right effect at the microarchitectural level? And reasoning about the correctness of such code also seems complicated: how do you know that this code does the right thing no matter what state the CPU is in? The CPU can be in so many different states. How do you know that the code always does the right thing? Well, so how do we know that reset is correct? Well, what is reset trying to do?
Reset is a single operation, this asserting of the reset line and then letting the CPU run code for a little bit, that's applied to an arbitrary state, and it should turn the CPU from this arbitrary state into a purged state. Basically, no matter what state you started in before, you should end up in the same state after this purge operation. And maybe another way of looking at it is that we have these two worlds, one where a secret bit is zero and another where a secret bit is one. Like, whatever software was running before is operating on some secret data. If these worlds are indistinguishable post-reset, then code that was running beforehand on that secret data can't accidentally leak the secret to whatever runs on the CPU after this reset process. And so this is a slightly more formal definition of our reset property. But how do we apply it? It seems really hard to just sit down and think really hard about whether this property holds for a particular CPU and a particular sequence of code. We just can't sit down and think through all of this. So what do we do? Well, we can use tools to do this reasoning for us. So, a little bit of background. There's this really neat tool called an SMT solver. SMT stands for Satisfiability Modulo Theories, and it's this very powerful tool that has many uses, including in security. These tools are used for things like crackmes and bug finding and static analysis and things like that. I highly recommend you look into these after the talk; they're just a super neat tool. And SMT solvers are based on this more primitive thing called a SAT solver. So what is a SAT solver? It's a tool that can solve Boolean formulas. For example, in the top left you might have the formula (x and y) or (not z). And it turns out that this formula is satisfiable, so if you give this to the tool, it will say, oh, this formula is SAT, and the satisfying assignment is x = false, y = false, z = false.
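To make the SAT-solver contract concrete, here's a tiny brute-force stand-in in plain Python that checks the formula from the slide. Real solvers are vastly smarter than enumeration, but the input/output behavior is the same: a satisfying assignment, or a definitive UNSAT.

```python
from itertools import product

# Brute-force "SAT solver" for illustration only: real solvers (like
# the SAT core inside Z3) don't enumerate, but the answer means the
# same thing: a satisfying assignment, or None for UNSAT.

def solve(formula, names):
    for values in product([False, True], repeat=len(names)):
        assignment = dict(zip(names, values))
        if formula(**assignment):
            return assignment
    return None  # UNSAT: every possible assignment was checked

# The formula from the slide: (x and y) or (not z)
print(solve(lambda x, y, z: (x and y) or (not z), ["x", "y", "z"]))
# -> {'x': False, 'y': False, 'z': False}

# An unsatisfiable formula: x and (not x)
print(solve(lambda x: x and (not x), ["x"]))  # -> None
```

Note that the first query returns exactly the satisfying assignment mentioned in the talk, and the second returns None, which is the brute-force version of an UNSAT answer.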
So if you plug those values in on the left and evaluate it, you'll see that the expression evaluates to true. And similarly, you might have a different expression which is not satisfiable, in which case the tool will return UNSAT. So then, what are SMT solvers? SMT is SAT on steroids. Basically, it supports more data types than just Booleans and more operations than just Boolean operations. So you might have support for things like integers, or addition, or inequalities, or bit vectors. And different tools have support for different sets of theories; you might have a theory of strings or a theory of arrays or things like that. But basically, it's SAT on steroids. It's like a fancy equation solver. But it's not just an equation solver, because when these solvers find a solution, well, that's like an equation solver, but when they return UNSAT, it actually means something. It's not just that the tool failed to find a solution; the tool has determined that there is no solution. You can think of it as doing a smart exhaustive search and proving that whatever formula was input to it has no solution. And so it turns out that we can use this fancy equation solver to automatically prove theorems. So consider this abstract way of stating a theorem: for all x, P(x). Well, we can mechanically translate this to an SMT formula, and the way we do that is we just strip the "for all" from the left-hand side and negate the proposition. So "for all x, P(x)" turns into "not P(x)". And then we can consider what happens when we feed this to the solver. If this formula is satisfiable, that means the theorem is false, because if not P(x) has a solution, well, then there is an x where not P(x) is true, and so it can't be the case that for all x, P(x). That's a counterexample. On the other hand, the solver might say UNSAT, and that means that our theorem is proven.
It's a proof that our theorem holds, because if not P(x) is not satisfiable, there's no x that makes not P(x) true, and so it must be the case that P(x) holds for all x. So in more concrete terms, how do we actually apply this to a real theorem? Well, here's a simple theorem. This theorem says that the average of two real numbers is between their min and their max. And the way we translate this to SMT, remember, is we just strip the for-alls and negate the proposition. And even the actual code for checking this theorem is pretty straightforward. This is Python code using the Z3 theorem prover, and basically it's like, oh, x is a real number, y is a real number, let me define min and max because those are not built-ins, then put in the theorem statement, assert that the negation of the theorem statement holds, and check if that's satisfiable. And of course we know this theorem is true, and indeed the tool returns UNSAT here, proving the theorem. Okay, so going back to our reset property, the whole thing we were talking about earlier: here's the picture I had on the slide before, but phrased as text. What does reset mean? It means that if you have any two different CPU states, and you apply this purge operation to those states, they must converge to the same state. And here, purge is this process of asserting the reset line of the CPU and then just letting it execute for some number of cycles, so it executes whatever instructions we put in the right place. Okay, so as long as we can express this in the language of SMT, then we're good. We can just throw this at our SMT solver, and hopefully it can prove it correct. So far I've only talked about Boolean formulas and numerical equations. How does that translate to CPUs? Well, let's start with simple circuits. Let's start with combinational circuits. These are stateless circuits that compute Boolean functions, and here is a full adder circuit.
So this takes two inputs and a carry in, and produces an output and a carry out. Pretty simple component; something like this might make up a very small part of a CPU. It might be chained together with other full adders to make a wide adder, like a 32 bit adder, and that might be a small part of an ALU, and that might be a small part of a CPU. But we can see how this might be representative of a small part of a complex circuit. So the circuit could be represented in Verilog code like this. It's a pretty straightforward translation of exactly what you see in the picture. And it turns out that the Python Z3 code for this is also pretty straightforward. The tool supports Boolean operations, so we can just translate this code into Python and use that library. This is all you need in order to represent that circuit. So let's try to prove some property about this simple circuit. We all know that addition is commutative: A plus B equals B plus A. And since this is a full adder and takes a carry in, let's just say that the carry in has to be the same. What does this look like when we try to use the tool to formally prove this property? Well, this is all the code you need. Basically you put in the preconditions. We have two possible inputs: the A plus B input to the adder and the B plus A input to the adder. And we say that A and B are swapped in the two cases, but the carry in stays the same. And then we put in our negated theorem statement. Remember, from earlier, we strip the "for all"s and negate the proposition. So here we say to the solver: I am asserting that the result of the full adder applied to the two possible inputs, the one where the inputs are in order and the one where they're swapped, is different. And if you try to check that property with the solver, it'll say that this formula is unsatisfiable, which is a proof that our theorem is true: that addition is indeed commutative, as we all knew.
And then I think the last piece of the puzzle here is, how do we deal with state? Because CPUs are stateful things. Well, let's not go all the way to the level of a CPU yet, because that's kind of complicated. Let's start with this eight bit counter. So here's Verilog code implementing an eight bit counter. It's pretty straightforward. It has an eight bit register inside. At every clock edge, if the reset line is asserted, it'll set its internal state to zero. Otherwise, if the enable line is asserted, it'll increment the count by one. Otherwise, it'll do nothing. This is a simple example of a stateful circuit: Verilog code on the top left, graphical representation on the bottom right. Well, here's a translation of that circuit to SMT. It's a little bit different than before, because I'm using slightly fancier features of Z3 here. I'm using datatypes to represent the different parts of the counter. But it's still pretty straightforward code. And the thing of most interest here is the thing on the bottom left, what I've labeled the transition relation. This describes the behavior of the counter. What this is saying is that if we have two states, S1 and S2, and we assert that counter_t of S1 and S2 holds, that means that state S1 steps to state S2 after one cycle of execution. And this is a pretty straightforward translation of the Verilog code. We could have done this mechanically; just in this example, I did it by hand so it'd be cleaner to look at. So let's try to prove something about this. Here's another proof. We might want a property like: the counter's count doesn't decrease over its execution. And we might have a precondition to that theorem, like: as long as it's not reset, the count doesn't decrease. Well, here's a little bit of code. Let's instantiate two states, S1 and S2, and then say, okay, here's our precondition.
Asserting this counter_t thing says that S1 represents the state of the counter at cycle one and S2 represents the state of the counter at cycle two. And then we have this other precondition, that the counter is not reset. And then here's our negated theorem statement. So we're trying to see if there's a case where the count decreases, that is, if the count at the second cycle can be less than the count at the first cycle. And as you might be able to figure out just by thinking about the implementation of the counter, it turns out that this theorem we were trying to prove is false. The solver returns sat, which means that there's a counterexample to our theorem. And here, looking at the counterexample, we can actually get some more information beyond just the fact that the theorem we were trying to prove is false. Looking at this a little more closely, we can see that at S1, at cycle one, the counter had its reset line low, its enable line high, and the count was at 127. And thinking about this, we can realize: oh, the count's wrapping around, and that's why our theorem isn't true. Okay, so how do we apply this to CPUs? Well, the CPU is basically like our example of the counter; it's just a little bit bigger. It's a big circuit that has some state, some inputs, and some outputs, and we can think about it in the same way. Just like we had this counter_s to represent the state of the counter and the counter_t to represent the behavior of the counter, we can have a cpu_s and a cpu_t. And this translation can be done entirely mechanically. So on the left hand side, I have the Verilog for the gate level implementation of a CPU. And on the right is the extracted Z3 code that describes one cycle of executing the CPU.
And just like you can simulate a CPU that has some concrete state with standard Verilog simulation tools, using these technologies you can simulate a CPU over symbolic state. So you can say: the CPU is in some unknown state, let me reason about what happens after some number of cycles of execution. You can even do reasoning over partially symbolic state. You might say: if the program counter is zero but all other state is unknown, let me reason about what happens over several cycles of execution. And so, now again, visually, this is what our reset theorem looks like. We're considering two worlds which can be in any possible initial states. What we do is apply the reset operation: in one cycle of execution, we assert the reset line and simulate the CPU for one cycle, and after that, we let go of the reset line and just let the CPU execute a whole bunch of cycles over symbolic state. And if the two worlds converge to the same final state, no matter what the initial states were, then our reset operation, our purge operation, is correct. And note that this is not a test. This is not running test cases and checking a bunch of different inputs. It's a proof. This must hold for any two possible starting states, and if the solver can prove this correct, it will always work. And with these tools, we get a really nice workflow for developing this reset code. So remember, I was talking about how it might be complicated to write code that has exactly the right effect at the microarchitectural level. Well, what's kind of cool is that with these tools, we can write some code and just throw it at the verifier. And if the verifier says unsat, then we're done: that means that the CPU is always reset correctly. But if the verifier says sat, not only do we know that our reset code doesn't quite do the right thing, we also get a concrete counterexample.
And the tool can actually tell us: oh, this particular piece of microarchitectural state was not properly cleared, or not provably cleared. And then, as the human in the loop, we can think about this, consult the implementation of the CPU, and tweak our code in order to have the desired effect on that piece of state, and keep repeating this process until we've developed code that always does the right thing. So what does this actually look like in practice? Well, here's a demo of using our tool to verify a CPU. In this demo, we're going to be developing reset code for the PicoRV32, which is a simple RISC-V processor. And here I've already converted the CPU to SMT, fully automatically. We're just going to develop the reset code and watch the output from the tool as it tells us either that we've done it correctly, or that we haven't and which pieces of state we haven't cleared. Okay, so we go ahead and run it. To start with, let's just run it for one cycle of execution. So let's see what happens when we just assert the reset line of the processor and let it go for one cycle. Well, as we might have expected, it's not correctly reset. There's a bunch of internal state that's not reset, and the tool tells us one particular piece of state that it complains about. It also gives us a concrete counterexample: an entire pair of CPU states that are different from each other, where if you apply the reset procedure, they still haven't converged. And then the tool also points out a bunch of other differences between the states. So this is pointing out a whole bunch of internal microarchitectural state which is not cleared. And as a programmer, we can look at this and think about what's going on. In this particular example, the one piece of state that the tool complained about, way at the top, was part of the instruction decoder.
And so thinking about this, we might say: maybe we can just write some code. If we let the CPU execute for more cycles, maybe whatever's going on in the instruction decoder will get cleared up. So what happens if we just put a bunch of NOPs at the start of the region which the CPU executes from when it's reset? Okay, the tool still failed. Oh, I only ran it for one cycle. To actually execute the NOPs, I should probably simulate the CPU for more than zero cycles. So okay, here I'm running the tool again. It's complaining about something in the instruction decoder. And we'll see that after a while, it starts complaining about something else. Now it's complaining about the part of the state called CPU regs; that's not always reset correctly. So now again, as the programmer, we can think about this and consult the CPU implementation. And if we did that, we'd see that CPU regs corresponds to the architectural general purpose registers inside the CPU. So now we can figure out how to clear that state: we can just write some code that goes and clears all 31 general purpose registers. Okay, so let's do that and try running the tool again. So here we're running it again. Let's symbolically simulate from unknown initial states and see what the tool can figure out. Okay, a couple cycles in, it's complaining about whatever was in the instruction decoder. Now it's complaining about the CPU registers, but we just wrote code to clear them. And oh, the tool failed. Scrolling up, we can see that in this case it just said "failed to prove: reached max depth 10". With this tool, you need to tell it how many cycles to simulate for; you should give it some upper bound, otherwise it might go on forever. And I just didn't run it for long enough. So let me go and rerun it.
And I'm going to speed up this video, because you probably don't want to sit through two minutes of number crunching. Okay, so the tool is simulating for many cycles now and still complaining about the CPU regs thing, whatever corresponds to the architectural general purpose registers. And we'll see that eventually it stops complaining about this, because it's actually gone through enough of the reset code that all that stuff is reset. But now it's complaining about something else. Now there's this thing called memw data that's not provably reset. So again, we can go back to the CPU implementation and figure out what this corresponds to, and this is something related to the memory write machinery. So now we can go and write a little bit of code that deals with that. In this situation we can just issue a dummy write to a read-only memory region, and that'll end up having the desired microarchitectural effect. And finally, when we do this, we'll see that once the tool gets through enough cycles of simulation, it finishes and says: your reset sequence is actually proven to be correct. And again, this is not a test. We didn't just try this for a couple of different initial states. The tool has proven that this works no matter what state the CPU was in before: no matter what state you start off in, after applying this procedure, you always end up in the same resultant state. So that's the tool. And then of course we also wanted to make sure that this idea of building hardware in this wacky way, where third party code runs on its own CPU and management code runs on another CPU and so on, actually makes sense. And so we built a hardware prototype. Here's a V1 prototype, which is a bunch of development boards strung together, and there's a little bit more going on here than what was in the talk.
But as we can see, to the end user, this behaves more or less like any other cryptocurrency hardware wallet. You do things like install applications on it and use it to sign Bitcoin transactions. And really the only difference the user will notice is that, since it's built around this idea that only one piece of third party code runs at a time, and between applications you reset the processor, every once in a while the device will tell you: there's nothing more to do now, you need to reset the device before you can continue. Here, for example, you've installed a program, but in order to launch the next program you have to clear all the state, which the device will do for you by asking you to press the reset button. And then the rest of the functionality: here we're just running a Bitcoin application. It works like any other Bitcoin hardware wallet, so I can get the public key from the device and then go on my computer and set up a watch-only wallet. Go here, open up Electrum, put in the public key, type in my very secure password, which is, I think, "defcon", all lowercase, D-E-F-C-O-N. It's a weak password. And it basically does the thing. We first magically receive Bitcoin so we can actually send a transaction. And you can see that of course we can't actually send the transaction from the PC, because the PC doesn't have the private key. The whole point is to have these transactions go through the separate hardware device, which is the only thing that knows the private key. We can use the PC to construct this transaction and send it over to the device, which will parse it, display it on the screen in a human readable way, and only if it's confirmed does the device sign the transaction and send it back to the PC, at which point it can be broadcast on the internet. And so this is a reasonable design for a hardware wallet.
We could make it run the same kinds of applications that other hardware wallets run, but we get really strong isolation of mutually distrusting applications by applying this idea of deterministic reset and reset-based task switching. And we've also made it work with websites and things like that. I'm not going to go through this demo, but it's basically what you'd expect. You try to do something on a website, and any sensitive operation is required to go through this external device. It'll display a human readable summary, and only if you approve the transaction on the external device will the website let you proceed, because the operation requires a signature from a private key that's on the device and not accessible to the PC. So if someone tries to do something malicious, like delete your domain name, because your PC's compromised and it's trying to do weird stuff on the website, well, you can just cancel it on your device and the website won't let that operation complete. And so in conclusion, we've talked about how we can apply the idea of turning it off and on again as a building block for better isolation, but also how there's a little bit more to it than that. There are a lot of details to getting the deterministic reset right, and there's this neat idea of using formal verification as a technique for both developing this reset code and reasoning about its correctness, in order to gain confidence that this primitive we're using for isolation actually does the right thing, the thing we expect it to do. All the code shown in the demos is available online at this link, if you want to check it out and play around with some formal methods tools. And I hope that the impact of this presentation is that more of us as users start using and demanding these transaction authorization devices, and demanding that websites support this sort of thing, because I think it can really increase our security.
And I also hope that we as developers start supporting factoring out approval decisions, and I think it'd be really cool to see more people use formal verification as a tool to improve security. Maybe in the long term, we can move beyond just "it hasn't been broken yet" and towards having a principled reason for believing that a system is secure. I don't think I have any time for questions, but I'll be outside if anybody wants to talk to me.