So, our next speaker today will tell us more about the glorious future, where machine-checked proofs will be used in everyday software, all the way down to hardware development. We have Adam Chlipala, he's on the faculty at MIT, and he builds computer systems and has fun proving them correct using Coq, which is not spelled the way you might think. Thank you.

So, from the earliest days of programming, people started to realize that the actual act of writing the code was going to be a relatively small fraction of the time they spent building useful things. And here's a quote from one of the pioneers of computer science, telling us how at some point he realized a good part of his life was going to be spent finding errors in his own code. So we know it would make a huge difference for developers, they could get a lot more cool stuff built, if they could reduce the time they spend trying to get their code to be correct, trying to understand what their code does, all these activities besides just writing it. And I want to tell you about some opportunities with technologies that are getting mature enough to potentially make a big impact in saving a lot of people time building correct systems. And I think these are just ready enough that if you want to be on the bleeding edge, you could start looking into these systems today.

So first let me summarize a perspective on what programmers are doing today to gain confidence that their code is correct. There's debugging, the classic way to find a flaw in a program, where you do some detective work, make successive runs of a program, change a small thing between runs, and see what happens differently on each execution. I'll chart these different approaches along two admittedly wishy-washy axes. Rigor I'm going to use to refer to how closely a method for understanding your program is tied to the actual code that runs, so you know it's accurate. And completeness covers how many different execution scenarios you get some guarantees about. So with debugging, we only get guarantees about the scenarios we run; that's low completeness. But I'm calling it high rigor, because it's literally running your code. You can also, when you're writing your code in the first place, at a whiteboard or just in your head, think through carefully all the things that can happen. Consider all sorts of what-if scenarios. Convince yourself: don't worry, my program is ready for those. Or: I'll add some new code to be ready for that. There can be all sorts of branching, and we can think very carefully. And if we're very principled, then we get high levels of completeness but low rigor, because this is just an abstract, whiteboard-level kind of exercise. Testing has a very similar profile to debugging: we create a bunch of scenarios that are saved in a list of test cases that we can run over and over again in an automatic way. And an alternative is to put lots of comments in your program that capture exactly your expectations, what should be true whenever you reach that comment, and then read the code and think through the interactions between all these expectations. So we could call this the comments approach, which might have a high level of completeness; there could be comments everywhere saying a lot of things about a lot of code. But it's not so rigorous, because there's typically no tool that connects the comments to the actual running behavior of the code.
Another approach is auditing, and there's a whole spectrum of different levels of seriousness we can use in auditing. We can do relatively cheap auditing, where we maybe read a lot of code but don't think too hard about it, or we can do more in-depth auditing, where we probably need to read a smaller amount of code but can be much more thorough in checking its properties. So the techniques I want to tell you about are a way to get the best of both worlds, to get high ratings on both of these axes, in contrast to the traditional methods.

So, I'm a professor at a university. You're probably used to people from that sort of place throwing out all these little Greek letters and fancy mathematical symbols and saying: this is formal methods, this is great, we're going to prove things about programs. I think this is an even more popular research area here in Germany than in the U.S., where I'm from. It's been around for a long time, still not too widely used in industry or in the open source world, but maybe there's hope. I think most people in those worlds are used to this kind of scenario, where people doing formal methods research are saying: it's really the right thing to do to prove your program; don't you want to do the right thing? And the developers just sort of nod and smile politely, and then they need to get back to work, because they need to implement a few new features for the version of the system that ships next week. I want to try to argue that formal methods, if not already there today, are very close to the point where they will actually save you time. They will reduce the cost of developing systems, especially those where bugs are especially costly.

So the pitch I'm trying to make is that in the history of computing, we've gone through layers upon layers of building new, higher-level abstractions that lower the cost of creating particular functionality. We've built each of these levels upon earlier ones, and they provided new possibilities for modularity and abstraction. And formal methods is actually an enabler for the next level of possibilities for dividing big programs into small pieces that we can understand effectively. So what I'm claiming should be feasible by, let's say, 10 years from now, with the right kind of community forming around it and doing the work, is complete deployed computer systems with top-to-bottom correctness proofs, meaning from hardware up to applications, where these proofs will be checked algorithmically, so we don't have to rely on low-reliability humans reading long mathematical documents. And we will get to the point where none of the code that you run needs to be trusted to guarantee the properties you care about, whether they're about security or other kinds of correctness. And as if that weren't good enough, by the way, this will apply to hardware designs, not just software code. This kind of approach should be able to lower development costs in the same way that a good systematic testing framework can, and to a large extent it should be able to replace very time-intensive activities like debugging, testing, and code review. So I'd better go into a little more detail on what kind of techniques I'm talking about here and why we should believe this stuff is ready. But before I get into the next level of detail, I'll just summarize how this kind of approach can substitute for the traditional development activities.
Instead of debugging, where we're exploring concrete executions of a system, we can do proving, where we're exploring symbolic arguments that cover all possible executions. Instead of testing, which applies to concrete scenarios, we can do specification writing, where we describe general requirements that can cover all scenarios. And instead of auditing our code in detail, we can audit the specifications in detail. Specifications should be much shorter, and they typically don't include optimizations, which can be some of the hardest parts of code to understand.

A summary of what I'm going to cover in the rest of this talk: I'm going to introduce the key technology that I'm selling here, which is called proof assistants. I'll cover some common objections, and the standard answers to those objections, about why we believe this can actually be applied to real systems and applied cost-effectively. I'll explain a few different ways that this technology can be used in the development of computer systems. I'll talk about two case studies, one in hardware and one in cryptographic software. And I'll finish up with some pointers to further reading, for continuing the journey of learning this subject.

So one of the first questions people have about formal methods is essentially based on an argument embodied in a paper from 1979 by De Millo, Lipton, and Perlis called "Social Processes and Proofs of Theorems and Programs." They had a lot to say in that paper, but I would boil it down to: real systems are so complex, and they change so frequently, that it is beyond the powers of human concentration to read careful proofs about those systems. And it's not just one proof; it's version after version of the system, and you'll need a new proof to cover the correctness of each new version. I would say today we've learned that these authors were both right and wrong about this point. They're probably right that people can't be trusted to read proofs about real systems. But they were implicitly assuming it would be people who read the proofs. Today we have a better way: let's get algorithms to read the proofs.

So here's how this kind of proving can scale up to real systems. Here's a proof engineer trying to prove a theorem. He's written a proof of it, and don't worry, he didn't have to write it all himself: there's an ecosystem of libraries of existing proof facts, as well as proof techniques, that can be reused. And the proof is actually ASCII source code. It's checked into version control, like on GitHub. That allows a number of developers to collaborate, writing different theorems within a larger development, getting them all to wind up in the same place with their versions tracked. We can plug our version control into continuous integration, which is effectively running over and over again, checking: is this proof still convincing? Is it still convincing? Every time there's a change. And how do we know it's convincing? We have an automatic proof checker that reads the source code of our proofs and applies some straightforward rules to validate that they really prove the facts they claim to prove. And by the way, one risk here is that some of these developers prove the wrong theorems. So we do some code review, where each proof engineer might read the theorem statements written by the others and make sure that we're really proving what we set out to prove.
So, probably some of you are thinking: this picture looks just like the way software development works today in the real world. And that is very true. It's also true that we already have all these tools for doing this kind of development for systems with correctness proofs, and I will try to demonstrate that in the rest of this talk. So this future is more or less here today, though there are a lot more rough edges on the proof versions of some of these tools than on the traditional programming versions, and there's a lot of opportunity to get involved with improving the tools.

The key system we're going to use here is called a proof assistant. It's a software package that is basically an IDE for stating and proving mathematical theorems, where you have to do manual work to write the proofs of the theorems as ASCII source code, but then they are checked automatically, just like we're used to with, say, type checkers in programming. There are a number of proof assistants that have been developed over the last few decades and that are pretty usable today. One of them is called Isabelle/HOL. A large part of that development was done here in Germany, at TU Munich. Probably its best-known application is the L4.verified project, which produced a correctness proof of an operating system kernel. Another proof assistant is called Coq, developed at Inria in France. One of its best-known applications is CompCert, which is a C compiler with a proof of correctness bridging the gap from the C code down to the assembly code that is produced. I'm going to focus on the Coq system in this presentation, because that's what I use in most of my work.

So how does this process work, where we have a sort of collaboration between the human and the machine to come to a convincing proof? Well, the human user is repeatedly making suggestions to the automatic engine, basically saying: I think the following strategy ought to work for the next step of this proof. And then the engine spins for a while and says: OK, if we do that, then here's what's left to prove; please tell me what to do here. And the library of steps, or tactics, is not fixed. We can plug new libraries into the tactic engine, introducing higher and higher level abstractions of what constitutes a valid piece of an argument. In the end, after we've finished this cycle and there is nothing left to prove, the tactic engine outputs what's called a proof term, which is essentially a program in a small language with just a few reasoning steps built into it. And in fact, everything I've shown you up to this point is untrusted, in the sense of trusted computing bases and security. The only thing we need to trust, the only place where a bug in the infrastructure could lead us to accept an incorrect proof, is a separate, relatively small proof checker that only looks at the proof term, ignores the way it was produced, and checks whether it actually establishes the theorem we were interested in.

So what are those primitive deduction steps that the small proof checker is implementing? I'll just give you one classic example of a kind of step which is more or less built into systems like Coq. This is the modus ponens rule. It says: if you know P implies Q, and you know P, then you're allowed to deduce Q. So imagine a set of roughly 10 rules like this that we write out in ASCII syntax for the core proof terms of the system. And this is actually a language inspired by functional programming. It's a lot like OCaml.
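For a taste of that proof-term language, here is what modus ponens looks like as ordinary Coq source. This is a hedged reconstruction for illustration, not a slide from the talk:

```coq
(* Modus ponens as a proof term: evidence for "P implies Q" is a function
   from proofs of P to proofs of Q, so applying it to a proof of P yields
   a proof of Q. The proof checker validates this just by type checking. *)
Definition modus_ponens (P Q : Prop) (pq : P -> Q) (p : P) : Q := pq p.
```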
In fact, Caml was originally created to implement Coq, back in the '80s. And so we have lines like this that are essentially chaining together other proof steps, and we have little applications like that modus ponens at the end, saying: here's the deduction rule we use here. And believe it or not, these chains of steps are enough to reproduce all of the results from math and computer science and all sorts of other logic-based fields. And it's easy to write a relatively small, trustworthy checker for applications of these rules that will apply to all those different domains.

So let me make this a bit more concrete and show you an example of a proof or two in Coq. I'm going to be working with this code in Emacs; there's an Emacs mode for Coq authoring, of course. Coq is based on a functional programming language, a lot like Haskell or OCaml, for those of you who know those. We can do things like write this expression, which is a list of natural numbers: 5, 13, 89. I run a Check command, press some keys, and I get this highlighting in this region, which indicates where the processing of the proof assistant has moved up to. And it tells me this is a list of natural numbers. I can make a list of Booleans, and I can use this alternative notation for lists, with an infix operator, colon-colon, for adding a new element to the front of a list. So this is the list consisting of 5, 13, 89, with an empty list here at the end; it's another way of writing the same thing. And I've also written a recursive function here called length, which for any type A takes in a list of type A and returns a natural number. We do pattern matching and say: an empty list maps to the number zero, and a non-empty list maps to one plus the length of the tail of the list, where I'm assigning the name ls' to the tail. I don't have time to go over all the classic ideas from functional programming, like in Haskell and ML; hopefully this will make just enough sense for people who aren't used to seeing those, but the notation is very similar to those languages.

So let's prove a theorem. Well, first I'll check: what's the length of this list? It's three, good. And I've also defined another classic function that takes two lists and essentially concatenates them, making a new list that is first the elements of the first list, then the elements of the second list. You would just write this as plus for lists in Python, for instance. Here I'm calling it append, and I can append one, two with three, four, five and get one, two, three, four, five down here. So you can already see some of the elements that led me to call this an IDE. We're getting constant feedback as we go, much like the ominous red squigglies in Eclipse when you make a mistake in your code. We'll have those coming up shortly when we try to do some proofs, and we get this feedback on the computational behavior of what we're defining.

So here's a theorem: for any type A, and for two lists of that type, if I append them and then take the length, that's the same as taking the sum of their individual lengths. There's this red background on Admitted, telling us this is a fact we haven't proved yet. So let's prove it, so we don't have to look at that red anymore. I'm going to say: my first step is, I'm going to prove this by induction on the variable ls1. What this is going to do is create two new sub-goals. Here's our current goal, just repeated from up here. When I run this, and give us more space there, there are now two sub-goals: a base case and an inductive case.
In the base case, ls1 was replaced with an empty list, and in the inductive case, ls1 was replaced with an arbitrary non-empty list with some head element and some tail. So let's start proving the base case. One thing I'll do: I notice the goal here starts with a for-all, so I'll run a step called intro that's going to take away the for-all and instead give us a local variable called ls2. Here it is. And then I notice this looks pretty obvious, just from the definition of append. So let's ask Coq to do some algebraic simplification with that definition, essentially running parts of the program at compile time, if you will. And now this looks pretty clear: the same thing is on both sides of the equality, so I can say reflexivity. And that case is proved. Let's go into the inductive case. I'll start the same way: I'll get rid of that for-all down below here, and simplify. Now I notice there's this term here, length of an append, blah, blah, blah, which is actually also present up here. What appears here? In general, above the double line are our hypotheses, known facts that we're allowed to appeal to. And here we have an induction hypothesis, which is essentially the theorem restated, but applying specifically to the tail of the current list. So I can say: hey, let's take that known equality and use it to rewrite. We'll match this with this, and therefore we'll be able to replace it with this. So we run that. And now reflexivity applies. Very good. And then, the most exciting moment of any proof: in Coq you write Qed, and process that. If the system doesn't complain, this is the equivalent of typing your code into Eclipse and not getting any red squigglies under it. This is a correct proof; Coq is convinced that the theorem is true. Thank you.

One more example, to show how we reuse lemmas in proofs. This is a reverse function for lists that takes in a list and steps down it, building a new list, using the append function to repeatedly add elements that were on the front onto the end. So reverse of one, two, three is three, two, one. And then we can quickly prove length_rev: I want to show that the length does not change when you reverse a list. I'll do this by induction on the list I'm reversing. The base case is pretty simple here: I'll use algebraic simplification, and then that's obviously true. And let's see what happens in this case. Well, here I see I have a length of an append that doesn't directly match my induction hypothesis, but luckily I proved something called length_append, or append_length, I don't remember, we'll find out. Length_append, which replaces length of an append with the sum of the lengths. So we'll play that again. We have the length of the rev plus the length of the singleton list. Then I can simplify this. And now we're very close: I can rewrite with IHls. By the way, this capital S is the plus-one function; it stands for successor. So this is pretty obviously true, though capital S is not literally defined as plus one, it's a little different. But I can call a procedure called omega, which is a solver for what's called linear arithmetic, and it knows how to prove all sorts of numerical facts like that. So that was an example primarily of using a fact we already proved within a larger proof. And this allows us to build up to proofs that cover pretty significant systems and their correctness properties.
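Here is roughly what that demo looks like as Coq source, reconstructed after the fact; the exact identifiers on screen may have differed, and lia stands in for the older omega tactic mentioned in the talk:

```coq
Require Import Arith Lia List.
Import ListNotations.

(* Length of a list, by structural recursion. *)
Fixpoint length {A : Type} (ls : list A) : nat :=
  match ls with
  | [] => 0
  | _ :: ls' => 1 + length ls'
  end.

(* Concatenation of two lists. *)
Fixpoint append {A : Type} (ls1 ls2 : list A) : list A :=
  match ls1 with
  | [] => ls2
  | x :: ls1' => x :: append ls1' ls2
  end.

Theorem length_append : forall (A : Type) (ls1 ls2 : list A),
  length (append ls1 ls2) = length ls1 + length ls2.
Proof.
  induction ls1; intros; simpl.
  - reflexivity.                  (* base case: ls1 = [] *)
  - rewrite IHls1. reflexivity.   (* inductive case: rewrite with the IH *)
Qed.

(* Reversal, by repeatedly appending head elements onto the end. *)
Fixpoint rev {A : Type} (ls : list A) : list A :=
  match ls with
  | [] => []
  | x :: ls' => append (rev ls') [x]
  end.

Theorem length_rev : forall (A : Type) (ls : list A),
  length (rev ls) = length ls.
Proof.
  induction ls; simpl.
  - reflexivity.
  - rewrite length_append. simpl. rewrite IHls. lia.
Qed.
```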
One last example, which I will not go through in much detail. I just want to show that even for programs you can write in one page of code, it's not very straightforward, naively, to establish their correctness. I've defined a language of abstract syntax trees for expressions with plus and times, as well as constants and variables. I've defined a recursive function to evaluate those expressions to numbers, given values for all the variables. And I've defined a medium-complexity optimizer that finds parts of the arithmetic that we can simplify at compile time. This is a classic optimization called constant folding. I won't dwell on the code there. I just want to show: here I started trying to prove the correctness, namely that when you constant-fold and then interpret, you get the same answer as if you interpreted the original program in the same variable environment, xs. And then I started a proof by induction. The first case was easy, the second case was easy, but then I started needing to do case analysis on which constructor was used to build particular terms in the abstract syntax tree type. And then I needed to do a sub-case analysis inside that one. Then I needed to do a sub-case analysis inside that one. And I'm typing the same thing over and over again. A lot of stuff is going on here, but it's very repetitive. And at some point I just gave up and wrote: oh, that's not gonna work, too hard to write that manually. So keep that in mind. For now, this is some reason to be skeptical that this approach will scale to full-scale, realistic, deployed systems, if we have to work this hard on a program that fits on one page. But I will come back to that.

All right, so having concretized things about the use of proof assistants, you might have a few questions, and I'm going to try to answer the standard ones now. One is: OK, we know it's pretty hard for programmers to get the code right; despite all these techniques that we have, there are still frequently high-cost bugs in the wild. Is it really that much better to make programmers get the specs right? For many systems, we have trouble imagining how an unambiguous specification would be much shorter or simpler than the original code. One answer is: I would claim verification of this kind is most worthwhile when we apply it to the most commonly used system infrastructure that sits underneath everything else that we run: processors, operating systems, compilers, databases, and so on. First, they're the foundation for everything else we do, and second, they do turn out to have much shorter specifications than implementations. For instance, for a compiler, you can have a specification that doesn't say anything about the particular optimizations the compiler uses, even though those are the most complex parts of it and the easiest places to introduce bugs. In my experience, when we think about what makes a spec a lot shorter than the original program: if we take a spec that covers the functional behavior of a system, and then we add in the optimizations that make the system fast enough, use little enough memory, and so on, then we recover the original implementation. So to the extent there are a lot of optimizations in the system, there's a real opportunity to write a spec that is much clearer than the original system.
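Coming back to that one-page optimizer example for a moment: its shape, reconstructed and simplified here as a hedged sketch (the demo's actual definitions differ in detail), is roughly this:

```coq
Require Import Arith.

(* Abstract syntax trees for expressions: constants, variables, plus, times.
   Variables are named by numbers here, purely for simplicity. *)
Inductive exp : Set :=
| Const : nat -> exp
| Var : nat -> exp
| Plus : exp -> exp -> exp
| Times : exp -> exp -> exp.

(* Evaluate an expression, given values xs for all variables. *)
Fixpoint eval (xs : nat -> nat) (e : exp) : nat :=
  match e with
  | Const n => n
  | Var x => xs x
  | Plus e1 e2 => eval xs e1 + eval xs e2
  | Times e1 e2 => eval xs e1 * eval xs e2
  end.

(* Constant folding: where both operands simplify to constants, compute
   the operation at compile time. *)
Fixpoint cfold (e : exp) : exp :=
  match e with
  | Plus e1 e2 =>
      match cfold e1, cfold e2 with
      | Const n1, Const n2 => Const (n1 + n2)
      | e1', e2' => Plus e1' e2'
      end
  | Times e1 e2 =>
      match cfold e1, cfold e2 with
      | Const n1, Const n2 => Const (n1 * n2)
      | e1', e2' => Times e1' e2'
      end
  | _ => e
  end.

(* The correctness statement: folding preserves meaning. *)
Theorem cfold_correct : forall xs e, eval xs (cfold e) = eval xs e.
Proof.
  induction e; simpl.
  (* ...and here the naive approach drowns in nested case analyses on the
     results of cfold e1 and cfold e2, as described above. We give up for
     now and return to this theorem in the automation discussion below. *)
Abort.
```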
But still, writing specs is essentially programming, just in a different language, and it's still hard to get right. So let's take an example to think through what might be the biggest challenge. Imagine we have a compiler whose correctness theorem essentially says: the compiler bridges the gap between the semantics of the source language and the target language. For this purpose, let's say the target language is machine language. And imagine each of these semantics (when I write the word semantics, it really means a reference interpreter or something like that, though usually it's written in logical notation). I only wrote the compiler because I wanted to use it to compile some applications, so imagine each application has its own specification, and then the application theorem kind of snaps together with the compiler theorem through the semantics of the compiler's source language. And down below I have my processor, which implements the machine-language semantics on top of, say, the HDL semantics, or the semantics for whatever language the processor is written in. So now think of all this together as one composite verified unit. The theorems I've proved for the different parts all compose with each other to create the abstraction of a thing that sits on a particular circuit board and behaves in this application-specific way.

The really neat thing about this style of development is that, first, none of the code we wrote is trusted anymore. It's all internal details inside this big block, and it's all covered by the proofs that we've done. Second, even the internal specifications are no longer trusted. They've become encapsulated details of the system and its proof. So a mistake in, say, the semantics of the C programming language, which you could imagine is pretty easy to make if you're trying to write out the meaning of C and not miss any cases, does not allow you to accidentally accept an incorrect system in this style, as long as you keep that specification internal to the overall functionality that you're building.

So let's contrast this with what we have to do to convince ourselves in the current standard software development approach. We have to run not just unit tests for our individual libraries, but also full-system integration tests, because as we compose libraries and modules together, the state space of the system grows exponentially, and therefore it's much easier to have complete-feeling coverage for individual functions than for full systems. In contrast, with the proof assistant approach, the system integration theorems that we prove actually imply that all the components inside the box are doing what they need to do to make the full system work, and we really, fundamentally don't need the equivalent of unit tests in the central role they have today. In the old way, we have to do careful code review of all of our components, because a corner-case bug in any one of them could completely destroy our guarantees for the whole system. In contrast, with the proof-based style, we only need to do careful code review on the externally facing specifications. In fact, in theory you could even accept a component from your worst enemy: as long as it was proved to meet the appropriate specification, you could run it without even inspecting the code. And this is potentially a way to support, say, downloading applications from untrusted parties and doing proof checking to guarantee that they meet particular security policies.
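To make that snapping-together concrete, here is a toy sketch of the composition in Coq. Every name in it is illustrative, invented for this example rather than taken from any real development:

```coq
Section ComposedSystem.
  Variables source target behavior : Type.
  Variable srcSem : source -> behavior.  (* "semantics" = reference interpreter *)
  Variable tgtSem : target -> behavior.
  Variable compile : source -> target.

  (* The compiler theorem: compilation preserves behavior. *)
  Hypothesis compile_ok : forall p, tgtSem (compile p) = srcSem p.

  Variable app : source.
  Variable spec : behavior -> Prop.
  (* The application theorem, proved against the source-language semantics. *)
  Hypothesis app_ok : spec (srcSem app).

  (* The composite theorem: the compiled, machine-level artifact meets the
     application spec. The source semantics is now an internal detail; a
     mistake in it could make this theorem unprovable, but never falsely
     provable. *)
  Theorem system_ok : spec (tgtSem (compile app)).
  Proof. rewrite compile_ok. exact app_ok. Qed.
End ComposedSystem.
```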
So, another concern: I showed that as I tried to write out that proof for a medium-sized program, modeled after a compiler optimization, it got out of hand quickly and I ran out of patience. Well, we're familiar with this scenario. Imagine it's 1950 or so and someone says: writing machine-code programs sure is a lot of work; this is never gonna be very popular. Today we know we have compilers that go from successively higher levels of abstraction and automatically write that code for us. And by the way, we also have libraries at each of these language levels that we can typically reuse for most of the work, say, if you're putting together a web application with a popular framework; we don't expect to rewrite everything each time. In fact, the same story applies in the world of machine-checked theorem proving, where we can build up layers of automatic proof generators, and we can build up libraries of lemmas that we appeal to for commonly useful facts. And this is the secret to building up an ecosystem of tools that makes it relatively low-cost to create a new development, with a proof, that is roughly similar to one you've done previously.

So let me now go back in and make those proofs I wrote before a little more appetizing. I'll restart this one. I wrote this proof out here with some very manual steps, but it turns out I can just do this: I can use a semicolon to say, take this, run this step, a bunch of new sub-goals will exist afterward, then run this one on all of them, and then run this one on all of them. The next one has a kind of funny name: intuition. It actually refers to intuitionistic logic, though people typically experience it in a different way at first; that's not what it means. And then that just proves everything. And let's see if we can get lucky with the same thing on the next proof. Oops, now the variable is called ls. And it almost worked. What we're missing is, recall, down here I mentioned a lemma we already proved. Coq hadn't been told that it should consider that lemma, so it didn't try. So what if I say two steps? I'm going to take the lemma we already proved, it's called length_append, and I'm going to say: hey, remember that lemma I proved? That'll be useful; try to use it as a rewriting rule. I can say autorewrite with core. And now it's applied here, and we're almost done. I just need to say these three steps should be done over and over again until the theorem is proved. So this is an example of a loop here. This is actually a Turing-complete scripting language, very different from the primitive proof-term language with just 10 or so steps in it. We can build up whatever abstractions we need here, with loops and recursion and data structures. So what we're really doing is writing scripts that find the proofs for us. A particular script is designed to be readable by humans, but just because you can read the script doesn't mean you understand every precise detail of the low-level proof it's generating. And it would generally be completely overwhelming to have to understand those details. And a nice thing about writing scripts like this is that often, if you change the system you're verifying, and if you wrote the script well, it will keep working. That's a partial answer to the concerns from that 1979 article about proofs keeping up with changes to systems. So it's an engineering problem, just like good library design: we want to design good proofs that are adaptable to many similar situations. We don't have to give up any formal assurance if we do this properly.
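In code, automated versions of the two earlier lemmas might look roughly like this, building on the reconstructed definitions above; the demo's exact tactic chains may have differed:

```coq
(* One compressed script: each semicolon runs the next step on all the
   sub-goals the previous step generated. *)
Theorem length_append_again : forall (A : Type) (ls1 ls2 : list A),
  length (append ls1 ls2) = length ls1 + length ls2.
Proof.
  induction ls1; intros; simpl; try rewrite IHls1; intuition.
Qed.

(* Register the earlier lemma as a rewrite rule in the core database, so
   that automation knows to consider it. *)
Hint Rewrite length_append : core.

Theorem length_rev_again : forall (A : Type) (ls : list A),
  length (rev ls) = length ls.
Proof.
  induction ls; simpl; autorewrite with core; simpl;
    try rewrite IHls; lia.
Qed.
```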
Okay, so here's that example which I didn't have time to explain in detail; I just want to show how short this proof can be. I've proved some lemmas before; I won't even need those. What we can do here: induction e. Let's see: simpl, intuition. A few cases are left. What I see when I look at this is that the program contains a program-level pattern match over a term, here. That's essentially a case split in the program, and we should mirror it with a case split in the proof. So let's try that. Match goal is a construct for examining what we're trying to prove and doing something different based on what we see. What I'll do is look for any pattern match (this is the syntax for matching any pattern match inside the goal), do a case analysis on its scrutinee, and then simplify things afterward. And I'll say: do this over and over again on all the sub-goals we generated. Now we're very close to done. We have just a bunch of arithmetic facts. And here we have the evaluation function applied to a particular term which, given its definition, take my word for it, can be simplified. So if I say simplify, not just below the line but everywhere, those will go to simpler forms as well. And now I notice I have this form where a bunch of things are equal to evaluation results. I want to use those equalities to rewrite down below here. So let me do that as well: repeatedly look for a hypothesis that says something is equal to the evaluation of something, rewrite with it, and then get rid of it. So now all those are gone. We get simple-looking things like n equals zero plus n. And it turns out, I have how many of these? It says I have 53 of these facts, a whole bunch of not-too-interesting things with zeros and pluses and everything. It turns out all of these follow from the axioms of the algebraic structure called a semiring, which the natural numbers I'm using here belong to. So I can just say: prove that by ring. And that is a wrap for that proof. So, a cool thing about this kind of proof: imagine I added a completely new kind of node to my syntax trees, like I got really ambitious and added subtraction. This same proof would pretty much work without any changes, and even more ambitious changes to the language or the optimization could lead to similar outcomes. So this, essentially, is how proof authoring scales to large systems and to rapidly changing systems.

So, back to these slides. This is the underlying technology for a bunch of different cool things we can do with proof assistants. The traditional mode, math proofs: the user has some sort of theorem in mind, writes a proof of it, passes it off to the proof checker, and we make sure that it's convincing. We can do the same thing for programs, where we start with an implementation, prove a theorem about it, and check that the proof is valid; that's called verification. There's also kind of the other direction, called synthesis, where we don't even have the artifact to begin with; we just have a wish list, a specification that says: I sure wish I had a foobar that did baz. And then we actually pass that specification into some automated system that builds the implementation for us, and builds the proof of the implementation at the same time, and we can check that proof without having to trust the synthesis engine that did the work for us. Both of these can often be useful, and they build on the same technology.
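Returning to the constant-folding theorem abandoned earlier: here is one compact script in this spirit. It is not the demo's exact chain (that one used match goal and ring), but a similarly short one that works for the sketch above:

```coq
(* The repetitive case analysis we gave up on, generated and discharged
   automatically: auto closes the constant and variable cases, and the
   rest is case analysis on the results of the recursive cfold calls. *)
Theorem cfold_correct : forall xs e, eval xs (cfold e) = eval xs e.
Proof.
  induction e; simpl; auto;
    rewrite <- IHe1; rewrite <- IHe2;
    destruct (cfold e1); destruct (cfold e2); simpl; auto.
Qed.
```

And for a tiny flavor of the synthesis direction: in Coq you can ask for an implementation bundled together with its correctness proof as a single object. A toy example, with an invented doubling spec:

```coq
Require Import Lia.

(* "Synthesis" in miniature: the goal asks for a function f together with
   a proof that f meets the spec; the script supplies both halves. *)
Definition double_impl : { f : nat -> nat | forall n, f n = n + n }.
Proof.
  exists (fun n => 2 * n).   (* the implementation we provide *)
  intro n. cbn. lia.         (* the proof that it meets the spec *)
Defined.
```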
I've also, so far, only shown examples that write programs in the functional language built into the Coq system, but it's such a rich logic that we can encode pretty much any other language you can think of: we can write down its syntax and semantics. So here are some examples of languages that developments in Coq have worked with, by formalizing exactly what these languages are, going all the way from high-level database query languages down to hardware description languages. We can define these, compose them, do all sorts of cool things. We can also use a wide variety of tools to construct our programs and our proofs. We can use compilers between languages, we can build automated provers that live within the Coq ecosystem and generate Coq proofs, we can build synthesis tools. The key thing is that every tool we introduce in this ecosystem needs to connect to a common format, just like in a regular software ecosystem, where everything needs to somehow connect to a common binary object format; here, everything needs to connect to a common proof format. The most straightforward way of doing that is to take each of these pieces, say some arrow that does translation, and prove that the translator itself is correct: it always maps inputs to appropriate outputs. But another approach is to create a proof-generating piece that, for instance, takes in an input and produces not just the output but also a proof that that particular output is correct. For different circumstances it's more appropriate to use one or the other of these two styles. Luckily, the same proof checker can be used with both of them.

So to get more concrete, let me tell you about two projects I've been working on recently. The first one is called Kami. It's all about proof support for digital hardware, and it's joint work with a number of other people at MIT, listed here. Basically the story is: let's imagine that we want to support rapid open-source development of new digital hardware systems by mashing up components from libraries. So, for instance, you might pull some processors out of one library and a cache-coherent memory system out of another; maybe we write our own new accelerators for a particular purpose. And by the way, all these boxes have proofs. So when you assemble your system, you don't just run it, you also combine the proofs of the pieces into a proof that the full system is correct. And you can potentially know correctness without needing to do any debugging, if you've written your top-level theorem correctly and checked that it follows from the constituent proofs.

What do I mean by correctness? Well, imagine we have a highly optimized implementation of some hardware component, and we've also written a specification, which is essentially the simplest possible way to express, functionally, what that box is supposed to do; we ignore optimization, performance, et cetera. We might call this a reference implementation, not just a spec. And we want to prove what, in the semantics literature, we would call a contextual refinement property. What that really means is: let's invite into the picture an omniscient judge, who's going to very carefully inspect both boxes. He's gonna send them inputs, look at the outputs, and what he's trying to do is find any difference in the way they behave. So if this judge can ever get the implementation to show some externally visible I/O behavior that the spec couldn't possibly generate, then our system loses.
But if, in every possible interaction he could have, anything the left side can do, the right side can also do, then we say it is a correct implementation. In our system, we write these hardware components in the Bluespec high-level hardware language, which makes it particularly straightforward to define exactly what the legal ways to interact with a system are. And importantly, many internal implementation details may not be observed directly, just like you can't peek into the private fields of a Java class. That's important for facilitating optimization without freezing those details into the spec. A few words about the Bluespec language: it's in a kind of object-oriented style, where program modules are objects that have private state and public methods, which are the only ones allowed to access the private state. Methods can call other methods in other objects. Every method call appears to execute atomically, without any interleaving concurrency. And we can summarize any atomic method call with a trace of all the sub-method calls that happen, some coming into this module, some going out to others, with their arguments and return values. Then the property that connects the implementation to the spec basically says: any trace my implementation could ever generate could also be generated by the specification. And this is a nice, modular way of setting things up, because it allows you to link modules, or objects, together. As you do that, certain methods become private; they stop appearing in the traces, and you can hide the details of how your spec is implemented with a particular hierarchy of optimized components.

So let me show you a quick example of some code in the Kami framework. This one's called tutorial. Here is some code for a producer-consumer example, with two hardware nodes and a queue in between them. This is the producer side. It's a module with a particular register in it and a rule that might run on a particular clock cycle: it reads a register, calls a method of another module, et cetera. One thing I want to point out here: Coq has an extensible parser, and in this framework we've taught Coq the syntax of this hardware description language. There's no built-in keyword module, register, rule, any of that. We've defined the syntax and semantics of this hardware language in logic, and that allows us to use it to define systems and do proofs. Here's the consumer definition; it's a lot shorter. Then we define a spec that actually combines the producer and the consumer into one module and avoids using an intermediate queue. And then, in less than 100 lines in this file, we get down to proving a theorem that the implementation is a correct refinement of the specification. And, take my word for it, all this can be scaled up to pipelined processors and cache-coherent memory systems; we've done all those examples. We prove those individually, then compose them, and compose their proofs at the same time. And it is quite general.

I also want to mention one application area we've been working on for this technology. Some of you might have heard of the RISC-V instruction set; I think there was another verification talk about it on the first day of the Congress. And I'm wearing a RISC-V T-shirt here. On the back it says: instruction sets want to be free. RISC-V is basically the Linux of hardware instruction sets.
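To pin down that trace-based refinement property, here is a minimal sketch of the shape of the statement. The types and names are illustrative stand-ins, much simpler than Kami's actual definitions:

```coq
Require Import List.

Section TraceRefinement.
  (* An event is a labeled method call, with arguments and return values;
     its structure is left abstract here. *)
  Variable event : Type.
  Definition trace := list event.

  (* A module's semantics, abstractly: the set of traces it can produce. *)
  Variables implCanProduce specCanProduce : trace -> Prop.

  (* Refinement: every trace the implementation can generate, the
     specification could also have generated. *)
  Definition refines : Prop :=
    forall t, implCanProduce t -> specCanProduce t.
End TraceRefinement.
```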
To continue the RISC-V story: if we think of where Linux was in 1995 or so, RISC-V is at a similar point. It is an open instruction set, controlled by a nonprofit foundation in terms of defining what the instructions are and what they do, and it has a lot of opportunities for getting involved, including in formal methods. So we've been working within the RISC-V Foundation to formally define the meaning of the instruction set. We've been verifying components that you can use to assemble your own RISC-V systems with correctness proofs. And we've been working towards support for verifying the correctness of RISC-V machine code, as well as infrastructure for composing those proofs together. There are some nice consequences of this open model that fit really well with formal methods. In the traditional world, the processor manufacturers control the intellectual property that defines the instruction set and the processors, and they actually don't want most developers to know how things really work, because then those developers could potentially build competing processors and hurt the bottom lines of the companies that own the IP. But in the RISC-V open model, we can have a formal semantics at the center of everything that says exactly what the instruction set is. It allows everyone to build their own version of the processor; all of those can be made open source; they can all have theorems associated with them, connected to this formal spec; and others can come along and mash up existing components. It can dramatically lower the cost of getting into hardware development in this kind of space. So we're trying to build that formal semantics to enable this story.

The second case study I want to talk about, to finish up here, is one we've done on cryptographic code, namely elliptic curve cryptography, which is used in TLS and many other settings. This is also joint work with a few folks at MIT. So here's the story. Cryptography, as we know, is really important, and most applications using cryptography just take libraries off the shelf, which were written by a small handful of elite cryptographic implementers. Let's think about how hard they have to work to get everything together here. There's a medium-sized set of cryptographic algorithms for all the primitives we need. It turns out many of these are parameterized by large prime numbers that are the moduli for arithmetic. And there's a medium-sized set of different hardware architectures that we'd like to target. It turns out that in practice, for every choice of one circle from each of these bubbles, an expert spends at least a few days rewriting everything from scratch, possibly in assembly language. This is not a very scalable process, and of course, when doing that sort of thing, we can believe that bugs could be introduced that have serious security consequences. So what if we could have a compiler that automatically does that work when you choose something from each of these bubbles, and produces not just the fast low-level code but also a machine-checkable proof that it truly implements the whiteboard-level math? That's what we're after here. So we're doing crypto verification. One kind of property people often prove about cryptography is high-level security properties, like: if you don't know the secret key, then it will take you more than polynomial time to compute such-and-such. Through protocol verification, we can show that a particular mathematical algorithm meets such a property.
And through an implementation synthesis process, we can then generate low-level, efficient code that follows the mathematical algorithm. Our project is only in that second category; there's been a lot of interesting work in the first category, too. Our system has been adopted by Chrome, through the BoringSSL library, to replace the manually written elliptic-curve finite-field arithmetic they had previously. You can find our implementation of Curve25519 in Chrome version 64, which is in beta now, and the next version of Chrome after that will have our implementation of P-256. Together, these are the curves used in the vast majority of TLS connections.

So, a really quick demo of running this generation process. What we've done is build a library of functional programs that capture the generality of arithmetic in this domain. It turns out these numbers are way too big to fit in hardware registers, so we have to be clever about splitting one number among many registers, and then essentially implement grade-school arithmetic ourselves, with some tricks for performance on popular processors. And we do that using standard functional programming stuff like folds and maps and so on. For instance, here's a definition of multiplication; it uses a flat map and a map. I won't go into details on exactly our representation; this is a kind of simplified version for demo purposes. But we get Coq to process all this and make our way down to the bottom of this file, where I'm going to state the following goal. Assume we have two 10-tuples of integers. It turns out that on 32-bit processors, for Curve25519, you want to represent each number with 10 word-sized digits; that's why these are 10-tuples. And we say: please find an element of this set, the tuples that are correct answers to the problem of multiplying the two inputs mod 2^255 - 19. I run a few steps here that I won't explain in detail, to start the symbolic manipulation. And then the key step is here. This is going to run for about 10 seconds, I think. We're asking Coq to do some correct-by-construction derivation of concrete code for computing this multiplication result. And we get a bunch of output like this. I think this is the last digit of the answer, and some of the arithmetic for computing it. It has a lot of redundant parts to it, so let's use the ring algebraic laws to simplify this a bit more. This one takes a few tens of seconds to run, but what's happening is that the system is automatically computing the code that traditionally has been written by hand for this cryptographic primitive. And it's doing it in a way that doesn't just build the code; it also builds the proof relating it back to the original whiteboard-level modular-arithmetic definition. And this will finish any moment now, and then it will no longer have all these constants; they will be combined into a nicer form. And the output that pops up here is not the end of the story. We also have later phases that lower it into C-like code and do bounds analysis to make sure we've assigned enough bits to each register. That all happens automatically, given the other parameters. And here we have an example of the output. This might look familiar to people who've implemented Curve25519. It's still in a relatively high-level form, without breaking things down into individual temporary variables. But we didn't have to manually figure out, for instance, that there should be a 38 here and a 19 there; all of that was done for us automatically.
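For flavor, here is a drastically simplified sketch of that style of definition: a big number represented as a list of weight-value pairs, with multiplication written as a flat map and a map, plus the whiteboard-level correctness theorem. The real library's representation is considerably richer; all names here are illustrative:

```coq
Require Import List ZArith Lia.
Import ListNotations.
Open Scope Z_scope.

(* A big number as a list of (weight, digit) pairs; its value is the sum
   of the weight-digit products. *)
Definition limbs := list (Z * Z)%type.

Definition value (x : limbs) : Z :=
  fold_right (fun wv acc => fst wv * snd wv + acc) 0 x.

(* Grade-school multiplication: every digit of x times every digit of y,
   written with a flat map and a map, as in the talk's demo. *)
Definition mul (x y : limbs) : limbs :=
  flat_map (fun '(wx, vx) =>
              map (fun '(wy, vy) => (wx * wy, vx * vy)) y) x.

Lemma value_app : forall a b, value (a ++ b) = value a + value b.
Proof.
  induction a as [| [w v] a IH]; simpl; intros; [lia | rewrite IH; lia].
Qed.

Lemma value_scale : forall w v y,
  value (map (fun '(wy, vy) => (w * wy, v * vy)) y) = w * v * value y.
Proof.
  induction y as [| [wy vy] y IH]; simpl; [ring | rewrite IH; ring].
Qed.

(* The fact the whole derivation pipeline is built around: the generated
   code really does compute multiplication. *)
Theorem mul_correct : forall x y, value (mul x y) = value x * value y.
Proof.
  unfold mul; induction x as [| [wx vx] x IH]; simpl; intros.
  - reflexivity.
  - rewrite value_app, value_scale, IH. ring.
Qed.
```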
You can put in different parameters and generate the standard code for any of the other curves, on any of the hardware architectures folks like to use them on. To finish this up: our implementation automatically builds that code from a way of writing down the prime modulus in a suggestive way, like the one here. This is typically how these primes are written: as sums and differences of powers of two, or small multiples of them. We wrote a Python script that uses that to generate a few lines of configuration for our system. And then we scraped the elliptic-curves mailing list at moderncrypto.org to find every prime number that appears in the archives, so we could use them all as inputs for automatically generating code. There were only a few weird prime numbers posted there that are actually very inefficient to implement; those were probably newbies to the list who made a suggestion and were quickly rebuffed. But we generated code for those too, or tried to; sometimes we ran out of memory on those. And we automatically built code for all these cases and compared against the GNU multi-precision library, GMP, which is a very standard off-the-shelf library for big-integer arithmetic. Here are our results: for 64-bit x86 here, and for 32-bit ARM, running on an Android phone, here. Generally, the GNU multi-precision version of things takes two to ten times longer to run one of the elliptic curve operations than our code, which was created completely automatically and also comes with strong guarantees of correctness. So, thanks. I'll just leave you with a few pointers to places to follow up if you'd like to learn more about this subject. The number one place to start: an online book called Software Foundations, written by Benjamin Pierce and others. It introduces Coq assuming only standard undergraduate computer science material, no formal methods or even functional programming experience. I should also advertise our multi-university initiative in the US called the Science of Deep Specification, or DeepSpec. In particular, in summer 2018 we'll be running a summer school that introduces all this stuff, with some hands-on practice with hardware verification and C program verification and a few other topics in Coq. It'll be held in Princeton, New Jersey, and you can subscribe to our mailing list on the website if you'd like to get updates as we pin down more of the details. Here are the GitHub projects for my own projects that I mentioned here, and two of the others that we made code contributions to. Thank you.

We have about 10 minutes for Q&A. Yeah, microphone one.

Thanks for the great talk, I enjoyed it very much. I have a question: what strategies do you have for finding the right theorems? Because I think this is the hardest part here. And you said you do reviews for this, and I think maybe that's not enough. For example, when you have a reverse function and a sorting function, you can have the theorem, the invariant, that the length will always be the same when you execute it. But maybe that's not enough as a theorem. So what are your strategies? Because I think maybe a combination with some concrete example tests that exercise the functions would also be good here; theorems alone are maybe not enough. What are your thoughts on this?

Yeah, so this relates to a number of the frequently-asked-question answers that I had in the middle of the talk. Generally, we build infrastructure to support applications.
And applications tend to have more straightforward specifications, in many cases, than the systems infrastructure below. We try to set things up so that if we got an infrastructure theorem wrong, we will not be able to use it to prove the application result. So in general, one technique is: when you build a verified component, use it in as many different contexts as possible, and don't peek back inside the implementation; only look at the original theorem you chose. By forcing yourself to prove all the larger systems that use your component, you will shake out the bugs in the specifications of the components you're reusing. And you mentioned it might be useful to have a set of test cases as part of a specification for a component. That could be a good strategy also, but it's largely covered by reusing your component in many different settings, where you're naturally, say, calling your function with different arguments, and you need that function to be correct to get the larger system to be correct.

Microphone 3.

Hi, thank you very much for the excellent talk. The specifications of many, many problems also involve time. So you not only have to have the correct answer, you have to have the correct answer before the TCP session times out, the user gets bored, the machine gets rebooted, your deadline is over. How do you write down time specifications in Coq, like: this thing will finish after at most this many milliseconds?

Yeah, I think we'll probably hold off on trying to prove that users don't get bored, for now, but maybe someday. But for the timeout property: we're working in a very general mathematical framework that can essentially encode anything you'd find in a math textbook, and it's pretty easy to define what time is from the program-execution perspective. And there have been projects that, for instance, prove that a C compiler preserves running time between the source code and the assembly code, probably with a naive model at the bottom. I don't have any projects of my own in that vein so far, but a number of people have looked at that. So I think my main answer is: it's a very flexible framework; you can encode whatever you want and prove whatever you want. It's just a question of how much labor is going to be required, and we'll learn more about better abstractions that will reduce that labor over time.

Thank you. We have a question from the Signal Angel.

Yes, maybe somewhat related to the last one. Can proofs help against side-channel attacks, for example by proving that a certain function runs in constant time? Because a main point of manual crypto implementation is to avoid side-channel attacks. Do the automatically created implementations that you presented address this issue?

So, the automatically created implementations I presented are actually in a very restrictive language that enforces constant time by construction. It's all straight-line code; think of it as the performance-critical inner loops of the cryptographic library. So yes, that one does, by construction, provide constant time. Other formal methods projects have done automatic program analysis to guarantee constant-time properties. I think this is an important kind of analysis to do, and it's among the easier ones for formal methods, which is great given the big payoff in security. So I'm optimistic about scaling it up to whatever settings it's needed in.

Microphone 4?

Yeah, so about scaling up: to what extent, for the user... to what extent do you look at human factors?
Because in many cases, that's where things break.

Well, it depends. If you're talking about the humans who are writing the proofs, then that's us; we're very concerned about us, and we make an effort to help ourselves by improving tools. Do you mean the users of the software or hardware that we're creating, in some way that they might be worse off because of this methodology?

Yeah, so, to what extent?

When you're talking about full-system specifications, it's really a question of what you include in your system. Oh, I see, you're asking: do we formally model the user in the theorems? So we'll do things like the top-level theorem of the hardware modeling that it has an input channel and an output channel. We have no idea what's on the other side, but our specification says: if you get this input, you need to give that output, that sort of thing. I don't know of anyone who's been doing anything like cognitively modeling users and what you should expect from them in this kind of setting. That could be worthwhile, though I expect there'd be a lot of disagreement about exactly how to write the theorem down.

Thanks. Microphone 1.

So I'd just like to start by saying that I already agree with everything you said here, so you don't need to actually sell me on any of it. I basically believe that the average industrial programmer is a Dunning-Kruger poster child. And so, anyway, Tony Hoare said that null references were his billion-dollar mistake, and he has this following quote, which is: ten years ago, researchers into formal methods predicted that the programming world would embrace with gratitude every assistance promised by formalization to solve the problems of reliability that arise when programs get large and more safety-critical. Then he says more stuff and concludes, and this is like the most dismal, sad thing I've ever heard: it has turned out that the world just does not suffer significantly from the kind of problem that our research was originally intended to solve. OK. So why do you believe that programmers will accept this now, when they have, for 30 years, developed so many confused epicycles on the ideas from research that do work?

In my experience, at least for some isolated communities of developers, like the authors of the main crypto libraries, they do worry about correctness, and they're very interested in exploring these kinds of techniques. So that's one answer: let's say the very high-assurance domains, where people are especially interested, especially willing to accept certain costs for correctness. The other answer I'd give is that I think we're progressively improving the usability and creating many component libraries that can do most of the work for you for each project, so you don't have to start from scratch. And I think we're heading towards passing some critical point where the ecosystem is big enough that there's a fundamental change in the cost of applying these tools. And if we get there, then I think that'll make the difference for the kind of barrier that Hoare was talking about.

I agree with that, but I also think: my example was null references, and all you need in order to eliminate nulls from a language are parametric polymorphism and sum types. And people still say: oh, you know, nil, it's not a problem, it's all good, I don't make those kinds of mistakes.

Well, I guess we can discuss this later.

There's another question from the internet: how does the halting problem limit what you can prove, and is that a problem in real-life applications?
So, the halting problem applies to arbitrary pieces of code that get thrown down in front of you, where you get asked questions about them. Luckily, the pieces of code we care about were written by people who had understandings of them in mind. And I personally believe that the author of the code should be responsible for distributing a proof along with it. It is then decidable to check whether the proof is convincing, which is the real problem that matters for this domain.

Microphone three.

The baby examples you gave us with proofs: the proofs still looked very verbose to me, even though the examples were very simple. So is more optimization possible, and are people doing that?

There are a wide variety of approaches to proof automation. I actually think of those proofs as pretty short; they're definitely significantly shorter than the programs they apply to. It was maybe one fifth the size of the program, which, if you look at these kinds of proofs from 20 years ago, is a big improvement already, and we can extrapolate from that and hope for even shorter proofs in the future.

Last question, microphone three.

You said that in the semantics of Bluespec, methods execute atomically. So what actually happens if there's a failure during a method?

Well, this is in hardware land, where within a method everything is running just as propagation of signals over wires. So you have to plan ahead and detect the failures with guard conditions at the beginning, and then you just don't look at what's flowing down that part of the circuit.

No, that's it. I'm sorry, I have to cut it here. The next talk will be in 15 minutes. Thanks. Thanks. Thank you.