Welcome back to the post-coffee session. So the next two sessions, which is this session before lunch and the session after lunch, our invited speaker is Sanjit Seshia. He is, of course, a very well-known name in this area. He is a professor in the Department of Electrical Engineering and Computer Science at the University of California, Berkeley. His research interests are in formal methods, dependable and secure computing, with a focus on cyber-physical systems, computer security, machine learning, and robotics. He has made significant contributions to the areas of SMT, SMT-based verification, synthesis, and recently on applying formal methods for verifiable AI and ML. He is a co-author of a widely used textbook on embedded and cyber-physical systems and has several awards to his name; I won't list out all of those. And he's also a Fellow of the IEEE. So we're very glad to have Sanjit with us today. He's going to give two talks on two different but related topics. Thank you, Supratik. Thanks for the kind invitation to come. It's always very nice to be back here at IIT Bombay, which is my alma mater, and also nice to be back at the SAT+SMT winter school, which is now in its fourth year, I think. I remember speaking at the very first edition, and it's very nice to see how this winter school has continued and grown. So as Supratik said, I'm going to be giving two talks. And both of these are going to be more, I would say, on the application side of SAT and SMT. I also made a rather conscious choice to give a bit more of a breadth overview rather than go deep into one thing. So I'll be talking more about applications and giving you a broad overview, but I hope to highlight some of my personal opinions on where I think SAT, SMT, and especially extensions of what we have today, can really have a nice impact. OK, so the first talk, this one, is going to be on a system called UCLID5, which we've been developing at Berkeley, and now between Berkeley and IIT Kanpur, for the last few years. And I'll explain the rationale for building this new modeling and verification system in the talk. But really the crux of it is integrating formal modeling with techniques for algorithmic verification, algorithmic synthesis, and also data-driven learning. And the system was created jointly with a number of people, particularly Pramod Subramanyan, who was a post-doc with me and is now on the faculty at IIT Kanpur. OK, so I want to start by putting up a quote, one sentence from a classic paper in formal methods. Does anyone recognize this? What paper this is? The hint is that it's the very first sentence of a very famous paper. Sorry? The very first one. No, not quite. What's that? Pnueli-Rosner. Pnueli and Rosner? No, although you're getting closer. It's a good guess, though. So you would think, right? It's something to do with synthesis. This is actually the first sentence of the Clarke and Emerson model-checking paper. And the interesting thing is to read this paper: we always think about model checking as a technique for verification, not synthesis, but they were really after a kind of synthesis in that paper. So one of the messages of this talk is that verification and synthesis are really very tightly integrated. And it's not just the connection going back to this particular paper, where you can think of it as starting out with the aim to do synthesis and then ending up with a very nice technique for verification; it's also going the other way.
And this, using verification for synthesis, is a trend that has been especially productive in the last 15 years. So I'm just going to give my personal view; I'm sure there are many other groups that have been working on this. But my personal view is based on the work that my collaborators and I have been doing in what is broadly called program synthesis. In particular, we had a large project called the ExCAPE project in the US, and Kuldip was a student, a graduate of the project. And there, some of the techniques we used were to use verifiers as oracles to answer queries that you use for program synthesis. So one technique is quite widely used today, which is counterexample-guided inductive synthesis, the counterpart of counterexample-guided abstraction refinement, but for program synthesis. And there's another class of problems called syntax-guided synthesis that I will tell you more about, which is actually very close to what our first speaker talked about this morning: using a grammar and having the grammar guide the search for programs, broadly defined. So that's a connection where verification is used and solvers are used for synthesis. So a few trends. First of all, one of the things that has been happening over the last 20 years is that as people try to apply formal methods in industry and on large problems, the first stumbling block is often specifications. Where do the specifications come from? Where do the properties come from? And it's not just the properties you want to verify; it is the properties that you need in order to verify the properties you want to verify, like all the auxiliary invariants and all of that stuff. And specification mining, which is learning specifications from data, has been a big enabler. I have experienced this myself working with automotive companies. The second is inductive synthesis. So the word inductive here is used differently from mathematical induction. It's used in the sense of induction from examples, induction from data, learning from data. And so inductive synthesis is synthesis from examples. This is also a very dominant paradigm in program synthesis today. And it's close to machine learning in a sense, but later on in my talk I'll make a distinction from purely data-driven synthesis. And the third, of course, is data-driven design. And by this, I mean really the use of AI and ML, specifically machine learning components, in other kinds of systems. So all three of these trends have really come together in the last two decades. And one thing that we're seeing now is that we need to build formal systems, so things like model checkers, verifiers, things like that, with a view that you're going to have some kind of inductive learning and synthesis integrated into them. And so in this talk, I'll try to give a flavor of some of the things that I think are interesting in this area and some application domains. But I actually wrote a paper about this almost eight years ago; it was published at DAC, and then a journal version in the Proceedings of the IEEE. So I encourage you to read this, because I don't have the time today to talk about all of it. So broadly speaking, there are three connections here: connections between synthesis, verification, and learning.
And then, as I just mentioned, around 2016 we realized that there was really no formal tool that made all these connections, that allowed you to use synthesis seamlessly to solve verification problems, or was able to integrate machine learning into solving verification tasks and things like that. And so that's really why we sought to develop a new one, which is called UCLID5. So I hope to give you a demo of UCLID5 at the end, but it won't cover all the features. It's open source, and I encourage you to look at it and try it out, and we would love to get feedback. So before I dive into the talk, I just wanted to state what assumptions I'm making about the audience here. I try not to make too many assumptions. Really, I assume that you know what SAT is, what SMT is, and what model checking is; maybe not the how of how these are solved, but the what. So SAT: you're given a Boolean formula over some number of propositions or variables, P1 to Pn, and you're asking, is there an assignment to the Pi's such that the formula evaluates to one, or true? SMT: a similar thing, but now you have a Boolean combination of predicates over some underlying combination of background theories, and the question is, is there an assignment to the variables in those theories that causes the overall formula to evaluate to one? Model checking: there are a couple of different ways to define it. I would say today model checking is defined quite broadly as a collection of algorithmic methods that are all based on some kind of state-space exploration, to verify if a system satisfies a specification, a formal specification. But if you go back to the original Clarke definition, in, for example, the Model Checking textbook, there it's defined very specifically as a technique to check whether a finite-state system is a model of a specification that is given as a temporal logic formula. All right, so I'm assuming that you know what these are. And I will refer to temporal logic formulas and things like that in both of my talks. If you don't know temporal logic, that's OK; I'll explain what the formulas mean as we go. Good, so this is the outline for my talk. I'll first give you some motivation, especially for one of the problem domains that also drove us to create UCLID5. So it's not just the connections between synthesis, verification, and learning; we also came across a new class of problems that we felt were not well served by existing tools. So I'll talk about that, which is the verification of trusted computing platforms. Then I'll talk about the use of synthesis, particularly syntax-guided synthesis, and a flavor of inductive synthesis I'll call formal inductive synthesis. Then I'll tell you about UCLID5, and I'll give you a demo of some of its features, and then we'll conclude. All right, so we'll start with this one. So this is the general area of secure computing. And now that more and more computing is moving to the cloud, we all have this goal of secure remote computation. What does that mean? So imagine that you're the client on the left, and you're using services that are in the cloud, Azure or AWS or something like that. And you have these sorts of objectives. So maybe you're using email that's hosted in the cloud, or you're using a file sharing service hosted in the cloud, and some of your data is secret.
And so you want to have a guarantee that your secret data remains secret, that nobody who is not authorized to touch the data is able to see it. The second is maybe you're using programs running in the cloud and you're paying for them. So a good example, actually a good example for all three of these questions, at least in the US, is tax preparation software. So you're using tax software that is hosted in the cloud. Maybe you've paid for it. So you want to know, first of all, that the private data that you're using for computing taxes remains secret. Secondly, you're paying for this program, so you want to make sure that it's running as it is specified to. And thirdly, you want to know that the program that you're paying for is the one that executed in the cloud; that particular program should be running. So the first question is a question about confidentiality. The second question is one about integrity. And the third question is a question of attestation: you want to know the program running in the cloud is the one that you want. So now that's the broad setup. Those are all the high-level objectives. So what kinds of attacks are possible? So you're using your browser. I'm using, in this case, a simple bank application. And to log into your bank account, you pass your username and password to the server, and it sends back secret data. Imagine the property you're interested in is confidentiality. So a very classic set of problems that people have looked at is network or protocol attacks: there's somebody who can snoop on the network traffic, and you want to make sure that they can't see what's being transmitted that is secret. Another class, also very well studied, is software security problems. So you have vulnerabilities in the software that can be exploited, and you want to make sure those vulnerabilities are absent. More recently, there's been concern about vulnerabilities that are lower in the stack, so in the operating system or the virtual machine; we call these software infrastructure attacks. And then even more recently, there are concerns about hardware, so there can be attacks in the hardware, either at the level of the microarchitecture or even at the circuit level. So these are a very broad class of attacks. And I've shown them on the client side, but they can also happen on the server end. So one of the things that arose to help combat all of these attacks is the use of trusted hardware. The general idea being that software is very hard to fully formally verify and make sure that none of those attacks are possible, let's try to push features that were traditionally implemented in the OS and the hypervisor into hardware, particularly isolation between processes. And so this idea had been known for a while, but about seven years ago, Intel said, we are going to implement this in the next generation of our x86 processors. They call these SGX, Software Guard Extensions. And then they made a big splash about it. They started selling processors implementing SGX, and that's when this really picked up steam. So the whole idea was that you can write an application in a way that you encode the security-critical part in what is called an enclave. So an enclave is a region of memory, including both code and data. And the guarantee the hardware provides is that the enclave is completely isolated from the rest of the computing stack.
So no process that is not authorized to read or write to the enclave can actually do that. This is guaranteed by the hardware. So in the case of Intel, they said, we have formally verified this. Trust us. Everything will be fine. But then there were groups that also said, well, we don't fully trust Intel, because ideally we want to be able to see what is being verified. And so in particular, there was a group at MIT that built a processor called Sanctum based on the open RISC-V ISA. So the general worldview with enclaves is that now you can partition your application into two parts: a relatively small part that goes in the enclave, and then a larger portion, which is the rest of the stuff. So imagine computing with a standard MapReduce type of framework. You have a mapper program and a reducer program. And what you can do is make sure that only a small chunk of that program, even if it's implemented over a big system like Hadoop, is sitting inside the enclave, which will take encrypted key-value pairs and then do the decryption inside the enclave and do the computation inside the enclave. So I'm not going to give a full tutorial here, but the general idea is that now there's a combination of hardware and software that is giving this guarantee of isolation. But enclaves are software themselves. And if people program enclaves in the wrong way, they can be exploited. So that's a challenge. And so there's a verification problem here, which is you want to make sure that enclaves are written in the right way: that outputs are always encrypted, that there are no side channels that leak secrets. I'll say more about this later. And then ideally you want the guarantees at the level of machine code, so that you reduce your trusted computing base and you cut out the compiler. And furthermore, about a year and a half or maybe two years ago, there was big news that certain classes of processors, quite a wide set, can be exploited. In fact, very common features in the microarchitecture that have been implemented and taught for a few decades, like speculative execution, can be exploited. So we had the news of the Spectre and Meltdown bugs. So this is at the level of hardware: you want to make sure that if you have a platform that claims to provide secure computing with enclaves, it is robust to this sort of attack. So that was a quick whirlwind tour of the topic. And so what we started looking at was how can we formally verify trusted enclave platforms to provide this guarantee of secure remote execution? So this is what I'm going to talk about in the next 10 or so minutes. So the idea here is that the user on the left has a program and some secret data that they want to compute on a remote server. And everything in red on this slide is untrusted. So they are sending it over an untrusted channel, and there are other programs and the operating system and the hypervisor, the software stack, which is not trusted. The only thing that is trusted is the enclave itself and the enclave platform. And there is a way, involving cryptography, that the user can make sure that they can send the enclave over encrypted and make sure that it's set up in a way that they know that that's the program that's running on the platform. So let's ignore that part. The questions we want to answer here: a few minutes ago, I gave you informal, natural-language goals of secure remote execution.
But if we want to do verification, we need to be able to formally specify those things and then formally verify them. So the first question is, what does it mean precisely? The second question is, Intel has implemented a bunch of features and they claim secure remote execution; the RISC-V-based platform also implements a different set of features. But what is really required? What is the minimal set of platform features that is required to guarantee this? And then, given these two things, how do you formally verify that a given platform implements secure remote execution? So that's the question that we really set out to answer, and the details are in this paper that is cited. So the first thing is I'll tell you about a formal definition of what secure remote execution is. I'll then show how you can decompose this into three kinds of properties. Then I'll talk about the formal model of this idealized enclave platform, which I'll call the Trusted Abstract Platform, or TAP. And then I'll tell you about our initial attempt at verifying these models using SMT-based approaches. So by the way, stop and ask questions if anything is not clear. At this point, what I'm going to do is describe to you what we did. This is all a set of work that we did without using UCLID5, but it was one of the motivations for us to create that system. So here's what the formal model looks like. What you're seeing on this slide is, think of the lifetime of an application running on a platform. And so what happens here is each dot is a state, and each arrow is a transition. So this is a trace. And the idea here is, at some point, when an enclave is created, we call that a launch. So the region of memory is created, initialized with code and data. And then when you start executing the enclave, there's a part of your browser which is not trusted, so when it starts working on your secret data, it will then enter the enclave. So everything in the box here is when the code from the enclave is running. So there's an enter point and an exit point. And then when it exits, it yields control back to the untrusted host application. And so if you look at the trace of the enclave and other programs running on the system, it's going to look something like this, where there are subsequences of the trace that correspond to the enclave execution. So we're going to take the concatenation of all these boxes, and that will be a trace of the enclave. You look at the set of all possible valid traces of the enclave, and that will be double bracket E. Then another important aspect of doing verification for security properties is you have to model the adversary. You have to create a formal model of the adversary. And so in our formal model, we assume a so-called privileged software adversary. So think about this as a program that can run at the highest privilege level in the software stack. So it's basically running with OS-level privileges, in kernel mode. And this can try to tamper with the enclave by executing any arbitrary operations that are available to it. And it can also observe anything that is written out on a channel that's observable to it. So the way we'll model this is using two functions, a so-called tamper function and an observation function. And you can just think of an observation function, for example, as taking the entire state of the platform and projecting it to the subset of that state that the adversary can observe.
Further, the adversary can run at any point. So now you can see that all of these red arrows correspond to the adversary operations. And we assume an interleaving model of execution where the adversary operations can be interleaved with the enclave and other program operations. And so now this is our definition of secure remote execution. We say that the remote platform securely executes the enclave program if two things hold. The first is that now this enclave program is running on this remote platform with other untrusted software, but we still want the set of enclave traces to be preserved. So any execution trace of E, those boxes, the concatenation of those boxes, on an untrusted platform has to be equivalent to the set of traces that you would get running on a trusted platform where only the enclave was running and there was no adversary. And the second thing is that we specify, via a certain observation function, the part of the state that the adversary can observe, and the adversary's knowledge has to be restricted to that. So that's the definition of secure remote execution. And this definition is a little abstract. It's basically saying that the semantics of the program has to be preserved as if there were no adversary, and further, the adversary's knowledge is restricted to this observation function. But it's not so easy to verify as is. And so what we have is a so-called decomposition theorem, which says that there are three properties that you can decompose this into, which are called measurement, integrity, and confidentiality. And if you have those three properties, that implies SRE. So these are the three properties. The first one is measurement, and informally, it means that you're executing the right enclave. The second is integrity, which means that your adversary can only influence the enclave's execution or the enclave's state through inputs that it can provide. So it can't really change what the enclave does any other way. And the third is confidentiality, which says that the adversary's knowledge is limited to the observation function. So this is still a little informal. Let me give you a slightly more precise definition of one of them. So the first thing I want to mention is that all three of these properties are what are called 2-safety properties. So in the space of properties of systems, there are things called trace properties. A trace property is one where, if you're given a single trace of the system, you can tell whether or not it satisfies the property. But in the case of these kinds of properties, you can't tell with just a single trace. In general, you need some set of traces to tell whether or not the property is satisfied. And a common way to formalize confidentiality is using notions called non-interference-based formalizations of confidentiality, and this is one of those. This one is called observational determinism. Imagine there are two entities executing on the platform: there's the adversary, which is the red arrows, and the enclave, which is the green arrows. And what we want is that the adversary's observations have to be a deterministic function of the adversary's own state and the public outputs that the enclave generates, and nothing else. So what the adversary can observe and compute can only be a deterministic function of its own state and what the enclave outputs publicly.
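Written out, the shape of that property is roughly the following. The notation here is chosen for illustration and is not from the slides: pi_1 and pi_2 are two traces in which the adversary starts from the same state and takes the same actions, out_E denotes the enclave's public outputs, and obs_A denotes the adversary's observations.

```latex
% Observational determinism as a 2-safety property (illustrative notation):
% if the adversary behaves identically in two traces and the enclave's public
% outputs agree step by step, then the adversary's observations agree step by step.
\forall \pi_1, \pi_2 .\;
  \Big( \mathit{sameAdv}(\pi_1, \pi_2) \;\wedge\;
        \forall i .\; \mathit{out}_E(\pi_1[i]) = \mathit{out}_E(\pi_2[i]) \Big)
  \;\Longrightarrow\;
  \forall i .\; \mathit{obs}_A(\pi_1[i]) = \mathit{obs}_A(\pi_2[i])
```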
And so one way to look at that is, imagine you have two traces where the adversary does the same thing. So all the adversary actions here are the same, but the enclave does something different. So here you have enclave E, and there, let's say, we have a different enclave E prime, but they both produce the same public outputs. So this obs label here is indicating that the outputs from E and E prime are the same; what it's doing internally, the internal computation, can be different. And so what we're saying here is that if the adversary is identical but the enclave is possibly different, the adversary's observations will continue to be identical. That is, from the adversary's viewpoint, it cannot tell whether it's running on the platform with E or with E prime. That's what this property is. It's a very similar formalization for the other two. Yeah. So the question is, do we allow the adversary to use randomization? And in practice, the adversary can use randomization, and we just model it as non-determinism in the adversary. But the point is, the adversary in both traces will have to make the same non-deterministic choices. So you're saying, well, what happens if the adversary does different random things? In this particular case, what we want to do is make sure that the adversary's state is controlled for the internal choices that the adversary makes. So from the viewpoint of this property, you're just saying: assume the adversary makes all the same choices. If 90% of the time you are able to attack, then in that case 90% of the random choices would be fine and 10% not. Yeah, so Kuldip is making a good point, which is that in a lot of cases something like this is in general too strong a property, because the notion of confidentiality may be more quantitative: people are okay with some number of bits being revealed. And in this case, you would flag an error even if one bit is being revealed. So for now, let's ignore that aspect; we're using a non-quantitative version of confidentiality. If you wanted to use a quantitative notion of confidentiality, as you know, you'd have to, under the hood, instead of using SAT and SMT as we do, use something like model counting. All right, good. So now, that's the flavor of property we are verifying. So first of all, the takeaway message I wanted from that particular thing is that the flavor of property is not the standard type of safety or liveness property that you're verifying with something like model checking; it is these 2-safety properties. So you need verifiers that can support that. The second is that you want to be able to model the platform. And so the key question is, if you have a secure computing platform like SGX or Sanctum, then what is the set of primitives it should support? And so this is what we created. We created a formal specification for platforms like Intel's SGX that is independent of the specific instruction set architectures. It also includes multiple adversary models. And so you can compare multiple platforms' security guarantees using a common formalism. So this is what the TAP model looks like. First of all, you have to model the abstract state of a CPU and associated memory and other data structures used for keeping track of enclave state.
And then the idea is that the trusted platform is something that exposes a set of operations that applications can invoke. And we came up with a set of about 10 such operations. So the first is you have to model, you have to keep track of, what happens when you do memory operations, because when you do things like a load or store from an address in memory, it matters whether the address is in the enclave or not in the enclave. So it does something to the underlying enclave metadata and so forth; you have to model those things. And then you have to model address translation as well, because that's one of the ways in which these platforms track who is trying to access what region of memory, and then they have appropriate safeguards to make sure that even if the OS is compromised, it can't just change the page table and change the mapping so that it can read secret data. Then you have all these things that I talked about before, like create an enclave, destroy an enclave, enter, exit, pause an enclave and then resume it, and so forth. And there's an operation where you can take a cryptographic hash of the enclave region. So those are the set of operations that we found were enough to provide this guarantee of secure remote execution. And so the TAP model then becomes something like the ISA: it becomes a contract between hardware and software. So for the hardware developers, if they prove that their hardware implementation refines the TAP, then they know that it satisfies secure remote execution. And the software developers can develop their libraries to help people program with enclaves; as long as they provide a set of operations that spans all the operations in the TAP, they know that they're compatible with all the underlying platforms. And we actually have a formal specification, our first formal specification of the TAP, publicly available. This was actually created using the Boogie program verifier from Microsoft Research. So Boogie, for those of you who don't know, is a system for sequential program verification: verifying the kinds of programs that our first speaker talked about, in a language that the Boogie folks have devised, and it uses backend SMT-based approaches to basically check whether the verification conditions hold, and the default backend is Z3. And so this model is in Boogie, and I'll have more to say about this in a bit. The other crucial aspect of the TAP is the adversary model. It's very important to model what the adversary can do and cannot do. So we had three kinds of adversaries here. The first adversary we'll call M, and the idea with adversary M, where M stands for memory, is that this adversary can basically invoke any operation in the Trusted Abstract Platform specification with arbitrary operands, and it can observe anything in the region of memory that's outside the enclave. So it can see all of the memory except the part that's in the enclave. MC is similar in terms of the tamper function; in terms of the observation function, it can also observe the state of the cache. So this is getting into side-channel attacks, where typically there is some mechanism for the attacker to observe which regions of memory have been loaded into the cache, so you can observe certain cache lines. And in this case, we abstract the mechanism away and just say the adversary can directly read the cache.
And then MCP is where the adversary can also observe the page table state. All right, so the first thing is the question: you have a formal model of the TAP, and then you have the secure remote execution properties, which are these three hyperproperties, the three 2-safety properties I talked about. And we were able to prove using Boogie that our TAP specification satisfies all these properties for the first adversary, and also for the second and third, but with some riders. So for the second one, you need the cache sets to be partitioned: the enclave's cache sets cannot collide with the cache sets of the non-enclave programs. And for the most powerful adversary, you also need the enclave's page tables to be private; some mechanism has to be used so that the OS cannot read or write to those page tables. And using those, we were able to show that Intel's SGX version one is secure for the first adversary but not for the other two, and we were able to replicate attacks that at that time had already been known in the literature, but in a formal way. And we can show that Sanctum, which is the MIT RISC-V processor, is secure for all three of these, because it actually does those two things. So this is what the high-level structure of the proof looks like in Boogie. So here you have the property of secure remote execution. We have a TAP model, and we want to prove that it satisfies this. For that we have to be able to prove 2-safety properties, and the way this is done is using something called self-composition. So it's well known that if you have a 2-safety property, you can prove it as a safety property on a system where you make two copies of the program and you run them side by side. I'm going to show you a demo of UCLID5 later where we'll do exactly this, and you can see how it's done. And then we have a model of SGX in Boogie and a model of Sanctum in Boogie, and we prove refinement. And this refinement is done using a standard simulation-based proof using induction. So we basically are saying, well, if you're in a state of the platform which corresponds to a state of the TAP and then you make one step here, then the TAP can simulate that step. So for what adversary would you do this? I mean, which adversary? Yeah, so these are the three adversaries; all these proofs are for these three adversaries. So they were repeated for the first adversary, the second, and the third. Okay, so this was done using Boogie, which is a fairly automated verifier; some of you might have used it. But the effort was significant. The manual effort was significant. And so overall, the number of non-blank, non-comment lines of code was about 9,000. And if you download this model now and you run it, it'll complete in a few minutes. So it's not that hard to complete the proof now, but it took four person-months to get the model to that point. And a lot of the hard work was actually what our first speaker talked about, which is coming up with all these auxiliary invariants. And so I wouldn't say it was an automatic verification effort; it was really working with a very highly automated proof assistant, but still there was a lot of manual effort there.
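To spell out the self-composition step a bit more (the notation here is illustrative, not from the slides): roughly, a 2-safety property of a transition system M becomes an ordinary invariant of the product of two copies of M, stepped side by side over their own copies of the state variables, where P_phi is the relational predicate over the paired states that encodes the property (for observational determinism, equality of the adversary-visible parts of the two states).

```latex
% Self-composition, roughly (illustrative notation):
% M = (Init, Tr) over state s; make two copies over (s_1, s_2) with
% Init(s_1) \wedge Init(s_2) and transitions Tr(s_1, s_1') \wedge Tr(s_2, s_2').
M \;\models\; \varphi_{\text{2-safety}}
\qquad\text{iff}\qquad
M \times M \;\models\; \mathbf{G}\; P_{\varphi}(s_1, s_2)
```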
The other aspect of it that we realized was that it wasn't the right language to use to model this kind of platform, because these platforms involve both hardware and software. There are changes to the hardware design in the hardware description language, there are changes in firmware, and then there are changes to the software layers. And in order to do the proof, you need to model all of these. And so, Boogie is excellent for what it was designed for, which was sequential program verification, but this is not a sequential program; this is a concurrent system. But it's a concurrent system which has software components, so you would like to be able to have procedures and preconditions and postconditions and all that good stuff. And if you look on the other side, if you look at hardware verification tools, tools like NuSMV and so on, they are excellent for hardware verification, for modeling concurrent transition systems, but they are terrible at modeling and verifying sequential programs. So you need to have something that has both of them. The second is actually the point that was made very well this morning, which is that you need a lot more automation in the verification process. So generation of inductive invariants is a big one, but that's not the only thing. You also want to be able to do assume-guarantee contracts for concurrent verification, and then for verifying hyperproperties, which are the types of things that I talked about, you need ways of automating the verification of those things as well. And then finally, we would like the whole verification to be incremental and compositional, because that's the only way to scale. So the next part of my talk will focus more on these two aspects, which is really the use of synthesis to try to automate things. Then I'll come back to UCLID5 and I'll give you a demo and show you how we can combine both modeling sequential programs as well as concurrent systems in the same formal system. So when we talk about verification, even so-called push-button verification like model checking, or program verification using Hoare-style reasoning, there are a lot of things that have to be generated or synthesized. So inductive invariants, clearly, but also abstractions. In practice, you can say, well, I used this abstract domain and I was able to prove it, but how did you come up with that abstract domain? That's a challenge. Auxiliary specifications: if you're doing modular verification, you often have to have the pre/post pairs or function summaries. If you're proving refinement, you need to come up with a simulation relation. If you are doing assume-guarantee reasoning, you need to come up with the environment assumptions. Interpolants, ranking functions, various kinds of lemmas. In fact, even inside SMT solvers there is synthesis going on, in generating lemmas and patterns for quantifier instantiation and so on. So there's a lot of synthesis happening inside SAT, SMT, and verification tools. And so really the effort that we and many others have been making is, how can you automate that? So going back to this picture, and maybe I'm underlining the point, but what is crucial to this is coming up with the right model. The right formal model is really crucial to being able to automate all of this. Okay, so now I'll give you an example.
So this is actually an example, a version of which I think came up in the first talk this morning, but I'm going to model things differently: I'm going to model it as a transition system. So a transition system has state variables; in this case, there are x and y, which are integer variables. The initial state is x is one, y is one. The transition relation is that x is updated to x plus y and y is updated to y plus x, which is the same as x plus y, and that's a simultaneous update: x and y are updated together. The property you want to prove is that always, or globally, y is at least one. And the way you do it by induction is you would prove the base case; well, that checks out. But then you'll prove the inductive step. So you'll say if y is greater than or equal to one and x and y change like this, then y remains greater than or equal to one. And if you try to do this, encode this for your favorite SMT solver, it'll fail, because you don't have enough restrictions on what x is. And so what you need to do now is find the strengthening. So this is a synthesis problem: find the phi such that this holds. And that's one of the phis that works. So again, this point was made very nicely in the first talk today: safety verification can be reduced to inductive synthesis. In fact, the general idea is that a lot of verification tasks can be reduced to synthesis, not just invariants. So the reduction that you see here is: you have a transition system with the initial state and transition relation, you have a safety property, and the verification problem is, does the model satisfy the temporal logic property? The synthesis problem that it transforms to is: can you synthesize a strengthening of the little phi such that the base case holds and the inductive step holds? Okay, but you can also set up another synthesis problem, which is to say, well, I want to do abstraction. So synthesize an abstraction function that maps the set of states to a set of abstract states, such that the abstract model satisfies the property if and only if the true model satisfies the property. And in fact, if you look at counterexample-guided abstraction refinement, it is basically a way of synthesizing the abstraction, or the abstract model, in an iterative fashion. So the point is that the big long list that I had on the earlier slide, of all the artifacts that are synthesized in verification: you can actually formulate the synthesis of all of those in a similar fashion and then use some form of synthesis to generate them. So the particular flavor of synthesis that I'll tell you a little bit more about is called syntax-guided synthesis. This is a problem that some of us in the ExCAPE project formulated, and it just tries to capture what people were already doing. So from the TCS group we heard: you have a grammar, and then you use the grammar, you search through the grammar for candidate invariants, and then you check whether those are really true invariants. In program synthesis, what people were doing was searching through a set of possible template programs and then checking if they satisfy the specification. So syntax-guided synthesis, or SyGuS, is a problem that is designed to capture this sort of thing. So what you do here is you first fix a background theory or combination of theories. You fix the thing you want to synthesize; we'll think of it as a function here.
And for this talk, I'll just think of one function, but in general you can have multiple functions to be synthesized. And then the SyGuS problem is the following. You're given a specification phi, which you can think of as an SMT formula in the combination of EUF, uninterpreted functions, and the theory T. And then you're given a grammar, a context-free grammar, which produces expressions that you will use as substitutions for f: you'll try to synthesize an implementation for f. So the SyGuS problem is the following. You're given phi, you're given T, phi contains f, and you're given the grammar, or the language of the grammar, E. And you want to generate a little e in capital E such that if you replace f by e in phi, the resulting formula is valid in the underlying theories. So the correctness specification phi is your SMT formula and it contains a function f which is treated as uninterpreted. You want to replace that with an implementation, where the implementation is drawn from the language of a grammar, and you want the resulting formula to be valid. So let's look at an example. Let's say the underlying theory T is linear integer arithmetic, the quantifier-free theory, and f is a function, a binary function: it takes two arguments x and y and returns an integer. Here's your specification. So in this case, you say x is at most f(x, y), and y is at most f(x, y), and then f(x, y) is equal to x or y. So what's the function here? Max. So max of x and y. Now let's say this is our grammar. So the grammar is the set of all linear expressions: you can have x, y, an integer constant, and you can create a linear combination. And if you try to synthesize something, you will find that there is no solution. There is no expression from this grammar which, if you plug it in, will make this formula valid. And the key here is that to synthesize max, you need to be able to compare x and y. And so a natural way to fix this is to introduce if-then-else, a conditional construct, and then you can get a solution out. So this is an example of SyGuS. So the UCLID5 verifier that I'm going to show you uses SyGuS to generate things like invariants. What it'll do is take a verification problem, compile that into one or more SyGuS instances, and solve them to generate invariants or other things. How it solves SyGuS is a different point, but the idea is it reduces problems to a sequence of SyGuS problems. So let me now touch very briefly on what it takes to solve SyGuS problems. So one thing that you might ask is, is solving SyGuS the same as solving SMT with quantifiers? And the answer is, well, it depends on your definition of SMT, but certainly if you have quantifiers only over first-order variables, then you can't really do SyGuS using quantified SMT in general. Sometimes you can reduce it, depending on the grammar. So for instance, if your grammar is all linear expressions, then you can do the standard thing where you introduce parameters A, B, and C, and then you turn this into a formula quantified over the coefficients. But even when it's not quantified SMT, there are ways in which quantified SMT problems are solved that can be reused for SyGuS. In fact, the CVC4 SMT solver was turned into a SyGuS solver by doing precisely this: they just reused the techniques, the heuristics, they were using for things like quantifier instantiation.
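Stated in one line (this is just a restatement of the problem as described above, with L(G) for the language generated by the grammar G):

```latex
% SyGuS, restated: given a theory T, a specification \varphi containing the
% unknown function symbol f, and a grammar G, find an expression e drawn from
% L(G) such that substituting e for f makes \varphi valid in T.
\exists\, e \in L(G) \;\;\text{such that}\;\; \models_T \;\forall \vec{x}.\ \varphi[f \mapsto e](\vec{x})
```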
In general, though, and this is an important thing, SyGuS problems are undecidable even for very simple theories that are decidable for SMT, unless the grammar is suitably restricted. So the source of undecidability in SyGuS often comes from the grammar. If you bound the size of expressions, then everything becomes decidable. But if you don't, if you just specify a grammar, then very quickly things become undecidable. We had a paper on this four years ago if you're interested. Okay, so how is SyGuS solved? So this is the connection now to learning. The way SyGuS is typically solved is using a class of learning that I call oracle-guided learning, and in particular, what I'll show now is counterexample-guided learning. So the idea here is you have what is really a learning algorithm. So this is inductive synthesis: it's learning from, synthesizing from, examples. It doesn't know anything about the specification. It gets a set of examples and it has the grammar, and it synthesizes an expression that satisfies the specification on the examples; so it's consistent with the examples. You pass that candidate to a verification oracle, and that oracle has access to the specification. So it checks: does the candidate expression satisfy the spec? And if it does, then you are done; that's where you have success. If it doesn't, it produces a counterexample. And this counterexample is added back into the data set, and then you rerun your learning algorithm. So really, this is the common way in which all SyGuS solvers work, including the one in CVC4: internally, they have a counterexample-guided learning algorithm. As an example, if you take the specification for max that I showed you earlier, and you take your grammar to include the if-then-else operator, then the way this will work is you start with no examples, and then you can come up with any expression from the grammar, say x. And then the verification oracle will say, well, this doesn't work, because if x is zero and y is one, then the max is going to be one and this is going to be zero. So that becomes your first example. Now the learning algorithm has to produce something that will work for (zero, one). Let's say it produces y. Now the oracle will say, well, that doesn't work for x equal to one, y equal to zero. You add that back into the set, and you build up a set of examples, and at some point it comes up with the right expression. So all SyGuS solvers today use this basic approach, and where they differ, for the most part, is inside this box. This approach, by the way, is called counterexample-guided inductive synthesis, and often goes by the acronym CEGIS, something that we came up with in 2006. But there are lots of very nice instantiations of CEGIS for solving SyGuS. One of the first ones was using an enumerative approach, a very simple enumerative approach. Another one was using a SAT-based approach: basically you take your grammar and you encode your productions using Boolean variables to indicate which productions are being used, and then you solve the SAT problem and extract the solution from that. A third one uses stochastic approaches, based on work from Alex Aiken's group on superoptimization, and that was actually the state of the art as of 2013. We are now six years later, and there are lots and lots of alternative approaches, including approaches that actually use machine learning algorithms in the learning box.
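To make the loop concrete, here is a minimal CEGIS sketch for the max example using the z3 Python API. This is not how CVC4 or UCLID5 implement it: the "learner" here simply enumerates a tiny hand-picked list of candidate expressions standing in for a grammar, and the verification oracle is a z3 validity check that returns a counterexample when the candidate fails.

```python
# Minimal CEGIS sketch for the "max" SyGuS example (illustrative only).
# Learner: enumerate candidate expressions (a stand-in for a real grammar).
# Verifier oracle: check the spec with z3; return a counterexample if it fails.
from z3 import Ints, If, And, Or, Not, Solver, sat

x, y = Ints('x y')

# Candidate space: a few expressions over x, y (playing the role of the grammar).
candidates = [
    ('x',               lambda a, b: a),
    ('y',               lambda a, b: b),
    ('x+y',             lambda a, b: a + b),
    ('ite(x>=y, x, y)', lambda a, b: If(a >= b, a, b)),
]

def spec(f):
    # x <= f(x,y)  and  y <= f(x,y)  and  (f(x,y) == x or f(x,y) == y)
    return And(x <= f(x, y), y <= f(x, y), Or(f(x, y) == x, f(x, y) == y))

def verify(f):
    """Oracle: is the spec valid for this candidate? If not, return a counterexample."""
    s = Solver()
    s.add(Not(spec(f)))          # look for a violation of the spec
    if s.check() == sat:
        m = s.model()
        return (m.eval(x, model_completion=True).as_long(),
                m.eval(y, model_completion=True).as_long())
    return None

def consistent(f, examples):
    """Learner-side check: does the candidate satisfy the spec on all examples so far?"""
    for (vx, vy) in examples:
        s = Solver()
        s.add(x == vx, y == vy, Not(spec(f)))
        if s.check() == sat:
            return False
    return True

examples = []
for name, f in candidates:               # learner proposes candidates in order
    if not consistent(f, examples):
        continue
    cex = verify(f)                      # verification oracle
    if cex is None:
        print('synthesized:', name)
        break
    print('candidate', name, 'fails on counterexample', cex)
    examples.append(cex)
```

Run as is, this rejects x, y, and x+y on counterexamples and lands on the if-then-else candidate, which is exactly the walk-through above; real solvers differ in how the learner box searches, not in the overall loop.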
Inside that learning box, people have experimented with neural networks and decision trees and so on. But this, using counterexample-guided synthesis, is only one way to solve these synthesis problems. So more broadly, this class of synthesis is what we characterize as formal inductive synthesis. A few years ago, we wrote this paper where we said that a lot of these uses of synthesis in verification are very different from the kinds of synthesis from examples that people are using elsewhere, in programming by demonstration and so on. And the key point is that you're trying to synthesize from examples while also trying to satisfy a formal specification. So really the paradigm is something like this. You have an oracle that knows the formal specification, you have a learner, and there is an interface through which the learner and the oracle communicate, through queries. And the most common one is what I showed you, where the query is: is the candidate expression correct? That's the counterexample query. But there are a lot more types of queries, and that's what we talk about in this paper. So the generalized version of this is that you're given a class of artifacts; think of that as specified by a grammar if you want. You're given a formal specification. Then you're given a domain of examples from which you can draw, and you're given this thing called the oracle interface. The oracle interface is all the questions that the learner can ask the oracle. And the formal inductive synthesis problem is that you want a learning algorithm that adheres to that interface and, using that interface, finds an artifact that satisfies the specification. But the point here is that you can have queries that are a lot richer, and different types of queries, than just counterexample queries. So I'll just make two points here. One is that it's been shown that you can use this sort of oracle-guided synthesis to generate formal models from implementations. So imagine that you have a large body of code that is either in something like C, or maybe at the hardware level like Verilog or VHDL, and it's impossible, it just won't scale, to verify some property directly on that. However, what I'd like to create is an abstraction of that, a smaller, more compact model. I'd like to synthesize that abstraction from the implementation. How do I do that? And it's been shown that you can use not counterexample-guided synthesis but a different kind of oracle-guided synthesis to generate this. The second thing is this: those of you who may know the literature on query-based learning, so Angluin's algorithm for learning DFAs and things like that, would see a similarity between this picture and what they do there. But the key point of difference is that in things like Angluin's algorithm, the oracle is fixed and you cannot change it, and here you can actually design both sides of this. All right, so much for the detour into synthesis. So now what I'd like to do is talk a little bit about UCLID5 and give you a brief demo. So just to recap, when we did this verification of trusted platforms, we realized two things primarily. One was we needed a better modeling language, for both hardware and software, something which includes both of them. And secondly, you want more automation, which we're trying to achieve through synthesis. Okay, and so that's what led to UCLID5. Some background: UCLID5 is an evolution of an earlier system that we created called UCLID.
So UCLID was actually one of the first SMT solvers and SMT-based verifiers. And it was based on something we call term-level modeling. So the idea here is, really, if you're familiar with hardware model checking, for instance, there you're typically modeling systems using either Booleans or bit vectors. And here you're trying to model using a combination of theories that is available with SMT, and you can do things like bounded model checking and k-induction and checking simulation and so forth. And so the UCLID system is something that we used and maintained on many projects until 2014, at which point we started working on this new domain, and I found that this old system was not really a good match, because it was created for modeling hardware-like systems, hardware model checking, and here you have to have something that models software as well. So what I'd like to share with you is where the new UCLID5 sits in this space, or at least aspirationally. And I want to make a comparison with different types of tools; I just picked a few examples that you may know. So ABC is a model checker from Berkeley, from Bob Brayton's group, for hardware model checking, one of the best hardware model checkers from academia today. nuXmv is an evolution of the NuSMV model checker which uses SMT. Boogie is the one that I told you about earlier. Coq, of course, many of you will know, is an interactive proof assistant. UCLID is the old version, and UCLID5 is the one that we've just created. And what we wanted was really a combination of all of these, which is why the last column is green. So green means we want something that can handle all of these things. So first of all, in order to model or do verification of these kinds of trusted platforms, you need something that doesn't just model things at the bit level or bit-vector level; you want more abstraction. You want a high degree of automation. You want multiple types of verification: not just doing sequential program verification or model checking linear temporal logic properties, you want really a combination of these. You want modularity. We want, importantly, both the ability to model sequential software updates as well as concurrent updates like you have in hardware. And then you want support for generating counterexamples; that is really useful. As many industry folks will attest, generating counterexamples is a really important aspect of a verification tool. And so without going into the details, what we found is that there are tools like Boogie, for instance, that are excellent on the things they were designed for but really poor on some of the other aspects. And so what we found was there was no tool out there that could do all of it, and that's what we are trying to build here. So UCLID5 is a verifier that would look very similar to the kinds of verifiers that you might have used before. So you start with a model, and then it goes through a front-end phase: type checking and instantiation and composition and so forth. It goes through a back-end phase, and then it invokes back-end SMT solvers as well as SyGuS solvers. And then you have all the types that you would have using SMT. And here's the high-level structure of a UCLID5 module, before I go into the demo. So one important aspect of UCLID5 is that it has support both for modeling concurrent systems and for sequential software.
So for the concurrent systems, you can think of those as a composition of modules. So the unit there is a module, and a module will contain a number of things. You can define your own types, you define state variables, inputs, outputs. You can define the initial set of states and the transition relation. You can define a whole bunch of properties. But then you can also define procedures. So procedures are the things that you would have in a language like Boogie: you'd have a procedure with preconditions, postconditions, assumes, asserts, things like that. And then at the end, we have what is called a control block, which is where you write a little proof script, except that you're not really doing it interactively; this is just giving a set of commands to the tool. So what can you do? Everything that Boogie can verify, UCLID5 can verify: all the standard sequential program verification. We can also verify invariants and linear temporal logic properties using bounded model checking and k-induction. We can also do simulation or refinement checking: you have one transition system and another one, and you want to check that one simulates the other. And then we can also check 2-safety hyperproperties like I talked about. We use syntax-guided synthesis to automate as much as we can, and what we've realized along the way is that the current SyGuS solvers are great, but they really are woefully behind what we need in practice. And then, yeah, with respect to the original UCLID, it subsumes everything that it could do. So let me start giving you an actual demo of what it looks like, and then if there is time, I'll tell you a little bit about what we did for verifying the absence of Spectre/Meltdown vulnerabilities. So first let me show you some code. So this is an example of a very simple UCLID5 model. It's actually a simplified version, a highly simplified version, of the kind of problem that we have in verifying whether an enclave platform satisfies the desired properties. Okay, so in this case, what I'm showing you is: in UCLID5, of course, you can have a hierarchical structure with files and so on, but here we have everything in one file. So this first module here is something where we define all the types that we're going to use elsewhere. So the point I want to make here is you can define your own types, you can define words, you can define abstract data types like the address that you see up there on line six. Then you can define memory types: on line 17 you'll see you can define a memory as a function from addresses to data words. You can define uninterpreted functions, and you can define an axiom over an uninterpreted function. So for those of you who are users of Boogie, all of this stuff will be quite familiar, because those are the kinds of things that you can do in a language like Boogie. Then the next thing we have is a model of a very simple CPU; I'll come back to this at the end. What I want to show you is the top-level module. So this is the main module. What we are doing here is: there's a CPU module that I'll show you later, and we're trying to prove a 2-safety property. And while UCLID5 has support for proving this directly, without me doing an explicit self-composition, I wanted to show you how you do the self-composition. So here I create two instances of the CPU module, but I initialize them with different instruction memories.
The kind of property I'm proving is this: I have a CPU, this very simple, simplified trusted platform, and there's going to be a region of memory that is the enclave region. So let me go to the specification. Look at this invariant: it says that the CPU memory in isolated mode, think of that as enclave mode, has to be identical no matter what the adversary does; the adversary cannot touch it. Now, you have two CPUs where the adversary can do different things, because the instruction memory is different. But as long as an address is in the isolated range, which is what this uninterpreted function captures, that is, if address A is between the low end and the high end of the range, then the data memory of CPU one at that address has to be the same as the data memory of CPU two; the contents at that address have to be identical. So these two properties are invariants in the sense of being globally true: they say that even if the adversary does different things, the contents of those memory locations cannot differ. What we're really doing here is taking these two copies of the CPU, putting them together, composing them synchronously, and then checking whether this model satisfies a property of the form globally P. So it boils down to a standard safety property check, and we're going to do it by induction. There's nothing new there, but in terms of syntax, you define the initial state, and you'll see the initial state here is defined by a bunch of assumes. Everything is symbolic, using uninterpreted functions and symbolic constants over the underlying theories, and all we are saying is that CPU one and CPU two have the same protected ranges of memory and the initial state satisfies the invariant we want to prove. And then down here, look at the next block: the next block is the transition relation. The next state of the overall system is obtained by stepping both CPU one and CPU two; next of CPU one says CPU one makes a step, next of CPU two says CPU two makes a step, and despite the semicolon between them they really happen synchronously. Then there is an auxiliary assumption that is needed to make the proof go through. And if I go down to the control block, what this says is: use induction. By default this is one-step induction; if you give it an argument K, it does k-induction. That command creates the verification object V, the next one invokes the SMT solver, this one prints the results, and then you can print the counterexample and project it onto a subset of variables, so you can say: I don't want to see the counterexample on everything, just give it to me on a small subset of the state, and you can even use expressions there. And then the other thing you see up here is that there are a lot of other invariants, everything up here from 207 down to 21, and these are all auxiliary invariants. We had to come up with these in order to make the proof of those two go through, and these are the things we want to be able to synthesize eventually.
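Putting those pieces together, here is a hedged sketch of what such a top-level self-composition module might look like. The names (cpu, dmem, in_enclave, addr_t and so on) are hypothetical stand-ins for what is in the actual demo file, and the syntax is again only approximate:

    module main {
      // addr_t, mem_t and the uninterpreted predicate in_enclave would come
      // from a shared types module (not shown here)
      var imem1, imem2 : mem_t;             // two different instruction memories
      instance cpu1 : cpu(imem : (imem1));  // two copies of the same CPU model
      instance cpu2 : cpu(imem : (imem2));

      // 2-safety invariant: inside the isolated (enclave) address range the
      // two data memories must agree, whatever the adversary does elsewhere
      invariant enclave_mem_equal :
        (forall (a : addr_t) ::
           in_enclave(a) ==> (cpu1.dmem[a] == cpu2.dmem[a]));

      init {
        // the initial state is symbolic, constrained only by assumes:
        // same protected range, and the invariant holds initially
        assume (cpu1.enclave_lo == cpu2.enclave_lo &&
                cpu1.enclave_hi == cpu2.enclave_hi);
        assume (forall (a : addr_t) ::
                  in_enclave(a) ==> (cpu1.dmem[a] == cpu2.dmem[a]));
      }

      next {
        next (cpu1);   // step both copies; despite the semicolons these
        next (cpu2);   // updates happen synchronously
      }

      control {
        v = induction;                       // one-step induction by default
        check;                               // invoke the SMT solver
        print_results;
        v.print_cex(cpu1.dmem, cpu2.dmem);   // project the cex onto a few variables
      }
    }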
So that's the top-level module; now let me go to the CPU module. The CPU module is a standard CPU, and I won't go through all the details, but I'll point out the main things. You define your own types; the input to the CPU module is the instruction memory; it has state variables; and it has these symbolic constants, so everything that's a function is an uninterpreted function. You can do all the kinds of things you can do in a system like Boogie. Then I've defined this procedure which executes an instruction: it takes as input an instruction, which is a word, it takes the PC, it returns the next value of the PC as a return value, and it modifies all these state variables. This should look similar to what we've seen in standard sequential program verification, and the reason for it is that it's often very convenient to describe even hardware as a program that does a bunch of updates. This just models what happens in one round of the CPU, one step. And you'll see that you can use the usual constructs: loops and conditionals and all of that, loop invariants if you need them, asserts, and you can use havoc to give a completely arbitrary value to some state variable. That procedure sits inside the module, and the module has an initial state and a next block, and inside the next block we are calling the execute-instruction procedure. So you can combine these two ways of modeling: concurrent transition systems with init and next, the transition relation, and also sequential programs. The way you'd run it is on the file I just showed you. This is all implemented in Scala, and ideally I would initialize the JVM and so on once and then run everything, but I'm not going to do that on this laptop, so each time I run Euclid it does all the initialization first; that's the latency you see. It tries to verify all those invariants, and it passes. Not super interesting; if I actually change something, you'll see counterexamples. What I want to show you is synthesis, so I'll go quickly to that as we're running short of time. I've deliberately chosen an example that is very similar to the one the first speaker used, which is on my slide as well. This is the case where you have a transition system with X and Y, you update them, and the invariant you want to prove is that Y is greater than or equal to zero. If I just use standard induction and run it, it's going to try to prove it by induction and come back saying it can't. It produces the standard counterexample to induction: it says X can be negative. We know that can't happen, so instead you can tell Euclid: synthesize the invariant for me, and do it using the theory of linear integer arithmetic. Note that I'm not giving it a grammar here, so it's going to use the default grammar for LIA. In this case I also have to invoke a SyGuS solver; I'm going to use CVC4, because that's the one I have installed on this laptop, and then I rerun it.
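For that little x/y example, here is a hedged reconstruction of what the model and the synthesis request might look like. I am going from memory of Euclid 5's synthesis support, so the synthesis-function keyword, the way the strengthening invariant is phrased, and the way a SyGuS backend such as CVC4 is hooked in may all differ from what was actually typed in the demo:

    module main {
      var x, y : integer;

      init { x = 1; y = 1; }
      next { x' = x + y; y' = x + y; }

      // the property we actually care about; plain induction fails on it,
      // with the counterexample-to-induction "x can be negative"
      invariant y_nonneg : (y >= 0);

      // ask the tool to synthesize a strengthening over the default LIA
      // grammar; a SyGuS solver (e.g. CVC4) is used as the backend
      synthesis function h(x : integer, y : integer) : boolean;
      invariant strengthening : h(x, y);

      control {
        v = induction;
        check;
        print_results;
      }
    }

For this toy system, the needed strengthening is something like x >= 0 conjoined with the original property. The exact command-line option for pointing the tool at CVC4 is not in the transcript, so I have left it out; check the tool's documentation.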
And what this does, just to show you, is print the SyGuS problem that it generates internally; that's what the format looks like. But you can ignore that and just look at this line here: it says it successfully synthesized an invariant, and here is the strengthened invariant. Now let me show you another example, just for a little diversity. This one does the same thing, but everything is bit vectors, and you can synthesize an invariant over bit vectors in just the same way. And here is another very simple example where X and Y start out as zero and are then updated using X on each round. The invariant is that Y is always zero, but to prove that Y is always zero you need to know that X is also zero. You can invoke it in the same way and it comes back with an invariant. So nothing out of the ordinary is happening here: it's basically invoking a SyGuS solver, and the solver generates an invariant. We can also synthesize invariants involving arrays, and synthesize things that are not just invariants, but I don't have time to show you all that. One last thing I'd like to show you is LTL. This is another trivial example of a little transition system where you have two input variables to the module, A and B, and a flag, init, which starts out false and is true from the first cycle onward. Here we can specify LTL properties; there are two of them. This one says it is always true that once init holds, the sum is A plus B, which is true; and this one leaves out the antecedent, so it is not going to be satisfied. Then we can run bounded model checking for LTL, and you can also do k-induction. I won't run that, but basically you have the capability of doing all these types of verification. One thing I want to mention quickly is that we redid the Trusted Abstract Platform proof in Euclid 5. It's hard to make a completely fair comparison, because we already had the experience of having done the proof in Boogie. However, in terms of model size, it's less than half the size of the Boogie model. One of the big reasons is that Boogie has no support for hyperproperties, so for hyperproperties you have to duplicate everything by hand, whereas in Euclid it is done automatically. The other thing is that we could automate some of the aspects that were manual in Boogie, and it takes about the same amount of time to verify. The last thing I want to mention is that we have been applying this to other problems in security. The first is verifying absence of these Spectre and Meltdown style attacks. The second is verifying Keystone, a completely open-source alternative to SGX that has been developed at Berkeley. The background here is that these attacks were demonstrated to take advantage of microarchitectural features like speculative execution and branch prediction, and people came up with mitigations, both software-level and hardware-level. These mitigations are hard to reason about; it is hard to know whether a mitigation has actually solved the problem. For instance, Microsoft came up with a compiler extension, but they didn't have any formal argument for why that compiler extension would actually prevent these attacks, and in fact it didn't: you could find attacks against it as well.
So what we wanted to do was formulate a very general property that captures this whole class of attacks, not just the specific Spectre or Meltdown variants that were demonstrated, but the whole category; then formulate an attacker model; and provide a way to do automatic verification. The general problem statement is: you're given a model of the platform, which includes speculation, a model of what the adversary can do, like observe the cache, and a program, maybe in C, and you want to determine whether the program is vulnerable to a transient execution attack. In particular, this is one of the examples that was shown to be vulnerable to Spectre variant 1, and if you put in a memory fence here, the problem goes away. I'll skip the details of the property we formulated, but the key point is that it is a 4-safety property, a hyperproperty over four different traces; I'll skip it because we don't have time. Using Euclid 5, what we can do is take a program in C, compile it down to a binary (that step is not shown here), run the binary through CMU's Binary Analysis Platform, which lifts it into an intermediate format, and then pass that to our translator, which encodes it into a Euclid model. So it really does verification at the level of the binary. It also takes in formal models of the platform and the attacker, also written in Euclid 5. It can then either check that the program satisfies the speculation property, which is the 4-safety property, or show that it violates it, in which case it produces a counterexample; a counterexample is a sequence of operations that the attacker can perform to access the secret. In particular, there is a well-known researcher, Paul Kocher, who has done a lot of work on side-channel attacks, and when Spectre and Meltdown came out, he published a list of 15 programs that are vulnerable to these attacks. He gave all these examples and then proposed mitigations. We could use bounded model checking in Euclid 5 to find the vulnerabilities, and then use induction to prove that the fixed versions are indeed secure, and all of this could be done in a few seconds. These programs are very small: the C code is on the order of five to ten lines. Think of them as the basic tests he provided to show that a given snippet is vulnerable to some variant of Spectre, and that once you apply the fix it is no longer vulnerable. So these are very small programs, which is reflected in the runtime numbers, and as we scale things up it is going to be a lot harder to verify them. Anyway, we have these paths to go from x86 binaries and also RISC-V binaries to Euclid 5 models, and we also have ways to go from hardware description languages into Euclid 5 models. All right, so to conclude: I started out talking about this confluence of trends between verification, synthesis, and learning, and also the fact that we are seeing many more heterogeneous systems today, meaning that nothing is purely software or purely hardware.
A lot of companies are doing more vertical integration so that they can extract better performance or get stronger security guarantees, and we need formal tools that can address and leverage these trends. Euclid 5 is our attempt at doing this. As I mentioned before, it's open source and publicly available, and we are very interested in growing the community. So if you're interested in using it, or even contributing to it and developing it, or if you have an idea from synthesis or SAT or SMT that you think it can benefit from, let us know and we'll integrate it. Come talk to me if you're interested, and thank you for your attention. Here are a few papers in case you want to follow up further.

When you encode this into SyGuS, is the grammar there just so the solver can solve the problem faster, or do you also have constraints that require the synthesized functions to have the particular form given by the grammar?

Good question. I would say both. For a lot of the use cases so far, we have been using the SyGuS invariant-synthesis format, which does not really put any constraints on the grammar except that the expressions have to come from some underlying theory, and so far the only theories supported are bit vectors and linear integer arithmetic, which severely restricts the practical application of this. If you want to verify something like Sanctum, the kind of examples I showed you, what you need is bit vectors and arrays, the combination of those two, and often quantifiers, and right now almost no SyGuS solver does a good job of supporting that combination. So for that case, what we have to do is supply a grammar: we do not synthesize full invariants, but chunks of invariants that fall within the fragments the solvers can support. That's our current approach. And then the question becomes: how do you come up with a good grammar? There are all the techniques we heard about this morning that one can use; in our case, there are certain classes of properties we're looking at where the grammar naturally suggests itself, just from the domain, so we have good heuristics for coming up with the grammar.

This is a bit tangential, but you mentioned the MCP as a kind of scale for measuring how much isolation there is. That's really interesting, because usually when people talk about isolation in a more general sense, say VMs versus containers, it's mostly based on intuition (VMs are more isolated than containers), and we don't really have a scale to quantify it. Are you familiar with any work, or have you tried to expand that to a more general idea, a scale which gives a global quantification of isolation?

Right, good question. I guess the question is really: is there work on characterizing or formalizing a range of adversary models, so that you can compare and contrast, or at least, given some program or system you want to prove secure (it doesn't even have to be a trusted platform), show which combination of adversary models it is secure against, more secure or less secure, or perhaps plot that against performance overhead or something like that. I would say there's nothing really out there that is formal.
Especially with respect to things like the hardware-level attacks I talked about: in the microarchitecture, every feature, every resource that is shared can become a possible side channel. For instance, the branch target buffer can be one shared resource that falls outside the MCP. Basically, you can take almost any resource that multiple programs with different levels of privilege or different security levels, say enclave versus non-enclave, can all use, and then think of an adversary who can somehow access that resource, and that becomes another side channel. We haven't done it yet, but I think it would be interesting to characterize that and then see, given a solution, which subset of attacks it is resilient against.

Is there a reverse implication of the decomposition theorem as well? I mean, does secure remote execution require confidentiality, integrity, and measurement?

Yeah, that's a good question. The short answer is that there is a reverse implication for two of them, but for attestation we rely on some cryptographic primitives, and maybe there are alternative ways to implement it that would also give you attestation without satisfying the measurement property. That is outside the scope of what we do; we make some assumptions about how that is done. But yes, it should go the other way too. Okay, thank you.