Yeah, I'm Everett Hildenbrandt. I should have also put my group on there, but I was in a rush to make the slides because it was all a bit last minute; that's how it goes. The original title was "Overview and Progress Report," and yesterday I decided that more of a K-by-example approach would work better here, so I put that in as a subtitle but kept the original title so I don't offend anyone. We'll see. Okay, so what is K? K is a language for defining programming languages, and from a definition you get to derive a bunch of tools. Essentially, you give us the formal language definition, syntax and semantics, and we derive the tools from it. Test-case generation and the compiler are still more in the vision phase, but all of the other tools you see here we actually derive from the one semantics. That means the tools you use for symbolic execution, model checking, deductive program verification, or running the test suite all use the same semantics; there is only one. So there's no way for them to disagree on how execution happens, which is important when you're trying to verify that programs have the properties you think they have. I'm not going to go into much depth on the K backend, but the static logic we use for matching configurations is called matching logic. There's ongoing work showing that it generalizes separation logic, polyadic modal logic, first-order logic, and so on; by "generalization" I really just mean that there are embeddings of those logics into it. The dynamic logic is reachability logic, a language-independent dynamic logic for reasoning about transition systems.
So go ahead and ask me or the other team members (you guys raise your hands) if you have questions about those specific things, but I'm not going to cover them in this presentation. Instead, I'm going to focus on this K-by-example approach. Our GitHub organization is kframework, the KEVM repository is evm-semantics, and all of the semantics that we maintain are developed in that organization. There are some semantics that other people maintain outside our organization. Directly in this repository there's also a directory with a bunch of toy languages you can play with to get familiar with K, so that's a good place to start. Okay, I'm going to dive right in; this part of the presentation is basically just a bunch of K. I'll try to teach you K using an example that hopefully people are familiar with, which is EVM. We'll start with some basic functional-style rules. Here we're declaring a function called chop. It takes an integer and produces an integer, so its type signature is Int to Int. We declare that it's a function, and then we give it semantics using the rule keyword. We say: rule chop(I) goes to I modulo pow256, where the modulus is K's built-in operator and pow256 is 2^256, but only if I is either less than zero or greater than or equal to pow256. Otherwise chop(I) just goes to I, since it's already within those bounds. So these two rules cover all the cases. Here are some other operators we define, just a fraction of them for demonstration. For example, there's the +Word operator, and notice that we can declare its syntax as infix directly, so we can write something like 3 +Word 4 instead of +Word(3, 4).
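To make the two chop rules concrete, here is a small Python model of chop and of +Word, which the talk goes on to define as the built-in integer sum with chop applied on top. It is only a sketch for intuition, not KEVM code; the names chop, plus_word, and POW256 are my own. Note that 2^256 itself is not a valid word, so the guard uses >=.

```python
POW256 = 2 ** 256  # pow256 in the talk

def chop(i: int) -> int:
    """Model of the two chop rules: reduce out-of-range values, keep in-range ones."""
    if i < 0 or i >= POW256:
        # Python's % with a positive modulus never returns a negative result
        return i % POW256
    return i

def plus_word(w0: int, w1: int) -> int:
    """Model of +Word: unbounded integer addition, then chop back into a 256-bit word."""
    return chop(w0 + w1)
```

For example, plus_word(POW256 - 1, 1) wraps around to 0, exactly as EVM addition does.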
We declare that it's a function, and then we give it semantics just by calling the built-in +Int operator and applying chop to the result. We also give semantics to division. That one has to be split into two cases: dividing by zero, which EVM defines to be zero, and dividing by a nonzero value, in which case we just apply chop to the normal division. And feel free to ask questions at any time; this is going to ramp up in difficulty as we go, so it's better to ask early. Okay, defining data structures in K. This is going to be the word stack for EVM, and we define it as a simple cons list, so anyone from functional programming will recognize it. One annotation here is just a hint to our SMT solver, but basically we say that .WordStack serves as the empty element of the cons list, and that you can cons an integer onto a word stack, which also produces a word stack. So you can think of it as a singly linked list. Here, for example, is word stack append: given two word stacks, you can put ++ between them, and that also produces a word stack. .WordStack appended to some word stack is just that word stack, and a word consed onto a word stack, appended to another word stack, is the word consed onto the append of the tail with the other stack. Pretty simple functional-programming stuff. Okay, in K we also have something called a configuration, which specifies the state of our system. It tells K that there's a bunch of state sitting around that we don't want to mention all the time, but that we want to be able to grab whenever we do need it. Here's just a fragment of the configuration; the actual configuration contains more than 60 cells.
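The two-case division rule and the cons-list word stack with its two append rules can be modeled in Python as below. This is a sketch under my own naming (div_word, Cons, append, from_list, to_list), with None standing in for .WordStack; it mirrors the shape of the K definitions rather than reproducing them.

```python
from dataclasses import dataclass
from typing import List, Optional

POW256 = 2 ** 256

def chop(i: int) -> int:
    # Equivalent to the two chop rules: % already leaves in-range values unchanged
    return i % POW256

def div_word(w0: int, w1: int) -> int:
    """Model of /Word: EVM defines division by zero to be zero."""
    if w1 == 0:
        return 0
    # Words are non-negative, so floor division coincides with truncating division
    return chop(w0 // w1)

@dataclass(frozen=True)
class Cons:
    """W : WS, one cell of the singly linked list."""
    head: int
    tail: Optional["Cons"]

# None plays the role of .WordStack, the empty stack
WordStack = Optional[Cons]

def append(ws1: WordStack, ws2: WordStack) -> WordStack:
    if ws1 is None:
        return ws2                                # .WordStack ++ WS  =>  WS
    return Cons(ws1.head, append(ws1.tail, ws2))  # (W : WS1) ++ WS2  =>  W : (WS1 ++ WS2)

def from_list(xs: List[int]) -> WordStack:
    ws: WordStack = None
    for x in reversed(xs):
        ws = Cons(x, ws)
    return ws

def to_list(ws: WordStack) -> List[int]:
    out: List[int] = []
    while ws is not None:
        out.append(ws.head)
        ws = ws.tail
    return out
```

Because Cons is immutable, append shares the second stack instead of copying it, just as a functional cons list would.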
A cell is one of these XML-like brackets, and when we declare the configuration in the semantics, we also supply the default value of each cell. So this tells us that the entire EVM program is loaded into the K cell at the beginning of execution; that's what $PGM, a special symbol, means. Then there's a sub-configuration called EVM that contains, for example, the VM execution state: the currently executing program, which in the Yellow Paper is I_b; the current word stack, μ_s; the local memory, μ_m; and the current gas available, μ_g. We don't just have the VM execution state; we also have the network state, another sub-configuration, which is what goes where the "..." is above. In the network state we have a map of the active accounts, which essentially records whether each account is empty or not. Notice that we have an accounts cell wrapping an account cell with multiplicity star, which basically says you can have as many account cells as you want. So in a realistic EVM network you can have many account cells, and each account has an account ID, a balance, code, storage, and a nonce; what you see here are their initial values. Down here we also have things like the current set of transactions and a couple of other small pieces of network state. Like I said, the actual configuration contains 60-plus cells, so obviously I can't put the whole thing here. One more thing about this multiplicity star: people have asked me how you do concurrency or parallelism in K, and this is exactly how. You essentially put a multiplicity star on a cell surrounding the K cell.
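As a rough picture of the fragment described above, the configuration can be modeled as nested Python dataclasses, with the multiplicity-star account cell becoming a dictionary keyed by account ID. All field names are my own, the local memory is modeled as a sparse byte map, and the real configuration has 60+ cells; this is a sketch of the shape, not KEVM's definition.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Account:
    """One <account> cell; the defaults play the role of the initial values."""
    acct_id: int
    balance: int = 0
    code: bytes = b""
    storage: Dict[int, int] = field(default_factory=dict)
    nonce: int = 0

@dataclass
class EvmCell:
    program: Dict[int, str] = field(default_factory=dict)    # I_b: program counter -> opcode
    pc: int = 0
    word_stack: List[int] = field(default_factory=list)      # mu_s, top of stack at index 0
    local_mem: Dict[int, int] = field(default_factory=dict)  # mu_m: byte index -> byte
    gas: int = 0                                             # mu_g

@dataclass
class NetworkCell:
    active_accounts: Dict[int, bool] = field(default_factory=dict)  # per-account status flag
    # <accounts> wrapping <account multiplicity="*">: any number of accounts, keyed by ID
    accounts: Dict[int, Account] = field(default_factory=dict)

@dataclass
class Configuration:
    k: list = field(default_factory=list)  # pending computation; $PGM is loaded here at start
    evm: EvmCell = field(default_factory=EvmCell)
    network: NetworkCell = field(default_factory=NetworkCell)
```

Using default_factory gives every configuration its own fresh cells, which matches the idea that each cell starts from its declared default value.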
For example, in our C semantics (and in our C++ semantics) we have multithreading semantics using this multiplicity star, and the Erlang semantics in K, which is one we don't develop, also uses multiplicity star for the independent processes evolving in parallel. Okay, any questions so far? Okay, let's move on to EVM execution. What we saw before were rules that essentially just define functions; that's the functional subset of what you can define in K. Now we want to define general transitions in the transition system. First we need to say how to perform a single step, so we introduce this #next operator. In the Yellow Paper, what you do is: perform a few simple checks to see whether this step will throw an exception, for example whether the word stack will underflow or overflow, whether the gas limit will be exceeded, and a few others; if not, execute the opcode; afterwards, increment the program counter; and revert the state if any of those steps threw an exception. Here's the actual K rule. Notice that we have the same rule keyword, but now multiple cells are mentioned. Back up here, we had a rule with no cells mentioned, plus the function attribute, which basically means the rule can apply anywhere in the configuration. This rule, by contrast, can only apply at exactly these parts of the configuration. So what does it do? When we see #next, we replace it with this chunk of computation, provided the program counter is PCOUNT and, in the program cell, that program counter points to OP; we don't care about the rest of the program. Note that this "..." is not me eliding details; it's verbatim what the rule looks like in the semantics. We use "..." to tell K "don't care": we don't care what's there.
We just care about that one particular opcode in the program cell. So when we hit #next, first we push the call stack, which saves a copy of the current execution state. Then we check whether the step is exceptional: is it a bad jump, that is, will it jump to an invalid jump destination? Is it one of the designated invalid opcodes? Will it underflow or overflow the stack? If that check passes without throwing an exception, we actually execute the opcode, which does the memory computation (how much the memory will change), the gas computation, and whatever effect this opcode has on the state. Then we increment the program counter. If any of these steps throws an exception, that exception consumes the remaining operators until it hits this syntax here, which acts like an if-then-else over exceptions. If there was no exception, we drop the call stack, which forgets the state we saved earlier; if there was an exception, we take the false branch of the if-then-else, which pops the call stack, reverting to the previous state, and propagates the exception. This squiggly arrow, ~>, can be read as "followed by." Note the precedence: the "..." is outside the scope of the rewrite arrow, so the arrow rewrites only the single element at the front of the K cell, and the rule replaces that single element with the several computations on the right, followed by whatever was already there. Does this make sense? All right. Okay, EVM programs: let's look at what they look like in K. Here are some simple expression opcodes. We actually group the opcodes by arity, so SUB and DIV are BinStackOps, and elsewhere in the semantics the arguments are loaded automatically based on arity.
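The save/check/execute/increment/revert sequence behind #next can be sketched imperatively in Python, with a deep copy standing in for pushing the call stack and exception handling standing in for the if-then-else over exceptions. This is an illustration under stated assumptions: the three-opcode table, the flat cost of 1 gas per opcode, and the +1 program-counter step are all hypothetical simplifications, and the bad-jump check is omitted.

```python
import copy
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

POW256 = 2 ** 256
MAX_STACK = 1024  # EVM stack-depth limit

class EVMException(Exception):
    """Stands in for the exceptional-halt value that K propagates with ~>."""

@dataclass
class State:
    program: Dict[int, str]                         # program counter -> opcode name
    pc: int = 0
    stack: List[int] = field(default_factory=list)  # top of stack at index 0
    gas: int = 0

# Hypothetical opcode table: name -> (args popped, results pushed, effect).
OPS: Dict[str, Tuple[int, int, Callable[[List[int]], List[int]]]] = {
    "ADD": (2, 1, lambda a: [(a[0] + a[1]) % POW256]),
    "SUB": (2, 1, lambda a: [(a[0] - a[1]) % POW256]),
    "INVALID": (0, 0, lambda a: []),
}

def next_step(s: State) -> None:
    op = s.program[s.pc]       # only the opcode at PC matters, like the "..." match
    saved = copy.deepcopy(s)   # like pushing the call stack: snapshot the state
    try:
        n_in, n_out, effect = OPS[op]
        # like the exceptional check: conditions detectable before executing
        if op == "INVALID":
            raise EVMException("designated invalid opcode")
        if len(s.stack) < n_in:
            raise EVMException("stack underflow")
        if len(s.stack) - n_in + n_out > MAX_STACK:
            raise EVMException("stack overflow")
        # like executing the opcode: pop arguments, charge gas, push results
        args, s.stack = s.stack[:n_in], s.stack[n_in:]
        s.gas -= 1             # hypothetical flat cost
        if s.gas < 0:
            raise EVMException("out of gas")
        s.stack = effect(args) + s.stack
        s.pc += 1              # a real EVM advances by the opcode's width
    except EVMException:
        s.__dict__.update(saved.__dict__)  # like popping the call stack: revert...
        raise                              # ...then propagate the exception
    # no exception: like dropping the call stack, the snapshot is simply discarded
```

The out-of-gas case is where the snapshot visibly matters: the arguments have already been popped when the exception fires, and the restore puts them back.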
So every BinStackOp gets two arguments off the word stack, and those arguments are removed from the word stack. SUB just goes to -Word, DIV goes to /Word, and that's followed by #push, which is just an internal operator for moving the result onto the word stack. Here are some local memory operators, like MLOAD, which is an UnStackOp, so it takes one argument. For MLOAD at a given index, you grab the local memory from the local memory cell and apply #asWord to the range of the local memory starting at that index and going 32 bytes. So you take 32 bytes from local memory, pack them together into a word, and push that onto the word stack, because remember that local memory is a byte sequence, not a word sequence. Here we have a BinStackOp, MSTORE. MSTORE of a value at an index replaces the local memory with a version where, starting at that index, you write 32 bytes: the bytes of the value, padded out to a width of 32 if there aren't enough of them. Does this make sense? Okay, the rules are going to get bigger and mention more of the configuration. Notice that this rule mentions only one cell, while this one mentions two cells. And there's one more thing: notice that here we have two rewrite arrows. A lot of languages would require you to pull the rewrite arrow outside and copy the whole configuration fragment on both sides. K does that for you: it pulls the rewrite arrow outside and copies, so you get a K cell and a local memory cell before the arrow, and then a K cell and a local memory cell containing the right-hand sides after it. Because K does this for you, the rule can be more compact and readable. Does that make sense? Next, here are some Ethereum network opcodes: SLOAD and SSTORE.
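Reading 32 bytes into a word and writing a word back as 32 padded bytes can be sketched in Python as follows. The helper names echo the operators mentioned in the talk, but these are my own illustrative implementations, and the sparse dictionary in which unwritten bytes read as zero is an assumption about the memory representation.

```python
from typing import Dict, List

def as_word(bs: List[int]) -> int:
    """Pack big-endian bytes into one integer (32 bytes fit exactly in a 256-bit word)."""
    w = 0
    for b in bs:
        w = (w << 8) | b
    return w

def as_byte_stack(w: int) -> List[int]:
    """Big-endian bytes of w with no leading zeros (empty list for 0)."""
    out: List[int] = []
    while w > 0:
        out.insert(0, w & 0xFF)
        w >>= 8
    return out

def pad_to_width(n: int, bs: List[int]) -> List[int]:
    """Left-pad with zero bytes up to width n."""
    return [0] * (n - len(bs)) + bs

def mload(local_mem: Dict[int, int], index: int) -> int:
    # Take the 32-byte range starting at index; unwritten memory reads as zero
    return as_word([local_mem.get(index + i, 0) for i in range(32)])

def mstore(local_mem: Dict[int, int], index: int, value: int) -> None:
    # One word write becomes 32 single-byte map updates
    for i, b in enumerate(pad_to_width(32, as_byte_stack(value))):
        local_mem[index + i] = b
```

Because memory is byte-addressed, an MLOAD at an unaligned index sees a shifted window: after mstore(mem, 0, 0x1234), mload(mem, 1) reads 0x123400.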
They're the network-storage analogs of MLOAD and MSTORE. SLOAD at an index is an UnStackOp. Remember that there are many different accounts because of that multiplicity star, so first we match the particular account whose ID is the one currently executing. Then we look up the index in its storage and push the value onto the word stack; the storage and the account don't change. It's a lot going on, but once again the "..." is not me eliding details; this is verbatim the K rule for this opcode. Once again, K is being our friend by letting us omit all the parts of the configuration that this particular rule doesn't need to mention. SSTORE, right here, once again has two rewrite arrows: one inside the account and one in the K cell, where we rewrite SSTORE to the empty computation, meaning the computation is done. Again we match on the account ID, grab the account, take its storage, and update the storage at the index with the value. Then there's a side condition here, because the semantics differ depending on whether the key is already in the storage and whether the stored value is empty or not. This arrow right here is map update, which ships with K as a built-in from the prelude. This other one is specific to the EVM semantics, because it has to write a whole word stack at a time: it writes the first element, then the second element, and so on, so it desugars to a sequence of 32 map updates. Does that make sense? I guess I could give it nicer syntax, but that was an early design decision, and it's hard to change later. Yeah. Okay.
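The account-matching part of SLOAD and SSTORE can be sketched in Python with accounts kept in a dictionary keyed by ID: select the currently executing account, then read or update its storage, leaving every other account untouched (the role of "..." in the K rules). The side conditions, gas, and refunds from the real rules are deliberately left out, and all names here are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class Account:
    balance: int = 0
    storage: Dict[int, int] = field(default_factory=dict)

def sload(accounts: Dict[int, Account], current: int, index: int) -> int:
    # Match the account whose ID is the currently executing one, then read its storage.
    # Missing keys read as zero, as in EVM storage.
    return accounts[current].storage.get(index, 0)

def sstore(accounts: Dict[int, Account], current: int, index: int, value: int) -> None:
    # Map update on that one account's storage; no other account is touched
    accounts[current].storage[index] = value
```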
Okay, I wanted to show an example of a bigger network opcode: the CALL opcode. It runs off the edge of the screen here, but basically we define a bunch of internal operators to help with CALL. That's pretty straightforward to do, and those internal operators can then be reused for DELEGATECALL, and probably for STATICCALL as well, though I'm not sure; it's been a while since I wrote that code. Okay, gas calculation. I'm going to go through this very quickly, because I'm running low on time and I want to get to the verification part. For the intrinsic gas calculation, we tried to mirror the style of the Yellow Paper as closely as possible. We have the function #gasExec, which is parametric in the fee schedule you're executing with, so you can specify different fee schedules; I'll talk about that on the next slide. Given the schedule and the opcode, it calls the Csstore gas function from the Yellow Paper, essentially. We declare Csstore as a function, which means it isn't allowed to depend on any part of the external configuration, so #gasExec grabs all the relevant parts of the state that Csstore needs and passes them to Csstore as arguments. The definition of Csstore is right here. Once again it's a function: it takes three arguments, produces an Int, and doesn't depend on any part of the configuration. It checks whether the new value is nonzero and the old value is zero, which determines whether you're setting a fresh storage slot or resetting an existing one. These are different schedule constants, so the cost is parametric in the particular schedule. And these are all sorts of other cost functions we've implemented; they mirror exactly what's in the Yellow Paper.
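The split between a pure cost function and a wrapper that pulls its inputs from the state can be sketched in Python. csstore follows the Yellow Paper's SSTORE cost: G_sset when a zero slot becomes nonzero, G_sreset otherwise (20000 and 5000 in the pre-Constantinople schedules). The schedule keys, the state dictionary layout, and the gas_exec name are my own assumptions, and refunds are omitted.

```python
from typing import Dict

Schedule = Dict[str, int]

# Yellow Paper SSTORE constants for the early schedules; key names are my own
FRONTIER: Schedule = {"Gsset": 20000, "Gsreset": 5000}

def csstore(sched: Schedule, new: int, old: int) -> int:
    """Pure cost function: depends only on its arguments, never on the wider state."""
    if new != 0 and old == 0:
        return sched["Gsset"]    # zero -> nonzero: setting a fresh slot
    return sched["Gsreset"]      # every other write: resetting

def gas_exec(sched: Schedule, op: str, state: dict) -> int:
    """Extracts exactly what the cost function needs, then calls it."""
    if op == "SSTORE":
        index, new = state["stack"][0], state["stack"][1]  # key on top, value below
        old = state["storage"].get(index, 0)
        return csstore(sched, new, old)
    raise NotImplementedError(op)
```

Keeping csstore free of state access is what lets it be read side by side with the Yellow Paper formula.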
So really, you can read this instead of the Yellow Paper if you don't like reading the Yellow Paper, and there's a lot more explanatory text around it in our repository online. Here's how we implement the schedules. We say that a schedule constant followed by a particular schedule in angle brackets produces an integer, and here are some example schedule constants, again from the Yellow Paper: Gzero, Gbase, Gverylow. Here's a schedule, the default schedule: it sets Gzero to 0, Gbase to 2, and so on. In the EIP-150 schedule, Gbalance is changed to 400 relative to the schedule that came before it, which was Homestead or something; I don't quite remember. So for each new schedule, you just update the schedule constants that change for that particular schedule. The schedule is actually a command-line flag: we can give the semantics a flag telling it which schedule to execute with. Okay, now I'm going to talk about a toy verification example, the sum-to-N example. I forgot to put the actual spec here, but it's basically that S = N(N + 1)/2, just the classical formula. I think it was Euler's formula for sums, or maybe Euclid's or something; you say enough names, you eventually get them all. In no particular language, this is the program we're looking at, but in EVM it obviously looks a bit uglier. Here's the proof claim; this is the main claim. I'll run through this quickly, but basically these claims are reachability claims, and they look like the rules from the definition because they pretty much are rules from the definition. The prover starts from the left-hand side of the rule.
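The schedule mechanism can be pictured in Python as tables of constants, where a later schedule copies the earlier one and overrides only what changed. The values Gzero = 0, Gbase = 2, Gverylow = 3, and Gbalance changing from 20 to 400 under EIP-150 are Yellow Paper values; the dictionary representation and the string key standing in for the command-line choice are assumptions, and I don't reproduce KEVM's actual flag.

```python
from typing import Dict

DEFAULT: Dict[str, int] = {
    "Gzero": 0,
    "Gbase": 2,
    "Gverylow": 3,
    "Gbalance": 20,
}

# EIP-150: the previous schedule with a few constants overridden
EIP150: Dict[str, int] = {**DEFAULT, "Gbalance": 400}

SCHEDULES: Dict[str, Dict[str, int]] = {"DEFAULT": DEFAULT, "EIP150": EIP150}

def sched_const(name: str, schedule: str) -> int:
    """Look up a constant in the chosen schedule, like a constant applied to <SCHED>."""
    return SCHEDULES[schedule][name]
```

Because each schedule is built by copying and overriding, adding a new fork means listing only its deltas.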
Notice that we have a symbolic value here for the gas and a symbolic word stack here. The prover symbolically executes, using the inference system of reachability logic, until it reaches a state that implies the right-hand side of the rule. So we're saying: starting from any word stack, we reach the same word stack with the sum from 1 to N on top, our counter reaches zero, and the gas consumed is exactly this amount. Then we have some preconditions stating that N is greater than zero, that there's no integer overflow, that the word stack is small enough that there won't be a stack overflow, and that the gas available is large enough. Basically, we first write down the spec up here, then we try to prove it, the prover tells us, in a not-so-intuitive way, that it can't prove it, and then we ask which preconditions we have to add. That's why I'm telling you about them. Okay, so in the sum example there's a loop. In traditional program logics you have to provide a loop invariant; in reachability logic we generalize that a bit to the notion of a circularity, which says that if you start at the loop head, you reach the end of the program having computed the correct remaining partial sum. Notice the difference: with a loop invariant, if you're familiar with those, you specify the behavior of a single iteration of the loop, whereas here you specify the behavior from the loop head to the end of the program, which is often easier to specify. Okay, verifying ABI-compliant contracts: writing these specifications is actually really hard for ABI contracts because they're huge, right?
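The circularity can be stated as an executable property in Python: starting at the loop head with partial sum s and counter n, the rest of the program ends with s + n(n + 1)/2. This is only a concrete-value check for intuition, with my own function names; the K prover establishes the claim symbolically for all inputs, and Python's unbounded integers sidestep the no-overflow precondition that the EVM version needs.

```python
def run_from_loop_head(s: int, n: int) -> int:
    """Execute the remainder of the sum-to-N program starting at the loop head."""
    while n > 0:
        s, n = s + n, n - 1
    return s

def sum_to_n(n: int) -> int:
    # The whole program: initialize the sum, then enter the loop
    return run_from_loop_head(0, n)

def circularity_holds(s: int, n: int) -> bool:
    # From the loop head, the program ends with the correct remaining partial sum
    return run_from_loop_head(s, n) == s + n * (n + 1) // 2
```

Instantiating the circularity at s = 0 gives the main claim sum_to_n(n) == n * (n + 1) // 2.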
So we've provided some helper functions, like #abiCallData and a few smaller ones, which essentially let you pass in the name of the function you want to call and the typed arguments you want to pass to it, instead of having to pass in the hex-encoded byte string. Here's an example usage: #abiCallData for transfer, with this address and this transfer value. Note that the transfer value here is symbolic, but this is actually a constant. So that's the call-data cell, and here's where you'd use it; this would be that line above, instead of writing out the byte-encoded hex values. Then notice that balance one goes to balance one minus the transfer value, balance two goes to balance two plus the transfer value, and then there are all these preconditions. This example is the HKG token, which had a bug in it; some of these preconditions were caught by the HKG auditors, and some actually weren't, so our prover was able to help us find some integer overflow bugs and things like that. Okay, I don't have enough time here, but basically we pass the tests, we're almost passing the blockchain tests, and we run within an order of magnitude of the performance of cpp-ethereum, which is pretty good for a formal verification framework. We're working on some ABI extraction, and, not discussed here but come ask me about it, there's EVM Prime, which was the IC3 Bootcamp project: we're extending EVM with some features that make it easier to give semantics to Viper via compilation to EVM, and we're getting pretty close on that. So that's the K Framework overview. It's not just for blockchain languages, and that's the end. Thanks, Everett.
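For concrete arguments, the call data that a helper like #abiCallData stands for can be sketched in Python for transfer(address,uint256). Following the standard Solidity ABI layout for static types, it is a 4-byte function selector followed by each argument as a 32-byte big-endian word. This handles concrete values only, whereas the KEVM helper also works with symbolic arguments. The selector a9059cbb is hardcoded because Python's hashlib.sha3_256 is NIST SHA-3, not Ethereum's Keccak-256, and the function names here are hypothetical.

```python
POW256 = 2 ** 256

# First 4 bytes of keccak256("transfer(address,uint256)")
TRANSFER_SELECTOR = bytes.fromhex("a9059cbb")

def encode_word(x: int) -> bytes:
    """One static ABI argument: a 32-byte big-endian word."""
    if not 0 <= x < POW256:
        raise ValueError("value does not fit in a 256-bit word")
    return x.to_bytes(32, "big")

def abi_call_data_transfer(to: int, value: int) -> bytes:
    if not 0 <= to < 2 ** 160:
        raise ValueError("address must fit in 160 bits")
    # selector, then the address left-padded to 32 bytes, then the amount
    return TRANSFER_SELECTOR + encode_word(to) + encode_word(value)
```

The result is 4 + 32 + 32 = 68 bytes, which is the byte string a specification would otherwise have to spell out in hex.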