I would like to thank the organizers for inviting me to give a talk at FSCD, which is one of my favorite conferences, and for giving me the opportunity to talk about the latest developments in the K framework and to present our vision of how things should go from here. My talk is going to be about formal methods and programming language semantics in the context of the blockchain. But first I'm going to introduce, very briefly, the challenges coming from the blockchain and what motivated us to go into this area.

One of the main applications of the blockchain, as many of you know, is cryptocurrencies — the digital money we hear about, like Bitcoin, Ethereum, and many others. Some people think this could be the future of money; others think not. I'm not here to debate that question. I'm going to go into technical issues of a different nature: programming language and formal methods issues. But first, just to put things in context: cryptocurrencies see increasing use these days, and have more and more capital invested. There are hundreds of them, and the top five alone hold more than $200 billion. So it is something that we have to take seriously, at the very least.

I'm not here to teach you how the blockchain works, because that's not really my area. I would like to emphasize, though, the big challenges for us as formal methods and programming language researchers in the blockchain space. At a very high level, the blockchain works as follows. Suppose that you want to perform a transaction, and for the purpose of this talk, let's say a transaction can be any program that you can execute. It's a program that anybody can see and execute. In particular, moving value V from account A to account B is a particular program, but you can literally think of any program that runs on a shared state. There is a state that is shared among many different actors, the programs are public — you can see the code — and all these programs modify that shared state.

Once you initiate a transaction, it is sent to a so-called node, and that node broadcasts the transaction. The transaction then needs to be validated by the other nodes: they re-execute the code in the transaction and agree on the result, using algorithms for achieving consensus under the hood, which are not our concern in this talk. The point is that the nodes re-execute the transaction, using virtual machines, in order to determine how they should modify the state. Once the transaction is validated, each node applies it locally, and this way they modify the history — the distributed ledger, the blockchain — by appending a new block of transactions to the end of the existing chain of blocks. This way each node has a view of the new state of the world, and because of the consensus algorithms, all the nodes together have the same shared view. What is appealing here is that it is all decentralized: there is no node in control saying what the transaction is and how the state should change.

Now, the interesting part comes from the fact that some of these transactions can actually put new code on the blockchain, right?
You can have new code, called a smart contract, that you can attach to one of the accounts, and then future transactions can invoke your code. This is where all the problems come from. You can put new code on the blockchain, that code is visible, and everybody can initiate a transaction using your code. So we end up with an environment where all the code is public, and this is where the unprecedented challenges come from: you end up with code which is public, which anybody can invoke, and which can irreversibly change the state. In particular, it can steal your money. If that's what the program does — move money from your account somewhere else — then everybody will agree that the money was moved from your account somewhere else, and the money is stolen. Because of that, there is a huge incentive for hackers to attack bugs, weaknesses, or flaws in smart contracts, more than ever before.

To give you an idea of what a smart contract looks like, I have a little snippet of code here, part of a larger contract — a so-called ERC-20 contract, which implements a protocol for moving value around between different accounts. Just so you understand the scale of these contracts: there are about 40,000 similar contracts running on the Ethereum blockchain, and all of them have pretty similar code, with slight differences here and there. This snippet shows only a function called transfer, which is invoked by somebody — the caller of the transaction — with an address to which you want to transfer money from your account, and a value. The code goes pretty much as you would expect: you first check the value, and if it is zero you return false; then you check whether you have sufficient funds; and then you check whether the account you want to transfer the money to runs into an overflow if you do that. Remember that all the state is shared, so you can see exactly what value is in the account you want to transfer the money to. So you check whether you have an overflow or not, and if you have sufficient funds and there is no overflow, you actually do the transaction.

It looks very natural — very little code — but in fact this code has problems. Even though it is very simple, it already has problems, and with the technology that I'm going to talk about, we can actually detect them. For example, the ERC-20 protocol does not say that you have to return false if the value is zero. In some cases that can actually matter a lot, and other contracts may fail if you do not update the log as expected. So this is a slight violation of the protocol. Another violation, which is even more important, is that there are in fact cases where there is no overflow here. For example, if you transfer money from your account to yourself, then there shouldn't be any overflow. Yet this contract reports an overflow, and upon detecting an overflow it will not perform the transaction and the transfer will not be logged. Again, that can have implications of a different nature. You would like not to violate the protocol, because if you do, others don't know what to expect from your code. In other words, this code does not satisfy the ERC-20 specification. Now, this particular case is not a disaster.
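To make the self-transfer point concrete, consider a hypothetical instance (the numbers are mine, not from the talk). Suppose your balance is \(2^{255}\) tokens and you transfer all of it to yourself. A naive overflow guard on the recipient's balance computes, in 256-bit arithmetic,

\[
\texttt{balances[to]} + \texttt{value} \;=\; 2^{255} + 2^{255} \;=\; 2^{256} \;\equiv\; 0 \pmod{2^{256}},
\]

and flags an overflow, so the transfer is rejected and nothing is logged — even though a self-transfer leaves the balance unchanged and no genuine overflow can occur.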
I specifically picked a piece of code that doesn't have very serious implications, but as we know, there have been several attacks in the Ethereum space where flaws in smart contracts were exploited and lots of money was frozen, lost, or stolen. One of the most recent examples happened only two months ago: the BeautyChain (BEC) token had a very similar problem in its code. As you can see here, there is a multiplication between a count and a value — the number of tokens, and the value per token — and the amount is, as you would expect, the number of tokens times the value. The problem is that this multiplication is not protected: you can literally have an overflow there, and that overflow can be catastrophic. Actually, it was catastrophic: it allowed transactions to be initiated that violated the correctness of the protocol, and I think a huge amount of tokens was taken, on the order of 10^70. Of course, the attackers never got that money, but it more or less destroyed the coin.

And that's not the only attack. We all know the DAO exploit, where a smaller amount of money was stolen — one of the first known attacks, and one that led to a fork in the Ethereum blockchain. Then recently we had the Parity bug attack, which resulted in $280 million or so being lost; that was fixed, and then there was another attack, also on the Parity multi-sig wallet, this time with $300 million or so stolen or frozen. So there are lots of problems like these, and there will be more.

The question is: what can we do about this? Of course, there are lots of political and economic factors that I cannot influence, but there are also technical factors. Most of these errors were due to software bugs — problems in the code, in the smart contracts — that could have been detected if the right tools had been used. So I think we can do a lot to ensure that the execution environment behaves as expected. In fact, I firmly believe that the ideal scenario is actually possible in this space: to use formal methods to rigorously formalize everything — the programming languages, the virtual machines, everything — and then the implementations themselves can be, and must be, provably correct. This is possible to do in this space. Specifically, I believe that we can have provably correct virtual machines or interpreters running in the nodes; that smart contracts can use well-designed programming languages, and those languages can come with provably correct compilers or interpreters; and finally, in terms of verification, that all the smart contracts running on the blockchain could be provably correct with respect to their specifications. I believe this is possible.

Now, whenever we see many languages, and lots of "provably correct" keywords around, it makes us think that we may need a language framework. We have several languages going on here: we have Solidity, we have other high-level languages like Plutus in which we write the actual smart contracts, and we have the lower-level languages that are actually interpreted by the virtual machines. All these languages need to be formalized somehow in order to prove properties about them. So a language framework would be very, very convenient in such a context.
We are firm believers in this vision of an ideal language framework, which was proposed many years ago, and many of us believe that it is possible. There are technical limitations, but maybe it is time to address those technical limitations. So let me tell you what we mean by an ideal language framework. The idea is to have one formal definition for a given programming language — both syntax and semantics in just one definition — and then, from that definition, to generate everything that we need for that language. All these tools could be generated automatically or semi-automatically by the framework itself: basically everything you need for your language, starting with parsers and interpreters, but then going further into compilers, debuggers, symbolic execution engines, model checkers, and even deductive program verifiers. We believe that all of these can be generated from a formal definition of a programming language.

What I'm going to tell you about in the next few minutes is how we are attacking this problem of an ideal language framework with our effort, the K framework. I'm not saying that we have solved the problem — it is hard to solve and it will take many years — but I can report on some of the results, show you how far we went, and show you some uses of the framework in the context of blockchain languages, where we use it actively for formal verification of smart contracts.

First of all, how was K born? We had been teaching programming languages for many years, and we used various semantic approaches and styles: big-step and small-step operational semantics, denotational semantics, the chemical abstract machine, reduction semantics with evaluation contexts, rewriting logic — literally almost all the approaches that people have developed over the years to give semantics to programming languages. We learned from each of them, trying to understand what is really nice and what works, while at the same time trying to avoid the limitations. In the end, we essentially engineered the framework by taking from the different approaches what was useful and worked for us, while staying away from the problems that we thought stood in the way of defining languages easily and at scale. Some of the major problems we found with almost all the semantic approaches were with respect to scalability, and in particular with respect to modularity and reuse. It is often the case that when you add a new feature to your language, you have to go back and change almost everything you have done in order to accommodate it, and that is quite inconvenient and demotivating when you experiment with and design new languages. The K framework was designed precisely to avoid that major limitation and to be modular and easy to use.

Then, once the framework stabilized, once we had defined several languages using it, the theory came. We refrained from developing the theory prematurely.
We felt the framework was not ready for nailing down its foundations, so we worked on the foundations only after we had several languages defined. That turned out to be a wise decision, because the framework changed a lot during those efforts, and it is only now that we think it is finally stable and ready for us to disseminate the foundations and the theory.

Before I go into the more high-level aspects, I would like to give you a flavor of how K works. Obviously, I cannot teach K here in all its details, but I can give you a high-level picture of how you would write a definition in K. This is a very simple language, KernelC — a fragment of C, much, much simpler than actual C. And this is the entire language; it's a complete definition. We call this the language poster. It is generated by one of the tools in the K framework from the textual semantics of the language. On the left we have the syntax and the macros, and in the other two columns we have the semantics. I'm going to zoom into the important parts of the definition only.

Syntax. We define syntax using regular BNF notation. However, we annotate the syntax with K-specific attributes, which are interpreted by the K framework in a semantic way. For example, here we say that our language has an assignment, which is an expression construct in C, not a statement, and that it is strict in the second argument, meaning that we have to evaluate the second argument before we can say anything else about the semantics. Before we can talk about the semantics of assignment, we have to first evaluate the expression that we assign to the variable. We found, by actually defining languages, that the syntax is the best place to state your evaluation strategies: when you define the syntax of your construct, you have the evaluation strategy in mind, so it is very convenient to just say it right away in an attribute of the production. There are many other attributes; this is just one of them. So that is how you define syntax. These are macros; they desugar complex constructs into simpler ones — I'm not going to go into those.

The next step is to define the program configuration. All semantic approaches have some notion of a program configuration, which is a data structure holding everything that you need during the execution of the program. In other words, you can think of it like this: if you execute a program and stop at a certain moment, the configuration holds a snapshot of the program execution at that moment in time. That's where you have the remaining code to execute, the current state of the program, the heap, the stacks, the registers — everything you need for your language. We were inspired by the chemical abstract machine approach, where the configuration holds the semantic information in cells — they are actually called solutions in the chemical abstract machine, and the pieces of information held in them are called molecules. We decided to change the terminology: we call these semantic cells, and they hold semantic information. We also decided to label them. So here, for example, there is a top-level configuration, as you can see; then there is a k cell holding the code, and there are cells for the functions, the environment, the memory, the function stack, the input and output buffers, and so on and so forth.
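Here is a minimal sketch, in textual K, of what such declarations look like — not the full KernelC definition, and the cell names are illustrative:

```k
// Syntax with an evaluation-strategy attribute: evaluate the assigned
// expression (the second argument) before applying the assignment rule.
syntax Exp ::= Id "=" Exp  [strict(2)]

// A configuration with labeled semantic cells.
configuration
  <T>
    <k> $PGM:K </k>    // the remaining computation (the code)
    <env> .Map </env>  // variables bound to their values
    <in stream="stdin"> .List </in>     // input buffer
    <out stream="stdout"> .List </out>  // output buffer
  </T>
```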
All of these you have to define in the program configuration before you can talk about the semantics. However, as I said, K has been designed in the spirit of modularity, so you can add new semantic cells as you advance the definition of your language. If you decide to add exceptions later on, you can add another semantic cell for an exception stack, say, if you want to do it that way, and then add rules — and you don't have to change any of the rules you already defined for other features.

And finally, the semantic rules, which are the real distinguishing feature of K. Here we have the semantic rule for assignment. There is a graphical representation and a textual representation — the textual one is how you normally write it in the ASCII file, and the graphical one is generated from it. What this is saying — let me show it on the graphical representation — is the following. Remember that assignment was strict in its second argument, so that argument has already been evaluated to a value. To give semantics to assignment, you need the k cell holding the code, in particular the assignment, and in the environment you must be able to match the variable X bound to some value. The underscore is the anonymous variable, which we don't care about because we're going to change it anyway. Then we underline the things that change: the assignment will change into V, and whatever value X was bound to will change to V. So this is saying, first of all, that X must be declared in order to assign something to it — otherwise you cannot match it in the environment — and then you do two writes at the same time, like a transaction. These two writes happen simultaneously, and this is one of the novelties of K, which we call local rewriting. As you can see in the textual rule, there are actually two rewrite arrows: one at the top of the k cell and one delimited by these parentheses. In other words, we push the rewrite relation down in the term, to the contexts where the changes take place, as opposed to having one big rewrite at the top. This allows us to be more modular, in the sense that we can abstract away the parts of the configuration that we do not care about, like the rest of the environment or the rest of the computation, because we do not have to repeat them in the left-hand and right-hand sides of the rule.

Metaphorically, these dots, which we call structural frames, are represented in the graphical notation as tears on the cells. The intuition is: I have a lot of information in that cell that I don't need, so I'm going to just tear it away, throw it away, and only keep what I need. As a matter of fact, one of the design principles of K from early on was that we want users to write in their semantic rules only what is absolutely needed and nothing else, because every single character that you put in a rule that is not absolutely needed may work against you later on: when you extend the language, you have to come back and change it. That is why we decided to minimize the amount of information that you have to convey in a semantic rule in order to capture the semantics of your language.
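In textual K, the assignment rule just described reads roughly as follows — a sketch consistent with the configuration above, assuming a sort Val of values:

```k
// Two simultaneous local rewrites: the assignment evaluates to V in the
// k cell, and X's binding is updated to V in the environment. The "..."
// are the structural frames that abstract away the rest of each cell.
rule <k> X:Id = V:Val => V ...</k>
     <env>... X |-> (_ => V) ...</env>
```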
So as you can see, there is really nothing unnecessary in the rule. That's how K works. It's very easy, actually. We teach it to first-year students at the University of Illinois, and several other professors teach K at various other universities. Students grasp it relatively quickly, and then they use it and think of it as a programming language in which they implement other languages. But in fact, what they are doing is defining formal semantics of languages.

K scales. Actually, that was one of the important driving forces: to define real languages, not only toy languages, because we had enough toy languages defined in the literature. So we defined languages like Java, JavaScript, and C — and when I say we defined them, I mean completely: not 50%, not a convenient core, not even 95%, but literally completely. Meaning that we are able to take the benchmarks that language developers use to test their implementations, compilers, or interpreters, and run exactly the same benchmarks with the K semantics of the language. One of the biggest efforts so far was the C language. Specifically, we took the standards manual of C, the ISO C11 standard, and formalized it completely in the K framework. That was a big effort — it took more than seven years in total — and there are more than 10,000 programs right now in the regression test suite. Each time we modify the semantics, we run all the tests to make sure that we do not break any of them. Because the semantics of a language, like anything big, can have bugs — and bugs in a semantics are even more serious than bugs in code, because they become part of the definition and will be accepted as features rather than bugs. It is very important to properly validate and test your semantics in order to gain confidence that it is appropriate for the language. In addition to these languages that we defined early on in the K framework, we have now defined several other languages motivated by our verification and formalization efforts in the blockchain space — EVM, Solidity, Plutus, Vyper — which I'm going to talk about later.

Program configurations can be quite big. This is the configuration of C, just so you have an idea of how complex configurations can be: it has more than 120 cells — one of them is the heap, the small one up there — and more than 5,000 rules written by the user. So it was a big effort, and lots of bugs were found, but in the end it works, which is quite encouraging: if it works for C, then maybe it works for any other language.

What I'm going to show you next is how we are approaching the various blue boxes on this picture — the various tools that we want to generate automatically from a formal semantics — in the K framework. Our first focus is the interpreter, because the interpreter is one of the main tools, by far the most used tool in the suite. Why? Because each time you define a language, you want to test it. You write the syntax of your language, and then you immediately want to parse some programs and run them. As soon as you have some meaningful programs, you want to execute them to see whether you get the right results. So you test all the time — and you test using the interpreter capability.
This is how we do it in K currently. We have a translator from K to OCaml, and then we compile the resulting OCaml natively. This way we get an interpreter: K to OCaml to a native binary. For C, we have incorporated this interpreter — generated automatically from the semantics — into a product of Runtime Verification, the startup that I founded in order to commercialize the research around formal semantics of programming languages that we developed in our lab at the university. This tool, RV-Match, is essentially just K instantiated with the C semantics, currently executing with the OCaml interpreter.

Here is how it works. Suppose you have a C program like this one here, which actually has a problem — I'm not going to tell you what the problem is; you can figure it out yourself. One of the typical issues we see with such programs is that you compile and run them and you see nothing. The compiler, and then the binary, do not detect the problem, because programs can have so-called undefined behavior, and when you have undefined behavior, the compiler is allowed to do whatever it wants. In this case, the GCC compiler compiles the program into a binary, and when you run the binary, nothing happens — with GCC you cannot actually detect this problem. However, if you compile the same program with the kcc tool, which is part of RV-Match, you get a binary just like with GCC, but when you run it you see all these errors. The output describes exactly what the problem is, and it also gives you pointers to the sections in the C standard where that type of undefined behavior is explained in detail. We did that because some early users of our system complained that RV-Match reports false alarms — which is incorrect, because the tool does not report any false alarms. Because of that, we added pointers to the exact descriptions of the undefined behaviors in the standard, and now users can go and check them.

In order to sell RV-Match, we had to compare it with the existing tools, the state of the art. Here we were lucky that Toyota made public a benchmark of tricky programs that they use to evaluate static analysis tools. That benchmark was published two or three years ago; the Toyota researchers evaluated several static analysis tools, came up with numbers and metrics, and in the end calculated a so-called productivity metric — just one number that you can attach to each tool. CodeSonar, for example, got the number 82, which was the highest among all the tools they evaluated. The benchmark is public, so we took exactly the same benchmark and ran it with kcc, the same way I showed you with that example, and we were very pleased to see that we scored 91. That is particularly encouraging because our tool was not crafted for C: it is just a generic execution engine, generated automatically from the formal semantics of C, which is public and open source — everybody can weigh in and say whether it is the right C or not. So it is very nice to see a general-purpose, language-independent tool, just instantiated with a particular language, give better results than tools that were specifically crafted for that language — and which are also very expensive.
Then we also compared it with many other tools that are free — for all the tools I mentioned before, you have to purchase licenses. There are lots of free tools, like the Valgrind family, including Helgrind; the LLVM sanitizers; Frama-C, which also has a program analysis tool; and the CompCert interpreter, which can also detect undefined behaviors. Among all these free tools, the best were the LLVM sanitizers, which together scored 67 — lower than the commercial tools, which is not unexpected — but we just wanted to make sure that we also compare favorably with dynamic analysis tools, because many of these are dynamic analysis tools.

My point here, really, is that having a formal semantics can be worthwhile. We may have reached a point where we can compete with actual tools that were specifically developed for specific languages, and this is amazing in my view, because it means it is now worth giving a formal semantics to a language. Formal semantics is not just an academic exercise anymore; it is useful, it is practical, and it is the way to get the best tool in the end. Once things worked for C, the question was: can we move on to other languages and use the same infrastructure, the same ideas, and see what we get? We started looking into blockchain languages, as I mentioned, mainly because this is a place where the correctness of programs has huge importance: most of the security attacks that I showed you are rooted in errors in programs. I will talk more about that; I just wanted to wrap up the interpreter capability and its future uses.

All right. So far I told you how you can generate an interpreter from a K language definition. Next I'm going to show you how we can do program verification, because this is one of the most important capabilities, and actually what got us started on defining formal semantics. How can we verify programs? How can we do deductive verification of programs based on a formal semantics — moreover, based on exactly the same formal semantics that we use for execution and interpretation?

Before that, let me give you a high-level overview of the state of the art in program verification these days. The way we verify programs is to first formalize the programming language in an appropriate setting where we can reason about programs — there are several approaches, like Hoare logic, dynamic logic, or separation logic — and then to use these formalizations to derive facts about programs. This is great, because it gives you a logical framework in which you can rigorously derive properties of programs. The problem, however, with these approaches is that you literally have to redefine your programming language semantics in a different framework. Suppose you already have an executable semantics that gives you a reference model, which you can use to interpret programs and to build tools like RV-Match — and now you have to put that aside and redefine your programming language for a different purpose, for verification. We do not want that, because it is at best uneconomical: you would spend a lot of effort to redo what you have already done. You already have a formal semantics; why define another one? And these proof rules — the way you formalize a language as a logic — are of course very language-specific. For example, I picked Hoare logic here, but I could have picked other approaches too.
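For reference, here is the classic Hoare-logic proof rule for while loops — textbook material, with \(\psi\) playing the role of the loop invariant:

\[
\frac{\vdash \{\psi \wedge e\}\; s\; \{\psi\}}{\vdash \{\psi\}\; \mathtt{while}\,(e)\; s\; \{\psi \wedge \neg e\}}
\]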
So, Hoare logic — because Hoare logic is one of the most popular approaches. As you can see, this is a proof rule stating the invariant property of while loops, and this one is a proof rule for dealing with procedures, specifically procedures which can be recursive. As you can see, such rules are very specific to your language: they use the syntax of your language as part of the rules. So Hoare logic is like a design pattern — a process telling you how to design a logic for your language in order to do reasoning. That sounds very natural, and it is very natural; we still use it in teaching, and I use it in my programming language classes. It's just that when you get to real languages for which you already have a formal semantics, this becomes very tedious, and it is hard to justify to yourself: why should I define another semantics for my language, only for a different purpose, when I already have one? I already have a semantics, and that semantics tells me everything I need to know about the language. Another semantics is an unnecessary effort. So, can we avoid it?

Here is what we actually want. For deductive verification, we want to use the trusted semantics directly — the semantics that I tested, that I validated, that I already spent years formalizing. I want to use exactly that one to do verification. In other words, I would still like to have a logic, a proof system, that allows me to prove and derive properties, but I want it to be language-independent. I want that proof system to work with any language, by simply presenting the language as a set of axioms to the proof system, and to have only one way to derive properties about programs — one that works for all programs in all languages. That's what I want, if possible. Moreover, I also want that proof system to be sound and relatively complete for all languages. Soundness means that anything the proof system proves actually holds for the programs in the language; relative completeness means that any property that holds for a program is also provable in the proof system. Relative completeness in particular is very hard to prove for real languages — I'm not aware of actual formal proofs for real languages — but what we would like here is to get it for free. That means I want a language-independent proof system that is sound and relatively complete and works with all languages at the same time. All I need is to formalize my language as a theory in this logic, and then use the general machinery to reason about programs in that language.

And that's what we do. It took us many, many years to come up with a logic that, looking at it retrospectively, now looks ridiculously simple. I don't know if that's a positive or a negative, but I think it's a positive. It is really very easy to explain now, and it is almost unbelievable that it can explain everything that we do with the K tools. In other words, everything that any of the tools of the K framework does can be seen as proof search in this logic, which I'm going to tell you about next. The logic is, as I said, very simple. It captures the essence of what you need in order to specify programming languages, as well as properties of programs in programming languages. There are five major constructs.
I'm going to go through these five major constructs; they are grouped into three categories. There are structural constructs that allow you to build terms, basically: you have variables, and then symbols applied on top of other terms. But we do not call them terms, because, as you will see, they are more than terms — we call them patterns. So we build these patterns starting with variables and applying symbols on top of other patterns. Then you have constraints, another category of constructs in the logic: you can negate a pattern, taking its complement, or take the intersection of two patterns, and this way you can do logical reasoning. What does it mean to negate a pattern, actually? As you will see from the semantics, the meaning of patterns is that of pattern matching. A pattern can be matched by many concrete instances. For example, a pattern can express some kind of configuration where you have this program in the computation, these values in the heap, these values in the stack, and some constraints over them. Such a pattern specifies the set of possible states that match it. When you write the negation of a pattern, you mean its complement — the states that do not match the pattern — and the intersection of two patterns is matched by whatever matches both. Finally, one other important construct in many languages that we define in K, specifically functional languages, is binders: you need to bind variables that occur free in a pattern to the particular place where you declare them. We could have chosen many different binders, but we decided to go with the existential binder. And that's basically it: we have constructs for structure, for constraints, and for binding. Notice that there is no distinction between symbols: our symbols are just symbols. We do not categorize them as function symbols or predicate symbols. We can axiomatize them to behave like functions or predicates, but at this level they are just symbols; we regard all symbols as constructors of patterns. So it's very simple, as you can see.

Now, the interpretation of symbols. In first-order logic, function symbols are interpreted as functions and predicate symbols as predicates. In matching logic, symbols are interpreted as relations — or, equivalently, as functions from the carriers of the arguments to the power set of the result. The intuition is that a pattern can be matched by many different values, not only one. We also think of this matching as a satisfaction relation: a pattern is satisfied by all the values that match it. This allows us to interpret patterns as sets, and, as I said before, negation is complement, conjunction is intersection, and the existential quantifier is union over all values of the bound variable. I know it sounds very simple, but it took a lot of time to compress: we had much more complex versions of matching logic that eventually had to be narrowed down and simplified, and we had to throw away things that we realized were not needed, or were definable in the smaller core. In the end, we came up with this very, very simple logic. There is also a proof system, and I do not have time to go into its details — I can send you the paper if you are interested. All you need to know is that matching logic, with that notion of satisfaction and those models, is sound and complete.
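To summarize the grammar just described, in the notation of the matching logic papers, patterns are built as

\[
\varphi \;::=\; x \;\mid\; \sigma(\varphi_1,\ldots,\varphi_n) \;\mid\; \neg\varphi \;\mid\; \varphi_1 \wedge \varphi_2 \;\mid\; \exists x.\,\varphi
\]

and in a model \(M\) each symbol \(\sigma\) is interpreted as a function \(\sigma_M : M_1 \times \cdots \times M_n \to \mathcal{P}(M)\) into the power set of the carrier, so every pattern denotes the set of elements that match it.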
The proof system that makes matching logic complete has 13 proof rules, and it completely includes first-order logic. We took the proof rules of first-order logic as they are, and they happen to be sound — we didn't expect all of them to be sound, but they are, meaning that they were actually more general than perhaps the creators of first-order logic initially thought. Then we have a set of rules related to the propagation of logical constraints. Since symbols can be applied to any patterns, and those patterns can contain logical connectives inside, a natural question is how those connectives interact with the symbol above them. It turns out that you cannot always lift a logical connective through a symbol, but in some cases you can — for disjunction, for example, or for the existential quantifier — and this is very useful for case analysis.

Another important proof rule is the framing rule, which allows us to do local reasoning. If you're in a big context — imagine a program configuration where, in the heap, you want to say that you have a linked list — you can go there and prove properties locally: you can prove that this portion of the heap implies a linked list. Then you can lift this local reasoning to the whole configuration through the framing rule: I prove an implication, and then I can lift that implication to the entire configuration, the context applied to the two patterns. Then there are two other, unexpected rules that had to be added, mainly for technical reasons, to be able to prove the completeness theorem. The Existence rule is the one you wouldn't expect: ∃x. x. This doesn't make any sense in first-order logic, but it makes perfect sense in matching logic: translated into models, it says that the universe of values equals the union of all its elements, which is obvious semantically but cannot be proved with the other proof rules. So that was necessary.

All right, so these thirteen proof rules make matching logic sound and complete, which means that you can prove all valid properties of any theory in matching logic. The question is: is matching logic expressive enough to represent everything? Again, this was inspired by our definitions of various programming languages across various paradigms — some functional, some logical, some object-oriented. We knew that K works with all these paradigms; we didn't know whether everything made full sense from a logical point of view. Now I'm happy to say that matching logic can finally explain everything going on under the hood in the K framework. In terms of expressiveness, we were able to define in matching logic various other logics that are out there and are extremely useful for formal analysis of programming languages. Such as first-order logic: you can define equality, membership, partial functions. Notice that in conventional first-order logic you cannot define equality: you can add as many axioms as you want, but you cannot make those axioms force equality to be interpreted as actual equality in the models — it can be interpreted as some equivalence relation, but not as equality. Also, you cannot define partial functions; there is a whole area, with books written about it, called partial first-order logic.
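In matching logic, by contrast, a single definedness symbol \(\lceil\_\rceil\), with the one axiom \(\lceil x \rceil\) (every individual element is defined), suffices. Following the matching logic paper, one can then define

\[
\lfloor \varphi \rfloor \;\equiv\; \neg\lceil \neg\varphi \rceil,
\qquad
\varphi_1 = \varphi_2 \;\equiv\; \lfloor \varphi_1 \leftrightarrow \varphi_2 \rfloor,
\qquad
\varphi_1 \subseteq \varphi_2 \;\equiv\; \lfloor \varphi_1 \to \varphi_2 \rfloor,
\]

and a symbol \(f\) can be axiomatized as a partial function by \(\exists y.\, f(x_1,\ldots,x_n) \subseteq y\): its result contains at most one element.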
So in matching logic we can define partial functions just fine, using patterns. More interestingly — and initially unexpectedly — we can also capture calculi with binders, like the lambda calculus and the mu calculus. The trick to capturing general binders like lambda or mu is to realize that a lambda binder in fact achieves two goals at once: one is to build a term, and the other is to build a binding. So what we do is separate the two. We use an auxiliary symbol, which here I denote by lambda-zero: lambda-zero just builds the term, and then we use the existential quantifier, which we already have, to build the binding. Then we just say that λx.e is an alias for this combination; λx.e bundles together the term-building capability and the binding capability. That's very nice, because you can still think of it atomically, but in matching logic we have decomposed it into the two basic operations. This allows us to define the lambda calculus as an ordinary theory — literally an ordinary theory in matching logic — where you simply add this equation, the beta equation (or, if you want, eta and other equations; you just add them there, and they all make full sense), with lambda now just an alias. You can do the same for mu and other kinds of binders.

For mu, we say that mu is a fixed point. This by itself doesn't state that mu is the least fixed point, but you can capture that property with another pattern, which we call the Knaster-Tarski pattern, and which literally captures the essence of the Knaster-Tarski theorem: if ψ is a fixed point, then mu is smaller than it. Once you add this proof rule as well, you have mu — you have fixed points. And with fixed points, you can define lots of other important logics. We went ahead — again, I can make the paper available; I cannot go through it now — and defined modal logics, Hoare logics, dynamic logics, all of LTL, CTL, CTL*; you can define all of those once you have dynamic logic, that's well known. Separation logic as well, which is also just a matching logic theory. And finally, and most importantly for the connection with K, reachability logic. We recently realized that reachability logic, which was used as the foundation of K until recently, can itself be defined in matching logic — which made matching logic the solid foundation of K. We just need one logic now, which tells the entire story, and it is so simple.

So next, let me tell you a few words about reachability logic, because I'm pretty sure it will look a lot more intuitive and useful to you than matching logic — but keep in the back of your mind that reachability logic is, again, completely definable in matching logic; it's just a theory in matching logic, like the other theories I showed you before. In reachability logic, you use rewrites between patterns. Remember that in K we had a rewrite relation; this is its counterpart in reachability logic, where you go from one pattern to another. The intuition is: if you give me a program configuration that matches φ, I will execute it with the programming language semantics and eventually reach — that's why it's called reachability logic — a configuration that matches φ′.
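Spelled out — my rendering of the μ-notation from the matching logic line of work — the fixpoint axiom and the Knaster-Tarski rule read:

\[
\mu X.\,\varphi \;=\; \varphi[\mu X.\,\varphi \,/\, X]
\qquad\qquad
\frac{\varphi[\psi/X] \to \psi}{\mu X.\,\varphi \to \psi}
\]

With that in hand, back to the reachability rules φ ⇒ φ′ between patterns.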
This generalizes to conditional rules, and it can also incorporate side conditions nicely. In term rewriting, when you have a side condition, you say the left-hand side rewrites to the right if the side condition holds; but here we can conjoin the side condition with the left-hand side pattern, and then it is not even a conditional rule anymore in reachability logic. And then we realized that the reachability relation itself can be stated as a pattern once we have "weak eventually," and weak eventually can be defined using mu. So reachability rules are just ordinary matching logic patterns. From here on, though, I'm going to continue to present things with reachability rules, because, as I said, they are more intuitive — and even in proofs we use reachability as syntactic sugar for patterns that can anyway be defined in matching logic.

Why was reachability logic the right formalism for us to explain what K does? Mainly because its sentences, these reachability rules, capture at the same time both the elements of operational semantics, which we use to define languages, and of axiomatic semantics, which we use to specify properties. Specifically, we can take operational rules — transitions — and regard them as reachability in one step. This, for example, is the small-step operational semantics of assignment: you go from this configuration to this configuration. These are just patterns; they have variables and so on. I think the intuition for why this is matching logic is quite clear now: you write this pattern, and it is matched exactly by the configurations that you wanted to capture when you wrote it, and then you say that these rewrite to configurations that match the other pattern. But similarly, you can capture Hoare triples, which are basically specifications of programs — precondition, code, postcondition. Such specifications can also be described as reachability rules: you create a little configuration, you add the precondition there as a condition, and then this reaches another configuration. So Hoare triples are a particular instance of reachability rules, and small-step transitions are a particular instance of reachability rules. We now have one logic in which we can express both the operational semantics and the axiomatic semantics, with the same kind of statements: reachability rules.

And what is K? Now it's very easy to say what K is: K is literally a best-effort implementation of reachability logic. K does its best — with heuristics, algorithms, decision procedures, whatever it can — to implement reasoning in reachability logic. This is how it works. Like any other program verification environment, it takes as input a program and the specifications that you want to verify. What is different in the K program verifier is that the verification infrastructure is one box that you do not touch — the same one for all languages. It additionally takes as input the semantics of the particular language in which your program is written. In other words, we instantiate the program verifier with the language semantics, and then the verification infrastructure reduces the problem of verifying the specifications to smaller problems that can be solved in the mathematical domains of the various built-ins that we use — natural numbers, Booleans, and so on — which we then discharge with SMT solvers like Z3.
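To give a concrete flavor, here is a hypothetical Hoare-style specification written as a K reachability claim for the KernelC-like language sketched earlier — all identifiers are illustrative, not taken from an actual K definition, and recent versions of K spell such specifications with the claim keyword:

```k
// "Precondition": n is bound to N >= 0 and s to S.
// "Postcondition": the loop terminates with n = 0 and s = S + N*(N+1)/2.
claim <k> while (n > 0) { s = s + n; n = n - 1; } => .K ...</k>
      <env>... n |-> (N:Int => 0)
              s |-> (S:Int => S +Int ((N *Int (N +Int 1)) /Int 2)) ...</env>
  requires N >=Int 0
```

The verifier proves such a claim using the operational rules of the language semantics themselves, with no separate program logic.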
So, in some sense, you can think of the K verification infrastructure as a little engine that takes a language semantics as input and then allows you to reduce the problem of verifying a program in that particular programming language to queries to an oracle that knows about all the mathematical domains you need in order to specify the semantics of your language and the properties of your programs. This is where relative completeness comes into the picture in our framework: matching logic is complete relative to an oracle for certain domains that we consider fixed. What we do in the blockchain space is instantiate this verification infrastructure with more languages, like EVM, IELE, Plutus, Solidity, and so on.

We have evaluated the K program verifier with several large languages: C, Java, JavaScript. The overall moral is that performance is not an issue. We had been discussing this with many colleagues, and they said: even if you make this work, the performance is going to be horrible; you will not be able to do anything efficiently. We actually wrote a paper specifically to compare performance — there are lots of numbers in it, published two years ago at OOPSLA — but in the end, the conclusion is that performance is not a big issue. For C, we compared how long it takes to verify a program using our approach versus the same program using the VCC prover, one of the most influential provers for C, which was extended with separation logic by a research group. In the end, we verified the program in 280 seconds, while the VCC tool verified it in 260 seconds. Yes, a bit faster, but not a big difference. Anyway, my point is that a language-independent verification infrastructure can, in the end, be as efficient as specialized verifiers for specific languages — because, in the end, most of the time is spent solving queries to the SMT solver, which will be pretty much the same no matter what prover you use.

All right, so now let me tell you how we use all this K infrastructure that I presented for the blockchain. The very first step in approaching smart contracts formally is to have a formal semantics of the basic machinery that runs them, which is the Ethereum Virtual Machine. For that, we have defined KEVM, a formal semantics of the EVM in K. Like with all the other languages, we gave a complete semantics, and we tested it thoroughly: we tested it against more than 40,000 programs that come with the Ethereum C++ implementation, and we passed all of them. That we passed them was not unexpected — we had done that with many other benchmarks before. What was slightly unexpected, and kind of a pleasant surprise, was that the performance of interpretation was not as bad as I had initially thought. I thought it would be at least 100 times slower than the actual C++ implementation, but it turned out to be only 20 times slower. And there are a few programs — 10 or 15 among the 40,000, the stress tests — that are responsible for most of the overhead. When you take those out, you get only 10 times slower execution with the interpreter generated by the K framework. This may look bad at first sight, but think about what happens here: you are executing an actual formal specification of the EVM, and it's only 10 times slower.
So you didn't implement anything. You just execute the mathematical model of the EVM, the specification of the EVM — the very thing you would like to prove the C++ implementation correct against, if you could, which would be very hard — and it is only 10 times slower. That was excellent news, and it gave us the energy and enthusiasm to think that we may be able, in the end, to generate interpreters that are practical — maybe even at a point where they can compete with handwritten interpreters. In fact, this is one of the things that we are doing right now with the EVM, together with IOHK: we are going to generate and deploy a correct-by-construction EVM client. We have a KEVM testnet that runs the KEVM interpreter — the interpreter generated by the K framework from the EVM semantics in K. If you search the internet for IOHK, KEVM, testnet, and so on, you will find the links, and you can go ahead and run your smart contracts with the KEVM client, which is great.

But you can do more than that: you can also verify smart contracts. Using the reachability logic and matching logic verification approach, you can take smart contracts, use the KEVM semantics, and prove that they do what they are supposed to do. This is actually one of the thrusts at my startup, Runtime Verification: we verify smart contracts commercially. The way we do it is to take the K verification infrastructure, instantiate it with the K semantics of EVM, and then verify smart contracts — basically, use the proof system of matching logic to prove that contracts do what is claimed. As part of this effort, we also engaged with the Ethereum Foundation to verify one of their main protocols, Casper. There are some changes planned for Casper, but for the time being we are the team in charge of verifying the Casper protocol, which is implemented as a smart contract and needs to be verified just like any other smart contract — it is just a lot more complex than the usual ones. So the point is that this formal semantics of the EVM can be used not only to generate an EVM client on which you can run smart contracts, but also — exactly the same specification — to generate a program verifier that verifies smart contracts. You need no other semantics of the EVM for verification; it is exactly the same semantics, both for execution and for verification.

Another important effort, and probably one of the most important achievements of the K framework recently, is a new EVM-like language for the blockchain, IELE, which we designed from scratch using K, and whose implementation we generate completely automatically, the same way we generated the one for the EVM. What we did was retrospectively analyze the EVM specification that we had defined, look at all our attempts to verify smart contracts, try to understand where things were harder than expected, and then step back and think about how a virtual machine for executing smart contracts should have been designed in order to overcome all those problems. And we came up with IELE — the best we could come up with. In short, IELE is an LLVM-like virtual machine for the blockchain; that's the simplest way to think about it.
For me, as the creator of the K framework, this is a hugely important milestone, because it is the first time that a real language has been designed in K and its implementation automatically generated — and that generated implementation is the implementation of the language. There will be no other ad hoc, handcrafted implementation: the actual interpreter is the one generated by the K framework from the semantics. Previously, all the real languages that went through this process were either existing languages, like C, Java, and JavaScript, or toy languages. This is the first time a real language has been designed this way from scratch, with the implementation automatically generated from it. Again, if you're interested, search for the IELE blog posts from IOHK and you will find several pages — and soon more — describing IELE. There is also a GitHub repository; everything I present here is open source and public, so feel free to download it, play with it, and contribute.

IOHK, in collaboration with Runtime Verification, is going to launch a testnet running IELE this time. It is the same network infrastructure, Mantis, that is also used to run the KEVM client, but this time it will run the IELE VM under the hood. And this way, literally for the first time, people can execute smart contracts using a different VM than the Ethereum Virtual Machine. We encourage you to try it and let us know what you think. Please feel free to report bugs, performance problems — we want to hear the bad things. We don't need you to tell us the good things; we know the good things. Tell us the bad things that you find about the IELE testnet. The reason I have this slide here is also to show you that IELE is quite human-readable. You may not be able to read the fonts because they are small, but you can see that there is a contract here, with a function and another function, and this code is human-readable. It is like LLVM in this respect: LLVM was also designed to be human-readable, and we really like that. We borrowed from LLVM everything we could, because we like LLVM, and IELE matches it and adapts it for the blockchain.

Several other languages in the blockchain space have been given, or are being given, semantics in K as we speak, such as WebAssembly. The Ethereum Foundation is also shifting from the EVM towards WebAssembly. Everybody recognizes that the EVM has some problems and that there are better alternatives: we go with IELE, the Ethereum Foundation goes with Wasm. But what's interesting to notice is that Wasm is also being formalized in K. Solidity is another language — Solidity is actually the main language used for smart contracts in the community — but many other languages are being proposed to address the problems discovered over the years with Solidity. One of these languages is called Plutus, also designed by IOHK, working very closely with Philip Wadler, who leads the Plutus effort. We are formalizing Plutus in K as well, with the final goal of being able to prove smart contracts in Plutus correct too, and to be able to translate those programs to IELE and ultimately run everything on the IELE VM. And Vyper is another language, proposed by the Ethereum Foundation, which is a moving target — actually, Plutus is also a moving target — but in the case of Vyper, the formal specification and the implementation go hand in hand.
So the formal specification of Vyper literally helps the design of the language. The design is still done in the traditional way, but the formal specification is developed in parallel with it. So far I have talked about what we have already done with K. For the rest of the presentation, about 10 minutes, I am going to talk about ongoing projects in the K framework: new tools and new infrastructure we would like to add in order to provide better support for users and for the languages being defined in K. And by the way, we are re-implementing the entire K framework in Haskell, and we are hiring, so if you are interested, please contact me.

All right. Remember that the interpreter is one of the most important tools, and that the current OCaml backend of K is fast enough to power the KEVM testnet, the IELE testnet, and RV-Match, the tool commercialized by Runtime Verification. However, we think we can do a lot better and have a faster backend. The trick is to understand the specific needs of K matching and then generate a pattern matcher that goes beyond the OCaml pattern matcher: compile directly to LLVM and specialize the matcher to the K framework. Sometimes we have thousands of rules; in the case of C, there are about 25,000 of them, so you end up with an OCaml match statement with 25,000 cases, and as you can imagine, there is a performance penalty to pay for that. We believe we can implement a very case-specific matcher directly at the LLVM level, and that is what we are working on now. Once that is implemented, we expect at least another order of magnitude of performance improvement. When that happens, if my prediction is right, we will reach a point where we can compete with actual handcrafted interpreters. So if you are a language designer, you may be motivated to define your language as a mathematical theory in K and then generate an interpreter that is as fast as, maybe even faster than, the interpreter you would write by hand. And that is not the only benefit: you also get all the program verification infrastructure and all the other tools in the K ecosystem. But even in terms of execution alone, we believe we will get to a point where you are more motivated to write a formal semantics than to implement an interpreter for your language.

Another important piece of research and tooling we are developing for the K framework is what we call semantics-based compilation: going beyond interpretation to generate a compiler from a semantics. The way to do that is to use symbolic execution to summarize, before execution, all the behaviors of your program that can be summarized statically, and then at runtime pay only for whatever cannot be determined statically. This slide is a high-level picture of the idea. The idea is to have a new tool in the framework, semantics-based compilation, let's call it SBC, which takes as input a language definition L and a program P in that language, and generates another language definition, call it L_P, which is a specialization of L for the program P. In other words, you get the semantics of a new language, but that language is hardwired to your original program P; it knows nothing except your original program P.
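To illustrate with a toy example of my own, not taken from the slides: suppose P sums the numbers from n down to 1 in an IMP-like language. SBC might, hypothetically, compress each loop iteration into a single rewrite rule, with the loop guard as a side condition:

    // Hypothetical summary rules SBC could produce for the program
    //   s = 0; while (n > 0) { s = s + n; n = n - 1; }
    // #loopHead and #done are illustrative labels marking program points.
    rule <k> #loopHead ...</k>                         // one whole iteration
         <state>... n |-> (N => N -Int 1)  s |-> (S => S +Int N) ...</state>
      requires N >Int 0

    rule <k> #loopHead => #done ...</k>                // loop exit
         <state>... n |-> N ...</state>
      requires N <=Int 0

The entire loop body executes in one rewrite step; at runtime, the interpreter no longer revisits each individual statement of P.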
And the advantage is that that semantics will be a lot simpler: all the computation an interpreter would redo over and over again while running your program is gone, done only once, statically. We did some experiments. We took a simple language with while loops, assignments, and so on; it looks like C, essentially a kernel of C. Applying SBC, we get the specification on this slide. We draw it as an automaton, because it is one, but in fact everything is encoded as rewrite rules, just like the sketch above. So this is like a new programming language with one, two, three, four instructions, and the arrows are actually rules: the green color shows how they change the state, and the red color shows the side conditions that must be checked. In the end, this is very easy to execute efficiently. It keeps only the important points of the program, such as the heads of loops, and entirely summarizes the basic blocks, even the conditionals. Basically, everything that can be done statically is done statically, and you generate a new definition that you can compile and run with whatever backend you want. In experiments, we have seen speed-ups of at least another order of magnitude, sometimes more. So by doing SBC and then using the LLVM backend, we hope to reach a point where compilers generated from semantics are not only correct by construction but also close, at least for some languages, to actual handcrafted compilers. It will probably be hard to beat GCC or Clang any time soon, but for languages that are not extremely well engineered, or for domain-specific languages, this can be the best way to implement a compiler.

All right, the final project we are working on is to generate proof objects for everything that K does. As I told you, every single tool, every blue box here, in the end searches for a proof. There are decision procedures and efficient algorithms, but in the end you can think of what each tool does as searching for a proof. And if that is the case, why not generate an actual proof object out of it? That proof object can then serve as a correctness certificate for what the tool did. For example, suppose you use the K verifier to verify a smart contract. Why should you trust it? If you trust your verification, you are trusting K, which is a very big thing to trust: K is huge, and it can have bugs like anything else. So how about generating an actual proof object and using that as the correctness certificate? Then that certificate is the only thing you have to trust; you do not have to trust K. We want to generate proof objects even for the parser, because we found that the parser sometimes parses the program in an unexpected way. We had the program x - 1, and to our surprise it was parsed, completely crazily, as the function x applied to the argument -1. If you prove something about that program, you prove something about the wrong program. So parsing itself can produce proof objects certifying that the parsing was done correctly, and that is what we intend to do.
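To show how such a surprise can arise, here is a contrived grammar of my own, not the actual one from that anecdote, in which x - 1 is genuinely ambiguous:

    // Contrived K grammar in which  x - 1  has two parse trees:
    // the subtraction  x - 1,  or the application  x (- 1).
    syntax Exp ::= Id | Int
                 | "-" Exp             // unary minus
                 > Exp Exp             // application by juxtaposition
                 > Exp "-" Exp [left]  // subtraction

A proof object for parsing would certify which of these trees was actually produced, so a downstream verification cannot silently be about the wrong program.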
I would like to conclude with our grand vision for how K can serve as a universal language framework for the blockchain. You have seen lots of tools and lots of applications. What we would like to do is put all of these together and provide an infrastructure where people can write smart contracts in any programming language they want, provided it has a K semantics. This is how we envision it. It may be completely off; it may never work; but I think it is at least worth a thought.

Let us get back to our blockchain picture. What we would like to do is store, in certain accounts, the formal semantics of languages. For example, here I store the K semantics of Solidity version 0.2.1. From K's perspective, version 0.2.1 and version 0.2.2 are completely different languages; indeed, if you claim to have verified a Solidity program, you have to state the exact version of Solidity, because the program can have completely different behaviors under different versions. So once we freeze a version of a language, why not freeze it forever by putting it on the blockchain? Many other languages can be treated the same way, so the K semantics of languages will be stored on the blockchain. Now, whenever you want to write a contract, say in this new transaction, you can write it in any language whose semantics is on the blockchain: just refer to the language by its unique ID, say L, which here is the unique address where the language's semantics is stored. There is no doubt about which language you mean, because there is a unique way to identify it. Then use the SBC tool for that language. You may optionally also verify the program P against the semantics of L, but keep in mind that whatever you verify there may be irrelevant, because the compiler can change the meaning of the program; it is not really necessary, because we verify things directly at the lower level. Once we have L_P, the result of applying SBC to P according to the semantics of L, we can use the IELE backend of K to generate an IELE program corresponding to your program P. The whole translation of the program P in language L has now become a program in IELE, and we can formally verify that smart contract at the IELE level, at the very bottom. You do not have to trust anything above it: even SBC may have errors, so you do not have to trust SBC either, because the verification happens all the way down at the IELE level. This way, you only put contracts that have been verified on the blockchain, and you also have the language semantics on the blockchain, which is a good place to store them anyway. Then the nodes themselves can run an IELE client, an IELE VM, one that has been generated automatically from the formal semantics of IELE. So you can imagine that all these nodes run IELE. If you step back and look at the whole picture, in the end everything on the blockchain is either a formal specification or something generated automatically, correct by construction, from a formal specification, and all the smart contracts, in whatever languages, have been verified. There is no place where human error can interfere with the process. Of course, there is still the question of whether we get the specifications right, and that will always be a question, but I would rather have that problem than low-level problems in code that I cannot control.
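To make the pipeline concrete, here is a sketch of my own, in K-style configuration notation, of what such a node might store; every cell name here is hypothetical, since nothing like this exists yet:

    // Hypothetical on-chain state under this vision (all names illustrative):
    configuration
      <languages>
        <language multiplicity="*">
          <langAddr> 0 </langAddr>              // unique address identifying L
          <langSemantics> .K </langSemantics>   // the frozen K definition of L
        </language>
      </languages>
      <contracts>
        <contract multiplicity="*">
          <srcLang> 0 </srcLang>                // refers to a <langAddr> above
          <ieleCode> .K </ieleCode>             // produced by SBC + the IELE backend
          <proofObject> .K </proofObject>       // certificate checked at the IELE level
        </contract>
      </contracts>

The point of the proof-object cell is that a node would check the certificate before deploying, rather than trusting SBC or the verifier.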
In conclusion, we believe this can be done: the ideal language framework can be approached these days. Whether K is the final answer, I don't know; probably not. But I believe it has at least demonstrated that this path is worth pursuing. And if this works, why should you do things any other way? We shouldn't, if this works. Thank you.