The next talk is called "End-to-end formal ISA verification of RISC-V processors with riscv-formal". I have no idea what this means, but I'm very excited to understand what it means, and Clifford promised he's going to make sure everyone will. Clifford is very well known in the open source and free software community, and he's especially known for Project IceStorm. Please help me welcome Clifford.
RISC-V is an open instruction set architecture, an open ISA. So it's not a processor, but a processor specification, an ISA specification that is free to use for everyone. And if you happen to have already implemented your own processor at one point in time, you might know that it's actually much easier to implement a processor than to implement all the tools that you need to compile programs for your processor. And if you use something like RISC-V, then you can reuse the tools that are already out there, so that's a great benefit.
However, for this endeavor, we need processors that are actually compatible with each other, processors that implement the RISC-V ISA correctly. With many other ISAs, we start with one processor and we say, oh, that's the processor, and later on we figure out there was a bug, and what people sometimes do is they just change the specification so that the specification now fits the hardware they actually have. We can't do something like that with RISC-V, where there are many, many implementations out there, all being developed in parallel to fit the same specification. So we want to have some kind of means to make sure that all these processors actually agree about what the ISA specification is.
So what's formal verification? Formal verification is a super broad term. In the context of this talk, I'm talking about hardware model checking. More specifically, I'm talking about checking of so-called safety properties.
So we have some hardware design and we have an initial state, and we would like to know if this hardware design can reach a bad state from the initial state. This is formally the problem that we are trying to solve here. And there are two means to do that, two different categories of proofs: bounded and unbounded proofs. With bounded proofs, we only prove that it's impossible to reach a bad state within a certain number of cycles. So we give a maximum bound for the length of a counterexample. And with unbounded proofs, we prove that a bad state can actually never be reached. So unbounded proofs are, of course, better if you can make an unbounded proof. But in many cases, this is very hard to achieve. Bounded proofs are something that we can do. So I'm talking about bounded proofs here for the most part.
So what's end-to-end formal verification? Because that's also in my title. Historically, when you formally verify something like a processor, you break down the processor into many small components, and then you write properties for each component and prove for each component individually that it adheres to the properties. And then you make a more abstract proof that if you put a system together from components that have these properties, then this system will have the properties that you want. With end-to-end verification, we treat the processor as one huge black box and just ask the question: does this one huge thing fit our specification, does it have the properties that we want?
That has a couple of advantages. It's much, much easier this way to take one specification and port it from one processor to another, because we don't care about how the processor is built internally. And it's much easier to take the specification that we have and actually match it to other specifications of the ISA, because we have a specification that says what the overall behavior is that we expect from our processor.
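The bounded safety check described above can be illustrated on a toy design. This is only a sketch: it enumerates states explicitly, whereas real model checkers hand the problem to SAT or SMT solvers, and the 4-bit counter, the "bad" value, and all function names are made up for illustration.

```python
from collections import deque

def bounded_check(initial, step, is_bad, bound):
    """Search for a bad state reachable in at most `bound` cycles.

    Returns the shortest counterexample trace (a list of states), or
    None if no bad state is reachable within the bound.
    """
    frontier = deque([(initial, [initial])])
    seen = {initial}
    while frontier:
        state, trace = frontier.popleft()
        if is_bad(state):
            return trace
        if len(trace) - 1 == bound:
            continue  # do not explore beyond the bound
        for nxt in step(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, trace + [nxt]))
    return None

# Hypothetical design: a 4-bit counter that wraps around.
def counter_step(state):
    return [(state + 1) % 16]

# Hypothetical safety property: the counter must never hold the value 10.
def is_bad(state):
    return state == 10

shallow = bounded_check(0, counter_step, is_bad, bound=5)
deep = bounded_check(0, counter_step, is_bad, bound=10)
print("bound 5:", shallow)
print("bound 10:", deep)
```

With a bound of 5 the check passes even though the design is broken; only a bound of at least 10 exposes the counterexample. That is the sense in which a bounded proof gives confidence rather than certainty.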
But the big disadvantage, of course, is that it's computationally much more expensive to do end-to-end formal verification. And doing this end-to-end verification of a processor against an ISA specification is something that historically was always viewed as the textbook example of things that you can't do with formal methods. But fortunately, the solvers became much better in the last couple of years. So now, if we use the right tricks, we can do stuff like that with the solvers we have nowadays.
So that's riscv-formal. riscv-formal is a framework that allows us to do end-to-end formal verification of RISC-V processors against a formal version of the ISA specification. So riscv-formal is not a formally verified processor. Instead, if you happen to have a RISC-V processor, you can use riscv-formal to prove that your processor conforms to the ISA specification. For the most part, this is using bounded methods. Theoretically, you could do unbounded proofs with riscv-formal, but it's not the main use case. So it's good for what we call bug hunting. Because maybe there is a counterexample that would show that the processor could diverge from the desired behavior within 1,000 or 5,000 cycles. But usually, when you have something like a processor and you can't reach a bad state within very short bounds, you have high confidence that your processor actually implements the ISA correctly.
So if you have a processor and you would like to integrate it with riscv-formal, you need to do two things. You need to add a special trace port to your processor. It's called the RVFI trace port, the RISC-V Formal Interface trace port. And you have to configure riscv-formal so that it understands the attributes of your processor. So for example, RISC-V is available in a 32-bit and a 64-bit version. You have to tell riscv-formal if you want to verify a 32-bit or a 64-bit processor. RISC-V is a modular ISA, so there are a couple of extensions.
And you have to tell riscv-formal which extensions your processor actually implements. And then there are a couple of other things that are transparent to a user-land process, like whether unaligned loads or stores are supported natively by the hardware. Because RISC-V only says that when you do an unaligned load or store, a user-space program can expect this load or store to succeed. But it might take a long time, because there might be a machine interrupt handler that is emulating the unaligned load or store by doing aligned loads and stores. But if we do this formal verification of the processor, then the riscv-formal framework must be aware of what the expected behavior for your core is. Should it trap when it sees an unaligned load or store, or should it just perform the load or store unaligned?
So what does this interface look like that you need to implement in your processor if you would like to use riscv-formal? This is the current version of the riscv-formal interface. Right now there is no support for floating-point instructions and there is no support for CSRs. But this is on the to-do list, so this interface will grow larger and larger when we add these additional features. But all these additional features will be optional. One of the reasons is that some people might implement just small microcontrollers that don't have floating-point units, or that don't have support for the privileged specification, or that don't have CSRs.
Through this interface, whenever the core retires an instruction, it documents which instruction it retired. So it tells us: this is the instruction word I retired. This was the program counter where I found the instruction. This is the program counter for the next instruction. These are the registers that I read and these are the values that I observed in the register file. This is the register that I've written and this is the value that I have written to the register file, all that stuff.
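As a rough picture of what one such retirement report carries, here is a minimal sketch of a record with the pieces listed above. The field names are loosely modeled on the interface's signal naming but should be read as assumptions for illustration, not as the definitive port list; the `describe` helper is hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RetiredInstruction:
    """One retirement report, as described in the talk (illustrative names)."""
    insn: int       # instruction word that was retired
    pc_rdata: int   # program counter where the instruction was found
    pc_wdata: int   # program counter for the next instruction
    rs1_addr: int   # first source register index
    rs1_rdata: int  # value observed for the first source register
    rs2_addr: int   # second source register index
    rs2_rdata: int  # value observed for the second source register
    rd_addr: int    # destination register index
    rd_wdata: int   # value written to the destination register

def describe(r: RetiredInstruction) -> str:
    """Render a record as a one-line trace entry."""
    return (f"pc=0x{r.pc_rdata:08x} insn=0x{r.insn:08x} "
            f"x{r.rd_addr} <- 0x{r.rd_wdata:08x} next_pc=0x{r.pc_wdata:08x}")

# Example: `addi x1, x0, 5` fetched from address 0 (encoding 0x00500093).
record = RetiredInstruction(
    insn=0x00500093, pc_rdata=0, pc_wdata=4,
    rs1_addr=0, rs1_rdata=0, rs2_addr=0, rs2_rdata=0,
    rd_addr=1, rd_wdata=5,
)
print(describe(record))
```

The record only contains what the instruction observed and what it changed, which is why the core can be treated as a black box behind it.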
So in short, what we document through the riscv-formal interface is the part of the processor state that is observed by an instruction and the change to the state of the processor that is performed by an instruction, like changes to the register file or changes to the program counter.
And of course, most processors actually are superscalar. Even those processors that say they're non-superscalar in-order pipelines usually can do stuff like retire memory load instructions out of order and in parallel with another instruction that does not write the register, things like that. So even with processors we usually don't think of as superscalar processors, we need the capability to retire more than one instruction each cycle. This can be done with the NRET parameter, and all the ports are then five times wider if NRET is five.
OK, so when we have a processor that implements this interface, what is the verification strategy that riscv-formal follows in order to do this proof, to formally verify that our processor is actually correct? There is not one big proof that we run. Instead, there is a large number of very small proofs that we run. This is the most important trick when it comes to this. And there are two categories of proofs.
One category is what I call the instruction checks. We have one of those proofs for each instruction in the ISA specification and each of the channels in the riscv-formal interface. So this is easily a couple of hundred proofs right there, because you easily have 100 instructions, and if you have two channels, you already have 200 proofs that you have to run. What these instruction checks do is reset the processor, or start at a symbolic state if you would like to run an unbounded proof, let the processor run for a certain number of cycles, and then assume that in the last cycle the processor will retire a certain instruction.
So if this check checks whether the add instruction works correctly, it assumes that the last instruction retired in the last cycle of this bounded check will be an add instruction. And then it looks at all the signals on the riscv-formal interface to make sure that this is compliant with an add instruction. It checks if the instruction has been decoded correctly. It checks if the register value we write to the register file is actually the sum of the values we read from the register file, all that kind of stuff.
But of course, if you just have these instruction checks, there is still a certain verification gap, because the core might lie to us. The core might say, oh, I write this value to the register file, but then not write the value to the register file. So we have to have a separate set of proofs that do not look at the entire riscv-formal interface in one cycle, but look at only a small fraction of the riscv-formal interface, over a span of cycles. So for example, there is one check that says, if I write the register, and then later I read the register, I better read back the value that I have written to the register file. These I call consistency checks. Yeah, so that's, I think, what I said already.
So for each instruction, with riscv-formal we have an instruction model that looks like that. These are two slides. The first slide is just the interface, where we have a couple of signals from the riscv-formal interface that we read, like the instruction that we're executing, the program counter where we found this instruction, and the register values we read. And then we have a couple of signals that are generated by our specification, that are outputs of this specification model. Which registers should we read? Which registers should we write? What values should we write to that register? Stuff like that. So that's the interface. It's the same for all the instructions.
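The split between instruction checks and consistency checks can be sketched in a few lines. This is a toy model that works on an already recorded trace of retirement records rather than on a symbolic hardware model, and the record layout and function names are assumptions for illustration; only the ADD encoding (opcode 0x33, funct3 0, funct7 0) comes from the RISC-V ISA.

```python
from dataclasses import dataclass

MASK32 = 0xFFFFFFFF

@dataclass(frozen=True)
class Retired:
    insn: int
    pc_rdata: int
    pc_wdata: int
    rs1_addr: int
    rs1_rdata: int
    rs2_addr: int
    rs2_rdata: int
    rd_addr: int
    rd_wdata: int

def encode_add(rd, rs1, rs2):
    """Encode `add rd, rs1, rs2` (R-type, opcode 0x33, funct3=0, funct7=0)."""
    return (rs2 << 20) | (rs1 << 15) | (rd << 7) | 0x33

def check_add(r):
    """Instruction check: is this single record compliant with ADD?"""
    opcode, funct3, funct7 = r.insn & 0x7F, (r.insn >> 12) & 0x7, (r.insn >> 25) & 0x7F
    if (opcode, funct3, funct7) != (0x33, 0, 0):
        return False  # not an ADD at all
    rd, rs1, rs2 = (r.insn >> 7) & 0x1F, (r.insn >> 15) & 0x1F, (r.insn >> 20) & 0x1F
    expected = (r.rs1_rdata + r.rs2_rdata) & MASK32 if rd != 0 else 0
    return (r.rs1_addr == rs1 and r.rs2_addr == rs2 and r.rd_addr == rd
            and r.rd_wdata == expected
            and r.pc_wdata == (r.pc_rdata + 4) & MASK32)

def check_reg_consistency(trace):
    """Consistency check: each read returns the last value written.

    Assumes the trace is in program order; registers never written in
    the trace are treated as unknown, and x0 must always read as zero.
    """
    regs = {}
    for r in trace:
        for addr, data in ((r.rs1_addr, r.rs1_rdata), (r.rs2_addr, r.rs2_rdata)):
            if addr == 0 and data != 0:
                return False
            if addr in regs and regs[addr] != data:
                return False
        if r.rd_addr != 0:
            regs[r.rd_addr] = r.rd_wdata
    return True

first = Retired(encode_add(3, 1, 2), 0, 4, 1, 1, 2, 2, 3, 3)        # x3 = 1 + 2
honest = Retired(encode_add(4, 3, 3), 4, 8, 3, 3, 3, 3, 4, 6)       # reads x3 == 3
lying = Retired(encode_add(4, 3, 3), 4, 8, 3, 7, 3, 7, 4, 14)       # reads x3 == 7

honest_trace = [first, honest]
lying_trace = [first, lying]
print(all(check_add(r) for r in lying_trace), check_reg_consistency(lying_trace))
```

In the second trace every individual record is a perfectly valid ADD, so the instruction check alone is satisfied; the write to x3 never took effect, and only the check that spans several retirements notices.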
And then we have a body that looks more or less like that for all the instructions. It just decodes the instruction and checks if this is actually the instruction the check is for. So in this case, it's an add immediate instruction. And then we have things like the line near the bottom, above the default assignments: assign spec_pc_wdata, for example, says, okay, the next PC must be four bytes later than the PC for this instruction. We must increment the program counter by a value of four when we execute this instruction. Things like that.
Yeah, so you might see there is no assert here. There are no assertions because this is just the model of what kind of behavior we would expect. And then there is a wrapper that instantiates this, instantiates the core, and builds the proof. And there are the assertions. The main reason why we don't have assertions here, but instead output the desired behavior, is that I can also generate monitor cores that can run alongside your core and check, in simulation or in emulation on an FPGA, if your core is doing the right thing. That can be very, very helpful if you have a situation where you run your core for maybe days and then you have some observable behavior that's not right, but maybe there are thousands, even millions of cycles between the point where you can observe that something is wrong and the point where the processor actually started diverging from what the specification said. If you can use a monitor core like that, then it's much easier to find bugs like this.
Okay, so some examples of those consistency checks. The list is actually not complete, and it varies a little bit from processor to processor what kind of consistency checks we can actually run with the processor we're looking at. There is a check on the program counter from one instruction to the next. So I have an instruction that says this is the program counter for the instruction and this is the program counter for the next instruction.
And then we can look at the next instruction and we can see: is the program counter for that instruction actually the next program counter value of the previous instruction? They must link together like that. But the core might retire instructions out of order. So it might be that we see the first instruction first and then the second instruction later, but it's also possible that we see the second instruction first and then the first instruction later. And because of that, there are two different checks: one for a pair of instructions in the non-reversed order and one for a pair of instructions in the reversed order.
There is one check that checks if register value reads and writes are consistent. There is one check that sees if the processor is alive. So when I give the processor certain fairness constraints, that the memory will always return a memory read within a certain number of cycles, things like that, then I can use this to prove that the processor will not just suddenly freeze. This is very important. And this will also prove that the processor is not skipping instruction indices, which is very important because some of the other checks actually depend on the processor behaving in this way. And so forth. So there are a couple of these consistency checks, and it's a nice exercise to sit down in a group of people and go through the list of consistency checks and see which set of them is actually meaningful, or which set of them leaves an interesting verification gap so that we still need to add checks for this or that processor.
Okay, so what kind of bugs can it find? That's a super hard question, because it's really hard to give a complete list. It can definitely find incorrect single-threaded instruction semantics. So if you just implement an instruction incorrectly in your core, then this will find it. No question about it. It can find a lot of bugs in things like bypassing and forwarding and pipeline interlocks, things like that.
Things where you reorder stuff in a way you shouldn't reorder it, and freezes, if you have this liveness check. Some bugs related to memory interfaces and load/store consistency and things like that. But whether this is a feasible proof or not depends on things like the size of your cache lines.
Bugs that we can't find yet with riscv-formal are things that are not yet covered by the riscv-formal interface, like the floating-point stuff or CSRs, but this is all on the to-do list. So we are actively working on that, and a year from now, this stuff will be included. And anything related to concurrency between multiple harts. So far my excuse for that was that the RISC-V memory model is not completely specified yet, so I would not actually know what to check exactly. But right now the RISC-V memory model is in the process of being finalized, so I won't have this excuse for much longer.
So the processors currently supported: PicoRV32, which is my own processor, then RISC-V Rocket, which is probably the most famous RISC-V implementation, and VexRiscv. There are also a couple of others, but they are not part of the open-source release of riscv-formal. So if you would like to add support to riscv-formal for your RISC-V processor, then just check out the riscv-formal repository, look at the cores directory, see which of the supported cores is closest to the core that you actually have, and then just copy that directory and make a couple of small modifications.
So I have a few minutes left to talk about things like cut points and black boxes and other abstractions. The title of this slide could just be "abstractions", because cut points and black boxes are just abstractions. The idea behind an abstraction in formal methods is that I switch out part of my design for a different part, a different circuit that is less constrained. It includes the behavior of the original circuit, but might do other stuff as well.
So the textbook example would be: I have a design with a counter, and usually the counter would just increment in steps of one, but now I create an abstraction that can skip numbers and will just increment in strictly increasing steps. And this of course includes the behavior of the original design. So if I can prove a property with this abstraction in place instead of the plain increment-by-one counter, then we have proven an even stronger property, and that includes the same property for the original design.
And actually this idea of abstractions works very well with riscv-formal. The main reason why we do abstractions is because it leads to easier proofs. So for example, consider an instruction checker that just checks if the core implements the add instruction correctly. For this checker, we don't actually need a register file that's working. We could replace the register file by something that just ignores all writes to it, and whenever we read something from the register file, it returns an arbitrary value. That would still include the behavior of a core with a functional register file, but because the instruction checker does not care about consistency between register file writes and register file reads, we can still prove that the instruction is implemented correctly, and therefore we get an easier proof.
Of course, we can't use this abstraction for all those proofs, because there are other proofs that actually check if my register file works as I would expect it to work. But if we go through the list of proofs and we run all these proofs independently, then you will see that for each of them, it's possible to abstract away a large portion of your processor, and therefore yield an easier proof.
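The textbook counter abstraction mentioned above can be written down as a small sketch. It enumerates a 16-value counter explicitly, which a real formal tool would not do, and the function names and the two example properties are assumptions chosen for illustration.

```python
N = 16  # toy counter range: values 0 .. N-1, no wrap-around

def concrete_next(c):
    """Original design: the counter increments in steps of one."""
    return {c + 1} if c + 1 < N else set()

def abstract_next(c):
    """Abstraction: the counter may jump to any strictly larger value."""
    return set(range(c + 1, N))

def includes(concrete, abstract):
    """The abstraction is sound if it allows every concrete transition."""
    return all(concrete(s) <= abstract(s) for s in range(N))

def holds_on(next_fn, prop):
    """Check a property of the form prop(state, next_state) on all transitions."""
    return all(prop(s, t) for s in range(N) for t in next_fn(s))

def never_decreases(s, t):
    return t > s

def steps_by_one(s, t):
    return t == s + 1

print("abstraction includes original:", includes(concrete_next, abstract_next))
print("never_decreases on abstraction:", holds_on(abstract_next, never_decreases))
print("steps_by_one on abstraction:", holds_on(abstract_next, steps_by_one))
print("steps_by_one on original:", holds_on(concrete_next, steps_by_one))
```

Because the abstraction includes every transition of the original, a property proven on the abstraction carries over. The reverse does not hold: `steps_by_one` is true of the original but cannot be proven through this abstraction, which mirrors the point that a given abstraction only suits the proofs that do not depend on the behavior it throws away.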
Depending on what kind of solvers you use, some solvers are actually very capable of finding this kind of abstraction themselves, so in that case, adding these abstractions manually doesn't really help. But just realizing that the potential for these abstractions is there is very useful when guiding your decisions on how to split up a large verification problem into smaller verification problems, because you would like to split up the problem in a way that the solver is always capable of finding useful abstractions that actually lead to easier circuits to prove.
Yeah, with a bounded check, we also have the question of what bounds to use. Of course, larger bounds are better, but larger bounds also yield something that is harder to compute. And if you have small bounds, well, then you have a proof that runs very, very quickly, but maybe you're not very confident that it has actually proven something that's relevant for you. So I propose two solutions for this.
The first solution is that you can use the same solvers to find traces that cover certain events. You could write a list and say, I would like to see one memory read and one memory write and at least one ALU instruction executed, and things like that. And then you can ask the solver, what is the shortest trace that would actually satisfy all this stuff? And when that's a trace of, say, 25 cycles, then you're okay. When I look at a proof that's 25 cycles deep, I know at least these are the cases that are going to be covered.
But more important, I think, is that usually when you have a processor, you already found bugs. And it's a good idea to not just fix the bugs and forget about them, but to preserve some way of reintroducing the bugs, just to see if your testing framework works.
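The cover idea from the first solution, asking for the shortest trace that shows a list of events, can be sketched on a toy state machine. Everything here is hypothetical: the states, the transitions, and the event names stand in for a real core, and the search is an explicit breadth-first search rather than a solver query.

```python
from collections import deque

# Hypothetical control states of a toy core and their successor states.
TRANSITIONS = {
    "RESET": ["FETCH"],
    "FETCH": ["ALU", "LOAD_ADDR", "STORE"],
    "ALU": ["FETCH"],
    "LOAD_ADDR": ["LOAD_DATA"],  # a load takes two cycles in this toy model
    "LOAD_DATA": ["FETCH"],
    "STORE": ["FETCH"],
}

# Events that become visible when a given state is entered.
EVENTS = {"ALU": "alu", "LOAD_DATA": "mem_read", "STORE": "mem_write"}

def shortest_cover(initial, wanted):
    """Return the shortest trace from `initial` that shows all wanted events."""
    wanted = frozenset(wanted)
    covered0 = frozenset({EVENTS[initial]} if initial in EVENTS else ())
    start = (initial, covered0)
    queue = deque([(start, [initial])])
    seen = {start}
    while queue:
        (state, covered), trace = queue.popleft()
        if covered >= wanted:
            return trace
        for nxt in TRANSITIONS[state]:
            new_covered = covered | ({EVENTS[nxt]} if nxt in EVENTS else set())
            key = (nxt, new_covered)
            if key not in seen:
                seen.add(key)
                queue.append((key, trace + [nxt]))
    return None  # the events cannot all be reached

trace = shortest_cover("RESET", {"alu", "mem_read", "mem_write"})
cover_depth = len(trace) - 1  # number of cycles after reset
print(cover_depth, trace)
```

The resulting depth is a lower limit for a meaningful bound: a bounded proof shallower than this could not even have seen one ALU operation, one memory read, and one memory write in the same run.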
So if you already have a couple of bugs and you know, oh, it took me a week to find that one and a month to find that one, the best thing is to just add the bugs to your design again and see what bounds are necessary for riscv-formal to actually discover those bugs. And then you will have some degree of confidence that other similar bugs would also have been found with the same bounds.
So, results. I have found bugs in pretty much every implementation I looked at. I found bugs in all three processors. We found bugs in Spike, which is the official implementation of RISC-V in C. I found a way to formally verify my specification against Spike, and in some cases where I found a difference between my specification and Spike, it turned out it was actually a bug in the English language specification. So because of that, I also found bugs in the English language specification with riscv-formal.
Future work: multiply is already supported. Floating point is still on the to-do list. 64-bit is like half done. We would like to add support for CSRs. We would like to add support for more cores, but this is something that I would like to do slowly, because adding more cores also means we have less flexibility with changing things. And better integration with non-free tools, because right now all of that runs with open-source tools that I also happen to write, so I wrote those tools. But some people actually don't want to use my open-source tools. They would like to use the commercial tools, and it's on the to-do list that I have better integration with those tools, maybe because I don't get licenses to those tools. So we will see how this works. Yeah, that's it. Do we still have time for questions?
Yes. So I'd say we start with questions at one. Sorry, here we go. We have time for two questions, and we're going to start with microphone number one, please.
Hello, thanks for your talk and for your work.
First question: you talked about the riscv-formal interface. Yes. So do vendors ship their final processor with this interface available?
Oh, yeah, that's a great question. Thank you. This interface has only output ports, and actually, when you implement this interface, you should not add something to your design that's not needed to generate those output ports. So what you can do is take the version of your core with that interface, then in your synthesis script just remove those output ports to get the version of the core without that interface, and then run a formal equivalence check between that version and the version that you actually deploy on your ASIC.
Thanks. One short question. When people say formal verification, usually others think, oh, if it is verified, it is excellent, absolutely excellent. Do you plan to say that it will find all the errors in the processor?
Well, it depends on what kind of proof you run. Most of the work I do is with bounded proofs, and there you only get a certain degree of confidence, because you only see bugs that can occur within a certain number of cycles from reset. But if you want, you can also run a complete proof where you start with a symbolic state instead of a reset state. And then you can make sure that you actually check the entire reachable state space. But that's a very, very hard thing to do. So that's not a weekend project. Just adding the riscv-formal interface and running some bounded proofs is probably a weekend project if you already have your RISC-V processor.
Thank you. Thank you. We actually do not have time for any more questions, but there will be time after the talk to ask you questions, maybe?
Yeah. So maybe you can find me at the Open FPGA assembly, which is part of the hardware hacking area.
Super. Very great job to put that much information into 30 minutes. Please help me thank Clifford for his wonderful talk. Thank you.