Good afternoon, everyone. I'm going to talk about Clang as a front-end. This talk will be a bit different from the other talks, since it is not about changing Clang itself but about using it. Our agenda: I'll start with why we need to find bugs, then a brief introduction to model checking, then ESBMC, the tool that I'm working on, then the Clang front-end inside ESBMC, and finally the future of SMT solvers.

So, why do we need to find bugs? Does anyone remember this case, Ariane 5? It was launched in 1996 and exploded 40 seconds after launch. Does anyone know why it exploded? Yeah, it was an exception thrown by a conversion from a 64-bit float to a 16-bit signed integer, and they just lost the rocket. Then, in 1997, there is the case of the USS Yorktown. They automated the whole battleship, and does anyone know why it stopped working, why it crashed? It was a really simple bug: a division by zero, and the whole battleship crashed and had to be towed back to the naval base. So, yeah, embarrassing, right? We need to fix bugs. We cannot have those situations happening when you are commanding a battleship, right?

So, why formal methods? Formal methods are a way to try to avoid that, to find bugs. What's the main standard today? It's testing and simulation. You have a program, you check a path of the program, and you get an error or OK. It checks only some paths of the program, so it may miss some errors, but that's okay: since it's quick and has low memory and time requirements, you can run it many times and get good coverage.

Model checking, on the other hand, is an encoding. You have the program and the specification of the program, and you check all the paths. Every possible error is checked by the model checker, and in the end you get either "no bugs" or an error trace. So you can actually see the set of assignments that leads to the bug.
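The difference in coverage between testing and checking every path can be sketched in plain C. Everything here is invented for illustration: the property `safe`, the sampled inputs in `run_tests`, and the bounded input domain in `check_all`. A real model checker encodes the paths symbolically rather than enumerating inputs, so this only shows why a handful of tests can miss a bug that an exhaustive check reports together with the assignment that triggers it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical property: a division by (a + b - 300) is safe
 * only while that divisor is nonzero. */
static bool safe(int a, int b) {
    return (a + b - 300) != 0;
}

/* "Testing": only a few hand-picked inputs are tried. */
static bool run_tests(void) {
    static const int samples[][2] = {{0, 0}, {1, 2}, {10, 20}, {255, 255}};
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        if (!safe(samples[i][0], samples[i][1])) {
            return false;
        }
    }
    return true; /* every sampled input passed */
}

/* The "error trace" for this toy case: the inputs that break the property. */
typedef struct {
    bool found;
    int a;
    int b;
} counterexample;

/* Exhaustive check over the bounded domain [0, limit] x [0, limit]. */
static counterexample check_all(int limit) {
    assert(limit >= 0);
    counterexample cex = {false, 0, 0};
    for (int a = 0; a <= limit; a++) {
        for (int b = 0; b <= limit; b++) {
            if (!safe(a, b)) {
                cex.found = true;
                cex.a = a;
                cex.b = b;
                return cex; /* first violating assignment */
            }
        }
    }
    return cex; /* no violation inside the bound */
}
```

With these made-up numbers, `run_tests()` passes even though `check_all(255)` reports a violating pair, while `check_all(100)` finds nothing because the bug lies outside that smaller bound.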
The problem with model checking is that, since you are converting the whole program, it can be extremely resource hungry, in both memory and time. What we work on at the University of Southampton is bounded model checking. It's kind of a lazy approach to model checking. We bound loops, array sizes, context switches. We just keep increasing the bound until we find a bug or we don't, in which case we just say we don't know. It's good for finding bugs because bugs are usually really shallow in the program, so we don't have to iterate that much to find them. But it can never prove that the program does not have bugs. It will either say "there's a bug" or "I don't know". And if you have infinite loops, that might be a problem, because you can never unroll all the possible paths of the program, so you might never be able to check all the possibilities.

ESBMC stands for Efficient SMT-Based Context-Bounded Model Checker. This is its architecture. I'm not going to go into the details of each block, but basically we take source code, C or C++, and convert it to a simple language called the GOTO language. We basically remove switches and loops and just make the program simple. We do a symbolic execution, which goes through all the paths in the code. We encode the constraints and the properties of the code. Properties are things like: if it's a division, the divisor is never zero, this kind of stuff. And then we solve it using SMT. Right now we support Z3, Boolector, MathSAT, CVC4, and Yices. That is the whole process, and I'll focus on the Clang front-end.

ESBMC has built-in verification support for pointer safety, array bounds, and division by zero, all enabled by default. But we also check overflows, memory leaks, deadlocks, and data races.

I'll show you a few small examples. Don't mind the gaps in the program, they are there for a reason. We just have a function, foo, and it does a division.
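The slide with this function is not part of the transcript. As a hypothetical stand-in, the sketch below shows a function that only performs a division, with the properties attached to that division written out as explicit checks. The names `check_division`, `foo`, and `div_status` are invented here, and the code only illustrates what "the divisor is never zero" and an overflow check mean for `a / b`; it is not how the tool encodes them internally.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { DIV_OK, DIV_BY_ZERO, DIV_OVERFLOW } div_status;

/* The two properties that make the C expression a / b well defined. */
static div_status check_division(int a, int b) {
    if (b == 0) {
        return DIV_BY_ZERO;   /* property: the divisor is never zero */
    }
    if (a == INT_MIN && b == -1) {
        return DIV_OVERFLOW;  /* property: the quotient fits in an int */
    }
    return DIV_OK;
}

/* Hypothetical foo: divides only when both properties hold.
 * On a violation, *result is left untouched and the status says why. */
static div_status foo(int a, int b, int *result) {
    assert(result != NULL);
    div_status status = check_division(a, b);
    if (status == DIV_OK) {
        *result = a / b;
    }
    return status;
}
```

A verifier looks for input values that make one of these checks fail; here `foo(1, 0, &r)` and `foo(INT_MIN, -1, &r)` are two such assignments.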
And we want to check all the possible values of a that will trigger a bug in this program. It's as simple as running ESBMC on it. It will find a set of assignments that leads to a bug. You see these big numbers there.

We can also say something like: okay, my program will never have this kind of input. For instance, my program will only have a ranging from zero to a hundred. So we can write things like: assume that a is between zero and a hundred, and b is between zero and a hundred. This is example two. ESBMC will try to find a property violation that triggers the division by zero for values of a inside this range. You see, if you use five and one, this will trigger the division by zero here. This is using Boolector, one of the solvers. You can also check using other solvers. You get different assignments, but they always trigger the division by zero. This assume is actually the same as writing an if: if the condition is not met, return zero. So it's the same.

Finally, example three. Let's say I don't want to check division by zero. You can either disable the division-by-zero check using a flag of the tool, or you just assume here that c is never zero, where c is the divisor of the division. So, no bug, right? Or rather, you are not checking for that bug. But let's say you want to check for overflow. You actually get a set of assignments for the overflow. You can check that with other solvers as well.

So these are simple cases, but you can see the tool is really effective at finding those bugs. We actually have the set of assignments that leads to the bug, so it's easier for the developer to see how to reproduce it.

Another program here, this time with concurrency. It's basically a program with a deadlock, and we can check that as well. ESBMC tries something very naive when it comes to checking a concurrent program.
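A naive way to check a concurrent program for deadlock is to try every interleaving of the threads' steps. The sketch below does this explicitly for an invented two-thread, two-mutex program. The names (`op`, `state`, `explore`, `has_deadlock`) and both lock orders are assumptions made for illustration, and explicit enumeration is a stand-in technique: the tool described in the talk encodes the program for an SMT solver rather than walking schedules like this.

```c
#include <assert.h>
#include <stdbool.h>

enum { THREADS = 2, STEPS = 4, MUTEXES = 2, FREE = -1 };

/* One step of a thread: lock or unlock a given mutex. */
typedef struct {
    bool is_lock;
    int mutex;
} op;

/* Hypothetical program 1: the threads take the locks in opposite order. */
static const op opposite_order[THREADS][STEPS] = {
    {{true, 0}, {true, 1}, {false, 1}, {false, 0}},
    {{true, 1}, {true, 0}, {false, 0}, {false, 1}},
};

/* Hypothetical program 2: both threads take the locks in the same order. */
static const op same_order[THREADS][STEPS] = {
    {{true, 0}, {true, 1}, {false, 1}, {false, 0}},
    {{true, 0}, {true, 1}, {false, 1}, {false, 0}},
};

typedef struct {
    int pc[THREADS];    /* next step of each thread */
    int owner[MUTEXES]; /* owning thread, or FREE */
} state;

/* A thread can step if it is unfinished and is not blocked on a held mutex. */
static bool can_step(const op prog[THREADS][STEPS], const state *s, int t) {
    if (s->pc[t] == STEPS) {
        return false;
    }
    op o = prog[t][s->pc[t]];
    return !o.is_lock || s->owner[o.mutex] == FREE;
}

/* Depth-first search over every interleaving reachable from s.
 * Returns true if some schedule reaches a deadlock. */
static bool explore(const op prog[THREADS][STEPS], state s) {
    bool any_enabled = false;
    bool all_done = true;
    for (int t = 0; t < THREADS; t++) {
        if (s.pc[t] != STEPS) {
            all_done = false;
        }
        if (!can_step(prog, &s, t)) {
            continue;
        }
        any_enabled = true;
        op o = prog[t][s.pc[t]];
        state next = s;
        next.owner[o.mutex] = o.is_lock ? t : FREE;
        next.pc[t]++;
        if (explore(prog, next)) {
            return true;
        }
    }
    /* Deadlock: work remains but no thread is able to move. */
    return !any_enabled && !all_done;
}

static bool has_deadlock(const op prog[THREADS][STEPS]) {
    state init = {{0, 0}, {FREE, FREE}};
    return explore(prog, init);
}
```

Even in this toy the search visits every schedule before it can report "no deadlock" for `same_order`, which hints at why large concurrent programs take a long time to verify.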
By default, the deadlock check is not enabled. Sorry: by default it checks whether malloc returns a null pointer, but with this flag we can force malloc to always succeed. So, by default, it doesn't check for deadlock. But if we add the deadlock check, it will eventually find a bug and give you the trace, the set of assignments that leads to the deadlock. ESBMC tries to interleave all the possible points in the program. So if you have huge concurrent programs, that will take a while to verify, because it tries to interleave everything.

Okay. So we encode everything using SMT. And it's complicated. We can talk later. Okay? All right.

So these are simple examples of using ESBMC. Now the Clang front-end. Why are we moving to the Clang front-end? Our old front-end was about 15 years old, so we didn't have support for a bunch of stuff. For instance, compound literals, stuff like this. And, well, I'm not a really good Bison developer, so I had no idea how to implement that. We didn't have support for designated initializers, things like this to initialize arrays. No support for the typeof operator. It was full of bugs, in almost 30k lines of code.

You didn't run your program on your program? Then you would find all the bugs.

Yeah, yeah. Let's talk about that later. So every single change in the program would lead to a lot of bugs. And it was hard, especially when it comes to C++. Our old front-end tried to support C++98. There were hacks everywhere to support it. One of the problems is that we never really understood the rules for template instantiation, because they are hard. You're going to see that later. And it extended the C front-end with another 5k lines of code. So, yeah, we don't want to maintain that, because that's not the objective of the tool, the main aim of the tool. The tool needs to verify programs.
We don't want to maintain a front-end. And then came Clang. Clang has a well-defined AST, so we just need to write a converter from Clang's AST into ours. For new features, we just have to add a new conversion node. I will never have to program in Flex and Bison anymore, and that alone is good. We have convenient functions inside Clang, like EvaluateAsInt and EvaluateAsBooleanCondition, that simplify a lot of the AST for us. We have the warnings and errors of a real compiler, so the same errors and warnings that you get from compilation you are going to get in the verification tool. And it's much smaller: the C front-end is about 4k lines of code and the C++ one about 7k lines of code. So it's much easier to support. The C++ front-end is not released yet because we don't have polymorphism yet, but I'll talk about that later.

And there is another thing that, alone by itself, makes Clang awesome, which is that the AST contains all the instantiated templates. The AST contains all the instantiated templates. Have I told you that the AST contains all the instantiated templates? And I'll show you why this is so great.

I have here the standard. This is probably the last draft before the release of the C++11 standard. You see the date there. If we go to page 368, we are talking about explicit specialization, right? So that's easy, right? We have a lot of points, a lot of examples, until we get to... I'm not on the right one. So, explicit specialization. We get to point number seven here, and I'll read it for you guys. Is it? Sorry. Here.
"The placement of explicit specialization declarations for function templates, class templates, member functions of class templates, static data members of class templates, member classes of class templates, member class templates of class templates, member function templates of class templates, member functions of member templates of class templates, member functions of member templates of non-template classes, member function templates of member classes of class templates, etc., and the placement of partial specialization declarations of class templates, member class templates of non-template classes, member class templates of class templates, etc., can affect whether a program is well-formed according to the relative positioning of the explicit specialization declarations and their points of instantiation in the translation unit as specified above and below. When writing a specialization, be careful about its location; or to make it compile will be such a trial as to kindle its self-immolation."

So it's hard, right? We don't have to deal with that anymore. Clever people took care of that for us, so we're just going to use that. Really? Oh, that's much better there.

So, about moving to Clang. We are using LibTooling because, at the time we tried to use libclang, some functionality was missing. I think it's much better now, but since we started with LibTooling, we are not moving back again. Most of the code to walk the AST is based on the code in ASTDumper and TypePrinter and the code in lib/CodeGen.

Some limitations we found: C++, for instance, does not support implicit function declarations. So the ESBMC assume that I showed you is not going to work by default. I'm still trying to work out how to fix that. There were also random crashes. For instance, if you tried to get a line number from the translation unit in Clang 3.6, it crashed, but that has been fixed since then.
Clang does not build the vtable for you, at least not in the AST, so I'm going to have to write another converter for that. We have no access to the static analyzer. That would be excellent for us: since we are trying to verify the program, we would like to get some information from its reasoning about the program. There are no optimizations at the AST level, which is reasonable because all the optimizations are done on the LLVM bitcode, so we get few optimizations. And there is some lack of documentation for some corner cases.

For instance, let me show you this. It's a small example. We just have a class X, and we call its copy constructor. If we generate the AST for that, briefly, there are these three variables, i0, i1, i2, for the member str. So basically what Clang is telling me here is that this is the copy constructor for this class, and I have no idea where these variables are coming from, and Clang does not tell me either. However, by the time you reach code generation, when the bitcode is generated, it's just a memcpy in this particular case. But there is no documentation for that. We just parse the AST when we see that, and there is no explanation of why it is like that.

I think these are just placeholders to generate the IR.

Exactly. It will map to a memcpy. But the thing is, we are very careful when it comes to variables, and if there is a variable without a declaration, we don't like it. So we just don't know what to do.

Is there any difference between the declaration of those variables and the declaration of any other variable that you have in the context? Because if there isn't, then those could be real variables, like global variables, and you have no idea whether it is a real thing or just made up.

Exactly. We have no idea how to handle that.

Okay. So, just a bit about the future of SMT solvers. Why don't we check our whole code base? Because SMT solvers are really resource hungry.
We're talking about hundreds of gigabytes to formally verify a program. SMT solvers are evolving, but they still have a long way to go. So what does the future look like for them? Going mainstream. This is a series of patches by Dominic Chen. They are on Phabricator. They add Z3 as a constraint solver for the static analyzer. The memory usage is about 20 percent higher, which is okay. However, the run time is about 15 times higher, so that's something to improve. But we are getting there. With the Z3 patches, the static analyzer is finally able to reason about symbolic floating-point expressions, which it could not do before. It's not accepted into mainline yet, but it's under active discussion on Phabricator.

That's it. ESBMC is open source. The development is closed, however, for some reason. Our developers are shy, I guess. Please check the code, and if you have any questions, just send me an email later. Thank you. Are there any questions? Yes.

You mentioned that it doesn't do vtables. Are there any other constraints on the kind of code you can analyze?

So he asked if there is any other limitation for C++ besides the vtable. Not really. Everything else is there. We just have to parse it, actually.

Even function pointers and things like that you can follow?

Yeah. I mean, not on the AST.

But wouldn't you have to track all the things that could theoretically be loaded into that function pointer, which is also an analysis?

Yes. This is done in ESBMC at a later step, not on the AST. The AST is basically just there to be converted to our internal representation, so that we don't have to parse the program or do any type checking.

Okay. Cool. What language is ESBMC written in?

It's in C++. Yeah, C++11, basically.

So it wasn't a big deal.

No, no. Basically C++. We have some magpie developers. A magpie is that bird that likes to steal shiny things. So every shiny new feature in C++, we just like to use it.
So soon we're going to be moving to C++14.

Okay. What is the biggest program you have checked with your tool?

So he asked what the biggest program we have checked is. In terms of source code, about 10 megabytes, which is small to medium, I guess. It really depends on the program, because a lot of stuff we can just throw away and not give to the solver. We do some kind of static analysis on the program later on to remove unreachable paths and that kind of stuff. So it really depends on the program, but so far about 10 megabytes of source code.

Have you run it on the tool itself?

Not really, because it's too big.

Can you state the difference between this tool and CBMC?

Yes, ESBMC is a fork of CBMC. CBMC is mostly about SAT solvers. It does have support for SMT solvers, but it's less than ideal. We are focused on SMT solvers.

There are also other tools, like KLEE. What is the advantage of using your tool versus KLEE?

As far as I know, KLEE tries to run the program and generate coverage, right?

You know, it's symbolic execution. It tries to produce all the possible inputs for the program to cover all the possible paths. While you, as far as I have seen, force it down some kinds of paths.

So, what's the difference between KLEE and ESBMC? We walk all the paths, we also do the symbolic execution, and we do a bounded check. You can actually define how many loop iterations you want to run. And we call an SMT solver. I don't know if KLEE uses SMT as well. Yeah. So I would say the bounded model checking is the difference. You can bound the verification.

Thanks for the talk. Your tool can verify deadlocks and things like that.

Yeah.

Can you also verify arbitrary properties that you express using LTL, or maybe something else?

Yes.

And how expressive is it?

We have support for LTL.
There is a paper we published four years ago, I guess, that defines all the semantics for that. You just have to write the property using LTL.

So it is LTL only?

LTL. Yeah.

How do you go about modeling the system library? When you use Clang, you are getting the headers, so you don't have, like, glibc?

Yeah. For C++ it's not a big deal, because they provide the whole code. But for C, for things like strings, printf, this kind of stuff, we have a model internally. It doesn't cover all the C libraries, but most of them are there. Handling of strings, chars, files, everything is there. Floating point as well. Yes?

Why not start from the Clang CFG?

Because, at the time we were looking at Clang, we thought that the AST would provide the most information we needed for the verification. The CFG can come kind of optimized. Some branches can be cut. And we just wanted the actual representation of the program. Which is bad, because the CFG kind of generates the vtable for you, right?

Yes. You mentioned the drawbacks of this, right?

Yeah.

And also, another option would be to just use LLVM.

Yes. There is a tool that does that. It's called SMACK. It works from the bitcode, converts it to the Boogie language, I think, and tries to verify that. The advantage is that you can verify many more languages, not only C and C++, since you are working on the LLVM bitcode. But since we are focused on C and C++, we decided to just go with Clang.

OK. Thank you very much.