All right, next up we have Zephyr giving a talk on symbolic computing. Moving on, moving on up. Please give him a warm ToorCon welcome.

Thank you, guys. Gosh, how are you guys doing for ToorCon 20? It's been so long. I actually remember, when I was in high school, I think this was ToorCon 7, a speaker gave a talk about Tor hidden... well, it's unknown what he gave a talk about. He mentioned he was really high on LSD about halfway through, and the talk really went from topic to topic. There were accusations, there was some technical talk about Tor, and he had a hype man in the back, hat on backwards, who would just go, "Yeah! Yeah!" It was really great. It was a great talk. But as he got into his description of how he had come to give this talk and why it was important, which didn't make any sense at all, I wondered what kind of conference would possibly host such an event. So when I had my own stream-of-consciousness-based presentation, I knew where to take it.

Anyway, this is about symbolic execution, which I have been dealing with for, I would say, the past three years. I'm sure many of you have heard of symbolic execution, maybe as part of a broader, temperamentally aligned area like dynamic taint analysis, or through particular pieces of tooling: when the Cyber Grand Challenge happened, a tool called angr came out of that, and it has been of considerable interest. So, going over what symbolic execution is, rather than giving some vague description of it:
It's really quite simple. You provide symbols rather than concrete input as arguments to functions, and there are a variety of ways you can manipulate these symbols. A very common, if antiquated, method is to supply some description of what transformation each step entails on each symbolic variable. Then, as you go through the program, just as if you were running it in an emulator, you make a note of each transformation, again very similar to dynamic taint analysis. Once you're at the end (the end being some arbitrary slice of the time it took the program to get from one basic block to another, or from one wall-clock time to another), you can collect the set of transformations that happened to that variable and run it through a SAT solver, an SMT solver, some general-purpose solver for determining what input yields what output. If you're writing exploits, that's critical. It's a huge feature to be able to ask: what do I need to provide to this function as input to get it into this state, if that state is even possible? And even more broadly, there are tools that can produce a general memory model of a program. That memory model is subject to all kinds of interesting restrictions on what it can handle, but in general it lets you work with the most basic case: here's a sized allocation; I have somehow gotten the ability to write outside the boundaries of that allocation; that's a bug, so tell me about any condition under which it can happen, either in tandem with a fuzzer or through some mechanism like the one I was describing earlier. I'm not going to talk too much about the history.
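The record-then-solve loop described above can be sketched in a few lines of C. This is an illustrative toy and not code from the talk. The op list, apply_trace, and solve_for are hypothetical names. The "program" is a straight-line sequence of operations on a single 8-bit value. An exhaustive search over all 256 inputs stands in for the SAT/SMT solver that a real engine would use.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, highly simplified model: each recorded step is one
   operation applied to a single 8-bit value derived from the input. */
enum op_kind { OP_ADD, OP_XOR, OP_SHL, OP_MUL };

struct op {
    enum op_kind kind;
    uint8_t operand;
};

/* Replay the recorded transformations on one concrete candidate input. */
uint8_t apply_trace(const struct op *trace, size_t n, uint8_t x) {
    for (size_t i = 0; i < n; i++) {
        switch (trace[i].kind) {
        case OP_ADD: x = (uint8_t)(x + trace[i].operand); break;
        case OP_XOR: x = (uint8_t)(x ^ trace[i].operand); break;
        case OP_SHL: x = (uint8_t)(x << (trace[i].operand & 7)); break;
        case OP_MUL: x = (uint8_t)(x * trace[i].operand); break;
        }
    }
    return x;
}

/* Stand-in for the solver query "which input yields this output?".
   Exhaustive search only works because the input is 8 bits wide; a real
   engine would hand the formula to a SAT/SMT solver instead.
   Returns 1 and stores a witness in *out if the target state is reachable,
   and 0 if no input reaches it. */
int solve_for(const struct op *trace, size_t n, uint8_t target, uint8_t *out) {
    for (unsigned v = 0; v <= UINT8_MAX; v++) {
        if (apply_trace(trace, n, (uint8_t)v) == target) {
            *out = (uint8_t)v;
            return 1;
        }
    }
    return 0;
}
```

Consider a three-step trace of add, xor, and multiply-by-3. For that trace, solve_for finds an input for any target byte, because each step is invertible modulo 256. Now consider a trace with a single shift-left by one. No input reaches 0x01, because the result is always even. That is the "if that state is even possible" case.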
I originally intended this talk to be a complete explanation of how symbolic execution works and what it entails, and that was a little bit optimistic. The original paper is on a system called SELECT, and SELECT, in a sentence or two, told you everything I just spent the past five or ten minutes saying: symbolic execution proceeds as a normal execution, except that the values may be symbolic formulas over the input symbols. It's just like solving for x in your algebra problem from middle school or high school. You solve for x with slightly more complex rules: instead of the rules of equality and transitivity and all that, you have the rules of x86, or the C standard, whatever is appropriate. All right: SELECT, a fantastic program, really groundbreaking. Then a paper called EFFIGY came along, which, for reasons that are some combination of politics and chance, took off, and this methodology of symbolic execution remained influential in program verification for years. But it took a few more years until computers were fast enough, and the state of research was at the point where non-academics were applying it to the question: can we find exploitable vulnerabilities in computer programs using symbolic execution? And there have been a lot of doubts about whether this is a feasible path forward.
I mean, I know that lcamtuf, who's the author of AFL as well as p0f, if you've ever used that, basically ground it down and said this will never be used to find any non-trivial number of vulnerabilities in real computer systems. I'll be happy to tell you that I'm actually going to show off a few vulnerabilities, including one zero-day, so get out your cameras. It is indeed difficult, and it's a very different procedure from fuzzing. But I think by the end of this talk you'll have the power to work with tools like CBMC, KLEE, whatever the case may be, and find and exploit a very basic stack or heap overflow. Before we get into that: why should you use a tool that employs symbolic analysis, or more abstractly, the abstract analysis of computer programs? It's tough to say. Manual analysis, given sufficient focus, will always be superior to any other form of analysis I can foresee in the near future. But a tool like KLEE is going to give you very fine-grained analysis of which conditions can be systematically problematic, and which kinds of detectable conditions result in memory corruption, or memory leaks, or who knows what. It's really slow, though. Take Erlang: the Erlang runtime system is where I found one of the vulnerabilities I'm going to discuss here. Obviously, Erlang is an interpreted programming language.
There are bytecode instructions, and then it calls out to C, and tracing the value of a variable through Erlang took overnight before I could see the relatively modest set of transformations applied to one particular variable within the BEAM virtual machine. I told you a little about what got me interested in this. Again, lots of people on IRC will tell you it's useless, and they don't usually have very good reasons why. I don't have a background in SAT solvers or symbolic execution or anything like that. I come from the world of finding memory corruption bugs and writing the corresponding exploits. So my question was: what is this new symbolic execution thing, and how can it benefit me? I did a ton of work to build myself up to the point where I had a firm grasp of what symbolic execution entails. I actually wrote my own SAT solver and watched video lectures. I even wrote my own lifter, which is a basic, highly idealized description of the behavior of x86 instructions. That is really quite challenging. There are hundreds of x86 instructions, and whenever you think you know how complex x86 is, you don't yet know. However complex you've assumed it to be, it is more complex than that. Once you think you're at the limit, you haven't yet considered REP or shift-left or all these other instructions that are just bizarrely complex. So by watching this talk, you can all avoid that hassle, and you won't ever need to write a description of the transformations that particular x86 instructions entail. If you ever find yourself writing a ptrace-based description of how the register state changes, you can just say: no, that's useless. I thought I had another slide here; never mind. So the very first vulnerability I found
with this method, and I'm going to go a little into how I found it. Just as a description of what it is: the slide says it's a CMYK color-space bug, but it was actually a bug in the decoding of the Huffman tables. This was in the Plan 9 JPEG library. Plan 9, I don't know if you've heard of it, is a research operating system produced by Bell Labs. A lot of the people who worked on it later became really well known: Rob Pike is intimately tied to Go, and Ken Thompson, you know, Unix and all that. What I mean by bringing up all these rather famous people is that it's generally a very high-quality code base, despite a few nebulous conditions. I think manually reviewing Plan 9 would not be easy. Furthermore, if you wanted to write a fuzzer, you can't really use AFL. You're working in a system where you either need a port of the Plan 9 utilities to a POSIX system, or you port AFL to some '90s research operating system. So what I did, just as I described before, was do this manually. That means breaking out an SMT solver, hunkering over the Intel manual, and trying to write something that describes the behavior of each x86 instruction. You don't need to do every instruction, of course. There are huge swaths of the x86 instruction set that are never going to be used. Many of them the Plan 9 x86 compiler doesn't even recognize, so it could never emit them when generating object code. You do need to cover everything that's in the program, though, and that's still hundreds of instructions. It's kind of abstract when I describe it here, but what you're ultimately doing is starting with the state of a program (the state meaning not necessarily the memory, but what the registers are at a particular point) and, given some
highly idealized description of memory, which you can express in an SMT solver, asking: give me the set of inputs that will cause this particular register to hold this particular output after these transformations have been applied to it. If you're dealing with a loop or something like that, you just apply every one of those transformations in the big list to every one of the registers. And that's a very, very narrow case of what symbolic execution can do. It basically required me to have some high-level insight into the nature of what a vulnerability is and how it can be exploited. After that, I'm just doing the mechanics of getting from what I suspect is a vulnerability to what is certainly exploitable, using a SAT solver. I would not recommend doing it that way, but I mention it because it is possible. If you have any friends who run Plan 9, you can go check this function out. I have an image on my GitHub that's a proof of concept for this, and you can pop their box if you run into any corporate Plan 9 networks. After I dealt with Plan 9... I don't want to turn this into a biography, but I do want to mention that a reasonable amount of time separated the exploit for Plan 9 from the research I did on the Erlang virtual machine. Does everyone here know what Erlang is, or have you heard of Erlang? Okay. It's a programming language; I think that suffices. It's interpreted, as I mentioned before. Very, very large source code.
I mean, it's been built up over years. It's a real program, and I think it's not crazy to imagine that it's within an order of magnitude of other programs you might want to evaluate, like OpenOffice or Adobe or Word or a web browser. The technique I took with this one was to try a bunch of different concolic execution engines, symbolic execution engines, and dynamic taint analysis to figure out where in Erlang the greatest source of vulnerabilities might be. In the end, to find this bug I used a tool called KLEE, which I'll get into a little later. It's a tool produced by Stanford, I believe. There's a critical aspect of symbolic execution I didn't get into in the Plan 9 description, because that was such a narrow view into the behavior of the JPEG parser. If you are evaluating a C program, for example, you need to prune out everything that is not conceivably exploitable, or that you don't believe could reasonably be exploited by some intelligent human. My intuition for this comes entirely from writing software and breaking other people's software. I don't know if there's anything I could give you that tells you what to look for beyond the basics. For example, if code isn't doing any sized allocations and it's just arithmetic that doesn't control the behavior of any later allocations, you can forget about it. You need to hone in on what you think is really hairy code and extract it into some place where you can run a simple test input against it, just as you would for a fuzzer. You'd create a test harness for the code you're interested in, and it's the same with symbolic execution, except maybe a little more refined in what you extract. So I took out a piece of the regular
expression matching code. The bug is actually in compiled regular expressions within Erlang, and I did just this. I took out a huge swath of it and used a piece of software called niffy (a native function in Erlang is called a NIF). Then you replace all of the elements that you, as a user, have immediate control over. "Replace" here means, very simply... sorry, I don't have it included on the slide, but klee_make_symbolic, that's a function name. klee_make_symbolic is what tells KLEE, "I want you to treat this as if you had control over its value," and that may not actually be true. You can also subject KLEE to various assumptions about what kinds of values you can provide, if that's something you need to do. In the end, I think I found the very first remotely exploitable vulnerability in Erlang. I don't know if that's true or not, or whether it was the first CVE for it; I didn't register the CVE myself, so I haven't taken a serious look into that. I also found some vulnerabilities in PCRE that were totally unrelated, as well as some that had already been fixed upstream. PCRE, by the way, is Perl Compatible Regular Expressions, a widely used regular expression library. Okay, so how do you do this? If you want to get your feet wet, where do you start? There are three tools I would recommend, and once you start looking at these, I think you'll immediately branch out into the wide world of other things that abstract analysis of programs is concerned with: CBMC, KLEE, and Manticore. I haven't used Manticore nearly as much.
It's something fairly new from Trail of Bits. I don't include any information about it, but I have heard really great things about it. CBMC and KLEE I have used extensively, so I'm going to try to show you what exploiting a basic Aleph One "Smashing the Stack for Fun and Profit"-style vulnerability looks like inside these two tools. This is CBMC, a tool from Oxford that's part of the CProver set of tools. It's primarily interested in program correctness and verification, but it does have a lot of facilities for the conditions I described earlier, which basically amount to writing outside of a region you're allowed to write to. So suppose you create a new C file with this simple overflow: obviously a 16-byte buffer overflow, an unchecked string copy. Then you run CBMC on it alone, which is just ./cbmc with --trace, --unwind and --pointer-check. Even without any special indication that it should check this, it'll tell you: hey, this could be very bad. This can be a little tricky in bigger programs, because CBMC looks at the translation unit of a C program when judging where variables get checked, and sometimes the check lives elsewhere. For example, the overflow function we're calling here is safe if we check the length of the string before passing in the string argument. I'll say "safe" in quotes.
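The slide itself isn't in the transcript, so this is only a reconstruction of the kind of file being described. The function bodies, checked_caller, BUF_SIZE, and the CBMC_HARNESS macro are illustrative names, not the speaker's code. In the sketch, overflow() copies into a 16-byte stack buffer without a length check. checked_caller() is the "safe in quotes" variant, which checks the length first. The harness is compiled only when the hypothetical CBMC_HARNESS macro is defined. It relies on CBMC treating the uninitialized bytes of input as nondeterministic.

```c
#include <stddef.h>
#include <string.h>

#define BUF_SIZE 16

/* Unchecked copy: overruns buf whenever strlen(str) >= BUF_SIZE. */
size_t overflow(const char *str) {
    char buf[BUF_SIZE];
    strcpy(buf, str);
    return strlen(buf);
}

/* "Safe" only because the caller checks the length first. A tool that
   analyzes overflow() on its own cannot see this check. */
long checked_caller(const char *str) {
    if (strlen(str) >= BUF_SIZE)
        return -1; /* reject input that would not fit with its NUL */
    return (long)overflow(str);
}

#ifdef CBMC_HARNESS
/* Possible invocation (the unwind bound must cover the copy loop;
   33 is a guess for a 32-byte input):
 *   cbmc overflow.c -D CBMC_HARNESS --bounds-check --pointer-check --unwind 33 --trace
 */
int main(void) {
    char input[2 * BUF_SIZE];
    input[2 * BUF_SIZE - 1] = '\0'; /* remaining bytes stay nondeterministic */
    overflow(input);
    return 0;
}
#endif
```

Calling overflow() directly from the harness should make CBMC report the out-of-bounds write along with a trace. Routing the harness through checked_caller() instead should make that report go away. This is the translation-unit caveat in miniature: the tool only credits the check if it can see it.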
It's theoretically okay. However, the check that guarantees that safety property can occur anywhere in a program, and CBMC isn't necessarily going to be aware of it, because it doesn't have high-level heuristics about that kind of thing. But the vast majority of vulnerabilities don't fit into that category, so you're going to be in good shape. If you look at something you suspect is hairy C code, CBMC will quickly let you know, with far fewer false positives than something like RATS or ITS4 (I don't know what the current state of the art in static analysis is) or Cigital's tools. I have never used those, but I assume some people here have. It's very similar with KLEE. With klee_make_symbolic, you indicate: hey, I'm passing in a symbolic value, which here is the string to overflow. It's symbolic not only in terms of its content but also in terms of its length, up to the buffer's size. You can do that in two ways. You can create a statically sized string, loop over it, and call klee_make_symbolic on each element. Or you can make the whole buffer symbolic with a single klee_make_symbolic call that takes the buffer's size as an argument, and let KLEE tell you what it thinks of it. And KLEE, if you run it, works a little differently.
You have to compile the code with LLVM (clang, rather) and then run KLEE on the intermediate representation that's produced. KLEE gives you a lot of really valuable information. But if you're just working with something you believe has a stack-based buffer overflow, you'll want to look at the .err files. "err" doesn't mean there was an error in KLEE; it means an error detected by KLEE. So you cat the results of your last KLEE run, and you'll see things like this. It gets cut off a little here, but what it says at the end is basically: what value triggers this? If you provide a string equal to chars blah, blah, blah, up to some ridiculously huge length, you're going to have a problem. And indeed, that would cause a segmentation fault. You can also use KLEE to test for heap overflows. KLEE contains a simple model of an allocator, but to be frank, I don't understand it well enough to describe at what granularity it understands the behavior of blocks. Say you allocate a block, you allocate an adjacent block, and you overwrite that adjacent block's heap chunk metadata. I don't think KLEE understands enough to know what happens next unless you provide the source code of the allocator your program was compiled with, and that will make it a lot slower as well. So what you do with heap overflows is very similar to what you do with stack overflows. You look for a pointer into an area you should not control, and ask what can give you that kind of value for the pointer. And this is great.
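Here is a sketch of a KLEE harness in the spirit of what's described, assuming a standard KLEE install. klee_make_symbolic and the klee/klee.h header are real KLEE interfaces. store_record, RECORD_COUNT, and the USE_KLEE macro are hypothetical, made up for this example. The planted bug is a heap index that is checked only against its upper bound, so a negative index writes in front of the allocation. That is the "pointer into an area you should not control" pattern.

```c
#include <stdlib.h>

#ifdef USE_KLEE
#include "klee/klee.h"
#endif

#define RECORD_COUNT 8

/* Hypothetical code under test: idx comes straight from user input.
   The bounds check forgets negative values, so idx < 0 writes
   before the start of the buffer. */
int store_record(int *records, int idx, int value) {
    if (idx >= RECORD_COUNT)
        return -1;        /* upper bound is checked ... */
    records[idx] = value; /* ... lower bound is not */
    return 0;
}

#ifdef USE_KLEE
/* Build and run (flags follow the KLEE tutorials and vary by LLVM version):
 *   clang -I <klee-src>/include -DUSE_KLEE -emit-llvm -c -g -O0 -Xclang -disable-O0-optnone harness.c
 *   klee harness.bc
 *   ktest-tool klee-last/test000001.ktest   (use the test number that has an .err file)
 */
int main(void) {
    int idx;
    int value;
    int *records = malloc(RECORD_COUNT * sizeof *records);
    if (records == NULL)
        return 1;

    /* Tell KLEE to treat both arguments as attacker-controlled. */
    klee_make_symbolic(&idx, sizeof idx, "idx");
    klee_make_symbolic(&value, sizeof value, "value");

    store_record(records, idx, value);
    free(records);
    return 0;
}
#endif
```

Run under klee with the macro defined, this should produce an error file (something like test000001.ptr.err) reporting an out-of-bounds pointer. Running ktest-tool on the matching .ktest file should show the negative idx that triggers it. Exact file names depend on the KLEE version.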
I mean, this works because KLEE and CBMC and tools like them detect buffer overflows, and I suppose heap and stack overflows are both part of that broader category. But suppose your attack relies on the behavior of the allocator to trigger the vulnerability, like in this piece of code here, where realloc doesn't actually check the size of the newly allocated block. It takes an existing block, the allocator records "I've just changed the size of this," and it updates the associated heap chunk metadata. Then you allocate another block. If it's in the same arena, that is to say of approximately the same size, it gets allocated right next to the first one. Now you can leak what was previously in there, including the heap chunk metadata, and you can change that metadata. Then on the original block you call free or something. I forget the exact details of whether you need to coalesce blocks or free them or what. There are always a million different ways to exploit allocators, and it can get quite sophisticated. In any case, it doesn't matter, because symbolic analysis wouldn't be able to tell you that this is the path you need to take. Not that a fuzzer could either. But if you're hoping for a really high-level understanding of computers and the organization of a C program, you're not going to get it. You're getting a pretty low-level, by-the-book understanding: if this allocation... excuse me, if this write occurs outside of the region it should be in, that's bad, and if not, it's okay.

All right. As I mentioned before, I couldn't figure out a way to make this work with symbolic analysis, because you often need a fine-grained model of the allocator, and symbolic analysis just cannot provide that. But symbolic analysis will help once you get into these tools and you have, for example, a crash. I'm sure many of you have heard of the BitBlaze project. KLEE has a very similar facility inside the data it generates: you point ktest-tool at the tests in klee-last and determine what happened to this variable prior to some crash. I didn't start my little timer, so I don't know how much time I have left, but I'm open to questions. Nobody? All right, that's it.

Oh, what's up? You said Moflow? No, I haven't. I see, I see. No, Moflow I have not used. What's the story? I see. Oh, yeah. I mean, it has always struck me as a little tricky to do binary analysis symbolically. Outside of the clearest cases, like segmentation faults, it seems like a much trickier problem than being supplied C code. Evidently people are working on it, though: BAP, and of course many other projects doing binary analysis. I think that's actually what BAP stands for. But no, I haven't. That's great, Moflow, I'll take a look. Anyone else? All right. I hope you guys enjoyed this, and I hope you really dig ToorCon 20. Take it easy, guys.