So hello, welcome to the security devroom. Get a seat, and we'll start by introducing Kai from Germany, who is going to talk about Panopticon, a libre disassembler. Welcome.

Thank you. I'm Kai from Bochum. I'm a hacker, and I'm here today to tell you about a project called Panopticon, which is a cross-platform libre disassembler. I will first talk a bit about the goals of the project, and then we will come back to reality and see how the project is actually implemented now. If the time isn't up by then, I will talk about the architecture.

But first I want to make the case for why we need such a tool. When we're in security especially, we need a disassembler for analyzing proprietary software, for example for finding bugs in things like Windows, and for analyzing malware, because most malware doesn't come with source code attached. And as free software developers we often want to implement free replacements for file systems or network protocols that are implemented in proprietary tools only, so we have to rip those tools apart too. That's what we need a disassembler for. Most of the tools we use now, especially in security work, are proprietary, so the idea of the Panopticon project is to build a replacement for these proprietary tools.

When I'm talking about reverse engineering, I'm talking about binary reverse engineering. We are only concerned with ELF binaries or PE binaries that are implemented in machine code. What I'm also not talking about is automatic reverse engineering.
So what we are doing here is mostly manual reverse engineering. Panopticon has this kitchen-sink approach, where you have one tool that does everything, everything is integrated and at your fingertips, and you have an integrated graphical user interface that allows you to surf the code and figure out what the application does.

It always starts with the disassembly, and this is where most of the open source tools stop. We have our binary code, and we can reverse the last step of the compilation, the assembly, because it's more or less a one-to-one mapping between bits and the assembly code listing. Tools like objdump, for example, just dump the assembly code onto the console and let us read it. But that's not something you can really use in practice, because most of these programs have millions of lines of assembly listing, and we are not interested in 99% of the code, because if we know a bit about programming we already know how it's implemented. What we're interested in is this little part of the application that implements the state machine for the network protocol, or that checks the file system, or that implements some kind of backdoor.

So what the advanced tools do is what's called static analysis. It takes that assembly code listing and cuts it up into chunks. The concept of functions, for example, exists on the assembly level, so we can separate the code into functions, and then separate the functions into something called basic blocks, which are sequences of assembly instructions that are executed without interruption. We know that when the first instruction is executed, execution will continue until the end of the basic block, and then we have a jump or a branch. Our tools try to recover this information as a nice graph.

And then comes the last part, which is often overlooked, especially by open source tools, and I think this problem is mostly cultural. We have to get the information that's in the
computer into the brain of the user. So you need a graphical user interface, and the interface part is the important one. What we essentially have to do is transform the information in the computer in a way that our brains can, well, understand. With Panopticon we take a step that most other tools don't: the graphical user interface is an integral part of the system. So when you implement a feature, or when you want me to implement a feature, you have to tell me not only how it interacts with the disassembler part and the static analysis part, but also how we present the information we gather to the user in a way the user can actually use. So not just dumping it into a text file or something like that, but turning it into pictures or something like that, which will actually help the user to understand the binary. That's really important.

Another thing most tools, even the proprietary ones, lack is analysis based on semantics. Most tools know what assembly code looks like: they know this bit pattern turns into this mnemonic, and the mnemonic is a string, and they know what the arguments look like, and the arguments are also strings. And that's pretty much it; it just gives the dead code to you to read. What's more interesting is a tool that actually understands the semantics of the code at runtime. So what Panopticon does is, it implements an intermediate language.
It's a bit like what compilers use. For every mnemonic we recognize, we generate a short sequence of intermediate language that's easy to analyze and that implements the semantics of this opcode at runtime. And when we have the semantics, we can do analysis on the semantics instead of just on the syntax, on what the code looks like. I will give you two examples of what we could do when we have the semantics.

The first one is called abstract interpretation. The basic idea here is that we have an analysis across all possible paths through the program, and instead of just looking at one path and one value at a time, we replace concrete values with sets of values, or abstractions of sets of values.

What I mean by that is explained a bit here. We have the C code on the left, and it implements a switch statement. It's just a bunch of cases: for a certain set of cases for value we print "prime" (and of course they're all primes), and if there isn't a matching case we return false. What GCC makes from this code is something like a binary search tree. It first starts with the middle case, which is 11, I think, and looks whether value is equal to 11. If it's equal, of course, it jumps to the basic block that implements the print. If it isn't, it compares whether value is larger or smaller than 11 and then branches according to that. So you have some kind of tree that unfolds towards the bottom, and at the bottom you have the false case, where everything flows together.

So what we are often interested in is: okay, what are the values
that cause the printf to fire? Of course, when we are experienced reverse engineers, we can see right away that this is a binary search tree and that it's probably a switch-case statement. So we read the code and check all the comparisons and all the jump-if-equal instructions, and then we see that all the equal jumps flow into one basic block. What abstract interpretation can do is automate exactly that. It can execute the code and figure out: okay, when this jump is taken, value must be 11, and when this jump is taken, value must be 19. It can take the superset of all the possible values and show it to you.

There are limits to that, of course. Doing abstract interpretation across the whole program is hard, especially when you have things like I/O. And again, you can do this manually. But having a machine do it for you and present it to you helps you to concentrate on the big picture and to do what the machine can't analyze, which is inferring what this means, or what it means when value is 11, for example. So just giving hints to the user will, I believe, make reverse engineering way easier.

The other thing is called bounded model checking. As opposed to abstract interpretation, with bounded model checking we only care about one specific path through the program, one that is feasible under a set of constraints. One example where we could use this: okay, this code is a bit artificial, but it implements some kind of sanity check on a network protocol or a file system format. We have two inputs, a and b; let's just say they're integers. We first check that a is smaller than b, and then a mustn't be zero. Then we multiply a by three and invert b, then we add them both together, and we expect the result to be hexadecimal 42. When we compile this, we get something that looks like what we have on the right. We have all the checks, and the true branches all fall through here.
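
As an aside to the transcript: the sanity check described here can be sketched in a few lines of Python, together with a plain brute-force search for an input that reaches the final block. This is only an illustration under stated assumptions, not code from the talk or from Panopticon. It assumes unsigned fixed-width integers, an 8-bit width chosen purely to keep the search small, and that "invert" means bitwise NOT. The names `sanity_check` and `find_input` are hypothetical.

```python
from typing import Optional, Tuple

WIDTH = 8                   # assumption: 8-bit integers keep the search tiny
MASK = (1 << WIDTH) - 1     # 0xFF for 8 bits
TARGET = 0x42               # the value the check expects at the end


def sanity_check(a: int, b: int) -> bool:
    """Return True if (a, b) passes every check and reaches the final block."""
    if not a < b:           # first check: a must be smaller than b
        return False
    if a == 0:              # second check: a must not be zero
        return False
    a = (a * 3) & MASK      # multiply a by three, wrapping at WIDTH bits
    b = ~b & MASK           # invert b (assumed to be bitwise NOT)
    return (a + b) & MASK == TARGET


def find_input() -> Optional[Tuple[int, int]]:
    """Enumerate all inputs in order and return the first pair that passes."""
    for a in range(MASK + 1):
        for b in range(MASK + 1):
            if sanity_check(a, b):
                return a, b
    return None             # no input reaches the final block


if __name__ == "__main__":
    print(find_input())
```

Under these assumptions the search returns a = 1, b = 192, since 3 + (~192 & 0xFF) = 3 + 63 = 0x42. At a realistic 32-bit width this kind of enumeration quickly becomes impractical, which is where a constraint-based search earns its keep.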
So we want to follow the red lines, and the last basic block is the one we're interested in; that's where the print is again. Okay, so of course we're interested in what the input has to look like in order for that printf to be executed. What we do as experienced reverse engineers is execute the code backwards in our minds: we check the first condition, okay, the addition has to be 0x42, and then we trace the code backwards. What we do in reality is write a short Python program that just enumerates all cases until we find one. At least, that's what I do.

What bounded model checking can do is generate, well, more or less a formula from that. We add a bunch of constraints, and then we throw it into the magic bounded model checking algorithm, and it gives us a possible trace through the program that will hit that basic block. What we do here is add the constraint that that last jump is taken, which just means that the zero flag has to be one. Then the model checking algorithm will look for a possible set of values that fulfils this constraint and give us the values, including the trace, up there. So we see at the top there that we need a to be 0x15114a6 and b something else.

What's very nice about this is that you can add additional constraints. Maybe there are some checks before that, which you already saw, that check that a isn't this value. So you can add another constraint: okay, we want that jump to be taken, but we don't want a to be that value. We can start the algorithm again, and it will find another solution, or tell us that there's no solution, or try to compute forever and, I think, crash. Those are the three possibilities.

So just as a reminder, the difference between abstract interpretation and bounded model checking: with abstract interpretation we are looking at all paths at the same time, and with bounded model checking
we're just looking at one path.

So there are some other features I would like to see, in ascending order of outrageousness. What would be really, really nice: Panopticon is meant as a static analysis tool, but having dynamic information is always very helpful when you have huge applications. Of course, with the semantic information you could simulate the whole program, but this is very expensive, and bounded model checking especially can't be done on real-life applications. It's pretty much impossible to do bounded model checking on a whole Chromium instance, for example. So having the ability to include traces from Pin, for example, or DynamoRIO, or just a GDB instance would be really helpful. What I would like to see is that we can match the traces onto the control flow graphs, and that can tell us: okay, when you have this input under this environment, control flows like this, and when I change that value or that part of the environment, control flows like that. That would really help, I think. And the nice thing is that we already have the tracers. We have Pin, we have DynamoRIO; we just have to implement the matching and the reading of these traces.

Of course, you always need scripting support. When you have a powerful tool, you want to automate things. So embedding any type of scripting language would be really helpful. I would prefer to have only one, and I don't want to start a language war, but we can do pretty much everything: Ruby, Python, [unclear].
I'm not a fan of Lisp, so in case you want Guile, that may be a longer discussion, but I can live with everything.

And, well, when you want to replace IDA Pro, you have to replace Hex-Rays. So a decompiler would be pretty nice, even if a decompiler doesn't really decompile: the C code you get out of it isn't really the C code as it was written, especially when the program wasn't written in C. You only get dead code again, so things like bounded model checking can be done on the C, but there's no real use in doing it on the C instead of on the assembly code. But the control flow structures you have in C are easier to read, your control flow graph is always planar, and you have high-level type information, which makes it easier to read real-life applications. So maybe some kind of decompiler would be nice. This isn't as impossible as it looks, especially when you have semantic information. You can use abstract interpretation, for example, to recover stack layouts and the use of stack frames throughout the program, and then you only have to do type inference.

So, back to reality. This is all nice, and part of it is implemented, especially the abstract interpretation part. But aside from that, the program isn't as far along as I wanted. So what does it look like? A bit like this. We have a graphical user interface; it's in Qt. You can open the application, you can open a file, and then it will start disassembling at the entry point and give you the list of functions. You can click on a function in the list and you get a control flow graph. You can pan around, zoom, click on one of the lines, add comments, and save the whole thing. That's pretty much it.

We can disassemble Intel architectures as well as two of the smaller 8-bit microcontrollers. We have the semantic information for the 8-bit microcontrollers pretty much complete. And, well, Intel is another thing.
We have well over 500 mnemonics in Intel, so you have to write the semantic information for 500 or so opcodes. But this isn't as big as it looks, because when you look at real-life applications, once you have implemented around 100 to 120 of the most popular opcodes, you already have 90% of everything that's in there. And as I said before, we are not really concerned with global program analysis. We just want local reasoning about that function or that set of basic blocks: what happens at runtime, how do we get to that path there? So it isn't that important that we are 100% precise. We're not trying to do symbolic execution or automatic exploit generation.

We can open ELF files. I actually have a pull request open right now that I will merge when I come home and get a bit of sleep, so we will support PE files too. Aside from that, we can load the raw flash for AVR, which isn't that complicated.

The project is hosted on GitHub, and we have an open development model. We use the issue tracker there, so in case you have a question you can open an issue and I will try to answer it. And if you have a patch, you can send it as a pull request.

I have a bit of time left, so I will talk a bit about the architecture. The application is in one repository, but it's two parts. We have one library that does all the disassembly, the static analysis and the representation of the code, and it's written in Rust. In case you've never used Rust: it's not that much different from C++, for example. I started with Rust one and a half years ago, and it took me three months or so to understand it well enough to program tools like this. So it isn't that complicated. We also have the graphical front end, which is a bit of Rust to interact with the library, and on top of that is QML. That's some kind of JavaScript derivative
that's used by Qt to implement widgets.

When you clone the repository you see something like this. The library consists of around 20 files that are more or less named after the thing they do. We have the abstract interpretation one, which is the abstract interpreter, and then we have the disassemblers for AMD64, AVR and MOS. Then we have some kind of tree for the representation of the program. At the lowest level we have the mnemonics, in mnemonic.rs. Mnemonics are grouped into basic blocks, basic blocks are grouped into functions, functions are grouped into programs, and programs are grouped into a project, and the project is sort of the top-level node of what's saved by the application. And, well, you have the data definitions in there, as well as il.rs, which is the definition of the intermediate language we use. In case you're more on the academic side: we use RREIL, or rather a derivative of RREIL, and we have some custom operations in there.

And, well, the front end isn't that complicated. We have a bunch of Rust files to communicate with the library and do the layout for the control flow graphs, and aside from that there's a folder called qml where all the QML files live. Each file implements one widget. It isn't that complicated, so in case you've never used JavaScript: it isn't that hard, and it isn't as JavaScript-y as you would expect. Also, Qt has really nice documentation, so you could check that out. It's pretty straightforward.

So in case you're interested and want to help me, or just want to check out the project: we have a website, and on the website there are links to the API documentation and the user documentation. You can also jump to the GitHub repository directly. And if you have a question, you can reach us on the Freenode channel, and we also have a Twitter account where we mostly post news about the project.

Thank you, Kai. So we have five minutes for questions. We have the first one.
I'm wondering, why Rust? I mean, I love Rust, but why?

The project started years ago, and I used C++ because, why not, and I'm sick of C++. When I saw Rust: Rust solves the problems I have with real-life C++ applications, and this is my hobby project, so I just thought to myself, why not use Rust? So one year ago I just rewrote the application, which at that time was 10,000 lines of code, in Rust, and it turned out to be way easier than I thought. I actually got the line count down to 8,000. I have fewer bugs, and Rust really helps me to avoid the kind of bugs you see in C++ code bases, like iterator invalidation and data races. It's way less painful to program in Rust than in C++.

I was wondering about obfuscated malware. In particular, there are some TTPs that you can recognize so easily that you could potentially build semantic information for them.

Yes. Of course, obfuscation is there to stop us, and there's only so much you can do. That's why it's interesting to have dynamic information: when you have something that unpacks itself at runtime, you can take a snapshot and import it. Also, when you have things like virtualized malware, where there is some interpreter in there, what you can do is use the scripting engine to implement some kind of lifter to the intermediate language for it, after you have disassembled it. You only have to generate intermediate language, and then you can use all the code analysis features that are built in to do code analysis directly on the obfuscated, virtualized malware. But of course, this is a problem. It's there to stop us. We can only do so much.

A follow-up question on that.
You said the decompiler would target C, but are you thinking of decompiling to C++ or some other high-level language? Or can't you decompile to C++?

Right. Also, with C++ you have the advantage that you can try to pattern-match certain parts of what the C++ compiler emits to figure out, for example, what class hierarchies look like. But for now you can only analyze assembly code listings. Of course, you can even analyze Haskell; it just looks a bit crazy.

So we have time for two more questions. Raise your hand if you want to ask more.

It's a simple question: what's the logic behind decompiling? It's outside the scope of the talk, but when you decompile a listing of assembly into C or something C-like, what's the logic behind it? How do you pick one piece of C code to associate with a sequence of assembly rather than another?

So one thing you can do, which is what IDA Pro does, for example, is pattern matching. The compiler, of course, turns certain constructs into certain assembly code listings, and you try to recognize those and turn them back. The other way is to just turn the code into C: you turn the assembly code expressions into some kind of, well, C expressions. Decompilation is just a three-step process. You have to recover the control flow constructs of C, and you can do this with pattern matching: you see, okay, when I have a block that just loops, okay, that's a loop. What's more complicated is to recover the type information and to recover how the stack is used. That can be done with abstract interpretation, and for the type information you can use a type inference algorithm like the ones you have in Haskell or Rust. And for this to work,
you also need typing information. You need to encode into the disassembler that certain API calls have a certain type signature. Then you can use this: when the assembly code calls the function, you know, okay, the arguments must have these types, and you can try to push that information down into the assembly code. So that's pretty much how decompilation works.

Thank you. More questions?

Two questions. First, what was the reason not to use any of the existing disassembly libraries, which would give you access to more processor families? And the second question: is there an option for another type of syntax, for example AT&T syntax? I think you use Intel syntax for x86.

So currently we only... Let's start with the disassembly question. The problem is that the disassembly libraries you have now don't give you real semantic information. There's Capstone, which can at least tell you which arguments are written and which are read, but it can't tell you what the function between those arguments is, and doing that is most of the work. So I didn't see much use in trying to wrap a library, because wrapping C libraries and having it compile flawlessly on most machines is very hard with Rust. When you only have Rust, it's easier. And, okay, of course we can generate AT&T syntax; we can put a switch in there. Currently we have Intel hard-coded, but that's not much of a problem. My time's up.

Okay. Let's thank Kai. There's a five-minute break now. Please open the door so we can get some air in. Thank you.