Thank you for being here. I'm super nervous. The first thing I always have to say is that this research is mine: it is my thesis, it has nothing to do with my employer, and the things I did here I did outside of work time.

So that's me. I'm a malware researcher at F-Secure. I also help organize BlackHoodie, a reverse engineering bootcamp just for women. I am a really proud program committee member of Hack.lu, and our call for papers is open, so submit. I'm also an ambassador for Disobey, a hacker conference in Finland.

Today I'm talking about malware analysis and logic programming. I will try to explain very briefly, and without the math, what constraint logic programming is. For that I will talk about SMT solvers and how we can use them in malware analysis. Then I will talk about the applications of constraint logic programming in IT security. I will talk about binary obfuscation and how you can use SMT solvers to deobfuscate malware.

First of all, an SMT solver can be viewed as a huge solver for equations: if you have a large system of equations, you can use the SMT solver to solve it for you. The main difference between SMT solvers and the SAT solvers that most people know is that SAT solvers work only on Boolean formulas, where every variable is either true or false, while SMT solvers can also reason about ordinary equations and other mathematical structures. The good part is that a lot of real-world problems can be represented as the problem of solving a system of equations.
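To make the "solver for equations" picture a bit more concrete, here is a minimal sketch in Python. It is not a real SMT solver: it is a brute-force search over small finite domains, used only as a stand-in to show the workflow of stating constraints and getting back either instances or "not feasible". The function name `solve`, the variables `x` and `y`, and the example equations are all assumptions made for illustration.

```python
from itertools import product


def solve(domains, constraints):
    """Return every assignment over the finite domains that satisfies all constraints.

    Brute-force stand-in for a solver: an empty list means "not feasible",
    a non-empty list contains the concrete instances.
    """
    names = list(domains)
    solutions = []
    for values in product(*(domains[name] for name in names)):
        candidate = dict(zip(names, values))
        if all(check(candidate) for check in constraints):
            solutions.append(candidate)
    return solutions


# Hypothetical system of equations: x + y == 10 and x - y == 2, with x, y in 0..15.
domains = {"x": range(16), "y": range(16)}
solutions = solve(
    domains,
    [
        lambda v: v["x"] + v["y"] == 10,
        lambda v: v["x"] - v["y"] == 2,
    ],
)
print(solutions)  # [{'x': 6, 'y': 4}]

# Contradictory constraints: no assignment exists, so the answer is "not feasible".
unsat = solve(
    domains,
    [
        lambda v: v["x"] + v["y"] == 10,
        lambda v: v["x"] + v["y"] == 11,
    ],
)
print(unsat)  # []
```

A real SMT solver reaches the same kind of answer by reasoning about the formulas instead of enumerating every value, which is what makes it usable on realistic domains.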
I hear a lot of people saying that SMT solvers and symbolic execution are a hype right now: a lot of people are doing it, and before that no one was really talking about real-world applications. I just don't believe it's a hype, because the theory is really old. I guess we just didn't have the computing power for symbolic execution before, so it was more a hardware or technology problem than a problem with the theory.

So what are constraints? We have two main programming paradigms, or styles of building programs. There is imperative programming, where you state exactly what you want the computer to do. And there is another way, where you describe to the computer what you have right now and what you want to get from it. You describe what the program must accomplish in terms of the problem you have, rather than telling the computer "do it this way" and working out each next step yourself. So you are not like a mom telling the computer, "Go there, do this, and come back." You are just saying, "I have this problem, solve it for me."

Freuder, a guy whose ideas on computers I really, really appreciate, noticed this a while ago. He said that constraint programming is like the Holy Grail of computer science, because with it you have the computer doing what you really want it to do: solving problems for you. With constraint programming we state constraints about the problem we have, and the computer finds the solutions to those constraints for us.

Of course there is a theoretical limitation, which I will talk about later; for now it's a bit complicated to understand. But circa 90% of the real-world problems that we have, in IT security for example, or even in math or
stuff like that, are problems that you can define in this kind of model, and they can be written as combinatorial problems, so you can use an SMT solver to solve them. Constraints over infinite or more complex domains are also possible, but we still don't have the computers for that.

Don't worry: I told you I'm not putting equations here and I'm not talking about the math. But to explain how SMT solvers work, we need to talk about predicate logic, or logic programming. For that you can see the SMT solver as a black box. You have a formula, which is the theory, or the problem that you have, or the goals that you want to achieve. You put it into the SMT solver, it reasons about the logical connections between your problem and your goal, and then it tells you whether it is feasible or not to achieve the goals under the constraints of your domain. The best part is that it is not just telling you "yes, it's feasible"; it also gives you instances, concrete ways of solving the problem. That means you can also get all the possible solutions to the same problem.

To understand why we have things like SMT solvers, you need to think about who wants to prove a lot of things. In the keynote we had someone talking about LangSec, and these tools also come from the mathematicians: SMT solvers were used to prove large theorems. You break a really big problem into smaller problems and reason about each of them, and if you can prove all the small theorems, you can also prove the bigger one. You can apply the same idea to hardware or software verification, for example, and then you can also think about proving that your software is secure, or that the language you are using is secure.

Symbolic execution is the best-known way of using SMT solvers today. The symbolic
execution of a program is nothing more than a path generator: it reconstructs the whole structure of the program by expanding all the paths of its control flow. With that you get the hundred percent code coverage that you want in code verification, and you can do the same thing with binaries. It means that for every path, or every line of code, you can know exactly which inputs and outputs make it possible to reach that point, at a branch or anywhere from the entry point to the end of your program.

It looks like this: you have your one input, and if you look at the second line of code, you can see that your domains change at every single node. If you do it for a small piece of code, like here, where you have not even 11 lines, you already see how many paths are possible. That is one of the biggest problems with symbolic execution. It is known as path explosion, and I'm also going to talk about it later.

The problem is that the precision of the graph we are generating and the performance of the analysis depend very strongly on the options we use to generate the control flow. That means the context sensitivity of the graph is, I guess, one of the most important points to think about when you are doing malware detection or malware analysis. This parameter determines how system calls, for example, or branching points are represented. If you can afford a high context sensitivity, you can also enable cross-references between system calls that are invoked in multiple locations of the code, or even take global variables into account. The context sensitivity may also depend on the functions or the calls themselves. Furthermore, that
means that increasing the context sensitivity gives you a greater number of system call clones in your program, so you have a lot of nodes in your graph, and you can imagine what that looks like in memory.

So it's important to keep in mind, when you are working with symbolic execution, that you don't have concrete values: you have symbols as arguments, you are going to explore every feasible path in your code, and you have a program state. You also need a lot of memory, because the symbolic values are stored in memory, and you have a problem with time, because it takes time to reason about all the domains that you have.

Now, about the applications for IT security. Here you can define a model that's pretty simple to understand. You can say: my software is secure because nothing bad ever happens. If I'm checking that my program is safe, I need to negate that, because this is predicate logic. So I negate it and say: if something bad happens, then my software or my hardware is not secure. That is the formula I put into my SMT solver, and I ask: is it feasible, is it possible, that something bad happens that makes my software or my hardware not secure?

So the best-known use of symbolic execution is for fuzzing, code verification, or binary analysis, but mostly for assuring, or in a sense proving, that your software or your hardware is really secure. Exploitation then comes along as a kind of natural thing. You can of course just take the inputs that cause your program to be insecure as a proof of concept, but you can also think about automated exploit generation and automated payload generation: you just use the output of your SMT solver, the instances that it gives you, and create a payload from them.

And the part that I like most is to
use this kind of symbolic execution to analyze malware. The problem with malware analysis is that malware is mostly obfuscated, it uses all the compiler optimizations available, and you also have a trend with ransomware.

So how does binary obfuscation work? It's a funny thing, and it's one of the things that made me fall in love with malware, because it's kind of dual. If you think about the technical part and the educational part of it, it's really interesting, and some techniques that you see in malware are not really different from the things used, for example, in software protection systems. You have this game that you really want to play but don't want to pay for: it's obfuscated. And you have this malware that doesn't want to be found: it's obfuscated. They use the same tools and the same techniques. In both cases there is a program that does something, and somebody puts a kind of cover around it that makes the program really difficult to analyze and to understand. The question is what is inside, but that is something you can only know once you are inside. In one case the objective is to make the analysis of the program so hard that you cannot write a crack for it, and in the other it is just to make my life really hard at work. So it's really dual, but really nice.

So which kinds of malware obfuscation am I looking at right now? When you start analyzing malware, you think at the beginning, at least I did, being new in the industry: what can be so hard? It's just assembly, right? Then I got there, I was sitting in front of my first binary, and I couldn't even understand what it was about. I had never seen it before. No one uses this kind of assembly code. I used to write code in assembly and I never used these kinds of instructions. And yeah, compilers.

The same thing goes for packing. If you are a normal person, you're just
zipping everything, or maybe doing tar with gzip, but they can write their own packers, which is really nice sometimes. You also have a lot of different value obfuscation, like XORing and stuff like that. XORing is something that I see a lot, because in the world of malware they don't want you to see the strings, and that's a really fast technique to hide URLs, or registry keys, or whatever they don't want you to see.

Another easy way to obfuscate malware is to put a lot of garbage code around it. You write things like a while loop whose condition is always false, or an if whose condition is always false, and stuff like that. Using SMT solvers it is possible to simplify the code just by writing the constraint and saying: if this branch is never taken, we don't need to analyze it. You can delete it from your control flow graph, and then you have less code to analyze.

With packers, malware developers sometimes go a bit further when they want to make your life hard, and they write their own packer. Most of them are really easy: you just set a breakpoint and read the memory dump, and that's done. But sometimes they are not so easy. Sometimes you have a packer inside the packer, and then you need to unpack the packer, and you know what I mean. With the SMT solver it's also possible to resolve all of this, and in the end you get the final memory dump. You don't need to think about how many times you need to unpack something.

So, the limitations. There is a really good one if you think about the theory. I guess everybody knows what a Turing machine is, and that just means it's impossible to write software that will prove that your software is safe, because a program cannot, in general, decide whether another program runs to the end. That is the theoretical thing. The practical thing is that you need someone behind the code when you start, because the SMT solvers cannot come up with the first constraint,
they cannot generate it; you need a person who says, "Okay, this is a constraint, I know this is not going to happen."

So what I learned while doing this is that symbolic execution is a powerful tool. It helps me a lot when I am analyzing malware, because I don't need to find out which key they are using for the XOR, or how many times I need to go through the unpacking algorithm, or whatever. And of course SMT solvers can be used to simplify the control flow graph: I can delete all the garbage code from the control flow graph before I even start analyzing anything. So I'm not starting on any binary anymore without running this first.

What I have done so far: I wrote this binary garbage code eliminator, which means I clean everything before I start analyzing any kind of binary. I have the XOR search. It's an XOR search, everybody knows there are plenty of them out there, but mine is pretty fast. I also wrote some algorithms that can solve easy cryptographic algorithms like Caesar and stuff like that, and I'm trying to write more to help people who have problems with ransomware.

What I'm working on right now: I want to finish my generic unpacker. I'm writing a symbol reconstructor, because in malware we don't have this funny nice table at the end of the binary. I'm also working on the radare2 integration, so that it can be used together as a plugin. And there is a lot of other stuff that I'm not talking about, but I plan to do a lot of stuff. Yeah, that's it.
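Two of the ideas from the talk, deleting branches that can never be taken and finding an XOR key from a constraint on the decoded string, can be sketched in a few lines of Python. This is only an illustration and not the speaker's tooling: exhaustive search over 8-bit values stands in for the SMT solver, and the names `branch_is_dead` and `recover_xor_keys`, the example predicates, and the sample URL are all hypothetical.

```python
def branch_is_dead(condition, bits=8):
    """Return True if no value in the unsigned `bits`-wide domain satisfies the condition.

    Exhaustive search replaces the solver query "is this branch feasible?".
    """
    return not any(condition(x) for x in range(1 << bits))


def recover_xor_keys(ciphertext, known_prefix):
    """Return every single-byte key under which the data decodes to the known prefix."""
    head = ciphertext[: len(known_prefix)]
    return [
        key
        for key in range(256)
        if bytes(b ^ key for b in head) == known_prefix
    ]


# Garbage-code style predicate: x * (x + 1) is always even, so this is never true.
opaque = lambda x: ((x * (x + 1)) & 0xFF) % 2 == 1
# A genuine check that some input does satisfy.
real = lambda x: x == 0x41

print(branch_is_dead(opaque))  # True: the branch can be dropped from the graph
print(branch_is_dead(real))    # False: the branch must be kept

# Hypothetical obfuscated string, built here with key 0x5A for the demonstration.
sample = bytes(b ^ 0x5A for b in b"http://example.invalid/")
keys = recover_xor_keys(sample, b"http")
print(keys)  # [90]
```

The constraint "the decoded string starts with http" plays the role of the first constraint that a person has to supply; the search then returns the key without anyone guessing it by hand.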