Yes, okay. Hello, can you all hear me okay? All right. So this talk is going to be about information hiding in binaries. My name is Rakan; some people choose to remember me as "rock on." It's going to be a pretty exciting talk, with lots of cool things I've been working on in steganography.

There's one big thing I want to warn you all about: there's a big difference between steganography and stenography. So many people come up to me and go, "Hey, you know this stenography thing you're working on, it's really cool, can you tell me all about it?" Well, just for illustration: that's stenography.

All right, so we're going to talk about information hiding in general to begin with, and then little by little I'll delve into the theoretical details and end with a practical implementation and what all of this can be used for.

There are five general types of information hiding. A lot of you know steganography and have seen it before, and there are lots of applications that do that, but steganography is just one of them.
There are covert channels, anonymous communications, and a big one is copyright marking, which is mostly seen as watermarking. Now, for all of these different classifications of information hiding, we're really only looking at three criteria: one is data rate, another is stealth, and the third is resilience to attacks. These different kinds of information hiding have differing requirements for each one of those.

Steganography is when you're trying to hide a message, but you don't necessarily care about the data rate. What you do care about is that it's stealthy: a third party, an eavesdropper, cannot know that there's any communication going on at all. At the same time, it doesn't necessarily have to be resilient. If it is, great; otherwise it's not a big deal.

Covert channels are kind of a breed apart, where you're using channels that were neither designed nor intended to carry any information at all, such as the timing of TCP packets, TCP/IP sequence numbers, which error message is displayed, and things like that.

Anonymity is when you're trying to hide not necessarily the message itself but the endpoints of the communication. I could literally post something on a Usenet group for everybody to see, but if nobody knows that I posted it or who the destination is, then that's also a form of information hiding.

Now, copyright marking. That's a big one. I'm going to spend a lot of time talking about watermarks and how they're defeated, just because most information hiding research is done for watermarking purposes. You have robust marks and fragile marks.
The only difference is the application. Robust marks are meant to be as resilient as possible to attacks, whereas with a fragile mark, as soon as you touch the medium at all, the hidden information in it breaks. This is useful in the case where you're standing in court with a digital image that you're trying to use as evidence. If there is a fragile mark hidden in it, any alteration whatsoever to the image would have broken the mark, so an intact mark lets you prove that the image has not been tampered with.

As for robust marks, you all know watermarking, where you hide something so that later on somebody can say, "This medium belongs to me, because look at the watermark that's hidden in it." Fingerprinting is similar, but instead of embedding something like "copyright recording industry," you'd embed "this file was sent to Joe Schmo." Then, if I ever find it on Kazaa or Napster or whatever, I know that Joe Schmo leaked this file, and good luck to Joe Schmo.

Historically, there have been three general ways of doing information hiding. Security through obscurity is a big one. Way back in Caesar's days they used these kinds of techniques: because nobody would know how you encoded your information, nobody but the recipient would find out what your message was. But it's also used nowadays, in the sense that it's almost impossible to have a hundred percent foolproof system that nobody can detect or break to get your message out. So what people do today is try to make the system as difficult as possible to break, even though they know that eventually it will be broken, and they use obscurity techniques.
They'll just try to make a reverse engineer's life very difficult. An example: if you release a video game and you don't want people to reverse engineer the serial number generation method, you make it so difficult that it will take a reverse engineer six months to break. Within six months you would have sold all the copies of the game that you would normally have sold anyway, so it wouldn't matter anymore if the code was broken six months later.

Camouflage is another class of techniques, where you're hiding in plain sight but you do things to fool the human perceptual system. You would put very, very tiny dots in places where the human eye wouldn't see them, or little holes right next to a character, and because the character is black and the paper is white, your eye doesn't see them. Or you could even use invisible ink and things like that.

Another approach is to hide the data in a very particular location of the medium and hope that your attacker won't know where to look for it. This is easy to break, though, because all an attacker needs to do is modify your medium as much as possible while keeping its original meaning, and they can destroy your data that way. So what people have been doing more recently is spreading the hidden information all over the medium. One way is to have the message repeated in several places, so if one place is broken it's okay, because you have several other copies and you can reconstruct the original from them.

Now, I touched upon this before: there are three general criteria used to evaluate any information hiding system. Data rate: how much data can you store? Stealth: how obvious is it that you've hidden anything? And resilience: when an attacker tries to modify your medium and extract or break the message you have in it, how easy is it for him to do so?
There are three classes of attacks. One of them is subtractive, which means that you take a medium, which I represent here as that rectangle, and I use W for the watermark, because a lot of these attacks are against watermarks, and the idea is to take the W out of the rectangle. So here's our friend Joe Hacker, who comes along with his friendly chainsaw, attacks our medium, and extracts the watermark. In general this is a pretty hard attack to do, because it would mean that your information hiding system is so broken that the attacker knows exactly where the watermark is located and exactly what to do to take it out without damaging the medium to the point that it's not useful anymore. If you have a huge "copyright movie industry" smack in the middle of the screen when you're watching a movie, an attacker could crop that whole section of the screen out, and he would effectively have gotten rid of the watermark. But the resulting medium is just useless now, so it's not a successful attack.

A distortive attack is another kind of attack, where the attacker doesn't necessarily know where in the medium the watermark resides, but he does a lot of different things that mess around with the medium just enough that it's still useful when he's done, while the watermark is modified in such a way that it's no longer recognizable by whoever put it in. In the illustration, this guy jumps up and down on the thing and you get W prime, which is our new watermark. An example would be if you have a JPEG with a watermark in it and you scale it a little bit, shear it a little bit, maybe twist some points in the image, just enough that the human eye cannot notice the difference between the original JPEG and the new one, but a machine looking for a set sequence of colors, or however
they implemented their system, wouldn't recognize it anymore.

Now, an additive attack is where you don't do any of these things. Instead you act as if you are putting your own watermark into the medium, and the end result is a medium with two watermarks in it. If you're lucky you can actually overwrite the first watermark. If not, you have these two watermarks side by side, and if you end up in court you can say, "Hey, how can you prove that my watermark wasn't there before yours, and that this medium doesn't actually belong to me?"

Information hiding has been done in very many different media: sound, images, all of this. Text: you've probably seen a lot of text steganography. There's this site, I think it's called Spam Mimic, some of you have probably heard of it. You enter your message and it generates a spam, like a Nigerian, you know, 419 scam, and your message is hidden in it. If somebody uses the same technique they can retrieve it, but to everybody else it just looks like a 419 scam.

That was information hiding in general. Now I'll talk about binary information hiding: why it's different, and what makes it an interesting problem to solve.

It's a low-redundancy medium. What that means is that there are very few different ways to express the same message. I haven't spoken about redundancy yet, but the idea is that in order to hide any information in a medium without it being detected, the resulting medium has to look very similar to the original, so that a human cannot look at it and say, "Hey, what's wrong with this new medium? Why does it look so bizarre?" And the way you do that is that if there are several different ways to express the same thing,
at least to human perception, then you can fool people. So if your medium is redundant, you can hide information in it. An example is the English language. There are so many different ways to say the same thing that if you choose to say things one way rather than another, you could be encoding information.

Any of you know the Monty Python dead parrot sketch? Yeah. Anyway, a guy walks into the pet store and he's like, "Hey, my parrot's dead." They start talking about it and they get into this big argument, and the guy gets really fed up trying to explain to the salesman that the parrot really is dead. And he says "the parrot is dead" in about, I don't know, fifteen different ways: this parrot is dead, this parrot is no more, this parrot has ceased to be, this parrot has shuffled off its mortal coil and joined the choir invisible, and my favorite, this is an ex-parrot.

So if you chose one of these dead parrot sentences to convey your message, you could agree ahead of time that if I say "the parrot is dead," what I mean is "meet me at noon over there"; if I say "the parrot is no more," it's "meet me at one over there," and so on. That's what I mean by redundancy. The more redundancy you have in your medium, the more different ways you can express your message, and the easier it is for you to hide information in it.

Now, binaries have notoriously low redundancy, because they were designed from the outset to be efficient. The instruction set of your CPU has to be as small as possible so that your CPU is as uncomplicated as possible, because complexity costs money and time and all of that. So how do you do it when you have a medium that's specifically been designed to be as non-redundant as possible?
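To make the redundancy idea concrete, here is a minimal Python sketch of the dead-parrot codebook. The four-phrase list and the function names are assumptions made up for illustration; the only point is that n interchangeable phrasings carry log2(n) bits.

```python
import math

# Hypothetical codebook agreed on ahead of time: four ways of saying the
# same thing, so each sentence can carry log2(4) = 2 hidden bits.
PHRASES = [
    "This parrot is dead.",
    "This parrot is no more.",
    "This parrot has ceased to be.",
    "This is an ex-parrot.",
]

def capacity_bits(choices: int) -> float:
    """Bits that one pick among `choices` equivalent phrasings can carry."""
    return math.log2(choices)

def encode(bits: str) -> str:
    """Pick the phrasing whose index equals the bit string, e.g. '10' -> 2."""
    return PHRASES[int(bits, 2)]

def decode(sentence: str) -> str:
    """Recover the bit string from whichever phrasing was used."""
    width = int(capacity_bits(len(PHRASES)))
    return format(PHRASES.index(sentence), f"0{width}b")
```

With fifteen phrasings instead of four, the same scheme would carry a little under four bits per sentence, which is why a medium with few equivalent forms is hard to hide things in.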
I'll go over this. There are two classes of techniques you can use to hide information in binaries: one is called static marks, the other dynamic marks. I'll go over them right now.

A static data mark is when you have pieces of data in your code that you can use to later identify your mark. Meaning you have an array with data such as "copyright, I don't know, recording industry" and so on, and then later on you run strings on your binary and you find that string in it. You're like, "Aha, you see, my data mark was hidden in it." There are many ways to do it. One way is like this, and there are a bunch of other ways. You don't have to use strings either, you can use anything you want. The idea is that it's a piece of data just sitting there in your binary, waiting to be found in a very specific location.

A code mark is similar, except that you're not playing with data anymore but with code. You'd have pieces of code ordered in a certain way in your executable. The original would be the thing on the left, and then you change it a little bit, and if you see the thing on the right anywhere, then you know this really is your piece of code.

And this is another, more sophisticated example of a static code mark: you have a bunch of goto statements that do nothing but jump among themselves, and depending on the ordering, like does A go to B or C or D, and does B go to whichever one, you can have a pretty long list of goto statements that uniquely identifies your code. That has to be my code; nowhere else in the world would you find such a sequence of goto statements.

As it says, static marks are easy to implement and easy to break. The thing is, they're static.
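As a rough illustration of a static data mark, the sketch below mimics what running strings on a binary does and shows how little it takes to break the mark. The sample bytes and the mark text are invented for the example and do not come from any real binary.

```python
import re

def printable_strings(blob: bytes, min_len: int = 4) -> list:
    """Return runs of printable ASCII, roughly what the `strings` tool shows."""
    return re.findall(rb"[\x20-\x7e]{%d,}" % min_len, blob)

def has_static_mark(blob: bytes, mark: bytes) -> bool:
    """A static data mark is found by simply looking for the bytes."""
    return mark in blob

# Made-up stand-in for a binary: some non-text bytes around an embedded mark.
MARK = b"Copyright Example Corp"
binary = b"\x7fELF\x01\x02\x00\x90\x90" + MARK + b"\x00\xc3"

# An attacker who knows where the mark sits only has to overwrite it.
tampered = binary.replace(b"Copyright", b"Copyleft!")
```

Here `printable_strings(binary)` finds the mark, and `has_static_mark(tampered, MARK)` is false after a nine-byte overwrite that leaves the file the same size.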
A static mark is just sitting there waiting to be found, and all it takes is for an attacker to know where it is and then mess with it. So this used to be used a long time ago, but it's not really viable anymore. What's more difficult to break is what we call dynamic marks.

A program is running and you give it a set of inputs. You could be clicking on it, you could be typing something; if it's a browser, you could type something into the URL bar or whatever. Depending on how your program reacts to this input, at some point you can stop it, and the state it's in would be the mark itself. I'll give you examples right now: data structure marks, execution traces.

You all know Easter eggs. You type about:mozilla and you get this stuff. The idea is that you write a program that looks for this input, about:mozilla, and displays this text when you type it. Then say you're looking at some other web browser somewhere else, five years down the line, and you type about:mozilla and you get the exact same thing. You're like, wait, that's a little bit suspicious. Why is it reacting the same way?
This is the Easter egg form of information hiding.

Dynamic data structure marks use the same general idea, where you take a bunch of input, except that you do a whole bunch of operations on that input, and those operations can be part of your normal execution flow. If you're looking at a web browser, one input could be the URL, and then you XOR it with some variable in memory. Like here, you have four variables. You do some computations with them, you intermingle that with a lot of other normal code, and you keep going that way until your last piece of input has been fed in. At that point those variables will have certain contents, and if the contents are what you expect, here a particular ASCII string, then you know that this is your program and this is your watermark.

This is another form of dynamic watermarking, where the mark is a data structure. This data structure is a linked list, and each element has two pointers: one that points to one of the elements in the list, and a second that simply points to the next element in the list. Depending on how you arrange your pointers, you can encode certain data. In this case we have five linked list elements, and it's just a formula that you use. The first element, all the way on the left, has weight five to the power of four: five because there are five elements, and four because of zero-based counting. If its first pointer points to null, it contributes zero times five to the four; if it points to itself, it's one times five to the four; if it points to the next item, it's two times five to the four; and so on. You can encode a number that way.

Now, the advantage of doing this shows up if you choose two large prime numbers, P and Q, and multiply them together.
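The pointer arithmetic for the linked list mark can be sketched in a few lines of Python. This is one reading of the scheme, under stated assumptions: each of the k nodes carries one base-k digit, a null pointer is digit 0, a pointer to the node itself is digit 1, a pointer to the following node is digit 2, and so on, wrapping around the end of the list. The class and function names are made up.

```python
class Node:
    def __init__(self):
        self.next = None  # ordinary link to the following element
        self.mark = None  # second pointer; its target encodes one digit

def build(value: int, k: int = 5) -> Node:
    """Encode `value` (0 <= value < k**k) into a k-element list; return head."""
    if not 0 <= value < k ** k:
        raise ValueError("value does not fit in k base-k digits")
    nodes = [Node() for _ in range(k)]
    for i, node in enumerate(nodes):
        node.next = nodes[i + 1] if i + 1 < k else None
        digit = (value // k ** (k - 1 - i)) % k  # leftmost node weighs k**(k-1)
        # digit 0 -> null, 1 -> itself, 2 -> following node, ... (wrapping)
        node.mark = None if digit == 0 else nodes[(i + digit - 1) % k]
    return nodes[0]

def decode(head: Node) -> int:
    """Walk the list and turn the pointer layout back into the number."""
    nodes = []
    while head is not None:
        nodes.append(head)
        head = head.next
    k, value = len(nodes), 0
    for i, node in enumerate(nodes):
        digit = 0 if node.mark is None else (nodes.index(node.mark) - i) % k + 1
        value = value * k + digit
    return value
```

For the number 1337 the base-5 digits are 2, 0, 3, 2, 2, so the head's mark pointer targets the second node and the second node's mark pointer is null. Five nodes top out at 5^5 - 1 = 3124, so a product of two genuinely large primes would need a much longer list.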
You get this other huge number, N. If you encode this huge number N into your data structure, then later on, if you're trying to prove that this is indeed your code, you can say, "Look, I have extracted N from this data structure, and I know how to factor it." Factoring the product of two large prime numbers is extremely difficult, so if you can do that, it proves that yes, you were the originator of the code. In this case, 7 times 191 equals 1337.

Now, a dynamic execution trace mark is where you give the program a set of inputs and you just look at the way it executes. You don't necessarily look at what data gets produced, or any output or anything. Instead, what you look at is, for example, the addresses that the program goes through while it's executing, or maybe even the instructions and the sequence thereof. You can use that to uniquely identify your program.

There are a whole bunch of attacks on all of these things; they're just more difficult to implement. One of them is what we call semantics-preserving transformations, which just means that you have a piece of code, like the one on the left, which is just assigning three variables, and you change it to something that's functionally equivalent but unrecognizable. If you were to try to recognize the code on the right automatically, you'd say no, it doesn't look anything like the first piece of code, but in fact, functionally,
it's the same thing. To the linked list that I told you about before, you can add a whole bunch of extra pointers, confusing the recognizing program. In general you just add levels of indirection. If the program is supposed to work in a certain way, maybe add more function calls, more system calls, just change the general way it works while keeping functional equivalence.

Now, this is all great, but the problem is that I just said things like "you add a pointer here" or "you add function calls." How do you do that if you do not have access to the source code? That's the main difference between bytecode and things like assembly code. Bytecode can be disassembled one-to-one, meaning you can pretty much get the original, quote-unquote, source code back when you disassemble Java bytecode. You cannot do that with assembly code, and the reason is that there is no difference between data and code: you can never know whether something was a number or an instruction that you're supposed to execute. There are disassemblers, of course, but they're not perfect. You cannot simply disassemble something, reassemble it, and expect it to work. No way, it won't work. Disassembly is never perfect. In fact, perfect disassembly is what we call an intractable problem, meaning it's just impossible.

As a result, we cannot use the advanced techniques that I talked about for attacking dynamic watermarks. And it's really difficult to even have dynamic watermarks in machine code to begin with, because unless you have the original source code, there's nothing you can do.
You cannot just take that linked list graph I told you about, add it in the middle of the code, and get it to cooperate nicely with everything else around it. As a result, very little work has been done here. Most of the work that's been done has been on source code and not machine code.

So what is something that hides information, information in a loose sense, in binaries? Viruses. Viruses come in and hide their own code in there, and they try to make it very difficult for an antivirus to detect. So this is something interesting to look into if we want to hide information in binaries.

The way it usually works: you have your program, and there's an entry point, which is where the program starts, and the program just runs from there. What a virus would do is put its payload at the end of the program and then hijack the entry point. When it's done executing its own code, it goes back to the original entry point and keeps on executing as if nothing had happened. This is the basic, first-generation virus. Now, that's very easy to detect.
What antiviruses do is just look for this kind of behavior and detect it. Oh, sorry: the payload would originally be fixed, so what an antivirus would do is basically grep for that code sequence. It would know, okay, if I find this piece of code then it's obviously a virus, flag it, and that's it.

So what people started doing was use encryption, where they would have a few statements that decrypt the code that they would then actually run. Every time the virus infects a new host, it changes the key used to encrypt and decrypt, so at every new host you would have a different code block at the end of your program. The virus body would be changing, but the fixed routine it needs to decrypt would remain the same. So what antivirus people started doing was simply triggering on that decryption routine. If they ever see that decryption routine in your code, they know there's a virus in there.

What people started doing next is what we call polymorphic viruses. You keep the idea of encrypting your virus body, but you change the encoding of your decryption routine. There are different ways you can write the same piece of code, so every time you infect a new host you write a new version of your decryption routine. That made antivirus makers' lives a little difficult for a little while, but then they realized that all they needed to do was wait for the virus to decrypt its code, because at some point it has to decrypt it and then run it. If the antivirus ever finds this decrypted code in memory, it flags the virus.
So that's what's been happening. As a response to that, what people have been doing is called metamorphic viruses. In this case there simply is no decryptor. Some people use encryption with it, but the basic idea doesn't necessarily need one. Instead, it uses different ways of writing code: every time it spreads to a new host, it rewrites its own code in a different way, so when it's loaded in memory it looks different every single time, even though it's the same code. That's much more difficult to detect.

I'm going to talk about the tricks that metamorphic viruses use, because it's on that basis that I've chosen to implement information hiding.

Register swapping. How many of you know a little bit about assembly? Okay, all right, good. So you have assembly statements and you use different registers, but there's no rhyme or reason necessarily as to which registers you use in your code. You could use eax or ebx or ecx, anything you want; as long as you're not clobbering any registers you've used previously, you're good to go. In this code example on the left, you've been using esi, eax, ecx and ebx, and you just swap the registers around. The result is that your code still behaves the same way, it's still doing the same thing, except that it looks different now. Look at the portions in red: almost half of the code looks different.
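A toy version of register swapping, working on assembly text rather than on machine code, might look like the sketch below. The listing and the register permutation are invented for illustration, and the sketch assumes the chosen registers have no implicit roles in the surrounding code (no calling-convention or loop-counter uses), which a real rewriter would have to check.

```python
import re

def swap_registers(listing: str, mapping: dict) -> str:
    """Rename registers in one pass so that a swap like eax<->ebx stays consistent."""
    names = "|".join(re.escape(name) for name in mapping)
    pattern = re.compile(r"\b(" + names + r")\b")
    return pattern.sub(lambda m: mapping[m.group(1)], listing)

# Hypothetical four-line listing; not the code from the slide.
ORIGINAL = "\n".join([
    "mov esi, 10",
    "xor eax, eax",
    "add eax, esi",
    "mov ecx, eax",
])

# A permutation of register names: every name maps to exactly one other.
SWAP = {"eax": "ebx", "ebx": "eax", "esi": "edi", "edi": "esi"}

swapped = swap_registers(ORIGINAL, SWAP)
```

Three of the four lines now differ textually, so a scanner looking for the exact original sequence would miss it, while applying the same swap a second time gives the original listing back.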
So if an antivirus were to simply grep for a code sequence, it would never find this, even though it's functionally equivalent.

There are very many such tricks; that was just a simple one. Instruction substitution means that you take several instructions that have the same meaning and you swap them; depending on which one you use, the code looks different, but it acts the same way. You can have data that you use in your code but that isn't really doing anything, so you change the data to a different value, and your code looks different because it has different data in it. Either you're not actually using it, or if you are, you compensate for the change elsewhere. You can add NOPs in the middle of your code, garbage, random things that do nothing, and so on. Basically you're changing the looks of your code as much as possible while the functionality remains the same.

Hydan uses instruction substitution, which I'll detail shortly, to encode data. First I'll go for a demo. Hopefully it'll work.

All right, hang on. So the way Hydan works is that you supply it a binary, so I can do /bin/ls, then you supply it a message file, "message," and then you give it an output file, so it could be ls.stegged. Password: foo. All right, done. It embedded 192 bytes; it could have embedded up to 362. Now let's execute ls.stegged. Looks the same, you can do whatever you want; for all practical purposes it's still ls. Now let me see what message I hid in there. So I run Hydan decode on ls.stegged, password foo. Bang. Xavier is my handle, by the way. You can see the original /bin/ls and ls.stegged are exactly the same size, but the new one now just has some extra information in it.
All right, that's the demo. What I use is instruction substitution, and what I mean by that is that you can have a bunch of instructions that behave the same way but are encoded differently. Take addition and subtraction: you can add a number to a register, or you can subtract the negative of that number from the same register. Functionally it's identical. So in this case you have add eax, 40, and what I do is pick one form to stand for a certain piece of information. In this case I simply chose that addition is bit zero and subtraction is bit one. If I want to embed a piece of data, say zero one zero, whatever, I go through the disassembled binary. When reading, every time I see an addition I say, okay, that's bit zero, and every time I see a subtraction I say, okay, that's bit one. When embedding, I just swap them around as appropriate.

Now, this is just with two instructions. There are very many more instructions that we can use to do such a thing. One of them is the test instruction, which simply compares two registers. If you're only looking at one register, if you test eax with itself, which is often done to check whether it's zero or not, you can equally or it with itself, and there are two different ways of encoding the or; I'll explain this a little bit later. Or you can and it with itself; there are also two different ways to encode that. But just to keep it simple, I've kept this to four. With four instructions you can embed two bits of data, and in this case I've chosen test as zero-zero, or as zero-one, and so on.

Anyway, this is just an example. I have a listing on the left and on the right, and the data I want to encode is zero one zero zero. I hit an add and I want to embed a zero; I already have an add, which is zero, so I just leave it as is. With the or, same thing, except I can embed two bits, and I just flip it around, and so on.
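The add/sub scheme just described can be sketched in Python over a made-up, already-disassembled listing. Instructions are modeled as (mnemonic, register, immediate) tuples, which is an assumption for illustration and not Hydan's actual representation; the tiny interpreter ignores CPU flags entirely.

```python
MASK = 0xFFFFFFFF  # 32-bit registers

def embed(instrs, bits):
    """Rewrite add/sub so each carries one bit: add = 0, sub = 1."""
    out, pending = [], iter(bits)
    for op, reg, imm in instrs:
        if op in ("add", "sub"):
            bit = next(pending, None)
            if bit is not None:
                wanted = "add" if bit == 0 else "sub"
                if wanted != op:
                    op, imm = wanted, -imm  # add r, n  <=>  sub r, -n
        out.append((op, reg, imm))
    if next(pending, None) is not None:
        raise ValueError("not enough add/sub instructions for the message")
    return out

def extract(instrs, count):
    """Read back the first `count` bits from the add/sub choices."""
    carriers = [op for op, _, _ in instrs if op in ("add", "sub")]
    return [0 if op == "add" else 1 for op in carriers[:count]]

def run(instrs):
    """Toy interpreter (registers only, no flags) to compare behavior."""
    regs = {}
    for op, reg, imm in instrs:
        old = regs.get(reg, 0)
        regs[reg] = {"mov": imm, "add": old + imm, "sub": old - imm}[op] & MASK
    return regs

# Hypothetical listing, not taken from the talk's slide.
code = [("mov", "eax", 0), ("add", "eax", 40), ("sub", "ebx", 8),
        ("add", "ecx", 1), ("add", "eax", 2)]
stegged = embed(code, [1, 0, 1, 1])
```

After embedding, `add eax, 40` has become `sub eax, -40`, the register contents after running both listings are identical, and the listing has the same number of instructions. A real tool would also have to handle immediates whose negation does not fit the encoding, and the flag differences that come up with add versus sub.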
It's pretty simple. There's a bunch of stuff that I do in Hydan to make it a bit more secure. We're doing information hiding, but ideally I would like to do steganography, which means it has to be stealthy. I don't want somebody to simply look at the resulting binary and go, "Oh, this looks really suspicious, what's going on?" If you embed straight ASCII text, it's very suspicious, because every most significant bit in ASCII is zero, or one, I forgot which. So you would end up with this repeating pattern: every eight bits you have a one, one, one, and that just looks really weird. So instead we encrypt everything, so the stuff that you embedded actually looks like garbage.

The other thing I do to prevent people from seeing what's going on is what I call a random walk. If you simply went through the listing as I said before and embedded sequentially, at every instruction that you saw, and your message didn't fill the whole binary, you'd end up with a whole clump of data at the top of the binary and nothing afterwards. That would look suspicious to an attacker. So instead you do a random walk: you seed a random number generator with, say, your password, and then every time you call it, it gives you back another number, and you use that number to jump around the binary and embed your stuff.
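A minimal sketch of the random walk, assuming the embeddable locations are simply numbered slots. It uses Python's `random.Random` seeded with the password and draws distinct positions with `sample`; both are stand-ins chosen for brevity, not Hydan's actual generator or jump rule, and Python's generator is not cryptographically strong.

```python
import random

def walk(num_slots: int, count: int, password: str) -> list:
    """Password-determined order in which `count` distinct slots are visited."""
    if count > num_slots:
        raise ValueError("message is larger than the carrier")
    rng = random.Random(password)  # same password -> same sequence
    return rng.sample(range(num_slots), count)

def scatter(slots: list, bits: list, password: str) -> list:
    """Write the message bits into the slots chosen by the walk."""
    out = list(slots)
    for position, bit in zip(walk(len(slots), len(bits), password), bits):
        out[position] = bit
    return out

def gather(slots: list, count: int, password: str) -> list:
    """The receiver replays the same walk to read the bits back in order."""
    return [slots[position] for position in walk(len(slots), count, password)]

carrier = scatter([0] * 20, [1, 0, 1, 1], "foo")
```

The receiver calls `gather(carrier, 4, "foo")` and gets the four bits back in order; with a different password the walk visits different slots, so the bits read back are generally not the message.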
So this is the binary with all the code locations. The blue blobs are where you can actually embed things. You pick one, then you jump ahead by a random number, and you keep jumping around by a random number until you've embedded everything. When somebody receives your message, they feed the same input to the random number generator, which is the password, and by doing that they retrieve the exact same sequence of random numbers that was output for you.

Now, some of the instructions that I use are not equivalent in edge cases. Addition and subtraction do the same thing most of the time, but sometimes they don't. An example: if you add negative one to, say, eax, the result is negative one, the overflow flag is zero and the carry flag is zero. But if you subtract one from eax instead, your carry flag is set differently. So what I do is scan ahead: if I see an addition or a subtraction and I want to change it, I look at the instructions that follow it. If one of them is an instruction that changes the carry flag anyway, that clobbers it, and between the addition and that clobbering instruction there are no instructions that check the carry flag, then we're good. We can do whatever we want to the carry flag; it never gets checked. Other instructions have similar behaviors.

What do we do if we have seven instructions? I showed you the examples with two instructions and four instructions. If we have eight instructions we can embed three bits of information, but if we have only seven, what are we going to do?
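The carry flag edge case and the scan-ahead check can be sketched as follows. The example takes eax to be zero, which matches a result of negative one; the mnemonic sets are small illustrative samples rather than a complete x86 table, and the scan gives up conservatively at any jump, call, or return.

```python
MASK = 0xFFFFFFFF  # 32-bit registers

def add32(a: int, b: int):
    """32-bit add; returns (result, carry flag = carry out of the top bit)."""
    total = (a & MASK) + (b & MASK)
    return total & MASK, int(total > MASK)

def sub32(a: int, b: int):
    """32-bit subtract; returns (result, carry flag = borrow)."""
    a, b = a & MASK, b & MASK
    return (a - b) & MASK, int(a < b)

# Illustrative, incomplete samples of instructions that read or rewrite CF.
READS_CF = {"adc", "sbb", "jc", "jnc", "jb", "jae", "ja", "jbe",
            "rcl", "rcr", "setc", "cmovc", "pushf", "lahf"}
WRITES_CF = {"add", "sub", "cmp", "test", "and", "or", "xor", "neg"}

def carry_flag_is_dead(following: list) -> bool:
    """True if CF is overwritten before anything reads it in `following`."""
    for op in following:
        if op in READS_CF:
            return False  # someone looks at the flag: not safe to swap
        if op.startswith("j") or op in ("call", "ret"):
            return False  # control flow leaves the straight line: give up
        if op in WRITES_CF:
            return True   # flag is clobbered before being read
    return False          # ran out of instructions without a clobber
```

Both `add32(0, -1)` and `sub32(0, 1)` produce 0xFFFFFFFF, but the first leaves the carry flag at 0 and the second sets it to 1, which is why a swap is only attempted when `carry_flag_is_dead` holds for the instructions that follow.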
We could just waste three of them and use only four for embedding, but that would be a waste. So instead I use one of the instructions as a wildcard, a joker. With seven instructions I try to embed three bits as much as I can, but if I cannot, I use one of the instructions as a joker value, meaning "this instruction encodes no data, skip it and move forward." The result is that we can embed log2(n - 1) bits of information when n is not a power of two. So in the example with seven instructions, you can embed log2 of 6 bits of information, which is 2.58 bits, whereas normally we would only have managed two bits. So it's an improvement.

Now, not all instructions are created equal, meaning that it is possible to detect these addition and subtraction substitutions, all the instruction substitutions that I do, simply because compilers produce code in a certain way, and at the end of the day you can see the difference when you're using Hydan. An example is negative subtractions. Compilers rarely emit sub eax, -5; for some reason they just never do that. Instead they emit add eax, 5. Hydan creates a great many of those negative subtractions, so if you were to look for an inordinate number of negative subtractions, you would know that something is very weird in this code.

It's also low bandwidth. Outguess, which embeds into JPEGs, gets one bit of information for every 17 bits of image, whereas Hydan