Alright, now they're going to talk about a little bit of new shellcoding, with a little humor along the way. Go ahead, fellas.
Hi, hello DEF CON! How are you today? Brilliant. Very glad to be here today. We're going to talk about the ABC of next-generation shellcoding. A lot of interesting things. So let me just say a few words of disclaimer before I begin. Don't look at what's written there. We are going to take a deep dive into the dark arts of shellcoding. We'll use brute-force mathematics, wizardry, a bit of dancing. We'll make your head spin at some point. The idea will be to build up obscure incantations to make computers do things they shouldn't do. We'll conjure monsters, give you nightmares, and hopefully you'll stay until the end so that you know why we do all these things. So, just a few words about who we are, the three of us: you've got Hadrien, Georges-Axel and myself. We work at a university, as security researchers. And the point of the talk today, more precisely, is creative methods, and I insist on creative, to write shellcode, or exploitation code, under constraints, on new architectures. We will illustrate that on an architecture that is not even easy to find right now, because it's not yet deployed very much. So, just a reminder. Who amongst you has ever written shellcode? Just raise your hands for me. Well, that's half. Okay. So you know how it works. Or so you think, at least. For the other half, this is what shellcode is. It is essentially code that you wrote or found in a target's memory and that you want to jump to. Usually it pops a shell, which is why it's called shellcode. And once you have a shell, you do whatever you like. To actually jump to the shellcode, you have to trigger a vulnerability: a buffer overflow, a use-after-free, a type confusion, whatever. But the typical scenario is that your target runs a program, and the program accepts user input.
So you carefully craft user input so as to inject your code in memory and jump to it. You have a nice picture on the screen there: the target's memory gets the payload inserted, and then you jump to it using the vulnerability. So that's very nice. The issue is that it's not that easy. Those of you who have actually written shellcode know it's not trivial. There are constraints, because it has to pass as user input. So you can't have terminating zeros in your strings. There might be stack protections. You may have limited memory. You may not know where the shellcode is in memory. That's annoying, really. It turns out you can work around these constraints; you can always succeed nevertheless, using clever techniques. We are not going to talk about the techniques to bypass existing mitigations because, well, they're well known, and that's not the point of the talk today. What I'm going to talk about is the fact that shellcode such as this thing does not look like user input. I mean, perhaps you guys input such things, but normal human beings, who are not here today, would not. And there are several things that give it away: the presence of NOP instructions in the code, the non-printable characters in it, the presence of suspicious substrings such as /bin/sh, and bits that look like well-known malware, for instance. So this is suspicious, and this is detectable. Which means antivirus, or blue teams, or, I mean, your annoying neighbors will find out that what you're writing is actually shellcode, and perhaps make it a problem for you. So we try to stay under the radar. One way to do that, to pass as human input, especially for strings, is to write your shellcode using, for instance, just ASCII characters. ASCII printable characters.
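To make that list of giveaways concrete, here is a minimal Python sketch of the kind of naive filter a defender might run on an input string. The function name `red_flags`, the eight-byte NOP-sled threshold and the substring list are illustrative assumptions, not taken from any real IDS or antivirus product.

```python
# Hypothetical, deliberately naive input filter flagging the giveaways listed above.
SUSPICIOUS_SUBSTRINGS = (b"/bin/sh", b"/bin//sh")
NOP_SLED = b"\x90" * 8             # 0x90 is the x86 NOP; a run of 8 is an arbitrary threshold
ALLOWED_WHITESPACE = {0x09, 0x0A, 0x0D}


def red_flags(data: bytes) -> list[str]:
    """Return the reasons why `data` does not look like ordinary typed text."""
    flags = []
    if b"\x00" in data:
        flags.append("null byte")
    if NOP_SLED in data:
        flags.append("NOP sled")
    if any((b < 0x20 or b > 0x7E) and b not in ALLOWED_WHITESPACE for b in data):
        flags.append("non-printable bytes")
    for needle in SUSPICIOUS_SUBSTRINGS:
        if needle in data:
            flags.append(f"substring {needle!r}")
    return flags


if __name__ == "__main__":
    print(red_flags(b"alice"))          # []
    print(red_flags(b"\x90" * 16))      # ['NOP sled', 'non-printable bytes']
```

A shellcode made only of letters and digits, and free of the telltale substrings, sails through every check in this toy filter, which is exactly the point of the constrained shellcoding discussed in the talk.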
You might also want just alphanumeric characters, for a username for instance. Or perhaps English words. You just want to write poetry if you're inspired, and it turns out to be shellcode. Why not even Shakespeare? So why would you do that? Well, for one, if it really looks like plain English text, it does not trigger alarms that much. And you have plausible deniability. You could just say this is poetry, these are the lyrics of my next song, right? It is not an NSA implant. Well, of course it is. It is also less likely to be escaped or broken, because it is already text. It doesn't have any special characters, so your exploit might work better. And if everything else fails, you can always try it as a pickup line at a bar. Okay? So do try that. The only question remaining is: is that feasible? Can I write my code, my programs, my EternalBlue, using only English words? Yes. Take the x86 instruction set, for instance, and just look at what the letters look like when you disassemble them. For capital letters A to O you've got increment and decrement operations. For the others you've got push and pop, so you have stack operations. You've got jumps. You've got XOR. So you can actually do a lot of things, right? It turns out rix and others have shown that the x86 ISA is extremely nice and smooth when you try to write alphanumeric code. You've got everything you need, and you can even make it work on 64-bit architectures almost trivially. So here is a full shellcode that works on x86-64, written entirely with letters and numbers. Very fine. You can print it on a t-shirt, right? That's what you should do. Now, you can actually go further than that, and here, for example, we will see how to do some English shellcoding. It was published almost exactly ten years ago by Mason and others. The idea is to do exactly the same as before, but this time you generate an English-compatible subset of x86.
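The claim about what letters disassemble to can be checked with a small lookup. This Python sketch is a hand-written table, not a disassembler: it describes the one-byte x86 opcodes reachable from alphanumeric characters in 32-bit mode and lumps everything else (prefixes, `h` for `push imm32`, and so on) into "other". Note the assumption on mode: in 64-bit mode the bytes 0x40 to 0x4F become REX prefixes, so the inc/dec rows apply to 32-bit code only.

```python
import string

# 32-bit x86 register order used by the "+r" opcode forms.
REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
JCC = ["jo", "jno", "jb", "jae", "je", "jne", "jbe", "ja",
       "js", "jns", "jp", "jnp", "jl", "jge", "jle", "jg"]
XOR_FORMS = ["xor r/m8, r8", "xor r/m32, r32", "xor r8, r/m8",
             "xor r32, r/m32", "xor al, imm8", "xor eax, imm32"]


def describe(ch: str) -> str:
    """Describe the 32-bit x86 one-byte opcode encoded by the character `ch`."""
    op = ord(ch)
    if 0x30 <= op <= 0x35:
        return XOR_FORMS[op - 0x30]
    if 0x40 <= op <= 0x47:
        return f"inc {REGS[op - 0x40]}"
    if 0x48 <= op <= 0x4F:
        return f"dec {REGS[op - 0x48]}"
    if 0x50 <= op <= 0x57:
        return f"push {REGS[op - 0x50]}"
    if 0x58 <= op <= 0x5F:
        return f"pop {REGS[op - 0x58]}"
    if 0x70 <= op <= 0x7F:
        return f"{JCC[op - 0x70]} rel8"      # short conditional jump
    return "other"


if __name__ == "__main__":
    for ch in string.ascii_letters + string.digits:
        print(ch, describe(ch))
```

For example, `A` is `inc ecx`, `P` is `push eax` and `t` is `je rel8`, which matches the talk's summary of letters as inc/dec, push/pop, jumps and XOR.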
So this is exactly like alphanumeric before, but you have even more than alphanumeric characters. You can have spaces, you can have punctuation, you can have colons, semicolons, dashes, and some special characters. And that gives you even more instructions. For example, if you look at the period, you have more operands available than before. And if you look at the space, for example, you have one more opcode, which gives you the AND instruction. So we have more than before. The fundamental idea is that you start from a normal shellcode. You write a small decoder with those instructions. Then you cut it into small snippets of code that fit into English words. Between them you have some gaps, and you jump over those gaps, from each snippet to the next, using the jump instructions you can see there. The idea is to fill those gaps with something that makes your shellcode look like English text. Of course, this is done using Markov chains. Markov chains are fundamentally just the autocompletion feature on your iPhone: you write a word and it suggests other words. It can give you some pretty nice text if you write text messages with it. And you end up with shellcode that looks like English text. You send it to the vulnerable application, you enjoy, and you have your root shell. So let's have a little demo of that. This is what I did on my computer. Here we go for a standard set-user-ID exploitation. The idea is that you have a program that executes as root but can be launched by a standard user. For example, if you want to change your password, that is a root action that has to be performed by a standard user. So you have a program executing as root. Here we give it the set-user-ID permission.
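The gap-filling idea can be sketched with a word-level bigram model, the simplest Markov chain. This is only an illustration of the idea under stated assumptions: the helper names `build_bigrams` and `fill_gap`, the toy corpus, and padding leftover room with spaces are all hypothetical choices, and the published English-shellcode system is considerably more sophisticated.

```python
import random
from collections import defaultdict


def build_bigrams(corpus: str) -> dict[str, list[str]]:
    """Map each word to the list of words that follow it in the corpus."""
    words = corpus.split()
    table: dict[str, list[str]] = defaultdict(list)
    for current, following in zip(words, words[1:]):
        table[current].append(following)
    return dict(table)


def fill_gap(table: dict[str, list[str]], start: str, length: int,
             rng: random.Random) -> str:
    """Produce exactly `length` characters of Markov-chain text (hypothetical helper).

    Words are appended while they fit; leftover room is padded with spaces.
    A word with no recorded follower restarts the chain from any known word.
    """
    all_words = list(table)
    text, word = "", start
    while True:
        followers = table.get(word) or all_words
        word = rng.choice(followers)
        piece = word if not text else " " + word
        if len(text) + len(piece) > length:
            break                      # the next word would overflow the gap
        text += piece
    return text.ljust(length)


if __name__ == "__main__":
    corpus = "the cat sat on the mat and the dog sat on the log"
    table = build_bigrams(corpus)
    print(repr(fill_gap(table, "the", 24, random.Random(7))))
```

Because the gaps are jumped over rather than executed, in this sketch the filler only has to have the right length and read plausibly.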
When we switch to the user, we can see that our program has the small s in its permissions, which says it gets executed as root. The idea is to send shellcode to it so that it does something other than what it was supposed to do. So here we have our English text. We just write it, we send it, and a root shell pops out, so we can check that indeed we are root here. More generally, when we speak about x86, it's almost fully solved. For example, you have msfvenom, so you just say: I want some shellcode on x86 with these restrictions on the instruction set. I want only alphanumeric, I want alphanumeric plus some characters, I want something that looks like a URL, I want something that looks like a path, that kind of thing. And it automatically generates whatever is required so that everything goes well. In principle we could even write fully functional shellcode from only Shakespeare's works. But what we will be speaking about in the next part of the talk will not be x86, because more and more devices now run on something other than x86. Yeah, so a challenge for you, by the way: the Shakespeare shellcode. We did not do it; do it for us. Now we're going to take RISCs, and as was just mentioned, what powers most devices today is slowly drifting away from x86, including phones, including voting machines, including several interesting things that we would like to run shellcode on. To do that we need to look at the way RISC instruction sets such as ARM work. It turns out that you cannot use the techniques described above on ARM. The reasons are that you no longer have single-character instructions, you do not have as many addressing modes, and in particular we lack memory-to-memory addressing modes. And we have constraints on operands that make it very tricky, and so far actually impossible, to write such shellcode for RISC-V.
The x86 techniques do not work on ARM as they are, either. So I'm going to talk about three approaches: very quickly about the compilation and emulation techniques, and a bit more about the unpacking technique. These are three ways around these limitations that allow us to work nevertheless. The compilation approach, the first one, consists in compiling assembly code directly to the constrained instruction set, so directly to alphanumeric operations, for instance. The good thing about it is that it may be possible to compile to one instruction set easily; the movfuscator, written by Christopher Domas, does that, for instance. The main disadvantage of such an approach is that it does not work when the constraints are on the operands and not on the opcodes. And also, who wants to write a compiler? I mean, if Chris Domas is in the room, do it, please, by all means, we want it. That's just a lifetime's work. Second approach, the emulation way. You write an interpreter for some language, you write your payload in that interpreted language, and you just run it. The thing is, you only have to write the interpreter once, and once that's done you can reuse it for different payloads. It's feasible; it's been done by Younan and Philippaerts for ARM7. They wrote a Brainfuck interpreter, and well, that works. The issue with that is what? Well, it's interpreted, which means it's toothless: you cannot really make syscalls, you cannot really do the fancy stuff that you'd like to do with shellcode, right? So this leaves us with the third approach, which we introduced some years ago. The idea is a several-step process, so let me take some time to explain it. The first step is that your payload is encoded in a constraint-compliant way. So, for instance, if you want an alphanumeric shellcode, you first encode the payload in some alphanumeric way. You hide it, as you can see in the top-right picture.
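The first step, encoding the payload so that every byte passes the filter, can be as simple as splitting each byte into two nibbles and mapping each nibble to a letter. This Python sketch uses a hypothetical "A" to "P" nibble scheme, chosen for clarity and not necessarily the authors' encoding; `decode` is the job the in-memory unpacker has to reimplement with constraint-compliant instructions.

```python
def encode(payload: bytes) -> str:
    """Turn each byte into two letters in 'A'..'P', one per nibble (hypothetical scheme)."""
    return "".join(chr(0x41 + (b >> 4)) + chr(0x41 + (b & 0x0F)) for b in payload)


def decode(text: str) -> bytes:
    """Inverse of encode(): what the unpacker does in memory, at run time."""
    if len(text) % 2:
        raise ValueError("encoded text must have an even length")
    pairs = zip(text[0::2], text[1::2])
    return bytes(((ord(hi) - 0x41) << 4) | (ord(lo) - 0x41) for hi, lo in pairs)


if __name__ == "__main__":
    shellcode = b"\x31\xc0\x50"        # arbitrary example bytes
    blob = encode(shellcode)
    print(blob)                        # DBMAFA
    assert decode(blob) == shellcode
```

The encoding doubles the payload size; a denser scheme is possible, but the decoder then needs more instructions, which is exactly what is scarce on a constrained ISA.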
Next, you look at the target ISA and you identify high-level, constraint-compliant constructs: sets of instructions that fit your constraints and let you do some basic operations, building blocks such as zeroing a register or incrementing a register. Using these building blocks you build an unpacker, something that decodes the payload in memory through self-modifying code. We want a minimal unpacker, because we don't want to spend too much time on it; we just don't care about the unpacker that much. We want to run the payload. So the unpacker decodes and executes the payload. Sorry. So, on ARMv8, here is the demo of this approach, very quickly. You've got the application there. This application takes a string as input, for instance a username. We paste the shellcode, which is written with this unpacker strategy. Here is the shellcode; you can see it's mostly letters. We run it, we just paste it, and once it runs, it unpacks the payload in memory, executes it, and here is the shadow file of the target. Okay, so now that you've seen how we can bypass the limitations of usual ARM processors, and as everyone is turning away from x86, we're going to turn our attention to RISC-V, for various reasons. Thank you. So, RISC-V. Maybe you've never heard about it. It is a new architecture. Basically, it is once again a RISC architecture, very much like MIPS, if you've heard about that. It aims at being open source and also open hardware. And it is still very much a work in progress. By this I mean that the specification is not completely done yet, and there is very little silicon available. But hopefully, in a few years, we'll see RISC-V everywhere. Many companies are invested in it, so that remains to be seen, but hopefully it is the architecture of the future. We do have one issue with RISC-V when it comes to alphanumeric shellcoding: it makes our job much, much harder.
So, let's look at what is available to us in alphanumeric RISC-V. First, we can load a few constants, typically with load immediate and load upper immediate. Then we have small increments. If you combine both, you can load quite a lot of values into registers. Then we have some branches, both conditional and unconditional, but only forward branches. We do not have any backward branches, so that's an issue. Then we have a single arithmetic instruction, which is a right shift. Why not? And then we have system register writes. The issue with these instructions is that they are only available at higher privilege levels. Typically, they would work if you were attacking Linux or your operating system itself, but not a simple program running on it. So we'll just forget about them, since we want something quite generic. And finally, we do have miscellaneous floating-point operations. So, as you've seen, we have no loops, because there is no backward jump, no stores, and no syscalls. So we're stuck. And I can even tell you it is not even Turing complete. So let's look at what a typical RISC-V instruction is. If you look at the seven low bits of the binary representation of an instruction, you have the opcode, and seven bits is exactly what an ASCII character is. So we will allow ourselves one more printable character. As a spoiler, I can tell you that there are three useful printable characters that get us out of the no-loops, no-stores, no-syscalls issue: hash, slash, and tick. Hash, typically, gives us regular stores, and with regular stores we can write our unpacker. So let's look at how it works for writing alphanumeric-plus-hash shellcode on RISC-V. Here's the architecture. We have three stages. On the left is stage one. First we have some initialization. Then we use a forward jump, which is a jump and link.
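The "seven low bits are an ASCII character" observation can be checked against the RISC-V base opcode map. This Python sketch lists the 32-bit major opcodes from the unprivileged ISA specification and keeps those whose value is a printable ASCII character. Assumptions to keep in mind: it ignores the compressed 16-bit encodings entirely, and it says nothing about whether the remaining fields of a given instruction can also be made alphanumeric.

```python
import string

# Major opcodes (bits 6..0) of 32-bit RISC-V instructions, per the unprivileged ISA manual.
MAJOR_OPCODES = {
    0b0000011: "LOAD",    0b0000111: "LOAD-FP",  0b0001111: "MISC-MEM",
    0b0010011: "OP-IMM",  0b0010111: "AUIPC",    0b0011011: "OP-IMM-32",
    0b0100011: "STORE",   0b0100111: "STORE-FP", 0b0101111: "AMO",
    0b0110011: "OP",      0b0110111: "LUI",      0b0111011: "OP-32",
    0b1000011: "MADD",    0b1000111: "MSUB",     0b1001011: "NMSUB",
    0b1001111: "NMADD",   0b1010011: "OP-FP",    0b1100011: "BRANCH",
    0b1100111: "JALR",    0b1101111: "JAL",      0b1110011: "SYSTEM",
}
ALNUM = set(string.ascii_letters + string.digits)


def printable_opcodes() -> list[tuple[str, str, bool]]:
    """Return (character, opcode name, is_alphanumeric) for printable major opcodes."""
    rows = []
    for value, name in sorted(MAJOR_OPCODES.items()):
        if 0x21 <= value <= 0x7E:          # printable, non-space ASCII
            ch = chr(value)
            rows.append((ch, name, ch in ALNUM))
    return rows


if __name__ == "__main__":
    for ch, name, alnum in printable_opcodes():
        print(f"{ch!r:5} {name:9} {'alphanumeric' if alnum else 'extra character'}")
```

The three characters named in the talk map to STORE (`#`), STORE-FP (`'`) and AMO (`/`); the table also turns up `;` (OP-32). Plain LOAD (0x03) is not printable at all.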
With a jump and link, you can actually get the PC of the shellcode, which is quite useful if you do not know exactly where your shellcode is in memory. Since we have a forward jump, we have some wasted space, so we use that wasted space to hold the encoded payload. Then we write our unpacker. This is the hard part. But we won't unpack the payload directly; we'll first unpack a stage two. We do this because it is difficult to write a generic unpacker, but writing an unpacker for a specific piece of code is much easier. So we have our stage two, and it is much more straightforward: just a simple decoder loop. Now we can have loops, because stage two is code we just unpacked, and unpacked code is free to contain backward branches. So stage two unpacks the final encoded payload, and then we've won. We have something that works. So let me show you a little demo on the only silicon available right now, which is the HiFive Unleashed board. Okay. So this is basically what the shellcode looks like. You can see all the hashes, which correspond to our store instructions. This time we assume we have a vulnerable network application. We just send our shellcode on the socket. As you can see, we've sent it. We now have a root shell. We can check that we are indeed root. And if we check the CPU type, it is indeed a RISC-V CPU in there. That's all. Well, let's go a little bit dirtier. We have seen what hash can do: it gives you standard stores. So now we will switch to another character, which is slash. It can be really useful when you are writing, for example, URLs or paths on Linux operating systems. Of course, switching from hash to slash does not give us standard stores anymore, so we have to find a new memory-writing primitive to compensate for that.
Of course, slash is not taken out of nowhere: this character gives us atomic operations. We have two mainly useful atomic operations: the first one, AMOOR, gives us atomic OR, and the other is atomic AND. Fundamentally, an atomic OR operation reads 64 bits from memory, stores them in a register, and then stores back to memory that same value ORed with another register. The AND is exactly the same, with an AND operation instead of OR. So, given that I can read and write 64 bits of memory, this is a memory-writing primitive, and the idea is just to write my stage two with those instructions. However, in RISC-V there is a little constraint on atomic operations which was not there for stores: the address held in the register must be naturally aligned to the size of the operand, and if it is not, a misaligned exception is generated. So, that's fine: the operand is 64 bits, eight bytes, so I have to align my address on eight bytes. The idea is that I have a pointer to which I write: I write my eight bytes, then I increase this pointer by eight, and I continue writing. So we have to use some add-immediate instruction to increase the pointer. We look at the available instructions, we look for add immediates, and we take the smallest one. And of course, the smallest add is 16. So we are fucked now, and we have to find a way out of it. The solution is to use 16-byte chunks: since we are obliged to move our pointer by 16 bytes at a time, and each write covers eight bytes, only eight bytes out of each chunk are controllable. And we will only use six of them, so it's even better. We put an instruction at the beginning, which is a real instruction of the stage two, then a nop-like operation, and then a jump instruction that jumps to the next block.
Here we decided to use a two-byte instruction plus a two-byte nop instead of one four-byte instruction, because it was easier to build the shellcode that way, and just because we are lazy. So, using some black magic, which I will explain step by step, here is an example of code that writes exactly one block into memory. We will use GDB to look at what it does exactly. More black magic here: in the initialization section, we load a magic value, A031 0004, into the TP register. Let's go through it step by step. First we zero s4. Then we do the atomic AND on the chunk, which in practice zeroes the first 8 bytes of the chunk, which is exactly what we want. Then we do the atomic OR with the register that holds the magic value. This loads A031 into memory, which is exactly what we want, because this is a jump by 12, which jumps to the next block. Then you load a magic value into a0, you shift it by 12, you subtract 10 from it, and then you do another atomic OR to memory, which loads 970A and 0005 into the chunk. 970A is exactly add a4, a4, sp, which is one of the instructions of the stage two, and 0005 is the nop, which is exactly what we want. The idea is that you do exactly the same for every instruction of your stage two. You take a load upper immediate instruction, you shift by some amount, and then you apply some add-immediate instructions, small or bigger, on 32 bits. And you just brute force over all those instruction sequences, so that in the end you can load any value you need into the chunk. If several instruction sequences do the same thing, you keep the shortest one.
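A minimal Python model of this write step, assuming the 16-byte chunk layout described above (a 2-byte stage-two instruction, the 2-byte nop-like 0x0005, then a 2-byte `c.j` to the next chunk). The encoders follow the RVC formats for `c.j` and `c.add`; `AtomicMemory` and `write_chunk` are hypothetical stand-ins for memory touched only through `amoand.d`/`amoor.d`, including the natural-alignment rule. The talk builds the OR operands with LUI, shifts and ADDI at run time; here they are simply computed.

```python
from typing import Callable

CHUNK = 16            # pointer step forced by the smallest alphanumeric add (per the talk)
JUMP_AT = 4           # byte offset of the c.j inside each chunk (assumed layout)
NOP_LIKE = 0x0005     # 2-byte nop-like value used in each chunk
MASK64 = (1 << 64) - 1


def encode_c_j(offset: int) -> int:
    """Encode RVC c.j: funct3=101, op=01, imm[11|4|9:8|10|6|7|3:1|5] in bits 12..2."""
    if offset % 2 or not -2048 <= offset <= 2046:
        raise ValueError("c.j offset must be even and within -2048..2046")
    imm = offset & 0xFFF
    order = [11, 4, 9, 8, 10, 6, 7, 3, 2, 1, 5]   # immediate bit stored at bits 12..2
    word = (0b101 << 13) | 0b01
    for position, imm_bit in zip(range(12, 1, -1), order):
        word |= ((imm >> imm_bit) & 1) << position
    return word


def encode_c_add(rd: int, rs2: int) -> int:
    """Encode RVC c.add rd, rs2 (rd = rd + rs2): funct4=1001, op=10."""
    if not (1 <= rd <= 31 and 1 <= rs2 <= 31):
        raise ValueError("c.add needs nonzero rd and rs2")
    return (0b1001 << 12) | (rd << 7) | (rs2 << 2) | 0b10


class AtomicMemory:
    """Hypothetical model of memory written only through 64-bit AMOs."""

    def __init__(self, size: int) -> None:
        self.data = bytearray(size)

    def _amo(self, addr: int, rs2: int, op: Callable[[int, int], int]) -> int:
        if addr % 8:                              # AMOs must be naturally aligned
            raise ValueError(f"misaligned 64-bit AMO at {addr:#x}")
        if addr + 8 > len(self.data):
            raise IndexError("AMO outside modeled memory")
        old = int.from_bytes(self.data[addr:addr + 8], "little")
        self.data[addr:addr + 8] = op(old, rs2 & MASK64).to_bytes(8, "little")
        return old                                # rd receives the value read

    def amoand_d(self, addr: int, rs2: int) -> int:
        return self._amo(addr, rs2, lambda old, val: old & val)

    def amoor_d(self, addr: int, rs2: int) -> int:
        return self._amo(addr, rs2, lambda old, val: old | val)


def write_chunk(mem: AtomicMemory, addr: int, instr: int) -> None:
    """Fill the first 8 bytes of a chunk: instr, nop-like, c.j to the next chunk."""
    mem.amoand_d(addr, 0)                                  # AND with zero clears the slot
    mem.amoor_d(addr, encode_c_j(CHUNK - JUMP_AT) << 32)   # jump halfword at byte offset 4
    mem.amoor_d(addr, (NOP_LIKE << 16) | instr)            # instruction, then nop-like


if __name__ == "__main__":
    A4, SP = 14, 2                      # x14 and x2 in the standard ABI
    mem = AtomicMemory(4 * CHUNK)
    for i in range(4):
        write_chunk(mem, i * CHUNK, encode_c_add(A4, SP))
    print(mem.data[:8].hex())           # 0a97050031a00000
```

In this model the encodings come out as 0xA031 for `c.j +12` and 0x970A for `c.add a4, sp`, and a jump placed at offset 4 with offset 12 lands exactly on the start of the next 16-byte chunk.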
If your stage two does not fit into the instruction sequences you found, you just modify it, tweak it a little bit, and that gives you something that fits. So here is exactly the stage two. Sorry, I had no room to put it vertically, so please turn your head 90 degrees. If we look at it, you have exactly the instructions. Okay, let me put it back in the right order. You have the body of the loop; everybody knows what goes in the body of a loop, so let me take it out. Here it becomes normal, I think. Okay, so let's get back. You have the jump instructions. You have your nop instruction. You have a left shift at the end, which shifts a register that we do not care about, so it's a nop-like instruction. And you have the real stuff here, which is exactly the stage two. You have the two-byte instructions, and there is one instruction that is four bytes long, the fence instruction, which allows you to flush the instruction cache. If you have self-modifying code, this is absolutely essential. For that one we just handwrote the instruction sequence, but it's only one instruction, so that's fine. So let's get back to the demo. We still have our shellcode here; you can see the slash characters, which tell you these are atomic operations. We send it to the same application, which has another filter now: instead of accepting hash characters, it only accepts slash characters. So we send it. We got our shell. We run id: this is root. And if we check the CPU again, it's RISC-V again. Okay, so let us look at this nice quote from XKCD: "You're either handing out raw floating point variables, or you've built a database to track individual atoms. In either case, please stop." Well, I'm very pleased to tell you that we are not going to build a database to track individual atoms.
Which means we're going to have fun with floating point. The last character, tick, gives us floating-point stores, and these are really difficult to work with. As a reminder, we only want to change the unpacker; the other parts of our architecture do not change. But instead of using regular stores or atomic stores, we now need to write our unpacker with floating-point stores. So, floating point 101 for people who need it. A floating-point representation in memory has three fields: the mantissa, the exponent and the sign. The mathematical value of this binary representation is, very roughly, the mantissa times 2 to the power of the exponent, with the sign bit giving the sign. It is very rough, but it's good enough for this presentation. So our idea for writing the unpacker is to first load some floating-point values from memory. Since they come from memory, they must be alphanumeric. Then we do some computation, and hopefully at the end we have a chunk of our stage two in a register, which we can just store to memory. We repeat this for each chunk and we have our unpacker. Obviously, the issue here is which values do I pick, and which computation do I do? Well, let's look at what alphanumeric RISC-V offers in terms of floating-point operations. First, we have loads and stores. That's a good thing. Then we need to find operations to work on those loaded values. First, we have quad-to-double conversions, but since we do not have double-to-quad, they're not super useful. Then we have sign manipulation, such as taking the absolute value of a floating-point register, but that only changes a single bit in the register, so it's not super useful either. And finally, we do have fused multiply-add instructions, which come in multiple variants. A fused multiply-add is an operation that has three inputs and one output.
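The three fields can be pulled out of a Python float with the standard `struct` module. This sketch assumes IEEE 754 binary64, the format used by RISC-V's D extension, and rebuilds normal numbers only, where the exact rule behind the "very rough" description is value = (-1)^sign × (1 + mantissa / 2^52) × 2^(exponent - 1023). The helper names are illustrative.

```python
import struct


def fields(x: float) -> tuple[int, int, int]:
    """Split a binary64 value into (sign, biased exponent, 52-bit mantissa)."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    return bits >> 63, (bits >> 52) & 0x7FF, bits & ((1 << 52) - 1)


def from_fields(sign: int, exponent: int, mantissa: int) -> float:
    """Rebuild a normal (not zero, subnormal, infinite or NaN) binary64 value."""
    if not 1 <= exponent <= 0x7FE:
        raise ValueError("only normal numbers are handled in this sketch")
    return (-1) ** sign * (1 + mantissa / 2**52) * 2.0 ** (exponent - 1023)


if __name__ == "__main__":
    print(fields(1.5))     # (0, 1023, 2251799813685248)
    print(fields(-2.0))    # (1, 1024, 0)
    assert from_fields(*fields(-0.1)) == -0.1
```

The implicit leading 1 and the exponent bias of 1023 are what the rough "mantissa times 2 to the exponent" description glosses over.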
A fused multiply-add basically combines multiply and add in a single instruction: it computes A times B plus C, and the variants have some minus signs in there. For example, the instruction fmsub ft6, fs2, ft4, fa0 means that the floating-point register ft6 is set to fs2 times ft4 minus fa0. So here's how we want to store a chunk of our stage two. Let's say we want to store the 16-bit value 0xABCD. On the right you can see our fused multiply-add instruction, and we need to choose A, B and C, which must all be alphanumeric, such that the result R contains 0xABCD. So first we just take a random value for A. Okay, why not? Then, same thing, we take a random value for B. At this point we only have a single input left, so we can solve for it mathematically. But this time, if we want R to have 0xABCD in its low bits, C has to contain a non-alphanumeric character, so it doesn't work. So we try again: we take another B and again solve mathematically for C, and this time we are lucky. C is alphanumeric, as you can see, it is BBOQCCZ6, and this time we have a good result in R. So you might want to ask: how many times do I have to try to find a good B? Well, not that many, only about 10,000 times. And since we're doing it on a computer, 10,000 tries is nothing, so it's really efficient. I don't have a proof for you here, but just trust me, it works: whatever I change 0xABCD to, it works for all 16-bit values. And even better than that, when we wrote our tool, we saw that we could actually control many more than 16 bits. As you can see on the left, just before 0xABCD, we have lots of zeros, which means we can actually control all those bits too; otherwise they would be random.
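The acceptance test behind this search can be written down precisely. This Python sketch emulates `fmsub.d` (A × B - C with a single rounding to nearest-even, finite inputs only, signed-zero corner cases ignored) using exact `Fraction` arithmetic, reads an 8-character string as a little-endian double the way it would sit in memory, and checks whether a candidate triple is alphanumeric and puts the wanted value in the low 16 bits of R. The talk does not detail how C is solved for, so that step is deliberately left out; `accept`, `low16` and the other helper names are hypothetical.

```python
import struct
from fractions import Fraction

ALNUM_BYTES = set(b"0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz")


def fmsub(a: float, b: float, c: float) -> float:
    """a*b - c rounded once to nearest-even, like fmsub.d (finite values only)."""
    exact = Fraction(a) * Fraction(b) - Fraction(c)
    return float(exact)     # one correctly rounded conversion, no intermediate rounding


def double_from_ascii(text: str) -> float:
    """Interpret 8 ASCII characters as a little-endian binary64, as stored in memory."""
    raw = text.encode("ascii")
    if len(raw) != 8:
        raise ValueError("need exactly 8 characters")
    return struct.unpack("<d", raw)[0]


def is_alnum_double(x: float) -> bool:
    """True if all 8 bytes of x's in-memory representation are alphanumeric."""
    return all(byte in ALNUM_BYTES for byte in struct.pack("<d", x))


def low16(x: float) -> int:
    """Low 16 bits of the binary64 encoding of x."""
    return struct.unpack("<Q", struct.pack("<d", x))[0] & 0xFFFF


def accept(a: float, b: float, c: float, target: int) -> bool:
    """Does (A, B, C) pass the filter and put `target` in the low 16 bits of R?"""
    return (all(is_alnum_double(v) for v in (a, b, c))
            and low16(fmsub(a, b, c)) == target)


if __name__ == "__main__":
    print(0.1 * 10.0 - 1.0)          # 0.0: two separate roundings
    print(fmsub(0.1, 10.0, 1.0))     # 5.551115123125783e-17: a single rounding
    c = double_from_ascii("BBOQCCZ6")
    print(c, is_alnum_double(c))
```

The single rounding matters: the low bits of R depend on the full-precision product, so emulating `fmsub.d` with two ordinary float operations would give wrong answers for exactly the bits the search is trying to control.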
So in total we can control 48 bits, which means that from three 64-bit values we get 48 bits of output. Well, that's quite good. So we do it for every part of our stage two, and then again we have an unpacker, and all the rest works fine. So, again, a little demo. You have already seen it, but this time it is with tick. On top you can see the encoded payload; at the bottom you can see the unpacker, with all the ticks corresponding to floating-point stores. Once again, we send it to our vulnerable application, which this time accepts alphanumeric plus tick. We get our root shell; as you can see, we are root. And once again, you can guess it, this is the same CPU, it is still our RISC-V CPU. Okay, I hope you did not expect that. So we went through several new techniques to write shellcode. We focused on alphanumeric, but as you can probably imagine, some of these tricks were not known before. We tried to help you navigate writing constrained shellcode, to avoid filters, to fool IDSs and humans as well, and to target specific applications. As we mentioned, the x86 environment is already quite mature, so this is almost a solved problem there. But new architectures, and particularly RISC-V, are gaining momentum, and we need to keep up. It would be unacceptable for it to go public before we have attacks for it, right? So we showed that it's possible to write alphanumeric shellcode even on very constrained instruction sets. And what we described to you, the unpacker, was really the hard part. The decoder was the hard part. What remains is just to put in your payload, any arbitrary payload. This is a world first, by the way.
So, modern tricks and techniques: we have introduced new approaches that can be transported to other architectures. For those of you who are really curious about how to use this, do come to our talk next week, do read the paper that was published yesterday, or read the code, which is open source; you can find everything there. You have no excuse whatsoever, so now get hashing and slashing and ticking, for fun and for profit. Read the code, send us a friendly email. Thank you very much, from your friendly neighborhood hackers.