Okay, hello and welcome to my presentation, Exploiting Buffer Overflows on RISC-V. Maybe some words about me first. I'm Christina Quest. I work as an embedded systems developer. I studied electrical engineering at TU Berlin and now I live in Nice, in France. At my university I played IT security CTFs; that's how I got interested in security, and that's partly what this talk will be about.
So this is what I will be covering. After the introduction I will give you an overview of the RISC-V architecture. I will tell you what the differences to ARM and x86 are, and then we will dive directly into hands-on examples: how to exploit a buffer overflow, how to write your own shellcode for RISC-V and how to perform ret2libc. I hope all those words will make more sense to you once this presentation is over.
So what is RISC-V? RISC-V is an ISA, an instruction set architecture. It was started in 2010 at the University of California in Berkeley, and since then it has seen many contributions. Since 2018 we have boards out there in hardware; before that it was just software, a program you could flash onto your FPGA to simulate a CPU, but nowadays you have real hardware. And since March 2019 version 2.2 of this instruction set architecture is stable, so you can start using it in your software development projects. You can see how important this project actually is by the companies that have joined the RISC-V Foundation, for example Google, NXP, NVIDIA, Qualcomm and Samsung. The advantage of RISC-V is that it's released under a permissive open source license, which means that anyone, you, me or a university, can implement their own RISC-V architecture without needing to pay any royalties.
So what is an ISA, actually? It is whatever your CPU hardware understands. It defines memory and registers, and it defines the logic ports which do the calculations between the registers.
Yeah, it's basically the machine language which describes what your CPU is capable of carrying out. What your CPU actually understands is binary data, but to make it easier for us programmers there's the assembly mnemonic language, which we see here. For example, this one means: add the immediate 559 to whatever is stored in register a5 and store it back into a5. What the CPU actually sees is just binary data which means the same thing.
So why do we care that there is an open hardware implementation now? Why do we want an open hardware processor or CPU? For one, it's the license fees. Nowadays, for example, if you want to implement your own ARM architecture you need to pay royalty fees to ARM. So if, for example, a startup wanted to make a quick prototype, or a university wanted to use an architecture to teach it to its students, there was nothing on the market until now. That's where RISC-V comes into play. The whole idea behind RISC-V is also to democratize the process: you have documentation for the chip, and anybody, be it a company or an individual, can join the RISC-V Foundation, discuss, and find a consensus on which features to include in the architecture and which not. The project was designed to take all the good ideas of the architectures we have in place now and fix their shortcomings. So they want it to perform well on microcontrollers, which usually are low on memory and have low power consumption, as well as on 64-bit processors.
So how did I play around with the architecture? One possibility would have been to take one of the boards, but I thought it would be easier to just emulate it in QEMU. So I downloaded the image from the Fedora website and launched it. Right away I had a usable RISC-V system with toolchains, where I had the libc and a GDB, which I didn't need to try to cross-compile or anything.
But the problem is that it's quite resource hungry. On my system, for example, it needs three minutes to start up, so it's not really optimized yet. But at least you have everything in place and you can directly start hacking.
So what is the RISC-V architecture? It's a reduced instruction set computer, which means you have small instructions which are not complex but just do one task. Intel, for example, has push and pop operations to put things on the stack and take things from the stack. RISC-V doesn't have that; you need to address your stack explicitly in memory, relative to the stack pointer. Your program counter cannot be written directly. Instead you have to play around with the return address, which is usually put on the stack, and you have to overwrite that. Furthermore it's little-endian, which means the least significant byte comes first in memory and the most significant byte comes last. So if you want to write your exploit later you need to shift the bytes around, and we will see that later.
RISC-V has a modular approach, which means there's only one minimal instruction set which is obligatory to implement, and everything besides that is optional. The minimal instruction set has jumps, branches, logical operations, and add and subtract operations. But you can also add multiplication on integers, floating point operations and vector operations. You can add a 64-bit instruction set on top of that, and a 128-bit instruction set is in the works right now. You can also use compressed instructions: the compressed extension uses two bytes for one instruction instead of four.
So what else are the differences between the architectures? On RISC-V, as on ARM, you pass parameters to a function in registers; on RISC-V these are a0 to a7. On 32-bit x86 you used to have to put everything on the stack, but that's no longer true for the 64-bit version. RISC-V and ARM have many more general purpose registers.
They have 32 of them, while 64-bit x86 has only 16. The reason for that is that on x86 most instructions have access to memory, so you can perform your add or sub directly on a memory address, whereas on ARM and RISC-V you first need to load the value from memory into a register, then perform whatever operation you want, and then put it back into memory. RISC-V is therefore called a load-store architecture, while x86 is a register-memory architecture.
Then, on RISC-V data memory access is byte aligned, while access to instructions in memory is word aligned, which means every time you fetch an instruction you always fetch four bytes, unless you're using the compressed instructions, which add two-byte instructions. RISC-V got inspired by many successful CPUs which are already on the market, like SPARC, PowerPC, MIPS or ARM. All of them have a fixed-width instruction set, while Intel, for example, is known to have variable-size instructions, which are harder to fetch and decode in a CPU. And all those CPUs have a large number of registers, so that you need fewer accesses to RAM, since accessing a register takes much less time than accessing a value in RAM, and they have very simple addressing modes.
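As a small aside on instruction fetch: the RISC-V encoding marks the instruction length in the lowest bits of the first 16-bit parcel, which is how two-byte compressed and four-byte instructions can be mixed without any mode switch. The Python sketch below illustrates this. The function names are made up for this example, and it deliberately ignores the longer encodings the specification reserves.

```python
import struct


def instruction_length(first_halfword: int) -> int:
    """Return the instruction size in bytes, judged from its first 16 bits."""
    if first_halfword & 0b11 != 0b11:
        return 2  # compressed (C extension) instruction
    if (first_halfword >> 2) & 0b111 != 0b111:
        return 4  # standard 32-bit instruction
    raise ValueError("longer encodings are not handled in this sketch")


def split_instructions(code: bytes) -> list:
    """Split little-endian machine code into individual instructions."""
    instructions = []
    offset = 0
    while offset < len(code):
        (halfword,) = struct.unpack_from("<H", code, offset)
        size = instruction_length(halfword)
        instructions.append(code[offset:offset + size])
        offset += size
    return instructions


# c.nop (0x0001), addi a5,a5,559 (0x22f78793), compressed ret (0x8082)
sample = bytes.fromhex("0100" "9387f722" "8280")
print([len(i) for i in split_instructions(sample)])  # [2, 4, 2]
```

The decoder never needs to be told which kind of instruction comes next; the two low bits of each parcel are enough.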
So these are the important registers we will be looking at later in our assembler code. For example, we have the zero register, which is always zero. That's useful in programming in general: sometimes you want to check whether a pointer is null or whether a value is zero, and you already have this register at hand, while on other architectures you would need to clear a register first before you can use it for the comparison. Furthermore, having the zero register also reduces the instruction set, because you can use it for negation: instead of having a negation instruction, you just calculate x0, which is always zero, minus the register which contains your value, and you have the negated value. Then ra contains the return address, sp is the stack pointer, and s0 is the frame pointer, which you will find later in the assembler code. In a0 to a7, as I said, you pass the function arguments, and the return value of your function you can find in a0 and a1 after the function has executed.
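To tie the register names to the binary data the CPU sees, here is a minimal Python sketch that encodes the instruction mentioned earlier in the talk, addi a5, a5, 559, and prints its bytes in little-endian memory order. The helper name encode_addi and the register table are assumptions made for this illustration; the field layout is the standard I-type format.

```python
import struct

# ABI register names mapped to their x-register numbers (subset used in the talk).
ABI_REGISTERS = {"zero": 0, "ra": 1, "sp": 2, "s0": 8}
ABI_REGISTERS.update({f"a{i}": 10 + i for i in range(8)})  # a0..a7 are x10..x17


def encode_addi(rd: str, rs1: str, imm: int) -> int:
    """Encode `addi rd, rs1, imm` as a 32-bit I-type instruction word."""
    if not -2048 <= imm <= 2047:
        raise ValueError("addi takes a 12-bit signed immediate")
    opcode = 0b0010011  # OP-IMM
    funct3 = 0b000      # selects ADDI
    return (
        ((imm & 0xFFF) << 20)
        | (ABI_REGISTERS[rs1] << 15)
        | (funct3 << 12)
        | (ABI_REGISTERS[rd] << 7)
        | opcode
    )


word = encode_addi("a5", "a5", 559)
print(hex(word))                     # 0x22f78793
print(struct.pack("<I", word).hex())  # 9387f722, least significant byte first
```

The second line of output is the order in which the bytes sit in memory, which is the byte order an exploit payload has to follow as well.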
So I'm showing you the function prologue and epilogue on RISC-V, because you will see this pattern later in the assembler code and it might then be easier to understand. On the right side we see sd. sd stands for store double, which means it takes the value in ra and stores it at the memory address stack pointer plus eight. ld is load double, which does the inverse: it loads whatever is located at stack pointer plus eight into register ra.
A typical prologue and epilogue is what the compiler puts in without you knowing: you write your little main function and the compiler takes care of constructing and destructing the stack. That means it makes space on the stack for 16 bytes and then stores the return address and the frame pointer of the previous function onto the stack. Here is the normal function operation, and then the code has to destruct the stack it built up: it restores the return address and the frame pointer, releases the stack space and jumps to the address which was stored before.
So that's what it looks like on the stack. The stack grows downwards, whereas the address space grows upwards, so you have a lower address here, for example 0x000, and the bigger address here, 0xfff. If you have local variables, they are simply stored after the saved frame pointer. So if you have a local variable int a, which is five, you store it here, and b is stored after it on your stack. A function like memcpy or strcpy will still start from the lower address and write towards the upper address, and I guess at this point you can imagine how a buffer overflow can overwrite your return address, which we will use later.
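The stack layout just described can be modelled in a few lines of Python. This is only a simulation under stated assumptions: the frame holds an 8-byte buffer, then the saved frame pointer, then the saved return address, and all addresses are hypothetical values picked for the demonstration. The helper names are invented for this sketch.

```python
import struct

BUF_SIZE = 8  # assumed size of the local buffer


def make_frame(saved_fp: int, saved_ra: int) -> bytearray:
    """Model one frame, lowest address first: buffer, saved s0, saved ra."""
    return bytearray(BUF_SIZE) + bytearray(struct.pack("<QQ", saved_fp, saved_ra))


def unchecked_copy(frame: bytearray, data: bytes) -> None:
    """Copy like strcpy: stop at the first NUL byte, never check BUF_SIZE."""
    end = data.find(b"\x00")
    chunk = (data if end == -1 else data[:end]) + b"\x00"  # strcpy appends a NUL
    chunk = chunk[: len(frame)]  # the model ends with this one frame
    frame[: len(chunk)] = chunk


def saved_ra(frame: bytearray) -> int:
    """Read the saved return address back out of the modelled frame."""
    return struct.unpack_from("<Q", frame, BUF_SIZE + 8)[0]


# Hypothetical addresses, not taken from the talk.
ORIGINAL_RA = 0x10720
TARGET = 0x105AC

frame = make_frame(saved_fp=0x3FFFFFF000, saved_ra=ORIGINAL_RA)
payload = b"A" * BUF_SIZE + b"B" * 8 + struct.pack("<Q", TARGET)
unchecked_copy(frame, payload)
print(hex(saved_ra(frame)))  # 0x105ac
```

Note that the copy stops at the first zero byte of the packed address. The overwrite still produces the full target here only because the upper bytes of the old return address were already zero.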
Hacking is basically like searching for blocks of Lego or Duplo and building up your tower to create an exploit, and that's what we are going to do: we are going to search for blocks to build up our tower.
Let's start with a buffer overflow. For example, we have this function, which is vulnerable to a buffer overflow because you have a buffer of eight bytes and you're not checking the size of the buffer; you're just overwriting it with whatever was passed to the function in argv. And let's assume you have this magic function, give shell, which just opens a shell for you so you can do whatever operation you want afterwards. What we want to do is use a buffer overflow to overwrite the return address of the vulnerable function with the address of our give shell function, which gives us all the rights. So after the buffer overflow we don't care what is in the buffer. We want to put any valid address into the saved frame pointer, or, depending on the case, sometimes you don't care about that either, and we just want to be sure that at this address on the stack we have the give shell address later.
So how do we approach this problem? First of all we have to find the address of give shell, and we can use objdump -d for that. So we have our address, and then we just play around with gdb: we pass some input into our program and see where it crashes. In our case it crashes at this address, and for those who have played around with hex a lot, 0x41 is just the letter A in hex. You may wonder why this one is a 40 instead of a 41. As I said, on RISC-V access to instructions is always at least two-byte aligned, or half-word aligned; that's why you can never have a one here. So the next step is just to replace all those As here with the address we found before, and that's what we do. As I said, RISC-V is a little-endian architecture, which means you have to put the least significant
byte, in our case 0xc0, first and then the rest of the bytes. You see that a shell is spawned, and you can just do cat /etc/passwd at this point, for example. Then you just have to double-check that it works on your local system as well, outside of gdb.
But what do we do if we don't have this magical give shell function? What do we do to get a shell in this case? In that case you can write shellcode. What is shellcode? Shellcode is some hex bytes you can pass to the CPU which will spawn a shell for you. It basically does the equivalent of system with the string /bin/sh in it, or something like that. Normally, for known architectures, you can download most of your shellcode from Shell-Storm, a website, but since RISC-V is a new architecture we have to do it by hand this time.
So this is the basic idea. You find some executable area in your memory, like the stack or the heap. Sometimes you have to leak the address, but in our example, to make it easy, the program will just give us the address. Then you write some assembler code, put your shellcode into your buffer, overwrite the return address and jump there directly.
So that's our vulnerable function in this case. Again we have a buffer, which in this case is 128 bytes big, and that's approximately the space we have for our shellcode: it can be up to 128 bytes in size. Again we have a strcpy, which does not check the size of your buffer and just overwrites whatever it finds, and again we call it with the argv[1] argument, so whatever is passed to your program.
So how do we go about it? We can, for example, take execve. execve will execute any program you pass as a string in the first argument, and here we will pass the string /bin/sh; the other two arguments we basically don't care about. execve is a syscall, so first we have to find out the syscall number, and for that we can either look into the header and
see, okay, that's syscall number 221, or we can again look at libc and see how this function is called. libc will just load the number 221 into register a7, which is designated as the syscall number register, and then execute the ecall instruction. So that's basically what we want to do: we want to pass /bin/sh to execve, and the other arguments we don't care about, they are just null.
Then my idea was, to make life easier, not to start from scratch writing assembler code. Instead I just implement a C function, compile it and see what it already does. So this is my C function and this is the resulting shellcode. As you can see, that's basically the prologue; we don't have an epilogue in this case because we don't care what happens with the stack afterwards, but we see the prologue at the beginning. The problem now is that the compiled function is using the PLT, the procedure linkage table, which is a mechanism by which a program can find dynamically where libc functions are located. Basically it's like a trampoline: it goes to a place, looks up the address of the real execve and jumps there. So instead of going around in circles, we can just directly put in whatever the execve function does, and we have our resulting shellcode. Then you compile it, you execute it and you double-check that it works.
But at this point you have a problem, because you have null bytes, and you know what happens with strcpy if you pass it something which has null bytes: it stops copying. Exactly. So you don't want to have any null bytes in your shellcode, because then the whole shellcode will not be copied. So after having this base of a shellcode you want to remove all the null bytes. How can you do that? For example, the original version used a power-of-two number to adjust the stack; you can use an odd number instead, let's say in
the base-two system, in order to have a function which doesn't use null bytes, and then you also have to adjust the offsets later when you are referencing s0. Then here, for example, we are loading the immediate value 0x687 into our register a5. Instead of doing that, we can take a larger number to remove the null bytes, and then we have to account for the offset we created by doing more operations. That makes our shellcode bigger, but at least we removed the null bytes. The last problem we have is the ecall instruction, which we have to perform as is. To get rid of that one, we are lucky in our case because we have a writable stack as well: with this shellcode we are creating exactly this combination of numbers, putting it on the stack and then jumping there. But that only works if your stack is writable.
So what we see here is exactly the shellcode we just created, and that's exactly what we want to pass into our buffer. You find it here again, the same shellcode. So we overwrite the buffer with the shellcode, and the rest of the buffer size we fill up with As because we don't care. This is where the frame pointer would be located, and this is the address we want to jump to, because luckily our main program just gave us the address where the buffer is located, where our shellcode starts. So we can just jump there, and we are greeted by our shell here.
But what happens if our stack or heap is not executable? Because that's what it normally is, right? You don't want anybody to execute code on your stack. Then you can perform ret2libc, which is a technique of ROP, or return-oriented programming. So let's assume that's our vulnerable program. It just reads in whatever it finds in the file it is passed via fd, and that's it. Our stack is not executable, so we can try to perform ROP now, and ROP basically works like this:
you know what libc version is running on your system, so you know which assembly instructions you have available in this libc, and you search for assembly instructions which do something and then do a ret. In the best case they load a value from the stack into the registers you need and then do a return. You basically chain those so-called ROP gadgets one after another so that you get the execution of the program you want to have. For example, in our case we want to execute the function system, to which we again pass the string /bin/sh.
So first we have to find our ROP gadget. Again we use objdump -d for that and grep around. Out of the 1000 gadgets I found, this one is nice, because what we want after our buffer overflow is to have the address of system in our register ra, the return address register, and the address of the string /bin/sh in a0, which happens here; the other two we basically don't care about. And we want to overwrite our return address on the stack with that one, plus the offset at which libc is mapped.
The next step is to find out where the address of system actually is, and we use objdump for that again. Then there's the Python script which will generate the file with the exploit. We fill the buffer with As because we don't care, this is the frame pointer again, and this is the gadget address we found, so the code I showed you, which basically initializes the return address and a0 for us. Further along on the stack you find the address of the string /bin/sh, and then the address of system in the buffer. Obviously you need to have address space layout randomization disabled at this point, because otherwise your addresses will change all the time. And this is the file called exploit, generated by the Python code, which our program reads
into the buffer. So we see these are the As, for example, the Bs and the addresses, as in the code. We just pass it to our vulnerable program and it opens a shell for us.
Normally you want to try out your exploit in gdb, as in this case. The problem is that gdb puts environment variables onto your stack, since environment variables are located on the stack, so it will move around where your buffer ends up. To deal with that, you can either remove them by hand or you can use this fancy script called fixenv, which you can get from GitHub, and it will adjust your stack for you.
Yeah, that's basically it. If you have any questions, feel free. Otherwise, I'm giving away some RISC-V boards and hardware: if you have some fancy project, come talk to me and I will hand you a board, and maybe tweet about what you want to do, to motivate you more. That's all. Do you have any questions?
Yeah, the rest you pass on the stack.
Hello. You said earlier that we have 32 general purpose registers, but only a0 to a7 are used to pass arguments and return values, so I was wondering what happens with the rest of them.
The others are, for example, the zero register, the return address register or the frame pointer. So basically they are used, but not for argument passing. Or they are temporary registers where you just store values, but they are not defined as registers for passing values to functions. There was another question, I think?
Hello, I have some questions. The shellcode, or let's say the function you call when you return: that is code you can access in your Linux OS, but which pages do you know? I think everyone would like some shell you can return to and create a shell command console.
Okay, yes: after you run this program you will just run the vulnerable function.
Yeah, and this function,
is that code you can access, or memory you can access? Because the OS, Linux, has an MMU that will limit which pages you can access.
But you put it on the stack, so basically you can access buf, right? So you can also access buf plus 12 or buf plus 42.
Okay, so that page you can access? You have the authority to access that page?
For all the reads and writes and executions I did, you have the authority to write at least, and in this example to execute as well.
Oh, okay, because if you are in kernel space you can access that.
But this one is not in kernel space; this one is, for example, user space. But I mean, the same techniques apply in kernel space. You're just accessing what you can use in your program as well. You're in your thread, you're not looking at threads two or three, which do something different. You're in your address space, basically.
Oh, so if I have one function in my main application, and there's another user's application, then I cannot...
You can, but it would be much more complicated.
Okay. So this one basically targets one program, one program whose memory I can access.
The examples, yeah.
Okay, so these are not the complicated examples where we have to access other users. Thanks.
Hello, I have a question about the ROP attack. I think you have a fixed environment for the shell script, so I think that may be with ASLR disabled?
Yeah, for all those examples ASLR is basically disabled, because otherwise you have to run it one thousand times and it might do the same.
And what is the effort for you to find that gadget?
For normal architectures, like ARM or x86, you actually have programs: you can call ROPgadget, or there are Python modules, so you have a tool to do that. But since that one is not supported yet, it will be implemented, I think, in the next months, for now I just used grep.
Okay. And have you compiled the, I mean, the same application that
is compiled for both RISC-V and ARM? Is there any code size difference when you compare them?
In any case, you have a different architecture, you have different instructions, you have a different code size.
Yeah, I mean, which one is smaller?
It depends. My shellcode is not very sophisticatedly written, so you can make it smaller; I think you can fit it in 32 bytes or something. It depends on how much time you want to dedicate to writing your shellcode. But actually RISC-V and ARM are quite similar in terms of instructions, in terms of philosophy. Most techniques you can apply on ARM you can also apply on RISC-V nowadays. And also, if you're asking about normal programs, it depends highly on your compiler, how well it can optimize for the platform.
You mean for GCC, whether they have optimized for it?
Yeah, I don't know how good their optimization for RISC-V is.
Okay. And can you go back to the general register comparison, the slide about the differences between the architectures?
Yeah. That one, or the previous one?
The next one, yeah. Compared with ARM, are banked registers the same in the RISC-V design?
Sorry, again?
I mean banked registers: when you switch the context, I mean to privileged mode, some registers will be banked.
I didn't look into that, but I guess it depends on your implementation.
Okay. Then I can move on to the next question. You mentioned that there are compressed instructions.
Yeah.
Compared with Thumb mode, is this changed dynamically? On ARM we have blx, but in RISC-V it's implicit?
Yeah, it can decode it directly; you don't need to change between modes. Like you saw in the shellcode I presented, for example, you have instructions which are two bytes in size and instructions which are four bytes in size, and you
don't need an instruction in between; it just works like that.
Okay, so it's implicit. Okay, thank you.
Yeah, thank you. Time's up. Thanks a lot for listening, and if you have more questions, come and talk to me; I'm happy to talk about these topics.