Hey folks, Adam Doupé here, and today we're going to be looking at the pwnable.kr challenge memcpy. This is going to be a live hack-through; I have not seen this code or looked at this at all yet. So, let's jump right in.

So memcpy, 10 points. "Are you tired of hacking? Take some rest here. Just help me out with my small experiment regarding memcpy performance. After that, the flag is yours." There's a link here to C code, which I've downloaded, and then ssh memcpy@pwnable.kr. So when we go here we can see memcpy.c, and we can see the readme, which says the compiled binary of the memcpy.c source code, with the real flag, will be executed under memcpy_pwn privilege if you connect to port 9022; execute the binary by connecting to the daemon. Okay, so that's how we access the flag. That's good.

So let's look at this source code. So teaching, pwndevils, pwnable.kr, memcpy, all right, and this is a 10-point challenge. So let's look at this. setvbuf, okay. "Hey, I have a boring assignment for CS class. The assignment is simple. What is the best implementation of memcpy? Implement your own slow/fast version of memcpy, compare them with various sizes of data, and conclude your experiment and submit a report. This time, just help me out with my experiment and get the flag." So maybe this looks like it's just going to be some kind of programming assignment. "No fancy hacking, I promise." I don't really believe that, but let's see.

So what's happening here: variables, making a private memory mapping with mmap. I believe this is the size, but I'd have to look at all the flags of mmap. We'll see if this is important. src is a character pointer. Set up experiment parameters, do stuff: "specify the memcpy amount between" low and high, scanf's in a size, sleep one. Okay, let's run the experiment.
So we actually get to specify some kind of configuration. Run the experiment: for i = 0, i less than 10, i++, size is sizes[i], "experiment %d". Where did low and high come from? Interesting. Oh, size. So we specify, what is it, 14... wait, for e = 4, e less than 14, e++. So we specify different size amounts and those go into the sizes array, and so we're going to loop over them. So that's going to be 10: 4 to 14, this is 0 to 10. So it should be 10.

So we print the %d, then malloc a destination buffer of that size. memcpy, what is this, cache2 to cache1, it says "to eliminate cache effect". rdtsc, I actually don't know what this does. rdtsc is an assembly instruction, okay, I'll need to look up what that does, I don't really remember. Then call slow_memcpy with dest, source, size. Okay, slow_memcpy looks like this: for i = 0, i less than len, i++, write to dest. Right, so copy it byte by byte. rdtsc is the number of CPU cycles, right, I see. So "elapsed CPU cycles for slow_memcpy" is the difference between t1 and t2. memcpy then, to eliminate another cache effect, and "elapsed CPU cycles for fast_memcpy". So this looks at dest, source, 64-byte... okay, if len is greater than or equal to 64, see how many blocks we need to do, i--. Oh, it's using XMM. "Thanks for helping in my experiment."

Oh, is this going to be something like it's super slow? nc 0 9022. "Specify the memcpy amount between 8 and 16": 8, 16, 32, 64, and so on through 1024, 2048. Yeah, I'm thinking this is going to be super slow with this size. Oh, size is the amount of memory. 4096. Okay, "experiment 5: memcpy with buffer size 128", "CPU cycles for slow_memcpy"... um, okay, super weird. Let's see what the heck this size thing is, because that's the only input we have into this program. It's weird. Does it crash? Okay, let's see. It's allocating 0x4000, right, that was just 16K. Okay, interesting. So it stopped at... so it did the slow_memcpy CPU cycles for 128.

So let's see if I can compile this locally on my machine, with -m32... sys/cdefs.h, because, oh, it's
giving me a nice... I need to install, it must be some install. So it looks like, do I not have... oh, the libc, the 32-bit libc. Okay, so installing that. This lets me build this locally and see what happens. It must be something weird about how they're doing this, so you must have to be careful to give the right value here. So I'll just do exactly what I did on that other one, which was put the lower value: 64, 128, 256, 512, 1024, 2048, 4096. Okay, run the experiment. Oh, it seg faults. Interesting. Huh, I wonder why that is. So we'll have to just run 8, 16, 32... I should just make this into a file... 64, 128, 256, 512, 1024, 2048, 4096.

So, seg fault in fast_memcpy, at movntps. Oh, now I do want... I should reinstall GEF, so next time around in gdb I'll have a nice GEF environment. But that's okay, because let's look at this. All right: push ebp, mov esp into ebp, so this moves the stack pointer into ebp, subtract 0x10 from the stack pointer. If len is greater than or equal to 64... so this makes sense, because the only time we hit this branch is when we're actually at 128, which is the first time we're above 64. That makes sense. jbe, jump if below or equal, to +88, so skip all of this junk. But we don't want to do that, so we say, okay, we're in here.

So now move ebp+0x10 into eax, this is going to be len. Shift right by 6, that's going to be divide by 64. Move that into i, which is going to be ebp-4. I should probably do this in Hopper, but I kind of like looking at it in here. Let's see, ebp minus... okay, yeah, eax into there. So that's that shift right. So it's 128 divided by 64, so that should be 2, right? Just calculating, 128 divided by 64: 2. Let's see: andl on ebp+0x10, AND it with 0x3f. Okay, so that's a little weird. Oh, that's this AND-equals with 64 minus 1. Right, so 0x3f looks like this: only keep the low 1, 2, 3, 4, 5, 6 bits there. And then how did this get translated? So then, while i-- is greater than 0, do this that number of times. Jump to +75 and then jump up here, so
mov... movdqa. Okay, I have no idea what these instructions do. movdqa, xmm0, eax. "Move Aligned Double Quadword: moves a double quadword from the source operand to the destination operand. This instruction can be used to move a double quadword to or from an XMM register and a 128-bit memory location, or between two XMM registers." So first, okay, dereference eax, move it into xmm0, then move eax plus, what's that, 16, yeah, like 16, into xmm1, 2, 3, and then move it back. But this didn't work. But why didn't this work? edx, movntps, it's 804c4a... destination. Destination should be something.

All right, I'm going to quit this. There, we got our GEF. Okay, breakpoint on fast_memcpy. Let's see, so I actually have the input here. I'm going to create my input file: 8, 16, 32, 64, 128, 256, 512, 1024, and I will obviously be playing with these later to bypass these checks or to somehow get this thing to work, 8192. So where am I? There, run with the input. There we go, I'm at the beginning. Okay, so now I know this is the stack, so now I have basically the stack pointer and base pointer. So, compare... oh, so yeah, it's just going to do push, push, push, call slow_memcpy too. So I actually wish I could see more of the stack; let's see if moving this down allows us to see more. So x/20x $esp. Is this 0x40, is that the value? Is 0x40 128? No, 64. There we go. And then now this value should be 0x80, which is 128. 0x80, 128, great.

Okay, so now we can see what's happening here. Next instruction. So now we check, we don't take it. We then are going to move ebp+0x10 into eax, so this is going to be 128 into eax. We shift it right by six, so this is going to divide by 64, it's going to be two. We're going to move that onto the stack into ebp-4, the local variable i. We're then going to AND it with 0x3f. So AND it with 0x3f, jump down, move that there, load effective address, move ebx into there, test eax. So this is the loop counter, checking len greater than or equal to... oh no, this is while i-- greater than
zero, because that's going to jump us back up. That's a very common technique for how while loops are done: they first jump down to the bottom, to the check, and then jump up to the top. So okay, move into eax ebp+0xc, so eax is going to have f7fc90..., and edx is going to have ebp+8. So these are the parameters: ebp+8 is dest, ebp+0xc is source, so eax should be source and this should be the destination. Why is this the destination, this 804... address? Let's see, so where is dest? Okay, source. Source is an mmap'd value, and dest is scanf... okay, wait, no, no, dest, sorry, dest was a malloc. Yeah, it's malloc with the size that we passed in, so it should only have 128 bytes. Is that why? Okay, it was at that very location. Okay, nothing in there. It's on the heap, I believe.

Okay, so now dereference eax and move that into xmm0, xmm1, xmm2. edx is this value. movntps: "Move Aligned Four Packed Single-FP Non Temporal. Moves the double quadword in the source operand to the destination operand using a non-temporal hint to minimize cache pollution during the write to memory. The source operand is an XMM register, which is assumed to contain four packed single-precision floating-point values. The destination operand is a 128-bit memory location. The non-temporal hint is implemented by using..." blah blah blah, the processor does not write the data into the cache hierarchy, nor does it fetch the corresponding cache line. "Because the WC protocol uses a weakly-ordered memory consistency model, a fencing operation should be used..." Protected mode exceptions: for an illegal memory operand effective address, and then this: "if memory operand is not aligned on a 16-byte boundary, regardless of segment."

edx. So, aligned on a 16-byte boundary. Okay, how do I tell if it's aligned on a 16-byte boundary? Well, I guess that all of the low bits would have to be zero. So 0, 0, 0, 1... so 1, 2, 4, 8, 16. So I guess it's because of that one. So what if we redid that, so what
if I did... edx is still ending in 8. Okay, well, we can do something different. Let's disable... edx is now 0x0804c460. Okay, so I think this should work then. Interesting. Okay, so to do this xmmword store through edx, I have to make sure that all the values that get returned, these memory addresses, have the lowest bits cleared, so it's on a 16-byte boundary. Otherwise it will seg fault. So now I can look at the next one.

So the question is, will this actually work the same on the real one? This is actually something that I have no idea about. So let's see. That's good. This one's messed up, yep, so it's going to cause a seg fault, and this is on towards the end, this was like the second to last one, 4096. I don't know... what if I just add bytes? So let's see if that actually makes a difference by doing it here. So, okay, good, so I did get past the... no, I did not. See, this is the problem. 8, 16, 32, 70, 128... oh, 128 did work there.

Okay, now, maybe I can just essentially brute-force this in some sense, just try various offsets here. So that's plus four... there we go, that got me past that, to 512. So 256, let's do plus... so this isn't my preferred way to do it, but basically we just need to keep incrementing these values until we end up on a 16-byte boundary. So 264... 512 plus 8, right. So another, 1024 plus 8. I can just use Emacs's calculator. Plus 8, and since it looks like I'm just adding, 2048 plus 8, let's see, 2056. Then 4096, so 4104. There we go.

All right, so memory alignment, I guess, was the challenge there. So hey, you know, all types of challenges. That was interesting; I got to learn about the semantics of these instructions here, this movdqa and movntps. So yeah, that was interesting, I'm happy I did that. All right, well, thanks for watching, see you next time.
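As a companion to the walkthrough, here is a minimal C sketch of the two pieces of arithmetic that came up: the 16-byte alignment test that movntps enforces on its destination, and the split of a length into 64-byte blocks plus a tail (the shift by 6 and the AND with 0x3f seen in the disassembly). The function names are hypothetical and are not taken from memcpy.c; this only illustrates the bit tricks, not the challenge binary itself.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: nonzero when addr sits on a 16-byte boundary,
   i.e. its low four bits are all zero. */
int is_aligned16(uintptr_t addr) {
    return (addr & 0xFu) == 0;
}

/* Hypothetical helper: bytes to add to addr to reach the next
   16-byte boundary (0 if addr is already aligned). */
size_t pad_to_16(uintptr_t addr) {
    return (size_t)((16u - (addr & 0xFu)) & 0xFu);
}

/* Number of whole 64-byte blocks in len: len >> 6 is len / 64. */
size_t full_blocks(size_t len) {
    return len >> 6;
}

/* Leftover bytes after the 64-byte blocks: len & 0x3f is len % 64. */
size_t tail_bytes(size_t len) {
    return len & 0x3Fu;
}
```

For instance, the address 0x0804c460 seen in the debugger passes `is_aligned16`, while an address whose last hex digit is 8 fails it and is 8 bytes short of the next boundary. For a length of 128, `full_blocks` gives 2 and `tail_bytes` gives 0, matching the values traced in gdb.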