Cool. All right. Let's make sure that works here. It doesn't work? You said it doesn't work? Oh, we're live. Cool. I actually have the real Twitch app. Okay, are you all getting familiar with GDB and reverse engineering? Yes. Loving GDB, the power that it gives you, the control over our programs. On that note, there's a new extra credit opportunity. The NSA, yes, that NSA that you're thinking of, has a yearly challenge: the NSA Codebreaker Challenge. I actually did this one year with, at the time, the pwndevils. It was really fun. There are 10 different tasks, each using different skills. And if this is something that floats your boat, I do know people who got hired by the NSA, and one of the things they pointed to is that they did these challenges. So people can be interested in that. It'll be 0.3% of extra credit for each task. There are 10 tasks, for a total of 3% extra credit, which can get you into a plus, or out of a minus into a regular grade, however that works. So if you want to take advantage of this, and again, you don't need to, register with the site using your ASU email address. That way I can link it up to you later. And they actually have this cool feature now. I can scroll through: there are some programming assignments, hardware analysis, emulation, dynamic reverse engineering. Hey, that's what you're learning right now. Reverse engineering, forensics, databases, exploitation, all kinds of stuff. So it uses literally all the stuff we're talking about in this class. Once you've created your account, go up to sharing. Then you will be able to put in my email address, which is on the syllabus. That way I will see your progress right here. At the grading deadline, I'll take a snapshot, and that way I can give extra credit to everyone who's done these tasks. Cool. Okay.
Other thing: if you haven't already started, hopefully you already have. I realize that even though I put some info here, the debugging and reverse engineering challenges are slightly different from what we were used to in the past. So if we start one, I will hopefully show how to do it. Let's go to the desktop. All right. Hey, my Wireshark thing is back. Okay. So if we just do /challenge/run, like we've been used to, we get this error: no such file or directory. What do we do? Freak out? Just quit, leave ASU in disgrace, go back home? No, we can look at what's in /challenge and we can see, oh yeah, there's something else, it's just called something different. Another fun tip, if you've never done this before: the handy dandy tab button on your device, nice catch, will do autocomplete. So if you type /ch and hit tab, it looks and says, oh, there's only one directory that starts with /ch, so it autocompletes the rest. And then if I hit tab again, there's only one file in here, so it just autocompletes the name of the file. So I don't have to worry about typing it in or getting it wrong or anything. Running it, it says the program is restarting. This is one where you need to read some of the output. The program is restarting under the control of GDB. Hey, we love GDB. You can run the program with the GDB command run. So we can see all the GDB output. We type run. Oh, more output. So we read some more. GDB is very powerful. We're currently paused, because there's a breakpoint. You can use the command start to start a program with a breakpoint set on main, and the command starti to start a program with a breakpoint set on _start. You can use the command run. Of course, you can read all of this in the GDB manual. There's a ton of great stuff in there. Use the command continue, or c for short, to continue your exploitation. Oh, shoot. It gave me the flag. Cool. Okay.
So I'm not really spoiling anything, because I just literally read the output and did what it told me to do, and it gave me the flag. Other levels will be more complicated. Cool. Now, oh, that was almost bad. Okay. Now the other type of level. The reverse engineering levels are all similar to each other. There are two versions. I think I mentioned this, but there's a .0 and a .1. The .0 prints out output that tells you what it does. For the .1, you need to actually reverse engineer the binary to figure out what it's doing. But let's look at this challenge, babyrev level one. Again, I didn't do anything clever. I just literally typed /ch, tab, tab, enter. Okay. It's license verifier software. Hey, that's cool. Before you can read the flag, you must verify that you are licensed to read flag files. This program consumes a license key over standard in. We know what standard in is. Each program may perform entirely different operations on that input. You must figure out, by reverse engineering this program, what that license key is. So it's expecting a certain key, it's taking our input and doing operations on it, and we need to provide the right input. So let's say Adam. So the initial input was this. Does it look like what I typed in? Does it look like ADAM? No. What does it look like? What kind of numbers? Hex numbers that correspond to the ASCII values of what I typed in. Then it says mangling is done, and the resulting bytes will be used for the final comparison. So this is the final result of the mangling. It's exactly what I gave it, and it expected a different result. So it said wrong, no flag for you. I think I showed this, or at least talked about it, but you can use the up or down arrows, or Ctrl-P to go to previous and Ctrl-N to go to next, to scroll through your history. So again, you don't need to type this stuff again. Oh, okay. So it's only reading in one, two, three, four, five bytes. We can see that.
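To make the "hex numbers that correspond to the ASCII values" point concrete, here is a minimal Python sketch. The helper name ascii_hex is made up for illustration, and it assumes the typed line was Adam followed by Enter, which is what produces a fifth byte.

```python
def ascii_hex(text: str) -> str:
    """Return the bytes of `text` as space-separated two-digit hex values."""
    return text.encode("ascii").hex(" ")

# Typing Adam and pressing Enter sends four letters plus a newline: five bytes.
typed = "Adam\n"
print(ascii_hex(typed))
```

The last value in the output is the newline, which is why a four-letter name already fills a five-byte read.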
So what if I wanted to pass non-ASCII values to this challenge? How would I do that? Is it impossible? Copy and paste the symbols from the web? Wow. That's kind of a deep cut. What's the symbol you can copy from the web for the null character, or the acknowledgement character? If I type zero, space, zero, I just get something like 30, 20, 30, 20, 30. Okay. So how is the program reading in its input? From standard input. Thank you. Which means we can do this test using the pipe: we can pipe the standard out of one program into the standard input of another program. So here we have t, e, s, t, newline, and that's what got compared. So that was incorrect. We can look at the man page of echo, and we have a very nice option here, the -e option, to enable interpretation of backslash escapes. That allows us to use all of these, like newlines, but we don't want newlines. We want this very handy dandy hexadecimal value. We can use \x and then a hex value. So if we did echo -e and I wanted this to be \x00 \x01 \x02 \x03 \x04, what would I expect it to tell me the initial input is? Would it be ABCD? I can wait to hit enter until we guess what it's going to be. Yeah, it should be exactly this input, right? Because we're telling it in hex, so it's literally sending these exact bytes into the program. So there we go: 00 01 02 03 04. Right. So we can have this be whatever we want. This is a nice way to do it. Another thing we can do, if you are ever confused about what bytes are in your file: hexdump is a very cool program you can use. It literally dumps out the file in hex. I think by default it'll do it like this.
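The same idea as piping echo -e into the challenge can be sketched in Python. This is only an illustration under stated assumptions: the child process here is a stand-in written for this sketch (a second Python process that prints its standard input as hex), not the real challenge binary, and send_bytes is a hypothetical helper name.

```python
import subprocess
import sys

# Stand-in for the challenge binary (an assumption for this sketch):
# a child Python process that prints its standard input as hex.
CHILD = "import sys; print(sys.stdin.buffer.read().hex(' '))"

def send_bytes(payload: bytes) -> str:
    """Pipe `payload` into the child's standard input and return what it printed."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        input=payload,
        capture_output=True,
        check=True,
    )
    return result.stdout.decode().strip()

# Arbitrary non-printable bytes, the same values used with \x escapes above.
payload = bytes([0x00, 0x01, 0x02, 0x03, 0x04])
print(send_bytes(payload))
```

Unlike plain echo, this sends no trailing newline, so the receiver sees exactly the five bytes in the payload.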
But the ordering is a bit weird, so let's look at it with -C. These are the offsets, and these are 00 01 02 03 04 0a, and then this column shows you the ASCII of what it was. So if we hexdump a whole file, this is the offset: at offset zero are these bytes, and then the next line is offset 10, and these are all hex, so there are 16 bytes on a line. We can page through that and see this whole file in beautiful hex. Okay. That helps us test. That's great. So, should we solve this one? No? You can do it yourselves. Okay. Good. I like that. Good attitude. You know, it's not a curve. You don't have to push everyone else down so you get a better grade. Okay. Just human nature. Cool. All right. So that's the basics of the assignment. We'll go back to it in a second. We left off talking about functions and function frames, and now we are going to move on to data access. Okay. Cool. As we know, and we've been doing this, programs need to operate on data. We need to actually have something to compute on. You'll see in the reverse engineering challenges that they have data for what they expect after they do all the mangling of your input. Your goal is to figure out what input gets the program to that correct state where it passes that check. Data can be in many places. It can be in .data, so global arrays. There's .rodata. Did we go over this? I'm feeling weird deja vu. Nobody remembers what we did on Wednesday. It's way too long ago. Time ran out right here. That's why it sounds familiar. Okay. Cool. So: read-only data. BSS, uninitialized data. The stack. We can store things on the stack. You've been using this. I think some of you used the stack for the building a web server module. Yeah. It caused you grief. Why did it cause you grief? It's the stack.
You probably didn't allocate enough space on the stack and just used it anyway. But yeah, the stack is where function frame data is stored. And finally, we have the heap. The heap holds dynamically allocated variables. How are dynamically allocated variables different from variables stored in a function frame? What are the equivalents in, like, C++? Yeah, the stack would be a local variable in a function. And how do you create dynamic memory in C++? Yeah, you can use malloc. What's the C++ way? new, yes. It uses malloc under the hood, but yeah, either malloc or new, right? And the idea is that you, the programmer, then have to figure out when you want to free that memory. You're saying: I want to allocate memory, I'm not sure when I'm done with it yet, I will tell you when I'm done with it. Whereas a local variable is only valid in that function. It goes away as soon as the function returns. Okay. And these are all in different sections. I definitely remember showing the proc file system and using /proc/self/maps to show the memory mappings. It's super cool seeing all that. So as we saw for the stack, and we've been playing with it, there are several different ways to access data on the stack. And again, this is when we're looking at that low-level assembly, trying to understand what the code was likely written as, in C, C++, or whatever higher-level language. So we know that push puts data onto the stack. And when you push onto the stack, which direction does the stack pointer change? Lower addresses. Yeah, lower addresses is definitely the more correct way to say it, right? It subtracts. And how many bytes does it subtract? Eight. Why eight? Yeah, but why is it 64 bits? It's the register size, is how I would say it, because the push and pop operations take a register as the input.
So you can't say, oh, I just want to push a single byte. Pushing a 64-bit register always moves eight bytes. Cool. Okay, so pop operates exactly opposite to push. We push something onto the stack, and then we pop it off. We can also have relative access based on RSP. Let's say this load here is loading from RSP plus 10 hex, whatever's at that memory location, into RDX. Is RSP plus 10 a valid reference into valid memory? Yeah, assuming so. Maybe that feels like a trick question, but in a well-formed program, positive offsets from wherever RSP points should already be correctly allocated, right? Because we subtract to get new memory. So if we ever access something at a negative offset from RSP, that would be bad, because that's memory that we haven't said we're using yet. Stores go the opposite way: store RDX into wherever RSP points plus 10 hex. So if RSP is here, and let's say each of these slots is eight bytes, that would move the value into this slot right here. This is RSP, this is RSP plus eight, and this would be RSP plus 10 hex. That makes sense. Okay, we can also access data on the stack relative to the base pointer, RBP. We looked at how, every time we call a function, the first thing the function does is set up the function frame by creating a new base pointer for itself and then subtracting space for its local variables. And so base pointer accesses will typically be negative, because RBP is pointing somewhere up here. Actually, in this example, it should point right here. So whenever you see RBP minus something, that usually means a local variable. We should probably look at an example. I think I had an example, right? I cannot remember for the life of me what that code was, but let's look at it, because we'll be able to figure it out together. Okay: push RBP, move the stack pointer into the base pointer.
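The push, pop, and RSP-relative accesses described above can be modeled with a toy stack. This is a simplified sketch, not how a CPU is implemented: the ToyStack class, its starting address, and the pushed values are all made up for illustration.

```python
import struct

class ToyStack:
    """A tiny model of an x86-64 stack: byte-addressed, grows toward lower addresses."""

    def __init__(self, top: int = 0x1000):
        self.mem = bytearray(top)   # addresses 0 .. top-1
        self.rsp = top              # empty stack: rsp sits just past the memory

    def push(self, value: int) -> None:
        self.rsp -= 8               # a 64-bit push moves rsp down by 8
        struct.pack_into("<Q", self.mem, self.rsp, value)

    def pop(self) -> int:
        (value,) = struct.unpack_from("<Q", self.mem, self.rsp)
        self.rsp += 8               # pop moves rsp back up by 8
        return value

    def load(self, offset: int) -> int:
        """Model of loading 8 bytes from [rsp + offset] into a register."""
        (value,) = struct.unpack_from("<Q", self.mem, self.rsp + offset)
        return value

    def store(self, offset: int, value: int) -> None:
        """Model of storing a register to [rsp + offset]."""
        struct.pack_into("<Q", self.mem, self.rsp + offset, value)

stack = ToyStack()
start = stack.rsp
for v in (0x11, 0x22, 0x33):
    stack.push(v)

print(hex(start - stack.rsp))   # how far rsp moved after three pushes
print(hex(stack.load(0x10)))    # the slot two 8-byte entries above rsp
print(hex(stack.pop()))         # the most recently pushed value
```

With eight-byte slots, rsp + 0x10 is the third slot counting from the top of the stack, which in this run holds the first value pushed.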
So we're setting up a new function frame with a base pointer, and then subtracting hex 430. And then what are we doing? Moving EDI, so the function parameters, into RBP minus 414, RSI into RBP minus 420, and RDX into RBP minus 428. This is already telling me that there are at least three local variables in this function, right? I don't know exactly what they are yet, because I don't know how they're used. But because each one is at an offset from RBP, it means it's a local variable here. We also had, if I remember, a call to scanf, and it uses RBP minus hex 410. I think we had a giant buffer, if I remember correctly, maybe 1024 bytes is my guess. Yeah. So: load effective address into RAX, then RAX into RSI. How come it's doing this? Why is it moving something into RAX and then just moving that directly into another register? Why isn't it just doing load effective address of RBP minus hex 410 into RSI? Those should be identical, right? As long as we don't need this value in RAX later, and we're about to move zero into EAX, so whatever was in RAX is completely gone. We're making a function call here, and I think with scanf this will be the second parameter. RDI is the first parameter, RSI is the second. So we're passing in the second parameter here. But why these two instructions when I can do the same thing in one? Yeah, who knows? The compiler decided to do it like this. Probably we didn't compile it with enough optimizations. Oh, I don't even remember how I compiled this. Let's see. This was dynamically linked, not stripped: gcc -o example example.c. The capital -O is the optimization level. So if I crank this up to optimization level two, now we have main: push R12, push RBP, move RSI into RBP. Oh, wow, it's a lot different. So here's the scanf. Move R12 into RSI, move RSP into R12. Wow, this is actually crazy.
So I think it's, like, too optimized. Hey, there we go. Wait, is that right? No. Oh, I guess it's doing this. Oh, I see. So it's subtracting 418 from the stack pointer. It's optimized, and now it knows there's only one variable, so it's using the stack pointer directly and just moving the stack pointer into RSI. It knows that the buffer sits exactly where the stack pointer points. So it's not even doing any RBP calculations or anything. Anyways, compilers are crazy. The more you crank up that optimization level, the more time and effort they will spend optimizing that assembly to try to get it down. I think O0 is the default optimization level, which means don't do any optimizations, where "any" is in quotes, because it still does some. But that's where we get this slow but correct code, right? We're talking about a difference of one instruction here. You're probably not going to notice unless it's very performance-critical code. Anyways, the point of all that discussion, digression maybe, was that compilers do weird stuff. It's hard to predict, and it may change between compiler versions. Different compiler versions may generate different code, but it all works the same in the end. Okay. So now we see how these local variables look with RBP-relative access. And we can also do things like move RSP into RDX and then move 41 into wherever RDX points. We can do that with the stack pointer or the base pointer. Yeah. Say it louder. Yeah. So why do we push the base pointer at the start of a function call? The reason is so that every function can access its own local variables based on its base pointer. So it'll say, okay, for this function, A is at RBP minus 20 hex, let's say, when you call me. But when I call another function, it needs to set up its own function frame.
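One way to see why each function saves and restores the base pointer is to simulate a chain of nested calls. The sketch below is a toy model under stated assumptions: the Machine class, the starting stack address, the placeholder return address, and the 0x20 bytes of locals per frame are all invented for illustration.

```python
class Machine:
    """Toy model of rsp/rbp handling across nested calls (not a real CPU)."""

    def __init__(self):
        self.rsp = 0x8000
        self.rbp = 0
        self.mem = {}    # address -> 8-byte value
        self.log = []    # (event, function name, rbp) tuples

    def push(self, value):
        self.rsp -= 8
        self.mem[self.rsp] = value

    def pop(self):
        value = self.mem[self.rsp]
        self.rsp += 8
        return value

    def call(self, chain, local_bytes=0x20):
        """Run chain[0], which calls chain[1], and so on down the list."""
        name, rest = chain[0], chain[1:]
        self.push(0xDEAD)            # placeholder return address pushed by the call
        self.push(self.rbp)          # prologue: push rbp (save the caller's base)
        self.rbp = self.rsp          # prologue: mov rbp, rsp
        self.rsp -= local_bytes      # prologue: sub rsp, local_bytes
        self.log.append(("enter", name, self.rbp))
        if rest:
            self.call(rest, local_bytes)
        # Back in this function: rbp must be this function's base again.
        self.log.append(("before return", name, self.rbp))
        self.rsp = self.rbp          # epilogue: mov rsp, rbp
        self.rbp = self.pop()        # epilogue: pop rbp (restore the caller's base)
        self.pop()                   # ret: discard the return address

m = Machine()
m.call(["A", "B", "C"])
for event, name, rbp in m.log:
    print(f"{event:14s} {name} rbp={rbp:#x}")
```

Each function sees the same rbp before it returns as it had on entry, even though the callees in between installed their own, because every callee pops the saved value back before returning.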
So the frames end up on the stack looking like chunks of memory. And that's why, whenever a function returns, it needs to restore whatever the base pointer was for whoever called it. It's setting everything back up correctly. Because there can only be one base pointer register, but you may have function A calling function B calling function C calling D. And for each of them, when control flow goes back to that function, it needs to have its base pointer set back to where it thought it was. Cool. Now, data in global variables. Global variables will be stored at known offsets from the program code. What's kind of crazy is that everything is relative to RIP. What's the RIP register for? Instruction pointer. So it's literally the current instruction's address plus some amount. This is nice, because this data access will work no matter where this program is laid out in memory. It just says, whatever the instruction pointer is, access it relative to that. We can see that in our example: when we called scanf, the format string argument was RIP plus an offset. Now, actually doing that calculation by hand is a real pain. I think it's off of this address, but objdump is very nice. It tells us that it's at 10,004. So we can actually look at that. And because I used objdump -D, it's trying to disassemble all of this, even though these are actually just data bytes. Let's see, what is hex 25? Anybody remember? What ASCII character does that represent? Space is 20. Newline is 0A, which is 10 in decimal. I thought it would come up in, what do you call it? But let's look here. Percent, which makes sense: it's scanf, so percent. Then 31, 30, 32, 33, 73. So if we look up 31, that's one. Oh yeah, remember, it was percent and then the size. So it's percent, one, zero, two, three, and 73 will be s. I am pretty confident it's s. And if we look at our example.c, that is exactly this value here.
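The byte decoding above is easy to check, and the RIP-relative arithmetic that the disassembler does for you can be sketched as well. The byte values come from the walkthrough; the instruction address, instruction length, and displacement in the second half are invented numbers for illustration only, not values from the example binary.

```python
# The bytes read out of the disassembly listing, plus the terminating null byte.
raw = bytes.fromhex("25 31 30 32 33 73 00")

# A C string ends at the first null byte.
format_string = raw.split(b"\x00")[0].decode("ascii")
print(format_string)

def rip_relative_target(insn_addr: int, insn_len: int, displacement: int) -> int:
    """RIP-relative operands are measured from the address of the NEXT instruction."""
    return insn_addr + insn_len + displacement

# Hypothetical numbers: a 7-byte instruction at 0x1189 with displacement 0xe74.
print(hex(rip_relative_target(0x1189, 7, 0xE74)))
```

The detail that trips people up when doing this by hand is that the base is the address of the following instruction, not the one containing the operand.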
So the compiler decided, okay, I'm going to store these bytes, this ASCII string: percent, one, zero, two, three, s, null. Why does it need the null byte at the end? That's how C strings know when to stop, right? String length counts until it gets to a null byte. String copy copies until it gets to a null byte. All that stuff relies on these null bytes to know where the string ends. So the compiler just decides to put the string at this offset, and then it calculates the address relative to here. But you shouldn't need to do that by hand. Your tools should do that calculation for you. That's how we can see, oh, this is a global variable. And we can read and write them: loading, storing, referencing, getting the address. Cool. Okay. So data stored on the heap is usually reached through a pointer. You've played with pointers before in building a web server, right? Did you know that that's what you were doing? Maybe painfully, by the end? Yeah, when did you use pointers when you were building your web server? Yeah, to specify addresses. For which call? Which calls do you have to pass addresses to? Socket? Yeah, that's a great one. What else? Bind? The bind structure? What else? Those are the only ones you remember? Accept? I think accept just took file descriptors. I think it took plain values, right? Exactly. Yeah, values, not addresses. Read. With read, you got a buffer in, and then you had to iterate over that buffer with a pointer until you got to the correct spot and put a null byte in, right? You did all that fancy stuff. That's all pointer arithmetic that you're using. So you were using pointers, manipulating them, incrementing them. And that's typically how data on the heap is accessed. We talked about malloc. When you call malloc, what does malloc return? A pointer, right? A memory address. And we know pointers aren't scary. They're just memory addresses, right?
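To illustrate "pointers are just memory addresses", here is a toy memory model in which an allocator hands back an address and a stack slot stores that address. Everything in it is hypothetical: the ToyMemory class, its bump-style malloc, and the addresses 0x5000 and 0x7FF0 were picked only to keep the output readable.

```python
class ToyMemory:
    """Address -> value map with a trivial bump allocator standing in for malloc."""

    def __init__(self):
        self.cells = {}
        self.next_heap = 0x5000   # made-up start of the heap

    def malloc(self, size: int) -> int:
        addr = self.next_heap     # the "pointer" is nothing more than this number
        self.next_heap += size
        return addr

mem = ToyMemory()
STACK_SLOT = 0x7FF0               # made-up address of a local pointer variable

ptr = mem.malloc(8)               # ask for heap memory, get an address back
mem.cells[ptr] = 0x41             # the heap data itself
mem.cells[STACK_SLOT] = ptr       # the local variable holds the address, not the data

rax = mem.cells[STACK_SLOT]       # first load: fetch the pointer from the stack
rdx = mem.cells[rax]              # second load: follow the pointer to the heap data
print(hex(rax), hex(rdx))
```

Reading heap data takes two loads in this model: one to get the address out of the local variable, and one to read what that address points to.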
So it gives you a memory address, and then you can access that. The difference here is kind of subtle, but the idea is that references to heap data are generally stored either in memory or on the stack, because we need that pointer to reach the memory that was dynamically allocated. So in this case, rather than just copying whatever is on the stack, we load what RSP points to into RAX, then dereference RAX and move that into RDX. See how this is different? It's essentially a double dereference, right? We're accessing RSP as if it's a pointer, and it points to some memory location. We move that into RAX, and then we dereference RAX and copy that into RDX. Usually it's pretty clear which is which as we go through them, because you'll see that they're pointers to things. Okay. The other big thing that we talked about is type information. You should also understand this viscerally from the building a web server assignment, right? We talked about the structure that you passed to bind, with what socket and what address to listen on. We saw that it was a C struct, but when you go to write that in assembly, you can't write a C struct. You just have to put the values in memory at precisely the right locations. Okay. Now this will be very quick, and then I'll show a demo. I guess it will be less of a demo and more of a collaborative reverse engineering. So there are kind of two classes of approaches to reverse engineering. And again, what is your goal when you're reverse engineering something? Who said that? Perfect. Yes, understand how it works, right? Your goal is to understand how this thing works. And so there are really two general classes of approaches. One is dynamic reversing, where you're running the program, you're interacting with it, you're debugging it, like you're learning how to do in the first couple of modules.
And then there's static reversing, where you look at the binary and you just look at what the code does. What's the benefit of dynamically analyzing something? Yeah, you can see the values as they change during runtime. That's also part of what you're learning in this debugging section: how to do that observation. What's the benefit of static reversing? Yeah, rather than "between runs", I'd say there are no runs, right? Could you envision a scenario where you are reversing some piece of software that you can't run? When would that happen? What would be an example? Yeah, the program's for a different platform. You're trying to reverse engineer a Windows binary but you have a Mac, or it's a MIPS router firmware binary and you have a Windows machine that's running x86, right? Maybe there are emulators and stuff, and we'll talk about that, but they have limitations. So static reversing is really important. And actually, I think the key is combining the two and acquiring as much information as you can about a system by interacting with it. Okay. So the idea is just like we talked about: static tools help you reverse a binary at rest. Without running the binary, what does it look like? What does it do? There are simple tools. We talked about some of these, some of my favorites. strings is an excellent, excellent thing. We've been using objdump. It's incredibly simple: this simple disassembler just, boom, shows you all the instructions. This is a cool one: checksec is used a lot in CTFs and stuff. I'll show that really quickly. When we get closer to exploitation, that'll be helpful, but it statically looks at the binary and tells you a little bit about the security posture. So this is saying what the architecture is, whether the stack has stack canaries, and we'll talk about what these things mean. PIE means position independent executable, where the code can be moved around. Without PIE, the code can't be moved around.
It's at fixed addresses, and that changes how you exploit things. It also checks whether the stack is both writable and executable. Anyways, all kinds of fun stuff in there that it's checking. And there are a lot of simple tools like that. And I think I talked last week about advanced disassemblers, right? The whole idea of how you go from a binary back to the original C. We showed IDA Pro. The other big commercial decompiler is Binary Ninja. We know some of the people who created it, and they're very nice people. It's actually a pretty cool system too. There's a Binary Ninja Cloud that you can use. And we showed Ghidra and angr management running on the dojo. So you have access to IDA, angr management, and Ghidra through the dojo. There is, let me check this. If you want to be crazy, I don't know about crazy, but there's R2, which is what they call it. I actually don't know how to pronounce the full name, radare, maybe. This is an open source reversing tool that's all command line and all command line interaction. So yeah, you can see if this really appeals to you, this sweet, sweet, why is that not showing me the picture? Having disassembly here, having functions. Oh my gosh, having a function graph all in ASCII in your terminal. You can be, like, playing Dwarf Fortress, but with a binary. Wow, the hex dump view is actually pretty cool. That's a great question. Is it on the dojo? Yep, it's there. We have one person on Shellphish who, I think, used to be or is an R2 developer. And he recently showed me how to easily patch binaries with R2, replace bytes with other bytes, and do other cool stuff. But it's incredibly complex and complicated, which is cool. It's an interesting thing. Apparently they have a GUI that you can check out, Cutter, I believe. Yeah. So as you can see, like we talked about, right? Most, hello, why is that?
Yeah, that's getting bigger, but the picture's getting smaller. Okay. So as you can see, it's very similar, right? Again, there's this concept of basic blocks, with branches between them, laid out as a graph. Yeah. As an alternative, I'd probably say it's closer in spirit to angr management, in the sense that I know this is open source and it's built on R2. So it's R2 under the hood. I don't think they do any, oh, they do have a decompiler. Yeah. I guess they have a decompiler. I don't know how good it is, but I'm just showing different alternatives, so you can go and play with whatever you like to your heart's content. Oh yeah, integrated decompiler. Oh, I see. It's fully integrated with the native Ghidra decompiler. So under the hood it's using Ghidra to do the decompilation, which is cool. So yeah, basically you just have to find the tool that works for you. For me, like I said, I've been using IDA for so long that it's hard for me to use other tools, because I know the keyboard shortcuts and everything. But you just have to pick a tool, get used to it, and read up on it. There are really good cheat sheets and other guides. And so let me see, okay, let's go through the dynamic tools real quick. And then let's actually do a CTF challenge together. Yeah. Sounds like fun. Cool. So with dynamic tools, we want to analyze the program while it's running. We want to see what's happening. We want to explore and look at what's in the registers and what memory looks like. And this is really important because we want to use both approaches. So we already saw our good friend strace. Somebody remind me, what does strace do? System call trace, right? That's what the s stands for: system calls. It shows you all of the system calls that a program makes. Why is that important for reverse engineering? Yeah. Yeah.
As we talked about, the whole point of a system call is for a program to ask the operating system to do something, to interact with the environment. So by understanding the system calls that a process makes, we can understand more about what it's trying to do, right? If we just have a random binary and we literally have no idea what it's supposed to do, we can try running it with strace. We may see system calls fail. And then we may see it output a warning message and we'll understand: oh, it's trying to listen on this port, or, oh, it's expecting a folder in this directory. There's all kinds of stuff you can learn just from this. Another one is ltrace. ltrace is library trace. It traces calls into libc and outputs more about that. So yeah, you can get super far here. Yeah. You would definitely not be able to see that. Exactly. It uses the fact that at runtime the program starts up and loads the library, and then it intercepts those calls to the library. But if it's all embedded in the binary, it can't do that. You have to use other techniques. Yeah. So this is pretty simple. I mean, actually, step zero of dynamic analysis is what? Just run it. Yeah. Run the program. What does it do? What's it asking for? What are the strings that you see when you run the program, right? Use your intuition about what's going on here. The other thing is debugging. This is why you're learning GDB. It's super helpful. I think this is definitely something that's worthwhile to learn. I don't just mean GDB, although GDB is great, but understanding how to use a debugger effectively, and all the crazy things you can do with a debugger with scripting. It was a while ago, but somebody who was working at Facebook used GDB to profile their applications and understand the function calls. Their system was slow and they didn't understand why.
And rather than adding a bunch of instrumentation, they used GDB to collect runtime information. There's all kinds of crazy stuff you can do here. We can look at it quickly, because we have a whole module on this. So we have our example. We can run it. It's waiting for our input. Okay. It didn't really do anything useful. So I think we can do what? Start. So we're now in main. There are a lot of commands in GDB. The dojo, in this module, has links to the documentation. I highly recommend perusing that, especially as the challenges tell you about different functionality. We can do things like x. So x is examine: we want to examine memory. After the slash comes what type of memory and how much. So x/5i says examine five instructions, and I'm going to give it $rip, so starting from the RIP register. Hit enter and this shows me the next five instructions from where we're going to start executing. We can see that my good friend AT&T syntax is back. So I can do set disassembly-flavor intel. Hey, and there's my nice Intel syntax. I can do info. I can actually just type info. The other great thing is help. If you don't know what a command means, just type help before that command. So help x: examine memory, x/FMT ADDRESS. ADDRESS is an expression for the memory address to examine. FMT is a repeat count followed by a format letter and a size letter. We can print things out in octal, hex, decimal, unsigned decimal, binary, which is cool, floats, addresses, instructions, characters, strings. We can print things out in terms of bytes, halfwords, words, giant words. And yeah, the default count is one. So like I said, info. We can actually do help info. There's all kinds of stuff we can look at. I like info registers. This shows you the contents of all of the registers: the register names, the hex value, and the decimal value.
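The count, format, and size parts of the x command are easier to internalize by seeing the same memory printed at two sizes. The function below is a rough model written for this purpose, not GDB's implementation: the name examine and the sample bytes are made up, and it only covers hex output on little-endian data.

```python
import struct

def examine(mem: bytes, count: int, size: int):
    """Rough model of a hex examine: `count` units of `size` bytes, little-endian."""
    fmt = {1: "<B", 2: "<H", 4: "<I", 8: "<Q"}[size]   # byte, halfword, word, giant
    out = []
    for i in range(count):
        (value,) = struct.unpack_from(fmt, mem, i * size)
        out.append(f"0x{value:0{size * 2}x}")
    return out

mem = bytes(range(0x10, 0x20))   # 16 sample bytes: 10 11 12 ... 1f

print(examine(mem, 2, 8))   # two giant words, comparable to x/2gx
print(examine(mem, 4, 1))   # four single bytes, comparable to x/4bx
```

On a little-endian machine the giant-word view shows each group of eight bytes in reverse of the order they sit in memory, which is why the byte view is often easier to compare against a hex dump.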
So we can see what's in RAX, RBX, RCX, RDX, literally every single register. If we wanted to know what's at the memory location in RSI, we can do examine slash five, giant, hex: x/5gx. That will show me five 8-byte values starting at that memory address. So I know literally the values at each of these memory addresses. If I was concerned about bytes, I could change it there and print out 20 bytes of hex values. I guess I could do t. I've literally never done this before. What does that look like? Well, that's crazy. So you can print out binary values, which is not that useful. Hex is way more useful. We can keep examining. We can just keep hitting enter and it will keep scanning through memory and showing me this output. I can say treat it like a string. So x/1s: it'll try to interpret that and print out the characters until it gets to a null byte. GDB: super powerful, like super powerful.
So looking at disassemble main, this shows us the disassembly of main. So we can say, okay, I'm really interested in this scanf call, which we had right here. And the scanf call is at main+72. So I say break at the address of main plus 72; star main means the address of main, so break *main+72. And then I hit c for continue. This is the other thing that's cool: break is the full command name, but as long as it's unambiguous, you can just put b if there's no other command that starts with b, and for continue I can just put c.
So now we're here, in main. Let's see where we are. We are right before the scanf. So now if I do info registers, what are the arguments to scanf? What's the calling convention? What's the first argument? What register was it? RDI. So examine it as a string. Boom. There's that %1023s, the scanf format string that we just saw. Right. And we see that this is a %s, so we know the next argument is going to be in RSI. So there's a bunch of junk in here.
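As a rough sketch of the two ideas used above, here is the integer-argument register order of the System V AMD64 calling convention (first argument in RDI, second in RSI, and so on) together with what x/s does conceptually: read bytes until a null byte. The helper names and the sample memory contents are illustrative assumptions, not anything taken from the example binary.

```python
# Integer/pointer argument registers, in order, for the System V AMD64 ABI
# (the convention used by Linux x86-64 programs such as this example).
SYSV_INT_ARG_REGS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]


def argument_register(index):
    """Hypothetical helper: register holding the index-th (0-based) integer argument."""
    return SYSV_INT_ARG_REGS[index]


def read_c_string(memory, offset):
    """Mimic `x/s`: decode bytes starting at offset up to the first null byte."""
    end = memory.index(b"\x00", offset)
    return memory[offset:end].decode("ascii", errors="replace")


# Made-up memory image: some junk, then a scanf-style format string.
memory = b"junk\x00%1023s\x00"

print(argument_register(0))      # where scanf's format string pointer lives
print(argument_register(1))      # where the destination buffer pointer lives
print(read_c_string(memory, 5))  # the string a pointer to offset 5 refers to
```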
Why is there a bunch of junk in here? It's just whatever was on the stack, right? This is where the buffer that we're scanning into lives on the stack. If we do ni, n and i for next instruction, it's going to wait for our input. If I look now, we can see that it changed between these two outputs. And if I print it out as a string, oh, I typed a space. Okay, scanf reads up until a space. So that's why it was just hello. Cool.
And okay, let's say I'm done. But oh, wait, I want to go back. Star main, what was it, plus 74? All right, 72. Just two. Yeah, just run it. Oh, the AT&T syntax is back. So the nice thing to get rid of that is, oh, that's not what I want. There is a handy dandy file called .gdbinit, full of GDB commands, that is loaded every time GDB is run. So you can customize GDB however you like. So here I'm going to tell it that I only ever want to see Intel syntax. The other thing is, after you quit, the commands that you ran last time are still here. So you can use up and down to go back to what you did last time. Star main. Oh, cool. So now it's Intel syntax. I never need to worry about it ever again. Every time I run GDB on this machine, it will look the way I like.
I think I can just do this, right? I don't usually use this. help display: display/FMT EXPRESSION. So display/5i $rip. This display tells GDB to do this thing every time I stop or go to the next instruction. So this means that every time I hit a breakpoint or go to the next instruction, it will always just print out the next instructions. So we see how we're going through this until we get to the printf. We saw the hello world. We're going through with si, which is step instruction: it steps one single instruction at a time. So I can keep stepping until we get to exit. It's going to run these exit handlers. And I can literally step through.
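The scanf behavior seen above, where only the first word ended up in the buffer, can be sketched in Python. This is a simplified model of a %1023s conversion, assuming ASCII input: skip leading whitespace, then copy non-whitespace characters until whitespace or the field width is reached. The function name and the sample inputs are illustrative only.

```python
def scanf_s(text, width=1023):
    """Simplified model of scanf's %<width>s conversion on already-read text."""
    i = 0
    n = len(text)
    # %s first skips any leading whitespace.
    while i < n and text[i].isspace():
        i += 1
    start = i
    # Then it copies characters until whitespace or the width limit.
    while i < n and not text[i].isspace() and i - start < width:
        i += 1
    return text[start:i]


print(scanf_s("hello world\n"))    # only the first word is stored
print(scanf_s("   padded input"))  # leading whitespace is skipped
print(scanf_s("abcdef", width=3))  # the width caps how much is copied
```

The rest of the line stays unread in the input stream, which is why the buffer only held the first word when examined in GDB.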
So if I do start, I can literally step through every single instruction by doing si and just keep on hitting enter, right into the scanf library code. So now you can see I'm at scanf+112. We're in the scanf library doing the scanf things, now going into a function inside of there that's going to do other stuff. If we do bt for a backtrace, it will show us the backtrace there. Oh, that's going to keep doing that. I can do si. Boom. So I can single-step all the way through this program. All kinds of stuff.
Let's restart. Let's go. There was a printf here, right? So this is before the scanf: x/5i $rip. Let's break right before this printf, main plus 22: b *main+122, continue. I need to provide input. Okay. info b for my breakpoints: main+122. disas main, main+122. Okay, my understanding of this program is not correct. Ah, there are two calls to printf. What is this? That's the hello world percent n, and this is the hello world percent s. Okay. Was that where I was trying to stop? Yeah, that's where I was trying to stop, in here. Remember, we added an if condition. It just came back to me. So I want to stop here: break *main+151, continue, restart.
Okay. So now info registers. Looking at that printf, RDI: x/s, as a string. Okay. So this is the string hello world %s. Let's say I want to change what string it is writing out. Let's see. I wonder if I can just change these bytes. set. Well, that doesn't look like it worked. Let's see, set. Whoops. Let's set $rdi. Oh, cool. It actually allocated memory and got that in there. So now I've changed it. If I do info registers, I can see RDI is pointing to that, and if I examine it as a giant piece of memory, I can see something like that. And if I do x/s, I see that that's the string testing, which GDB just created for me nicely.
And so if I continue, it should just print out testing and not hello world, whatever. Hey, there we go. So we can completely change memory. We can modify registers. We can do literally whatever we want. We are debugging this program. We have total control over what it does. Cool. All right. You'll get into scripting. So, we did this anyways. Cool.
The other crazy thing that is very cool is that there are tools for what's called timeless debugging, the idea being that you don't have to think ahead about where you want to stop the program when you run it. You can basically go backwards in time. So you can go to the end of the program and be like, oh, actually, I wanted to go back and set a breakpoint at this place. The way these effectively work is that the tool first records the execution of the program and then allows you to rewind it or replay it to any point in time. So you can check this out. There's a lot of really cool stuff. Some of these features are actually built into GDB. I think what it can actually do is hardware dependent.
Okay. With all this, oh, should we just knock this out? I think it's too early. We'll save this for next time. Yeah. Oh no, we only have 15 minutes. Okay. Okay. So I have here a challenge from Cryptoverse CTF 2022. Let's open it up in here. So basically the challenge you were given was this world cup binary. So the first thing, always: run file. It's an ELF, 64-bit, dynamically linked. It's stripped. What does it mean if it's stripped? No debug information, no function names, right? We'll just see functions at addresses. Should we run it? Clearly this was around World Cup time. Okay, predict the first place in each group. Who do you think's winning group A? You failed. But what did we learn about the program? Should we just keep guessing countries again? Yes. How do we know that we're even typing in the right country names? We don't, right?
But what do we know about the program? It's reading from standard input, and it's reading one, two, three, four, five, six, seven, eight pieces of input from us. We also know what the output is when we get it wrong. Yeah. The string "You failed." So there's probably some other string that tells us when we got it correct. So the first thing I would do is run strings on this and pipe it through less. Okay. I can see getc, fopen, puts, putchar, stdin, a start thing there. Hit enter. Oh, hey, that's interesting. So I can see flag.txt. I can see something, but it says the real flag is on remote, don't submit this. So if we went back to the challenge, you were given a binary that had a different flag embedded, and then you had access to a remote one that had the real flag. But you still have to figure out how to get it to give you this.
And now we can see something interesting. What do these look like? Countries. Cool. So we at least have the list of countries. Great. We could brute force all of these; that actually could be a way to solve this. The cool thing about reverse engineering is that nine times out of ten, everything you need is right here in this binary. So whatever crazy scheme you have to do it, go for it, try it.
Okay. So "Stage 1: predict the first place in each group." "Group %c": the percent reminds us of what? scanf or printf, right, those types of functions. We already saw Group A:, Group B:, Group C:. So it's highly likely that the %c is c for a character. "You failed." That was the string we got, the one we don't want to see. "Stage 2: predict the winner." We didn't even get to that point. If I remember correctly, we only saw the first part. So we probably have to pass both stages. "Congrats. Here is your flag." That seems like the string we want to see. Right. And then stuff about where it was compiled, the operating system, ELF information, all that fun stuff. Cool.
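For intuition about what strings is doing to the binary, here is a minimal Python approximation: scan the raw bytes and keep every run of at least four printable ASCII characters. This is a sketch of the idea, not the real tool (which has more options and also treats tab as printable); the function name and the sample blob are made up for illustration.

```python
import re


def extract_strings(data, min_length=4):
    """Approximate `strings`: runs of printable ASCII of at least min_length."""
    # 0x20 (space) through 0x7e (~) is the printable ASCII range.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


# Made-up blob standing in for a few bytes of a binary's data section.
blob = b"\x00\x01flag.txt\x00ab\x00You failed.\n\xff\x7fELF"

for s in extract_strings(blob):
    print(s)
```

Runs shorter than the minimum, like the two-byte "ab" here, are dropped, which is why strings output is mostly readable text rather than noise.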
If we were interested, we could run it in strace. Is there anything really interesting in here? No, just reads and writes. Seems pretty basic. So we want to fire up our good old friend, IDA. I'm sorry, that is the only thing I will use. I guess if you really want me to, I can use something else, but it would be just painful. Okay, disassemble a new file. We've got a new file. It's called world cup. Okay. Boom. Do stuff. See, can I change this? Well, whatever. I want to change it to dark mode, but that's fine.
Okay, so this is start. The decompiler assumes the segment .got is read-only because of its name. Cool. Just do its stuff. Okay, this is the actual, actual entry point, not the start of main. This calls __libc_start_main, which sets up a bunch of stuff and calls main. We can double-click: we want to go to main. We can hit tab to go back to the IDA view and see main. We can see the instructions here. Boom. "Welcome to the World Cup predictor." "Stage 1." We can even go further. We see the "Group %c". Interesting, interesting, doing stuff, doing stuff. Okay. "Stage 2: predict the winner."
We can look at the pseudocode here. So these are all the local variables. What's the first argument going to be here? What's the first argument of a main function? argc. Oh, that was a freebie. argc, right? And we know this one is argv and this one is envp, the environment. Cool. Okay. So we can see stuff in here. We can see characters. We can see it thinks that there's a buffer called s. It's puts of some offset. So if it's calling puts, what's puts? What argument does puts take? A string, but it just says offset. We double-click here. We can see that this is in data at hex 4020, that there's something here, but it thinks this is an offset to something else, to the soccer ball, and that doesn't make sense. That should be in there. We can go back here: move this into RAX, move RAX into RDI. Yeah. So that should definitely be a string.
So this is when the tools get things wrong. In IDA, we can hit make string. So I think I hit u to undefine it and then, wait, why did that happen? So that's a character pointer to here. That does not make any sense. Why does that go into RAX? Okay, whatever. Why does that point to this? And this is the string. Okay, whatever. This is the soccer ball. Everybody agree? This is a string with newlines and all that fun stuff; it prints out the soccer ball. We can add a comment, if we think that's important, with a slash.
"Welcome to the World Cup predictor." "Stage 1." v18 equals zero. For i is equal to one, i is less than or equal to eight, print out group, i plus 65. So what's i being used for here? Yeah. So it's iterating over the groups; in the program, it called them A through H. Yeah. So when we ran it, going here, I did A, B, C, D, E, F, G. It makes sense, right? It would be kind of silly to iterate over the letters themselves. So it's iterating zero through eight, and then it adds 65 to print out that A. So what is this 65 likely? ASCII capital A, and we can even ask IDA to show it as a character. So i plus, as an unsigned int, ASCII 'A'. So print out the group.
Then fgets. What does fgets do? fgets takes a character pointer, a size, and the stream. So we can see fgets reads in at most one less than size characters from stream and stores them into the buffer pointed to by s. Reading stops after an end of file or a newline. If a newline is read, it is stored in the buffer. A terminating null byte is stored after the last character in the buffer. So we can see where fgets is reading into v22, some local variable, plus 32 times i, 32 characters. So how much data can we put into this program for each group? How did fgets work? Say it again, louder. What? Yes, 32 bytes per group. A character is a byte. So 32 bytes max, counting the terminating null byte. And then if sub-something of this thing, v18 minus minus. That's weird. That's doing some other weird stuff.
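The two pieces just read out of the pseudocode, turning a loop index into a group letter with i + 65 and the fgets rule of "at most one less than size characters," can be sketched like this. It assumes the loop index runs 0 through 7 (the exact bounds in the binary were not pinned down above) and uses a simplified Python stand-in for fgets; the input text is made up.

```python
import io


def fgets(stream, size):
    """Simplified stand-in for C fgets: at most size - 1 characters,
    stopping after a newline (which is kept) or at end of input."""
    out = []
    while len(out) < size - 1:
        ch = stream.read(1)
        if ch == "":  # end of file
            break
        out.append(ch)
        if ch == "\n":  # the newline is stored, then reading stops
            break
    return "".join(out)


# 65 is ASCII 'A', so index 0..7 maps to groups A..H (assumed loop bounds).
group_labels = [chr(i + 65) for i in range(8)]
print(group_labels)

stdin = io.StringIO("Brazil\nArgentina\n")  # illustrative guesses
for label in group_labels[:2]:
    answer = fgets(stdin, 32)
    print(f"Group {label}: {answer!r}")
```

With a size of 32, at most 31 characters of input land in the buffer; in C the remaining byte is the terminating null.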
And then, if this is greater than seven: "Stage 2: predict the winner," and a strlen, like, crazy, and this and this and this, "Congrats. Here is your flag." Okay, so we at least have a direction, right? We want to get here, right? This is where we win and get the flag. We can pop in here and look at this. What does this do? It opens flag.txt. I see. So if there's a flag.txt file, then it outputs it. So what would we call this function? What does it look like it's doing? Output flag. And then we never have to look at this function ever again. And the name is updated here, so we know that we've already looked at it and documented it.
So let's see quickly, before we go, what is this doing? So fgets, v21 plus 32 times i. So let's look at what it thinks v21 is. So char v21: it thinks it's 33 bytes. X. What I'm using is xrefs. You hit x and it shows you everywhere in the function that v21 is used, and you can flip between them. So the address of v21 at 32 times i. So how does this seem like it's being used? Why would you do that? Yeah. Yeah. So IDA thinks that this is a 33-byte character array. But it seems like it's actually an array of arrays: there are eight 32-byte arrays.
So we can try telling IDA that. I'm failing to remember what the heck that syntax is. I think it's like, is it like this? Anybody remember? Where are my C experts? How do you do a multidimensional array? Y, X. Let's just do anything. It's the other way around. Should be the same, right? Let's just try this and see how badly that breaks things. So it looks slightly better, right? Yeah, that's right. Okay, so undo. Oh, no. Oh, there we go. Maybe you don't believe me, but IDA for the longest time did not have undo functionality. I think it was probably 2012 or 2013 that it finally got undo. It was very painful. So where were we? 32? Oh, we just did that. So [8][32]. Hey, look at that. And what do we want to call this variable? Yeah, like groups.
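Why &v21[32 * i] is really groups[i] for an array declared like char groups[8][32] comes down to row-major layout: row i starts 32 * i bytes into one flat block. A small Python sketch of that arithmetic, using a bytearray as a stand-in for the stack memory (the helper names and the sample country are illustrative):

```python
ROWS, COLS = 8, 32  # eight groups, 32 bytes each, as inferred above

# One flat block of memory, like the region IDA first typed as a single char array.
stack = bytearray(ROWS * COLS)


def row_offset(i):
    """Byte offset of groups[i] inside the flat block: 32 * i."""
    return COLS * i


def write_row(buf, i, text):
    """Store a null-terminated byte string into row i."""
    if len(text) >= COLS:
        raise ValueError("text does not fit in one row")
    start = row_offset(i)
    buf[start:start + len(text) + 1] = text + b"\x00"


def read_row(buf, i):
    """Read row i back as a C-style string (up to the first null byte)."""
    start = row_offset(i)
    return bytes(buf[start:start + COLS]).split(b"\x00", 1)[0]


write_row(stack, 2, b"Brazil")
print(row_offset(2))       # where groups[2] begins in the flat block
print(stack[64:70])        # the same bytes seen through the flat view
print(read_row(stack, 2))  # the same bytes seen as groups[2]
```

Both views address identical bytes; retyping the variable in the decompiler only changes how the expression is displayed.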
And now look how much nicer this line is, right? This is a line that you would actually write for this code. You would have a multidimensional array on the stack. And you would say, okay, fgets into groups[i], and then pass that to this sub, this function. We probably have to figure out what that function is. So we could go in here. Usually I like to call it something like first check. I'd probably call this, not a type, oops, sorry, maybe check group. I can go in here. So it takes a character pointer; hit F5 to make it use that, and call this group.
Okay, so group at string length of group minus one is set to zero. For i is equal to zero, i is less than or equal to 31, i plus plus: if not string compare of group and, we have a new thing, an offset at 404-something indexed by i, return one. Otherwise return zero. Ah, there are our friends. You see our friends in here: all the countries. So let's call that countries. And it looks like this is an array of offsets, where each of these points to a string. So I would probably call this a character pointer pointer. It's an array of character pointers. I guess we could say exactly what they are. We can go back to the pseudocode and run it again. Oh, I don't want to have to call it a const.
So what does it look like this function is doing? Oh, we're way over time. Oh, not way over time. What does it look like this function is doing? Yeah, checking if the input is one of these array elements. Kind of cool. Right. So I'd probably then rename this to is known group, or is known country name. And so now I'm moving forward, and now I know how to manipulate this. And next week, or, on Wednesday, we will look back at this and finish it, because we're about halfway there. Good stuff.
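Put together, the check function as read from the pseudocode looks roughly like the following Python sketch: chop the last character of the input (normally the newline that fgets kept), then compare against each entry of the countries table and return 1 on a match, 0 otherwise. The name is_known_country and the two-entry list are placeholders; the real table in the binary has 32 entries that are not reproduced here.

```python
def is_known_country(group, countries):
    """Sketch of the decompiled check: 1 if the input names a known country."""
    # Mirrors group[strlen(group) - 1] = 0, which drops the final character.
    # (The guard for empty input is added here; the C code has no such check.)
    if group:
        group = group[:-1]
    # Mirrors the loop of strcmp calls over the table of string pointers.
    for name in countries:
        if group == name:
            return 1
    return 0


# Placeholder table for illustration only.
countries = ["Brazil", "Qatar"]

print(is_known_country("Brazil\n", countries))    # newline stripped, then matched
print(is_known_country("Atlantis\n", countries))  # not in the table
print(is_known_country("Brazil", countries))      # no newline: the 'l' is chopped
```

The last call shows a side effect of the unconditional chop: input without a trailing newline loses a real character and no longer matches.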