 All right, so today is when assignment six is assigned. Yay. And there's month rejoicing. OK, so you actually have a lot of time. I know so we're going to cover, like assignment six is going to touch on some of the stuff we're talking about today and on Thursday. So to give you enough time to do this, the due date is basically the last date I can kind of have it because the grades need to be due the next day. So the assignments due on the 8th, on December 8th. So no late submissions for this because I literally have to get grades out. So consider that a final deadline, whatever the scores are at that point, I'll take a snapshot and that'll be the scores. Question? Are we going to do anything like reviewing the final? No, not in class. It will be on your own to do that. OK, so this submission, so basically we're at this point putting together everything that we had talked about. I didn't update this. That's all right. I can do it right now. We are going to be putting together everything that we have talked about beforehand, specifically basically what you'll be given is access to a Linux server. This Linux server has a number of challenges. And every challenge that you solve is worth 15 points. The assignment itself is worth 100. You can get up to a maximum of 120 points on this by solving the levels. You can't go over that. That's like a hard cap. And I may add new levels. We'll see kind of how we were doing. Network is unreachable. That's fun. We'll be able to, oh, no, because I want to actually review those menu. Connected. There we go. OK. So this is a shared Linux server. So you all have accounts on this server. You'll need your username and password. That will be on the submission site. You will not be able to get your username and password until after this class. So at the end of the class, I will run the commands to update the things to be able to see it. Basically, OK, there we go. It's going. So why did I do that? OK, good. 
OK, so I'll actually SSH into the server to show you. Inside the directory /var/challenge are a series of directories, each of which is a challenge. Actually, let's just go ahead. Are you guys deliberately messing with my network? You're still getting the assignment regardless. No network selected. I feel like I shouldn't be typing this in front of you. Password. Obviously, password one. I have to create good passwords. Yeah, there we go. OK, cool. So before we get kicked off: if you look in /home, you'll see that there's a user account for each of you. If you chose an obfuscated username, it is basically one of these random hashes. If we do ls /var/challenge, we'll see that there are one, two, three, four, five, six, seven, eight, nine different levels here. And maybe I'm going to add one, maybe not. We'll see what's going on. Yeah. Is that nine levels, 135? Yes, hard cap at 120. Yeah, so the nice thing is, you can choose which challenges you want to solve. There's a score command that will show a scoreboard of how everyone's doing, so you'll be able to see how you're doing in relation to everyone else in the group. Let's see. So if I go to my user, for instance, I'll give you all a hint since you're here: there is a challenge called just execute me. So if we look at this, why is it yellow? If it were setuid, this bit would be an s, right? It's not an s. Yeah, it's the group ID. So the s is on the group ID, which means that when you execute this command, it's going to execute with the privileges of this group, just execute me. And what happens when we execute it? Well, there you go. So I'm the first person on the scoreboard. You all need to catch up. OK, so just execute me, done. Other ones will be more difficult, I hope. So there's another thing you'll need to do for some programs. Let's say you exploit a program and get it to execute arbitrary commands for you.
What you need to do, so the whole way this all works, is getting added to different group levels. So this leak command that I've provided, L-E-E-T, is a command that will, when you execute, it will automatically add you to the group that you're a part of. So just execute that command, you'll see yourself on the scoreboard, it should be very simple. It's also explained here. Score, good, eval. OK, so submission instructions. So if you write source code for this, which you may write, I don't know, Python script to do something, submit all of your source code that you wrote for the assignment, please, and a read me. So the read me needs to contain name, ASUID, and a description of how you broke each level. So just create a text file. And as you're solving this and breaking each of the levels, have a description of how you broke that level. Any questions on that? Can we write the source code? Just like you write any other source code. Those need to transfer it to the server. Or you could write it on that system, too. There's Emacs and all your favorite text editors are there. Yeah. Is this the reason you signed up here? Yes. Oh, not for you, though. For me, it is. That's a pretty question, though. Yes. Yes. This is what I said, I'll release it at the end of class. None of you have your username and password, so you don't know how to get access. So is the objective to just, where are we trying to get out of these challenges? Are we trying to get flagged or what? It will be you need to basically get, on a technical basis, you can add into these groups. This just executing group, read select group, rock me group, the groups group. So you can get added to that group. It's challenge dependent as to what you need to do to get there. So I don't want to reveal too much about it. But it should be clear based on interacting with it. Some of them, like just executing, will call lead for you. So you don't have to do anything. 
You have to call the lead program yourself as that user. But we know, so all of them, I believe, all of them are set group ID with their own group tidy up. So when you execute them, that program will be executing with the privileges of the just executing group. So that way, when you execute the lead command, it will add you to that level. So if you have any questions, post on Piazza. You guys are very good about helping each other. I very much appreciate that. It's kind of arbitrary. Does the scoreboard post our obfuscated username that you generate? Yes. So all of your user accounts on here, if you chose to be obfuscated or a completely obfuscated thing, that's all that shows up on here. And I'll of course map that back when I give you points. I hope not. What do you mean by writing source code for this assignment? So for some of these levels, let's say, I don't know, if I was making up an example that you've already done, let's, maybe I want to do this as a challenge. There's one I'm not going to do. So let's say there was a challenge that was, every time you executed the program, it gave you a hash and then said, give me a password. And you had 10 seconds to get that password. You, as a human, probably couldn't figure out how to do that. So you may write a script, like a bash script or another program in order to interact with that script and automatically break the password. And I'd say you have to write any source code. I'm saying, if you do, submit that. Just like you do normally, just submit all your files. Like it's like a group. You can just select all the files that you want and upload it. We'll take it from there. Yeah, it's not getting automatically graded or anything. Just part of it so that we have it. Cool. All right, the fun bit at the end. There's a bug bounty for this. If you manage to get root access to the server, so if you manage to root the server, you'll get an additional 50 points on this assignment. 
Is that past the 120 cap or up to the 120 cap? Past. Nobody has yet won any of these bounties. I don't figure it'll happen. I'm just asking. It's possible. You never know. Is that the same server you use for other assignments? I don't know how to answer that. Oh, is it the actual same server? Yeah. Oh, no, not the same machines. They're separate machines. It's still Ubuntu 18.04, but they actually can't even talk to each other at all, and it's all running on Amazon. Also, be nice on the server. I've hardened this over the years: disabling the wall functionality so you can't chat with each other by default and, what else did I do? Oh, yeah, adding process limits so that you only have a max of 100 processes running per user, because otherwise you can easily do a fork bomb and take down the entire server, which has happened in the past. So please be cognizant that there are 99 other people using this machine. If you have any questions, just ask before you do something crazy. You'll do great. I have the most faith in you. And then at the end, I'll just run this, and you'll be good to go. All right, let's rock and roll. OK, so where did we stop? We're going to cover some core vulnerabilities here, and I believe we'll get through almost everything we need today so that we can get there on Thursday. We first need to understand a little bit about something we looked at before: how does an application ask the operating system to do something on its behalf? A system call. And how does it do that? What are the steps and the procedure for that to happen? What was that? You put a value in a register, and then the operating system acts on it.
Yeah, so we need to figure out the numerical value that corresponds to whatever system call we're interested in, put that in the EAX register, put the parameters to that call in other registers, and then execute an interrupt instruction, int 0x80, to trigger an interrupt to the operating system, which will know how to handle it. And so we looked very briefly at this Hello World x86 assembly, which you can download and run on your own machine. So you have a main function. We'll move four into EAX, which is the system call number for write. We're going to move one into EBX, which is the file descriptor we want to write to; we know zero is standard input, one is standard output, and two is standard error. We are then going to move the address of the Hello World string, which ends in a newline, into ECX, move 12 into EDX, and call int 0x80, which will write out 12 characters from the string that's pointed to by ECX, so that should be these 12 characters. And then that system call will return, we'll move zero into EAX, and we'll return. Wait, so system calls are made on behalf of the application, right? Right, so the operating system provides a lot of functionality to applications, such as printing output to a console on standard output. An application itself doesn't know how to do that. It doesn't know what devices you have or how to do output on them, so it asks the operating system to do it, and then the operating system figures it out. Similar things with opening up sockets, so making TCP connections to remote machines: all of that is handled in the operating system, so the application needs to ask the operating system to do that for it.
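As a rough illustration of the same request made from a higher level, here is a minimal Python sketch. It is not the lecture's assembly file. It assumes the string is "Hello World" plus a newline (12 bytes) and uses os.write, which hands a file descriptor, a buffer, and an implied length to the kernel's write call (system call number 4 on 32-bit x86 Linux). The names MESSAGE and write_message are made up for this example.

```python
import os

# Assumed contents of the lecture's string: 11 characters plus a newline = 12 bytes,
# which is the count that the assembly version loads into EDX.
MESSAGE = b"Hello World\n"

def write_message(fd=1):
    """Ask the kernel to write MESSAGE to file descriptor fd.

    fd 1 is standard output (0 is standard input, 2 is standard error).
    Returns the number of bytes the kernel reports as written.
    """
    return os.write(fd, MESSAGE)

if __name__ == "__main__":
    write_message()
```

Running the script under strace on Linux should show the underlying write call, which is one way to connect the Python line back to the EAX/EBX/ECX/EDX setup in the assembly.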
Okay, so then the question becomes, what actually happens? How does your program go from being bits in a file, ones and zeros in a file, to being a process that's running on your system? Who's in charge of doing that? Is it also the OS? Also the OS. Why? It has to find a space in memory to put your program and stuff. Yeah, it's managing all of the hardware, right? It's managing the memory, the file system, and all the file system permissions, so it needs to know: do you have the permissions to even execute this program? What permissions should that process have? What user ID should it be running as? What group ID should it be running as? All of this complexity. And so it's the operating system that creates a process; you can think of it as taking this ELF file and turning it into an actual running process. And this happens every time you're at the command prompt and you type in something like ls: your bash program is asking the operating system to execute the ls program. And so it has to go through this process every single time. It's basically all encapsulated in the ELF file; you can think of the ELF file as having all of the information necessary to load that process into memory. Very cool thing, if you've ever poked around on Linux, and I think you can do this on our system, I don't know if you have access to the proc file system: on Linux, /proc is a pseudo file system that shows you information about everything that's running on your system. So you can go to /proc/<PID>/maps to show the memory layout of a process. Why don't we look at that real quick? I believe it's cat.
So this shows you, so self is the currently executing process. This looks at the memory mapping, and it shows you that /bin/cat is mapped to these memory regions, readable and executable; /bin/cat is mapped to these regions, readable; and then all of the libraries, so the .so files are the libraries dynamically linked into this application that are loaded at runtime, all loaded at these different locations. And if we look at /proc, we can actually see all of the other processes that are running on the system. I think if we did this, we would see those ones, yeah. Oh, that's right, I have the proc file system set up so that you can only see your own processes; you can't see other people's, right? That's a security measure. So the operating system uses the information in the ELF file to do that, and if any objects need to be relocated, so we talked about maybe moving code and data in the application around, that happens there. And then finally, it sets the CPU's instruction pointer to the start location of the application, and control flows from the operating system to the application. And then bang, now it can start executing. Questions? The magic of the process. Okay, so we need to dig in a little bit and understand the memory layout of a process as it's running. Here we're only looking at 32-bit applications, so what's the size of the memory we need to consider? Four gigs, yeah, which is basically address zero up to 2^32 minus one. So zero is going to be the low address, and the high address is, yeah, all F's is how I think about it in hex, right? 0xFFFFFFFF. It's basically 32 bits of all ones. And the way we're going to draw basically all of our memory layouts and stacks is with high addresses on top and low addresses on the bottom.
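The maps listing shown in the demo can also be read from a program. The following is a minimal Python sketch, assuming a Linux system where /proc/self/maps is readable; Mapping, parse_maps_line, and read_own_maps are names invented here. It prints the regions with high addresses first, to match the way the memory layouts are drawn in this lecture.

```python
from collections import namedtuple

# One row of /proc/<pid>/maps: address range, permission string, backing file (if any).
Mapping = namedtuple("Mapping", "start end perms path")

def parse_maps_line(line):
    """Parse a line such as '00400000-0040c000 r-xp 00000000 08:01 1234 /bin/cat'."""
    fields = line.split(None, 5)            # the pathname is optional and comes last
    start_hex, end_hex = fields[0].split("-")
    path = fields[5].strip() if len(fields) > 5 else ""
    return Mapping(int(start_hex, 16), int(end_hex, 16), fields[1], path)

def read_own_maps():
    """Return the mappings of the current process (Linux only)."""
    with open("/proc/self/maps") as maps_file:
        return [parse_maps_line(line) for line in maps_file if line.strip()]

if __name__ == "__main__":
    # High addresses on top, low addresses on the bottom.
    for m in sorted(read_own_maps(), key=lambda m: m.start, reverse=True):
        print(f"{m.start:#014x}-{m.end:#014x} {m.perms} {m.path}")
```

When run, the entries here belong to the Python interpreter rather than to cat, but the same kinds of regions appear: the executable, its shared libraries, and regions labeled [heap] and [stack].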
And, well, we'll say by convention, on a 32-bit operating system the kernel usually reserves the top one gig of memory for itself, which the application can't touch, obviously, and the remaining three gigs are for the program. So the program actually only gets three gigs of memory, from zero to 0xBFFFFFFF. It's slightly different for a 32-bit application running on a 64-bit kernel, but that doesn't change what we're looking at here. So then, what are all the things that go into an application's memory, a process's memory space? Going back to this diagram, what is actually inside this pink layer here of the program? The instructions? The instructions, the program's code. Why does it need to be there? So it can run, yeah. This is not a trick question, right? You need to execute the instructions, so the instructions are part of the program's memory space. What else? Space for dynamic memory allocations. Specifically what? What is dynamic memory allocation? The heap, yeah. So we need a space to ask for more memory: the heap is what you use in C when you call malloc, or in C++ when you call new, whenever you need dynamically allocated and freed memory. Great, yeah. Isn't there memory for, like, stack frames, when you do function calls? So you need memory for your stack, right, and your stack can grow depending on your function calls. We'll look at that in a minute, so it's not too crazy. What else? What other types of memory regions are there in the application? Or should there be, based on what you know? Like, an application needs to reference things in its memory. Yeah. Globals? Globals, so global variables need to be stored somewhere. What else? Yeah. Libraries, maybe? Libraries. Libraries get loaded and placed in this space too. What about the arguments that you pass into a function?
Right, those need to be, so when you're calling, invoking a function, when you're invoking a program from the command line, it needs to pass in whatever command line parameters you gave it into this process space so that it can read those commands. What about the environment? Can you read environment variables from a process? That's stored, well, we'll see, that's stored actually in the application in the uppermost region, so. Diving into this, we basically again have the top of memory to the bottom of memory, and we first have the environment, so we have all of the environment arguments at the top of memory, followed by any argv strings, so the argument strings, followed by the pointers to these things, so what arguments are passed into a main function in C or C++? Yeah, what are the arguments though in your writing, like a main function? Yeah, argc and argv, so what's argc? The number of arguments that are passed in, and what's argv, what's the type of argv? Yeah, it's a pointer to a character array pointer, right, so it said, and you know how many character pointers there are to go through because of argc, so it will tell you exactly how many, and you know what the third parameter is, there's actually a third parameter that you can put down the environment variable, the environment variable format, so it's an ENVP pointer, and I believe, let's see, it's a character pointer pointer, and you know you're at the end when you finally hit a null, so you can iterate over all of them, and each of those strings is a key equals value pair, so in there you'll see home equals something, all of the environment variables, so all those need to be stored in memory, so you first have the strings, then you have the pointers to those strings, and then you have argc, which is the number of arguments to ENV pointers, then we have our stack, so our stack is gonna basically start right after that, and it's going to grow down, we'll see kinda how the stack is used, but it becomes 
incredibly important to understand a buffer overflow of vulnerabilities and how to exploit stacks, a stack overflow, well, no, a buffer overflow on the stack, let's say. So the stack, maybe it's a good thing to start, actually right after argc and grows down, and it's going to change depending on how the program executes, every time a function is called more data, more of the stack will be used, and then as functions return, the stack will go up and keep going up and down depending on the execution flow of the program. Do you have an infinite stack? No, how do you know that? We're gonna reach the bottom of memory at some point. Have you ever encountered that? Yeah, how? When, for example? Yeah. Yeah, recursive function where you mess up the parameter, so it calls itself again and again, every call adds a function frame on the stack until finally you overflow memory, and I think you'll just get a seg fault on a C program, and Java will probably tell you that you ran out of stack space, so you're gonna stack overflow exception, I believe, but it's OS dependent. So at some point it ends, and you'll have some shared libraries in there like we talked about now, so you basically have one section that can kind of grow unbounded, which is the stack, and you also have this other place that can kind of grow unbounded, which is the heap. So the heap, you can keep allocating memory over time, and so kind of from a design perspective, you have two pieces of memory that are gonna be growing and shrinking over time. If you put the stack here, and then you decided to put the heap here, you've essentially fixed the amount of stack space that you can use, so the idea is put them at offset points and grow them in opposite directions so that the heap will start at the bottom and grow up at a certain point. And then finally we'll have our data section with our uninitialized variables, our initialized variables, read-only variables, and then finally our code segment at the bottom. 
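The runaway recursion described above can be reproduced safely in Python. This is only an analogy: a C program would exhaust its real stack and typically seg fault, whereas the Python interpreter counts nested calls and raises RecursionError before that happens, much like the Java exception mentioned. The function names are invented for this sketch.

```python
def recurse_forever(depth=0):
    """Deliberate bug: no base case, so every call adds another frame."""
    return recurse_forever(depth + 1)

def hits_stack_limit():
    """Return True if the runaway recursion was stopped by the interpreter."""
    try:
        recurse_forever()
    except RecursionError:
        # The interpreter refused to nest calls any deeper.
        return True
    return False

if __name__ == "__main__":
    print("ran out of stack:", hits_stack_limit())
```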
Is this memory layout shared amongst processes? No, so every process thinks it has exclusive access to this memory. The operating system plays tricks with memory mapping so that that's not actually the case. So yeah, if you look at virtual memory management in operating systems, that's the process by which it lies to each of these applications to make them think that. And if you think about it, an application doesn't need all three gigs of this memory, otherwise you'd never be able to execute this. So the operating system intelligently figures out where to put these in physical memory so that you're using as much physical memory as possible. And it can do cool things like, if you run out of memory, it can decide, hey, you haven't touched this data segment in a while, I'm gonna page this out and put it on disk. And so it'll stay there until you need it, in which case it'll bring it back in and put something else on disk. Will it do the same thing if you open a new process that will print to the size of the current process if it knows that you're right now? It will page if it needs to. Yeah, I think you can share some libraries between processes, but I'm not sure. I think it may basically do copy-on-write, where processes use the same memory page, and if you try to write to it, it'll make a brand new page just for you, and you can use that one. But yeah, operating systems are complex and awesome because they handle all of this stuff. So an application has this nice, beautiful view: it thinks, I just have this memory layout, and it doesn't care about anything else that's happening on the system. Cool. All right, so now that we've touched upon the life cycle of a process and how a binary application becomes a running program, we wanna look at this problem: you are given a binary, how do you know what it does? In which cases may this happen? Like, why would you want to know? Yeah. Maybe if you download something?
Yeah, you just downloaded some garbage from a malware site. You wanna, it says it's a password cracker or a key gen. How do you know it's not putting additional crap on your computer? Do you have the source code to that? Probably not. Probably not, yeah. What are the circumstances? Yeah. Maybe once you like what architecture it can run on. Oh, I'm really excited to see this. Yeah, maybe you've downloaded some program, you don't know the specifics, you wanna understand what it's targeted for. What else? You're doing assignment six? Also very good, yeah. What else? We're not preparing you just to do an assignment, right? Preparing you to go do other stuff outside of the class. Yeah. I wanna see if it's in a couple, if it actually is in a couple of ways. Yeah, you wanna know, so maybe it's, maybe you're the military or a military organization and you bought this piece of software from a third party and you're about to deploy it on all the machines in your network. Does it do what it's supposed to do? Or you bought this router, you're gonna put this router in your home network or your network. Does this router have a hard-coded password where somebody can just log in with admin admin or admin magic one, two, three? They've actually had cases of this, like hard-coded passwords burned into the code, which were like administrative or maintenance functionality from the manufacturers themselves. So other things would be you wanna see if there's any vulnerabilities in this piece of software and you don't have access to the source, which happens basically in almost all non-open source software, right? So all of these circumstances, you want to be able to understand the behavior from a binary and you do not have access to the source code. So how do you do that? Seems like magic. Do you start looking at the ones and zeros? So in general, how do you mean, like how would you try approaching that, yeah? 
Generally you can decompile it back into assembly code that's a little bit more readable to people. Right, so rather than looking at the ones and zeros, you can try to take that step back and look at the assembly version, and then study that to try to figure out what it's doing. What else can you do? Yeah. So symbols, so specifically what type of thing are you talking about? Functions and objects that are in the code. Yeah, so look at function names in the code, if the developer left those in. Any other types of clues that are in the binary itself? What else, yeah, you just said that. Like, he just said it turns into an assembly code, I'm sure. I think the one like I did in the binary. The decompiler, so yeah: disassemblers, which we'll look at, take a binary to assembly, and then there are other programs, decompilers, that try to take the assembly code and turn it back into C code. It's usually not very readable, but yeah. Could you execute it and, like, see? Execute it and see what it does, yeah, in a safe environment, right? That's definitely another way. And I guess the other circumstance where you have a binary, you don't know what it does, and you need to figure it out is a capture the flag competition. You're given some binary, and you need to either identify a vulnerability in it or reverse engineer it to figure out a password that's somehow embedded in there. And oftentimes people will lock in on just one thing, like, okay, I gotta take this binary and decompile it and disassemble it and figure out what it does, when they never even execute it to get an idea of what it's supposed to do in the first place. So the first step of the process is disassembly, and that's basically looking at the binary and starting to go through and translate it back into assembly instructions. Are these trivial programs or complicated programs? How easy is it to write one of these?
Could you do it conceptually? What is it doing? I would assume it's reading the ones and zeros and translating them into the machine instructions they map to, and since assembly is essentially a naming of those instructions, it's just mapping them. Yeah, so it's looking at, I think of it as hex values because that's on the order of bytes, which is how we think about it, but it looks at the bytes and says, okay, which instruction is this byte code? It's basically like an assembler, but in reverse. I mean, that's essentially what it is: it's looking and saying, okay, this is an add EAX to EBX, boom. And this next instruction is this. This is pretty easy. The question is, how can you distinguish between what's code and what's data in a binary? There's flags before the data, isn't there? There can be flags before the data, yeah. Or let's think of it another way. So you start, let's say, at the first instruction, because you know the entry point of the application, which is in the ELF header. And then what do you do? You start disassembling from there. What happens when you get to a statement that says jump to some dynamically computed offset? And the reason why that is tricky is because, specifically on x86, instructions have a variable length, anywhere from one byte up to fifteen bytes. So depending on what byte you start at, you may get completely different disassembled instructions. So anyways, disassemblers can be very simple, but to do them correctly, there are actually a lot of interesting quirks in here. So one type, a linear disassembler, basically just starts at the start and tries to parse instructions linearly as best it can. Recursive disassemblers try to follow the flow of the program. So they'll disassemble, disassemble, and when they get to a jump instruction, they'll start disassembling from its target. And then when there are other branching or call instructions, they'll start disassembling from there.
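To make the variable-length problem concrete, here is a toy linear-sweep disassembler in Python. It is a sketch, not a real tool: it only knows four genuine 32-bit x86 encodings (nop = 0x90, ret = 0xC3, int imm8 = 0xCD, and mov r32, imm32 = 0xB8 through 0xBF) and prints anything else as a raw byte. The function names are invented for this example.

```python
import struct

# Register order used by the 0xB8+r "mov r32, imm32" encoding.
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def decode_one(code, pos):
    """Decode the instruction starting at code[pos]; return (text, length in bytes)."""
    op = code[pos]
    if op == 0x90:
        return "nop", 1
    if op == 0xC3:
        return "ret", 1
    if op == 0xCD and pos + 2 <= len(code):
        return f"int 0x{code[pos + 1]:02x}", 2
    if 0xB8 <= op <= 0xBF and pos + 5 <= len(code):
        (imm,) = struct.unpack_from("<I", code, pos + 1)   # little-endian immediate
        return f"mov {REG32[op - 0xB8]}, 0x{imm:x}", 5
    return f".byte 0x{op:02x}", 1                          # outside this toy's table

def linear_sweep(code, start=0):
    """Decode instructions back to back from start, as a linear disassembler does."""
    listing = []
    pos = start
    while pos < len(code):
        text, size = decode_one(code, pos)
        listing.append(text)
        pos += size
    return listing

if __name__ == "__main__":
    ambiguous = bytes.fromhex("b8909090c3")
    print(linear_sweep(ambiguous, 0))   # one 5-byte mov
    print(linear_sweep(ambiguous, 1))   # three nops and a ret
```

The same five bytes decode as a single mov when the sweep starts at offset 0, and as nop, nop, nop, ret when it starts one byte later. That is why a disassembler has to know where instructions really begin, and why a jump to a computed address is hard to follow.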
That way, recursive disassemblers try to find as much code as possible. There are nice disassembler tools you can use. I suppose I should put Binary Ninja here, but I've actually never used it myself. I know radare is a program analysis tool. And what we're trying to do is this idea of reverse engineering, right? So what's the traditional engineering process? Requirements, then design, and then. Implementation. Implementation, and then. Deployment. You can think of deployment, which means compiling and all that stuff down to a binary. Right, so the reverse engineering process is starting from a binary and going back up that process. So going from the binary and saying, okay, can we try to figure out what the source code is? What's the logic behind this? Maybe what the requirements are, the design, even all the way back that way. And so this is why it's often called reverse engineering: figuring out what this binary is supposed to do. You can do scripting in there, it's free, so you could check it out. IDA Pro is the state-of-the-art tool for reversing. It costs, anybody know offhand? I think it's like 1500 for a license, for a single user for one architecture maybe, and other architectures are like 500 a pop or something. And then if you pay, I think, another $2,000 or $3,000, you can get access to the Hex-Rays decompiler, which tries to then take that assembly up to the C level. That's a very weird C level, yeah. Very compact, about a second. Yeah, so the cool thing is, you can integrate it with GDB and other debuggers. You can actually script it in Python to do cool things with binaries. It's a commercial product that's expensive, mainly because it is basically the industry standard for reversing tools. There's a free version, and I actually think it's IDA 7 now, that's the free version.
So there's actually a decent version out there for free that you can download and play with and start messing with. It won't have the Hex-Rays decompiler, but that's okay. Actually, a tool I use is called Hopper. It is a decompiler, and it's a commercial product. You can use it for free for like 30 minutes, and it's only about $90 if you decide to pay for it. It's actually pretty decent. I don't know, all these tools to me are kind of the same, except that IDA is a huge pain in the butt to learn how to use. The user interface was designed by, I don't know, a charitable way to put it is somebody who doesn't know how to design interfaces. Like, there was no undo functionality for a while. Like, there are certain keys, yeah, anyways, buttons do various things. So, hey, these are tools. There's also objdump, which we looked at here, which is just a command line tool, and for a small binary you can really use it to try to figure out what the binary is doing. For instance, very quickly, we'll make time. So I can use objdump on /var/challenge, just execute me, and I can look at main. So, what it's doing is, we'll see there's some preamble. It's pushing some things. And again, we should see this right away: it's not 32-bit. How do we know it's not 32-bit? What's that again? The registers start with R. Yeah, so a couple different ways. The registers start with R, right? On 32-bit they start with an E. So R, I guess, is like really extended. I actually just made that up. I don't know if that's true, but it probably is. Which means it's a 64-bit register. Plus you can see that this memory address is a 64-bit memory address. This is a 32-bit memory address, but. Anyways, we could go through this more and see the assembly instructions from this binary and what's happening. It's doing this, moving zero into EDX, moving zero into ESI, a load effective address. It's IO standard in, blah, and it's calling execve.
I believe. And then we can do things like run strings. Yeah, so by disassembling it, you'd see that this is just executing /usr/local/bin/lead. So that's basically all this binary does: it executes that other binary. Okay, so we've looked at different ways of how to understand an application and how applications actually work. And so when you're talking about attacking a system, like in a pen-testing scenario, we talked about how you may need to do reconnaissance: maybe do a ping scan to figure out what systems are active on a given network range, then port scans against the systems that are up to determine which ones are running what services, and then maybe try some kind of remote exploitation against a network service. You can also try to remotely exploit the operating system, which is very rare nowadays. Why is that the case? How do you remotely exploit the operating system? You talk to it remotely, right? So over TCP/IP or UDP/IP. Say there's a bug in the TCP/IP stack implementation. You can cause the operating system to fault, which kills the machine, or maybe you can exploit it that way. How bad is it if your operating system can be exploited remotely? Very bad, it's like the worst thing that can happen. So in the early 2000s, Microsoft had a huge problem with worms, which happened because there were remotely exploitable vulnerabilities in core Windows systems. And so they took a really hard look at themselves and created this Security Development Lifecycle and all these other techniques to improve the security of all their products, and specifically the operating system parts that can be exploited remotely. So over time, that stuff gets hardened a lot. So what we're gonna be focusing on is mostly local attacks against setuid applications, because we don't have time to go into all of these types of things.
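As an aside on the strings step mentioned above, here is a rough sketch of what a strings-like tool does: it scans the raw bytes of a file and keeps runs of printable ASCII characters. This is a simplified stand-in and not the real tool's implementation. The byte blob and the path inside it are made up for illustration.

```python
import re

def printable_strings(data: bytes, min_len: int = 4) -> list:
    """Return runs of at least min_len printable ASCII bytes, like a tiny `strings`."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [match.decode("ascii") for match in re.findall(pattern, data)]

# Hypothetical bytes standing in for a slice of a compiled binary.
# "ELF" is only three printable characters, so it falls below min_len.
blob = (
    b"\x7fELF\x02\x01\x01\x00"
    + b"\x00" * 4
    + b"/usr/bin/example\x00"   # made-up path embedded in the "binary"
    + b"\xe8\x01\x02\x03"
)

print(printable_strings(blob))
```

On a real binary you would read the file in binary mode and pass its contents in. Paths that the program hands to execve tend to show up this way, which is why strings is a quick first look before disassembling.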
So this is more hints or tips, or if you're trying to go after that bug bounty, this is how to do it. A local attack is where you're a local user on the system. We talked about the difference between local and remote users. You're a local user on the system and you want to get root privileges. So you're trying to escalate your privileges from your normal user account, which can't do anything. When you play around on that system, you won't be able to look at other people's files, and you won't be able to mess with other people. But if you're root, if you're me, you can do that. You can do basically whatever you want. So, in order to escalate to root, there are basically two ways. You either exploit a setuid root program to get those root privileges, or you target the operating system kernel itself, if it's a kernel with a known vulnerability that has an exploit you can run that will pop a root shell for you. So, to attack SUID applications, go back and remember that picture of the application model, of how inputs get into the application. We can think about changing the input to the application. So, the command-line arguments and the environment: we're the ones executing the SUID application, so we control all of these parameters. Possibly during execution we can try tricks with dynamically linked objects, which doesn't really work, but also file inputs, socket inputs, any other kinds of inputs to the program at runtime. Then there's interaction with the environment. Is it creating files? Is it accessing files? Can you trick it into reading from a configuration file of your choosing? Can you send signals to that process? Does that process invoke other commands? And this is kind of your high-level roadmap of how to try to exploit these applications. We will specifically look at three different types of things here. This is a slide that's basically taken from my grad 545 class.
So, if you're interested in that, we go over all these things in depth there, as well as web vulnerabilities, and we go more in depth into the network stuff that we covered. Any questions so far? So, how do applications access the file system? Does an application make a connection to the SATA port that your hard drive is plugged into? Does it issue SATA commands to it? Yes. Yes? Yeah? System calls. System calls, through the operating system. Again, if you think about it, the applications that you write do very little on their own. Which is a good thing, because otherwise you would have to write all of that logic and that functionality. You would have to remember how to load-balance an SSD so that the wear leveling happens. So, okay, but more specifically, how do applications access files? If you think of it like an API, what's the API like for an application to access files? Yeah. So yeah, you request either a pointer to the file, or you request the operating system to open the file and give you a file descriptor. Both of those you can think of as conceptual pointers, in the sense that they both point to some file. You get this opaque reference to a file, where you don't actually get the file itself, and then you can pass that opaque reference to other functions to read from it, write to it, all those kinds of things. Great. How do you tell the operating system what file to access and read and write? Have you thought about this much? What is it? The name of the file? How do you do that? Yeah, maybe. There are different ways to do that. Yeah. You can tell it where it's located. So you tell it where it's located, how? Pretend I've never done this before. You give it a path to the file, like C:\Users\myfile. Okay, but we're never thinking about Windows, so. Yeah. Yeah. In Linux, okay. You give it, you. Okay, so how do you do it? How do you do it?
You give it a path directly to the file, like /etc/whatever. So you just give it the name, okay. /etc/whatever. Okay, great. What are other ways to do this? Actually, what does this mean? So let's say we say open this file, yeah. What does that mean? Yeah. That means start at the root of the file system, then go into the directory called etc, and then inside that directory look for a file or directory called whatever. And open that up for, depending on what flags you pass, reading, writing, append, whatever we want to do. And sorry, just to clarify, it's something like open("/etc/whatever", ...), and there are other arguments in here. I'm just gonna add a lot of dots. Fill in the dots. What are other ways, actually? Yeah. Instead of going from the root of the file system, you can use other variables, like your home directory variable. So we can use something like ~/whatever. So what does this mean? Yeah, so actually we'd need to look up exactly whether HOME will be expanded here or not. I think so. (Strictly speaking, open itself does not expand the tilde; the shell does that, or the program does it itself by reading HOME.) But this may mean: expand this using your environment, specifically using the HOME variable, substitute that in for the tilde, and then open up that file. So think about it this way. For every user that executes an application that's written this way, are they gonna open up the same file? Let's say running on the same system. Yeah. Yes, why? Prove to me that that's the case. The HOME variable is the same one, so it's the system. No, this one, right here, this one. This one's the absolute. It's the absolute path, which means that every person on the system, no matter where the application is executing or who the user is, will try to open the exact same file. What about the second one? Depending on the implementation, they can have different HOME variables. Each of you on the system will have a different HOME variable. Your home is /home/ followed by your username. And which makes sense.
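To make the absolute-versus-HOME distinction concrete, here is a small sketch. It assumes a POSIX system, where Python's os.path.expanduser reads the HOME environment variable when it is set. The file name .programrc and the user alice are made up for illustration.

```python
import os

def per_user_config(home: str) -> str:
    """Expand ~/.programrc the way a program would if it trusted the caller's HOME."""
    saved = os.environ.get("HOME")
    os.environ["HOME"] = home  # the caller, not the program, chooses this value
    try:
        return os.path.expanduser("~/.programrc")
    finally:
        # Put the environment back the way we found it.
        if saved is None:
            del os.environ["HOME"]
        else:
            os.environ["HOME"] = saved

ABSOLUTE = "/etc/whatever"  # identical for every user and every working directory

print(ABSOLUTE)
print(per_user_config("/home/alice"))  # hypothetical user's home
print(per_user_config("/var"))         # whoever sets HOME decides where this points
```

The absolute path never changes, while the tilde path follows whatever HOME happens to be. That is convenient for per-user settings, and it is also something the person launching the program gets to pick.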
What we're trying to do is have some configuration file that's specific per user. What are other ways to open files? Yeah, how do you do that? Instead of specifying the absolute path with the slash at the beginning, you could just start with either the file or the directory, and open starting from there. Yeah, so there's two different ways. I mean, not two different ways, but, yeah. So, exactly. So here, what does this open mean? What are the semantics here? Yeah. Yeah, so it looks relative to where the process is executing: the current working directory, CWD. And if you look, we talked about how we can look in /proc: /proc/self/cwd, there you go. Cool, so this /proc/self is referring to the ls process. The current working directory is, you can think of it as metadata associated with this process about what directory it's being executed in. So here, when I execute it, it's being executed in /home/ubuntu. If I cd to /, and I do that again, /proc/self/cwd is /. If I go to, whatever, /media, it'll be /media. So every open is using this current working directory as the start of any relative access. So if I wrote a program that said open the file whatever, and ran it in /media, it would try to open /media/whatever. If I execute that program from a different directory, it would try to open a file there instead. So what does this dot dot do? How does that change the semantics here? Wait, it goes to the previous directory? The parent directory, yeah, exactly. So with ../whatever here, running in /media, it becomes /media/../whatever, which would try to open the file /whatever. So what I want to convey is that these depend on what directory the process is executing in. What's the current working directory of that process? So what happens if an attacker can influence or control the paths of the files?
It depends. If the file is like a configuration file or something, you can maybe modify how the program is going to behave. Yeah, so we can maybe trick it into opening and reading files that we choose, or maybe writing to files that we choose. It really depends on the context of the application. But fundamentally, let's look at the classic dot-dot attack. So, anybody ever write code like this before, where you say, we can't see it, so. Oh good, that's why it's so ugly, let's say that. Let's ignore this. And let's say we're writing some code that does something like name = user_input(). Right, so let's say it's a function that prints "please tell me your name" to standard output, then reads in some characters from standard input and returns a character pointer. So now we have the name. And now I'm gonna write basically pseudocode, a mix of Python and C. We'll think about it in terms of C code, but I don't want to do all the strcat and all that stuff right now, just to make a point. So what if I say pref: the user's preferences are located in the directory /etc/program, and I append the name. Then we'll say something like fd = open(pref), and then print. So basically, this program is going to output the configuration that's under /etc/program/, concatenated with whatever I put in as my name. Again, this is not valid C code, but I'm concatenating strings here. I can get away with that here. That's what's nice about not actually programming on a real machine. Then I'm opening that file and printing out the contents of that file. So assume now that this program is setuid root, because, whatever, it needs to read and store everyone's configurations. Where did we say the user password hashes are stored on a Linux machine? The /etc/shadow file.
So if you get access to that /etc/shadow file, you may be able to brute force and break the hashes, just like you did in one of the assignments. So how could you trick this program into giving you the contents of /etc/shadow, or any file on the system? Yeah. Say your name is ../shadow. So let's take it in steps. Okay. So if I just pass in my name as shadow, what file will this program try to open? /etc/program/shadow. Now, what if I input my name as? Dot dot slash. ../shadow. Now what's it going to try to open? /etc/program/../shadow, which is what file? /etc/shadow. Right, which resolves to /etc/shadow, so it will open the /etc/shadow file and output it to me. Using any combination of ../../, you can get this to read out any file on the system that the root user has access to. You could potentially read out someone's SSH private key and then try to SSH into the system as them. You could read their mail, all kinds of stuff. You can actually read any user's mail on the system. And this is basically the dot-dot attack. And this is important, and this is why I want to talk about it in this class: this will come up in other contexts in your future programming careers. This isn't just something you need to worry about if you're writing C applications. This comes up in web applications all the time, with uploading files and accessing files, if you're not careful about user input that includes dot dots. So, here's a better example with some real code that you can look at. The other name for this is a directory traversal attack. If you think about it, you're forcing the application to traverse directories and open files that it didn't intend to. So, how can you prevent this? Yeah, so one way to think about it is, the key problem is sanitizing the user input, right? So, we have some user input.
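The steps above can be sketched in a few lines. This is a hedged illustration and not the program from the slides: it only builds the path by concatenation, as the pseudocode does, and uses posixpath.normpath to show where the dot dots lead. It never opens anything. The base directory /etc/program comes from the example; the user alice and the key path are made up.

```python
import posixpath

BASE = "/etc/program"  # preferences directory from the lecture's example

def pref_path(name: str) -> str:
    """Build the preferences path by plain concatenation, with no checks on name."""
    return BASE + "/" + name

# normpath collapses ".." purely as text. The kernel resolves ".." against the
# real directory tree, which gives the same answer here as long as no symlinks
# are involved.
for name in ["shadow", "../shadow", "../../home/alice/.ssh/id_rsa"]:
    raw = pref_path(name)
    print(repr(name), "->", raw, "->", posixpath.normpath(raw))
```

Each extra ../ climbs one directory higher, so a name made only of user input can point anywhere the process is allowed to read.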
We're using it in the context of opening a file, but we didn't do two things. We didn't check that the user is accessing a file that is good, so we didn't check that the file name matches what we think it should look like. And, the flip side, we didn't check that it doesn't have anything bad in it. So, there are always two ways to think about sanitization. One way is saying, well, if we only ever want people to look for files in this certain sub-directory, why don't I just restrict the user names to be, let's say, alphanumeric only? Then, is there any way that they could possibly do a dot-dot attack? I mean, if we don't want them to access any other files in that sub-directory, they can still access those files. But if we did alphanumeric, they'd need slashes to get to files in sub-directories. Right, so you need foo/bar to reach sub-directories, but yes, that's something we don't want. Yeah, I was gonna say, you can still access other people's files in the same directory, though. Not sub-directories, but other files in the top level. For sure, yeah, so we need to think about that, too. Maybe there's more validation needed. But the idea is, one way to prevent these types of attacks and think about sanitization is a whitelist-based approach. Basically, you say what user input is good and acceptable: only lowercase a through z, uppercase A through Z, and zero through nine. Those are the only characters you're gonna allow, and otherwise you error out, right? The flip side is blacklist-based, where you say, okay, I'm not gonna allow anybody to use dot dot. What's the problem with a blacklist-based approach? There could be things that you haven't thought of to put on that list.
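Here is a minimal sketch of the whitelist idea, assuming the rule really is "ASCII letters and digits only". The function name is made up. As raised in the discussion, this only stops the name from leaving the directory; it does not stop one user from asking for another user's file in the same directory.

```python
import re

# Whitelist: one or more ASCII letters or digits, and nothing else.
ALLOWED_NAME = re.compile(r"[A-Za-z0-9]+")

def is_valid_name(name: str) -> bool:
    """Accept only names made entirely of whitelisted characters."""
    return ALLOWED_NAME.fullmatch(name) is not None

for candidate in ["alice", "bob42", "../shadow", "foo/bar", ""]:
    verdict = "ok" if is_valid_name(candidate) else "rejected"
    print(repr(candidate), verdict)
```

fullmatch is used so that the whole string has to match, which also rejects the empty string and anything with a trailing newline. The explicit ASCII ranges are deliberate: Python's str.isalnum would also accept non-ASCII letters and digits.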
Exactly, you don't know if your blacklist is complete or not. Or, as I'm experiencing trying to get the software to work every year for this class, you don't know if a library that you're using has changed, and now what used to be a safe character is a bad character to that library. Who knows, maybe they added some new crazy feature, like in JavaScript templates, where you can use double curly braces to do crazy stuff. Normally, in a web context, you wouldn't worry about those characters. For all these attacks, I want you to be thinking about these types of sanitization and how you could break them. Yes? Would it also be possible to escape the special characters from the input? Rather than prohibiting what they're allowed to use, go through it and clean it by escaping? Yeah, so it depends. Context is really the key. So for instance, here, dot dot is a special character combination that the file system, the OS, will interpret, I believe. So I don't think it's ever possible to create a directory called dot dot. You get into weirdness with UTF-8 and all kinds of other crazy characters. You can create file names that are just a space, or, I don't know, all kinds of weirdness, which can cause strange things to happen. But in this case, I believe there's no way to create a file called dot dot, though I don't know for certain that that's true. Yeah, you just have to really understand the exact semantics of this open function. And we'll talk about escaping, I think, in a second. Okay, cool. So we talked about this a little bit. What does the PATH environment variable do? Say it again? Yeah, so it determines how your shell searches for commands, right?
So you have /bin/bash, you're running bash, which is your shell, and you're typing ls, and it has to figure out what ls program this crazy person is talking about. Where can I find that? So it looks in the PATH. So, think about when you're running an SUID application, or a set-group-ID application, which you're doing in assignment six. Do you control the environment variables? I thought that's something you control as a user. Yeah, you control all of them. So you're executing a process, and even though it's a setuid process, your environment is still copied into the program itself. So the program actually inherits all of the environment that you send in. So, certain commands, okay, I need to pull back some curtains real quick. When we talked about the operating system being able to turn an ELF file into a running process, there is essentially one system call, execve, that gets called. This is a system call that takes the full path to the file, the argv parameters to pass in, and the environment pointer (envp) parameters. If you've ever been curious, this is where you can find out: the shebang stuff is all defined in here, if you want to learn more about that. So everything gets passed into the main of that new program. execve doesn't return, but execve is the system call that does all of the actual work, and that's what actually happens. There are other ways to execute a program, for instance execlp. There's a whole family of functions: execl, execlp, execle, execv, execvp, execvpe. They all differ basically in how you call them and how you pass the specific parameters. What's the difference between these functions and the execve that we just saw?
Yeah, so underneath the hood, these all end up calling execve, because that's the function that actually does the work. So these just translate into it. There we go: "The file is sought in the colon-separated list of directory pathnames specified in the PATH environment variable." With execve, you cannot say execve ls. You have to give execve the actual file, the full path to the file that you want to execute. So these are all different front ends, and when you read and understand their semantics, you see that some of them will use the PATH variable to locate applications. And this is where it gets slightly tricky: modern operating systems have some defenses against this. Another vulnerable function for these types of things is system. Anybody know the system function? So system is basically the exact same thing as saying /bin/sh -c, some command. So you can call system and pass a whole command line, with ands and ampersands and semicolons, to execute whatever you want. So this is a libc function that parses that exactly as /bin/sh does. Which means, again, it's just as if you're on the terminal, which means it's using the PATH environment variable to figure out what you're executing. So why is this important? Let's say you're running an SUID root application. Now, one of the classic UNIX philosophies is to write small programs that do one thing and do it well. You've heard that before, hopefully. So rather than reinventing the wheel and implementing some functionality that another application already has, you want to call out to that program to make it do something. Let's say a JSON parser. It's a little weird because that's not built in, but there are command-line JSON parsers that you can use. So if you're calling execlp with json-parse, that program is going to be executed as root.
And if you just say execlp json-parse, it has to look in the PATH environment variable to figure out which json-parse to execute. Who controls the PATH variable? The person executing the setuid program, which is us. So we can set that PATH to be whatever we want. We can make a file in our local directory called json-parse, which gives us a shell or does whatever we want, change our PATH to include, as the very first thing, our local directory, dot, and then execute that program. It'll look for ./json-parse and execute that, with our functionality. It's similar to what we just talked about with applications using the home path, where the tilde expands to the HOME environment variable. If an application uses that, again, with setuid applications we control this HOME variable. So you can set this variable to be whatever you want. OK, some tips, maybe, if you haven't seen this. So env shows the environment variables. How do I set an environment variable for executing only one command? Yeah, type it before the command. So if I want to set OLDPWD to something like /var/foo, I put that first and then the command, there we go. OLDPWD. So using this, I can overwrite any of the variables here. So this is an interesting thing. I actually set my PATH here, and it couldn't find env, even though normally it knows where to find it. So there, I've messed with my PATH, and I've been able to change it to whatever I want while executing this. You can do the same thing. You can also change a variable for the rest of this running process. You could set HOME to /var. So now it thinks /var is my home directory until I change it back. So you have total control of your environment. You can change it to be whatever you want. And all right, very quickly. Sorry, I know, I just want to finish this. Then in the next class, we do buffer overflows, and we're good.
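To see why the order of directories in PATH matters, here is a simplified sketch of the lookup that the PATH-searching exec functions perform. It is a model, not libc's actual code: the real functions also need the file to be executable, which this sketch skips. The two temporary directories stand in for a system directory and an attacker-controlled one, and json-parse is the hypothetical helper from the example.

```python
import os
import tempfile

def lookup(command: str, path: str):
    """Return the first match for command in a colon-separated PATH, or None."""
    for directory in path.split(":"):
        # An empty PATH entry means the current directory.
        candidate = os.path.join(directory or ".", command)
        if os.path.isfile(candidate):
            return candidate
    return None

with tempfile.TemporaryDirectory() as system_dir, \
     tempfile.TemporaryDirectory() as attacker_dir:
    # Put a file named json-parse in both directories.
    for directory in (system_dir, attacker_dir):
        with open(os.path.join(directory, "json-parse"), "w") as handle:
            handle.write("#!/bin/sh\n")

    normal = lookup("json-parse", system_dir)
    hijacked = lookup("json-parse", attacker_dir + ":" + system_dir)

print("normal PATH resolves to:  ", normal)
print("modified PATH resolves to:", hijacked)
```

The program asking for json-parse is identical in both cases. Only the PATH it inherited changed, and the first directory that contains a match wins.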
So the other type of vulnerability we need to talk about is command injection, which is very similar to what we just talked about. If a program is creating a string to pass to a function like system, it's because, like we said, we want to call external commands to do our tasks. If it's building up that string based on user input, and we control that input, we could potentially trick it into executing another application, another process. For instance, system and popen are specifically vulnerable to this. We'll look at a simple example. We have a function that does an snprintf. It takes argv[1] and appends it to cat /var/log/, whatever file. So not only is there a dot-dot attack here, where we can change directories and do all that stuff. On a command line, how do we execute multiple processes at once? Or what are different ways, yeah? The vertical bar is a pipe, so we can run one program and feed its standard output in as the standard input of the next program. What else? Yeah, semicolons. Semicolons. And again, it's about parsing. So bash will parse this string and look for semicolons to split commands, bars to do pipes, spaces. So in this case, we can say foo; cat /etc/shadow. So the system function sees cat /var/log/foo, and then cat /etc/shadow. And we can get the whole /etc/shadow file. And you may think this never happens, but you should go look at Shellshock. It's a cool vulnerability that's related to this. Anyways, this is a classic problem again. This is command injection, and that's why we're talking about it. This happens in web applications as well. Another way to do this is backticks, or dollar sign parentheses. It means execute a process and put its output here in the command. These are all ways you can trick the application into executing a process of your choosing. When we come back, we will tackle buffer overflows.
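The semicolon trick above can be shown without running anything. This sketch mimics the snprintf step by pasting input into a command string, then uses Python's shlex module as an approximation of how a shell would tokenize it. shlex is not /bin/sh, so treat the token list as illustrative. The function names are made up; the paths are the ones from the example.

```python
import shlex

def build_command(filename: str) -> str:
    """Mimic the snprintf step: paste user input straight into a shell command."""
    return "cat /var/log/" + filename

def shell_tokens(command: str) -> list:
    """Roughly tokenize the way a shell would, keeping ';' and '|' as separate tokens."""
    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    return list(lexer)

malicious = "foo; cat /etc/shadow"

injected = build_command(malicious)
print(injected)
print(shell_tokens(injected))  # the ';' token marks where a second command begins

# Two common mitigations, neither of which fixes the dot-dot problem:
# 1. Skip the shell and pass an argument vector, so the input stays one argument.
safe_argv = ["cat", "/var/log/" + malicious]
# 2. If a shell is unavoidable, quote the input so ';' loses its special meaning.
quoted = "cat /var/log/" + shlex.quote(malicious)
print(safe_argv)
print(quoted)
```

With the argument vector, cat would simply fail to find a file with that odd name, and no second command would ever run.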