 All right, and we continue forward, so we actually covered all of the network security stuff that we're going to cover. Now we're going to move, and we have to basically, I want to get through all of this by the last class. So we have three classes to go to cover all of application security. Not to terrify you, but there's 233 slides here. A lot of it is pictured, like animations. There's like 20 or 30 slides that are just one animation. So I shouldn't concern you except that we have stuff that I really want to cover. So what we're looking at now, we're shifting our focus from the network and we're looking at now applications. So what do we mean when we talk about applications? iOS apps. Yeah, what else? Just like any general user interface. Yeah, like what, give me some examples. Your email client, like Outlook, what else? iOS applications, that was one. Android. Android applications, so mobile applications, what else? Browsers. Browsers. Yeah, so, and actually, basically this ties in very nicely with the network stuff that we looked at, because who is generating that network traffic? Applications, right? There's some application and maybe you, the user, ask for something to happen, but fundamentally there's an application running on both the client and the server that are using the network to communicate. So, and this is kind of important to think about because we want to, we're going to start off with thinking about what are applications, how can we influence their behavior, because that's how we can eventually exploit them, and what does that mean for us as an attacker? So, basically you can kind of separate it into two categories, local applications that are running on your machine, and remote applications that are accessible over the network. Why is that difference important? Application for the binary. So maybe if it's local you have access to the binary that's executing, yeah. 
Applications that are remote might have different security concerns than ones that are just run locally. Yeah, so maybe different security concerns, and what, so if you think about an application, right, I don't know if you think about it conceptually, what actually controls what happens and how that application executes? This isn't a trick question. There's some low-making for, yeah. The operating system. The operating system that it's running on, yeah. Why does that matter? Somebody else, yeah. Well, like as an example, if it doesn't have brief access or move authorities, then the operating system might stop if it's trying to do something else. Okay, yeah, so then that maybe depends on the behavior of the operating system and what the application does, so maybe there's interactions there. What else? Possibly so you can have different operating systems implement the same kind of system calls or standards differently. Like what about more concretely? Like what actually dictates what happens to an application, yeah. The code. The code of the application, right? You write the code. Somebody's written the code to that application to do something. You'd agree that that informs how it executes? What else? So what else controls kind of how that code executes or I don't know the way to think about it is what branches through the program the code takes. Yeah. I was just going to say the user. Why the user? What about the user? Just like the application magically scans me and it knows the thing. The user makes decisions on that application, I guess, provides. How do they communicate those decisions to the application? Some sort of input device. So input to the application, right? So that could be from user input. It could be typed on a command line. It could be clicking things on a GUI interface. All of those are different ways for input to come from the user to the application. Yeah. What else? The output of the application? I don't know if I agree with that. What? 
Another way to think about it is what does like how is the output of the application generated? It takes some input. Yes, by what? It takes some input across the code as an input. An input and a code, right? So you can think of it in terms of data or inputs, right? So you have an application. It has some code. You have different ways of feeding input to that application but fundamentally those inputs constitute and along with all the other things that we'll talk about in a second about the operating system, what about its configurations important? Yes, why? Why are configurations important to the application of execution? The machine has configured the environment variables and we have permissions, things like that. Yeah, so exactly. The environment variables, the permission. So you can think of the environment in terms of like everything about the system where the program is executing but you also have configuration parameters. Maybe there's a configuration file that the application is reading from that chooses, I don't know, which port it allows access to or what IP addresses to trust. Like these are all, you can broadly consider those inputs to the application but they're not something maybe you directly control. Yeah, so the idea is so you can think of the behavior of an application and so the code that it's executing, the data that's being processed and the environment in which it's run. Can you control the code that the application executes? Yes, you write that application. If you write that application, if you write that application you can control everything that does. Is that important from a security point of view? From a security standpoint if we want to trick the application to do something it's not supposed to do should we assume that we can just change the application's code? Yes, yes. Yeah, why? I don't know. Shall code is actually a wrong change? Are you still changing the code of that application? No, you're just... 
It's a tricking it into executing some other code but you're not changing the code of the application, right? If you can do that you can fundamentally make the program do anything you want, right? So if you allow me root access on your machine I can just change any binary on your system to do whatever I want. But at that point you've given me root access to your machine so you've given me already the access that I need to do that. Think about a system running on a remote machine. Do you have the ability to change the code that is running on that machine? If you find the machine. If you find the machine but you can't, it's remote. That's the whole point of remote. Do you have access to it? Do you have access to it? But if you can edit the... I would argue that if you can edit the code to an application it's already game over, right? So how could you possibly secure assists like... if you think about maybe Apache you can say, well yes I just get commit access to Apache and I insert my own back door and then I get access to every single Apache installed around the world. So not to say that doesn't happen but that's a much more intensive process than thinking about how can I break into that specific version of Apache that's running on that system. What about like the submission server? Isn't that... you're technically... it's like a remote machine but we're writing the code that is executing on it. Sure, have you broken it? No. Has anyone? Is that a challenge? Is that extra credit? It's good. Nobody has done it yet. So... Don't ask if it's extra credit. Sure. Why not? You should be doing it because it's fun. Can we actually try? I don't want to be punished. Did I say that at the beginning? Yeah. If you found bugs in the submission system you could get extra credit. But... It's the same submission system I've been using for three or four years in grad classes and undergrad classes and nobody has yet broken it yet. But is it... 
is our next assignment running as root because of Scapy or whatever? Yes, so you have to do it very carefully. I still posit that you would not be able to break out of it, but you could cause some problems. You could do some denial of service, but you can't get access to everyone's grades or anything like that. I'm fairly confident in that. Yes. Yes, sir. Okay, take that as an example. So how would you try to exploit something like that? You could trick me into changing code in the system. That would be one way. You'd need to deploy that code first. Yeah. I know in some specific cases you can redirect memory addresses to point to malicious code. Potentially. But how do you get there? I mean, how do you even go about doing that? So you say influence an application, right? What does that even mean? How do you do that? You just say magic incantations and things happen? I'm saying maybe you could mess around with the remote system and, like, force it to restart. Maybe while it restarts there's an exploit, or something that it does differently. Okay, so maybe try to get the system to restart. Yeah. Input. So why input? Why is it input? Yeah. So it's a system running on a remote machine. You can't change the code. You can't change the environment. You can't change the operating system. The only thing you can change is the input that you give to that system. Right? And then you try to trigger some kind of vulnerability, which is something that we'll look at. Yeah. What if I do what the Russians did, like leave a USB drive somewhere so that it ends up plugged into the system? Yeah. That is a different way of doing that. I still think in this case, well, USB drives don't auto-run, so you'd need another vulnerability in the operating system to get it exploited automatically when you plug the drive in. I'm not saying it's impossible. It's probably possible, but... What do you know?
So the way to think about this and we think about what we're looking at now in terms of the CIA triad we're really looking at things to violate confidentiality and violate integrity or to violate availability but we won't focus so much on availability getting things to crash in this case. And it helps if we think about and this is what going back to what we're talking about thinking about conceptually what is an application? So you think about conceptually an application is code running somewhere and what can influence that execution? It could be the environment and here we mean very specifically the... I mean in terms of like a UNIX environment what kind of things are we talking about here? Yeah. Would the version of the operating system count? I'd say the operating system is a bit lower so it's not like part of this environment that we're talking about. Yeah. The tools and machine like drivers and stuff like that? I'd say that's lower level than what we're talking about so we'll talk about the file system and all that stuff in here because those are different things that are on the system, yeah. The services being run in the environment? What was that? The services being run in the environment? No. So more specific in an environment rather than thinking generally more specifically, yeah. Would it be like the Java environment? Interesting. So the Java... I would say the JVM is actually part of the application so that would be part in there, yeah. Would it be like the config of the application? Possibly. What about you who said something about the environment? I want to go back. The environment variables? Like what does that mean? So there are strategies that the application can access that are specific to the system? Like what are some examples? Like you type a tilde in Linux and it goes to your home so that where the path is for your home directory would be an environment variable that depends on who's logged in what that tilde translates to. Yeah. 
So by the environment here we specifically mean, in a Linux environment, let's see if this will work. I have no idea. If you type in env it will show you your environment and all of the variables that are defined. Some important ones are like HOME. So this is the home directory, and this is exactly what happens when you do like cd ~. It looks up what this HOME environment variable is and that's where it takes you. So here it goes to my directory under /Users, but you could change that home directory. Has anyone ever messed with PATH? PATH is the one that's most used and changed. This is what bash uses when I type in ls, right? Did we talk about this? I feel like we did. This is bringing up a lot of memories from earlier in the semester, of typing ls and then bash having to figure out which ls program you're talking about. So if I do this, I think it's /bin/ls. Yeah. So /bin/ls is the program that actually gets executed, and what bash does is take this PATH variable, split it on colons, and for every directory in it ask: does ls exist here? Nope. Does ls exist here? Nope. Nope. Nope. And this is what which is doing. Is it in /usr/bin? No. I don't know. There we go. OK. And it finds it there, and then it's done looking, and that's how it knows which ls to execute. So this can impact which program actually runs. You may recall having to change this a lot to get Java compilers to work and all that fun stuff. So you also have the operating system that the application is executing on. And you may have the network that the application can talk to. But you also have local files in the file system that the application can read. And not only that, you can have other applications, other processes running on the system, that the application can communicate with. And finally, you can have the user at their terminal being able to access this application.
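The PATH lookup described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how bash is actually implemented; the function name find_in_path is made up for this example, and it assumes a POSIX-style system where PATH entries are separated by colons.

```python
import os

def find_in_path(command, path_value):
    """Return the first executable named `command` in `path_value`, or None.

    `path_value` is a PATH-style string: directories separated by os.pathsep
    (a colon on Linux and macOS). Directories are tried in order, so an
    earlier directory wins over a later one.
    """
    for directory in path_value.split(os.pathsep):
        if not directory:
            continue  # this sketch simply skips empty entries
        candidate = os.path.join(directory, command)
        # "Does ls exist here?" Only accept regular files we may execute.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None
```

Calling find_in_path("ls", os.environ["PATH"]) should give the same kind of answer as which ls. Because the first match wins, whoever controls the front of PATH controls which program a bare command name resolves to.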
So if you think about where can user input come from to an application? And again, we care about where user input can come from because that's how we can try to influence this application. So if we assume we can't change its code, maybe we can change the environment that it's executing it to mess with its behavior. Maybe we can mess with the file system that we can put it in a weird directory that it's not expecting and execute it. Maybe we can. When you think about where our input comes from, right? The user at the terminal is one thing. So this would be some user input to the application. What's the difference between running the application locally on my machine and SSH into a system and then running an application? Is that local or remote? Local, why? This guest is 50-50 shot. Who's the first one I said? Because you are logged in as one of the users on the system. So you have those permissions. Yeah, it says if you were sitting on a terminal on that system. So when we say terminal here, it can be remote SSH, but you're still a local user on that system. Whereas if you think about a website, when you're talking to some website, you're talking to an HTTP server that's living on that system. You're not an actual user on that system. And then the application gets its input also from the file system, from the network, and other applications. So this is what we need to be thinking about when you're looking at an application and thinking, how can I break this? How can I make this do something it's not supposed to do? You need to think about what are all the different ways that input goes into this application? And then seeing what you control, because that's kind of the key, is thinking, what do I control in here? So what types of things could you control in here? Maybe, depending on the system, you may control the hardware? What else? The file system. What? The file system. The file system, you can control parts of the file system. 
Again, it depends. You have access to the files that you control, but if you're not root on the system you have limited access, so you may not be able to control everything. What else? What was that? The terminal, the input, right? Assuming it's a local application that you can run, you can interact with it that way. What else? Environment variables, so you may be able to control the environment. What else, yeah? Network, you may be able to send packets, like if it's listening on a port or something like that. That's input. Yeah, you may be able to send packets to the application, or you may be able to get the application to talk to your application, so you can send it packets that way. You may be able to inject data into the network, right? This is why we looked at the network first, so you understand that the input that can come from the network is potentially untrusted. So what we're gonna look at here is application vulnerability analysis, and the idea is how can we identify vulnerabilities in an application as it's deployed in a specific environment? And where can vulnerabilities or bugs come from? What was that? The code. The code, what do you mean by the code? Unexpected errors? Yeah, okay, so the code itself could have a mistake that the programmer made while writing it. Is that the only type of bug that ever exists? Or security bug, yeah. Hardware? Hardware, so can you be more specific? Okay, so the hardware, so this is when we think about deploying in a specific environment. If the system that it's deployed on has a vulnerability, that maybe could undermine the application itself, yeah. Maybe the actual programmer built it properly, but a library they used had some bug in it. So maybe a library, so a third-party library, right?
So you have the application itself, you have the application code, you have any libraries that the application uses, you have the operating system that is running on, you have the hardware that it's running on. Essentially a vulnerability at any one of those stages could compromise the application. What about even before the code is written? In the design stage, they might have overlooked something? Yeah, maybe they didn't think about access controls in the design phase, or maybe they didn't specify, I don't know, the password requirements, and so everyone has the password and password, and it's trivial to guess. So you need to be thinking about vulnerabilities kind of at each of these stages. So we kind of think about that in terms of design vulnerabilities, so is there a problem in the design of this system? So you think about a system that's coded 100% correct, but it's still fundamentally insecure. So you could, the other example I like for design vulnerabilities is anybody ever used like a coupon code on a website when you're shopping? Like reduce the price? Have you tried that code multiple times? What does it say? So it's already been used or not that? It should say it's already been used, right? But you can think of it, the design doesn't specify it if you can continually submit this coupon over and over to reduce the price to zero. That's not really an implementation problem, it's not like some buffer overflow or weird vulnerability. The problem is in the logic and the design of the application. So there's a discrepancy between what the programmer thinks the code should do and what the code actually does. Lower than that, you have kind of implementation level vulnerabilities, where, which is kind of some of the stuff we're gonna be looking at here of it mainly boils down to kind of trusting user input and having that have the potential to exploit the system. But even more so, we have kind of lower level even of deployment vulnerabilities. 
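The coupon example can be made concrete with a small sketch. Everything here is hypothetical (the Cart class, the COUPONS table, the code SAVE10); it is not taken from any real shopping site. The point is that the first function has no memory-safety bug at all, and yet the design lets the same coupon be applied over and over.

```python
# Hypothetical coupon table: code -> discount in cents.
COUPONS = {"SAVE10": 1000}

class Cart:
    def __init__(self, total_cents):
        self.total_cents = total_cents
        self.used_codes = set()

def apply_coupon_flawed(cart, code):
    """Design flaw: nothing records that the code was already applied."""
    discount = COUPONS.get(code, 0)
    cart.total_cents = max(0, cart.total_cents - discount)

def apply_coupon_once(cart, code):
    """Design fix: each code can reduce a given cart at most once."""
    if code not in COUPONS or code in cart.used_codes:
        return False
    cart.used_codes.add(code)
    cart.total_cents = max(0, cart.total_cents - COUPONS[code])
    return True
```

With a 2500-cent cart, three calls to apply_coupon_flawed with SAVE10 bring the total to zero, while apply_coupon_once accepts the code the first time and rejects it afterwards.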
So this kind of gets into, not necessarily talking about the operating system or third party applications, but thinking about it in terms of deployed this on a specific system, is the system itself secure? Is the configuration of this application secure for what you wanna do? Can you identify vulnerabilities there? All right, let's look at implementation. So, all right, we'll actually skip these details here. It's important to understand the differences between being a remote attacker and being a local attacker. So again, what do we mean by remote versus local here? So less than that, because we still consider, let's say somebody's SSH in a remote machine, local to that machine, that doesn't actually have physical access. I'd say that remote is when you're strictly looking at how the program reacts and local is when you can mess around everything besides just the input. Yeah, so the way I like to think about it is in terms of this model. If you're a remote attacker, you are some other node that's outside of this network living somewhere. And you're trying to take over that application that runs on some machine somewhere. So you have additional problems with what we've talked about of figuring out what that application is, what version number, all that. But assuming even that you know that already, in this diagram, what now if you're a remote attacker can you actually influence? Only the network input, right? You can't control the environment or the file system or the any other application, right? All you can do is talk over the network on this system. So then contrast that with being a local attacker where you now do have, you can create things in the operating system, you can create different files. You can maybe run the application itself with a different environment to do something there. And so it's a key thing to understand. 
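As a small illustration of the local attacker's extra power, here is a sketch of running a program with an environment chosen by whoever launches it. The child program is just a one-line Python script that prints one environment variable; the helper name read_child_variable is made up for this example.

```python
import os
import subprocess
import sys

def read_child_variable(name, overrides):
    """Launch a child process with a modified environment and report what
    value the child sees for the environment variable `name`."""
    child_env = dict(os.environ)   # start from our own environment
    child_env.update(overrides)    # then change whatever we like
    child_code = "import os, sys; sys.stdout.write(os.environ.get(sys.argv[1], ''))"
    result = subprocess.run(
        [sys.executable, "-c", child_code, name],
        env=child_env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

The child has no say in this: read_child_variable("HOME", {"HOME": "/tmp/not-my-home"}) makes it believe its home directory is /tmp/not-my-home. A remote attacker, who can only send network input, has no equivalent lever.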
And usually the goal is, if you think about this system, if you're a remote attacker, do you have any user accounts on that system and an ability to look through the file system arbitrarily? What can you do? Think about it in terms of capabilities, what can you do? You can send input to the application and get its response, right? Fundamentally, nothing else. So what you'd wanna do in a typical scenario is escalate from being a remote attacker to being a local attacker. So let's say you're able to compromise this application, and you now can execute as if you were this application's user on the system. Now what can you do? You can create new files, what else? Yeah. Pretty much just manipulate whatever you want in a local system. Yeah, what can you do, anything you want on this system? It depends on the application. It depends on the application, specifically what about the application? What permissions it has. Very close, yes, I think that's close enough. Yeah, so the permissions it has, or more specifically what user it's running as. So what's the user ID of this process? If this process is running as root, congratulations, you've completely owned that machine, you can do anything on that machine. If it's not running as root, if it's running as some random user, you have essentially escalated your privileges, I think we can all agree, right? You can touch the file system, you can do all kinds of stuff, but you fundamentally don't own the whole operating system. So then your goal becomes, you've moved from a remote attacker to a local attacker, and now you wanna escalate from being a normal user to being a root user. So how do you try to, yeah, please. Well, so somebody at the beginning of class mentioned that our homework for this assignment was running as root. How is it set up so that we wouldn't be able to compromise the system if that's the case? You can, modern Linux systems have fine-grained capabilities.
So you can be a root user with only the capability to have raw access to a socket. So that's the only thing you can do. You can't edit weird files or do anything like that. And that's in addition to other layers of security around that part of the system, so, yeah. I was just gonna ask, did they patch the vulnerability with a log file being dumped into the shadow file? Which one? I have no idea. Hopefully. You can make it dump the log, and I think in the shadow file an entry with no hash means no password, so you become root once you overwrite the shadow file. Yeah, that's why capabilities are difficult, right? So the way at least this is set up, if your only capability is raw socket access, which I think is the one, you can't open up a root-owned file for reading or writing. So you can only listen to that device. But yeah, that's part of the problem. That's part of the interesting thing, figuring out these capabilities. Like, do they have escalation paths in there? So yes, that is one mechanism that I'm using to try to restrict your access. But normally when an application is running as root, if you take over that application, you can fundamentally do anything as if you were root on that system. There's something else I was gonna say. I don't remember what it was. That's not important, all right. Cool. So, when thinking about local versus remote attacks, so local attacks are great because we have, oh, right, this was it, okay. So, right, so how do we then escalate? So we're a local attacker. If we're not root, we must be one of the other user accounts on the system. So then what's our goal? How do we then try to escalate to root? Programs that run as root? So find programs, so setuid programs that run as root. We talked about setuid, right? Okay, I do remember that. So look back at that discussion from a long time ago. We'll talk about it again too. But anyways, look at programs that run as root.
The setuid programs, like chpass. You can look up all the setuid programs on a system and see if you can now, as a local attacker, trick one of those applications into doing something it's not supposed to do, to elevate you to root privileges. You can also try to identify bugs in the Linux kernel itself. If you exploit one of those, then fundamentally you become root. You are the operating system. The operating system has total control. From there, on most modern systems, you're running an application in an operating system which is itself running in a virtual environment, in a virtual machine. So you can then even find exploits in like Xen or VMware to exploit the hypervisor. So now you have access to every operating system that's running on that same hypervisor. Then you can even try to go further and get to like the BIOS level and all kinds of craziness. But as you do this, it becomes more and more difficult, or more and more expensive, to do. But this is kind of the key thing. So in local attacks, we can manipulate the local interaction. So we have more fundamental ways to try to trick the application. Whereas remote attacks are usually more difficult to perform, but with them we can take over a system, or an application running on that system, without already having access to it. Local attacks are more frequent, and I think in general more of them get found. So if we were to then think about, okay, the life of an application, how do you get to the end goal, which is some software running somewhere as an application? How does that actually happen? How do we get there? Someone designs it. So you design it and then what? In our ideal world, something else. You have a project lifecycle, so you want to define the scope of this project and go through the design and the implementation. Implement it, and then what happens? So what are you doing?
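Looking up the setuid programs on a system comes down to checking two things per file: is the setuid bit set, and is the owner root? A minimal sketch of that check follows. The function names are made up for this example, and the search only looks directly inside the directories it is given rather than walking the whole file system.

```python
import os
import stat

def is_setuid_root(mode, owner_uid):
    """True for a regular file that is owned by root (uid 0) and has the
    setuid bit set, meaning it executes with root's user ID."""
    return stat.S_ISREG(mode) and bool(mode & stat.S_ISUID) and owner_uid == 0

def find_setuid_root(directories):
    """List setuid-root files found directly inside `directories`."""
    found = []
    for directory in directories:
        try:
            entries = list(os.scandir(directory))
        except OSError:
            continue  # missing or unreadable directory: skip it
        for entry in entries:
            try:
                info = entry.stat(follow_symlinks=False)
            except OSError:
                continue  # entry vanished or cannot be examined
            if is_setuid_root(info.st_mode, info.st_uid):
                found.append(entry.path)
    return sorted(found)
```

On a typical Linux machine, something like find_setuid_root(["/bin", "/usr/bin"]) would be a starting point; which programs it reports depends entirely on the system.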
Are you writing, so does this CPU, so you write let's say even a C or C++ application? Is that what the processor executes? But do you write that binary? Are you writing ones and zeros? Yes. Are you writing hex code that the CPU will directly execute? Yeah. Yeah, so you write the code in a high level language, right? And then, and this is kind of ignoring the design and everything because those are all separate things. Then we need to translate it into some, so it needs to be translated into some executable form and usually saved to a file. There's a whole thing here between interpretation versus compilation, but we're gonna kind of ignore that for now. Then what happens? So now we have bytes on a disk represent kind of the logic of our application compiled down to something that the CPU can hopefully execute. Then what happens? Do you want to test it? No, I want to run it. Yeah, well I need to run it to test it, so yeah. I mean, technically if you're writing this, yes, but we're ignoring some steps. Well, if you run it and then the operating system loads it up, it may memory and then like leads line by line to the operating system. Okay, so yeah, so somehow we need to get, so this is the important thing, right? So if we think about a CPU, what does a CPU actually do and operate on? Zeroes and ones, and how does it do it? So those are the operations, but how does it actually, what is it operating on? Registers and what else? Memory. Memory, right, so when you boil it down, the CPU doesn't care about file systems or anything, those are other stuff that kind of happen or can happen in any ways, but if you boil it down, you have registers which are basically little pieces of memory on the chip and then you have your, whatever, memory. So somehow in order to execute those instructions, those instructions cannot be on the disk, they need to be in memory somewhere, right? 
So the application needs to be loaded into memory, which happens in most modern systems by the operating system will load, once the user says, I wanna execute this application, it takes that binary code, loads it into memory, and then what happens? I execute it, yeah, so then the operating system needs to somehow execute it, right? So as I tell the CPU, they start executing at this instruction where I just loaded this application into, it starts executing, and then at some point, it terminates, so this is kind of a boiled down version of the life cycle of an application. We will skip interpretation versus compilation, look into that when you study compilers and all that stuff, we'll go into compilation. Has anyone, I know there's definitely some of you, has anyone not taken 340 or not currently in 340? Okay, cool, we have to understand, we will go over everything you need to understand, we're not gonna go into how compilers work, but more so like what they do and kind of why they do it, in order to understand, so this is kind of, in order to understand like buffer overflows, you need to fully understand that whole life cycle of what happens there, because the problem is not necessarily in the C or the C++ code, it's in how the CPU is executing the binary code that is translated from the C code. But this is all things that we understand from coding in C and C++ and other high-level languages. So we first step of the compilation, we have some pre-processor that processes all our pound includes, pound defines, all those fun things. And then what is the compiler's spin-out? So what is like GCC or CPP, or I mean not CPP, what does the compiler actually do? It gives us like an execution. What was it? It gives us like an executable. But what is an executable, what does that mean? It will translate code into machine-readable, so it looks up pretty much what the command was and then translate it. 
Yeah, so it needs to somehow translate our high-level C logic into low-level instructions that the CPU can actually execute and understand. Right, and this is architecture specific, so it depends on what chip we are trying to execute this on. Is it a 32-bit x86 chip? Is it a MIPS chip? Is it an ARM chip? Is it a 64-bit x86 chip? So one cool thing, if you've never done this before, if you wanna develop a better understanding of what compilers actually do and what this code looks like, you can run GCC and pass the -S option and it will not generate an executable. It will generate the assembly file of your original C or C++ program. The other thing is, from here on out, we're gonna be looking basically exclusively at x86, which is 32-bit. Most modern systems that you're running on now are 64-bit, so you just pass the -m32 option to create 32-bit executables, so you can follow along. Cool. Okay, so once we're here, at this point, we just have assembly instructions. What do assembly instructions look like? Like in general. Like one-line instructions. That do what? Like, yeah, so. They're like add two registers together. Yeah, add EAX to EBX and put the result in EBX or whatever. So you just have all of these very simple register and memory operations. I mean, you can basically do everything from that. But at this level, at the assembly level, this is something you can write. So you can write assembly instructions, which I believe you've done, right? So you can write assembly by hand, but the CPU can't execute that. I mean, it's text. Like what does the CPU actually execute? Yes. So it executes like binary? Yeah, binary. What does that look like? So one way to think about it is that the assembler translates that add EAX, EBX into some very specific byte combination that the CPU knows how to interpret and say, ha ha, I should add these two registers together. And then now I know what the next instruction is, and so on and so forth.
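To make "some very specific byte combination" concrete, here is a toy encoder for exactly one kind of instruction: a 32-bit x86 register-to-register add. It uses the opcode 0x01 (ADD r/m32, r32) followed by a single ModRM byte. This sketch assumes 32-bit mode and Intel operand order (destination first); a real assembler handles far more cases, and the same instruction also has an alternative encoding using opcode 0x03.

```python
# Register numbers as used in the x86 ModRM byte.
REGISTERS = {
    "eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
    "esp": 4, "ebp": 5, "esi": 6, "edi": 7,
}

def encode_add_reg_reg(dest, src):
    """Encode `add dest, src` for two 32-bit registers.

    Layout: opcode 0x01, then ModRM = mod (2 bits) | reg (3 bits) | rm (3 bits),
    with mod = 0b11 (both operands are registers), reg = source register,
    rm = destination register.
    """
    opcode = 0x01
    modrm = (0b11 << 6) | (REGISTERS[src] << 3) | REGISTERS[dest]
    return bytes([opcode, modrm])
```

For example, encode_add_reg_reg("eax", "ebx") gives the two bytes 01 d8, and swapping the operands gives 01 c3. Those two bytes are what the CPU actually fetches and decodes; the text add eax, ebx never reaches it.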
But at this point, if you just output this assembly file, we still need to actually turn that into a binary object. So this is why, if you use the -S option, it will output a human-readable text file that is the assembly instructions of your program. From there, you need to assemble that, so you need an assembler. You think of GCC as one thing, but it's actually this whole collection of tools that does all of this end to end for you. Breaking it down and understanding what is happening there is important to understand the whole process. So now we need to turn it into some binary code, but do we just have this blob? When we're compiling things, what are some important things to note or to have? Yeah. We need to know the address of the instruction. Of what instruction? Of any of the instructions that we're executing. So if we're jumping to a function call, we need to know where that function is in memory. Yeah, so depending on the architecture, right? If we are jumping to very specific memory locations, we need to know where our application is going to be loaded in memory. What else do we need? I thought that each executable says what architecture it is executable on and so on and so forth. So like the header, 32-bit or 64-bit. Right, and the question kind of is why, and what stuff should go in there. So, like, some systems won't run 32-bit code on a 64-bit system? Yeah, so we want some metadata, some information so that if this binary is copied to another system, that system knows whether it should execute it or not. What about even where to start executing? Is that important information to be able to run an application? Yeah, right. And thinking back, it may not be at the very first byte. You may have setup code, you may have data that's being initialized, so you need some way to specify this.
So basically there needs to be some additional metadata along with this binary object to say all these things. Furthermore, there's relocation information, for if you do move the binary. This is what actually allows the binary to be position independent, so you can move it around. And possibly debugging information. This is where you use the -g flag. This is what keeps a map from the actual binary instructions to your C code so that GDB can properly show that and deal with all of that. So at this point we basically just have some binary code with some metadata, but it's not necessarily executable yet. Why not? Does it have to do with, even though we have the binary, we don't necessarily know where it is in memory, like we haven't loaded it into memory yet? Yeah, we haven't loaded it into memory, and also maybe this isn't all of our program, right? We can write our program in multiple files, compile each file down to an object file, and then at the end combine them all together. And this is how you can have libraries. There's a little bit of a difference there in how you link your libraries together, but we need some other phase that is going to link our binary object up with all of the library references that we have. So the Linux linker is called ld. You can statically link in anything at compile time. Statically linking means that the library is included inside the binary code of your application, whereas dynamic linking is performed at runtime by the system. So for instance, what's one of the most popular libraries that I'm sure all of you use when you're writing C or C++ code? Well, let's say C code. What was that? Yeah, but what is that? Where is that defined? What was it? The standard library. Yeah, the C standard library, right?
So that's a standard library that defines printf, scanf, most of the input-output functions. Basically, you can think of it as anything that's not an operating system call. Operating system calls are basically read and write, those are the basics, and everything else is done in this library. So the question is, when you're writing an application, should every application include a copy of printf in its binary? I've seen heads nod in different directions. Does somebody want to argue for one side or the other? There's no right or wrong answer here. Yeah, go ahead. Why ship code or stuff that you're never going to use? That's just wasted space. Let's say I do use it, though. Let's say I use printf a lot. Then sure, okay. Yeah, in the back. So by linking it dynamically, we can save space, because otherwise there would be redundant copies of printf in different applications. Someone else may use printf. But if I keep it static, then I know that I have a fixed version of printf and nobody can update or change it. Yeah, so it's actually a pretty interesting software engineering concern when you think about it, but there are also security concerns with it. So the idea is, if you're statically including every library in your application, then A, inherently your applications are going to be larger, right? I mean, in this day and age it's not a huge deal. But B, more fundamentally, let's say there's a vulnerability in that library. With static linking, in order to fix that, every application on your system that statically linked it needs to be rebuilt and shipped again, and you need to update each application. With dynamic linking, you update the library in one place and now every application that uses it uses that new version.
The flip side is if some developer, and this happens a lot more on the web and in other areas, pushes a crappy version of this library that then breaks your application. The previous version was working fine, but maybe they mixed security fixes in with this new functionality, so now you kind of have to move to that latest version. So anyway, this is something that comes up over and over: the tradeoffs between statically linking stuff and dynamically linking. And then at this point we finally have something that is in an executable format, and all this really means is that it can be executed. On Linux, the file format is called ELF. The F is format; ELF stands for Executable and Linkable Format. And then on Windows it's PE, so if you see a .exe file, they're all in the PE file format. And all this basically means is that there's a bunch of information the system needs to understand the file. Okay, cool. So in this class we're focusing on Linux, and we need to understand how this happens so we can start to dissect various parts of a binary application, so we can understand how to exploit these applications for our own nefarious gain. ELF files have a bunch of information. An important tool to deal with them is the readelf application. You can pass it an ELF file and it will show you all the headers and all the important things about the application. The file application is also super useful because it reads the header data of a file, so it will tell you what type of file it is, not based on the extension, because on Linux extensions are meaningless. It's the file contents that tell you this.
So if we look at it and think about what type of information is in there: really this ELF file format is in some sense a contract between application developers and the operating system, so that the operating system has all the information it needs in order to actually execute your application. Without this, it can't. So there are different sections inside of an ELF file with different permissions. You can run readelf and it will show you exactly all of this. Important ones to be familiar with: the .text section is where your program's code lives, so all the program's code will be in the text section. The .data section is all initialized data, so if you have global arrays that you initialize, they will usually be in there. The .rodata section is for read-only data, so constant strings are probably going to be in there. The .bss section is for data that is not initialized by you, the programmer. And again, this all depends on your compiler. The compiler figures out, based on the data that you use in your program, what data to put in what section. Important things here are these flags. You can think of them as permissions on these regions of memory in your application. The important one here is the execute flag. It means that in the text segment you can execute instructions. Notice also that there's no write flag on that section, which means that you can't overwrite your own code. You need to do fancier things or change those permissions if that's actually what you want to do. Similarly, the read-only data doesn't have the write flag, which means you can't write to it, and the operating system will cause a, let's see, is it the operating system?
Let's say an interrupt will be triggered if you try to write to read-only data, for instance. Cool, any questions? So each binary can have different sections. You can look at some of your favorite applications, or better yet, compile your own applications and look at them to see what the sections are. This helps you get much more familiar with what's going on. Questions? Okay, crash course in assembly. So assembly is very easy. I think it's really important to learn in general because this is how computers actually work, right? I mean, it's very nice for us to stay at our theoretical computer science level, but something actually has to execute all these instructions, all these high-level concepts. And so being able to actually look at a program and understand what's going on is really important. So x86 is actually really interesting. This architecture has been around a long time. It was originally for the 8086 and 8088 chips, and it's been used in a lot of different CPUs. Does anybody know that it's actually a lie that I just told you, that modern CPUs actually execute x86? Does anybody know what actually happens? Is it virtualized on top of something? Kind of, can you be more specific? Yeah, so it turns out, and when you think about it it's kind of crazy, we've been using this exact same architecture for a long time. The phrase is microcode. Basically the CPU internally executes a very limited set of simpler instructions, and there's a translation layer that grabs the x86 code and translates it into those microcode instructions, which are what actually gets executed by the CPU itself. And all this happens on the CPU, so your operating system, your program, nobody ever actually sees this. It's the abstraction provided by the chip.
This is why everything's crazy and complicated, but it means that when Intel has things like Spectre and Meltdown, they can actually issue updates to the microcode on your chip, which is crazy. So anyway, peeling back the veil a little bit, seeing the horrors underneath, and then closing it. You can go look all that stuff up. So, just as we talked about, we have a number of registers in x86. You can think of registers as the local variables of the processor. And fundamentally, at least on x86, you can't do any operations on, actually, is that true? I was going to say you can't do any operations directly on memory, but that's not quite right: many x86 instructions can take one operand from memory, just not both. Still, the usual pattern is that you bring stuff into a register, compute on it, and then copy that value back to some memory address. So again, when we say that it's a 32-bit CPU, essentially what we mean is that each register is 32 bits, and that means when we're referencing memory, we can reference up to two to the 32 bytes of memory. And that's actually one of the major reasons why we moved to 64-bit architectures: it was impossible to reference more than four gigs of memory in a 32-bit application. So we have four registers, it's very easy. It's going to seem very complicated. A, B, C, and D. A little bit of historical background: x86 evolved from a 16-bit architecture, and those registers had the names AX, BX, CX, and, it should be DX here. The E in front stands for, I think, extended. So extended: EAX, EBX, ECX, EDX. Those are the four general purpose registers that we care about. There are some conventions for how they're used; I don't think that's as important. What is important is referencing those 32 bits in an instruction. So it seems kind of crazy when you're reading it. So I tell you there are these four registers, EAX, EBX, ECX, EDX.
And we think, because we're programmers, well, I will see variables with these names and no others. Transfer memory into EAX, add EAX to EBX, subtract ECX from EBX, and then put it back into some memory location. That is not the case, because there are different names to address different bits inside that register. So you have the whole register, EAX, which as we noted is 32 bits. In a nice bit of backwards compatibility, the lower 16 bits are referenced by the register name AX. Inside there, the upper eight bits you can address by AH and the lower eight bits by AL. So I know it's kind of crazy, but you do have the A in the name, so that tells you where it is. This means that if, say, AL is all zeros and the rest of the register is all ones and you're moving AL into some other register, then it's only those eight bits that get moved or computed on. This actually is important when we look at shrinking shellcode size and all this stuff, which we'll cover in a bit. When you're writing assembly by hand, sometimes you want to avoid zero bytes in the resulting code. So we'll look at that. Other registers: we also have two registers that are mostly used for memory operations, ESI and EDI. So we have our four general purpose registers, these two registers, and then there's a number of, we'll say, registers that have a semantic meaning while executing assembly instructions. We'll actually look at what these mean and how they work, because they're important to security, and this is where you can completely control an application. So you have ESP, which is the stack pointer, and EBP, which is the frame pointer, and we'll look at both. We also have EIP in here, which is the instruction pointer. It points into the text segment, or wherever the code is, at the next thing that's going to be executed.
All right, there's a number of others, and I'm not going to get into them. Okay, so there's EIP. The EIP register is the next instruction to be executed. And when I say it points to, what do we actually mean here? I think this is important when thinking about what's actually going on. It's the memory address of the next instruction. Yeah. So is it the same as pointers in C? A pointer just holds the memory location of something. So when you think about the EIP register, it is just like any other register. It's 32 bits, and the address inside of there is where the CPU is going to go fetch the next instruction and execute it. There's nothing special about it; the bytes in there are just a memory location, and the processor only cares about that when it goes to fetch the instruction from memory. You don't change EIP directly, but you can use jump instructions to move the program's execution somewhere else. You can also use call and return instructions, which we'll look at in a second. The rest here is not important. There's floating point stuff, which is crazy. Okay, first, a little word of warning about endianness. I believe we talked about this when we talked about networks, but we'll revisit it. So again, what does endianness mean? Close: bytes. Yeah, it's on the byte level. For 32 bits you have four bytes, and the question is which comes first in memory, the most significant byte or the least significant byte. So this is just something to remember. Intel uses little endian ordering, which means that if you have the 32-bit hex value 03020100 stored starting at address 00F67B40, then at 40 will be the value zero, which is the little end. At 41 will be one, at 42 will be two, and at 43 will be three. What about, how do computers represent negative numbers? Yeah, sign, but what does that mean? Two's complement, exactly. So you flip the bits and add one. That's basically the way you do it.
So negative one is all Fs, FFFFFFFF, and negative two is FFFFFFFE. You'll need a calculator. Most calculators have nice programmer modes. I don't do anything fancy, I usually just use the built-in Mac calculator. Sometimes you need to calculate offsets or, looking at instructions, figure things out. All right, continuing our crash course in x86 assembly language. This sounds silly, but it's a slightly higher level language than just raw machine code. It does have some abstractions, but not a ton. Okay, one thing that makes this very confusing is that there are actually two different syntaxes for writing the same assembly program. It seems silly because the instructions are very simple. In one, AT&T syntax, you'll have whatever the instruction is, like add, and then the source register and then the destination register. So for instance, you'll have add EAX, EBX, and implicitly the destination means the result goes back into EBX. Intel syntax is the reverse: the destination comes first. It can get super confusing. There are huge fights about what's the proper way to do this, blah, blah, blah. Honestly, it doesn't matter as long as you know what you're looking at. We'll use AT&T syntax for now, since it's in the slides. Use whatever you want. The easy way to tell, and this is my cheat if you're looking at some code and you don't know which syntax it is: look for the constants. In AT&T syntax a constant starts with a dollar sign. If you have a constant value of 10, you can't move a register into 10. You can move 10 into a register. So you just look at where the constants are, and that tells you which operand is the destination. It's very easy. So now the question becomes, how do we actually address memory?
So how can we do something like move memory from one location into a register? Let's go by example. The main instruction we'll look at is move, written MOVL, with a source and a destination. Say we want to move memory from some memory location into EAX, and the address of that memory is in the register EBX. The way to translate this from C is to think of the parentheses as a dereference. So dereference whatever's in EBX: whatever's at that memory location, move it into EAX. Similarly, the reverse: we can put the value of EAX into the memory EBX points to. And this third one is different. So how do we read this third line? Just move EAX into EBX. The percent sign in this syntax says it's a register, yeah. Since the parentheses are for dereference, I assume the third line is just taking whatever value is stored in EAX and putting it into the register EBX. Yeah, so the key difference is the dereference, the parentheses. Here, take whatever's inside EAX, those bits, and copy them into EBX. Whereas this one says, take whatever's in EAX and put it into memory wherever EBX points to. So treat the value in EBX as an address, and copy the value there. Now, this could cause an interrupt to trigger if it's an illegal memory access, if we're trying to write to memory we can't write to or don't control. And this is the reverse of reading it into EAX. Now, one thing we'll often see, and you can think of it as a convenience, is that we want to compute on the address first. We don't want the value exactly at that address. We want, let's say, the value at the base pointer minus 0x10, and we want to move that into EAX. So we take the address in EBP, subtract hex 10, which is 16, dereference that, and move the result into EAX. So this is the syntax to do that.
The syntax is actually quite complicated, because you can have a base address, an index, a scale, and a displacement, but mostly what you'll see, and what's important, is something like this: move EBP minus 8 into a register. As we'll see, this usually means it's a local variable. So it moves a local variable from the stack into a register, copying the contents. The next thing we'll look at: the dollar sign here means a constant value. So this means, literally, do not copy what's at location 0x804a0e4. It says move the value 0x804a0e4 itself into register EBX. And so what do we think is the difference between this and the second one? The second one is going to the memory location specified by that hex number. So dereference that memory location, take whatever four bytes are at that location, and copy them into the EAX register. Questions on this? This should help you decode things when you're looking at this stuff. All right, this is a way of looking at what types of operations there are. We're not going to use all of these, so it's not like I expect you to learn everything here. But push and pop are important ones. There are all kinds of arithmetic operations: add, subtract, multiply, divide, increment, decrement. Logical operators: and, or, xor, not. Control transfers, which are kind of important; we'll look at exactly how call and return work. We have jump instructions. There's int and iret; int triggers an interrupt. And you can do a comparison instruction, so compare two values and then jump if they're not equal, or jump if they're equal, or jump if greater than or equal. Some input and output. And one we'll look at: there's an instruction to do nothing. Anybody remember why there's an instruction to not do anything? It's kind of weird. Waiting, yeah. When might you need to wait? Yeah.
I think if you're doing a branch instruction, the processor will load up the next instruction, but then if it turns out you don't need it, you have to clear it out. Yeah, it depends on the processor, but if the processor doesn't handle pipelining correctly after a branch, we may need a couple of NOP instructions in order to handle that ourselves. Modern processors don't need that, and your compiler handles all that for you. Okay, cool. So how do we get our applications to actually do something? Let's say we just want to write an application in assembly. How do we get it to do stuff? Compile it and run it on my system? Yes, compile it, execute it, and then what? How does it actually interact with us? Yeah. Isn't there a read and a write, so the program can talk to the OS to read inputs from the user and write outputs? Yeah, fundamentally we need to be able to talk to the operating system. We basically have to ask the operating system to do stuff for us. So if we look at the manual page, I know this is a BSD system, but, okay, it should be in section two. Yeah, so read is a system call that takes the file descriptor number to read from, a buffer to write the results to, and the number of bytes to read. So in our application we need some way to signal to the operating system, hey, we need you to do something for us. Normally libraries take care of this. That's what I mentioned with printf and all these things, but fundamentally they don't do the work themselves. They have to ask the operating system, and this covers all input and output. On Linux on x86, you trigger an interrupt: executing int 0x80 raises the interrupt and makes a system call. But if you Google for Linux system call tables, you'll find over 200 different system calls.
So we need to have some kind of protocol to talk to the operating system, to be able to say which system call we want. Do I want you to read, write? How do I say that? The way you do it is, there's a table where you look up the number of the system call you want, and you put that number in the EAX register. So if we want to write a nice hello world program in assembly without using any libraries or anything, we say that we want the string hello world in our program, in our text segment, and we'll call it hw. We also need to tell the assembler that main, just like main in a C program, is where we want to start executing from. So it turns out, if we look it up, that the read system call is system call number four. So the very first thing we do is move four into EAX, and then we have to pass the arguments: the file descriptor, the buffer, and the size. So what's the file descriptor we want? Oh, sorry, I lied. We're not going to read, we're going to write, but it takes the same arguments. So write is system call number four. So what file descriptor do we want to write to? Standard output. Standard output. So what are the three main file descriptors? This is something you should learn and burn into your head. What's zero? Standard input. Then one is standard output and two is standard error. Standard error, yeah, it was process of elimination, but you still got it. So we need to say that we want to write to standard output, and so we move one into EBX. This is just the protocol. We move the address of the string, hello world newline, into ECX. And then what do we put as that last parameter? What do I have to tell the operating system? The number of bytes. You guys can see this: the system call number goes in EAX, the file descriptor in EBX, the buffer in ECX, and the size in EDX. So we need to say how many bytes to write out.
And if we look at this string, we can see that it's 12 bytes: hello, space, world, and then a newline. Then we execute int 0x80, which triggers the operating system. The operating system will then look at the registers to see which system call we want and what the arguments are. It will write that out, then control finally gets back to us, and now we want to exit. So we'll move zero into EAX, which is the system call, oh, sorry, we're not doing that. We're returning from main, so what we're doing is setting the return value of main to zero. You can take this program, assemble it, run it, it works, it's super cool. All right, next we'll go back and talk about exploits.