 Cool. All right. Hello, everyone. Looks like we got a good group, 111. Okay. So, first off, let's go over assignment four. So, I actually just finished calculating the scores for this about 20 minutes ago. So, I haven't been able to upload them to Canvas yet. So, as soon as class is over, you'll be able to get that and so without further ado. So, what do you guys want to see first? The victims or the scams? I think it makes sense to look at the scams. Yeah. That way, that preferences the scams. So, okay. So, this is the list of people who successfully scammed people. Okay, let's do it again. So, this is ordered by how many people they scammed. So, you can see it has this really like high growth where a few people scammed a lot. Right? You had 23 and 23. So, this actually made at least, I guess history of CSE 365 is the first time we've ever had two people scam the exact amount. So, two people scammed 23 people, which is pretty cool. And then you can see that it tapers off pretty quickly. You have 20, 16, 15, 14. And then I guess I didn't do... So, this means that 176 people were scammed. No, you don't name them. That's up to them to decide if they want to claim credit for doing this so well. No, this is not duplicates. These are unique, real signatures that they had on their adversarial key. You'll know after class. So, after class, I'll post it on Canvas and the comments will tell you exactly how much. So, these are signatures, right? So, basically the scammers were able to obtain 176 total signatures on their adversarial keys. So, then let's look at the victims because... Yeah, so then we can look at the victims. Well, it's not really 176 out of 384. It's hard to calculate, right? Because there is roughly 30 times 300 signatures. There's about 9,000 signatures that happened roughly. So, anyways, let's look at the victims. I think that's pretty fun. So, we can look at maybe the top 10 victims. And it's actually not as sharp as the previous one. 
So, the most scammed person only signed six adversarial keys, which is actually not too bad, right? Beyond that, five, and then a couple of people signed four, and then threes. And the really interesting thing is how far you have to keep going out. So, this basically means that our scammers were very good. 60, 70, 80, 90, 100, 110. Yeah, so there we go. We got our first zero. So, 109 people signed adversarial keys. This shows you the kind of damage that can come from getting scammed. Roughly a third of the class, if you think about it that way, got scammed and signed an adversarial key. So, yeah, it's really impressive that these scammers were actually able to do this and trick people into signing their keys. I think this is a really interesting finding. The exact grades aren't calculated yet. Like I said, I just did this, so I still need to examine them and make sure they make sense. But this was pretty interesting. Let's see, I've sent an email to the top scammers to see if they wanted to out themselves and tell the class their strategies, so we can hear from them. I'll take this opportunity: if you're here and listening and you would like to participate, please unmute yourself and you can tell us how you scammed people. I mean, I didn't scam anyone, but I know how people usually do it. That works. Sure. So, a lot of the things that people did: we took a UID, so we edited the key, added a new UID, deleted the old one, and then we would re-sign it with a fake CSE 365 key and package that on top. I can verify this person is not in the top two, so. I also did a decent amount of sneaking or scamming or fake key signing, and what I would do is I would trawl Piazza or my Canvas DMs or even people on the CSE 365 Discord we had. I looked for people that were struggling with the assignment and needed help, and I'd help them get set up.
I'm like, need everything, but then to help them test that they could sign keys, I would send them my fake key instead of my real key. And I would send them the sophisticated one like what the other guy was talking about. That's awesome. Yeah, so basically building trust up with people and then using that trust in order to get them to sign your key. That's good. So, one thing that I realized worked pretty well was if you send someone your fake key with the same name and email, even if they realized it after they imported it and then you sent them your real key and they imported that and used your email to sign it, they would sign both. And then if they use the email to export it and send it back, they would export your fake key sign, which worked pretty well. Yeah, so this is something I noticed actually for people submitting, right? So, part of the submission system checked to make sure you just submitted one key. So, if people just use the email address in order to export keys or to sign keys, like Ryan is saying, they would actually export the, they would sign both of the keys. And so, then when they would export it, they'd also export two keys. So, this is why it's really important to focus on the key IDs and the fingerprints rather than the email address. Yeah, I'll say in general though, some of the people are saying you're definitely, your Discord was taken over by people and people were scamming in the Discord and nobody was able to figure out who it was. We were doing our best to like moderate that. We were trying to quarantine stuff at least on the secure channels. There's a few that were on secure though. All it takes is one. And from this list, you can see that there is a lot of people that just got scammed once. So, if it was just that, yeah. Anyways, I think it was just fascinating. And the really interesting thing is it follows the distribution I've seen in almost every class that I've done this in. 
Where the victims, the scammers are not a ton, but there's a bunch of, a few people that scam really well and then it kind of tapers off. One thing I did notice is I was actually in a little sneak chat. We had like 10 people in there or something like that. All just collaborating. Well, there's only like a couple of us truly highly active. There were those, everybody was in there going like, just trying to like, hey, let's try this out. And so like we had like a little think tank going on there. And of course we would try to invite people to fake discords that were exclusively for scamming. That we would all try to act all friendly and what it was to catch a fish and bring them in. So I think I know people might have an idea if we got that six. We also were working on with the fake server key to make it harder to check. What we'd do is we'd load the armored exported server key, fake server key under a real key. We'd seed someone and then we'd put that person's name and like a chat that we were keeping track of everyone who would seeded with the fake server key. That way you could just like, I could just send them a fake key without like having them like if they checked and see that there's only one key included in the file they're importing. Nice. Yeah, so in the chat, so know that as far as I know there's no way to fake your name, email and the class signature perfectly. So because the class signature. So if you look, that was based off of the, that's the fingerprints are all hashes of things. You could probably get it to match pretty closely like people are saying, but to get it a hundred percent you would need a lot of compute to try to get that. Something that also worked well was because you uploaded the assignment and then like you had to take it down at that one point. Like a lot of people saw that and then didn't finish their assignment. So then you could tell people like when they were like, oh, why is yours seemed to have a different class? 
You could say like, oh, I got that from the first time he uploaded the assignment, and then they'd believe you because they didn't check. That was one of the excuses I was using for it too. I was a big fan of the fake Adam DuPay offering signatures on Piazza. That was a quality scam. Nice. I almost wanted to sign it, guys. That's great. Yeah, cool. Awesome. Any other thoughts on the assignment? On scamming? There's a huge human component to this too, because if you tried to scam people right before the assignment was due, you got all the people who were really desperate. So I think they were less diligent in checking. Oh yeah. Yeah, there's that weird psychological component to this assignment. Or even if you think about what somebody was talking about earlier, leveraging trust. So if you build up trust with someone by teaching them how to do it correctly, and then you just slip in your adversarial key as something for them to sign, they're so new they don't really know exactly what they're doing, but you just helped them get set up. So that's super interesting. So, people are asking in the chat, just a second. Nobody authorized me to out them for this assignment. So I can't tell you if they were or were not one of the people who talked. I'll authorize myself for that. Okay. So Ryan was one of the two people. Oh yeah, someone also suggested, for the people that needed help getting set up, just writing a script that you could give them that would check all the keys authentically and do that whole job, except for a whitelisted group of people's keys. Did anybody use the key servers? This is kind of an interesting thing that can happen organically with the class. So there are key servers that you can upload your keys to and download other people's keys from. So different classes sometimes glom onto using that.
The highest number you can see on my screen here is 23. These are the numbers of successful scams, ordered from greatest to least. So two people tied for 23. Well, those people that got 23, will they throw all of the rest of the sneaks off with their point weighting? I don't know, we'll see, don't worry about that. It'll be fair. Cool. All right, well, thanks everyone. Hopefully that was a fun assignment that you learned a lot in. And yeah, cool. Okay, let's go on to application security. Cool, all right. So now we're moving on from networks. We've talked about networks. We've looked at the network protocols and how they work. We've looked at how our data moves and is transmitted along the network. And now we're gonna start moving our attention and our focus to applications, and specifically to the different ways that applications can be vulnerable and that attackers can take them over and exploit them. Cool, okay. So what is an application? This maybe seems like a trivial or silly thing to be thinking about, but applications are really everything that we're using. So you're seeing my screen now. You're seeing the slides being shown in PowerPoint. There's this Notability app. Of course, we're using Zoom, and we're chatting over Zoom and having this video call. These are all applications that are running, and some are running locally. So this PowerPoint presentation that we're looking at right now is running inside PowerPoint, running locally on my machine. Other things are local applications that connect to remote applications over the network, things like Zoom, right? You're seeing my video maybe in the upper right or in some part of the Zoom screen.
And this is information that's coming from my local applications getting sent to an application that's running on Zoom server somewhere that then kind of figures out how to get that data and that video to everyone else. Yeah, so for this, we're gonna call applications very, very broad. So you can think of programs, applications, widgets, routines, I guess kind of everything all the same, right? And this is kind of the important thing when we talk about, we're gonna get to where applications get their information from. And one of the key things there is the networks. This is why we studied network security and understood how networks work so we can understand the security implications of data that comes into an application from the network. And basically the behavior of an application. So this is a really important thing. Like what influences what an application does, right? So if we think about PowerPoint, this PowerPoint app we're looking at, right? There's kind of some very easy trivial things. There's some things like the code that's being executed of the application, right? So the behavior of the application is obviously very, is very influenced by the code of the application, right? But that's not all. The other parts that influence the behavior is the data that it's processing, right? This application is operating on some data that data controls in some sense what code gets executed and what type of things happen to this application. Similarly, so what are some examples of maybe the environment and how could an environment? So what are some examples of environment in an application? Environment variables, operating system. Yeah, these are good. If it's running in a browser, so what kind of browser? Linux, Mac, Windows, Docker. Yeah, all kinds. So you can think of kind of the environment around the application. So it's not just the application itself. Library versions, that's great. 
So libraries that are dynamically loaded when the application runs, those versions may change and that influences the application's behavior. All type of fun stuff. So and what, again, this is coming back to what we always wanna do when we're thinking about the security of an application or inversely, we're thinking about how can we identify a vulnerability in an application in order to take advantage of it? We always wanna be thinking about how do I violate the three components of security? Can I violate confidentiality, integrity or availability? And we'll look at different types of attacks here. And so, cool. And this is kind of a graphical picture of this. So I wanna make sure, so everyone can see this, right? Give me a thumbs up that you can see the screen. Yep, thanks. Cool. And so it's really, so when we think about an application, so if I want to, let's say, and we talked about it a little bit, we'll think kind of abstractly, I'm just gonna call it right now an app. So we have our application. So, right, it has some code in it that's running or whatever. And if we think, okay, so what my goal is, I want this to either violate confidentiality, integrity or availability of this application. How can I do that as an attacker? Right, how can I do that? And one thing to think about is, well, can I change the code of this application? What do you think? Yeah, it kind of depends, right? And it depends on the security of the application. So if I can just change, let's say, and the other thing that this is getting into is, right, so what this is getting into is what constitutes a confidentiality, integrity or availability attack on the application, right? So if we think about this PowerPoint application that's running here locally, right? PowerPoint is running on my machine. Does it violate the security of this application if I can modify the code? Yes, integrity, but it's my application that's running on my machine. Why can't I go ahead and just modify the binary? 
Maybe I can, I don't know, figure out where it says notes and change this to something. But it doesn't matter that I didn't write the program, it's running on my machine, right? So if we look at the permissions of this application, I have permissions to edit the code of this application, right? Now, think about it differently. What about PowerPoint that's running on your system, right? Yeah, I mean, there could be some EULA stuff but that's not really what we're trying to get at here. You know, fundamentally, I have all the bits on this machine. I own all the bits that are running here. I can go ahead and edit, that's just a file, this binary is just a file on my machine, right? So I could change this binary, overwrite the code and do whatever I want. But I can do that because it's running locally and because I have, I mean, not thinking about anything that Microsoft has put in there, but you know, I have permissions locally to edit this file, right? This binary application. But I don't have permission and I shouldn't be able to just, let's say edit the code of your PowerPoint running on your machine, for instance, to sniff your password or send me all your PowerPoint slides or whatever, right? So kind of what I'm trying to get at here is, oftentimes we, you know, on a, the security of a system that we want to think about, we can't just edit the code directly because it's running on somebody else's machine and we don't have control of that code. If we had control of that code, then being able to modify or edit the program's behavior is trivial, right? Me on my machine, if I download a binary, I can change that binary to do whatever I want. I haven't necessarily violated any security principles here because I'm editing a binary that I control, right? I have 100% access to modify and change this binary on my system. 
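To make the "this binary is just a file on my machine" point concrete, here is a minimal Python sketch. It is an illustration only: the file it patches is a throwaway stand-in created by the script, the `patch_bytes` helper is a hypothetical name, and the embedded label `Notes` is just the example string from the discussion above, not anything taken from a real PowerPoint binary.

```python
import os
import tempfile


def patch_bytes(path: str, old: bytes, new: bytes) -> int:
    """Overwrite the first occurrence of `old` with `new` in place.

    Returns the byte offset that was patched, or -1 if `old` was not found.
    """
    if len(old) != len(new):
        # Keeping the length identical means no other bytes in the file move.
        raise ValueError("replacement must have the same length as the original")
    with open(path, "r+b") as f:
        data = f.read()
        offset = data.find(old)
        if offset == -1:
            return -1
        f.seek(offset)
        f.write(new)
    return offset


# Stand-in for "a binary on my machine": a temporary file with a label inside it.
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"\x00\x01fake-code-bytes\x00Notes\x00more-fake-code-bytes")

patched_at = patch_bytes(demo_path, b"Notes", b"Hello")

with open(demo_path, "rb") as f:
    patched = f.read()
os.remove(demo_path)
```

Nothing here needs special privileges, because the script owns the file it edits. The same call against a file on somebody else's machine is exactly what the attacker does not get to do.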
Now, that being said, what if I'm able to, let's say, compromise an application that's running remotely on somebody else's system? We're not talking about distribution, right? This is just on my machine. So I haven't gotten anything by editing the code here. But now, let's say this app is running, we'll use Zoom as an example right now, right? This app is running on a server that's doing video. So my video is coming in, and it's distributing video everywhere. Now, if I'm able to, let's say, trick this application into doing whatever I want, some kind of behavior I want, let's say violating confidentiality by giving me access to everyone's video on all Zoom calls, right? That would be a massive confidentiality problem, and if I can edit videos, that'd be an integrity problem. If I can crash this app, that would be an availability problem. These are violations because it's running on somebody else's system, and the core thing is I don't have permission to do all those things, right? I don't have permission to listen in on other people's Zoom calls or to disrupt video calls over Zoom or, yeah, turn on all your cameras. That would be pretty good. That's a good attack. I do not want to do that, don't worry. And so, if we look at all the things that influence the behavior of the application, really what it comes down to is that what you're trying to do is modify the behavior of the application. Usually I cannot change the code, because if I could change the code, I could make it do whatever I want. So what I need is to be able to influence the data or the environment in order to trick the application into violating confidentiality, integrity, or availability. There is that option, but obviously you'd see that. I can also request that people start their cameras. That's something that a host can do. So, okay, so this is the core idea.
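As a rough illustration of behavior changing while the code stays fixed, the Python sketch below runs the same function twice and changes only the environment around it: first the current working directory, then an environment variable. The function name `write_report` and the variable `REPORT_DIR` are hypothetical names made up for this example.

```python
import os
import tempfile


def write_report(text: str) -> str:
    """Identical code on every call; where the file lands depends on the environment."""
    # REPORT_DIR is a hypothetical variable for this sketch. If it is unset,
    # "." resolves against whatever the current working directory happens to be.
    target_dir = os.environ.get("REPORT_DIR", ".")
    path = os.path.join(target_dir, "report.txt")
    with open(path, "w") as f:
        f.write(text)
    return os.path.realpath(path)


original_cwd = os.getcwd()
saved_env = os.environ.pop("REPORT_DIR", None)

with tempfile.TemporaryDirectory() as dir_a, tempfile.TemporaryDirectory() as dir_b:
    expected_a = os.path.realpath(dir_a)
    expected_b = os.path.realpath(dir_b)

    os.chdir(dir_a)                      # environment change 1: working directory
    first_dir = os.path.dirname(write_report("hello"))

    os.environ["REPORT_DIR"] = dir_b     # environment change 2: environment variable
    second_dir = os.path.dirname(write_report("hello"))

    # Restore the process state before the temporary directories are removed.
    os.chdir(original_cwd)
    del os.environ["REPORT_DIR"]

if saved_env is not None:
    os.environ["REPORT_DIR"] = saved_env
```

Whoever controls the working directory or that variable decides which file gets written, without touching a single line of the program.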
So we need to think, as an attacker, what are all the ways I can influence the behavior of this application? And to do that, we need to understand how applications operate. And the way I always think about this is: what are all the ways that I as an attacker can influence the behavior of this application through giving it input? And so it helps to think about the application in this way. So we have our little application, right? It's got its code, it's doing things, but its behavior is influenced, as we discussed, by the environment, right? So different types of things we've talked about, and I would say no software is really secure. And hopefully that's what we'll learn in this class. So the environment influences your application. For instance, if you write an application that just writes out to a file, where that file ends up depends on what directory you're currently in, and that's part of the environment. The other thing is your application needs to talk to the operating system. So the operating system can influence the behavior of your application in various ways. And we need to be thinking about this, this is critical in order to think adversarially about an application: what are all the different ways our input can get into this application? So for instance, you at a terminal interacting with an application, that's clearly one way that your input can get into an application, right? And on a Linux system, from what we've already seen, what would be good types of target applications that take input from you, where you would want to use that input to influence their behavior to elevate your privileges on that system? Yeah, chmod would be a good one, what else? Setuid, yeah, any set-user-ID program, right? So any set-user-ID program owned by root, as we saw, is run with the permissions of root.
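For anyone who wants to see which programs on their own Linux machine fall into this category, here is a minimal Python sketch. It only inspects file metadata: it checks the set-user-ID bit and the owner of each regular file in a couple of directories. The directory list is an assumption for illustration, and the result will differ from system to system (it may well be empty).

```python
import os
import stat


def is_setuid(mode: int) -> bool:
    """True if the set-user-ID bit is set in a file mode (st_mode)."""
    return bool(mode & stat.S_ISUID)


def find_setuid_root(directories):
    """Yield regular files that are set-user-ID and owned by root (uid 0)."""
    for directory in directories:
        try:
            entries = os.scandir(directory)
        except OSError:
            continue  # directory is missing or unreadable on this system
        with entries:
            for entry in entries:
                try:
                    info = entry.stat(follow_symlinks=False)
                except OSError:
                    continue
                if (stat.S_ISREG(info.st_mode)
                        and is_setuid(info.st_mode)
                        and info.st_uid == 0):
                    yield entry.path


# The directories below are an assumption; adjust them for the system at hand.
candidates = sorted(find_setuid_root(["/usr/bin", "/bin"]))
for path in candidates:
    print(path)
```

Each path printed is a program that takes input from an unprivileged user while running with root's permissions, which is why these are the interesting local targets.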
And this means if we can trick that application into performing actions on our behalf, we essentially can become root. So that's one way. So we think about applications get interaction from us, just from our input. Another way is from the file system. So just as we talked about applications maybe read and write things from the file system. And as an attacker, we can maybe influence what files exist in those locations. Similarly from the network, so applications that interact with the network, anything that comes from the network, potentially as an attacker, we can influence that behavior. And also other processes on the system, right? So most operating systems have a way for one process to send data to another process, signals, different types of things, different types of communication. And this is what we need to think about of all these different ways that we can get data into the system. So I like to tell one example of this I think it was a cross-site scripting bug. I found a long time ago into an application that was an online music playing application. So you could actually upload your own MP3s to this and it would be part of your playlist or whatever. And I realized it was taking the name of the song from the MP3 metadata. So I put like HTML tags in that metadata and I saw that they ended up not being sanitized when they were used. So this was like a crazy way that data coming into the system from an MP3 file that you uploaded, which most people wouldn't think has these characters and it can cause a vulnerability. No, this isn't one of my blog, it's an older one. So yeah, this is one example, other types of examples. I've seen applications that load like tweets in from other data sources. So you can actually, if you can influence your tweet to get added to the system, you can cause vulnerabilities that way. But really, and this is the critical thing, is this is application-specific. 
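The MP3 story can be sketched in a few lines of Python. This is not the code of that music application, which the lecture does not show; the function names and the HTML snippet are hypothetical. The sketch only contrasts pasting a metadata string straight into a page with escaping it first using the standard library's `html.escape`.

```python
import html


def render_song_row_unsafe(title: str) -> str:
    """Builds the row by pasting the title straight into the markup."""
    return '<li class="song">' + title + "</li>"


def render_song_row(title: str) -> str:
    """Builds the row after escaping characters that are special in HTML."""
    return '<li class="song">' + html.escape(title) + "</li>"


# A song title as it might arrive from attacker-controlled MP3 metadata.
malicious_title = "<script>alert(1)</script>"

unsafe_html = render_song_row_unsafe(malicious_title)
safe_html = render_song_row(malicious_title)

print(unsafe_html)  # the script tag survives and the browser would run it
print(safe_html)    # the tag is shown as text instead of being executed
```

The point is less the escaping call itself than the habit: the title came from a file, which counts as input, and it has to be treated like any other untrusted input.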
So all of these types of vulnerabilities, you need to know what does the application do, what's the functionality, and how can I as an attacker influence the application? Cool, and so what we're gonna be kind of studying is this notion of application vulnerability analysis. So the process of identifying vulnerabilities in application, and these can be anywhere from design level vulnerabilities where the design is just fundamentally flawed to implementation vulnerabilities where the, there's a bug in the code of the application that allows, and that bug allows you to compromise the security of the application. And so how does, what's the difference between like an implementation vulnerability and the deployment vulnerability? Yeah, so a little bit close being configure related. So you can think of an application that is perfectly secure. The code is 100% fine, but when it's deployed, let's say it's on a shared system and the permissions on the directory are every user can read a certain directory, right? So this would be a way that an application can be 100% secure in the implementation phase. There's no bugs, no buffer overflows, no format strings, all the stuff we're gonna talk about. But when it's deployed, it's deployed in an insecure and vulnerable way. Other types of deployment vulnerabilities can be leaving the .git directory lying around in your deployment, which gives you access to the history of the application, which has some maybe secret passwords in there. All types of kind of pretty cool stuff. We're gonna skip these. Cool. And so, okay. And so now we need to think about, uh, okay, the difference between local attacks and remote attacks. So, and it really comes down to where you are as a user in your relationship to the application. So for instance, local here, we actually have this example of PowerPoint right here. PowerPoint's a bad example though, because it's running as my user. 
So for me, this PowerPoint is not a good target for a local attack. But maybe, like we saw, which change ownership. There we go. So I do ls -la on this file. I should see it is not set UID. Maybe Mac doesn't use set UID stuff. So this is not gonna be helpful, but at least this I know will be, okay, nope. There we go, found one. Okay, clearly I was using the wrong file. Yeah, change ownership doesn't actually need set UID. So here, as we saw when we looked at this, there's a set UID binary. So if I can find a vulnerability in this change ownership binary, then I can elevate my privileges. So right now I am operating as the ubuntu user on this system. And if I'm able to exploit this binary, I would then be able to elevate my privileges up to root. So in a local attack, what you're trying to do is elevate your privileges from your user to another user. And this requires you to have an established presence on the system. Either you have a user account or you have another application that you've already taken over and have control of. And this contrasts with remote attacks. For remote attacks, we actually have a good example in this Zoom application that we have here, right? As a local attacker, I can maybe directly change the file system. If we go back to this pretty graph here, right, as a user on the system, I can actually influence the environment. I can influence the file system. But if I'm a remote attacker, if you think about this diagram, the only way I can get data into this is through the network, right? This is the only way I can influence the behavior of this application, because I can't directly, let's say, add or remove files. I can only do whatever this application allows me to do. So we call this, and we think about this in some sense as, the attack surface, right?
So a remote application has a smaller attack surface because you only have kind of one entry point into the application, whereas a local application can have a much greater attack surface. And these are related, so these are not independent concepts. So even if you think about my server here, you're talking to a web server. And this actually goes with what we talked about with port scanning. So if you were to port scan my website, you would see, well, I can show you. So this is netstat. So netstat is showing me what ports are being listened on on my local machine. So port 22 is being listened on. This is SSH, this makes sense. And I have port 80 and port 443, which are both web ports. So 80 is HTTP, 443 is HTTPS. So by port scanning, oops, I don't wanna do that, quit. So by port scanning this machine, we can see that port 80 is listening. So we have a web server here. And for this web server, I can actually show you the exact application that's listening here. So this is apache2, this thing. And if we go ps aux, grep, right? So these Apache processes are actually running as the user www-data. So as an attacker, my goal would be to first use a remote attack against Apache and take over Apache. And now once I've exploited Apache, I'm on this system as the, what was it, www-data user. And that user has very little access on this system. They only have access to the, I think it's the /var/www/html directory. And even there, they can't even edit anything. So even if you hacked into this website, if you took over Apache, let's see, you wouldn't be able to alter these files. You also wouldn't be able to look at, I don't know what I have in here.
I don't think there's probably anything interesting, but you wouldn't be able to edit any of these files, because we can look here and they're all owned by me and they're only accessible to this ubuntu user that I am right now. So if you wanted to take over the system, you'd then need to find a local vulnerability, because now you've shifted as an attacker from a remote attacker to a local attacker who's running as this www-data user. And now what you wanna do is find a local attack and a local exploit to elevate your privileges to root. Maybe you exploit a bug in the operating system that allows you to elevate your privileges to root. Maybe you exploit a bug in a set UID program that allows you to elevate your privileges to root. So this is in general the difference here. And so the difference is that remote attacks are in general more difficult to perform because the attack surface is a lot smaller. But they're more severe when they occur, because it means anyone who can reach the system over the network can get access. Local attacks can be slightly easier because an attacker has more knowledge of the environment. But the attacker needs more capabilities in order to even attempt a local exploit. No, so there should be no difference. The web address is just the DNS name that gets translated to an IP address under the hood. Cool, okay. So now, if we want to truly be able to identify vulnerabilities in an application, we need to understand the exact life cycle of an application. And I hope something that's been clear through this class is that what we're really trying to do is understand exactly what's happening at all these different layers, so that we can identify vulnerabilities in them and exploit them to make the application do something it's not supposed to do. And this is something that should be very familiar. You've all written applications.
You've written programs that do things. So you know, you as an author, you write, usually you write your code in some high level language, like C or C++, Java, Python, whatever. The application is then translated to some type of executable form, maybe interpreted or compiled against that later. But at this point, it's essentially just bits on your disk, right? So this application has been translated into something that a CPU could execute. And the operating system has a way to take those bits on disk, load them in memory and then start executing your program. And so it's executed and then it finally terminates. So just a quick difference between interpretation and compilation because this comes up and this causes confusion. The idea is essentially that interpretation typically it's a, you have another program that interprets your program. So for instance, if you're talking about Python, typically you write your program in a Python file and the Python interpreter goes through and interprets all the instructions of your application. So it's kind of a program interpreting your application which is distinctly different when you compile something with C or C++. You're creating a binary executable that your CPU knows how to execute. So there's no other program that has to interpret your binary. The CPU itself can do that. So, yeah. The cool thing about interpretations and actually this can lead to vulnerabilities is that you can actually evaluate strings at runtime and turn them into code which is much more difficult to do in compilation. Cool. Okay, so. Yeah, okay. Cool. So compilation is the art and there's obviously entire classes devoted to this so we're not gonna get too in depth here but the idea is you have a, no. So well, I don't know exactly what Zoom is written in but interpretation is not what, interpretation is something like Python or Java. So Java you compile down to Java byte code that a Java virtual machine can interpret and execute. 
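Python happens to show both ideas from this part of the discussion in a few lines, so here is a small sketch. CPython, like the Java example, first compiles source text to bytecode that its virtual machine then interprets, and it can do this at runtime on a plain string. The variable names are made up for the example, and the "attacker" string is deliberately harmless.

```python
import dis
import os

# Code that exists only as a string until the program is already running.
source = "price * quantity + 1"

# Source text -> bytecode for the Python virtual machine.
code_object = compile(source, "<runtime-string>", "eval")

# The interpreter executes that bytecode with the names we hand it.
result = eval(code_object, {"__builtins__": {}}, {"price": 4, "quantity": 10})

# The bytecode instructions the interpreter walks through (names vary by version).
instruction_names = [ins.opname for ins in dis.get_instructions(code_object)]

# Why this can become a vulnerability: if the string comes from an attacker,
# it is not limited to arithmetic. This one only reads the working directory.
attacker_string = "__import__('os').getcwd()"
leaked = eval(attacker_string)

print(result)
print(instruction_names)
print(leaked)
```

A compiled C or C++ program has no comparable built-in way to turn an arbitrary string into running code, which is the contrast being drawn here.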
Now, this is a very deep topic and the lines are not so clearly drawn. For instance, something like Java will detect when you're executing code in a loop and it will say, okay, rather than interpreting this code, what if I compile it to executable code so that I can avoid interpreting it every time? This is just-in-time compilation, or JIT. So no, typically if Zoom is written in C++, you are directly executing that Zoom binary when it runs. So, the parts of compilation. First, when code is compiled, your compiler turns it into architecture-specific assembly. So again, we have our high-level C program, and I guess for those of you that have written C this maybe doesn't seem so high level, but when we start looking at assembly code you will long for the days of C. So here we have something like GCC that will compile our C program to assembly. So what are some assembly languages that you all are familiar with? Yeah, MIPS, what else? Yeah, so we'll go with x86 for now. There's x86, x86-64, ARM, MIPS, RISC-V, that's a good one. And what's the difference between C code and this assembly code that we're talking about here? That's too broad. But what does it mean to be more low level? Actually it's basically a one-to-one mapping if we think about our CPU, right? Our CPU can execute x86 code, essentially. There is a slight translation step where it gets translated to binary. It's not just closer to the hardware, it is the hardware. So the CPU essentially says, okay, I have ways to access memory, I have registers, I can do addition, all these types of things. And the way you tell the CPU what to do is through its architecture-specific assembly. So this is usually something like x86. And then the insane part is there's actually another layer here, where your CPU translates x86 to microcode that it actually executes.
And it does this so it can do all kinds of insane optimizations and all this stuff. You can go deep into architecture to learn all that cool stuff, but for now all we care about is that we give the CPU some x86 code and the CPU itself will execute it. And so this assembly language is defined essentially by the CPU, which says, hey, if you want to program me and talk to me, here's the language you have to speak. So yeah, you can use GCC, and it will output assembly. One of the really cool ways you can get better at reverse engineering and understanding what's going on is that GCC has a -S option that will just generate the assembly of what it compiles. And so you can actually look at the x86 assembly. And then from there, there's a slight translation that happens, because if you think about it, this x86 code still looks like code that we as programmers can read and understand. So the assembler then translates your x86 code to the binary that actually gets executed by the CPU. You can see, even in the way we've drawn it, it is definitely lower level and you are getting closer to the hardware, but really the delta here between your assembly code and the binary code is very small. Assemblers are not incredibly complicated things. And then there are actually more steps, because it doesn't go right to our CPU. I'm going to call this essentially an EXE. I guess it's slightly stealing Windows terminology to call this an executable. You can think of an executable as a file that has the binary code, all the data, everything that's needed, so that the operating system can take that file and then finally put it on the CPU and start executing this thing. And to do that, it needs a lot of different metadata.
There's another step in here that is worth mentioning, which is the linker. This is why you can use GCC to compile different C files or different C++ files to different binary objects, like .o files, and then create your final executable. So you can think of having a lot of these different steps, and they all feed into a linker, which takes all of the binaries of your application and merges them together into one executable file. And going back to some of the things that we talked about today, linking can be performed in different ways. You can have static linking, where all of your code is merged together at once, or dynamic linking, where libraries that you use are loaded at runtime. And so now this part, right: what does this executable file look like? It can't just be your code, because we need to tell the operating system, no, I have some data. I need you to put this data at this memory location and this other data at this other memory location. And so we have file formats to specify to the operating system what this looks like. Linux typically uses the ELF file format and Windows uses the PE format. You can look up the details of these to see exactly what an executable file looks like. There are really cool things you can look up online about how to make the smallest possible executable file by abusing these file formats. And digging into these, these are just formats, right? So ELF is the Executable and Linkable Format, a great acronym, ELF. It's architecture independent. This means the ELF file format is the same whether the binary code is x86 or ARM or MIPS or whatever. And it holds a lot of information and covers lots of different types of files. Very cool. You can use tools to analyze and understand these ELF files, which can be important if you're trying to understand the security of an application.
So you can use a tool like readelf, which will read out the ELF headers. Let's do that on my server. Do I have that command? I do not. All right. Let's install it quickly. And let's run readelf. I think -a is everything. /bin/ls, let's say. So there's actually a ton of information in here. I can see that it's 64-bit, all types of stuff. The entry point of the program, the very first instruction that gets executed. This part, we'll get into it in a second, but this basically says where to lay out everything in memory, different program headers, all kinds of stuff. The relocation table, this is everything that gets linked into ls from libc. Yeah, lots of cool stuff in here, information that tells you about what this program essentially does. So we can look at this quickly. Here are some typical ELF sections to help you understand what's going on when you're looking at an ELF binary. You can have different sections of the binary that say, hey, the program's code is usually in a section called .text. So let's look this up in here, where we can see where the .text section starts, and you have to kind of parse all this, so I won't go through all of it, but it has different flags on it that say it's essentially executable, which means we can execute that code. Then there's .data, read-only data, .bss for uninitialized data, and all kinds of stuff in here. So this briefly describes, like we mentioned, this ELF file format, right? Oh, this looks ugly like this. Okay, the ELF file format describes this executable file that an operating system can turn into a running process. But how does that actually run? In order to answer that, we're going to have to study x86 code. So we're not going to deal with MIPS. We're going to be exploiting real binaries that run on real systems, so we're going to be focusing on x86 code. So we will have to cover x86 assembly so that you're familiar with it.
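To get a feel for what a tool like readelf is reading, here is a rough Python sketch that decodes the first few fields of an ELF header by hand. The function name `parse_elf_header` and the fake header bytes are made up for illustration; the field layout (magic bytes, class, data encoding, then type, machine, version, and entry point) follows the ELF header format, and only the first handful of fields are decoded.

```python
import struct

def parse_elf_header(data):
    """Decode the first few ELF header fields (hypothetical helper, not readelf)."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_class, ei_data = data[4], data[5]
    bits = {1: 32, 2: 64}[ei_class]           # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    endian = {1: "<", 2: ">"}[ei_data]        # EI_DATA: 1 = little, 2 = big endian
    entry_fmt = "I" if bits == 32 else "Q"    # entry point is 4 or 8 bytes wide
    e_type, e_machine, e_version, e_entry = struct.unpack_from(
        endian + "HHI" + entry_fmt, data, 16  # these fields follow the 16-byte ident
    )
    return {"bits": bits, "little_endian": ei_data == 1,
            "type": e_type, "machine": e_machine, "entry": e_entry}

# A made-up 64-bit little-endian header: type 2 (executable), machine 0x3E
# (x86-64), and an arbitrary entry point chosen for the example.
fake = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8)
fake += struct.pack("<HHIQ", 2, 0x3E, 1, 0x401000)

header = parse_elf_header(fake)
print(header["bits"], hex(header["machine"]), hex(header["entry"]))
```

On a Linux machine, passing in the first 64 bytes of a real binary such as /bin/ls instead of the fake header should give the same kind of information readelf prints at the top of its output.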
I'll say, as you learn more and more of these assembly languages, they're not anything very tricky; you just see the same patterns over and over. So the jump from MIPS to x86, besides there just being a ton more x86 instructions, is not anything crazy. The interesting thing here is basically that x86 has been around for a long time, starting out with 16-bit registers and all that stuff. And we're going to specifically be focusing on 32-bit x86. You can do ARM on your own time. You can do everything on your own time. So, a quick refresher for those that maybe don't remember the aspects of a CPU. The essential idea of a CPU is you have a number of very fast registers on the chip, and you also have the ability to access, let's say, the RAM and data, right? So if you've ever wondered what it means when they say 8-bit video game consoles, or 16-bit, 32-bit, 64-bit systems, this oftentimes comes down to how big the registers are. How many bits can you put in a register? And so with 32-bit x86, these are 32-bit registers. And there's a number of these registers. The way the naming scheme works, and this is something that can be very confusing when you're first looking at these kinds of things, is that there can be different names for the same register, referring to different parts, and specifically different bytes, of that register. So, for instance, there are four general-purpose 32-bit registers. Think of A, B, C, D. The E is for extended, which means 32-bit. So you have the EAX register, the EBX register, ECX, and EDX. And each of these names refers to all 32 bits of that register. Now, the very cool thing that you can see here is that, for backwards compatibility, there's a way to refer to just the lower 16 bits, so this is here, this AX register.
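One way to picture the register naming scheme (EAX as the full 32 bits, AX as its low 16 bits, and the 8-bit names AL and AH inside AX) is as bit masks over a single integer. This is a minimal Python sketch; the starting value and the helper `write_al` are made up for illustration.

```python
# Model EAX as one 32-bit integer and carve out the smaller names with masks.
eax = 0x12345678

ax = eax & 0xFFFF        # lower 16 bits of EAX
al = eax & 0xFF          # lowest 8 bits
ah = (eax >> 8) & 0xFF   # the next 8 bits up (the high byte of AX)

print(hex(ax), hex(al), hex(ah))

def write_al(eax_value, new_al):
    """Hypothetical helper: replace only the low byte, keep the upper 24 bits."""
    return (eax_value & 0xFFFFFF00) | (new_al & 0xFF)

updated = write_al(eax, 0xFF)
print(hex(updated))
```

The point of the sketch is that these are not separate registers: writing through the smaller name changes part of the same underlying 32-bit value.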
So if you refer to the register AX, this means the lower 16 bits of that register. And then you can further subdivide it into single bytes with AL and AH: AL is the lower eight bits, and AH is the higher eight bits of AX. You can do this with all of these registers, EAX, EBX, ECX, EDX. And yep, okay, so typically you have ESI and EDI, which are used for memory transfer operations. There are super important registers that we're going to get into: ESP, the stack pointer, and EBP, the frame pointer. So there's a number; we looked at four general-purpose ones, plus these, so five, six, seven, eight. I think there may be more, I don't know exactly how many registers this CPU has. There are, let's see, other important things. Segment registers, which we're not going to get into right now, but you can look this up if you're interested. The flags register, which basically changes with every instruction that gets executed, so you can branch on whether a previous value was greater than zero or less than zero. The really important one is the EIP register, the instruction pointer. It points to the next instruction to be executed. So this is what controls exactly what your CPU executes. And then over time there have been a lot of other types of registers added, but we won't get into those. Cool. Okay, so now, how do we interpret a 32-bit number based on the order of its bytes? This comes back to endianness, which we talked about in networking with network byte order. Intel uses little-endian order, which means that, okay, yeah, here we go. So if at memory address 0x00F67B40 we have the value 0x03020100, this means that at that memory location is the smallest (least significant) byte, then the next smallest byte.
So at 40 is the byte 00, at 41 is the byte 01, at 42 is the byte 02, and at 43 is the byte 03. If this was big-endian ordering, this byte order would be flipped. So it would be three, then two, then one, then zero. This will get really important, and it will come up when we look at exploiting buffer overflow style vulnerabilities, because you need to make sure the data that you're overwriting is interpreted the correct way. Okay, so again, refresher. Signed integers are represented in two's complement. So to negate a number you flip the bits and add one, ignoring overflow. So negative one is all ones. Negative two is all ones, and then a zero at the end. You can calculate these kinds of things. And this is important because, remember, the interpretation of what these numbers mean is up to the program, and to us as we're trying to understand and reverse engineer these programs. All that we see when we're trying to understand what's going on at the binary level is bytes. And that's why we use hexadecimal notation here, right? When we talk about addresses, we just see memory addresses and we see what value is at that memory location. We don't know if it's a 32-bit number, maybe a 16-bit number, we don't know. And so we have to understand how the application interprets that data. Cool, okay. So now that we understand a little bit about the different registers in the CPU that we're going to be dealing with, we can look at x86 assembly language itself. As we saw, it's a slightly higher-level language than the machine language. It has some features that make it easier to write programs in. And essentially, as we'll see, we have directives, which are pretty trivial, and instructions, which are the actual operations. Now, there's a really annoying thing that comes up when we're talking about binaries and reading assembly language. And it seems insane to me that we haven't standardized on one syntax.
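The byte-order and two's complement refreshers can both be checked with Python's `struct` module. This is a small sketch using the example address and value from the lecture; the helper name `twos_complement` is made up for illustration.

```python
import struct

value = 0x03020100
little = struct.pack("<I", value)  # little-endian: least significant byte first
big = struct.pack(">I", value)     # big-endian: most significant byte first

# Show which byte sits at which address in the little-endian layout.
base = 0x00F67B40
for offset, byte in enumerate(little):
    print(f"{base + offset:#010x}: {byte:02x}")

# The same four bytes mean different things depending on interpretation.
raw = b"\xff\xff\xff\xff"
as_unsigned = struct.unpack("<I", raw)[0]  # read as an unsigned 32-bit number
as_signed = struct.unpack("<i", raw)[0]    # read as a signed (two's complement) number

def twos_complement(n, bits=32):
    """Hypothetical helper: the bit pattern used to store n in the given width."""
    return n & ((1 << bits) - 1)

print(as_unsigned, as_signed, hex(twos_complement(-2)))
```

The second half is the key idea for reverse engineering: memory only holds bytes, and whether 0xFFFFFFFF is a huge unsigned number or negative one depends entirely on how the program reads it.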
But there are two possible ways to read and understand a line of x86 assembly, with completely different ordering of the operands. So they're actually very different. So for instance, where is my, okay, yeah. So for instance, if I had something like, let's say, move. These are just two registers, right? We can say this is EAX, this is EBX. In one way, in AT&T syntax, it's move from EAX into EBX. So after this instruction executes, whatever was inside the EAX register is now in the EBX register. If it's this DOS or Intel syntax, usually it's called the Intel syntax. Oops, that was weird. Let's go here. Cool. If it's Intel syntax, it's the opposite. So it's move EBX into the EAX register. A lot of tools like IDA Pro and Ghidra use this Intel syntax. We're just going to use AT&T syntax. This is something so trivial that every tool has settings to translate between one and the other. So for instance, GDB by default uses AT&T syntax. If you want to flip it, you can set an option for that. We'll see there's a program, objdump, which will disassemble a binary for us and show us. Actually, let's just look at that now. So objdump. I'm just trying to look for some interesting-ish code. Okay, move. Yeah, so here's a line of code that is in AT&T syntax. It's saying move this value 0x61e600 into EDI. Now, if I can figure out how to, there we go. Okay, I think I can just do -M att, let's see. No, it's already in AT&T, I want to do Intel. Now I can look at this line, yeah. So now, for the exact same line, the exact same tool can show me the opposite syntax. So it says move 0x61e600 into the register EDI, with the operands in the other order. So this is Intel syntax. So don't let the syntax fool you too much. It's not difficult to go between them. Cool. Okay, so we need to be able to move things. So if we think about what a CPU is, right? A CPU is actually fairly simple.
You have different registers on that CPU. These registers are what the CPU actually does computation on, right? It can do things like add, subtract, XOR, all the kinds of operations that we want. And we need a way to get data from memory onto the CPU, into a register, and to put that data back. So it looks complicated, but it's not that crazy. And again, we've standardized on AT&T syntax here. So this is a key thing: how to tell what's a memory operation and what's a register operation. Here in AT&T syntax, all the registers are prefixed by a percent sign. And we know it's source to destination. So we're going to be moving something into the EDX register. So where is that something that we're going to move? And this is an important thing. We know it's memory because of the parentheses here around this operand. So we're moving, let's see, what is this? From EAX plus ECX times four minus hex 20. So the minus hex 20 is just a displacement. If you look at this standard form here, what this means is essentially start at EAX, and ECX times four is how far we're going to move down each time. Let's say it's an array. So this would make sense. EAX points to an array in memory. ECX is the index into that array, and the elements of our array are size four. So we're going to move down like that. And the minus hex 20 is a constant offset applied on top of that. So you can do nice things like iterate over the elements of an array. You can have a for loop looping over ECX from zero to the size of your array, just moving things into EDX. It's really not that crazy. So we can practice this. Cool. So, a simple example. What are we going to be moving into EAX? And there are important ways to think about instructions like this. It's a constant offset. So this means we're going to be taking whatever's at memory location EBP minus eight, whatever that points to, and moving that into EAX, right?
So the way to read this is: copy the contents of the memory pointed to by EBP minus eight into EAX. One way of thinking about this is in terms of pointers. So EBP points to some value in memory. What is inside EBP is the value of a memory location. And when this line executes, it's going to say, okay, go fetch whatever's at EBP minus eight and move that into EAX. You can do whatever you want. No, normally most systems like to do even, aligned memory accesses. So you'll see that a lot of the offsets will be even, but technically, I think you can put whatever you want in there. Yeah, you could do plus eight. It doesn't have to be negative; as we'll see, it's frequently negative, and we'll see why that is. You could do plus eight. You could do plus 12, whatever you want. So now here we have move. So what is this, basically? Just copy. So no, it doesn't do nothing. This is an important thing. This is not a move of EAX into EAX. Yeah, this is exactly like, and it's important to think of it like, a pointer dereference operation, right? We are seeing what EAX points to and moving that value back into EAX. So copy the contents of the memory pointed to by EAX into EAX. This is in contrast to, I'll go over here, if we did something like this. Why does it do that? I have no idea. Move EAX into EAX. This does nothing. Right, it just copies whatever's inside EAX to itself. It does nothing. With the parentheses, this means a dereference. If you were to write this in C, it would be EAX is equal to star EAX, something like that. So we're dereferencing EAX and we're storing the result back in there. Cool. Now here we're going to store to memory. So now our destination is a dereference. It's a memory operation. So we're going to move whatever's inside EAX to EBX plus ECX times two, whatever that memory location is.
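The three kinds of move discussed above (the array-style load with base, index, scale, and displacement; the dereference of EAX into itself; and the store through EBX plus ECX times two) can be sketched with a toy model of registers and memory in Python. Everything here is an illustration under made-up assumptions: the register values, the 64-byte `memory`, and the helper names `effective_address`, `load32`, and `store32` are all hypothetical, and real memory is of course far larger.

```python
import struct

memory = bytearray(64)  # toy memory; addresses are just indexes into it

def effective_address(regs, disp=0, base=None, index=None, scale=1):
    """AT&T operand disp(base, index, scale) -> base + index*scale + disp."""
    addr = disp
    if base is not None:
        addr += regs[base]
    if index is not None:
        addr += regs[index] * scale
    return addr & 0xFFFFFFFF  # wrap to 32 bits like the real registers

def load32(addr):
    return struct.unpack_from("<I", memory, addr)[0]   # little-endian read

def store32(addr, value):
    struct.pack_into("<I", memory, addr, value & 0xFFFFFFFF)

# An array of four 4-byte elements placed at address 0x10 (made-up layout).
for i, v in enumerate([10, 20, 30, 40]):
    store32(0x10 + 4 * i, v)

regs = {"eax": 0x30, "ebx": 0x00, "ecx": 2, "edx": 0}

# mov -0x20(%eax,%ecx,4), %edx : load from EAX + ECX*4 - 0x20
regs["edx"] = load32(effective_address(regs, disp=-0x20, base="eax", index="ecx", scale=4))

# mov (%eax), %eax : dereference EAX and store the result back in EAX
regs["eax"] = 0x10
regs["eax"] = load32(effective_address(regs, base="eax"))

# mov %eax, (%ebx,%ecx,2) : store EAX to memory at EBX + ECX*2
store32(effective_address(regs, base="ebx", index="ecx", scale=2), regs["eax"])

print(regs["edx"], regs["eax"], load32(4))
```

With these values the first load computes address 0x18, which is element 2 of the array, so looping ECX from zero upward would walk the array one element at a time.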
So move the contents of EAX into the memory address EBX plus ECX times two. Now, this is actually a very cool thing, and it's the trick that I always use in order to know exactly what syntax I'm looking at, which way is which. So this is a constant value. This says move the constant value 0x804A0E4 into the register EBX, right? What if this was reversed? What if it said move EBX into 0x804A0E4, right? That's actually a nonsensical operation. How can you move something into a constant value? It would be different if it had parentheses around it, which would mean "at this memory location", but you can only move something into or out of either a memory address or a register. So this tells me the source must be on the left and the destination is on the right. So here's the exact difference. Here we have parentheses around the value. So this means dereference at runtime. Whatever is at memory location 0x804A0E4, put that into EAX. So copy the contents of memory at location 0x804A0E4 into EAX. Cool. And, okay, cool. Let's stop here. So objdump on a .so file, if you do -D, disassembles everything. So usually there are a lot of null bytes in files, basically, if you look at them. Cool. Okay, let's stop here. And when we come back, we'll talk about the different instruction classes. I highly encourage you to start using objdump or different types of tools. You can look at the code here. Another cool thing is to create your own little C file, compile it, and look at what it looks like. The move with the dollar sign, this is a literal value. So this means move the literal 0x804A0E4 into EBX. So after this instruction executes, the value inside the EBX register will be 0x804A0E4. We actually don't know if it's a pointer or anything, right? So the question is, is this a pointer? We have no idea. It depends on how it's used.
If EBX is dereferenced later, then perhaps it is a pointer, but really we don't know at this point without more information. Cool. All right, well, thanks everyone and I'll see you on Thursday.