Alright, morning everyone. Thanks for being here today. So Assignment 2, the description is up. The grading infrastructure is not 100% up yet, because it took me a while to actually set up the second part, much longer than I anticipated. For Assignment 2, we're going to have two weeks.
The first part of that assignment is data exfiltration. What does exfiltration mean? Exfiltration is the opposite of infiltration. You're trying to break into somewhere, but once you break in, especially in a security context, you want to get data out. So how do you get data out of a system? It's not a trick question. Just send it back to yourself, or send it to another machine. The best way to hide your tracks is to break into a third machine, send the data from the first place to that machine you broke into, and then transfer it to yourself or maybe to somebody else. So you create a series of intermediate connections between yourself and the victim.
So let's say, put yourself in Albert Gonzalez's shoes: you've broken into the Heartland credit card payment system. This is not some retailer, this is a credit card processing firm that's processing millions of credit cards. So they have a huge database of all of their users' credit cards. So how do you get that back? You guys are saying FTP it back to yourself? What would be a better way? What would be the problem with that? Somebody give me an estimate. Somebody who's worked with databases: how much would something like 200 million credit cards be, plus all the customer information? Gigs. Yeah, we're talking probably 100 gigs. Maybe that's a little high, and it depends on what additional information there is and whether it's compressed or not, right? But I think we all agree it's going to be a significant amount of data, and as an attacker you want that data. So what's the problem with just sending yourself 100 gigs of data? It creates noise in the network. Yeah, it creates noise. There's some kind of blip, right?
There's going to be something that's not, you know, normal. If the company is secure, or they think they're secure and they have good security practices in place, which you would hope from a credit card payment processor, you would hope that they're monitoring their traffic so they could see a huge spike like that and try to investigate what's going on. If they're really good, they have some taps and some network traces in their networks so they can figure out exactly what's going on. But either way, you've just alerted them that you are on their system, right? So is that good? Now you play a cat-and-mouse game with them?
What's the attacker's ideal scenario? They're undetected. You steal all the data and you remain undetected. So you can steal new data, and you can use that vantage point to launch other attacks on other institutions, right? Maybe they end up partnering with another company, so now you have a VPN connection from their company to the second company that you can then leverage to exploit this new company. As an attacker, you want to stay on as long as you possibly can and you want to remain undetected. That's the big thing.
Okay, so this is where data exfiltration comes in. So if we say let's not send ourselves 100 gigabytes of data that's going to cause this huge spike, what are some of our other options? Distribute it across a botnet. Yeah, we could distribute it, fan out the data in some sense, right? So send 10 megs to a bunch of different machines that we control and then aggregate them coming back. What would be some of the problems there? Right, if they're looking at their aggregate bandwidth usage, it's still 100 or 200 gigs you're transmitting. So what really are they using to detect you? That's tricky. I would say that it would be a very sophisticated organization that's actually doing that.
But just for the size thing, right, how are they actually detecting it? They do that. They should be doing that, because credit card numbers have a checksum, so they're really easy to spot. You can encrypt it, you can do whatever you want. Definitely encrypt it. Well, you can probably just zip it or tar-gz it, right? If you want to send the raw credit card numbers, compress them.
What are they using to detect you? How can they tell that there's this 10 gigabyte spike or something? They have a normal traffic pattern. What is the traffic pattern based on? Requests and responses. All requests and responses? Are they looking at every single request and response they've ever gotten? What are they looking at? Right, that's what they're going to look for. How do they detect that spike? I feel like they're just looking for a throughput threshold, whether it goes above X. How do you calculate a throughput threshold? How is the throughput calculated? So probably we are not going to inspect ARP packets and ping packets, and apart from those we are going to check the length fields of the IP packets and add them up over a minute to understand what the throughput is. Add them over what? Maybe over a minute. Why a minute? I just chose the number.
How do you calculate throughput? Data over time. You can maybe try to compress the data component, but at some point there's a limit. You still want to get 100 gigs out. What component can you mess with? The time. You want to get the credit cards in a timely way, but I'd rather not get detected. What if you just send a meg a day? If you spread it over an entire day, in 24 chunks or something like that, and do one every hour, then you're never going to get this spike. You're going to flatten out that spike and distribute it. Hopefully your traffic will blend in with the random fluctuations of their network. That's the idea.
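The "data over time" point can be made concrete with a little arithmetic. The sketch below is only an illustration: the function names are made up here, and the one-hour and 30-day windows are assumed values, not numbers from the lecture.

```python
SECONDS_PER_DAY = 86_400


def average_rate(total_bytes: float, seconds: float) -> float:
    """Throughput is data over time: bytes per second needed to move
    total_bytes within the given number of seconds."""
    return total_bytes / seconds


def days_needed(total_bytes: float, max_bytes_per_second: float) -> float:
    """Days required if the sender never exceeds max_bytes_per_second."""
    return total_bytes / max_bytes_per_second / SECONDS_PER_DAY


if __name__ == "__main__":
    hundred_gigs = 100 * 10**9  # assumed size of the stolen database

    # Assumption: pushing everything out in one hour.
    print(f"{average_rate(hundred_gigs, 3600):,.0f} bytes/s over one hour")

    # Assumption: stretching the same transfer over 30 days.
    slow = average_rate(hundred_gigs, 30 * SECONDS_PER_DAY)
    print(f"{slow:,.0f} bytes/s over 30 days")
```

The amount of data is the same in both cases; only the denominator changes, which is why stretching the time window is the lever the attacker controls.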
It's hard to change the data, but you can maybe split the data up, fan out the data, or send a massive amount of data. You've got to realize they're measuring this over time. If you can make this slower, you can actually improve your data exfiltration.
The other thing is, if they are checking for credit card numbers, or if they're looking for weird FTP connections, that's going to be a problem. So the other thing we can try to do is encode data in bits or parts of packets or random requests, to make it blend into the normal traffic, or just make it look like regular traffic that they're never going to look at.
The goal of this part is that we're going to write a program that exfiltrates data using an IP datagram. Can we just use arbitrarily any protocol we want? Can we put this data anywhere? Can we put data in the destination IP? No, right? We need it to come to us. We need to control that. We need to be able to read that data on the receiving end. It can't just be arbitrary. That's why we're going to define a protocol, and your goal is going to be to implement that protocol in a program.
Your program is going to be called secret sender. The interface: you're going to run it and pass the destination IP address. You're going to pass the interface on which to send the packet. You're going to pass in the type. We'll see there are three types of packets that you're going to send. Remember, IP by itself usually carries a higher-level packet, so this lets you choose how to hide. Finally, you pass the message that we want to send. This is just going to be a string input. Basically, your goal is to get a string transmitted from one system to another by encoding data into IP packets that conform to whatever type we pass here. Questions on the high-level interface? You're going to encode the message and send it to that IP address using that physical interface. Why do you need the interface?
You've got multiple interfaces, right? The routes maybe are not set up. You need to know which device to attach to in order to get a raw socket on it, so it's a lot easier if you specify it. It's for ease of use.
Type is going to be one of three values. If it's a one, it's going to be an ICMP echo request message. What common program uses ICMP echo requests? Ping. If it's a two, it will be a TCP SYN packet to port 80 on that machine. Why is this handy from an exfiltration perspective? You need to establish a TCP connection. From the exfiltration perspective, how does this help this packet get out? Most of the traffic is there. Yeah, right. So to the firewall, this just looks like we're trying to start a web request to some IP address. How's that ever going to get flagged? The fact that no SYN-ACK comes back, maybe that should trigger something at some point. But the firewall will most likely let port 80 requests through, because otherwise the employees can't visit websites. If it's a three, it will be a UDP packet sent to port 53. Not DHCP. DNS. It should be DNS. I hope it's DNS. It's supposed to be DNS. If it's not, somebody tell me. Well, it doesn't really matter, right? So that also helps, because you should be able to make DNS requests to any DNS server, and hopefully your system supports that.
So how do you structure the TCP SYN packet and the UDP packet? Do you have to mess with them or do anything with them? Are you encoding any data inside the TCP or UDP packets? Let's keep that in mind when we look at the full description.
Okay, our exfiltration protocol. Here we have the IP datagram from the RFC, which you should review if you need to. So can we put anything in version? Yes? You want to mess with the IP version? It can only be v4 or v6, right? But then we have to change IP addresses, right? Then we have to support IPv6 addresses, which is a completely different protocol.
So I think that's a no. We're doing IPv4, so we don't want to mess with version.
The header length, can we mess with this? I want you to try to think about where we can put this data. How did I figure out where to put it? Can we do it in the header length? No. Well, we can't make the header length not be the correct length, right? Otherwise the packet is going to be dropped by the operating system. We could maybe encode some data in the size of that header length and make extra fake options or something. That could be an option. Maybe I'll do that next time.
The type of service, can we put data in there? What does that actually do? Yeah, we could do it in there. I think it actually tells you what the protocol is underneath. The total length we probably don't want to mess with, for the same reason as the header length.
What about the identification? What was that used for? For fragmentation. What did the spec say that this number has to be? Does it put any limitations on what this could be? Does it have to be four or six like version? No, right? So we can actually put data inside this identification field.
Then we have the flags. We probably don't want to mess with the flags too much. What about the fragment offset? If we're not sending a fragmented packet, maybe we could play with that. Because remember, this packet doesn't need to be interpreted by a high-level program, right? We don't care about what the data is inside this IP packet. We care about the information we're injecting in this header.
Okay, time to live we probably don't want to mess with. That would be dangerous, right? You try to encode and send a zero byte and it's zero in here. That would be bad. Protocol: that's the field that says what the encapsulated protocol is. Yeah, so I think type of service is the QoS stuff, which is not really used.
So maybe we could actually play with that. The checksum, can we play with the checksum? No, right? If the checksum is invalid, the packet gets dropped along the way. So we definitely don't want to do that.
The source address is tricky. We might be able to. We've seen that we can arbitrarily spoof source addresses on IP packets. But there's no guarantee that the firewall is going to allow packets that do not come from that network, right? They could have a rule there, or the upstream ISP of our victim could have a rule to drop those packets. So it's probably safest if the source address is the address of the actual machine that we're on. Yeah. Can we modify just the host part of the source address, not the network part? Then it might get through the firewall. Possibly, but it's tricky because we have to know exactly what the network prefix is. Exactly. So just to not deal with that, let's try to make it as safe as possible. And the destination address, I think it's pretty obvious we can't mess with it, right?
We could add stuff into options, but that could look fishy depending on how we're doing it. If somebody sees a packet with a bunch of weird options that don't really make sense, they'll probably drop it. And you could do something with the padding. Maybe you can encode information into the padding.
Okay. What we're going to do is focus on the identification and the fragment offset. Every byte of our message is encoded into the IP header of one packet. So bytes of the message are going to correspond one to one with packets: we're going to use one packet to send every byte. You can imagine we could also configure the timing of that, to do it slowly. So we're going to send a byte at a time. We're going to encode that byte into the high 8 bits of the identification. So high in this diagram is going to be here, right? Bits 0 through 7.
So if we think about it as a 16-bit number, the high bits are going to be the first 8 bits. That's going to be how we send data. Whatever those 8 bits are, you turn that into a byte and say that's a character of our message.
So what do we know about IP? What does IP give us, connection-wise, datagram-wise? Source IP and destination IP? What does it give us in terms of communication guarantees? No guarantees, right? Packets can get lost, packets can get duplicated. Is that good if we're trying to exfiltrate some data? Yes? What if our data gets lost? We don't necessarily want to use TCP, because we want to hide this inside other protocols, but we want to borrow the ideas of TCP. So one thing: what if our packets come out of order? How do we know which byte corresponds to which message byte? Maybe add something in the identification so that we can know the order of the packets. Yeah, right? So we want to encode that information there, just like TCP sequence numbers state which bytes they're sending.
The other thing is, what if we're sending multiple of these messages to the same server? What if we broke into several different systems and we have them all sending messages to us? How do we differentiate between those? Some kind of server number. Yeah, we need some kind of identifier for each message transfer. That's actually what we're going to use the lower 8 bits of the identification field for. If you think about it, the ID field in an IP header tells us which IP packets come from the same original packet if it's fragmented. We're going to squeeze that concept of an identifier down to 8 bits, and that will identify a single message transfer. And then the higher 8 bits will specify the actual byte that's being sent. Yeah.
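The split of the 16-bit identification field described above comes down to a shift and a mask. The following is a minimal sketch; the function names and the sample values (data byte 0x74, transfer identifier 0x41) are illustrative choices, not part of the assignment text.

```python
def pack_identification(data_byte: int, transfer_id: int) -> int:
    """Put the message byte in the high 8 bits and the transfer
    identifier in the low 8 bits of the 16-bit IP identification."""
    if not 0 <= data_byte <= 0xFF or not 0 <= transfer_id <= 0xFF:
        raise ValueError("both values must fit in 8 bits")
    return (data_byte << 8) | transfer_id


def unpack_identification(identification: int) -> tuple[int, int]:
    """Return (data_byte, transfer_id) from a 16-bit identification."""
    return (identification >> 8) & 0xFF, identification & 0xFF


if __name__ == "__main__":
    ident = pack_identification(0x74, 0x41)
    print(hex(ident))                    # 0x7441
    print(unpack_identification(ident))  # (116, 65)
```

A receiver that sees several transfers at once would group packets by the low byte before looking at the high byte.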
We're going to limit the size of the string, which we'll get to in a second. So this identifier is going to remain constant while sending one message. This is what says: I'm going to send you some bytes, and here's how you know all those bytes are from the same message. So it has to stay the same throughout, just like an IP ID has to be consistent across fragments.
Then I get to the question: how do I specify which byte I am sending? Is it the zeroth byte? The first byte, the second byte, the nth byte? We're going to use the fragment offset. This is where it gets a little bit complicated. We're going to use the lower 8 bits of the fragment offset. If it's 0, it means the byte that's in the high 8 bits of the identification is the zeroth byte of the message. If it's 1, it's the first byte of the message. If it's 2, it's the second byte of the message. But we don't want to be too loud, so we're not going to do any kind of ACKing or anything like that. This is just information that the client needs in order to successfully get the message.
Key question: how do we know when we've sent the whole message? It's not possible to know without ACKing, right? No, maybe we can send this counting down, like 30,000, 29,000, just go down. So how would you know that you didn't drop 30,000 and one? Yeah, so we either need to send the size or send some kind of sentinel to say, hey, we're done, we've actually sent you all of it. Right, all those packets can still get dropped, but we need some way to do this. So we're going to use the sentinel idea. The highest bit of the fragment offset will be set to 1, and the lower 8 bits of the fragment offset will be the entire size of the message that we've sent. So we're actually kind of going to do both.
So if this bit here is a 1, that means the lower 8 bits are the size of the message that was sent to you. Any questions on the high-level idea? Because we're using 8 bits to represent the size, we can only send messages of size up to 2 to the 8 minus 1, which is 255 bytes. That's the size of messages we can actually send.
Isn't there a bit that specifies whether it's fragmented? Yes, there's a thing in here. Among these flags there's one that says it is fragmented and there are more fragments. We're actually going to keep those as 0s. So we're going to treat these as non-fragmented packets. Because if it happens that this does get fragmented, we want the OS to reassemble those packets for us. Although that's going to mess up the offset, so whatever. We're going to assume that doesn't happen. So there's 8 and 1, that's 9 out of 13, which leaves 4 bits. Yeah, we're going to ignore those 4 bits in there. Cool.
Okay, so the idea is that when we execute, and we give an IP address, an interface, the type, and "test", it's going to generate some random ID, an 8-bit identifier to identify the transfer. Let's say it's hex 41, whatever that is. So where is it going to put this random ID? The lower 8 bits of the identification. It's going to be constant for every packet we send while sending this message. So we're sending "test". What's t in ASCII? 65? 65? Hex 74?
Let's think about this first, before we even look at what's going on, to make sure we understand how many packets we're going to send in total. Five, actually. Five? Why the fifth? We send the t, the e, the s, the t, and then a tag at the end that says we're done. So the first packet is going to encode that t, hex 74, into the upper bits. It's going to encode 41 into the lower bits, and the fragment offset is going to be all zeros so we know it's the zeroth byte of the message. The next packet is going to encode the e, which is hex 65. It's going to keep the ID consistent.
It's going to increment the fragment offset. Then we're going to encode the s: 73, 41, the same thing, and the offset is going to be 2. Then we send the last one, the t: 74, 41, 3, to say it's the third byte.
And now we're going to do the last packet. What's going to be in the lower 8 bits of this fragment offset? The lower 8 bits are going to be 4. The high bit is going to be set to 1. What's the high byte of the identification going to be? It doesn't matter. It doesn't matter what the high bits are. Do the lower bits matter? Yeah, that's how we can identify that it came from this same message. So these can be zero. It doesn't really matter what these are. But this has to be 41, and this has to be hex 1004: 1 for the high bit, the 13th bit. Why are you setting the high bit to 1? To specify that it's the last packet. Otherwise, if this wasn't 1, we would consider that byte 4 of the message (0, 1, 2, 3, 4) was a zero byte. Which is totally valid. We could do that, right? But we need to know when to end.
So now the client can do some kind of cool checking. The client can say, okay, if I get this packet that says this is the end and this is how many bytes in total were sent as a message, I can check: did I receive all of those bytes, all of those offsets? If I did, then I have the message. If I'm missing something, then I just go, crap, I don't know the message. I know everything except for a few bytes. Maybe I'll wait until some more packets get in. If I never get this final packet though, if it gets dropped, what do I do? What was that? We never talk to the server, so we can't know. Yeah, time out, right? We just wait for a while and then say, okay, I didn't get anything within 30 seconds, so something happened, but maybe I got some data, so that's good enough.
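The five-packet walk-through for "test" can be checked with a short sketch that works only on the two header fields involved. It assumes, as in the example above, a transfer identifier of 0x41 and a zero high byte in the final packet's identification; the function names are hypothetical, and actually sending or sniffing packets is left out.

```python
HIGH_BIT = 0x1000   # top bit of the 13-bit fragment offset
MAX_MESSAGE = 255   # the size must fit in 8 bits


def encode_message(message: bytes, transfer_id: int) -> list[tuple[int, int]]:
    """Return one (identification, fragment_offset) pair per packet:
    one per message byte, plus a final packet carrying the size."""
    if len(message) > MAX_MESSAGE:
        raise ValueError("message too long for an 8-bit size")
    if not 0 <= transfer_id <= 0xFF:
        raise ValueError("transfer_id must fit in 8 bits")
    packets = [((byte << 8) | transfer_id, index)
               for index, byte in enumerate(message)]
    packets.append((transfer_id, HIGH_BIT | len(message)))
    return packets


def decode_message(packets, transfer_id: int):
    """Reassemble a message from pairs that may arrive reordered or
    duplicated. Returns None if the final packet or any byte is missing."""
    received = {}
    total = None
    for identification, fragment_offset in packets:
        if identification & 0xFF != transfer_id:
            continue  # belongs to some other transfer
        if fragment_offset & HIGH_BIT:
            total = fragment_offset & 0xFF
        else:
            received[fragment_offset & 0xFF] = identification >> 8
    if total is None or any(i not in received for i in range(total)):
        return None
    return bytes(received[i] for i in range(total))


if __name__ == "__main__":
    for identification, offset in encode_message(b"test", 0x41):
        print(f"id=0x{identification:04x} frag=0x{offset:04x}")
```

A real receiver would call something like `decode_message` each time a packet arrives and give up after a timeout, keeping whatever bytes it has, as discussed above.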
So I kind of designed this so it'd be cool: you have to do some nitty-gritty packet manipulation and byte twiddling or bit twiddling, but also think about the reliability guarantees that TCP gets you. The fact that we're not using TCP means we can't really guarantee any of this sending, but we can still use some of its concepts. Will you be using something like TTL to specify the timeout? No, you don't mess with TTL, because the TTL counts down at every hop and your packets may not get there.
Okay, so you can do this in any language you want, just like the last project. I recommend that you use Python and the Scapy library. It is really good. It's fairly easy to understand and use to create and manipulate packets. The other option is to use libnet with whatever bindings exist in the language of your choice. But you're going to have to create an arbitrary packet, right? The operating system is not going to create these IP packets for you. So you have to manually create them, and if there's a library that helps you do that, I'm totally supportive of that. I'm also very supportive if you want to talk on the mailing list about which libraries you're using; that's totally good. But you have to do the research, right? If you start using a library and then four days from now you find out that you can't actually do what you want to do and change the ID field of the packet, you've got problems. I think Scapy does support crafting packets like that. Oh yeah, Scapy does. It's awesome. That's why I support it. But if there's something that exists in your language of choice, feel free. I just want you to know that if you veer off the beaten path, it's your responsibility to make sure it works. Okay.
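To show what "manually create them" involves, here is a hedged sketch that builds a bare 20-byte IPv4 header with Python's standard `struct` and `socket` modules instead of Scapy or libnet. The function names, the TTL of 64, and the example addresses are assumptions for illustration; the protocol numbers (ICMP 1, TCP 6, UDP 17) are the standard IANA values for the three packet types. It does not open a raw socket or send anything.

```python
import socket
import struct

# Assignment packet type -> IP protocol number (ICMP, TCP, UDP).
PROTOCOL_FOR_TYPE = {1: 1, 2: 6, 3: 17}


def internet_checksum(data: bytes) -> int:
    """Ones' complement of the ones' complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_ipv4_header(src: str, dst: str, packet_type: int,
                      identification: int, fragment_offset: int,
                      payload_length: int = 0) -> bytes:
    """Build a 20-byte IPv4 header (no options) with the flags left at 0."""
    version_and_ihl = (4 << 4) | 5           # IPv4, five 32-bit words
    flags_and_offset = fragment_offset & 0x1FFF
    header = struct.pack(
        "!BBHHHBBH4s4s",
        version_and_ihl, 0, 20 + payload_length,
        identification, flags_and_offset,
        64,                                  # assumed TTL, left alone
        PROTOCOL_FOR_TYPE[packet_type],
        0,                                   # checksum placeholder
        socket.inet_aton(src), socket.inet_aton(dst),
    )
    checksum = internet_checksum(header)
    return header[:10] + struct.pack("!H", checksum) + header[12:]


if __name__ == "__main__":
    header = build_ipv4_header("10.0.0.1", "10.0.0.2", 1, 0x7441, 0)
    print(header.hex())
```

A library such as Scapy hides these details, which is one reason it is the recommended route; the sketch is only meant to show which header bytes the protocol touches.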
So, we've seen that in order to access raw sockets and create raw packets you need to have root privileges, right? Keep this in mind when you're developing your code. You're going to have to run your own code as root to test it. This also means the submission system will be running your code as root. I'm trying to take precautions, but it's very tricky. I'm going to do my best so that nothing crashes, but don't do anything overly malicious with this power, right? It's also why it's taking me a while to figure out how to set this up. Other than that, everything else is the same. Packages file, submission instructions, submit on the website. Submissions are not open yet, but they will be soon. Any questions on part one? Okay.
Alright, part two. Now we're transitioning. Assignment one, and part one here, are about creating things so you can understand how these low-level protocols and everything we're talking about actually work. Now you're going to start breaking things, in a good way. That's going to be your goal for this part.
So, the idea is that you have just been hired by a hot new startup that just got Series A money. They've created this amazing web service that has the best, fanciest graphics and styling you could ever imagine, to allow other companies to securely execute only trusted code. So it's an awesome web service, super cool. They're using this really good encryption program called Chexum, and I'll let you figure out what that is. So they hired you. How do you prove to them that you broke their system? No, don't do that. Yeah, right? They put a file on there called secret.txt in the working directory where their program is running. If you can read the contents of this file, you have successfully broken the service. That's the goal state you're working towards.
I'll send you the link on the mailing list, because I don't want to deal with access control issues and making sure you're logged in to access it. So I'm just going to distribute the link there. Don't try to share it out too much. It's clearly a service that, I mean, is a way to execute arbitrary code on one of our servers, right? So, you know, be safe. Don't do that.
There will be hints. I'll put some hints on here. They'll be collapsed, so you can kind of self-check when you get stuck and expand them that way. It's like a puzzle, right? You have to try to solve the puzzle. So I'll put some hints in there so you don't go too far astray and can stay on track. But for me, I like the challenge of doing it without any hints, so I'll let you decide that.
When you submit on the assignment submission page, you'll submit the secret. So what is the secret? The content of the secret.txt file. Do not submit the string "secret.txt". That's not a secret. Submit whatever is in that file. Then, and this is intentionally vague, submit whatever you used to break the service. The code that you wrote to break the service, submit that. And a README. I don't care about makefiles for this. I don't really want to make your code. And I want this README to be more descriptive than normal, because otherwise all I see is "yes, I broke the secret." Here's the thing: once it is broken, it's no longer a secret. So I want a description of how you did it. How did you break it? What led you to that? What programs did you have to write to help you break it? What was the thing you used to actually break the service? Any questions about that? We're going to be reading those. I'm grading based off that.
Cool. So here's the link. Here's the beautiful, crazy, best website you've ever seen in your life. Alright, so there are files on here. Ah, .py files. So it's so secure.
It's the most secure file you've ever seen in your life. So I can submit code here. That's weird, the slash. I don't think that'll break anything. I hope it won't break anything in the middle of this demo. Code output: Hello world. Yay. It actually works. If we look at it, we can see that my code output "Hello World". So it securely executed code, right? It executed this trusted code.
Is this server running on a Linux machine or something? It's running on a Linux machine. It's running on 14.04, the same thing we've been using for everything. So if I go to downloads, Python, right? So let's say, what are you going to print out? Does it just read like this? Just read. Yeah, things like that. So I'll return a string and I'll print that string out.
Alright, time to break it. You guys only have one thing to do. Part 2, done? Come on. It's not that simple. I'll go run code if we have time. It's a secure web service, guys. They don't give out money to just anyone. So this is your goal. You have to figure out what's going on here. There'll be hints, there'll be pointers, but for right now... Huh? Yeah. You have complete access to this. You have unlimited access to this URL. You can try to do whatever you want. So the code will run only if the hash is correct? I don't know. You'll have to find out. There'll be some hints for that. I don't want to give away too much right now. I want you to poke around and try to figure out what kind of hash, if anything, is used. So I want you to upload these files. I'm going to post some links about GPG so you can figure out how GPG works. There are hints in the files. The end goal is to get the contents of secret.txt. Exactly. So you want something like this, right? Cool. Questions? Other questions? Yeah. When is this going to get posted? You can make links to it. So kind of now.
If you can guess URLs, you're good. Okay. It's not done yet. It's slowly coming. So slow. Okay. It'll be under assign 2. Okay. Any other questions? Let's get started.
Back to applications. We talked about vulnerabilities. We talked about compilation. We're here, right? Yes. Okay. So we talked about interpretation, and now we want to talk about high-level source code. How is it compiled? I think everybody in this room should have taken, or be in the process of taking, 340 or the equivalent compilers class. It's part of the required classes, more or less. So you should be familiar in some sense with the compilation process, and we're going to tackle it at a high level.
So in C, the first thing that happens is that the preprocessor goes through and expands all of the macros. In C or C++ that is done by the preprocessor. Then the compiler takes the output of the preprocessor, and what's the output of the compiler? What's its goal? We're talking about C and C++. Object code. Yeah. Something that's executable at the end, right? For whatever architecture or operating system you're interested in, we want it to be executable. At a high level, this is what it does, but really it's broken up into different stages. First, the compiler turns our C or C++ program into assembly code. GCC is the C compiler. If you just run GCC on a C file, it's going to do the whole thing and spit out an executable. But with different flags, you can force it to show you the output of the various stages. So you can use the dash capital S flag to generate the assembly file. This can be really useful for understanding what's going on, especially since what we're going to be doing is looking at things at a very low level. We want to understand how this code is actually implemented in assembly and how the machine actually executes it.
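As a small illustration of stopping GCC at different stages, the sketch below only builds the command lines and runs one if `gcc` is installed. The helper names and the file name `hello.c` are hypothetical; `-E` (stop after preprocessing), `-S` (stop after generating assembly), and `-c` (stop after assembling, before linking) are standard GCC options, of which only `-S` is named in the lecture.

```python
import shutil
import subprocess


def stage_commands(source: str) -> dict[str, list[str]]:
    """Return gcc command lines that stop after each stage."""
    stem = source.rsplit(".", 1)[0]
    return {
        "preprocess": ["gcc", "-E", source, "-o", stem + ".i"],
        "assembly": ["gcc", "-S", source, "-o", stem + ".s"],
        "object": ["gcc", "-c", source, "-o", stem + ".o"],
        "executable": ["gcc", source, "-o", stem],
    }


def run_stage(stage: str, source: str) -> None:
    """Run one stage; raises if gcc is missing or compilation fails."""
    if shutil.which("gcc") is None:
        raise RuntimeError("gcc not found on PATH")
    subprocess.run(stage_commands(source)[stage], check=True)


if __name__ == "__main__":
    for name, command in stage_commands("hello.c").items():
        print(f"{name:>10}: {' '.join(command)}")
```

Reading the `.s` file produced by the assembly stage is the quickest way to see how a given C construct ends up as instructions.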
Nowadays, if you're doing this on a 64-bit machine, you're going to want to use the dash m32 flag, which says generate 32-bit assembly or a 32-bit executable. This way it matches what we're working on, because we're going to focus exclusively on x86. So 32-bit; we're going to ignore 64-bit so that we can just focus on one thing at a time.
Okay, so after the compiler generates the assembly, what tool or what process goes next? The linker. The linker links up object files, but right now we just have an assembly file, a list of x86 assembly instructions. What do we need something to do at this point in the pipeline? Get the machine code. Yeah, we need machine code, right? We want something that turns assembly instructions, which are just text, into the actual ones and zeros that the computer can understand. That's the job of the assembler. The assembler takes in the assembly file and outputs some kind of binary object. Right now we're using "binary object" very broadly. On Linux the assembler is called as; that's the program.
So what's in this binary object? Is it ones and zeros that go right to the CPU for it to execute, just a bunch of code for the CPU? What makes up an executable? Is it just code? Other information. Okay, good. Information for linking to other files. Yeah, right? So it's going to have the binary code; it's got to have the code in there, otherwise why do we care about it? But it has code plus some metadata. We need additional data about what things are data: are there any global strings, are there any constant memory locations? What are the labels in our program in this binary, and can we relocate them? That's another thing. Can we move this assembly around? So the binary object by itself may not actually be executable, because things could be relocatable.
We could link with other binary objects, so that way we can call different functions between different binary objects, right? It could have information about the symbols, so when you compile with debugging symbols, right, with the -g flag, it's going to add extra information to the binary object that says how to map the assembly instructions back to source code, right? So those things could be in there. Yeah, any debugging information, and other kinds of symbols that this thing relies on. So it could say, hey, I'm using the printf symbol and the exit symbol, so we need to make sure that those are resolved before this can actually be executable, I guess is the way to phrase that. Okay, so next, now we have this binary object, right? Now we need to link the different binary objects together into one final executable, right? So we need to try to resolve all of the references, right? So it's actually not until here, at the linker stage, where we say, hey, you're using a function that's never defined, right? Like, nobody defined that function, right? So the Linux linker is ld, and that's why, if you compile a C file and you use a function that's not defined, right, if you've declared that function, you're saying it's going to come from somewhere else, you'll actually get an error from ld. It'll show you in the error output: ld, and then the error message, right? Because it's coming from the linker. Your program compiles just fine. It translates into x86, and then it assembles just fine. It translates into a binary object, but it can't create a final executable because you're trying to call a function that doesn't exist. So what's static linking? So what does it mean when we statically link something in C? Kind of. Exactly. And then what do we do with it? Is that function ever going to change when we execute the program? No.
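The declared-but-never-defined case can be sketched like this. The function names scale and scale_and_offset are hypothetical. The file contains both the declaration and the definition so that it builds as written; the comments describe what you would expect to happen if the definition were removed.

```c
/* Declaration only: a promise to the compiler that a function called
 * scale exists somewhere, possibly in another object file. */
int scale(int value);

/* This compiles and assembles using just the declaration above. The
 * call is recorded as a reference to the symbol "scale". */
int scale_and_offset(int value)
{
    return scale(value) + 1;
}

/* The definition. If you delete this function, `gcc -c` still produces
 * an object file, but building a final executable fails at the link
 * step with an "undefined reference" error reported by ld. */
int scale(int value)
{
    return value * 10;
}
```

The design point is that the compiler and assembler only need the declaration; only the linker checks that every referenced symbol actually has a definition somewhere.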
We actually take that code and put it in with our executable, so our executable's version of that function always gets called, right? You can actually do this with the libc library, so you can statically link printf or whatever from the library. At compile time you take that function, you put it in your executable, and this way you can ship this binary to anybody, right? They don't need all these libraries. They don't need to know that you are specifically using version blah blah blah of libc or version blah blah blah of OpenSSL, right? Because you statically linked it. But what happens to your binary? It expands. It's huge, right? These libraries are big, right? Then you have to ship all of that around. Do you do any Go programming? Doesn't Go statically link all their executables, so the output is just a binary that you can always just run? You don't need a Go runtime. I think that's one of the benefits of Go. So what's the opposite of static linking? Dynamic, right? Anybody do Windows programming? What does DLL stand for? Yeah, dynamic-link library, right? This is something that's loaded at runtime by the code, right? So the executable says, I'm relying on these functions, but they don't actually exist in my code. They exist in this other DLL, and it has this name and maybe this version, right? So the same thing here: we have dynamic linking, and Linux does just the same thing. So normally when you compile a program and use the printf function, by default you're doing dynamic linking. So when you call printf, at runtime your program goes out and asks the system, okay, what's the latest version of libc, like, who has printf, and then it loads in that code for you to execute. So what are some of the benefits there? Right, printf isn't a system call; printf is in libc, it's still in user space.
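Here is a small sketch you could use to compare the two linking modes yourself. The function name format_port is made up; snprintf is the real libc function. If this were part of a complete program, a plain `gcc` build would by default leave snprintf in the shared libc and resolve it when the program is loaded, while adding `-static` would copy the libc code into the executable (assuming a static libc is installed on the machine). Running `ldd` on the resulting binary shows which shared libraries, if any, it still depends on.

```c
#include <stdio.h>

/* snprintf is not defined in this file. With the default dynamic
 * linking, the executable only records that it needs snprintf from
 * libc. With static linking, snprintf and everything it depends on
 * gets copied into the executable itself. */
int format_port(char *out, size_t out_size, int port)
{
    /* Returns the number of characters written, not counting the
     * terminating null byte. */
    return snprintf(out, out_size, "port=%d", port);
}
```

Comparing the file sizes of the two builds is a quick way to see the size cost of static linking that the lecture mentions.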
There actually is a kernel version of printf, but the idea is, yeah, with static linking we then have to put printf and everything it depends on in our program. So what happens if there's a vulnerability in printf? Yeah, your executable is going to have that vulnerability, because the code is inside your program, right? But if you're dynamically linked and the administrator updates libc to a safe version, now when your program loads it's going to use the safe version of libc. So there are trade-offs here, right? So you have portability on one hand, and on the other hand you have easy updates. But then you get into a problem: if I just throw you a binary and it's dynamically linked, you may not be able to execute it, because you may not have all the libraries. The two most common executable formats are ELF on Linux, which is what we're going to focus on, and PE on Windows, which is also known as the EXE format. Cool, alright, so we'll stop here and we'll get into ELF on Monday.