All righty, welcome back to operating systems. Today is a fairly chill lecture. This is not testable: it's not on the midterm, it's not on the final, it's not on anything. This is a prelude to networks. If you've used a web browser, or communicated across different computers at all, you've used sockets. And it turns out that, aside from the setup, you already know how to use them, because they're represented as file descriptors, so you can just read and write. So let's get into it.

Sockets are just another form of IPC, or inter-process communication. We've seen pipes, we've seen signals, and we've briefly talked about shared memory. The thing with all of those IPC mechanisms is that the processes have to be on the same physical machine. To share memory, for example, both processes have to be able to access the same memory. So what if the processes are not on the same machine? Sockets are the IPC mechanism for communicating between processes running on different physical machines, typically over a network. When you go to google.com, you are communicating with one of Google's processes, which sends you the source for google.com; then your web browser renders it, and a whole lot of other things go on, but that is networking in a nutshell.

For networking there's typically a server and a client. The server is the thing you connect to: it waits for a connection, you connect to it, and once you're connected you can send data back and forth. A server using a socket has to follow four steps, and this is the most steps you will ever have to do to use a socket. They're all system calls, and they have all the usual C wrappers.
So if they fail they return -1 and set errno, like pretty much every system call wrapper we've seen. The first step is the socket system call, which, unsurprisingly, creates a socket and returns a file descriptor. We'll see what the arguments are later; all we need to know right now is that it creates a socket.

Step two is to bind the socket to some location. A socket by itself doesn't do anything; it needs an address associated with it. If you only want to use it on the same physical machine, you can bind it to something like a file name. Otherwise you can bind it to an IP address and a port, and then it will use the internet, or your local network, or whatever you have.

The third step is listen, which tells the kernel that you are accepting connections. It also lets you set the length of a queue, kept by the kernel, of connections that are waiting for your program to actually accept them.

The fourth step is to actually accept a connection. This returns the next incoming connection for you to handle. accept is a blocking system call, and when it's done, if there's no error, it returns a file descriptor. You can read and write on it, and magically that information goes across the network. Otherwise it's the exact same thing we've been doing.

If you're a client you have two steps, so a much easier time. One connection is just one socket. You do a socket system call to create the socket, and then a connect system call that says what location you want to connect to. A server needs to be listening at that same location for the connection to succeed. Otherwise you get an error: connect returns -1 and sets errno to tell you why. If connect is successful, the file descriptor you got from socket is now valid and you can read and write on it, and the opposite happens on the server. So you can send information to the server and read information from the server.

So let's go over the system calls really briefly. The socket system call sets the protocol and the type of socket. It takes three arguments. The last one, the protocol, is almost always just zero, which means "use the default protocol for this domain and type".

The domain is the first argument. It's the general protocol family, basically saying what kind of location you want to connect to. The three you will see 99.9 percent of the time are these. AF_UNIX is for local communication on the same machine, where you're connecting to a file name. A lot of programs use this; if you've used Docker or anything like that, guess what, it uses a Unix socket. AF_INET wants an IPv4 address, and means I'm going to connect to something across the network. AF_INET6 is for IPv6, because, as you'll see once you get into the networking class, IPv4 addresses are only 32 bits. That's only about four billion addresses, and since everyone plugs their toaster into the internet, we ran out of them some time ago. An IPv6 address is basically just a bigger number.

The type is usually one of two options, a stream socket or a datagram socket. You've probably never heard of those before, but you may have heard the term TCP, and that's what a stream socket uses. It means all the data you send across that socket appears in the same order on the other end. If I send one string and then another string, the server is going to see them in that same order. You also have a persistent connection between the client and the server. The two sides acknowledge each other's data, so they can tell whether the connection is still alive. Otherwise, if that
connection drops because the Wi-Fi drops out, which happens in this room a lot of the time, then you'll get notified of it: sooner or later a read or write fails with an error saying the connection was lost.

So this is a reliable form of communication, but it may be slow, because it goes against how real networking works. Real networking is something closer to datagram sockets, which use UDP. Basically all that does is send messages between the client and the server. It just sends packets, and if you get into the networking course you will realize that packets have fun journeys. There's no guarantee that two packets sent to the same server take the same route. One might go through Japan or something while the other takes a direct connection, and you're not guaranteed which one takes which path. They aren't guaranteed to be received in the same order. You could be using Wi-Fi and someone could turn on the microwave and screw up your signal. Lots of things happen.

With UDP there's no persistent connection between client and server. You just send a packet and you pray, and that's about it. Your messages might be reordered, because you're at the mercy of the actual network, which again is a whole other course. They might get dropped. The machine might not even exist, and you're not going to know about it. If you send a packet off, you're not even sure it ever arrives.

So this is fast, and it's close to what the network actually does. It doesn't do any fancy stuff; it doesn't care. For some applications this is the better fit. For instance, if you did video streaming or something like that, well, that's kind of like a real-time system, right?
You have to send a certain number of frames a second, otherwise you don't see anything. If the connection drops while you're listening to a live broadcast, I might not want to resend the data, because that time has already passed. I should just skip it and go ahead to the next frame. Or in a game, resending a packet is not going to work, because the whole state of the world has changed since then, so resending it doesn't even make sense. So typically you use UDP if those things are true.

If you want things in order, well, some people get clever. They think UDP is just really fast because it doesn't have any overhead, but then they want things to arrive in order, so they implement TCP themselves on top of UDP, they do it badly, they have bugs, and they should have just used TCP to begin with. So know what you're going to use, and again, if you need that concept reinforced, take the networking course.

All right, that aside, back to the system calls. The bind system call binds a socket to an address or location. It takes the socket as the first argument, and then, because it's C, it takes a pointer to a struct called sockaddr, which is just short for socket address. And because it's C, it also needs the length of that structure, because otherwise there's no way to figure it out. There are different structures depending on what you're trying to connect to. If you are connecting to a Unix socket you use struct sockaddr_un. Don't ask me why they didn't want to add two extra characters so that it said unix; I don't know. But this one is just a path.
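As a rough sketch of what filling in that structure can look like (this is not the lecture's code; the helper name `fill_unix_address` is made up for illustration, and refusing an over-long path is my own choice rather than anything the lecture prescribes):

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Hypothetical helper: fill in a Unix-domain socket address for `path`.
 * Returns 0 on success, or -1 if the path does not fit in sun_path. */
int fill_unix_address(struct sockaddr_un *addr, const char *path) {
    assert(addr != NULL && path != NULL);
    if (strlen(path) >= sizeof(addr->sun_path)) {
        return -1; /* no room for the path plus its terminating null byte */
    }
    memset(addr, 0, sizeof(*addr)); /* zero the whole structure first */
    addr->sun_family = AF_UNIX;     /* says which kind of sockaddr this is */
    strncpy(addr->sun_path, path, sizeof(addr->sun_path) - 1);
    return 0;
}
```

A caller would then pass `(struct sockaddr *) &addr` together with `sizeof(addr)` to bind or connect.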
So it just looks like a file. Then there is sockaddr_in, instead of inet; again, I don't know why they saved two characters. This one holds an IPv4 address, like 8.8.8.8 or something like that, plus a port. And sockaddr_in6, for IPv6, holds an address that is much longer: hex digits separated by colons, and it's kind of ugly.

The third step in our server journey is the listen system call. It sets the queue limit for incoming connections, basically how many pending connections the kernel will hold for us before it starts dropping them. The first argument is still the file descriptor returned from the socket system call, and the int argument is how long a queue you want the kernel to maintain. If you set this to zero, you're leaving it up to the kernel: the implementation is allowed to use its own minimum queue length. Generally the kernel developers are smarter than me, so just pick zero and let the kernel figure it out. If you have problems you can come back and change it later, but generally the kernel is smart.

The last step is the accept system call. It is a blocking system call that blocks until there's a connection. It takes the socket, and optionally it also takes a pointer to an address structure and a length, which get filled in with the address of whoever connected. We don't care about that here, so we just set them both to NULL and ignore it; we're just accepting connections on the socket. So this blocks until a new connection is formed, and when it returns, the return value is an int that's a file descriptor representing that connection. You can do read and write system calls on it like it was anything else.

All right, so we're almost done.
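Putting the first three server steps together, a minimal sketch might look like the following. The function name `listen_on_unix_path` and the idea of returning -1 instead of exiting are assumptions made for illustration; the lecture's own code exits on any error.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper: steps one to three for a Unix stream socket.
 * Returns a listening file descriptor, or -1 with errno set. */
int listen_on_unix_path(const char *path, int backlog) {
    assert(path != NULL);
    int fd = socket(AF_UNIX, SOCK_STREAM, 0); /* step 1: create the socket */
    if (fd == -1) {
        return -1;
    }

    struct sockaddr_un addr = {0};
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, path, sizeof(addr.sun_path) - 1);

    if (bind(fd, (struct sockaddr *) &addr, sizeof(addr)) == -1 /* step 2 */
        || listen(fd, backlog) == -1) {                         /* step 3 */
        int saved = errno; /* close may overwrite errno, so keep a copy */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd; /* step 4, accept(fd, NULL, NULL), is left to the caller */
}
```

Calling it twice with the same path fails the second time, because the name created by the first bind is still there.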
So this is the client. It just has a connect system call, which lets a client connect to an address. It takes the socket file descriptor returned from socket, and then the same sockaddr and length that we saw in the bind system call in the server. They have to match if you want to form a connection between a client and the server. The only difference is that if this call succeeds, it doesn't return a new file descriptor. It just makes the file descriptor you got from socket actually usable, so now you can do read and write system calls on it.

So that's a lot of talking. I like code better, so let's switch to the code example. Let's get rid of this, and sure, we can make that bigger.

So here is the server code. The server, remember, has to do those four steps. Before I even start, I register handlers for two signals; we kind of know what that does. I'm registering one for SIGINT, which is what you get when you hit Ctrl-C, and one for SIGTERM, which, remember, is asking a process to exit nicely; it's what you send if you just kill it without -9. So that is the nice way, and we will figure out why I'm doing this very shortly.

But let's get back to our four steps for the server. We have a socket system call, and we're going to use a Unix location and a stream socket. If you look at the documentation, AF_UNIX and SOCK_STREAM are just little macros; they're numbers, and that's what they're called. Of course we check for errors, because if you learn anything from lab two, or anything from this course, hopefully it's to check for errors. If you see an error, either handle it properly or, if it's something unexpected that you don't know how to handle, just exit immediately. So here I check whether there's an error, and I exit immediately if there is. Next step: I need to bind it to a location.
So I create a socket address, a struct sockaddr_un, for the Unix socket. Declaring a structure and setting it like this, to curly brackets with a zero in them, might look weird. That just zero-initializes the whole structure, so it's explicitly filled with zeros. Then I set the sun_family field, which says what type of socket address this is. That might seem redundant, but this is basically C's way of implementing C++ classes: you always have to say at the beginning what the structure actually is. So here we specify that, hey, this socket address is a Unix socket address.

Then the only other thing we have to set is sun_path, and it is a string, so we copy into it from a static string. The macro here expands to just example.sock, so I'm setting sun_path to example.sock. I copy the bytes from that string with a limit of the size of the path field minus one, because I want to make sure there's always a null byte at the end even if I fill it up. Then I call bind, and of course I check for errors; if there is an error, I exit immediately. Then I just do a listen system call, passing zero for the default, and if there's an error I exit immediately.

Then the fun part: this while (true). This is basically your web server. This is what your web server does: it just has an infinite loop that accepts connections. If you've ever connected to a web server or anything, way deep in the code it essentially looks like this. In an infinite loop, I call accept with that socket, and whenever it returns I have a new connection, so now I have a new file descriptor that represents the connection. I'll call it the connection fd, because that seems appropriate. I check whether there's an error, and if there is I immediately blow up. Otherwise I create a static string that says "Hello there". Then I determine the length of the string and add one, because I want to send the null byte too, just to make my life a bit easier in the client. Then here I do my write system call.
So I just write that string, all of it, to the file descriptor that represents the connection. Here I check for errors, and then I check that the number of bytes written is equal to the length I requested. Otherwise I would have to retry or something like that, and I don't want to write that code, so I'll just exit if that case happens and deal with it later. Then finally I close that connection and check for errors.

After I close the connection, it just goes back up to the top of the loop and accepts the next connection. So a new client can connect and get a message, another new client can connect and get a message, and so on and so forth.

The reason I have a signal handler is that this is an infinite loop. I can't exit the program without giving it a signal, or otherwise requesting that it shut down. And I care because I create that example.sock file and I want to clean it up. So I register a signal handler.
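Going back to the accept loop for a moment: the retry that the lecture's code skips can be sketched as a small loop around write. This is an illustration rather than the lecture's server; `write_all` and `serve_forever` are made-up names, and retrying on EINTR is my own addition.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: keep calling write until all `len` bytes are sent.
 * Returns 0 on success, -1 with errno set on failure. */
int write_all(int fd, const char *buf, size_t len) {
    assert(buf != NULL);
    size_t done = 0;
    while (done < len) {
        ssize_t n = write(fd, buf + done, len - done);
        if (n == -1) {
            if (errno == EINTR) {
                continue; /* interrupted by a signal: try again */
            }
            return -1;
        }
        done += (size_t) n; /* short write: loop to send the rest */
    }
    return 0;
}

/* One connection at a time: accept, send the greeting, close, repeat. */
void serve_forever(int listen_fd) {
    static const char msg[] = "Hello there";
    for (;;) {
        int connection_fd = accept(listen_fd, NULL, NULL); /* blocks */
        if (connection_fd == -1) {
            perror("accept");
            exit(EXIT_FAILURE);
        }
        /* sizeof(msg) counts the terminating null byte, so it is sent too */
        if (write_all(connection_fd, msg, sizeof(msg)) == -1) {
            perror("write");
            exit(EXIT_FAILURE);
        }
        if (close(connection_fd) == -1) {
            perror("close");
            exit(EXIT_FAILURE);
        }
    }
}
```

With a helper like this, a short write stops being a reason to exit, while real errors are still reported.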
I say I want handle_signal to run. Inside of that, I just have an assert to make sure it's one of the signals I expect, in case I change my program later and forget that I'm handling other signals, because all I'm going to do in the handler is close the socket, clean up all my work, and be done with it. In close_socket, I close the file descriptor representing the socket, which is why I made it a global variable, so I can actually access it in my signal handler. Then I have this unlink system call, which we have not seen before. All it does is get rid of a name, so it just gets rid of example.sock in the current directory.

You haven't seen this yet, but you will. Fun fact: removing a file doesn't directly delete anything. There's no remove system call or anything like that. The closest thing is unlink, which just gets rid of that name in the directory, and then the kernel actually deletes the data for you if it can. If you strace rm, you'll see that it calls unlink, or on current systems its close relative unlinkat; there's no rm system call. We will see why that is once we get to file systems.

All right, so that is the server. If I try and run it... whoops, I see "Address already in use", because I screwed up. All right, if I try and run it now, it just sits there. It runs, but it doesn't do anything, because, as we'd see if we straced it, it's blocked in that accept system call, which hasn't returned yet.

So let's see what the client does. In the client it's the same thing: it creates a socket, it's a Unix socket, it's a stream socket, and it checks for errors. This next part is also exactly the same: I'm setting up a socket address representing a Unix one, so it's just going to be a path. It's going to match between the client and the server, so the client also wants to connect to example.sock. So in here, because we're in the client,
I just call connect, using the file descriptor returned from socket, the address of the socket address, and its size, because it's C. If that is successful, the file descriptor now represents the connection to the server.

So here I create a buffer of size 4096. Now we know why I picked that number: it's the size of a page, and the kernel loves pages. So it creates a buffer of one page, then this bytes_read variable, and it puts the read system call in a while loop, in case it needs to read over and over again. In this case, well, it'll only return data once. We just keep reading while there are bytes to be read, check for errors, and then all we do is print the string we received, close the file descriptor, and that's it.

So I can open a new terminal, and now if I run the client (I have to go to the right directory first) I receive "Hello there". Yay. I received that from the other program that is running, so that information is clearly coming from the server process. I'm communicating between processes, and I don't have to use pipes. The two processes don't even have to be related at all; they can be completely unrelated. As long as you know the name of where to connect to, your process can just connect to it, no problem.

So, any questions about that, or anything you want me to do to break this? Because this is the internet. Yep. Yeah, so the question is what happens if I don't do the unlink for the socket. We can see what that does. In the server, I'll comment it out. Right now, if I kill the running server, it's still going to unlink it. So let's see: here, there's example.sock.
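For reference, the client just described can be sketched roughly as below. The names `connect_to_unix_path` and `read_message` are hypothetical, and null-terminating the buffer locally (rather than relying on the server's null byte) is my own choice, not necessarily what the lecture's client does.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper: create a Unix stream socket and connect it to `path`.
 * Returns the connected file descriptor, or -1 with errno set. */
int connect_to_unix_path(const char *path) {
    assert(path != NULL);
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd == -1) {
        return -1;
    }
    struct sockaddr_un addr = {0};
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, path, sizeof(addr.sun_path) - 1);
    if (connect(fd, (struct sockaddr *) &addr, sizeof(addr)) == -1) {
        int saved = errno; /* close may overwrite errno, so keep a copy */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd; /* the same descriptor is now usable with read and write */
}

/* Read until end-of-file or until cap - 1 bytes are stored, then
 * null-terminate locally instead of trusting the sender to do it.
 * Returns the number of bytes read, or -1 on error. */
ssize_t read_message(int fd, char *buf, size_t cap) {
    assert(buf != NULL && cap > 0);
    size_t total = 0;
    while (total < cap - 1) {
        ssize_t n = read(fd, buf + total, cap - 1 - total);
        if (n == -1) {
            if (errno == EINTR) {
                continue; /* interrupted by a signal: try again */
            }
            return -1;
        }
        if (n == 0) {
            break; /* the other side closed the connection */
        }
        total += (size_t) n;
    }
    buf[total] = '\0';
    return (ssize_t) total;
}
```

A client would call `connect_to_unix_path("example.sock")`, pass the result to `read_message` with a 4096-byte buffer, print the buffer, and close the descriptor.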
It looks like a file, but you can see it's in this pink color, which means it's a file of socket type. If I kill the server that's running right now, it won't have my change of commenting out the unlink. So if I Ctrl-C it, the server is now done, and if I look at the directory, example.sock is gone, because the unlink happened. Now if I try and run the client, well, guess what, it says "No such file or directory", because that name no longer exists.

So what happens if I forget the unlink? I can rebuild my server and run it again, and then in the client, hey, it can receive data, because the server is now listening. But now if I kill the server, example.sock is still there, and if I try to connect to it from the client it just says "Connection refused". So good thing I checked for errors. You've probably seen this message if you've used a web browser, right? You've seen "connection refused". They didn't make that up; it just came from this call failing.

Also, if I try to run the server while example.sock is still there, bind says "Address already in use", because that file already exists. So that's why you should clean it up. If you wanted to write a quick and dirty server, the server could check whether that file name is already there and, if it is, just unlink it first. But you should probably just clean up properly. So for now I just have to rm it, which basically does the unlink for me, and then I can run the server again. Lots of fun stuff.

All right, any other questions? That's the internet. That's a web browser.
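The quick and dirty option mentioned a moment ago, unlinking a leftover name before binding, could be sketched like this. The helper name `bind_replacing_stale` is made up, and note the assumption baked in: it removes whatever is at that path, even if another live server is still using it.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper: remove any leftover name at `path`, then bind.
 * Returns 0 on success, -1 with errno set on failure. */
int bind_replacing_stale(int fd, const char *path) {
    assert(path != NULL);
    if (unlink(path) == -1 && errno != ENOENT) {
        return -1; /* something other than "name not there" went wrong */
    }
    struct sockaddr_un addr = {0};
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, path, sizeof(addr.sun_path) - 1);
    return bind(fd, (struct sockaddr *) &addr, sizeof(addr));
}
```

This avoids the "Address already in use" error after a crash, but cleaning up on exit is still the tidier approach.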
That's like a very poor web browser, but it's the same thing. If you want, you can create a socket and connect to anything you like. Connect to google.com: just figure out whatever IP address they're using today, send it some random bytes, and see what happens. Nothing's stopping you. It'll probably be like, what the hell are you doing, and close the connection immediately.

In fact, even what I wrote here is pretty bad, because this client expects a string, and expects a string with the null byte. So I could completely screw up my client right now. This is why you should not trust things. I could probably exploit myself. Let's see, let us screw it up. We can do this, which means I will not write the null byte to the client. So let's have some fun with that; let's make a fun server. "Address already in use", because I commented out the unlink, I guess.

All right, so in this case, if I run the client now, it gets "Hello there" and it didn't crash. Wait, did I compile? I think it was still uncompiled. Huh, weird. It probably zero-initialized that memory, so I got lucky and didn't run off the end of my buffer, but you're not guaranteed that. You shouldn't trust information from the server. That's how exploits happen: just sending random data that the other side doesn't expect. That's how a lot of exploits happen, because the program might give you a response that kind of makes sense, or share some memory with you that it shouldn't, or something.

Yeah? So the question is whether DDoS relates to any of this. Kind of. DDoS relates to, essentially, this queue size here, right?
So the kernel has a bit of DDoS protection built in, at least for the connections themselves: it will only hold a certain number, and if you try to connect past that, it'll just start dropping them. So that's kind of like DDoS protection. But sometimes DDoS just means you are wasting the server's resources, and the kernel can't help you with that. Say this server held the connection open as long as your client held it open. You could just open thousands and thousands of connections, and eventually the server is going to run out of space, run out of file descriptors, run out of something, and that is a denial-of-service attack. So that's one instance of it, and that's denial of service on a single machine. You could have multiple machines hosting your service, and then an attacker could try to do that to all of them, or just overload them, or give them useless work to do. There are tons of kinds of denial of service.

Your defense is something like: hey, if I see someone open a connection and not do anything I expect with it, just drop it. That's one thing you can do to protect yourself. So it's kind of related, but the kernel can only do so much to help you. It can set the queue length and then start dropping connections, but it can't handle what you do with them once you've accepted them. Was that in the networking course or something? Okay. Yeah, I had the same question last lecture, so I thought it might have come up in some course.

But yeah, this server is also kind of bad because I only handle one connection at a time. Someone brought up the point: hey, after I accept, can I just fork?
Sure, you could fork and have another process handle that connection. Then the two processes can run at the same time: the original one can go back and accept another connection, make a new process for that connection, and so on and so forth. Then you can handle a bunch of connections at the same time. It turns out processes are a bit of a big hammer, and these connections are usually really short-lived, so that motivates threads, which is next lecture. Basically, if you take that idea but use threads, that's a good web server. This one is a crappy web server that can handle one connection at a time. But if you wanted to handle multiple at a time right now, you could just fork after every accept.

Yeah? Oh, that wasn't a question, that was a stretch. All right. Yeah? Can I what? I'd have to look up the format for all that, and what their IP address is today, and so on. And if I send it garbage, then as soon as it doesn't know what the hell I'm doing, it's probably just going to close the connection immediately, which is the sane thing to do. So if you wanted to write something like a web client, you'd connect to google.com and you'd have to figure out the HTTP protocol, and exactly what to send in order to get a response out of it. Otherwise, most web servers, if you send them something unexpected, will just be like, nah, no good, close it, and carry on.

All right, any other questions? We have learned the internet today. Didn't expect that in operating systems, did you?

So why is this here? Well, it's related, it motivates sockets, and you actually already know it, aside from the setup. The setup is just something you look up once and then copy and paste for the rest of your life, and the rest of it, using a file descriptor, is what you already know how to do. There are some exceptions, which you'll see once you get into the networking course.
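Going back to the fork idea from a moment ago, a minimal sketch of one process per connection might look like this. The names `handle_in_child` and `serve_with_fork` are hypothetical, the greeting is the one from the demo, and reaping finished children with a non-blocking waitpid is my own assumption about how to avoid leaving zombies around.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: hand one accepted connection to a child process.
 * Returns the child's pid in the parent, or -1 if fork failed. */
pid_t handle_in_child(int connection_fd) {
    assert(connection_fd >= 0);
    pid_t pid = fork();
    if (pid == -1) {
        return -1;
    }
    if (pid == 0) {
        /* Child: serve this one connection, then exit. */
        static const char msg[] = "Hello there";
        ssize_t n = write(connection_fd, msg, sizeof(msg));
        close(connection_fd);
        _exit(n == (ssize_t) sizeof(msg) ? EXIT_SUCCESS : EXIT_FAILURE);
    }
    /* Parent: the child has its own copy of the descriptor, so close ours. */
    close(connection_fd);
    return pid;
}

/* Accept loop that forks after every accept. */
void serve_with_fork(int listen_fd) {
    for (;;) {
        int connection_fd = accept(listen_fd, NULL, NULL); /* blocks */
        if (connection_fd == -1) {
            perror("accept");
            exit(EXIT_FAILURE);
        }
        if (handle_in_child(connection_fd) == -1) {
            perror("fork");
            exit(EXIT_FAILURE);
        }
        /* Reap any children that have already finished, without blocking. */
        while (waitpid(-1, NULL, WNOHANG) > 0) {
        }
    }
}
```

The parent goes straight back to accept while the child talks to the client, which is the whole point of forking here.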
Instead of read and write, there are also send and recv system calls specifically for sockets. They do essentially the same thing, except they take some additional flags to control what happens. Some examples: there's MSG_OOB, for out-of-band data, and you'd have to ask the networking people what that means, because I don't really know. Then there's MSG_PEEK, which means I'm going to look at a few bytes of the data without actually consuming them in the kernel, so I can do a read call later and still get all the information.

And then there's this funny one called MSG_DONTROUTE, which doesn't look like it makes sense. If I'm connecting to the internet, why would I want to send information without routing it? Even a single hop to your computer needs some routing involved. Well, it turns out, guess what, routers also run Linux. They would want to use MSG_DONTROUTE, because if you are implementing a router yourself, you want to make sure packets don't take any unexpected hops. You want whatever you send to go directly to the device you intend, because you are the router. If the packet were also being routed while you did that, you might get into a situation where you route back through yourself, create an infinite loop with the packets, and they never actually arrive, and it's real bad. So that's why there's something like MSG_DONTROUTE, which you will never use unless you are implementing a router. Maybe you'll look at that in the networking course; I don't know.

And then there are sendto and recvfrom, where basically you just tack an address on to the call. They're pretty much only used for datagram sockets.

All right, cool.
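To make the peek flag concrete, here is a small sketch; the wrapper name `peek_bytes` is made up, while recv and MSG_PEEK are the real call and flag.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: look at up to `len` pending bytes without
 * consuming them. Returns the number of bytes copied, or -1 on error. */
ssize_t peek_bytes(int socket_fd, char *buf, size_t len) {
    assert(buf != NULL);
    return recv(socket_fd, buf, len, MSG_PEEK);
}
```

After a peek, a later read on the same descriptor still returns the bytes that were peeked at, followed by whatever came after them.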
Now you can do the web; you are the internet. You can perform networking through sockets. You can strace your web browser, and it uses sockets. Any program that connects to the internet uses sockets. Again, everything has to go through system calls, so you can strace everything. Don't believe me? Go ahead and strace Chrome; it's going to create lots of sockets.

So sockets are just IPC. They're not really anything new aside from the setup; they're just IPC that can go across physical machines. The basics: they need an address, either a file path for a local one or an IP address. There are two types, stream and datagram, which are basically TCP and UDP. Servers have four steps: create the socket, bind it to an address, listen on it and say how deep the queue is, and then accept connections. Clients just need to create a socket and then connect to an address.

Yay. So now we can use the rest of the time for whatever you want. Just remember: I'm pulling for you. We're all in this together.