Alrighty, welcome back to 353. So we kind of survived a midterm, ish. Grading should be done by March 4th at the latest, the TAs tell me. We'll scan them in later today and start grading. It doesn't look too bad initially, although no one finished early, which is kind of concerning. Today we get to take a little bit of a palate cleanser and just talk about fun sockets. Sockets, thankfully, are not anything new, but they will set you up for a networking course, a distributed systems course, or any course where you want to communicate between machines. Sockets are just another form of IPC. Pipes, signals, and shared memory all needed the processes to be on the same machine, but with sockets the processes can be running on different machines: say, some web server running somewhere else in the world and your actual machine that is trying to connect to it. Typically we use sockets over a network, but we can also use them locally, as we will see in today's example. Thankfully, aside from the system calls used to set them up, sockets are just file descriptors like anything else. You can read bytes from them and write bytes to them. Unlike opening a file, though, which is one system call that gives you a file descriptor back, sockets need a little more setup. There's usually a server and a client. The server is something that is running, waiting for people to connect to it, and it usually provides some type of service; a website is an example of that. If you are the server, you have to follow four steps. These are all system calls, and they have the usual C wrappers: if there's an error, they return -1 and set errno, all that good stuff.
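Here's a minimal sketch of that -1/errno convention, assuming a Unix-domain stream socket. The helper try_connect is hypothetical and not from the course repository. It asks connect for a name nobody has bound. connect reports the failure by returning -1 and setting errno, which is ENOENT ("No such file or directory") when the path doesn't exist.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper: try to connect to a Unix-domain socket named `path`.
 * Returns 0 on success, otherwise the errno value set by the failing call. */
int try_connect(const char *path) {
    struct sockaddr_un addr;
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    assert(strlen(path) < sizeof(addr.sun_path)); /* name must fit, plus '\0' */
    strcpy(addr.sun_path, path);

    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd == -1) {
        int saved = errno; /* socket() returned -1 and set errno */
        perror("socket");
        return saved;
    }

    int err = 0;
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1) {
        err = errno; /* save errno before perror()/close() can touch it */
        perror("connect");
    }
    close(fd);
    return err;
}
```

Calling try_connect("example.sock") before any server has bound that name should return ENOENT. That is the same error the lecture's client gets once the server has cleaned up its socket name.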
The system calls you have to make, in order, are as follows. First, you call socket, which creates a socket and returns a file descriptor. Next is bind, which attaches the socket to some location. That location is basically just a name, a way to actually reach the socket: you can attach it to a file, to an IPv4 address, to an IPv6 address, and other things. After it is attached to an address, you use the listen system call. That means you are ready to accept connections if someone tries to connect to that name. You can also set a queue limit, and the kernel is the thing that manages this queue of incoming connections. To actually get a connection, you make an accept system call. That's a blocking system call: it blocks until a connection is actually made, and then it returns a new file descriptor that represents that connection. You can do a read system call on the file descriptor you get from accept to read anything the other side sends. You can also do a write system call to send information across the socket, and it will go across the network. We'll have code examples today as well. Clients are a bit easier; they only have two steps. First, they make that same socket system call to create a socket and get a file descriptor back. The difference is they don't have to do all of the setup. They just need to connect to some name. The connect system call connects to some location, and after that the socket file descriptor can be used to send and receive data. connect has to use the exact same name the server used for bind, so the client actually connects to the right thing. So what does the socket system call, which both sides use, look like? It has three arguments, one of which really isn't used: int domain, int type, int protocol.
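A minimal sketch of those steps, assuming a Unix-domain stream socket like today's example. The function names open_listener, accept_connection, and open_client are made up for illustration. The backlog of 16 is an arbitrary choice.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Fill in a Unix-domain address; server and client must use the same path. */
static struct sockaddr_un make_addr(const char *path) {
    struct sockaddr_un addr;
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    assert(strlen(path) < sizeof(addr.sun_path));
    strcpy(addr.sun_path, path);
    return addr;
}

/* Server steps 1-3: socket, bind, listen. Returns the listening fd, or -1. */
int open_listener(const char *path) {
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);                      /* 1. create */
    if (fd == -1) { perror("socket"); return -1; }
    struct sockaddr_un addr = make_addr(path);
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1) {  /* 2. name it */
        perror("bind"); close(fd); return -1;
    }
    if (listen(fd, 16) == -1) {          /* 3. backlog of 16: arbitrary choice */
        perror("listen"); close(fd); return -1;
    }
    return fd;
}

/* Server step 4: block until a client connects. The result is a NEW fd for
 * that one connection; the listening fd stays open for the next client. */
int accept_connection(int listen_fd) {
    int conn = accept(listen_fd, NULL, NULL); /* NULL: peer address not needed */
    if (conn == -1) perror("accept");
    return conn;
}

/* Client steps: socket, then connect to the same name the server bound. */
int open_client(const char *path) {
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd == -1) { perror("socket"); return -1; }
    struct sockaddr_un addr = make_addr(path);
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1) {
        perror("connect"); close(fd); return -1;
    }
    return fd; /* now usable with read() and write() */
}
```

Order matters here. bind and listen have to happen before a client's connect can succeed. accept can come later, because the kernel holds the pending connection in the listen queue until the server gets around to accepting it.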
The domain is the general protocol family, and it determines what kind of name you want to connect to. The three main options for the domain are AF_UNIX, AF_INET, and AF_INET6. AF_UNIX is for local communication on the same physical machine, which means the name you provide looks like a file name. AF_INET is for IPv4, so normal IP addresses like 1.1.1.1, and it will typically use your network interface. Then there's AF_INET6 for IPv6. IPv4 addresses are only 32 bits, and since everyone likes connecting everything to the internet now, we ran out of IPv4 addresses quite a while ago. IPv6 works the same way, just with 128-bit addresses, and it also uses the network. For the type, there are usually two options: stream or datagram sockets. In a networking course you might have heard the terms TCP and UDP. On IP networks, a stream socket means TCP and a datagram socket means UDP; stream and datagram are just the generic names. The third parameter, protocol, is pretty much always zero. It's mostly unused, and we will not use it either, but it's there in case a given domain and type ever support more than one protocol. Stream sockets use TCP, so all data sent by the client appears in the same order on the server, and vice versa. With stream sockets there's a persistent connection between client and server. That matters because networks are weird: as you'll see in the networking course, sending data in a particular order does not mean the receiver is going to see it in that order. The network could route your data in strange ways.
Some of your data might go through, I don't know, hopefully not North Korea, but somewhere like that, while the rest gets routed more directly through Toronto, so it might not arrive in the same order. TCP takes care of all of that for you. Even if some data takes a weird indirect route, TCP handles it, and it appears that all the information arrives in the same order on both sides. So it's reliable and it forms a persistent connection. If you lose the connection, you'll be notified of it. But it might be slower, because it has to do some sophisticated things to make sure everything actually arrives in order, which again you'll learn in a networking course. If keeping things in order is too expensive, that's what datagram sockets are for. Datagram sockets use something like UDP, which just lets you send packets of information, or messages, between the client and server. There's no persistent connection between them, so it's fast, but if messages arrive out of order you will see them out of order. A message isn't even guaranteed to make it to the server. Since there's no persistent connection, there's no way to check: if you don't see a message, you don't know whether it was lost in transit or the client simply never sent it. And if you want proof that sockets are actually used, take a VPN client, a web browser, or anything like that and strace it. Guess what, it's going to use sockets, because that's how it communicates with web servers.
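To make "messages" concrete, here's a hedged sketch with a datagram socket. It assumes AF_UNIX instead of UDP, so it runs on one machine. On Linux, local datagrams don't get reordered or lost the way UDP packets on a real network can, so this only demonstrates message boundaries. The helper names are hypothetical. There's no listen or accept here: the receiver just binds a name, and connect on the sender only records where its writes should go.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

static struct sockaddr_un dgram_addr(const char *path) {
    struct sockaddr_un addr;
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    assert(strlen(path) < sizeof(addr.sun_path));
    strcpy(addr.sun_path, path);
    return addr;
}

/* Receiver: a datagram socket bound to a name. No listen() or accept(),
 * because there is no connection to set up. */
int open_dgram_receiver(const char *path) {
    int fd = socket(AF_UNIX, SOCK_DGRAM, 0);
    if (fd == -1) return -1;
    struct sockaddr_un addr = dgram_addr(path);
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Sender: connect() on a datagram socket only records a default destination,
 * so each later write() sends exactly one message there. */
int open_dgram_sender(const char *path) {
    int fd = socket(AF_UNIX, SOCK_DGRAM, 0);
    if (fd == -1) return -1;
    struct sockaddr_un addr = dgram_addr(path);
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}
```

After write(tx, "first", 5) and write(tx, "second", 6), each read on the receiver returns exactly one message: 5 bytes, then 6. With SOCK_STREAM, the same two writes could come back from a single read as "firstsecond". A stream is just an ordered sequence of bytes with no message boundaries.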
Then there was a question: why can't TCP just attach the send time to each packet? TCP does something like that. It numbers the data it sends, which is how the receiver knows the right order. UDP doesn't have a persistent connection, so it's fast, but messages might be reordered or dropped. Typically you use UDP when the overhead of checking everything just isn't worth it. Say you're writing a game. By the time you click your mouse and try to update some game state, resending a lost update won't help, because the game state has already moved on, so TCP isn't going to do anything for your game. Use something like UDP when it doesn't make sense to resend anything, you want things to be as fast as possible, and you can't really recover anyway. If you want to be able to recover, just use TCP. Most web servers use TCP. UDP is generally for games or streaming, like streaming lectures or video, because if a packet gets dropped it's kind of whatever. It just depends on what you actually need. Now, the bind system call binds a socket to an actual address. It takes the socket as its first argument. Then, because it's C, it takes a pointer to a socket address structure, and you also have to tell it the length of that structure. There are three different socket address structures, one for each type of name, and which one you use depends on the domain you passed to the socket system call. For Unix sockets it's struct sockaddr_un. Again, I don't know why the developers decided to shorten everything to the nth degree; sockaddr_unix would have been just fine, but whatever. For these
it's just going to be a path, like a file name, and that represents the socket; we'll see that in our example today. For IPv4 it's struct sockaddr_in, not sockaddr_inet, again I don't know why. It holds an IPv4 address like 8.8.8.8, which is Google's public DNS server. Then there's struct sockaddr_in6 for IPv6, which holds a much longer address written as groups of hex digits separated by colons. Generally you don't see IP addresses. But if you look at the system calls when you go to google.com or something, the browser connects directly to an IP address. How you get from a name to an IP address is covered in a networking course, so we won't go over the details. Basically, there's a service that has all the name-to-IP-address mappings, and you can look up IP addresses with it. If you know the IP address, you can create a socket to anything. So if you really want, after this lecture you can create a socket, figure out the IP address of something the university does not own, send random data to it, and see if it responds. It might, it might not; you can try it out. The listen system call just sets the queue limit for incoming connections. It takes the socket you've bound to some name, plus a backlog, which is basically the queue depth the kernel will maintain. If you pass it zero, the kernel falls back to its own small minimum and handles it for you. At any rate, the backlog is the kernel's queue of incoming connections that pile up waiting for your process to accept them. If you don't accept them fast enough and the queue fills up, new connections just get dropped, and you're
probably going to get a message like "connection refused" if you're trying to connect. Next, the accept system call blocks until there's a connection. Its first parameter is the socket you've bound to something and are now listening on. It also optionally takes an address, to tell you who connected, but you don't have to use it, especially for something like a Unix stream socket. You can just pass NULL and ignore it. When the system call returns, you get a new file descriptor, the int it returns, and you can do your read and write system calls as normal. When you write to it, you are sending data to whatever client just connected to you. When you read from it, you are reading whatever they sent you. This is the basis for the web, web servers, and anything like that. Oh, yep, question. Yeah, the web server running somewhere would call accept; you connect to it and it accepts your connection. If you're using a web browser, the browser knows how to speak to the server to get a web page back. It asks nicely, "please send me a web page," using HTTP or whatever the protocol is. You get the website back, and your browser renders it. But if you don't know how to speak to it and you just send random bytes, who knows what you'll get back. Another question: is the file descriptor we get from accept the same as the socket? No. The file descriptor you get back from accept is a new file descriptor that represents that one connection, so you can keep calling accept over and over again. A web server will probably have thousands and thousands of people connecting to it, so you need to handle each connection separately so you know who you're sending
information to and who you're getting information from. Some other questions. Is a socket just a bigger pipe that connects devices? Yeah, it kind of works the same way, and the kernel manages it in a similar way. The difference is that with a pipe we didn't need to give it a name or do all the setup, but it had to be on the same machine. We could only share the pipe if we forked, so that both processes got its file descriptors. With sockets, because there's a name, the processes don't need to be related at all. In fact, they can be running on completely different machines, and as long as you know the name you can connect. After all of this setup, though, it's a file descriptor, same as anything else. Next question: if we receive too much data, will we run out of buffer, or does listen set that up? Note that listen's backlog only limits pending connections, not how much data you can send. If you send too much data, say your file is too big, you'll probably get an error at some point when the other side rejects what you're doing. Web servers probably have limits built in. If you don't tell one you're sending a large file and just start sending it a gigabyte of data, it might listen for a while and then decide, "I have no idea what this person is doing," and close the connection. After that you'd get errors if you try to send any more, because the server has closed the connection. We'll see an example later, but first, the last system call we need, which is for the client. The connect system call lets the client connect to an address. It looks exactly like bind, except it's for the client to connect to a server that is actually accepting connections. It takes the socket file descriptor as its first argument, then that same address structure as before. The socket
file descriptor, again, is the one returned by the socket system call, and everything needs to agree. The client needs to use the same domain and type as the server, and the address it connects to looks exactly like the one used for bind. You have to connect to the same address the server is listening on in order to form a connection. The other difference is that once connect succeeds, you use the file descriptor you got back from socket directly, just like a normal file descriptor. You can read and write data with it, and it goes to the server. Now we can write a server. It will be a very, very silly server. It simply waits for connections and sends the string "hello there" to every client that connects. Then it closes that connection and is done with it. Like all our examples, it's in your materials repository, which I'll update right after the lecture, and the relevant source files are just a client and a server. We're just going to use a local socket for demonstration; you could use IPv4 or IPv6 if you want. Because we're using a Unix socket, the name looks like a file. The server creates a file called example.sock in the current directory, and we use that as our socket address. Our server also uses some signals to clean up and terminate from our infinite accept loop. So let's see the example. This is the server. In main, the first thing it does is the socket system call. The domain is AF_UNIX, so we're just going to use a file, and we use a stream socket, since we may as well form a persistent connection and make sure everything arrives in order. For the protocol we just pass zero. Then we do what we always do: check for an error. These are
the normal C wrappers, so if one returns -1, there's an error and it has set errno. I just wrote this errno-exit wrapper to give us an error message. After that, because we're using Unix sockets, I need a struct sockaddr_un to define the address. One of the fields it needs is sun_family. I like to joke that it's named after Sun Microsystems, which may or may not still exist, and since you're never allowed to change a library name once it's out there, it's stuck with that name forever. Really, though, the sun_ prefix just follows the struct name, the same way the fields of sockaddr_in start with sin_. We set sun_family to the same domain we gave socket. The only other field we set is sun_path, which is just a fixed-length array that we copy a C string into. Our socket path is just the string example.sock, and we do a string copy to copy it in. After that we do the bind. We bind the file descriptor we got back from the socket system call and pass the address of this socket address structure. Because it's C, we also have to tell it the length of that structure, which is just the sizeof the socket address. Then we make the listen system call. We pass it zero to say, "I don't care, kernel, just pick something, you're probably smarter than me." We check for errors, and then comes the main loop. Pretty much every server in existence looks like this: while (true), accept. This accepts on the socket, and since we don't need the client's address, we don't pass one. The system call blocks until someone actually connects, and if someone actually connects to
it, accept returns and we get a new file descriptor back that represents that connection. We check for errors, and then we do a write system call using that file descriptor. Our message is "hello there". We calculate its length, and because it's a C string, we add one for the null byte. Then we write the string to that connection and check that we didn't get an error. We also make sure we actually wrote all of the data, just like everyone surely checks the return value of printf to make sure it wrote everything. Then we just close that connection's file descriptor, and boom, we're done with this connection. The loop keeps going, waiting for new connections. Now, in the client, same idea. We create a socket, AF_UNIX and SOCK_STREAM, everything the same. We fill in the same kind of socket address: AF_UNIX again, we set sun_family and sun_path, and the two paths agree, both example.sock. Then we call connect instead of bind, because this is the client. We connect using the file descriptor we got back from socket, the address of the socket address structure, and its size. After that, we are connected to the server and can use this file descriptor. We're relying on our assumption that the server just sends us data, so all the client does is read. We create a buffer the size of a page and call read in a loop until read returns zero or less. If it returned -1, we report the error. Otherwise it returned zero, which, just like for pipes, means no more data is possible. Then we print off all
the data we received and close. Now we can run it. Right now, if I look at my directory, it's just a bunch of random files and no sockets. If I run the server, it doesn't look like it's doing anything. If we straced it, we'd see the server sitting in that accept system call, waiting for someone to connect. If I look at the directory now, I can see this example.sock. It looks like a file, but it's just the name of the socket, and it shows up in this fancy magenta color; look at me with fancy colors. Alright, now I can run the client, and the client says it received "hello there". So there was some inter-process communication: that "hello there" came from the server, which sent it to the client. Because the server is in an infinite loop doing the same thing every time, I can connect as many clients as I want. It just sits there waiting for someone to connect, over and over, and sends each one the message. Question: can we try to connect to your server? No, you can't. Because it's a Unix socket, a local socket, the only way to access it is through this file. If you can't access this file, you cannot access the socket; too bad. If I bound it to an IP address instead, like my laptop's IP address, you might be able to connect to it. That depends on the university's network security and whether they allow it. Again, you'll learn about that in the networking course, so whether you can actually connect is a whole other thing, but if you knew the IP address you could try. Another question: if I share this file, can we connect to it? No, because it is not a real file; it's kind of like an
empty shell of a file. When we bind to it, the kernel knows that if someone on this machine tries to access this file, it actually refers to the socket you created. Someone else can't connect to it through the file; they have to be on the same machine. If we try to look at the file, we just get a bunch of question marks, because it isn't real; it's just a name. So there's no way to share this file with you, because in reality it does not actually exist. Any other questions or fun things you want me to do with this? Yeah, the question is: what if a client connects while the server is still processing? The way I wrote it right now, that client just gets put on the queue the kernel manages. It waits until this process writes out its message, closes the connection, and makes the accept system call again. Until the server actually calls accept, that client is blocked. You can imagine this is where the earlier example came up, where using threads with web servers is probably a good idea. Instead of waiting to accept a new connection until I'm done with the old one, I could have my main thread continually call accept. After it accepts a new connection, it creates a new thread and gives it the file descriptor that represents the connection. That thread handles that connection alone, while the main thread goes back to accepting the next one. This is basically how web servers work: the main thread calls accept over and over, and each connection probably gets its own thread that does all the work. Next question: does the ping command use sockets? Yes, it has to. If you strace
it, it'll use sockets, because everything has to go through system calls, and sockets are the only way to use your network. Was there another question? Yeah, if I fork, then I'd have multiple processes using the same socket. If two of them are calling accept when a new connection comes in, you don't know which process is going to get it, but one of them will. It's a bit weird, because you don't know which process is actually accepting the connection, but it would work; all the rules of fork still apply. Any other questions or fun things I can do? So, the first problem with this server: how do I stop it? This is where you would actually use a signal for something useful. The server created this example.sock name, and I want to clean it up when my server is done running. But the server doesn't know when it's done; most of the time it's just sitting blocked in the accept system call. So to actually shut down the server, I should use a signal. I registered handle_signal for the SIGINT and SIGTERM signals. handle_signal is where I close my socket and do any cleanup I need before I exit the process. In the close-socket function, I close the file descriptor that represents the socket, which I made a global variable. Then, to remove the name, I use a system call called unlink. I call unlink on the socket path, and that gets rid of example.sock. That way I clean up everything I was using when I close my server. Now if I switch back to the server and hit Ctrl-C, my server is done. If I look at my directory, boom, example.sock does not exist anymore. If I go ahead
and try to connect with my client, I get "connect failed: No such file or directory", because the name isn't there to connect to anymore. Now let's say I mess up: I start the server again and don't let it clean up its resources. I can be mean and run kill -9 with the server's PID, from pidof server, and say, "screw you, I'm not letting you terminate cleanly." Again, with kill -9 the kernel kills the process immediately; it has no option. If I do that, I see it's killed, and example.sock is still there. No one is listening on it anymore, but the name does exist, so if I run the client now, it just says "connection refused". The file is there, so the name exists, but no socket is currently bound to it, so I get good old connection refused. You've probably seen the same message when you tried to connect to a web server that was down. You know, I probably shouldn't make fun of Quercus or anything like that, but you've probably seen it before. There's another comment: if we use threads to handle connections, they might not execute in the order they arrived. That is true, but typically you don't care; between clients it doesn't matter. Can I unlink a network socket? unlink only works on file names, not network sockets. A network socket is bound to some IP address. If your process isn't running, the socket isn't bound anymore, and the kernel knows that. Any other fun questions or weird things we can do? Hey, this is the internet. If you strace anything that uses the internet, it uses sockets: like you said, the ping command, your web browser, Chrome, Firefox. As long as it's running on Linux or macOS, it's going to use sockets. Aside from the system calls used to set it up, though, nothing really changed once we got a file descriptor, right? It's the same stuff we've been learning, but
you know, there are going to be some weird protocols and such to successfully talk to web servers, but at the end of the day they just use sockets. Other questions, if we want to get in trouble: can we make a lot of requests to a server, and will that break it? Sure, you can do that. That's a denial-of-service attack, where you connect to a server and send it random stuff in order to break it; if it comes from many machines at once, it's a DDoS, a distributed denial of service. But there are ways to get around that. Say some silly person opens a thousand or a million connections and sends you random data. You'll probably want to set up something that blocks them, so you just stop accepting connections from them. That's your networking course: you can do it at the network level, where it's blocked before the kernel or your program ever sees it. There are lots of fun things you can do there, because these machines are on the internet and people can do whatever they want. And if you use SSH or anything like that, guess what, it uses sockets; literally anything you use probably uses sockets. And yeah, that many connections would overload the kernel's queue. Once the queue is exceeded, you'd get connection refused, which is essentially the same as the web server going down. Another question: can you send signals between client and server? The answer is no. Signals are still a form of IPC, but you have to use the kill system call and know the process ID, so they only work within one machine. You can't send signals across computers, unless you use sockets and, say, ask the other machine to send a signal to a process there. You can build these things up to do all sorts of different things, but at the end
of the day you're just sending data across a socket. Any other fun things we can do with this, or other questions? Yeah, question: what's the difference between unlinking and closing a socket? Closing the file descriptor you got back from socket means I'm not accepting connections on it anymore, so if you try to connect you get connection refused. unlink is separate from the socket: it's a system call that just gets rid of the example.sock name. In fact, you'll see this if you strace something like rm example.sock, which removes that file. You can use rm on regular files, but guess what, there's no system call called rm; rm just uses the unlink system call. There's actually no such thing as deleting a file. There's only unlinking, which gets rid of that name in the directory. So running rm example.sock is the same as making the unlink system call on example.sock. Let's check that's true. I can strace it, and... I got invalid argument, I can't remove it? Is it still there? Cool, thanks, rm. What if I force the issue and really remove it? There we go. Okay, let's set it up again; I guess we had weird permissions because of that. Run the server, and let's make sure I'm not full of it. If we strace rm now, the output is slightly different, but it's the unlink system call: specifically unlinkat, relative to the current working directory, on example.sock. That's what rm does; there's no system call called rm. One more question: yeah, if we use threads to handle sockets, connections might be handled in a different order, and that's okay. You don't really care about the order between clients; realistically they're usually quite independent. Another question: do you need to unlink anything if you're using an IP address instead of a Unix socket? If you use an IP address instead
of local sockets, you don't need to use unlink. unlink is just removing a file name, so if I use IP addresses, I don't have a file name, and I don't have to unlink anything. All right, any other final fun questions? All right. So instead of read and write, there are also some slightly different system calls called send and recv, but basically they're the exact same thing as read and write, except they take some flags. Some of those get into networking-course material. There's MSG_OOB, so that's receiving data out of band, which means something if you take a networking course. There's MSG_PEEK, which is just looking at the data without actually consuming it. So you kind of look at the data, see what you need to do with it, and the next time you read, you'll actually reread the same data. And then there's MSG_DONTROUTE, so send the data without routing packets. Why is that there? Well, that seems really silly; if I need to connect to something, the packets probably need to reach the destination. So it might look kind of weird to have this don't-route flag, but it's there because, guess what, if you're implementing a router or something like that, well, you can't route, because you are the router. So that's what that's for: you just send data directly to the destination without routing, and that's typically used to actually implement a router. Otherwise you could get into infinite loops of packets, and if you thought this course was bad for infinite loops, wait for networking. So except for MSG_PEEK, you'll probably never use these. And routing just means, say my laptop was actually running a server and you wanted to connect to it: well, your laptop doesn't magically connect directly to my laptop. Your laptop would at least talk to that access point on the wall, and then through that access point it would probably go to my laptop, if it was being direct. So that is called a route. So in order for your laptop to get to my laptop, it
has to jump through at least one hoop if we're in the same room. If we're in different rooms, then your laptop would talk to that access point, which would then talk to some switch in some closet somewhere, which would then talk to some big router that is however the university connects to the internet, which would connect to some backbone, which would connect to, you know, something closer to your house, which would connect to the cable provider, which would eventually connect to something at your house. And that is how information would actually reach your house. Again, this sets you up for a networking course. I don't teach the networking course, but they're generally lots of fun because you get to go over all those details, and I think the one here is actually quite good. So, just to sum it up: if you want to do networking, relating it to operating systems, you do it through sockets. They're IPC across physical machines. The basics are: they need an address, either a local one, which would be a file name, or an IP address. There are two types, stream and datagram. With stream, everything's in order and it's a persistent connection, so you know if you lose the connection. Datagram just sends messages: you don't know if they'll arrive in the same order, you don't know if they get lost, you don't know anything. If you're the server, you need to create a socket, bind it to an address, listen on it, and then accept connections, and you can accept multiple. And if you're the client, you just create a socket, connect to an address, and that's it. So I will be here for the last five minutes for anything else, but just remember: pulling for you, we're all in this together.
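The server and client steps in that recap, plus the close-versus-unlink question from earlier, fit in one small C sketch. It assumes Linux and a Unix domain socket bound to a file path; the function names `demo_unix_socket` and `make_unix_addr`, the backlog of 8, and forking the client from the same program are illustrative choices, not the lecture's actual code. It's written as a function rather than a full program so you can call it from your own `main`.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/un.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: build a Unix domain socket address from a path. */
static int make_unix_addr(const char *path, struct sockaddr_un *addr) {
    if (strlen(path) >= sizeof(addr->sun_path))
        return -1;                              /* name too long */
    memset(addr, 0, sizeof(*addr));
    addr->sun_family = AF_UNIX;
    strcpy(addr->sun_path, path);
    return 0;
}

/* Hypothetical demo: a server and a forked client talk over `path`.
   Returns the number of bytes the server received (stored in buf as a
   C string), or -1 on error. */
int demo_unix_socket(const char *path, char *buf, size_t buflen) {
    struct sockaddr_un addr;
    if (buflen == 0 || make_unix_addr(path, &addr) == -1)
        return -1;
    unlink(path);  /* clear a stale name from an earlier run; ignore errors */

    /* Server steps 1-3: socket, bind, listen (backlog of 8 is arbitrary) */
    int server_fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (server_fd == -1)
        return -1;
    if (bind(server_fd, (struct sockaddr *)&addr, sizeof(addr)) == -1 ||
        listen(server_fd, 8) == -1) {
        close(server_fd);
        unlink(path);
        return -1;
    }

    pid_t pid = fork();
    if (pid == -1) {
        close(server_fd);
        unlink(path);
        return -1;
    }
    if (pid == 0) {
        /* Client: socket, then connect to the same name the server bound */
        close(server_fd);
        int fd = socket(AF_UNIX, SOCK_STREAM, 0);
        if (fd == -1 || connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1)
            _exit(1);
        const char msg[] = "hello";
        ssize_t sent = write(fd, msg, strlen(msg));
        close(fd);
        _exit(sent == (ssize_t)strlen(msg) ? 0 : 1);
    }

    /* Server step 4: accept blocks until the client connects */
    ssize_t total = -1;
    int conn_fd = accept(server_fd, NULL, NULL);
    if (conn_fd != -1) {
        total = 0;
        ssize_t n;
        /* Read until the client closes its end (read returns 0) */
        while ((size_t)total < buflen - 1 &&
               (n = read(conn_fd, buf + total, buflen - 1 - (size_t)total)) > 0)
            total += n;
        close(conn_fd);
    }
    buf[total > 0 ? total : 0] = '\0';

    waitpid(pid, NULL, 0);
    close(server_fd);  /* stop accepting connections on this socket */
    unlink(path);      /* close alone leaves the name behind; remove it */
    return (int)total;
}
```

Calling `demo_unix_socket("example.sock", buf, sizeof(buf))` should leave "hello" in `buf` and no example.sock in the directory, since the last step unlinks it. If you drop that final unlink, the name stays around after close, and the next bind to it fails with EADDRINUSE, which is why the sketch also unlinks once up front.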
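Of the send and recv flags, MSG_PEEK is the one you're likely to actually use. Here is a minimal sketch of what peeking means, assuming a POSIX system; to keep it self-contained it uses socketpair to get two already-connected Unix stream sockets instead of the full bind, listen, and accept setup, and `demo_peek` is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo: send "ping" on one end of a connected socket pair,
   then receive it twice on the other end, first with MSG_PEEK and then
   normally. Both buffers must hold at least len bytes. */
int demo_peek(char *peeked, char *read_back, size_t len) {
    int fds[2];
    if (len < 5)
        return -1;  /* need room for "ping" plus the terminator */
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) == -1)
        return -1;

    /* send() with flags == 0 behaves like write() */
    if (send(fds[0], "ping", 4, 0) != 4) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    /* MSG_PEEK copies the data out but leaves it queued in the socket */
    ssize_t a = recv(fds[1], peeked, len - 1, MSG_PEEK);
    /* A plain recv then returns the same bytes again and consumes them */
    ssize_t b = recv(fds[1], read_back, len - 1, 0);

    close(fds[0]);
    close(fds[1]);
    if (a < 0 || b < 0)
        return -1;
    peeked[a] = '\0';
    read_back[b] = '\0';
    return 0;
}
```

After a successful call both buffers hold "ping": the peek copied the bytes without removing them, and the ordinary recv afterwards consumed them.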