Well, welcome everybody to CS162. We're getting down to the very end here, and there's no class on Wednesday, just so you all know. I'd like to pick up where we left off. We were talking about extending operating systems out to the network as a whole, and in particular about the idea of distributed consensus. That's a situation in which you have several different nodes spread throughout the network. They all propose a value; some nodes might crash or stop responding, but eventually all the nodes decide on the same value from the set of proposed values. That's the general consensus problem. There's a simpler version, distributed decision making, where you choose between two options: true and false, or commit and abort. Essentially, the job of all the nodes participating in the protocol is to collaborate and eventually come to exactly the same decision.

Equally important to making the decision is making sure it's recorded for posterity, so that decisions can't be forgotten. The simplest approach, of course, is recording on disk, but in a global-scale system you could start talking about replicating much more widely, somewhat like a blockchain application.

The particular type of distributed decision making we spent some time on last time was two-phase commit. The key behind two-phase commit is a stable log on every participant that keeps track of whether a commit is going to happen or not. If machines crash in the middle of the protocol, when they wake up they can look at the log to see what they've committed to in the past. The two phases are these. The prepare phase is the first one, where a global coordinator requests that all participants make a decision to either commit or not. You ask each participant what it wants to do; it says either commit or abort, and it makes sure to record its decision in the log, as we mentioned, so that if it crashes in the middle it can come back up and will never arrive at a different decision than the one it committed to, so to speak. Then, during the commit phase, if everybody said commit, the coordinator tells everybody to go ahead and do the actual commit, at which point they all record that the final decision was commit and go forward. And of course, if any one participant decides to abort, then they all abort. The crucial idea here is atomic decision making: either everybody decides to commit or everybody decides to abort, and there's no mixing of the two, okay? That was the simplest example of this, and we talked about some of the downsides of two-phase commit, among them that a crashed machine can prevent everybody from moving forward. So then we started talking about alternatives after that.

Okay, let's see here. The log is a crucial part of that. If you go back and look at the slides walking through the protocol, you can see how the log makes sure we always have that atomicity property: everybody decides commit, or everybody decides abort.

The second topic, which we just started toward the end of the lecture, was network protocols, and we mentioned that there are many layers in the network protocols.
There's the physical level, where the ones and zeros could be optical phases or any number of things. We talked about the link level, which is packets being sent down a single link, with their formats and error control, for instance. We talked about network-level communication, where you put a bunch of links together to form a path. And we had just started on the transport level, which is for reliable message delivery; we're gonna spend a good chunk of today figuring that out as well.

So here's a rough diagram to keep in your mind. The physical and link layers down at the lower level can be any number of technologies, like Ethernet or Wi-Fi or LTE or 5G or whatever you like, and those get you one hop in the network. IP typically gets you more hops, okay? Once you've got the IP protocol, you could route from here to Beijing, for instance: as long as you knew the right IP address, things would be forwarded hop by hop through the network. Above that level is the transport layer, where we actually start doing better than just machine-to-machine communication; we can start talking about process-to-process communication. And then of course you build applications on top. RPC stands for remote procedure call; we'll show you that a little later in the lecture. A lot of things are built on top of remote procedure calls, so we'll talk more about that.

So this layering is building complex services from simpler ones, where each layer provides services needed by the higher layers that utilize them. This is something you've known for all the time you've been in computer science at Berkeley: layering can be a good thing.

The physical/link layer is typically very limited. It's one hop, and not only is it one hop but it's unreliable, and there's typically a maximum transfer unit: somewhere between 200 and 1500 bytes is very common. It's only high-performance networks inside of cloud datacenters that might have larger, so-called jumbo packets of 9,000 bytes or so; typically 1500 is the max you see. And routing is limited to the physical link, possibly through a switch.

What we're gonna try to figure out in the next bit of the lecture is: if we have these messages of limited size, how do we build something we can actually use? The physical reality is packets; the abstraction is one of messages, so that we can build our decision-making algorithms, so that we can build distributed storage, which we'll hopefully get to by the end of the lecture today. The physical reality is that packets are not only limited in size but unordered: sometimes packets arrive in a different order than you sent them. The abstraction we want is ordered delivery, because random ordering is not good for us. The physical reality is that packets are unreliable. Remember when we talked about the end-to-end philosophy, we said, gee, the network ought not to do things that the endpoints still have to do anyway. And so datagram networks, where packets are not guaranteed to make it to the destination, are the typical thing in the middle, because at the endpoints we have to have some reliability protocol anyway. We'll talk a little about that today. The physical reality is that packets go from one machine to another, which is only sometimes useful; it's much more useful to go process to process. The reality is that packets go over one link on the local network; we'd like to route them anywhere.
The reality is that packets are asynchronous: they kind of go when they can. We'd like them to be more synchronous, so that we know when something has completed. And then of course packets are insecure, and we'd like them to be secure. So the realities of the physical pieces on the left are things we'd like to hide under a virtual communication abstraction, giving us a much cleaner messaging abstraction.

Now, I showed you this last time, but I just want to pop it up really quickly. IPv4, for instance, basically has a header that's wrapped around the data. You put this on the front, and its 20 bytes have a bunch of fields, including the source and destination addresses: where am I coming from, and where am I going? And then a protocol field, which I've highlighted in red here, which says what type of IP packet this is; we'll show you a couple of those.

Now, a question about process-to-process: do I mean on different machines? Process to process on different machines is certainly something we'd like to achieve. Sometimes you use the IP protocol abstraction to go from process to process on the same machine, but what's much more interesting for this discussion is going from one machine to another, from one process on one machine to another process on another machine.

Now, doesn't the protocol field violate abstraction somehow? You might think of it that way, but it turns out it gives you enough information that when an IP packet comes in, you can de-multiplex it to the right protocol handler. So call it a minimal abstraction violation if you like, but it's kind of a minimal requirement for very rapidly processing incoming IP packets. The protocol can be TCP, it could be UDP, it could be ICMP, it could be any number of things. Today we're gonna talk about UDP and TCP; in particular, how do we build process-to-process communication on top of machine-to-machine communication?

Looking back at this header again, notice the 32-bit source and destination addresses. The source is where the packet is coming from, let's say that's in Beijing somewhere; the destination is where it's going, say my local machine. These name two machines; they don't say anything about which processes on those machines, like a web browser or a web server, would like to communicate, okay?

So the simplest thing we can do is called UDP, which is a type of process-to-process communication we get by taking this IP header, the one I showed you earlier, 20 bytes, and adding a UDP header, which basically has source and destination ports (these are 16-bit numbers), a length for the data, and a checksum. These two ports, the source and destination ports, are part of that five-tuple. Remember when I said you create a socket from machine to machine: it was source address, destination address, source port, destination port, and protocol. All five of these things together give you a unique connection between two processes.

UDP is very simple. It's another type of datagram, but one that goes from process to process. This is IP protocol 17, so we put a 17 in that header field. It's a datagram, so it's fully unreliable as we use it. It goes from source to destination, and it's really low overhead, because we just put a few extra bytes, eight bytes, on top of the IP header to get the UDP header, okay?
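As a rough picture of those eight bytes, here's a sketch of the UDP header layout from RFC 768 (an illustration, not an actual kernel definition):

```c
#include <stdint.h>

/* Sketch of the 8-byte UDP header (RFC 768). Together with the IP
 * source/destination addresses and the protocol number (17), the two
 * ports complete the five-tuple identifying a process-to-process flow. */
struct udp_header {
    uint16_t src_port;   /* sending process's port */
    uint16_t dst_port;   /* receiving process's port */
    uint16_t length;     /* header + data, in bytes */
    uint16_t checksum;   /* optional in IPv4; 0 if unused */
};
```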
It's often used for high-bandwidth video streams and so on, and it's a way to overuse network bandwidth if you're not careful, because there are no restrictions on how many packets you can try to force into the network. So a number of uses of UDP can be considered almost antisocial if you use them incorrectly, right? We'll see how TCP is different from that.

All right, now here's the layering we just talked about. In the gray at the bottom is the physical layer, the ones and zeros, and the data link layer above that is the link-to-link part. Going one hop uses the data link/physical combination to go from, say, a host to a router, or from the router to a destination host. But that's not gonna get us very far without being able to route. The data link and physical layers get us from host A to the router, or from the router to host B. It's the network layer on top, doing IP for instance, that decides how to go hop to hop to hop, using routing tables, to get from your source to your destination. Above that is the transport layer, which is, for instance, UDP or TCP, and then applications on top of that. And of course the applications are the ones that open the sockets.

The reason I've got these arrows the way I do here is that you think, when you're writing your application, that it's communicating directly with an application at the destination. In reality, what's going on is that your application sends something through a socket, and it really goes down through the different layers in host A, across the physical and data link layers to the router, which takes it up to its network layer and makes a decision about the next hop, and so on. Eventually you get to the destination host, and then it comes back up through the various layers in the operating system on that side, and eventually into the socket and the application. So these arrows represent communication at an abstract level; it's only the very lowest ones that represent direct connections.

And so the way we can look at this communication is: we've got an application with some data. It goes through the transport layer, where we wrap a transport header around it; that's the UDP ports, for instance. Then we wrap a network header around that, which adds the IP addresses and so on. Then we put a frame header on, which has the MAC addresses for, let's say, Ethernet, like I said. That goes down to the physical layer, some bits are transmitted, and on the other side things are unwrapped. It's like putting an envelope inside another envelope inside another envelope; it gets transmitted, and then you pull it out of the envelopes and eventually get back to the other application.

This wrapping is the layering we're using for abstraction. It can get expensive, and sometimes really high-performance routers will completely violate all of these layers: they'll squash everything out and process everything at once, in parallel, in an FPGA or whatever. But it's important to understand the process the way I've given it to you here, as putting a series of envelopes together and then taking the series of envelopes apart. The other thing I wanted to point out is that from network layer to network layer, this is machine to machine; it's really the transport layer that hands things off to the right process, as sketched below.
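Here's a toy sketch of that dispatch; the struct and handler names are hypothetical stand-ins, but the protocol numbers are the real ones:

```c
#include <stdint.h>

/* A toy view of an IP packet: only the protocol field matters here. */
struct ip_packet {
    uint8_t protocol;   /* 6 = TCP, 17 = UDP, 1 = ICMP, ... */
    /* ... other header fields and payload ... */
};

/* Hypothetical handlers, stand-ins for the kernel's real input paths. */
void tcp_input(const struct ip_packet *p);
void udp_input(const struct ip_packet *p);
void icmp_input(const struct ip_packet *p);
void ip_drop(const struct ip_packet *p);

/* The network layer reads the protocol field to pick a transport
 * handler; the transport handler then uses the destination port to
 * find the right socket, and hence the right process. */
void ip_deliver(const struct ip_packet *pkt) {
    switch (pkt->protocol) {
    case 6:  tcp_input(pkt);  break;
    case 17: udp_input(pkt);  break;
    case 1:  icmp_input(pkt); break;
    default: ip_drop(pkt);    break;
    }
}
```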
Okay, so that's where we de-multiplex based on port: once we've gotten through the network layer, the transport layer de-multiplexes so that eventually the right application gets the data. Questions?

Now, let's look at these transport protocols a little bit. Transport protocols are things we put on top of IP. We showed you UDP earlier; that's protocol 17, which really means you put a 17 in that red field I showed you. It's a no-frills extension of best-effort IP that makes it process to process rather than just machine to machine like IP is. TCP, which we'll talk about in more detail over the next number of slides, is more reliable, okay? It's got connection setup and teardown. You discard corrupted packets. You retransmit lost packets. There's flow control, so you never overflow anybody's buffers, and there's congestion control, so that if too many people are trying to use a link in the middle, everybody fairly backs off, and so on. So that's gonna be a slightly different animal from UDP. And furthermore, TCP is a stream, which I'll show you in a moment.

There are lots of other examples. There are eight bits in that protocol field, so there are many things other than UDP and TCP. There's, for instance, DCCP, which is another datagram protocol. There's RDP, the Reliable Data Protocol. There's SCTP, which is a pretty cool multi-stream version of TCP that isn't used all that much. So there are many different things you can put in that protocol field.

Just to flash back a month and a half or so: we were talking about this client-server example for a web server. If you remember, we talked through the various setup steps: the server gets a listen port, the client connects, a socket gets set up, and so on. Ultimately, once everything's set up, we're somehow able to write and read through the socket, and everything just works reliably as a stream. We're gonna talk about how that works.

Now, the question in the chat is roughly how many non-TCP, non-UDP protocols are actually used. They're used for a lot of things you might not normally encounter. For instance, if you have an encrypted VPN from point A to point B, one of those protocols is used for the encrypted packets. And there are related pieces: port 500 is actually UDP, so that's a UDP packet, but it's used to set things up, and then the encrypted traffic uses its own IP protocol after you're done. There are a number of other protocols that are used in ways that help manage things, so they sit around the outside of the typical connections you run into, but obviously TCP and UDP are extremely common. Another good example is streaming multimedia: some streaming multimedia connections also use other protocols.

Another question: data link versus datagram. Data link is the part of the protocol stack that gets you one hop; it's a layer in the networking stack. A datagram is just a packet that gets tossed through the network and might or might not make it all the way. So those are different things: data link is a layer, and a datagram is the thing we're sending, a packet.

So back to our sockets. Let's take a look at what's involved in this middle part here, actually communicating, and then we'll talk about setup and teardown.
The problem with getting reliable delivery is that all physical networks garble or drop packets; we said that already. The physical media have lots of problems: packets might not be transmitted or received; multiple people might try to talk at once, in which case there's an exponential backoff that has to happen. If you transmit close to the maximum rate, you might get more throughput but start losing packets, so sometimes there's a trade-off between throughput and absolute reliability. Also, if you're in a very low-power scenario, you might transmit at an extremely low voltage, right on the edge of a bunch of errors occurring, but put a heavy forward error correction code on it to make up for that. So there's a lot of playing around with the fact that these packets are unreliable. And remember, from the end-to-end principle again: if we put reliability, via retransmission, at the endpoints, things don't have to be perfect in the middle; in fact, we may not want them to be perfect. We just want them good enough that we can retransmit and get the data through eventually.

The other thing that's gonna be a big deal is congestion. If too many people try to go through too small a pipe in the middle of the network, the routers will have more input than they have output capacity, and they're gonna have to start dropping packets. That's kind of the IP idea. There are many causes I list here: insufficient queue space, a broadcast link with hosts transmitting at the same time, rate mismatches where you're sending faster than the destination can take it, and so on; all of these can cause congestion.

So we have to start with that reality, and we wanna build reliable message delivery on top of it. What are we gonna do? We're gonna need some way to make sure packets actually make it, so that every packet is received at least once, and every packet is received at most once. Because if we get duplication we're not aware of, or dropping we're not aware of, then all the applications relying on this are gonna start having problems, okay? Or they're gonna have to do all the work on their own. This reliability is a common enough need that we wanna provide it in a common facility like TCP, rather than having everybody roll their own. And we're gonna show how dealing with misordering in the network, dealing with dropped packets, and dealing with duplication are actually handled by similar mechanisms, so that'll be nice.

So TCP is really a stream, okay? The idea is: here's the alphabet, right? A, B, C, D. You stream the alphabet in, or your bytes in, on one side, and they show up on the other side. Every byte that goes in comes out the other side, and we don't see duplication. The other thing about it being a stream is that we're not packetizing it: you send bytes in and bytes come out. If you care about packets, it's up to you to make a packet protocol, where you say, well, every message in my connection is gonna start with a length, and the data is gonna come after that, and now I've got a packet, okay? But it's up to you, the user, to packetize on your own; one minimal sketch of that is below. Now, of course, underneath the covers it's all IP packets, but the TCP view is really that bytes go in and bytes come out, okay?
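As an example of that do-it-yourself packetizing, here's a minimal sketch in C, assuming an already-connected TCP socket fd; the 4-byte length prefix is just one common convention:

```c
#include <arpa/inet.h>   /* htonl */
#include <stdint.h>
#include <unistd.h>      /* write, size_t, ssize_t */

/* Write exactly n bytes, retrying on short writes (TCP is a stream,
 * so write() may accept fewer bytes than asked). */
static int write_all(int fd, const void *buf, size_t n) {
    const char *p = buf;
    while (n > 0) {
        ssize_t k = write(fd, p, n);
        if (k <= 0) return -1;
        p += k;
        n -= (size_t)k;
    }
    return 0;
}

/* One way to packetize on a stream: send a 4-byte length (in network
 * byte order) followed by the message bytes. The receiver reads the
 * length first, then reads exactly that many bytes. */
int send_message(int fd, const void *msg, uint32_t len) {
    uint32_t netlen = htonl(len);
    if (write_all(fd, &netlen, sizeof netlen) < 0) return -1;
    return write_all(fd, msg, len);
}
```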
And there may be many routers in the middle, and it just works, okay? Now, this is protocol 6 in that little red IP protocol field I showed you earlier. It's a reliable byte stream between two processes on different machines over the internet, okay? And we get read, write, flush, et cetera. That's exactly our web server / web client example from when we covered sockets: the sockets are the things that connect to either end of the TCP connection, and we're gonna talk about what's inside that process.

Some details, which we'll go into in a bit. Since the underlying system has a limited packet size and so on, it's up to TCP to take your large stream's worth of data and fragment it into lots of little pieces; sometimes, in the middle of the network, IP will fragment those into further pieces. So we need to make sure that after we've fragmented the data, we can reassemble it at the other side, and reassemble it in order. TCP uses a window-based acknowledgement protocol, and I'm gonna show you a lot more about that in a second, to minimize the state at the sender and receiver, to make sure the sender never sends more data than the receiver has space for, and to make sure the sender never sends so quickly that it clogs up the routers and prevents other people from using them, okay? This windowing is gonna be important both for reliability and for being a good citizen in the network. And obviously there's automatic retransmission of lost packets, okay?

So, without further ado: one of the problems is dropped packets. How do we deal with that? Again, we've said multiple times that all physical networks can garble or drop packets, and so IP can garble or drop packets as well. That means we've gotta build reliable messaging on top of that. How are we gonna do it? The thing we typically do is use acknowledgements, okay? The idea is you've got A communicating with B: A sends a packet to B, and then B sends an acknowledgement, an ack, back. What is the acknowledgement good for? Well, first of all, it says B got it: hi, I'm B, and I've got this packet, okay? And assuming we put a checksum on the packet, B can also detect garbled packets and just throw them out. In those instances you could imagine B sending back a NACK, a negative acknowledgement; in fact, what happens is B just treats a garbled packet as one that never arrived, and that causes the other mechanism to come into play. If A sends a packet to B which gets lost or garbled along the way, eventually there's a timeout at A, and then A sends the packet again, and eventually we get an ack, okay?

So, some questions about this. If the sender doesn't get an ack, does that mean the receiver didn't get the original message? What do you think? Right, so I see "no", I see "unknown", I see "who knows?". Good, this is very philosophical tonight. Just because A doesn't get an ack back doesn't mean A didn't successfully transmit something to B; for instance, the ack could have gotten lost on the way back. What that means is that once we do a timeout and retransmission, suddenly we've got duplication as an issue, okay? And what if the ack gets dropped, or the message just gets delayed? Same idea. So now, all of a sudden, we've got issues here; the sketch below shows the basic retransmission loop.
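Here's roughly what the sender's side of this scheme looks like, as a sketch; the packet and timer primitives are hypothetical stand-ins, not a real API:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical primitives for illustration. */
bool send_packet(const void *data, size_t len);   /* fire one packet into the network */
bool recv_ack_with_timeout(int timeout_ms);       /* true if an ack arrived in time */

/* Send one packet, wait for its ack, retransmit on timeout. Note that
 * a lost packet and a lost ack look identical from here, which is why
 * the receiver may see duplicates and will need sequence numbers. */
void send_reliably(const void *data, size_t len) {
    const int timeout_ms = 200;   /* must exceed the round-trip time */
    for (;;) {
        send_packet(data, len);
        if (recv_ack_with_timeout(timeout_ms))
            return;               /* acked: done */
        /* timeout: fall through and retransmit */
    }
}
```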
Now, I see somebody asking about Byzantine agreement. We're gonna assume here that the network is trying its best to act the way it's supposed to, so we're not gonna worry about malicious components in the middle, or about B being malicious. Let's just look at the underlying message transmission; the way we get Byzantine agreement is by building something on top of unreliable messages, but let's at least see whether we can get our messages to make it from A to B, all right?

What we've just talked about is what I'd call stop and wait: we send, we wait for an ack, and we repeat, okay? It's like: put it into the washer, turn it on, wash, repeat, over and over again, right? We call the time for the packet to get from sender to receiver and for the ack to get back the round-trip time. The round-trip time is basically twice the one-way transit time: if D is the one-way latency, the time from when the sender sent it to when the receiver got it, then two times D is gonna be our round-trip time, okay? And we keep doing that.

As you can imagine, the problem with this is that there are a lot of lost opportunities, because we have only one packet going at a time, okay? How fast can we send data? Well, we can actually use Little's Law, of all things: the bandwidth B times the round-trip time tells us something about the number of packets on the wire or waiting in the queue. But we've set this up so that we only have one outstanding at once, so the throughput is basically one packet per round-trip time. And that depends only on latency, not on network capacity. It doesn't matter; you could basically have two cans and a string here for all it matters, because we're not sending very fast. This doesn't have to be a gigabit link, okay? In fact, you can do the computation pretty simply: suppose the round-trip time is 100 milliseconds and the packet is 1500 bytes. You come up with about 120 kilobits per second, which is pretty slow, okay? So this stop and wait is clearly not what we wanna do: if you have a 100 megabits per second link, you're wasting almost all of it, close to a factor of 1,000. We've gotta get more packets going, okay?

And the other thing is, how do we know when to time out and retransmit, right? Here's a case where the sender sent something and the ack got lost somewhere along the way. Clearly the timeout needs to be at least as long as the round-trip time, because otherwise we'll resend before the ack could possibly get back, and that's not so good. So we're gonna need to estimate this timeout with some knowledge of the round-trip time. If the timeout is too short, you get a huge amount of duplication; if it's too long, then packet loss becomes really disruptive, because even if you just happen to lose one packet, you'll wait a huge amount of time before continuing, and your communication will really suffer, okay?

And then, how do we deal with duplication? Here's a situation where maybe the ack just got delayed, we went and retransmitted, but then the original ack comes in, we get another ack, and now we've got two copies at the receiver, okay? How do we deal with message duplication? Well, we put a sequence number in, okay? And the simplest is a one-bit sequence number, where the sequence number is either a zero or a one.
The idea is that the sender keeps a copy of the data in its local buffer until it sees an ack for that sequence number, okay? And furthermore, the receiver tracks packets, and by having exactly two options, a zero or a one, the receiver can figure out when there's been a retransmission, because it'll see two packet zeros in a row, and it knows to throw one out because it's a duplicate, okay? So once we start putting numbering, acknowledgement numbering or sequence numbering, onto the packets, we can start getting rid of duplication at the receiver, and we can figure out how long the sender needs to hold on to things for retransmission, okay? We're gonna call this the alternating-bit protocol.

The pro, of course, is that it's very simple: it's one bit. The con is that if the network can delay things arbitrarily, then a packet zero might get stuck in some router in the middle and get transmitted later, and you might not be able to disambiguate the duplication with only one bit. So clearly that's a problem, and furthermore, we're still doing one packet at a time in the network. So this doesn't look great.

So what should we do to up our bandwidth and deal with more unexpected delays in the network? Okay: don't wait, and send more packets, all right? I'll buy that, but that would seem to make the problem of disambiguating duplicates at the receiver worse. So what else do we need? Yep, we're gonna sort packets later. So what do we need in our sequence numbers? Yeah, we're gonna need more than a bit, right? Because one bit, distinguishing between packet zero and packet one and then repeating with packet zero, is clearly not enough. We need a bigger space, a larger space of sequence numbers. Okay, so that seems simple, right? Sequence numbers.

And now we've got pipelining possibilities, because we don't have to wait for each ack before we send more. Here's what we had before: sender sends, receiver receives. But now we have the potential to have many outstanding packets and many received packets, in a way that basically allows us to fill up the network. If you look during one round-trip time, what you see is many packets on their way to the receiver and many acks on their way back. As a result, we can actually fill up the network pipe and start getting our actual network bandwidth, rather than something that depends only on the round-trip time.

The acks also serve a dual purpose here. Assuming every outgoing packet has a unique sequence number on it, then clearly we can confirm that a particular packet made it, because its sequence number comes back in an ack. And we can deal with ordering: if we have packets zero, one, two, three, four, five, six, seven, whatever, and they arrive out of order, we can reorder them at the receiver side back into sequence-number order and deal with misordering, okay? So the sequence numbers, in addition to the reliability aspect, also help us with ordering, okay? This seems like a good direction.

Now, how much data is in flight? Well, if you take the round-trip time times whatever your actual bandwidth is, that gives you the window, the sending window, that makes sure you have enough data out in the network to fill up the pipe in both the forward and reverse directions, okay?
And B in this case is bytes per second. Remember, this is the thing we learned in chemistry in high school: you've gotta match up your units. The round-trip time is in seconds, B is in bytes per second, so the total is in bytes. W_send, then, is how many bytes I wanna have in the network at once to make sure nobody is waiting for packets, okay? For example, with a 100-millisecond round-trip time and a 100-megabit-per-second link (about 12.5 megabytes per second), that's roughly 1.25 megabytes in flight, or about 800 full-size 1500-byte packets. So W_send is like the sender's window size, and if we wanted to count packets in flight instead, we could take the sending window size divided by the packet size, and that tells us how many packets we need to have outstanding to fill everything up, okay?

So how long does a sender have to keep packets around? That's an interesting question, right? The answer is: until it knows that a particular packet has been acknowledged. So certainly we need enough buffer space at the sender for at least a round-trip time's worth of data, probably a bit more, to allow for losing some packets and doing some retransmission, okay?

Now, the other question: would a timeout result in starting over from the beginning? Well, what do you think? Do we need to resend every packet if we lose just one? Good; on the face of it, we'd wanna resend only the ones that haven't been acked. Because we've labeled every packet with a sequence number, in principle we can figure out which ones haven't been received and which ones need to be retransmitted, okay? So that's certainly plausible. Now, it depends on your protocol whether you have the ability to retransmit packets individually, or whether you have to go back and redo a certain range of them, or whatever; but at least in principle, we have enough information to resend only the things that were lost, okay?

Now, how long does the receiver have to keep packet data? The data at the receiver certainly has to be there long enough to do reordering: if we get a bunch of the later packets, we need enough space to absorb them while we wait for the early ones, so we can then deliver everything in order to the actual application at the receiving side. We also need to store data until the application's ready; perhaps it's busy doing something else and hasn't executed a read against the socket yet. So we need to hold on to data at the receiver as well. And then you have to worry about the following: what if the sender is blasting packets at the receiver, and the receiver is just too slow, so that a bunch of the data that was sent actually made it to the receiver only to be thrown out there? That seems like a bad idea, right? So there's a bunch of interesting questions.

Okay, let me do a little administrative stuff. Just remember, we've got a midterm: not this Thursday, because folks are hopefully going to be overindulging in food, but a week from this Thursday is midterm three, okay? Camera and Zoom screen sharing, just like in midterm two; we'll mail out all your links, and a review session link will come out in the next day or so. It covers everything up to lecture 25, so that's this lecture and next Monday's lecture; we have no lecture this Wednesday, okay? And lecture 26 will be a fun lecture.
So if there are any topics in particular you wanna cover, let me know. I don't think I have too much more to say on this. There's a question about whether this is closer to a final or closer to midterm two. As I think I've said before, every midterm is in principle cumulative, in the sense that you need to not have forgotten everything you've learned, but we will certainly focus on material from the last third of the class. We may still ask questions that require you to not have forgotten things from earlier in the term.

Now, I'm not gonna go into this in great detail, but please be careful with collaboration. I realize we're getting to the end of the term, but remember: explaining things to someone at a high level and discussing things at a high level is fine; sitting down and going through everything line by line is not. If there's a lot of individual syntax transferring on homeworks and between project groups, it's probably too much sharing, okay? So just be careful, all right? And don't get friends into trouble by asking them for their code over and over again, because you'll put them in the bad position of having violated our policy too. So try not to do that. Okay, I've talked about this in the last couple of lectures, so I don't wanna go into it in more detail.

So let's keep going. I think this idea of having a big sequence-number space and sending a bunch of messages into the network to get pipelining sounds good. But remember, when we set up queues or pipes between processes on a local machine, we had a queue in the middle, and we had blocking because the queue had fixed capacity: if you wrote and the queue was full, the writer would get put to sleep, and if you went to read and there was nothing in the queue, the reader would get put to sleep. We'd like to have something similar to what we had with pipes, but across the network, using TCP. The question is how we go about that.

Okay, so, buffering in a TCP connection: we have process A and process B; there's a send queue on A's side and a receive queue on B's side for that particular stream, and then there's also a pair going the other direction. Remember, sockets are bidirectional, so when we set them up we have queues on both sides, and we wanna make sure there's proper blocking so that no data gets overwritten or otherwise lost. A single TCP connection thus needs four in-memory queues, as we just said, and a host's window size for the connection is how much remaining space it has in its receive queue. So, for instance, if this receive queue has a hundred bytes left in it, the other host is really only allowed to send another hundred bytes until things start getting absorbed, because we never want the sender to overrun the receive queue. And furthermore, just acknowledging that bytes have been received is not enough, because the receive queue could still be full because host B hasn't pulled things out. What we really need is some way for the receiver on either side to tell the sender how much space it actually has left in its queue, and to make sure the sender never sends more than that; that'll prevent us from overrunning the destination. And so a host advertises its receive-queue window size in every packet going the other direction.
It keeps saying: here's how much I have in my queue now; here's how much I have in my queue now. As a result, we can do this buffer management so that we never overflow a host or lose data. So the idea is we're gonna build a sliding window protocol: the TCP sender knows the receiver's window size and tries never to exceed it. Packets it previously sent may arrive, filling the window up, but we wanna make sure there's never more in transit than there is buffer space at the destination; you're allowed to keep sending data as long as there's enough space guaranteed at the destination. I'm gonna show you how that works in a second.

For this example, I'm gonna talk about packets of space at the receiver, even though normally it's bytes. The window size to fill is, let's say, a bandwidth in packets per second times a round-trip time; that tells us how much we want in flight at once, which is a form of Little's Law again. So for instance, here's a case where we have an unacked packet, call it packet one, that got sent, then another packet. Now the send window says that one and two are outstanding; here it says that one, two, and three are outstanding, and we're gonna assume we're not allowed more than three packets at the destination. Eventually what happens is that because packet one came in in order, it's been received and potentially sent up to the application; at that point the receiver says, well, I actually now have space for another one, at which point we send another one, and so on. Here, the receive queue is basically never holding on to anything; it sends everything right up, so each of these acks is basically saying: I still have three slots available, I still have three available, I still have three available. But you can imagine that if the receiver were holding on to data because the application wasn't absorbing it, then this queue would start filling up.

Now, what if you never get an ack from the receiver? What happens in that case, if we go back to this point, is that the sender stops sending, because it only knows it's got three packets' worth of space at the receiver, and it stops. If no acks come back, then at that point I resend from the earliest packet that's missing: I start resending one, and then two and three, over and over, waiting to finally get an ack back. And once I've got an ack, I can go forward. So the short answer to "what happens if you never get an ack" is: you go up to the point at which the receiver is guaranteed to have enough buffer space, and then stop.

A timeout doesn't fully reset things, except that if I time out at this point, I'm gonna keep resending the stuff that's in my send buffer, and when it gets to the receiver, the receiver knows there's space for it, because it occupies the first slots in the receiver's buffer queue. I'm never gonna get past sending packets one, two, or three until I get one of them actually acked, and then I can send packet four. So it isn't a full reset on timeout; it really is: oh, some of the stuff you thought you sent must not have gotten there, because there was a timeout, so resend it. Okay, so the difference between timeout and acknowledgement is that a timeout means resend, and an acknowledgement means move forward. And notice how the window here is advancing.
So once I've got that first ack, now two, three, and four are in my sending, sliding window. At the receiving side, these packets have come in, but I'm forwarding things up to the application as quickly as I can, so we're never building up any buffer space at the destination. I'll show you in a moment what happens if you do build up at the destination.

Now, here we go: TCP windows are in bytes, not packets. Okay, so remember, TCP is a stream, so there's a continuous stream of bytes going in. The space of sequence numbers in TCP is not a packet count, it's a byte count: we start at an arbitrary initial sequence number, and each sequence number represents another byte in the stream, okay? So we have the set of sequence numbers representing bytes that have been sent and already acknowledged, the set of bytes that have been sent but not acknowledged, and the set of bytes that haven't been sent yet; and this is a continuous stream from the initial sequence number, incrementing by one each time. At the receiver, we have the same space of sequence numbers: on this side are sequence numbers that have been received and potentially given to the application, here are ones that have been received and are being buffered, and these are ones that have not been received yet. This buffer here in the middle is the thing we wanna make sure we never overflow, okay? And I'm gonna show you how that works in a moment. Questions?

So we're not acking packets, we're acking bytes, and that means we can ack a whole group of bytes at once by giving the sequence number of the end of the group. Let me show you; this is where packets come back into play. Here's an example of the receiver's receive queue. This is an acknowledgement that came back from the receiver to the sender, okay? What it's saying is: sequence number 100 is the next sequence number I'm expecting, and there are 300 bytes worth of space in my queue, okay? Now we send a packet in TCP that says: here's sequence number 100, and it's got 40 bytes in it. That means that after this packet is received, the receiver acknowledges 140 as its sequence number, because it's received 40 new bytes beyond what it had before. And furthermore, notice that it now says the buffer only has 260 bytes free, no longer 300. As we go again, you'll notice the number of free bytes keeps going down. What that tells the sender is that the buffer on the receiver side is filling up, and the sender will never put more into the network than it knows is available. So at this point, with the ack at sequence number 190 and 210 bytes advertised, it knows it can send another 210 bytes above 190 and be okay, okay?

Now, here's an example where something happened to a packet, the one that was sent covering sequence numbers 190 through 230; it got lost somehow. But we sent another one, sequence number 230 with size 30, and we got back an acknowledgement which might not be what you expected. If you look, the acknowledgement says: well, the latest most-sequential thing I've received is up to sequence number 190, okay? And there are 210 bytes available after that.
So this particular base TCP protocol acknowledges the sequence number representing the solid run of bytes received up to that point, and ignores holes and other things that might have been received beyond it, okay? This is useful, if you think about it: yes, the receiver got back some data beyond the hole, but it doesn't necessarily make sense to acknowledge it fully yet, because it's not useful to anybody in a streaming protocol.

Now, let's look a little further. You can see this continues for a while, and we haven't changed anything about our acknowledgements; the reason is that we're missing the bytes between 190 and 230. Eventually there's a timeout, and we retransmit the missing data. And if you notice what happened there, we fully filled in the hole, because the buffer at the receiver is doing the right thing, and the acknowledgement that comes back now says: oh, I've received everything up to sequence number 340, and by the way, I only have 60 bytes left. So then we can finish this up, et cetera. At some point, when we start feeding these bytes up to the application, because it did a read of 40 or 30 or whatever, these acknowledgements start coming back saying: here, there's more space in my buffer. So if you ever wondered why, when you set up a TCP channel and start sending data, and the application on the other side freezes and isn't absorbing the data, the TCP channel literally stalls: it's because the sender knows there's no buffer space at the receiver. At that point we've stopped, because we've filled up all the buffer space, the application at the receiver side isn't absorbing any, and so the sender stops.

And the way this worked out for us is that all the information we need is in this queue size at a given sequence number. That allows us to put as many bytes into the network as we want, in a way that never violates the rule that all the bytes in flight must fit in the buffer space at the receiver; we have enough information to never violate that, as the little sketch below tries to capture. The only other thing, then, is to only send enough data into the network to meet the round-trip-time-times-bandwidth requirement, where the bandwidth is that of the slowest link in the middle, and no more, because otherwise we'll start causing congestion.

So, a question: during the time when the 190 packet is missing (let's go back here), what if the sender sends too many packets and causes the receiver's buffer to be full? The thing is, it's not gonna send too much: it's only gonna send up to 210 bytes past 190, because it knows that's the space that's free. That means it knows that past 340 it doesn't have more than 60 bytes available, so it won't send anything past what would fit, and it's up to the receiver to reorder based on sequence numbers to put things back into the buffer correctly. Now, what if you go beyond 400 before the retransmit? Again, that's not gonna happen, because we never get the go-ahead to transmit beyond 400 until buffer space opens up: when we get to this point, we will never have sent beyond 400, because we know that would take the free space below zero, so it'll never happen. It's only when this opens up again, after these bytes have been absorbed by the client, that we can start sending again. Good.
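Here's that invariant as a little sketch (the field names are made up for illustration, and real TCP keeps quite a bit more state, including sequence-number wraparound handling):

```c
#include <stdint.h>

/* Sketch of the sender-side invariant from the example above:
 * bytes in flight may never exceed the receiver's advertised window.
 * With ack = 190 and window = 210, the sender may send sequence
 * numbers up to (but not past) 190 + 210 = 400. */
struct tcp_send_state {
    uint32_t last_ack;     /* highest cumulative ack received (e.g. 190) */
    uint32_t adv_window;   /* receiver's advertised free space (e.g. 210) */
    uint32_t next_seq;     /* next sequence number we would send */
};

/* How many new bytes are we currently allowed to put on the wire?
 * (Ignoring wraparound for simplicity.) */
uint32_t usable_window(const struct tcp_send_state *s) {
    uint32_t limit = s->last_ack + s->adv_window;   /* e.g. 400 */
    return (s->next_seq < limit) ? (limit - s->next_seq) : 0;
}
```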
So congestion is an issue. Congestion happens when we have too much data flowing through the network: if you look, all of these different flows are using shared links, and the IP solution is to drop packets. The question might be: what happens to a TCP connection then? Well, if lots of packets get dropped, as you just saw, you end up with lots of retransmissions.

By the way, back on that earlier example, I want you to notice that the sender knows where the missing data was, because it knows the ack was stuck at sequence number 190. And the moment it resends that missing data, the acknowledgement jumps all the way up to the next missing point, so the sender is not going to retransmit the remaining stuff; it picks up where it left off, and we don't get duplication there, okay? There are also protocol extensions that let you report more than one hole at a time, but we won't go into that now.

So with congestion, we need to limit congestion, okay? Why do we get congestion? There are shared links in the middle, and too much data going into a shared link, so the router at one of those shared points starts dropping packets. What we really want is to back off so that we don't send too much data: so that everybody sending together doesn't exceed the rate of the router's outgoing links, okay? That's a congestion avoidance property.

So we need to figure this out: for instance, how long should a timeout be before re-sending messages? Clearly, if it's too long, we waste time when a message is lost; if it's too short, we retransmit even though the ack would have arrived shortly. So we clearly need to be tracking the round-trip time. But there's a bit of a stability problem here: if there's more congestion, then acks are delayed, so you start getting timeouts, which send more traffic, which causes even more congestion, and you get a positive feedback loop that causes everything to break down, okay? So you've gotta be very careful in choosing the sender's window size, not the receiver's but the sender's, that is, how much data the sender allows to be outstanding, so as to avoid congestion and this positive feedback loop. Obviously, the amount of data the sender has outstanding has gotta be no more than the receiver's window so that we don't flood the receiver, but it's probably gonna be less, because we're trying to match the amount of data in the network to the round-trip time times the bandwidth of the slowest link in the middle. We're gonna try to match the rate of sending packets to the rate of the slowest link.

There are adaptive algorithms that adjust the sender's window size, and a lot of interesting algorithms have been developed over the years to deal with this. I have one in the reading for tonight: the Van Jacobson paper starts talking about this, if you're interested. But the basic technique is: start small, slowly increase the window until acknowledgements start going missing, and once the window's too big, I know I'm sending too fast, so I back off. That's the basic way these adaptive algorithms try to get enough data into the network to make maximal use of the slowest link without causing congestion.
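As a cartoon of that adjust-until-loss idea, one possible additive-increase/multiplicative-decrease rule looks like the sketch below; real TCP congestion control (e.g., what the Van Jacobson paper develops) is considerably more involved:

```c
/* Toy congestion-window adjustment in the spirit just described:
 * grow the window (in packets) while acks come back cleanly, and
 * halve it when loss is detected. Real TCP distinguishes slow-start
 * from congestion-avoidance phases and does far more bookkeeping. */
double adjust_cwnd(double cwnd, int saw_loss) {
    if (saw_loss)
        return cwnd / 2.0;   /* back off: multiplicative decrease */
    return cwnd + 1.0;       /* probe for more bandwidth: additive
                                increase, one packet per round trip */
}
```

Two competing senders running a rule like this tend to converge toward an even share of the bottleneck link, which is the fairness behavior described above.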
Okay, this is called slow start: you start sending slowly, and typically, when you start seeing acks go missing, you cut your window in half and then work your way back up. So typically you get this sawtooth behavior as the algorithm adapts and figures out the right amount of data to have in the network. The cool thing about these kinds of adaptive algorithms is that if a new sender comes along, all of a sudden acks start getting lost, both senders start losing packets, and both back off until they hit a situation where they're equally sharing the link in the middle. That's the way these congestion avoidance algorithms work. And if you actually measure what TCP does, you see this typical sawtooth behavior around the right bandwidth for that middle link.

The question here is: aren't acks more likely to be timed out with smaller windows? I'm not sure I fully understand; the acks are coming back in the other direction, and they're basically reflective. What's happening is that when you see the same ack come back over and over again, you know that the data you sent out got lost. That's the notification that the forward packets have been lost, and that's the point at which you decide to back off the amount of data you have in the network. Okay.

Now, recall the setup: remember, you request the connection, the server socket is listening, it takes the connection, it constructs a new five-tuple-style connection between two sockets, and then it lets you go. Remember, the five-tuple is source IP address, destination IP address, source port, destination port, and protocol, like TCP. That setup is really setting up a TCP channel. So what does that mean? To establish the connection, we open it with a three-way handshake. Then we do what we've just been talking about, transmitting data back and forth, and then we tear everything down when we're done, okay?

So here we're back to the client-server picture, but now let's look at the setup part. It's really a three-way handshake. The server is calling listen over here; the client calls connect, which sends a request over. It looks like this: a SYN, a synchronize bit, is set in the header, and it proposes a sequence number for communication from client to server. The server accepts the connection and sends back an acknowledgement of that forward SYN, plus a new SYN for the other direction with its proposed sequence number. And then, finally, there's an ack coming back: this last ack is acking the server's SYN, for the server-to-client direction. So it's three messages, and when you're done, you've both agreed on a starting sequence number in the forward direction and a starting sequence number in the reverse direction, and you've both agreed that this connection is going forward. Okay, great.

The other thing is just to show you the shutdown, which is actually a four-message exchange. When host one is done, it sends a FIN bit in the header. The other host acks that FIN, and along with any remaining data it eventually sends its own FIN, and then there's a final ack of that FIN going the other direction. So there are actually four control messages to shut everything down, okay? And then, eventually, after a timeout, everything's deallocated.
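Tying the handshake back to the socket API from earlier lectures, here's a minimal client-side sketch; the three-way handshake happens inside connect() before it returns (the address and port are just example values, and error handling is mostly omitted):

```c
#include <arpa/inet.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);   /* SOCK_STREAM = TCP */

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(80);                            /* example port */
    inet_pton(AF_INET, "93.184.216.34", &addr.sin_addr);  /* example IP */

    /* The three-way handshake (SYN, SYN+ACK, ACK) happens inside
     * connect(); by the time it returns 0, both sides have agreed on
     * initial sequence numbers for each direction. */
    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        perror("connect");
        return 1;
    }

    /* ... read()/write() on fd: the reliable byte stream ... */

    close(fd);   /* kicks off the FIN/ACK teardown described above */
    return 0;
}
```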
And I'm not gonna go any further on this, but just like regular files, if you have multiple file descriptors open on a socket, the socket only really shuts down when all of them close. Okay.

So, how do we actually program a distributed application? We need to synchronize multiple threads on different machines. If you remember, last time I was talking about messages, and now we've got this idea of how to build a reliable stream in both directions. So the question is: what next? Suppose we wanna build an application on top of this. One of the things that comes up is data representation. An object in memory on one side has a very machine-specific binary representation that may mean nothing on the other side. So if you're trying to send data from host A to host B and you want it to be understood on host B, what are you gonna do? You're gonna have to agree on some standardized way of communicating with each other, okay?

In the absence of shared memory, externalizing an object requires us to take the object, and here think of a linked list for a minute, right? It's a bunch of objects all linked together with addresses and all that sort of stuff. We need to serialize that into bytes so it can be sent over the link, okay? This serializing into bytes, and then marshaling the object into a message and sending it off, is what you do at the sender side. On the other side, you unmarshal, that is, take the message apart, and deserialize it back into a local representation. And it's possible that the two hosts have different representations: one might be big endian and the other little endian; I'll remind you what that means in a moment. So this serializing and marshaling process has to be done in a way that allows the two hosts to communicate no matter what their representations are, okay?

Simple data types first; let me show you this, for instance. Suppose you've got a 32-bit integer and I want to write it to a file; let's back off from sockets for a second. You open the file, okay, that's all fine and dandy. Then you have a couple of choices: one, you could print it out as an integer in ASCII text; the other, you could write it as binary, with four bytes. Those two things look very different in the file, and the application that reads it back in needs to know which it is, otherwise it's not going to be able to interpret it. Neither of these is wrong, but the sender and receiver need to be consistent, okay?

This gets even more tricky when you're going across the network, because if I'm trying to send a four-byte number, 32 bits, across the network, how do we know the recipient interprets those bytes the same way? For instance, if you remember from 61C, they talked about endianness: several of these machine types are big endian, and a number are little endian. The question is, how do we match those up when we're trying to communicate? Here's a good example on a little endian machine: we take the integer 0x12345678 and scan through the in-memory representation, and what you see is that the first byte of that in-memory representation is 0x78. The least significant byte of the integer is in the first byte in memory, so this is clearly a little endian machine.
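Here's a minimal version of that check, a sketch of the little experiment just described:

```c
#include <stdint.h>
#include <stdio.h>

int main(void) {
    /* Store a known 32-bit pattern, then look at it byte by byte. */
    uint32_t x = 0x12345678;
    unsigned char *p = (unsigned char *)&x;

    for (int i = 0; i < 4; i++)
        printf("byte %d: %02x\n", i, p[i]);

    /* On a little endian machine the first byte in memory is 78 (the
     * least significant byte); on a big endian machine it is 12. */
    printf("this machine is %s endian\n", p[0] == 0x78 ? "little" : "big");
    return 0;
}
```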
Okay. And you can write this endianness routine on your own and try running it and see what you get. So what endianness does the internet use? Well, the internet has chosen big-endian as the standardized network byte order. And so typically what happens when you're sending something across the internet is you put things in network byte order, and then the other side unpacks them from network byte order into its local host order. So you have to decide on a wire endianness. We just decided, for instance, that if we're talking across the network, it's typically big-endian. And then we convert from the native endianness to the on-wire format on the source side of the communication, and we unpack from the on-wire endianness to the local format on the other side. Now, a downside of this, perhaps, is the fact that if you take two little-endian machines and they communicate over the network, they're both gonna convert to and from big-endian to make that communication happen. So the question is, is there a rationale for big-endian versus little-endian on the network, or do you mean in different processors? If you're asking why the network byte order is big-endian, I think the good thing about big-endian is that if you were to take a hex dump of some memory and look at a big-endian number, you can just read it directly out. So big-endian kind of has that nice property that it requires a little bit less brain gymnastics to read through a memory dump. That would be my only explanation of why it was preferred, I don't know. I guess at this point, it's all about standards, and so I could just say, well, it is what it is and we've got to stick with it. But I think probably people like big-endian because you can read it directly out. Now, I grew up with little-endian processors and assembly language when I was younger, and so I'm not as thrown for a loop when I see little-endian numbers, because I rescramble them in my brain and it mostly works okay. But anyway, I think that's the reason people like big-endian: you don't have to rescramble. What about richer objects like lists and so on, what do you do? Well, if you wanna transfer a linked list of structures from point A to point B, you've gotta come up with some standard for serializing it so that it can be packed and unpacked. And there are lots of serialization formats: there's JSON and XML, you name it. In fact, if you were to Google data serialization, you'd find a whole bunch of different types of serialization. So there are many languages and there are many serialization formats, and of course this is once again an issue of standardization. You have to make sure that when you're using a serialization mechanism from point A to point B, both sides actually do the right thing to make that serialization work. Okay.
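As a small illustration of the pack-then-unpack step, here's one way it might look in C for a trivial structure; the struct and its 8-byte wire format are invented for the example, but htonl and ntohl are the standard byte-order helpers:

```c
/* Serialize a simple structure into network byte order by hand. The
 * wire format here is just two big-endian 32-bit fields back to back;
 * either host can unpack it regardless of its native endianness. */
#include <stdio.h>
#include <string.h>
#include <stdint.h>
#include <arpa/inet.h>   /* htonl / ntohl */

struct point { uint32_t x, y; };

/* sender side: serialize into the big-endian wire format */
void serialize_point(const struct point *p, unsigned char buf[8]) {
    uint32_t x = htonl(p->x), y = htonl(p->y);
    memcpy(buf,     &x, 4);
    memcpy(buf + 4, &y, 4);
}

/* receiver side: deserialize back into the local representation */
void deserialize_point(const unsigned char buf[8], struct point *p) {
    uint32_t x, y;
    memcpy(&x, buf,     4);
    memcpy(&y, buf + 4, 4);
    p->x = ntohl(x);
    p->y = ntohl(y);
}

int main(void) {
    struct point in = { 3, 7 }, out;
    unsigned char buf[8];
    serialize_point(&in, buf);    /* buf is now endian-independent */
    deserialize_point(buf, &out);
    printf("(%u, %u)\n", out.x, out.y);
    return 0;
}
```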
Now, raw messaging, where you just send a message from one side to the other and then you build something out of it, is pretty low level for programming. You have to do a whole bunch of stuff on your own, and you also have to deal with machine representation by hand, handling all the things we just talked about. The alternative is the remote procedure call idea, where you call a procedure on a remote machine, and the idea is to make communication look like an ordinary function call. And you're gonna automate all the complexity of translating between representations. Okay. And so for instance, the client might call the remote file system with read("rutabaga"), and at the remote side, the server reads the file rutabaga and sends the results back. And as far as the client and even the server are concerned, they're just executing a function call and getting a return. Okay. So that's called a remote procedure call. And the concept here is pretty simple. So here's a client. It wants to execute this function of two arguments, which it turns out is gonna be on a remote machine. What's gonna happen is, to call it, it's gonna go through what's called a stub, which is gonna marshal these arguments, v1 and v2, put them into a standardized serialization format of some sort, and send the message to the receiver. The receiver stub is gonna unpack it and call the function on the server side. The server is gonna give a return value, and we're gonna go back the other direction and return at the client. And if you notice, these stubs are really just things that are linked into the client and the server like regular library function calls. And they have this nice property that when you link function F with the client stub, what really happens when you call F is that it ends up sending and receiving messages. Okay. And the server, when it links with the server stub, really ends up offering its functions to be called by remote clients. However, when you write the code inside the server, you're just writing normal functions. Okay. And so this is basically the idea of remote procedure call. As far as the client's concerned, they're making a procedure call, but it's happening remotely. Okay. We can talk about the client's stub interacting with handlers that send messages across the network on multiple machines, and this is really a machine-machine boundary. And really there's also an application-application boundary, so we're gonna wrap some ports in here as well. Now, can you use RPC for inter-process communication on the same machine? Absolutely. Okay. And what's kind of cool about this, that's a good question, is that you could start out with the server on the same machine, and then if the machine got overloaded, you could migrate the server to another machine, and as long as you clean up the packet handling stuff so that the packets are now directed at that remote machine instead of the local one, you don't even have to change the code. All you see is a change in performance. Okay. Now, the way this implementation works in general is request-response message passing under the covers. The stubs on both sides are providing glue on the client and server side to tie functions into the network. So the client stub is marshaling the arguments and unmarshaling the return values, where marshaling is putting things into a packet and unmarshaling is taking them back out. The stubs are also responsible for the data representation serialization we talked about. The server stub does the opposite of the client stub. Okay, so marshaling involves converting values to canonical forms, serializing the objects, copying arguments that are passed by reference, et cetera. And so some details here. There's an equivalence really between the parameters of the function call and the request message, and between the result and the reply message. The name of the procedure is typically passed in the request message and is used by the receiver stub to decide which function gets called. There are mailboxes on either side, so you need to know both the IP address and the port on each side in order to make this connection.
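To give a feel for what a stub actually does, here's a hand-written sketch for a hypothetical remote function int add(int v1, int v2). Real stubs are generated from an IDL and run over sockets with error handling and timeouts; here the "transport" is a loopback buffer so the example is self-contained, and everything about the wire layout is an assumption for illustration:

```c
/* Sketch of the stub idea: the client stub marshals arguments into a
 * request message, the server stub unmarshals and dispatches to a
 * normal function, and the reply travels back the other way. */
#include <stdio.h>
#include <stdint.h>
#include <arpa/inet.h>

static uint32_t wire[3];   /* fake transport: one in-flight request */

/* ---- server side: an ordinary function plus its dispatch stub ---- */
static int add_impl(int v1, int v2) { return v1 + v2; }

static uint32_t server_dispatch(void) {
    if (ntohl(wire[0]) == 1) {            /* procedure number 1 == add */
        int v1 = (int)ntohl(wire[1]);     /* unmarshal the arguments */
        int v2 = (int)ntohl(wire[2]);
        return htonl((uint32_t)add_impl(v1, v2));
    }
    return htonl(0);
}

/* ---- client side: the stub makes this look like a local call ---- */
int add(int v1, int v2) {
    wire[0] = htonl(1);              /* which procedure to call */
    wire[1] = htonl((uint32_t)v1);   /* marshal args in network order */
    wire[2] = htonl((uint32_t)v2);
    uint32_t reply = server_dispatch();  /* "send" and "receive" */
    return (int)ntohl(reply);        /* unmarshal the return value */
}

int main(void) {
    printf("add(3, 4) = %d\n", add(3, 4));  /* looks like a local call */
    return 0;
}
```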
The interesting part about this is there's a stub generator, which is really a compiler that generates stubs. So what you typically do is you define your RPC with an interface definition language, or IDL, which contains, among other things, the types of the arguments, the return values, et cetera. The output is gonna be stubs in the appropriate source language. And when you design your interface by writing it in the IDL and then run it through the compiler, you now have code that you can link in at both the client and the server side, and now you're able to do RPC. Okay. So the way we deal with cross-platform issues is exactly what we just talked about: we're gonna convert everything to and from a canonical form. And this is where your particular type of RPC, and there are many types of RPC out there, will define as part of it what the canonical form is and how things are serialized. Okay, so that's part of the RPC package. So how does a client know what it's connecting to? Typically, just like with regular DNS and IP, you're translating the name of the remote service into a network endpoint: remote machine, port, maybe some other information. And binding is the process of converting a user-visible name for that service, like a file server or something else, into a network endpoint like an IP address and port, and then connecting it all up. And once you do that, the client can just be making procedure calls and they're going to the remote machine. Okay, and this is another word for naming, and you could either compile in the destination machine or you could have a dynamic check at runtime. And the question is, when are the stubs initialized? So the stubs get linked into the program, and they get initialized before you actually start executing code that has the RPC in it. So there is this initialization process where you call into the RPC library to do the initialization stuff, and once it's connected, then you can make your calls. So dynamic binding is good, and most RPCs use dynamic binding via some name service, just like if you were interested in, you know, www.berkeley.edu, you'd go to a DNS service to find the current IP address. Most RPC systems have a dynamic binding service where you say what service you're interested in, say a file service of a certain name, and it will figure that out for you through a binding process and decide what the actual IP address is, what the port is, and so on. Why do we do this? One, we can do access control: we can basically not even give back the names of machines if people don't have access. The other is failover: if the server fails, we can basically fail over to another one just by changing the binding. And if there are multiple servers, you have flexibility in binding time. So I mentioned last time, or the time before, that Google does this a lot. When you do a Google search from Northern California versus, I don't know, Boston, you're gonna get different servers for Google. In fact, at different times of day you might get different server names, or the same server name with different IP addresses, from the Google resolution. And what they're doing is balancing load that way. And so that's another reason a dynamic RPC binding service is good. Okay, I think that's all I wanted to say there.
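Before we move on, here's a rough sketch of the shape that binding step might take; the tiny name table stands in for a real binding service (ONC RPC's portmapper plays this role in practice), and the service names and endpoints below are invented for the example:

```c
/* Sketch of dynamic binding: translate a user-visible service name
 * into a network endpoint before any calls are made. A real binding
 * service would be queried over the network and could do access
 * control, failover, and load balancing just by changing the answer. */
#include <stdio.h>
#include <string.h>
#include <stdint.h>

struct endpoint { const char *ip; uint16_t port; };
struct binding  { const char *service; struct endpoint ep; };

static const struct binding name_table[] = {
    { "file-server",  { "10.0.0.5", 2049 } },
    { "print-server", { "10.0.0.9", 6310 } },
};

/* ask the "name service" where a service currently lives */
int rpc_bind(const char *service, struct endpoint *ep) {
    for (size_t i = 0; i < sizeof name_table / sizeof name_table[0]; i++)
        if (strcmp(name_table[i].service, service) == 0) {
            *ep = name_table[i].ep;
            return 0;
        }
    return -1;  /* unknown service, or caller lacks access */
}

int main(void) {
    struct endpoint ep;
    if (rpc_bind("file-server", &ep) == 0)
        printf("file-server is at %s:%u\n", ep.ip, (unsigned)ep.port);
    /* failover or load balancing is just a change in the table above */
    return 0;
}
```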
So what are some problems with this idea? It seems really cool, but there are different failure modes in a distributed system than on a single machine. So think about the number of different failures: maybe a user-level bug causes an address space to crash on the other side, or a machine failure or kernel bug causes all processes on the same machine to fail, or some machine is compromised by a malicious party. In the old days before RPC, a crash was a crash; pretty much everything fails together. After RPC, you're now reaching out to different services on the network, and it could be that you get partial failures because only some of them are working. Okay, now the question here: does RPC usually run over TCP? It either runs over TCP, or if it runs over UDP, which it occasionally does, it's gotta have its own reliability protocol underneath to make sure things work. So running over TCP is certainly the simplest thing for it to do. So before RPC, the whole system would crash and die together; after, you get partial failures, okay? And so you can end up with an inconsistent view of the world: you're not sure if your cached data got written back or not, you're not sure if your server did what you wanted. And so the handling of failure gets much more complicated in an RPC world, but you gain the ability to have your services handled from many places, okay? So the problem that RPC is a solution to, again: RPC basically gives you a nice clean way of looking at remote communication as a procedure call. And with that procedure call, you don't have to worry about marshaling the arguments, you don't have to worry about serializing, you just get the return value back. And so your code is nice and clean; it looks like a bunch of function calls, okay? The downside is you need to make sure that you're able to track failure modes carefully. And I will point out, by the way, that there are a lot of services that use RPC precisely because of the cleanliness of its interface and because it's very easy, as I said, to migrate services from the local machine to remote machines without changing any of the programming. It's just that there are potentially more complicated failure modes that you have to be careful about. And you can do all sorts of interesting things with distributed transactions and Byzantine agreement and stuff we've already talked about to make your RPC much less failure-prone. Now, RPC is not performance transparent, right? The cost of a local procedure call is very much less than the cost of a same-machine RPC, which is very much less than a network RPC. There are overheads of marshaling and stubs and kernel crossings and communication that come into play. So there is a cost to RPC, but the transparency of location is a pretty powerful benefit. And so while programmers need to be aware that RPC is not free, it's still used in a large number of circumstances. And one thing I will point out here is that we now have a new way for communication between domains. We talked about shared memory with semaphores and monitors. We talked about file systems. We talked about pipes. And now remote procedure calls can be a way to do even local communication. So you can use this to communicate between things on the local machine or on remote machines. And just to give you a few examples, there are many RPC systems. There's CORBA, the Common Object Request Broker Architecture. There's DCOM, which is Distributed COM; you'll see that on Windows machines a lot.
There's RMI, which is Java's Remote Method Invocation. There are a lot of different ones out there. And one thing I will point out is that in the early 80s, I would say, there was this notion of microkernels, which we haven't talked a lot about this term. Basically, the monolithic kernel that we've been discussing pretty much puts all the protected code into the kernel address space, and applications run on top of that and make system calls into the kernel. The microkernel is a little different. The only things in the kernel itself are thread multiplexing, address space control and an RPC service. And so in addition to regular applications, all of these things that we used to think belonged inside the kernel we now put as processes running on top of the microkernel, using RPC to communicate with one another. And so if the application goes to read a file, what happens is it does the open by making an RPC through the microkernel to the file system process; that file system does the open, sends back a handle to the application, et cetera. And so the application is reading and writing from the file system, but doing so through an RPC mechanism to other user-level processes, okay? And why do this? Well, fault isolation. If there's a bug in the file system, it won't crash the whole system, right? It's only gonna crash part of what's going on. Same if there's a bug in the windowing system or other parts that used to be in the kernel: we've basically isolated the ability of faults to propagate, because we isolate them in their own user-level address spaces and we use RPC back and forth, okay? And it enforces a level of modularity as well, okay? So this is a good example of using RPC on a local machine to help with the overall structure of the kernel. All right, now if you'll bear with me for just one or two more slides, I wanna set the stage for what we'll talk about on Monday. Once we've got a good messaging service and a good way to do serialization and deserialization across the network, we can start talking about how to build distributed storage. And the basic distributed storage problem is the following. We have a network with a lot of storage in it, so you can start thinking about all the cloud storage that you have out there. And we have a series of clients that are all using that storage. And we can start asking some interesting questions about this. So first of all, why bother with this? Well, this is the ultimate sharing scenario, because these clients can be using that data that's in the middle of the network no matter where they are. So they could be on the West Coast here using some data, and then they get on an airplane, hopefully being careful with their social distancing and their masks, and when they get to the East Coast they can read that same data, or their data can be read and written while they're traveling. And so this idea of network-attached storage is a very powerful one, but it's a little different from the type of file systems that we've talked about this term so far. So among other things, there's what's colloquially called the CAP theorem. This was from Eric Brewer in the early 2000s. And the idea is that there are three properties, consistency, availability and partition tolerance, and you can only have two of them at a time in any real system. So what consistency means is that changes to a file or a database or whatever appear to everybody in the same serial order; that's consistency.
Availability says you can get a result at any time, and partition tolerance says that the system will keep working even when the network gets split in half. And the problem you encounter when you have a distributed system like distributed network storage is you start worrying about partition tolerance: what happens if the network is split? And if you're gonna be able to keep going while the network is split, then you're gonna lose one of consistency or availability. So you can't have all three at the same time. This is also otherwise known as Brewer's theorem. And you can pretty easily think about this for a moment. Suppose that I always wanna have availability, so I can always use my file system, and I wanna be able to deal with partitions when the network is split. You can see why consistency might not work, right? Because if I split the network in half, and these clients over here are busy writing data and those clients over there are busy writing data, then I'm not really getting consistency, because the file system has two different views of it on different coasts. Okay, so that's one example of only being able to have two of the three. If I want consistency and partition tolerance, for instance, I wanna make sure I always see a consistent view, but I can deal with partitions in the middle. Can anybody explain to me why I lose out on availability when I do that? Yep, the reason I lose out on availability is because, to stay consistent while dealing with splits in the network, I can't allow writes anymore, and so the system is no longer available for writing, because I can't allow there to be an inconsistent view. Very good. All right, so we're gonna pursue this next Monday in our last official class. We're gonna talk a lot about distributed storage solutions like NFS and AFS. We'll talk about key-value stores, and probably in the final lecture next Wednesday, which you won't be responsible for on the exam, we'll talk about things like Chord and CAN and some of the other distributed storage systems out there. All right, so in conclusion: we talked a lot about TCP, which gives a reliable byte stream between two processes on different machines over the internet. So you get basically a stream, and it doesn't matter whether it's local or remote, you get the same view of it. And we talked about how to use acknowledgements with a window-based acknowledgement protocol and congestion avoidance to make sure that this works well and represents good citizenship. We talked about remote procedure calls, which are how to call a procedure on a remote machine or in a remote domain, giving us the same interface as procedures, but remote, okay? We started talking about distributed file systems and the CAP theorem, okay? And next time we're gonna talk about the virtual file system layer and cache consistency and how we can basically build a file system into the network. All right, I'm gonna end there. I hope everybody has a great Thanksgiving. We will see you a week from today, back on Monday, and I hope everybody gets a little bit of a break and enjoys themselves.