Welcome back, everybody, to CS162. We're going to continue our discussion of ways of getting reliability out of file systems, and then we're going to dive into some interesting material on distributed decision making.

If you remember, last time we were talking about one of the ways we get performance out of a file system, and that's with a buffer cache. The buffer cache, of course, is the chunk of memory that's been set aside to hold various items, including disk blocks. The example I showed was that when we talk about a file system, the directory data blocks, inodes, data blocks, etc. are actually put into the buffer cache, which is typically managed LRU and is the temporary waypoint for data moving on and off the disk. This is the starting point for allowing us to read and write single bytes of data at a time, and it's also an important performance enhancer. We talked, among other things, about keeping dirty data in the buffer cache and not pushing it out to disk right away, and that has some pretty important performance benefits. It also has some potential reliability issues if you should crash while the dirty data is still only in memory and not on disk.

The other thing we started talking about along those lines was what I like to call the "ilities": availability, durability, and reliability. Keep in mind that availability is kind of the minimum bar to meet, and oftentimes it's not a very good one. Availability typically means that you can actually talk to the system and it will respond to you; it doesn't say that it will respond correctly. We'll often talk about the number of nines of availability, so "three nines" typically means there's a 99.9% probability that the system will respond to you. More important than availability, in my opinion at least, are durability and reliability. Durability says that the system can recover data despite the fact that things are failing, and reliability is the ability of the system to actually perform things correctly. And that's really what you want: reliability, not just availability.

The example I like to give about the difference between durability and availability is the Egyptian pyramids. There was a time when people didn't know what the various hieroglyphs meant, so what was written on the pyramids was extremely durable, but it wasn't available because people couldn't decipher it. It became available only after the Rosetta Stone was discovered.

The other thing we talked about last time was ways to protect bits — not necessarily ways to protect the integrity of the file system, so to speak, but the integrity of the bits — and we talked about RAID, which you know from 61C. In general, RAID X, whatever your level, is a type of erasure code: a code in which certain disks are gone, and you fill in the missing disks using the code. The reason you're able to do that is essentially that the disks have error-correcting codes on them that let them recognize when the disks themselves are bad; then you treat the whole failed disk as an erasure and you bring in the RAID codes.
And what I did say was that today, disks are so big that RAID 5, which is what you learned about in 61C, for instance, is really not sufficient, because it can only recover from one failed disk. Disks are so big now that while you are busy recovering that disk by putting a new one in, a second disk might fail, and at that point you just lose all your data. So if you ever have a big file system on a big file server, make sure you pick at least RAID 6, which can tolerate two failed disks. For instance, EVENODD is a code that works for two failed disks; it's in the readings.

In general, you can do something called a general Reed-Solomon code, based on polynomials. I mentioned this last time, but I thought I'd put it out there: when you were learning about polynomials back in grade school, what I learned was that if you have a degree-(m-1) polynomial, then as long as you have m points, you can reconstruct the coefficients. The clever trick with Reed-Solomon codes — since you need something finite that still behaves like the real numbers — is to work over a finite (Galois) field; we can talk about that offline if you like. You put your data in the coefficients, and then you just generate a bunch of points. Here's an example where I generate n points, where n is bigger than m, and as long as I get m of them back, I can recover the polynomial and then get back my data. That's an erasure code, because I can erase any of these points as long as I still have m left: I can lose up to n minus m of them and still get my data back. That's a pretty powerful code, and you can choose how much redundancy you need based on how many failures you want to survive. Oftentimes in geographic replication you can arrange to be able to lose, you know, 12 out of 16 chunks of data, and that's extremely efficient. So I'm glad that CS 70 also talked about this in general.

By the way, were there any questions on erasure codes at all? Well, you know that RAID 5 is as simple as XOR, and EVENODD is a slightly different flavor of XOR, so those are all very fast operations. Reed-Solomon codes come in a bunch of different forms, some of which are fast and some of which aren't. There are a bunch of variants that are all isomorphic to this idea, but rearranged so that encoding is really fast in some instances; the decoding phase, though, is typically O(n²), so decoding after a failure can be expensive.
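To make the polynomial trick concrete, here's a tiny worked example — my own toy numbers, using ordinary real arithmetic instead of the finite field a real code would use:

```latex
% Data d_0, ..., d_{m-1} become the coefficients of a degree-(m-1) polynomial:
\[ p(x) = d_0 + d_1 x + \cdots + d_{m-1} x^{m-1} \]
% Encode: store n > m evaluations (x_1, p(x_1)), ..., (x_n, p(x_n)).
% Decode: any m survivors determine p, e.g. by Lagrange interpolation.
% Toy case: m = 2, data (d_0, d_1) = (5, 3), so p(x) = 5 + 3x.
% Store n = 4 points: (1,8), (2,11), (3,14), (4,17). Erase any two;
% the remaining two, say (2,11) and (4,17), recover the line:
\[ d_1 = \frac{17 - 11}{4 - 2} = 3, \qquad d_0 = 11 - 3 \cdot 2 = 5. \]
```

So with n points total, any n - m erasures are survivable, which is exactly the knob you turn when you decide how many failures to tolerate.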
The other thing I talked about was that we've been looking at file systems like the Fast File System and NTFS, which overwrite in place when you write new data: when you put new data into a file, you overwrite the blocks that had the old data in them. An alternative, which you might imagine is a lot more reliable, is a copy-on-write file system. Here's an example where I'm just showing you a binary tree; think of these as the pieces of the inodes, and the blocks of the old version of the file are down here in blue, in this tree. The idea behind a copy-on-write system is that if I want to, say, write some new data at the end, or overwrite something, I don't actually overwrite the original data; I build a whole new version of the file that reuses as much of the old one as possible. So here was an example where I took this old block, added some new data to it, and made a new block with a copy. Now, by tying my new inodes in with the old ones, if you follow the new version you can see that we've got a new version of the file with this block updated, but the old version is still there. So if I have a really bad crash in the middle of writing the new version, I can still recover the old version, and I can pull various tricks to decide how much of the old version, or how many old versions, to keep around. This is much more resilient to random failures, and there are several file systems that work like this.

A question here: is this more expensive in space or time? It certainly is more expensive in space, if you want to think of it that way, but what you're getting back is extreme resilience to crashes and failures, plus the ability, if you decide you wrote something incorrectly, to go back to a previous version. So you get some pretty nice benefits from the space overhead — notice that we're not deleting old data right away. It can also be a little more expensive in time, since you have to worry about how these things are laid out; maybe it doesn't have read performance as fast as something like the Fast File System might.

So, what about more general reliability solutions? Suppose we wanted to go back to the Fast File System, let's say, because we were worried about performance, and we wanted to make sure the file system couldn't crash in a way that leaves things vulnerable. What might we do? One thing we talked about was very carefully picking the order of operations: you write the data blocks, then you write the inodes, then you put the inodes into a directory, and so on, in an order such that if the system fails at any point, you can throw out the things that weren't quite finally committed, do a pass over the file system to find everything that's disconnected, and you're good to go. But that requires very careful thought.

A more general idea is to use a transaction, which you've probably heard about if you've taken any of the database classes. The idea is that when you go to update a file, you use transactions to provide atomic updates to the file system, such that there's a single commit point at which the new version of the file system is ready to go, and until you reach that commit point, anything you've done to the file system can be undone. Now, think back to this copy-on-write example: as I'm writing everything and producing my new version, the old version is fine, so if anything gets screwed up — including just throwing out the new version — the old version is still there. And the only thing I need is to swap the old version for the new version with a single operation; that's a single point of commit for the new file system. So that's kind of like a transaction, though the transactional ideas are a little more general. We're going to use transactions to give us clean commits for the integrity of the file system, and then of course we're going to use redundancy to protect the bits: the bits can be protected with Reed-Solomon codes and other erasure or error-correcting codes, RAID, etc.
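Before we get into the details of transactions, here's a minimal sketch of the copy-on-write update from a moment ago, assuming a toy binary tree of 4KB blocks; all of the names here are my own, for illustration:

```c
#include <stdlib.h>
#include <string.h>

#define BLOCK_SIZE 4096

typedef struct node {
    struct node *left, *right;    /* interior pointers; a leaf holds data */
    char data[BLOCK_SIZE];
} node_t;

/* Copy-on-write update of the leaf reached by following `path` for
 * `depth` bits (0 = go left, 1 = go right). Nothing in the old tree is
 * modified; the returned root shares every untouched subtree.
 * (Allocation-failure handling elided for brevity.) */
static node_t *cow_update(const node_t *old, unsigned path, int depth,
                          const char *new_data)
{
    node_t *copy = malloc(sizeof *copy);
    *copy = *old;                          /* shallow copy shares children */
    if (depth == 0) {                      /* at the leaf: new data block  */
        memcpy(copy->data, new_data, BLOCK_SIZE);
        return copy;
    }
    if ((path >> (depth - 1)) & 1)
        copy->right = cow_update(old->right, path, depth - 1, new_data);
    else
        copy->left  = cow_update(old->left,  path, depth - 1, new_data);
    return copy;
}
```

The commit point is then a single pointer swap: publish the new root, and if you crash anywhere before that swap, the old version is still fully intact.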
Now, just to remind you a little bit about what we mean by transactions: they're closely related to the critical sections we talked about early in the term. They extend the concept of atomic updates from memory, which is where the idea came up originally, to stable storage. We're going to atomically update multiple persistent data structures with a single transaction, and as a result we'll never get into a situation where the file system is partially updated and therefore corrupted.

There are lots of ad hoc approaches to this transaction-like behavior; I just talked you through the copy-on-write one. The original Fast File System would order its sequences of updates in a way such that if you crashed, you could run a process that scanned the whole file system, called fsck, to recover from those errors. But again, that's very ad hoc. The general idea of a transaction is this: you start with consistent state #1 in the file system and you want to get to consistent state #2. Maybe consistent state #1 is the original file system and #2 is what you get after adding some new files, directories, and data, and the transaction is an atomic way to get from the first state to the second one. Underlying that single atomic view change, there are going to be a whole bunch of underlying changes to individual blocks.

A question in the chat here is what I mean by "ad hoc." What I mean is that a person sits down and very carefully thinks through: well, if I update this, and then I update that, and the final thing I do is this, then I know that if it crashes anywhere along the way I'll be able to recover the original file system. So ad hoc means you come up with a solution that maybe works, but you've had to go through a long process of thinking it through to make sure it works, and it's possible that you got it wrong. We want something a little more systematic, and we're going to use transactions for that.

Atomicity here is really the property that either everything happens or nothing happens, and that atomicity has to hold even if the machine gets unplugged. Probably, if you unplug the machine and you have this atomic property, what's going to happen is that your changes simply don't happen. So let's walk through this a little more. Transactions extend this idea from memory to persistent storage, and here's the typical structure: you start the transaction, you do a bunch of updates; if anything fails along the way, you roll back; if there are any conflicts, you roll back. But once you've committed the transaction, that mere act of the commit operation causes everything to become permanent. We'll talk about how to do this in a moment. This "do a bunch of updates" part could be arbitrarily complicated: allocating new inodes, grabbing new blocks, linking them together, all sorts of stuff. The point is that none of it permanently affects the contents of the file system until we commit, and that's what we're going to figure out how to do. That's the atomicity: all of a sudden it happens, or it doesn't happen at all. Now, of course, the classic example: transfer $100 from Alice's account to Bob's account.
You see, there are a bunch of different pieces here, right? Alice's account gets debited $100, the branch account is updated, that $100 goes to the other bank, and then Bob's account gets a new balance, and so on. So there's a series of operations in different parts of various databases, and if only some of them happen, the banking system becomes inconsistent. For instance, if the whole system crashes between debiting Alice's account and incrementing Bob's account, then not only did Alice lose her $100, but Bob didn't get it either, and that would be bad. So this idea of beginning a transaction and then committing it means that none of these things happens until the commit.

Do modern operating systems expose transactions to the user? That depends a little bit on which file system you've got: certainly some notions of transactions are available; others are less so. Right now, what we're going to talk about is mostly under the covers, in a way the user doesn't have access to.

So the concept of a log, which makes all this work, is the following. If you look at all of these pieces that represent parts of a global transaction, I'm going to write them into a chunk of memory/disk — think of this as the log, a big chunk of disk — and all of these records are going to be in there, possibly interleaved with other transactions. We view this log serially, starting from the left and going to the right. We start the transaction by putting a "start transaction" marker in the log, then we can go ahead and do all of our stuff (and everybody else can do their stuff), and it's only when we put a "commit transaction" record at the end that all of these actions atomically happen.

Now, a couple of things should be clear from this. One is that when I put "start transaction" in, that needs to get committed to the log before anything else happens. And as I put my various actions in, before the final commit happens, it has to be the case that all of those records are in the log. It can't be the case that a commit gets on disk while the other records are still in memory somewhere, because then the machine could crash and I'd see "start transaction... commit transaction" but have no idea what I just committed, and that would be bad. So the log is clearly something we'll need to be pushing out to disk, and it has an ordering requirement that's very important for making this all work.

The other thing I'll point out: notice that if I write these update operations into the log, one after another, and then I say commit, this doesn't necessarily mean I've actually put them into the file system or actually performed the actions yet. What it means is that if I were to crash before doing them, I'd be able to wake up after the crash, go through the log, and figure out what the state of the system is supposed to be. So the state of the system is not only what's on disk in the file system, but also what's in the log, ordered in a way that I can reconstruct after a crash. I'm going to show you a couple of animations to give you a better idea of how that works. So the commit is like sealing an envelope and saying the transaction has now happened.
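Here's a minimal sketch of what such write-ahead log records might look like; this layout and the ordering rules in the comments are my own illustration, not any particular file system's on-disk format:

```c
#include <stdint.h>

#define BLOCK_SIZE 4096

/* One record in the write-ahead log. */
enum rec_type { REC_START, REC_UPDATE, REC_COMMIT };

struct log_rec {
    enum rec_type type;
    uint64_t      txn_id;                /* which transaction this belongs to */
    uint64_t      block_no;              /* REC_UPDATE: which disk block      */
    char          new_data[BLOCK_SIZE];  /* REC_UPDATE: the redo image        */
};

/* The two ordering rules that make this write-ahead logging:
 *  1. REC_START and all of a transaction's REC_UPDATE records must be
 *     durable in the log before its REC_COMMIT is written.
 *  2. No file-system block is overwritten in place until the records
 *     describing that change are durable in the log. */
```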
So now the question from the chat is: shouldn't things be logged after they happen? Well, in this type of log we're doing something called write-ahead logging: we actually write into the log before the change is applied to the file system. It's the opposite of the way you're thinking of it, I believe. We want to write to the log first, rather than modifying the file system, so that if the commit never comes because we crash, the file system is okay. If we were to start modifying the file system and then put the commit in the log, we'd be in a bad situation where we might have already corrupted the file system with a partial update. So I'm glad you asked — this is maybe the opposite of what you were thinking, so thanks for that clarifying question.

So here's a transactional file system. We get better reliability through the log: changes are treated as transactions, a transaction is committed once it's written to the log, and the log data is forced to disk for reliability. There's also the possibility of using non-volatile RAM or flash to make this faster, because we can put things into non-volatile RAM more quickly, perhaps, than we can write them to disk — so the NVRAM can serve as the head of our log. And although the file system may not be updated right away, the data is in the log.

Now, a question here: does the log negate the performance benefits of a buffer cache? The answer is, it depends on what you're logging. Not all versions of journaling file systems — we'll talk about this in a moment — write all the data to the log first and then back to the file system. Let's go forward a few more slides before I answer that last question in the chat; maybe it will answer itself. Hold on one second.

By the way, the difference between a log-structured and a journaling file system is that in a log-structured file system, all the data is only in the log; it doesn't even go to a separate file system. In a journaled file system, the log is really just helping us get reliability. And when do we start logging? As soon as we've started up the file system.

Now, a little preview: the question in the chat, which I haven't answered yet, is, if not all actions have been completed and you crash, how do you figure out which ones haven't been completed? Let's hold that question and see if it gets answered. We're going to focus in the next several slides on something called a journaling file system, where we don't modify the data structures on disk directly right away; we write the updates, as a transaction, into the log, typically called a journal or an intention list. Then, when we commit, we gain the ability to apply them to the file system. Once changes are in the log, they can be safely applied to the file system: modifying inode pointers, directory mappings, etc. There's also a question in the chat about whether all of our operations have to be idempotent to make this work: the answer is no — though some of them need to be idempotent — and we'll see how this works in just a second. Garbage collection is also a possibility here: once we've successfully applied things out of the log into the file system, we can remove them from the log.
So Linux essentially took the original fast file system, called it ext2, and then added a journal to it to get ext3. So ext3 is really just a Linux-style fast file system with a journal. And there are a bunch of options Linux gives you, for instance about whether to write all the data to the log first and then into the file system. Writing twice, surprisingly enough, doesn't always hurt you from a performance standpoint, because the log, remember, is sequential, and so it's very fast. There are lots of other examples of journaling file systems: NTFS, Apple's HFS+, Linux XFS, JFS, ext4 — a bunch of options.

Okay, so let's create a file. We're not journaling yet — think of this as the plain fast file system, like ext2 — so we can see where we're going with this. When you create a brand-new file and write some data, there's a bunch of separate things that have to happen. First, you've got to find some free data blocks; here's an example — let's call this yellow thing a single free data block. We find ourselves a free inode entry in the inode table, and we find an insertion point in the directory, so maybe there are some blocks of the directory we're going to change. Then we link things together: we write the free map, which marks which blocks are in use; we write the inode to point to the data blocks; and we write the directory entry to point to the inode. When we're done, we've got a new entry in a directory: a mapping between a name and an inumber, which points to an inode we've allocated, which points to a disk block. Notice that all of these individual pieces, plus the free-space update, have to happen just to create one file and write to it. And if we only partially do this and crash, we end up with dangling blocks: for instance, if we didn't successfully write the directory entry, we could have an inode pointing at a data block that's not in any directory, and it's effectively lost.

So let's see how we could add a log, or journal, to this. We're going to put this log in some non-volatile storage — flash, or on disk, which is the simplest thing. It has a head and a tail: the head is the point at which we write, and the tail is the point at which we read. Let's go through what happens when we write our new file. I first find a free data block, and I find my free inode entry and my directory insertion point, but I don't actually modify anything. Instead, I write a "start transaction" record into the log. Then I write the free-space map update into the log; I write the inode entry into the log, describing where it's supposed to go, without actually writing it to its place on disk; and then I write the directory entry into the log, again without writing it in place. Notice that all of these things are reversible: if I crash at any point up until now, I haven't actually modified anything in the file system, so the file system looks exactly like it did before I started. Now, when I hit commit — poof, all of a sudden it's committed. Think this through for a second: notice that no changes to the disk's file system structures have happened yet.
And yet the mere act of writing the commit record to the log makes that file creation committed. The reason is that the state of the file system is considered to be what's on disk plus what's in the log. So if I crash at any point after the commit gets written, what I do is scan the log, and at that point I can apply the updates to the file system, and things will look okay. And I can keep crashing: these updates are idempotent — this was a question earlier — because the log has already chosen blocks for us, so we can keep overwriting the same block over and over with the same data and it doesn't matter. I can keep retrying until I eventually get past the commit, at which point the file system has actually been updated to reflect the change. The mere act of writing the commit in the log means that that file has been written with its new data.

So after the commit, supposing we don't crash, we replay the transaction by writing the staged blocks to the disk, eventually copying everything there, and once it's copied, I can start moving the tail — see how I'm applying stuff? Once I get past the commit, I can throw out everything for that transaction in the log.

Now, here's a good question in the chat: what about reads? Do they have to scan the log for changes that haven't been flushed yet? No — this is where the block cache comes into play. The block cache has the most up-to-date state of the blocks, reflecting the total state of the file system, including what's in the log and what's on disk. Since the block cache filters the reads and writes from the user, it makes everything fast regardless of whether a block currently lives only in the log or on disk. So the block cache is an important part of making this fast.

Another question: you can't flush until you commit? That's correct. This is, again, write-ahead logging: you have to get the log records durable in the log before you hit commit. Once they're in the log, the changes can be flushed out to the file system on disk. Well, what if the cache is full and it's a really large write? Then you have to make sure you've committed first. There are a lot of potentially complicated scheduling questions here — when you're allowed to flush things, and so on — that I don't want to go into right now. What I will say is that the file system can know when it might be in trouble because it has allowed too many writes before the log has been cleared, and all it has to do is put the clients to sleep until things have been properly flushed, and then wake them up. You just keep track of the current state so that you always preserve the write-ahead logging property.

So once we've applied everything, we can throw out those log entries, and the tail moves forward. What's the size of the log? That's configurable; it depends on how much data you want to have in flight. Now notice, by the way, that this particular log didn't actually log the data itself: we could write our data directly to disk, and it's only the metadata that's logged.
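Putting the create example together: here's a rough sketch of a metadata-journaled create, reusing the hypothetical log_rec layout from earlier. The journal_t handle and every helper function here are made-up names, just to show the shape of the protocol:

```c
typedef struct journal journal_t;                       /* hypothetical handle */
uint64_t journal_next_txn(journal_t *j);                /* hypothetical */
void log_append(journal_t *j, enum rec_type t,
                uint64_t txn, uint64_t blk, const char *data); /* hypothetical */
void log_flush(journal_t *j);                           /* hypothetical: force log to disk */

/* Journal a file create: stage every metadata change in the log, make
 * it durable, then commit. Nothing is written in place until after the
 * commit record is safely on disk. */
void create_file_journaled(journal_t *j,
                           uint64_t freemap_blk, const char *new_freemap,
                           uint64_t inode_blk,   const char *new_inodes,
                           uint64_t dir_blk,     const char *new_dirents)
{
    uint64_t txn = journal_next_txn(j);

    log_append(j, REC_START,  txn, 0, NULL);
    log_append(j, REC_UPDATE, txn, freemap_blk, new_freemap); /* mark blocks used  */
    log_append(j, REC_UPDATE, txn, inode_blk,   new_inodes);  /* the new inode     */
    log_append(j, REC_UPDATE, txn, dir_blk,     new_dirents); /* the new dir entry */
    log_flush(j);                     /* rule 1: records durable first...  */

    log_append(j, REC_COMMIT, txn, 0, NULL);
    log_flush(j);                     /* ...then the commit point          */

    /* Only now may the cached copies be written back in place; until
     * those in-place writes finish, recovery can redo them from the log. */
}
```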
That's one of the modes the Linux file system has; another mode is one in which the data itself first goes into the log and then goes back out to disk.

Now, if the system crashes after the commit but before we've fully applied everything, that's okay, because we can just keep restarting: we don't remove things from the log until we've actually gotten past the commit without crashing and everything has been pushed out to disk, at which point we do a single atomic move of the tail, which discards that entry. So we don't need to know exactly which changes have already been applied; we just take wherever the tail was and apply from there, and if we keep crashing over and over, we just restart. It's only after we get past the commit and finish applying that we throw the log entry out. In this particular scheme, the changes are idempotent; there are other ways you can do things, but we'll leave it this way for now.

Now, look at the situation where we started that process and crashed. Here's what it looks like after the crash: maybe we found our blocks and started writing our updates into the log, but we never got a commit record in there. Then all we do at recovery is detect that we crashed and throw everything for that transaction out: it hadn't been committed yet, so we're good to go. Transactions without commit records are simply ignored from that point on. The other case is that we recover and we have complete transactions: we scan the log, we find complete start/commit pairs, and at that point we redo them as usual. In the process we update our block cache, and once we've gotten past that part of the boot, everything works as normal.

So I've just given you the start, but I hope it gives you the idea of what's going on. Why go to all this trouble? Because updates become atomic even if we crash: everything is either applied entirely or not at all. All of these physical operations — and there are potentially many of them — form a single logical unit, and we get an atomic update.

Now, you might ask: isn't this expensive? It is expensive if we're in the mode where we write all the data twice — except that the log is typically sequential on the disk, so the cost of writing to the log is much lower than writing all the different pieces scattered throughout the file system. So it's faster than you might think. And there are circumstances where this "write to the log with your data, then apply it to the file system" approach can actually boost performance — especially when you've got a bunch of random writes, because you can get them onto the disk quickly.

So, modern file systems give you an option to do metadata-only updates in the log, where you record just the file system data structures: directory entries, inodes, free-map updates, etc. In the worst case, where you crash before flushing your data, you get a file with garbage in it — but you don't lose a bunch of files.
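To make the recovery story concrete, here's a sketch of the redo scan just described: transactions without commit records are ignored, committed ones are redone, and because redo is idempotent, crashing during recovery is harmless — we just scan again on the next boot. The helper functions and idset type are hypothetical:

```c
typedef struct idset idset_t;                     /* hypothetical set of txn ids */
void idset_add(idset_t *s, uint64_t id);          /* hypothetical */
int  idset_has(const idset_t *s, uint64_t id);    /* hypothetical */
struct log_rec *log_first(journal_t *j);          /* hypothetical log iteration */
struct log_rec *log_next(journal_t *j, struct log_rec *r);
void write_block_in_place(uint64_t blk, const char *data);  /* hypothetical */
void log_truncate_all(journal_t *j);              /* hypothetical: advance tail */

/* Recovery: two passes over the log, from tail to head. */
void journal_recover(journal_t *j, idset_t *committed)
{
    /* Pass 1: collect the ids of transactions that reached commit. */
    for (struct log_rec *r = log_first(j); r != NULL; r = log_next(j, r))
        if (r->type == REC_COMMIT)
            idset_add(committed, r->txn_id);

    /* Pass 2: redo updates from committed transactions only; records
     * from uncommitted transactions are simply ignored. */
    for (struct log_rec *r = log_first(j); r != NULL; r = log_next(j, r))
        if (r->type == REC_UPDATE && idset_has(committed, r->txn_id))
            write_block_in_place(r->block_no, r->new_data);

    log_truncate_all(j);   /* everything applied: the log can be emptied */
}
```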
So metadata-only journaling is a trade-off between reliability and performance: it gives you the option of slightly less than full atomicity when it comes to the data itself. And yes, a full buffer cache could be an example of your write call returning before everything has been written; that's correct.

Now, let me talk briefly about something I wanted to remind everybody of — but first, are there any more questions? Okay. By the way, ext3 is the version of the Linux ext2 fast file system with a journal in it, and all they did was take that file system and add a special file that serves as the journal.

So I wanted to remind everybody — because I think some people have forgotten a little bit — about the collaboration policy for CS162. You've got to be careful here: we do not want people importing parts of code from other people. Things that are okay: explaining a concept to somebody in another group, for instance a concept that I talk about in class — that's perfectly fine. Discussing algorithms or strategies at a high level is probably okay. Discussing debugging approaches — like how to structure printfs so you can easily find out what's going on, or your overall structure for testing — those kinds of things are okay. Searching online for generic algorithms, like hash tables: okay.

Things that are not okay are the things likely to get flagged by the tools we run to catch collaboration cases: sharing code or test cases explicitly with another group, or outright copying or reading another group's code — you shouldn't even be looking at other people's code or their test cases. Copying or reading online code or test cases from prior years: not okay. If you're straying into specifics about a particular project or homework, you're probably in the red zone. Helping somebody in another group debug their code is also not okay. We had an example in a past term where somebody sat down with a group that was having trouble and helped them debug, and this person sat with them for so long, incrementally changing their code, that the code ended up with a structure that looked so much like the helper's group's code that both groups were flagged for over-collaboration — and that's a problem. So be very careful not to do that, because we want you doing your own personal work on homeworks and exams, of course, and your own group's work on group projects. We compare all project submissions against prior-year submissions, online solutions, etc., and we will take action against offenders who violate this policy; you can take a look at the homepage, where we discuss this in more detail. And don't put a friend in a bad position by asking for help they shouldn't give: in past terms we've had people plead with friends until the friend just gave them some code to make them go away, and that ended badly for both people. So just do your own work. I remind you of this because we have caught what appear to be a number of collaboration cases.
And we've only gone through some of them so far, so try not to put yourself in a bad position. All right. Now, I think we've stunned everybody into silence — let's talk about some real topics again. I'm going to assume that everybody will be very careful.

Okay, so let me take this idea of logging — the journaling we just covered — and push it to its extreme. One extreme is called the log-structured file system, which was an actual research file system on the Sprite operating system; I have the paper up on the resources page, so you can take a look. It's like what I just told you with the journal, except there is no file system underneath: the log is the storage. The log is one continuous sequence of blocks that wraps around the whole disk. Inodes get put into the log when they're changed, data blocks get put into the log, and so on — everything is just in the log.

Here's an example where we create two new files — dir1/file1 and dir2/file2 — and write new data for them. Here's the log (this is the Sprite log-structured file system), and there were already some parts of the file system in the log prior to this picture. Now we write file1: we write some data, which goes into the log, and then the changed inode, which also goes into the log. Then we write some data for the second file and for its directory, and all of that goes into the log too. And ultimately, since the inodes for dir1 and dir2 have changed, the updated inodes — up through the root of the file system — also go into the log. When all is said and done, all of our data is in the log, in exactly the order in which it was written. And we never take it out of the log: it just stays there, and if we overwrite, say, part of file1, we append the newly written data, then a new inode that links to it, and so on. At some point the data in older parts of the log becomes obsolete and there are a bunch of holes, and at that point we do some garbage collection; but up until then, the log is the file system. (Responding to the chat: yep, there's a little bit of that aspect.)

Compare this with the Unix fast file system, where when we write data, we actually write it into the block groups where it was intended to be, close to the inodes for that directory: here's an inode for the directory, we write some directory data; here's an inode for the file, we write some file data. Everything has a specific spot on the disk, laid out to try to make it fast. And notice: the FFS data is laid out to be fast for reading, but the writes go all over the disk, whereas the data in the log-structured file system is laid out to be very fast for writes, but reads may suffer. If you read that paper, which is a classic, you'll see the justification: write bandwidth is often at a premium, so you make the writes go really fast and rely on the block cache being big enough to give you fast read performance. And the other interesting aspect, since we've been talking about transactions, is that structuring everything as a log means we can really easily undo things if we've got a failure.
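Here's a minimal sketch of what a log-structured write might look like, with a simplified inode and made-up helper names; the point is just that every change — data and metadata alike — is an append, never an in-place update:

```c
#include <stdint.h>
#include <stddef.h>

#define BLOCK_SIZE  4096
#define N_DIRECT    12

typedef struct log log_t;                        /* hypothetical handle */
uint64_t log_append(log_t *l, const void *blk);  /* hypothetical: append one
                                                    block, return its address */
void imap_update(log_t *l, uint32_t inum, uint64_t addr);  /* hypothetical:
                                                    appended inode-map update */

typedef struct {
    uint32_t inumber;
    uint64_t direct[N_DIRECT];   /* log addresses of data blocks */
} inode_t;

/* Append-only write: data blocks first, then a new inode pointing at
 * them, then an inode-map entry recording the inode's new location. */
void lfs_write_blocks(log_t *log, const inode_t *old_ino,
                      const char *data, int nblocks)
{
    inode_t ino = *old_ino;                 /* start from the old inode  */
    for (int i = 0; i < nblocks; i++)       /* 1. append the data blocks */
        ino.direct[i] = log_append(log, data + (size_t)i * BLOCK_SIZE);

    uint64_t ino_addr = log_append(log, &ino);  /* 2. append a NEW inode */

    imap_update(log, ino.inumber, ino_addr);    /* 3. record its location */
}
```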
And part of what makes that work is that there are commit records in the log, so if we crash in the middle of writing, we just go back to an earlier point in the log, and our file system is good to go without any changes. The log-structured file system kind of has journaling built into it, because the log is the file system.

So: the log is what's recorded on disk — file system operations — and to figure out the current state, you logically replay the log and put things in the block cache to make access fast. Everything gets written to the log, large important portions of the log are cached in memory (which is how we get things to be fast), and you do everything in bulk: the log is really a collection of large segments on the disk, each completely sequential, to make things fast. If you read the paper, you'll see that rather than a single log snaking through the whole disk, as I first described, there's actually a series of these big segments, and garbage collection works segment by segment. The way you get free space back is to garbage collect the holes that accumulate in the log as data is overwritten; there's a garbage collection process that we won't go into for now.

Now, the reason I brought this up: one thing I promised you a couple of weeks ago but never delivered on was flash file systems. How are they different from the fast file system? Let me remind you what flash is like. This is a floating-gate MOS transistor, similar to the transistors you've seen in your earlier classes. When the gate is driven high, we essentially turn a switch on so that current can flow through, and when the gate is low, the switch is off; without the extra floating gate — with just the control gate — this would be an ordinary transistor. The way flash works is that we trap electrons on this floating gate, which has oxide on either side of it, and trapping the electrons there creates enough of a detectable difference that we can store a one or a zero, distinguishing the charged state from the uncharged state.

The funny thing about this is that we can't simply overwrite it: once we've written a cell, we cannot overwrite it until we erase it. If you remember, I talked about this a couple of weeks ago: you can never overwrite pages in place. Instead, you erase big blocks of bits, keep them on a free list, use the 4KB pages off the free list to build your file system, and eventually garbage collect a big block and erase it again. So this is a little different from a disk. Another important point: the way I write, as I alluded to, is by trapping electrons on the floating gate — I raise the word line so high that electrons tunnel across the insulator and land on the floating gate, and with the opposite bias I can encourage them to leave and clear the gate. That's a pretty harsh process, and eventually electrons get stuck in the insulator itself, and then the cell doesn't work as well: the flash actually wears out. So anybody building a file system out of this has to be careful not to erase and rewrite any one spot too many times.
And yes, we trap electrons to store Reddit posts and cat videos. As I mentioned, because we're trapping electrons, this is a higher energy state — it's technically heavier — and you can go back to the lecture a few weeks ago where I pointed out that a Kindle is technically heavier once you've put books on it.

Now, one of the things that makes this easier to deal with is what's called the flash translation layer. Unlike a disk, where all the sectors are numbered and the file system says "I want sector 5496," in a flash device or SSD there's a translation layer: when you ask for a particular sector number, the request goes through the translation layer, which knows which physical block on the flash currently holds that sector. As the operating system keeps overwriting that sector, the underlying flash translation layer keeps changing which physical block it lives in, and in doing so it automatically takes care of wear leveling and makes sure we're not wearing out our bits.

But the question might be: is there something the file system could do to make this work better? There's firmware running on SSDs and so on, so can we take advantage of this structure? The answer is yes. The F2FS file system — which is actually used on mobile devices like Google's Pixel 3, and was originally from Samsung — is a file system adapted to the properties of flash. It assumes that beneath the SSD interface, which looks like a disk for all practical purposes, there's a flash translation layer; that random reads are very fast, as fast as sequential reads; and that random writes are essentially bad for flash storage. The reason is that random writes make it harder for the underlying flash translation layer to erase big blocks: to erase a big block in which a bunch of scattered pages are still live, you first have to copy the live data out to pristine blocks, and that extra copying wears the flash out a little more. So F2FS tries to minimize writes and updates and keep writes sequential. What they did was start with a log-structured, copy-on-write file system, keeping writes as sequential as possible, and add a node translation table to help keep things sequential; for more details, check out the paper in the readings section, "F2FS: A New File System for Flash Storage."

Just to show you a little bit: the log in this flash file system is actually split into a whole bunch of segments — some that get written a lot and some that aren't written as frequently — so they lay out several different logs to manage how hot each area of the file system is. There's a translation table inside the operating system, in addition to the one on the SSD, and they classify blocks as frequently written or not. There's a checkpoint operation, and so on. I'm not going to go into great detail on this, but I wanted to mention it, so if you're curious you can take a look.
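Going back a step, here's a toy sketch of the logical-to-physical remapping a flash translation layer does underneath all of this. The names are made up, and real FTLs are far more sophisticated — they batch writes, wear-level across erase blocks, and persist the map — but the core indirection looks roughly like this:

```c
#include <stdint.h>

#define N_SECTORS (1u << 20)             /* logical sectors exposed upward */

uint32_t alloc_free_page(void);                        /* hypothetical: take a
                                                          pre-erased flash page */
void flash_program(uint32_t page, const char *data);   /* hypothetical */
void flash_read(uint32_t page, char *out);             /* hypothetical */
void mark_page_invalid(uint32_t page);                 /* hypothetical: GC will
                                                          erase its block later */

static uint32_t l2p[N_SECTORS];          /* logical sector -> flash page */

/* Overwrite a logical sector: flash pages can't be rewritten in place,
 * so write the data to a fresh pre-erased page and remap. */
void ftl_write(uint32_t lsec, const char *data)
{
    uint32_t new_page = alloc_free_page();
    flash_program(new_page, data);

    uint32_t old_page = l2p[lsec];
    l2p[lsec] = new_page;                /* reads now see the new copy */
    mark_page_invalid(old_page);
}

/* Reads just follow the map, which is why random reads cost the same
 * as sequential ones. */
void ftl_read(uint32_t lsec, char *out)
{
    flash_read(l2p[lsec], out);
}
```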
Let me give one instance of those modifications. Here is the index structure of inodes. In the original log-structured file system, if I update some file data, I write that into the log; then I've got to write the direct-pointer block over again into the log; then the indirect pointer; then the inode; then the inode map; and so on — a whole chain of blocks, just because I changed some data. That's because I never update in place in a log-structured file system: I work my way up the chain, writing every changed piece into the log. Well, that means a lot of extra writes (the F2FS paper calls this the "wandering tree" problem). So one of the things F2FS does is use a second translation table, so that the inode, at the higher level, holds a stable name for a block, and that name is resolved through the translation table; updating the block then only requires updating the table entry, not rewriting the whole pointer chain. So they make some interesting modifications to the log-structured file system. I'm not going to go into this in any more detail, but I wanted to give you some idea of what you might do to make things faster: take advantage of the fact that random reads are cheap, while random writes are expensive and wear the file system out.

All right. Now, time to switch gears — unless there are any additional questions on log-structured file systems, or transactions, or what have you. Maybe I'll pause for a second while everybody digests. So, to confirm a chat question: in both the log-structured file system and F2FS, files are just in the log; there is no other file system underneath. And log-structured file systems are good for writes — can anybody say why? Right: the log is sequential. On a disk, it runs down the track rather than seeking randomly all over, so no matter what your writes are, they go one after another on a sequential set of tracks, and they're very fast. On flash, the log's big sequential segments can be erased as units, so it matches the underlying architecture. The log-structured file system does potentially lead to fragmentation, and that's where garbage collection comes into play: if you look at the papers, you'll see that as time goes on, the old parts of the log accumulate more and more holes, because data that lived there has been overwritten. At some point you take the data that remains, copy it to a new part of the log, and reclaim everything that was in the old part. It's a type of garbage collection. All right. Good.

Now, switching gears. If you remember, I think on the first day I said that what's cool about operating systems is that they're part of this huge world-spanning system: everything from tiny devices tied into local networks, to cars, to phones, to refrigerators and computers, up through big machine rooms in the cloud — all part of one huge system. And when I think about what I'm interacting with on a day-to-day basis,
I like to think about how the things I do down at the small scale are actually utilizing resources spread throughout the globe, and it's amazing when you think about it. Sometimes when I think about the whole thing, it's astounding to me that it all works somehow — sometimes it doesn't entirely work, but it mostly works. One interesting question that comes to mind is: how do you get all of these things — spread geographically, in domains of fast local connectivity but really slow long-distance connectivity — to all work together? For the last few lectures (we're down to the last five or six), I'm going to talk about how these systems can work together to do, for instance, distributed decision making, which is a topic we start today.

To start, let's bring back some very old terminology, just to make sure we're all on the same page. A centralized system is one in which there's a central component — a server of some sort — performing all the major functions, with a bunch of clients talking to it. That's typically called the client-server model. Many of the things you do with your cell phone, where the phone is one of the clients and something in the cloud is the server, are a modern analog of this traditional client-server setup. The question that immediately comes to mind with a centralized server is: how do you scale it? What happens if you've got not three clients but 100,000, or a million? Clearly one server can't do it. We know that in the cloud there are many servers, but the question is how you structure many components to do something intelligent together.

A completely different model is what I like to call the peer-to-peer model, in which every component is a peer of the others. Notice that in the client-server model, the server was kind of king and the clients were its subjects, whereas in the peer-to-peer model we have a whole bunch of peers all interacting with each other. In the client-server case it's pretty obvious who's responsible for what; in the peer-to-peer model it becomes unclear. But the peer-to-peer model is a good starting point if we want the server idea to spread out and handle a really high load: for instance, maybe we can draw a box around a bunch of these peers working in peer-to-peer mode and treat the whole box as a server.

So what's the motivation for distributing this way rather than having a single server? You could come up with lots of reasons. Maybe it's cheaper and easier to build lots of little simple computers than one huge server in the middle. Maybe it's easier to add power incrementally: if I've got a good peer-to-peer model and I need more power, I just add some more computers, and if things work, then by adding a few more machines I've got a more powerful system than I started with — and I can do that incrementally.
Maybe users have complete control over some of their components: in that big peer-to-peer system, there may be some machines I own, and yes, I'll help everybody else a bit, but I have full control over my hardware and can take it back when I want. And of course collaboration is an obvious goal, because maybe the peer-to-peer model makes it easier to collaborate.

So the promise of these distributed systems is higher availability, because with more components, some are likely to be up; better durability, because copying my data to lots of machines makes it more likely to survive a crash; and maybe more security, because each piece is smaller and perhaps easier to secure. Now, you should be questioning some of these statements — the reality is typically different.

This is Leslie Lamport. He's done all sorts of really cool systems work, and we'll talk about a couple of his contributions over the next lecture and a half. What he liked to point out is that the reality behind a lot of distributed systems is actually disappointing. Availability can be worse rather than better, because the system may depend on every machine being up; he has a very famous quote: "A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable." It can have worse reliability, because you can lose data if any machine crashes. And it can have worse security, because anyone in the world can break into one component, and if everything is tied together, they've broken into everything. So distributed systems hold high promise, but you've got to be really careful how you use them. Coordination becomes very difficult: you've got to coordinate multiple copies of shared state, and what would be easy in a centralized system, where everything goes through one central computer, becomes much more difficult once things are distributed. Trust, security, privacy, denial of service — you've heard these words a lot, but many new variants of these problems arise as soon as we start distributing. Can you trust the other machines of a distributed application enough to perform a protocol correctly? There's a corollary of Lamport's quote that I like: a distributed system is one where you can't get work done because some computer you didn't even know existed is successfully coordinating an attack on your system. That's the standard DDoS.

So what are some goals for this kind of system? You'd like transparency: the ability of the system to mask its complexity. Remember, earlier I said the way we go from a single server to something that can handle 100,000 or a million clients is to put a bunch of machines together, draw a box around them, and make the box transparently behave the way a single computer would, so we don't have to know about the complexity inside. What transparencies might we want? One is location transparency, where you don't have to know where resources are located — anybody who's dealt with the cloud has seen what that's like. Perhaps migration transparency, so that resources can move around — maybe for better performance or durability — without us having to know they've moved. Maybe replication: perhaps I pay to make sure my data doesn't go away.
And so, underneath the covers, the system transparently increases the number of copies, or maybe does erasure coding, in a way I don't need to know about but that makes my data much more durable. Concurrency: maybe I don't have to know how many other users are out there. One thing that has worked pretty well about the cloud is that everybody interacts point-to-point between their phone and something out there, without having to know how many other people are interacting with that same thing. That level of concurrency works well if you're just working one-on-one with something; if you're actually collaborating on something shared, it gets trickier, so concurrency is problematic in some circumstances. Parallelism: the system may speed up large jobs by splitting them into small pieces, transparently, without telling you. Fault tolerance: like what I said about replication — maybe the system hides the fact that things are going wrong, in a way that still lets you make forward progress.

Transparency and collaboration require some way for different processors to communicate with one another, and that leads to the need for networks. We'll talk about networks in more detail in a lecture or two, but for now I want to talk about the idea of decision making being spread across a bunch of nodes, because that's the beginning of how we do all of this.

There's a question about whether it's a goal for us to be unable to tell where resources are located. I'd say yes and no. It's better to think of it as: I don't want to have to know where the resources are, unless I care. I'd like the system to transparently manage them, within the boundaries of my policies and my goals for privacy and so on, without me having to deal with it. And if I do care, then another goal would be to selectively break the transparency to meet whatever need made me care, while the rest of the transparencies remain. So it's really the desire to not have to know. A really important transparency, by the way, concerns what happens when a machine storing some of your data crashes: you don't want to have to log into your application and change an IP address to point to a different server just because some machine died; you'd like that process to be transparent. So think of these goals as things I'd like to be transparent unless I care. Perhaps you could think of it as opacity, but it really is transparency: masking complexity so I don't have to know.

Now, how do these things communicate? Through some sort of protocol. Clearly there's going to be communication through a network via messages, and a protocol is an agreement on how to communicate, including syntax — how communications are structured and specified — and semantics — what a communication means, and the actions taken when transmitting, when receiving, when a timer expires, and so on. I'm noticing in the chat: "so masking equals transparency?" It is perhaps a funny use of terminology, but you'd like things to happen invisibly, under the covers; you see the functionality without having to know what's happening underneath, and that's often called a transparency.
I realize that seems a little strange, but that is how the terminology is used. Now, protocols are often described by a state machine on either side. So here's an example where I've got two state machines, and part of what the protocol is doing is tracking the state on both sides, so that both sides have the same notion of the state of the world. The protocol is responsible for keeping that state consistent, so that if the two state machines, as the two sides' views of the world, are transparently kept in sync, then I can act on the current state of the system here at Berkeley, or over in Beijing, with confidence that I'm working from the same information as the other side. Usually there's some stable storage that's part of the state replication. A simple example might be that these are two versions of the same file system, with a transparent protocol keeping the two states, which represent the same file system, in sync. That's another example of a good protocol. And we want, among other things, stability in the face of failure: even when parts of the system are failing, or storage falls apart in one place while it survives in others, the state machine replication should continue to work properly. It may be that endpoints are selectively failing, but if I were to vote among the states of all the different participants, say I've got three participants and one of them fails, a voting process could be employed to figure out what the real state of the system actually is; we'll talk about some of this in a moment.

There are protocols in human interaction too; I thought I'd put this down just for the heck of it. You've got a phone. You pick it up to call somebody and listen for the dial tone; okay, maybe on a cell phone you just see that you have service. You dial the number, you hear ringing, and your colleague says "hello." "Hi, it's John," or "hi, it's me," which is my favorite goofy introduction: what's that about, who is "me"? Then you say "hey, what do you think about blah, blah, blah," they say "yeah, blah, blah, blah," you say goodbye, they say goodbye, and you hang up. This is probably a conversation you've had late at night, including the blah, blah, blah; I know I've had a few myself. But really you're looking at a protocol: ringing leads to answering on the other side, the answer comes back and now you know the connection has been set up; the caller says something and the callee responds; and then there's some agreed process for hanging up. This protocol for synchronizing state between the person who made the call and the person who answered is a human-interactive version of what we'd like our network protocols to do.
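To make that concrete, here's a minimal sketch of a protocol as two state machines kept in sync by messages, using the phone call as the protocol. The states and messages here are my own invention for illustration, not something from the slides:

```python
# A toy protocol: both sides apply the same transition rules, so after
# every message they agree on the state of the call.

TRANSITIONS = {
    ("IDLE", "dial"): "RINGING",          # caller dials the number
    ("RINGING", "hello"): "CONNECTED",    # callee picks up and answers
    ("CONNECTED", "goodbye"): "HUNG_UP",  # either side ends the call
}

class ProtocolFSM:
    def __init__(self):
        self.state = "IDLE"

    def on_message(self, msg: str) -> None:
        key = (self.state, msg)
        if key not in TRANSITIONS:
            # A message that doesn't fit the current state is a protocol error.
            raise ValueError(f"protocol violation: {msg!r} while {self.state}")
        self.state = TRANSITIONS[key]

caller, callee = ProtocolFSM(), ProtocolFSM()
for msg in ("dial", "hello", "goodbye"):  # the messages "on the wire"
    caller.on_message(msg)
    callee.on_message(msg)
    assert caller.state == callee.state   # same notion of the state of the world
print(caller.state)  # HUNG_UP
```

The real protocols we'll look at do essentially this, except that message delivery itself can fail, and that's where things get interesting.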
And the problem is that there are many different pieces of hardware. This has been our standard issue throughout the whole term: hardware is vastly different down at the I/O level, so how do we deal with that? If you look here, when we're talking about communicating, we have a bunch of applications at one level and a bunch of ways things physically communicate at another, maybe coaxial cable or fiber optics or wireless. So many different applications have to communicate over many different media, in many different styles. What do you do? Well, you don't want point-to-point support, where Skype talks one way over coax and another way over fiber and another over wireless and so on, because you'll very rapidly get an n-squared blowup in complexity. If we add some new application like HTTP, it shouldn't be the case that we have to write a new communication module for every medium it might communicate over; and similarly, if we come up with a new way to communicate, like packet radio, we don't want to write an n-squared set of adapters between every application and every new communication medium. This looks silly once you state it, because clearly there's a level of abstraction, kind of like our device drivers, that needs to be employed here; if you take a networking class, you'll certainly see what that's about.

So how does the Internet avoid this? We put layering in: an intermediate layer, a set of abstractions providing network functionality across all of those technologies. As a result, a new application like HTTP only has to figure out how to talk to this intermediate layer, which is often called the narrow waist, like the middle of an hourglass. And if I bring in some new communication technology, I basically just have to match the intermediate layer's interface, and my problem has gotten much simpler because of abstraction. This is the typical hourglass everybody sees when they take a networking class, with IP as the protocol at the narrow waist; it wasn't always that way, but that's what it became. All the layers above just send IP packets, and all the layers below just carry IP between sites, and if we do that, we basically have the Internet. It's astonishing how well this has worked to connect a whole bunch of devices and computers and storage, simply by standardizing on IP in the middle.

So what are the implications of this hourglass? There's a single internet-layer module, the IP protocol. It allows arbitrary networks to interoperate: any technology that supports IP can exchange packets. It allows applications to function on all networks: any application that runs over IP can use any network. And it supports simultaneous innovation above and below: you can do all sorts of things up at the application layer, and you can have many different physical layers below. But changing IP itself has turned out to be very challenging. There's a funny story about IPv6, which has been the "next" IP protocol for the last 20 years; only in the last five years or so has it really taken hold and become a reasonable protocol. It's been very hard to swap out IPv4, the traditional one, because it's so embedded in the world.
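To make the hourglass concrete, here's a minimal sketch. The class names are invented for illustration, and the header is a toy stand-in rather than a real IP header:

```python
# The narrow-waist idea: every application targets one intermediate
# interface, and every link technology implements that same interface,
# so adding a new app or a new medium is O(1) work instead of the
# O(n^2) point-to-point blowup.

class LinkTechnology:
    """Anything that can carry a packet: coax, fiber, packet radio, ..."""
    def transmit(self, packet: bytes) -> None:
        raise NotImplementedError

class Fiber(LinkTechnology):
    def transmit(self, packet: bytes) -> None:
        print(f"fiber: {len(packet)} bytes on the wire")

class PacketRadio(LinkTechnology):
    def transmit(self, packet: bytes) -> None:
        print(f"radio: {len(packet)} bytes over the air")

class NarrowWaist:
    """The IP-like layer: apps hand it payloads, it hands links packets."""
    def __init__(self, link: LinkTechnology):
        self.link = link

    def send(self, payload: bytes) -> None:
        self.link.transmit(b"HDR" + payload)  # toy header, not a real one

# A new application (say HTTP) only ever talks to NarrowWaist; a new
# medium only ever implements LinkTechnology. Neither knows the other exists.
for link in (Fiber(), PacketRadio()):
    NarrowWaist(link).send(b"GET / HTTP/1.1")
```

Adding HTTP means writing the top half once; adding packet radio means writing the bottom half once.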
Some drawbacks of layering, however, are all the drawbacks you could imagine, especially now that you've been through 162. Layer N may end up duplicating work that layer N minus one is doing, or layers may need a bunch of the same information, so you end up passing that information up and down the layers with a bunch of memory copies, and that's expensive. So layering can hurt performance: any layered API could potentially be made faster by flattening it out. But then again, if you flatten things the wrong way, you end up with that n-squared pattern again, which is not a good idea. So there's a trade-off between performance, APIs, and layering, and it turns out that with IP it's been an extremely powerful trade-off.

Okay, now what I'd like to talk about is the end-to-end argument. There was a hugely influential paper, which again is on the resources page, by Saltzer, Reed, and Clark from 1984. I realize that's ancient history now, but it's one of those papers that still carries some very important philosophy, and I want to make sure everybody gets it. Some call it the sacred text of the Internet. There have been endless disputes about what it actually means, and everybody cites it as supporting their position; you could imagine that's true of pretty much any good document that lots of people read, since they'll get into philosophical arguments about it. The message, however, is pretty simple: some types of network functionality can only be correctly implemented end to end, and things like reliability and security are examples. Because of this, the end hosts can satisfy those requirements without the network's help, and since they must do so anyway, you could imagine that the network doesn't have to do it at all. So the way the paper reads, if you go look at it, is basically: you don't have to go out of your way to implement functionality in the network, because you've got to do it at the endpoints anyway.

The simplest example they give, which I think is very telling, is this: you've got two hosts, and host A has a file it wants to send to host B. You've got applications for the file transfer, you've got the operating systems, you've got the network, and so on; all of these are parts of the path. How do you transmit? The application reads the file off the disk and hands it to the operating system, which sends it out through a socket; on the other side it comes up through the operating system into the application, which writes it to the disk. Now, how do you make that reliable? One option is you make every step reliable: 100% reliable that you load the file off the disk, 100% reliable that data gets transferred from the application to the OS. Those transfers might not be so bad, but then you've somehow got to make sure it's reliable as it crosses the network, over every link in the middle. If we're transmitting from Berkeley to Beijing, there are transoceanic cables and a whole bunch of hops at different levels, with a lot of detail in that path we're not talking about right now. We'd have to make sure every one of those links was 100% reliable, and then, coordinating everything together, we'd get a 100% reliable transfer.
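Just as a back-of-the-envelope check on that plan, suppose, purely as an illustrative assumption, that each hop succeeds 99.9% of the time. The hops compose multiplicatively, so a long path is noticeably less reliable than any one link:

```python
# Illustrative numbers, not from the lecture: even very reliable hops
# compose into a noticeably less reliable end-to-end path, and this
# still ignores corruption *inside* the routers, as in the story below.
p_hop = 0.999                      # assumed per-hop success probability
for hops in (1, 10, 20, 50):
    print(f"{hops:2d} hops: P(end-to-end success) = {p_hop ** hops:.4f}")
# 50 hops: P(end-to-end success) = 0.9512, about a 5% failure rate.
```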
Except it never works that way, right? It's very hard to make something 100% reliable, and it's always possible you missed something. One of the interesting things about that paper is that they relate a story from 1984 in which they were transmitting copies of kernel source code from one host to another. It was only going across a few buildings, but there were a lot of hops in between, and they were carefully checksumming every hop along the way to make sure nothing got screwed up. What they didn't realize was that some of the routers along the way had a bug: even though each link was carefully checksummed and made reliable, a router would, I think, transpose bits in memory for every million bytes or so it transmitted, because of a bug in the router's source code. As a result, even though they checksummed every link along the way, the data got slowly corrupted, and since the kernel source had been transferred back and forth across these links many times, it rotted slowly; we used to call that bit rot. It was totally unexpected, and things got so corrupted they had to pull the source back off of tape to fix it. So this idea of making things reliable by fixing everything in the middle is not only very hard, it might not even be the right thing.

So what's the other option? You take the file at point A, transmit it as well as you can to point B, and then you check at the end: did I get the file I expected? You compute a hash or checksum at one end, send it to the other, and check it there; either you've got the file or you don't, and if you don't, you retransmit. What's good about this end-to-end approach is that it makes up for all sorts of problems in the middle by catching bad transmissions at the endpoint. Now, of course, as the paper points out, a gigabyte file is different from a one-kilobyte file: the more data you're transmitting, the more likely the transfer is to fail somewhere in the middle. So if you have a really large file and wait until the very end to checksum it, you may see a lot of failures before you succeed, and it may take a very long time. That's why you want to break things into chunks and checksum them individually. But the point of the example is that if the check has to be done at the endpoints anyway, then maybe you don't need to be as careful in the middle; any reliability you add in the middle is really there to improve performance.

So the second option is basically: the receiver computes the checksum of what it got and sends it back, and the original application pulls the file off the disk, checks it, and sees whether you're good to go. Solution one, as I said, was incomplete, because memory was being corrupted along the way, and the receiver has to do the check anyway. Solution two is complete precisely because the check is done at the end. So is there any need to implement reliability at the lower layers at all?
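Here's a minimal sketch of that chunked end-to-end check. The flaky channel is simulated, since the point is just the shape of the verify-and-retransmit loop; in a real transfer the checksum would travel with the data and the retransmit request would go back over the network:

```python
import hashlib
import random

CHUNK = 4  # tiny chunks so the demo is visible; real systems use e.g. 64 KiB

def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def flaky_send(chunk: bytes) -> bytes:
    """Simulate a path that occasionally flips a bit somewhere in the middle."""
    if random.random() < 0.3:
        i = random.randrange(len(chunk))
        chunk = chunk[:i] + bytes([chunk[i] ^ 0x01]) + chunk[i + 1:]
    return chunk

def transfer(data: bytes) -> bytes:
    received = []
    for off in range(0, len(data), CHUNK):
        chunk = data[off:off + CHUNK]
        expected = checksum(chunk)          # computed at the sending end
        got = flaky_send(chunk)
        while checksum(got) != expected:    # verified at the receiving end
            got = flaky_send(chunk)         # retransmit just this one chunk
        received.append(got)
    return b"".join(received)

data = b"the kernel source, delivered uncorrupted"
assert transfer(data) == data  # corruption anywhere in the middle gets caught
```

Notice that a single flipped bit costs us one retransmitted chunk, not the whole file, which is exactly the argument for chunking.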
The end-to-end argument, by the way, if you know anything about the history of the Internet, is pretty much what was used to justify the structure of the basic Internet as it is right now, which is a datagram service (we'll talk more about that in a lecture or two) where small packets are sent across and either make it or they don't. We don't worry much about that, because we're checking everything end to end. So this paper, and the end-to-end philosophy in general, is a big part of why the Internet is the way it is.

It could be more efficient, though, to do something in the middle. As I mentioned, yes, we could just send the data to the other side, hope it gets there, and retransmit if it doesn't; but at some point that might be too expensive, if there's a really bad link in the middle and we keep retransmitting over the whole path. So there's a performance reason for improving things in the middle, even though there isn't a functionality reason. And this discussion leads to a trade-off about how much work you want to do in the middle. Implementing complex functionality in the network doesn't reduce the hosts' implementation complexity, because they still have to do it; it does increase the network's complexity, and it probably imposes delay and overhead on every application, even ones that don't need it. So this argues that maybe you shouldn't do something in the middle if you have to do it at the ends anyway. On the other hand, implementing things in the network can enhance performance in some cases, like very lossy links.

Now, what's interesting is that, just as there are conservative and liberal interpretations of pretty much anything, there are different interpretations of the end-to-end argument. A conservative interpretation says: don't bother implementing functionality at the lower level at all, unless it can be completely implemented at that level without needing the endpoints, or unless it actually relieves burden from the hosts; otherwise don't bother. A moderate interpretation, which I like to think of as the modern one, says: think twice before implementing anything in the network. If the hosts can do it correctly, implement it in the lower layers only when it's a performance enhancement, and only if it doesn't impose a burden on applications that don't need it. That's the interpretation I always use, and the one I suggest in this class.

You might ask whether this is still valid, and there are some instances where even this modern interpretation isn't quite enough. What about denial of service? If somebody is going to attack a communication stream from outside, there's actually a pretty good argument for putting firewalls and checks on intermediate links to prevent the denial of service; in that instance, even though the end-to-end communication still has to happen, you're enhancing the overall path by putting functionality in the middle. Or privacy: putting firewalls in the middle makes sense. Or maybe there are things that simply have to be done in the network: certain routing protocols, which pick the path from point A to point B, have to run in the network; they can't really be done at the endpoints.

All right. So how do you actually program a distributed application? That's going to be our topic for next time. You need to synchronize multiple threads running on different machines.
There's no shared memory, there's no test-and-set, so all of the synchronization machinery we talked about earlier in the term really isn't available to you in this simple view of the world, which is just a bunch of messages I send from one node and receive on another. So there's one abstraction over the network: the message. It's already atomic, meaning no receiver gets just a portion of a message, because typically we checksum messages, and if a bad one comes through, we drop it and retransmit. The interface is like a mailbox: the sender directs a message at a receiver's mailbox, a temporary holding area at the destination. We have a send of a message to the mailbox, and a receive, which often blocks to wait for a message to show up; I'll leave a little sketch of this mailbox interface at the very end for reference. What we're going to do next lecture is ask: can we take this basic idea and build something interesting on top of it that lets us build distributed applications, lets us synchronize state machines among multiple machines, and ultimately lets us do pretty interesting distributed peer-to-peer style applications? That'll be for next time.

So in conclusion, I brought back this idea of the illities. Availability: how often is the resource available? Durability: how well is it preserved against faults? Reliability: how often is the resource performing correctly? We talked about preserving the bits, so I like to think of erasure codes or RAID as preserving the bits, while copy-on-write I think of as preserving integrity, not just bits: with copy-on-write I make a bunch of new changes without overwriting anything, instead using pointers to the old data, and that lets us preserve the integrity of the old data even while changing it. We talked about how logs can improve reliability, about journaling file systems such as ext3 (NTFS is similar), and in general about transactions over a log as a general solution, and hopefully the examples I gave there worked out well. We started talking about protocols between parties that will help us build distributed applications, and we spent some time with the end-to-end argument, which will hopefully inform us going forward. Next time we'll start on distributed decision making, such as two-phase commit; we didn't quite get there this time, but we'll definitely do it next time. So I'll say goodbye to everybody. Sorry for going over; I guess I've been doing that a lot this term, my apologies. I hope you have a good evening, and we will see you on Wednesday.
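As promised, here's a minimal sketch of that mailbox-style send/receive interface. The Mailbox class and its methods are made up for illustration, with an in-process queue standing in for the network; a real implementation would sit on top of sockets:

```python
import queue
import threading

class Mailbox:
    """A temporary holding area for messages at the destination."""
    def __init__(self):
        self._q = queue.Queue()

    def send(self, message: bytes) -> None:
        # Messages are atomic: the whole message is delivered or nothing is.
        self._q.put(message)

    def receive(self) -> bytes:
        # Blocking receive: wait until a message shows up.
        return self._q.get()

inbox = Mailbox()
threading.Thread(target=lambda: inbox.send(b"hello")).start()
print(inbox.receive())  # blocks until the message arrives
```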