Welcome back everybody to CS162. We are on lecture 19, talking about file systems, and hard to believe, but we're on the final few lectures of the class; I think we're ending potentially on lecture 26, so getting close there. If you remember from last time, we were talking about devices, and among other things we talked about spinning storage and gave you some amazing stats about modern disk drives. I'll show you a couple of these in a moment, but basically the way to think about a disk drive is as a series of platters that are double-sided. So there's storage on both sides, and there's a single head assembly, typically with a read/write head on each side of each platter, and the heads move in and out as a group. Given the current position of the head, if you let the platters spin, which is what they do, it traces out a path, and on a single surface we call that a track, and if we take all of the tracks that are traced out simultaneously by the heads, we end up with a cylinder, all right? And we talked about that, and the simple model for measuring how long it takes to get something off of the disk includes at least these three items: seek time, rotational latency and transfer time. The seek time is basically the time to move the head in or out, and that's something of order four milliseconds these days. The rotational latency is the time for the sector that holds your data to rotate under the head, and then finally the transfer time is the time to actually pull a block of data off the disk. Now there's a good question here about whether there is only ever one head. Just to be clear, the head is the thing flying right above the surface, so there's a head assembly and usually there's only one of them, and the reason for that, even though it seems like it would make sense to be able to independently read the different platters, is that disks are a commodity item and that would be way too expensive; the head assembly is one of the most expensive parts of the drive. So a complete model of how long it takes to read something from the disk or write something to the disk is that a request spends some time in a queue, we'll say a lot more about this today, and then it goes through the controller, and once it's through the controller it gets fed out to the actual physical disk, at which point we have the seek plus rotational plus transfer time. And remember, by the way, for the rotational latency we probabilistically say it's half a rotation, because on average it takes half a rotation to get the data underneath the head. Any other questions here? We showed you a picture or two of the inside of a disk last time as well, so if you missed that lecture you can go back and take a look. Here were some typical numbers. So I pulled out commodity Seagate three-and-a-half-inch disks; they are now up to 18 terabytes, nine platters, more than a terabit per square inch on each surface. So that's pretty amazing. We have perpendicular recording domains, so the magnetization that represents a one or zero actually goes into the surface. Typically there's helium inside there to help reduce the friction of the disk spinning around. The seek time is typically in a four to six millisecond range, although a good operating system with good locality will get this down to a third of that time on average. This particular time that's specced out is the average time to go from any track to any other track.
The rotational latency: laptop or desktop disks are in the 3,600 to 7,200 RPM range, which is somewhere around sixteen milliseconds per rotation, or about eight milliseconds per rotation for the faster one. Server disks can get to 15,000 RPM, and so the latency is less. Controller time depends on the controller hardware. Transfer time is typically 50 to 250 megabytes per second, notice the capital B, and it depends on a lot of things, like what size you are transferring. So sectors, which are the minimum chunk of data that can go on and off the disk, can be 512 bytes or up to four kilobytes on modern disks. Rotational speed, of course, we just said can vary from 3,600 to 15,000 RPM; then there's the density of bits per track, the diameter, and also where you are on the disk. If you're on the outside, the disk surface is going by the heads faster than on the inside, and so you can read the bits quicker on the outside. Okay, so pretty amazing. The other thing we had started to talk about was the overall performance of an IO path, and that path really goes from the user, through the queue, through the controller, through the IO device, and there can be many metrics that you might worry about, like response time, which is the time from when you submit a request to when you get the response back, and throughput, which could be how many of these requests per unit time you can get through the system. Things that contribute to latency are the software paths, which are green here, and which can be loosely modeled by queues throughout the operating system. Those are hard to characterize in general, so we're gonna have to come up with a sort of probabilistic way of thinking about them. The controller and the device itself, that behavior is a little more easily characterized and depends on the actual device, but the queuing adds some really interesting behavior here. So there's this non-linear curve that starts out with a fairly low change in response time with respect to throughput, and then as you get higher and closer to the 100% mark, which is really the point at which your utilization is the maximum the disk can handle, the response time kind of goes through the roof, and we'll see a little bit about where that comes from in this lecture. Okay, so now to pick up where we left off last time, unless anybody has some other device-related questions; we talked a lot about SSDs as well as spinning storage last time. So let's start talking a little bit about performance of a device in general, and we're gonna call this a server. So for instance here, this yellow IO device would be a server, or the combination of controller and IO device would be a server, in this particular view of the world. And so if we assume that we have some amount of time, call it L, that represents a complete service of something, then we could have several of these one after another, and assuming that the device takes time L to service a request and we put them right after each other, so there's really no spacing between submitting the next request after the first one's done, we could think of this as a deterministic server, where the deterministic part is that it always takes time L, and the maximum number of service requests per unit time is just one over L, okay? Because that's kind of the best we could do if we put them end to end as tightly as possible. And just to give you some numbers: for instance, if L is 10 milliseconds, then the bandwidth, the number of L's we can handle, is about a hundred operations per second; that's just one over 10 milliseconds.
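To make that access-time model concrete, here is a minimal Python sketch of the seek plus rotational-latency plus transfer calculation just described. The specific parameters (4 ms seek, 7200 RPM, 150 MB/s, 4 KB request) are illustrative assumptions in the same ballpark as the numbers above, not the specs of any particular drive.

```python
# Rough disk access-time model: seek + rotational latency + transfer.
# All parameter values are illustrative assumptions, not real drive specs.

def disk_access_time_ms(seek_ms=4.0, rpm=7200, transfer_mb_per_s=150.0,
                        request_bytes=4096):
    """Average time (ms) to service one request of request_bytes on a spinning disk."""
    rotation_ms = 60_000.0 / rpm              # time for one full rotation
    rotational_latency_ms = rotation_ms / 2   # on average, wait half a rotation
    transfer_ms = request_bytes / (transfer_mb_per_s * 1e6) * 1000.0
    return seek_ms + rotational_latency_ms + transfer_ms

# Example: a 4 KB read on a 7200 RPM disk with a 4 ms average seek
print(disk_access_time_ms())   # about 4 + 4.2 + 0.03, roughly 8.2 ms
```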
On the other hand, if L is two years, then the bandwidth is 0.5 ops per year, et cetera. Now, this idea applies to a processor, a disk drive, a person, a TA, what have you. It applies to getting burgers at McDonald's: L is the amount of time it takes to get a burger, and you can compute the maximum number of burgers that can be pulled out of McDonald's, for instance, okay? We'll get back to McDonald's in a moment. So we could take that L, which is a total operation, and we could divide it into a series of parts, say three equal parts. And then we could imagine that those three equal parts are actually handled by three different stages of some device or some pipeline, what have you. So this should sound a little bit like 61C. And so in that instance, here is our L, which is spread over these three things. But since we're pipelining now, notice what happens. We have the blue part, the gray part and the green part. And so after you finish the blue part of the first request and it's on to the gray part of the first request, then we can start the blue part of the second request, and so on, okay? And that's gonna overall allow us to do more things per unit time. So it's gonna up our throughput, okay? And so for instance, if you have a pipeline server like this with K stages and the total task length is L, then we actually end up with time L over K per stage, and the rate is K over L. So again, we had L equal to 10 milliseconds, but now if we can divide it into say four pieces, then the bandwidth might be 400 ops per second; or if L is two years and K is two, then our bandwidth would be one op per year, okay? And so this is just noticing the fact that when we pipeline, we can get more items per unit time shoved down that pipeline. And of course, all of the things that we talked about in 61C apply, in that if all of these pieces aren't equally the same size, then you're gonna get bottlenecked, excuse me, by the large one, the one that takes the most time, okay? And so that's gonna be a problem. Now, example system pipelines are everywhere. So in 61C you basically talked about the processor pipeline. Here, you can imagine that, for instance, you have a bunch of user processes, they make a syscall, they put things into the file system, which queues them up; that's a pipeline doing file operations, which then leads to disk operations, which then lead to disk motion, okay? Or in communication: typically you've got a whole bunch of queues throughout the network, and those queues all feed into one another, and you have a lot of routers, and the routers are all working in parallel. And so ideally, if you're communicating say between Berkeley and Beijing, you have a nice clean path with a lot of packets in the pipeline from point A to point B, and they're all moving their way along, okay? And we'll talk about that level of pipelining when we talk more about networking in a week or so. So anything with queues between operational stages behaves roughly pipeline-like, and so that analysis we were talking about applies. Now, the important difference here is that initiations are decoupled from processing. So that means that the reason I put a queue here in the first place is so that the thing producing the requests is decoupled from the thing servicing the requests. And this is extremely important in general, because request production is often very bursty, okay?
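As a small sketch of the 1/L versus K/L point, here are the numbers from the examples above expressed as arithmetic; the unbalanced-stage times in the last call are made-up illustrative values.

```python
# Throughput of a single deterministic server vs. a K-stage pipeline.

def throughput_single(L_seconds):
    """Max ops/sec for one server that takes L seconds per operation."""
    return 1.0 / L_seconds

def throughput_pipelined(L_seconds, K):
    """Max ops/sec when an L-second task is split into K equal stages."""
    return K / L_seconds

def throughput_unbalanced(stage_times):
    """With unequal stages, the slowest (longest) stage is the bottleneck."""
    return 1.0 / max(stage_times)

print(throughput_single(0.010))                       # 100 ops/sec for L = 10 ms
print(throughput_pipelined(0.010, 4))                 # 400 ops/sec with 4 equal stages
print(throughput_unbalanced([0.002, 0.003, 0.005]))   # 200 ops/sec: limited by the 5 ms stage
```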
And this is certainly true with file system calls, it's certainly true with the network, it's certainly true with a number of other things. And so really we're gonna wanna be putting these queues in here to absorb those bursts, and that synchronous and deterministic model that I roughly gave you here is not reality, okay? The reality is that we're gonna have burstiness, and so a lot of things are gonna arrive quickly, not at a regular rate, okay? So another thing we can do, which we haven't talked about, is we can increase our parallelism not by pipelining but rather by putting a bunch of servers in. That has a similar effect. So in the case of these requests taking time L and not being able to be split up, if we put in, say, K different servers here, then we can get K times the number of things operating simultaneously. And notice we get exactly the same numbers here: latency is 10 milliseconds, K is four, we have four different servers, then we could get 400 ops per second, et cetera, okay? So there is the option to up your bandwidth by adding more servers, or up your bandwidth by pipelining; those two things are kind of duals of each other, and it depends on circumstances as to which one is good. Now, parallelism clearly comes into play, for instance, when we have lots of individual disk drives; it'd be great if certain things can be done in parallel. And a couple of lectures from now we're gonna talk about things like putting in a log to give us better durability when things crash, and it'd be great if we could have a separate disk drive to handle the log independent of the file system; that'll give us higher performance. Clearly there's a huge amount of parallelism in the network and in the cloud. And so when a bunch of people submit queries, they go throughout the network, they go to different parts of the cloud, and therefore there's a huge amount of parallelism as well, and that leads to all sorts of interesting behavior, and we'll talk about network systems in some detail in the last few lectures. So let's put together a little bit of a simple performance model. So here we have a hose, okay, and we have the latency L, which is the time per operation. So how long does it take to flow all the way through the system? That's L. The latency is from the point where a little particle of water comes in at the top until it works its way through and comes out the bottom; that's L. Bandwidth is sort of how many ops per second come into that hose or out of this pipe, and that would be operations per second, for instance, or gallons per minute, et cetera. And if B is two gallons per second and L is three seconds, then how much water is in this system, in the actual hose? Can anybody figure that out? Yep, six gallons, right? Why? Because two times three is six, and over the three seconds that a particle is in the hose you keep dumping water in, so over those three seconds you get two times three seconds' worth of water in the hose, okay. And so that's a pretty simple analogy, hopefully everybody's got it, and that turns out to be something called Little's Law, which is gonna be helpful for us, okay. So here we're talking about water, which is divisible into as many little pieces as you like. We could also talk about chunks of work.
So here's a case where each one of these little circles represents some fixed amount of work, and L is the time for us to get through the whole system now, and if the bandwidth is two operations per second coming into this system and L is three seconds, once again we'll have six operations, one, two, three, four, five, six, in the system at any given time, okay, same idea. But now we're looking at things that are quantized rather than a continuous flow like water, okay. So none of this is rocket science so far, okay. This is not intended to be complicated; it's just intended to give you a way to think about some of these flow-oriented ways of looking at things, okay. Now, Little's law is a way to formalize that, okay. And so Little's law talks about a system, which is this cloud; arrivals come in at a certain rate, and now instead of bandwidth, which is maybe a more familiar thing for you all to think about, we're gonna talk about lambda, which is the rate of things arriving, okay. And so just think of this as a different symbol for B. There's a length of time you're in the system, and there's the number of things that are in the system at any time. So things come in, they're in the system, they depart, okay. And in any stable system, stable meaning that N doesn't grow without bound and it doesn't shrink down to zero, on average the arrival rate and the departure rate are equal to each other, okay. So lambda is arrivals per unit time, departures are departures per unit time, and on average the same number of things come in as go out, so that this is a stable system. And when we talk about this probabilistically, what we're saying is that on average N is stable; it's neither growing nor shrinking. So we're not limiting ourselves to deterministic systems where N is always exactly the same amount, but on average it's stable, okay. And so Little's law basically says the number of things in a system is equal to the bandwidth times the latency, or N is equal to lambda times L, okay. And this is universally applicable: no matter what the probability distribution of the arrivals is you can use this, and no matter what the distribution of L is, so maybe not everything takes time L to go through the system, you can still multiply it out and figure out how many jobs there are. And sometimes I go through a full probabilistic proof of this; I decided not to do that tonight, but if you look at my slides from last term, you can see that proof. Now, one way to think about this is the hose analogy that I just showed you, right. The other is that I like to think of this as the McDonald's law, okay. So imagine that what happens is a huge bus of people shows up at a McDonald's, they all get out and they form a line, okay. And so the bus causes a certain rate of people to come in; that's lambda. And there's a certain line that goes in the door and up to the front counter, okay. And if you come to the door and you look and you see so many people in front of you, and you wait in line, then on average the same number of people are coming in after you. If you looked from the door in, and then you got to the counter and you turned around and looked back, there ought to be the same number of people there, because it's a stable system. And so the way to think of that is you take the rate at which they're coming through the door times how long you waited, and that tells you how many people ought to be in the line, all right.
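Here is that same idea as one line of arithmetic, using the hose numbers from above; the second example (10 requests per second with 25 ms of latency) is just an assumed illustration.

```python
# Little's Law: average number in the system = arrival rate * average latency.

def jobs_in_system(arrival_rate, latency):
    return arrival_rate * latency

print(jobs_in_system(2, 3))        # 6 gallons in the hose (2 gal/s flowing for 3 s)
print(jobs_in_system(10, 0.025))   # 0.25 requests in flight at 10 req/s with 25 ms latency
```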
And so that's the McDonald's Big Mac equation here, Little's law, all right. Questions? Okay. So the thing about this law is you can apply it to any number of things. You can draw a box around something and call it a system. It could be the queues, it could be the processing stages, it could be whatever you choose to draw your box or your cloud around. Average arrival rate times average latency gives you the average number of jobs in the system. L is the time it takes from when an arrival comes into the system to when it departs. Okay, so again, in the McDonald's analogy you come to the door, and from the point at the door until you get to the counter, that's L, all right. And if you turn around and look back, the number of people behind you is hopefully the same as the number of people that were in front of you when you got to the door, good. Now notice L has something to do with how happy we are or how annoyed we are, right. If L is really long and it took us a really long time to get our hamburger, we might be annoyed. If L is short, we might be happy. And so L is that service time we're interested in: how long did it actually take from the point at which we submitted our request to when we got our hamburger, or got our disk request satisfied? That's L, okay. And we're kind of interested in keeping L as short as possible, obviously, all right. Any other questions? Why should we expect the system to be stable? That's a good question. The reason we expect the system to be stable is because if it's not stable, the math is much messier. But in reality, there is a queuing theory, which we're gonna talk about, which has to do with stable systems. And in a stable system, if you can come up with lambda, the arrival rate, and a service rate, which we'll talk about, then assuming that things are arriving at rate lambda, you can compute something about L, okay. If you're talking about what happens when the system first turns on and starts up, or maybe the buses stop arriving at five at night and the system drains, those transient analyses are much more complicated, and that's a different queuing theory class, okay. So that's complicated. So this is related to E120 system stability, okay: bounded input leads to bounded output. But obviously the other thing that's at issue here is the type of queuing we're gonna talk about; we're not gonna put a bound on the queues to start with, because the math is a lot simpler, okay. So if you wanna have some really interesting discussions about queuing theory, there are several classes on the EE side that do it much more deeply. What I wanna do is give you enough to do back-of-the-envelope calculations, all right.

So, all right. Now, let's talk briefly about administration. Midterm two: we're still grading it. It seems like people thought it was long but maybe easier than midterm one; I hope so. We mostly had people complying with the screen sharing. If you didn't, we'll probably be getting back to you, because that was definitely a requirement. But we're hoping, I think, to have the grading done by the end of the week, maybe sooner. I know that they're well on the way to being through the grading, so that'll be good. The other thing is, I didn't put this on the administrative trivia slide, but there is a midterm survey out. So please give us your thoughts on how the course is going.
We're roughly a third of the way through, I mean, two-thirds of the way through. So let us know, and we'll see what we can do to help make the end of the class as easy and pleasant as the beginning of the class, all right. The other thing, of course, that's really important: tomorrow, vote if you have the chance, okay. That's one of the most important things you can do. If you're allowed, don't miss the opportunity. I know it sounds silly, but people often say that if you don't vote, you don't get a chance to complain about how bad things are. I would say that's true. And my comment here has nothing to do with what you vote for or who you vote for; that's totally up to you, but it's important, if you have the option, to exercise your chance to vote. So tomorrow is it, and then we get to see; I'm not sure what's gonna happen tomorrow. I'm a little worried about it. Hopefully things will go smoothly; we'll find out. And yes, take care of your mental health as the results come in, all right. Share the results with somebody else, all right. I know that people are talking about actually having vote-watching parties this time so they're not by themselves when the results come in. I know that's gonna happen in my household. Okay, I don't really have any other administrivia for folks tonight unless there were any questions. Our last midterm is coming up at the beginning of December, so we have a tiny bit of breathing room, and project two is almost done. Okay, all righty. Yeah, I got the correction.

All right, moving forward. So let's talk about a simple performance model. So again, we have a request rate lambda coming in; now we're going with the queuing theory terminology. We have a queuing delay, which is how long things sit in the queue, and then the operation time T, which is the time to get something satisfied. And then we can consider the queuing delay plus the operation time as L. That's one of our options; there are many other ways to draw L, and really what we've done here is we've put the cloud around both the queue and the server in this case. And so this spinning wheel could be an example of the disk, for instance, okay. And the maximum service rate, which is how many items we can get through here per unit time, is a property basically of the system as a whole, and it's set by the bottleneck. So one of the things that we may need to look at to figure out the maximum rate at which we can serve things is: what is the bottleneck, okay. And by the way, if you have a bottleneck that slows things down, the mu max is gonna be lower than it would be otherwise, right. So bottlenecks tend to lower your maximum rate. We can talk about a utilization rho, which is lambda over mu max. So if you think about this, this is really just saying: if I have a maximum service rate and I have lambda coming in, rho is a number that varies from zero to one, which says what fraction of my maximum service am I trying to handle right now? Okay. So if lambda is bigger than mu, then I've got a problem, okay. So this utilization here is a number that has to be less than one. So this is the correct ordering for the question in the chat, okay. Now, if you think about it, why is that? So lambda might be something like one hamburger per second. Mu might be a maximum of two hamburgers per second. Then the utilization would be half of the hamburger production capacity there, all right. Good.
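A minimal sketch of that utilization calculation, using the hamburger numbers just mentioned:

```python
# Utilization rho = lambda / mu_max, using the hamburger example from the lecture.
arrival_rate = 1.0       # lambda: hamburger requests per second
max_service_rate = 2.0   # mu_max: hamburgers per second the counter can produce

rho = arrival_rate / max_service_rate
print(rho)   # 0.5 -> running at half of capacity
assert rho < 1, "if lambda exceeds mu_max, the queue grows without bound"
```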
Now, what happens if rho is bigger than one? Yeah, requests start piling up, right? So in fact, rho bigger than one in a steady-state environment is really an unbounded and undefined situation, okay. So in the analysis that we're talking about here, the utilization is never allowed to be greater than one. In fact, the queuing theory equations that we're gonna look at in a little bit have this behavior that they blow up when rho gets to one. So as rho gets closer and closer to one, the queue is gonna get bigger and bigger, and the latency is gonna get bigger and bigger, okay. Everybody with me on that? Good. Now, how does service rate vary with the request rate? So if you look here, mu max is basically the maximum number of items per unit time that I can handle. But if I ask for less, I'm not gonna handle as many, right? So let's assume for a moment that, again, mu max is two hamburgers per second and I only ask for one hamburger per second. I'll look up on this graph and what I'll see is, oh, I'm only asking for one hamburger a second, so the actual service rate I get is gonna be one hamburger per second, okay, 'cause I'm not making use of all my capacity. Of course, as I get up to two hamburgers per second, that's the maximum that I can get out of the system. What happens if I ask for three hamburgers per second? Well, that's the point at which things start to build up, and I'm certainly not gonna get any more than two hamburgers a second, okay. So this break point here represents a very crude model of what happens when you ask for more than you can get. And in reality, if you were to actually look at what the service rate is, it's gonna be some smooth version of this, to the point that we're probably never gonna quite reach the full maximum because of various overheads in the system. And we could try requesting much more than mu max, but we're gonna just build up our queues and we're not gonna get any more out of the system, okay. Everybody with me? Now, a couple of related questions might be: here we have our queuing delay and our service rate, so what determines mu max, and what about internal queues? When I said queuing delay here, D, I sort of implied it was one queue, but there might be lots of queues in the system. And so to figure out mu max we need to do a bottleneck analysis. If we take a look at the pipeline situation that we were talking about earlier, remember each request requires a blue, a gray, and a green part; what that could look like in our overall system is there's a blue server, a gray server, and a green server, they each have queues, and they feed into each other. Okay, so this is our pipeline, and if they're all of equal time, so these are all equal weight, then we could come up with a service rate that's one over the length of one of these little chunks, which is, let's say, L over three or something. Now, unfortunately, it may be that these stages aren't equally balanced, and so one of them has the slowest mu max, okay? And it's gonna end up limiting the rate. So if, for instance, the third one, the green one, has the slowest mu max, then what's gonna happen is the queues behind it, and everything else behind it, are gonna build up. And so you could view this really as a full system with one queue representing everything behind it and a service rate of mu max number three. And that's the system we're gonna analyze, okay?
And so that's the bottleneck analysis, where you figure out what the bottleneck is. Now, if the gray one were the bottleneck, what's gonna happen is things are gonna come out of it slower than they can be handled downstream, and so the queue after it isn't gonna build up; the queues behind it will, okay? And so in the bottleneck analysis you have to figure out what the bottleneck is and use that to figure out what mu max is, all right? And really, once we've found the bottleneck, we can think of this in this other, simpler way, okay? Each stage has its own queue and maximum service rate. Once we've decided the green one is the slow one, the bottleneck stage basically dictates the maximum service rate, and we'll look at this as a single queue with a server that has mu max number three. All right, questions? Now, for instance, let's look at something that we talked about earlier in the term. Here we have a bunch of threads; suppose there are P of them, okay? And they're all trying to grab a lock, and that lock has some service time, which maybe requires going into the kernel, doing something, and coming back out. And so what happens is the locking ends up serializing us on the locking mechanism, okay? So there's a question here, let me back up for a second. I didn't say in this example that these rates are necessarily greater than lambda. All I said is that mu max three is slower than mu max two and mu max one. Hopefully that was clear. We're basically characterizing the service side of this situation, not the request side. The request side is still lambda, okay? Now, if it turns out that lambda is greater than mu max three, then we're in trouble, okay? So maybe that's why you were thinking that, all right? So back to this example. This is kind of an Amdahl's law thing, right? We've got all this parallelism, but the serial part is causing us trouble. The other way to look at this is that we have X seconds in the critical section, and so we have P threads times X seconds. The rate is one over X ops per second. It doesn't matter how many cores we've got; this could be a 52-core multi-core processor, it doesn't matter, because all of these threads grind to a halt while they're trying to grab this lock. And so that's why it's an Amdahl's law kind of thing, but my rate is one over X ops per second, okay? So this is certainly an example we can think about here. Mu max is one over X in this case, okay? And the threads get queued up there. And if we have threads coming in at a rate faster than one over X, then we know that the queue is gonna build up without bound and we're never gonna make it. Okay, so that analysis is hopefully familiar from earlier in the term. But we're gonna move on; we're gonna talk about devices as well. So the other thing we've been looking at here is that mu max is the service rate of the bottleneck stage. And so we can think of it, as I said, as if we really only have a single mu max server and a queue, and that basically is a good model for a bunch of queues, obtained by modeling only the bottleneck stage, okay? So the tank here represents the queue of the bottleneck stage, including the queues of all the previous stages. In case of back pressure, basically what happens is when queues build up, they sort of back up to the previous stage and the one before that and the one before that. And if you were to take all of those queues behind the bottleneck queue, that's kind of what this tank is representing, okay? That's the big queue.
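A tiny sketch of both points just made: the slowest stage sets mu max, and a shared critical section caps throughput at one over X. The per-stage rates and the 2 ms critical-section time are made-up illustrative numbers, not measurements.

```python
# Bottleneck analysis: the stage with the smallest service rate sets mu_max.
stage_rates = {"blue": 500.0, "gray": 400.0, "green": 250.0}  # ops/sec (assumed)
mu_max = min(stage_rates.values())
print(mu_max)   # 250 ops/sec: green is the bottleneck, so queues behind it build up

# Lock example: if every thread spends X seconds in the critical section,
# total throughput is 1/X regardless of how many threads or cores there are.
X = 0.002            # 2 ms per critical section (assumed)
print(1.0 / X)       # 500 lock acquisitions/sec, no matter what P is
```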
Now, it's useful to apply this model to all sorts of things. We can apply it to the bottleneck stage, we can apply it to the entire system up to and including the bottleneck stage, or to the entire system. There are many different ways of drawing boxes and saying, well, what's the queue in this scenario? What's the bottleneck stage? Okay, so there was a question: why do the queues behind the bottleneck stage back up too? The answer is they do that only if the queues are finite in size, and so behind the bottleneck stage, when that queue fills up, it's gonna prevent anything further from coming out of any of the previous servers, which are then gonna back up, and so on, okay? So that would be true if each queue had a maximum capacity, which in reality they usually do. And so let's talk about latency for a second. The total latency is queuing time plus service time. This is, again, the McDonald's analogy, right? Here's the front door, okay? You go through the queue, you get to the checkout counter, you get your hamburger, however long that takes to process, and you exit; that's the total latency, okay? And the service time depends on the underlying operations. So if this is a CPU stage doing processing, it could depend on how much computation's involved. If it's an IO stage, it could depend on the characteristics of the hardware: if it's a disk, it could depend on the seek time plus rotational latency plus the bandwidth coming off the disk, okay? So there are many different types of servers we could worry about here; they're all roughly equivalent in this model. And so what about this queuing time? We still haven't figured out how long things are in the queue. Now, if we were to ignore the previous discussion about queues backing up and instead allow this queue to be arbitrarily large, then it's kind of an interesting question: how big is the queue on average? How many items are in the queue? And that's something where we need to pull in some queuing theory. Now, the queuing theory I'm gonna give you in this class is gonna be something that you can just apply. I'm not gonna really derive it, although there are some references that I'm gonna give you at the end which show the derivations, and they're pretty straightforward, because this is simple queuing theory. So let's take a look at our system performance model now. We have lambda, which is items per unit time coming in. We have queuing delay, which is the time you sit in the queue. We have operation time, which is the time to actually do the operation, or service time. And then we have the service rate mu, and mu max is gonna be the one that we're really talking about, 'cause that's the bottleneck. Okay. And again, utilization is rho equals lambda over mu max. And we've already said that rho better not get to be bigger than one, or we have some serious problems. And in fact, in the model you'll see in a bit, if rho equals one, we also have essentially infinite latency, so the latency gets really big. Okay. So when will the queue start to fill? Well, the queue is gonna start to fill when we're busy servicing something and something else comes in, right? So, some questions about queuing. We could say, well, what happens when the request rate exceeds the maximum service rate?
We already did that: the queue's gonna fill up. Short bursts can be absorbed by the queue if on average lambda is less than mu, okay? And so we don't actually require that lambda is always smaller than mu max. What we say is that on average lambda is less than mu max, okay? And actually, we can start talking about a probabilistic service time, which in fact we will in a bit, and a probabilistic arrival rate. Those two things, arrival rate and service rate, can be probabilistic averages, and as long as the average lambda is less than the average mu, then we're good, okay? It's only if we have prolonged periods of lambda greater than mu that we have problems, okay? So let's talk about a simple deterministic world here. A deterministic world, which unfortunately we don't live in these days, is as follows: we have a queue, arrivals come into the queue, we spend a total of T sub Q time in the queue, and then we have the service time T sub S. And there are some numbers over on the left you can see here. So let's suppose in the deterministic world somebody comes in every T sub A, every T sub A without fail and with no probabilistic variation. So now we can say that lambda, which is the rate at which people are coming in, is one over T sub A. And for the service time T sub S, mu is, well, K over T sub S if there are K servers there, okay? And then finally the total time L is equal to T sub Q plus T sub S. So if I wanna say what's my total time to get my hamburger, it's the time in the queue plus the time to be served, and that's how long I'm in the McDonald's, okay? Now, if we take a look here, what have we got? An item comes in every T sub A, okay; so this is what McDonald's looks like at maybe 2:30 in the afternoon or something, right, when hardly anybody's coming in. A new person comes in every T sub A, you spend a very short time in the queue, in fact it's probably just the time to walk from the door to the counter, and then it takes some service time to get your hamburger. And notice that the important thing here is that this service time T sub S, which gives me my maximum service rate of one over T sub S, is shorter than T sub A. So we're making sure that, in the time it takes to get the hamburger, you're completely done by the time the next person's ready to go, okay? Otherwise you start building up the queue, okay? And since we're pipelining, the time sitting in the queue versus the service time works out okay as long as T sub S over T sub A is less than one in this instance, okay? And this is totally deterministic; there are no probabilities here at all. And so in a deterministic world we have rho, the utilization, going from zero to one, okay? Which is lambda over mu, which is T sub S over T sub A. Okay, looking back here, notice T sub S over T sub A is gonna be our utilization, okay? And if we look here, as our utilization goes from zero to one, our delivered throughput, measured against a maximum of one, goes from zero to one. So what do I mean by that? Our maximum throughput here is one item every T sub S, okay? And so if we shove a new item in every T sub S, we would end up with a delivered throughput of one and a utilization of one. And so this point here is the point at which everything's coming in at the maximum rate it can without building up the queue, okay?
And then we've got the saturation we saw earlier: the point at which your utilization gets bigger than one, now you're building your queue up, and basically people are out the door and down the street and around the block, okay? And in this deterministic world, if you look at queuing delay as a function of utilization (this axis should actually say utilization, sorry about that), once you push past full utilization, the queuing delay basically starts growing without bound, and so does the time it takes to get your hamburger. Now, let's look at what happens with bursts, okay? The nice thing about deterministic is it's very easy to understand, right? You can clearly see that once you have too many items coming in, so that they're coming in faster than the service rate, then you've got a problem and you can no longer keep up without building up your queue, okay? So if we look at a bursty world, we've got a different problem, okay? In the bursty world, notice the arrivals are coming in, the server is handling things, but now the time between arrivals is gonna be random, okay? It's gonna be a random variable, and so people are gonna arrive in a second and then in three seconds and then in two seconds; there's gonna be variation in the time between arrivals, and now things look a little different, okay? So look what happens here. Somebody arrives, they get through the queue, and now the hamburger is being cooked up and they're waiting for the hamburger, but meanwhile somebody else comes in, and they came in at this point, okay, right after the blue one. So the blue one's being served, the white one came in, and now the white one can't be served. Why is that? Well, because the blue one's being served. So all of this time, from when the white one came in to when the blue one is done, the white one's waiting; and meanwhile an orange one came in, now the orange one's waiting, and a light blue one came in, and now the light blue one's waiting, and they're all waiting for the original person to get their hamburger, okay? And once the original person gets their hamburger, now the white one gets their hamburger, okay? And that's gonna take the time to make a hamburger, and then the orange one gets theirs, and then finally the light blue one gets their hamburger, and there might be some space here where nobody's coming in, and then we might start over again. And notice in this scenario the average number of customers per unit time could be exactly the same as in the deterministic one, except we have some burstiness where a bunch of them come in and then we have empty spots where nobody comes in. And if you notice what happens here, the blue person is very happy because they get their hamburger in the normal time, but white is not so happy, because white waits from the point they came in until a much later point to get their hamburger, because they're sitting in the queue. Orange is even worse, right? Orange comes in and they have to wait until here to get their hamburger, and light blue has to wait until there to get their hamburger. So light blue is really waiting a long time, okay? And so just the addition of burstiness, even with the same average inter-arrival time T sub A, okay, gives us a hugely increased waiting time. Okay, questions. Just randomness on the input.
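Here is a tiny trace-driven version of that picture: the same number of customers over the same stretch of time, but one schedule is bursty. The arrival times are made up; the service time is fixed at one "hamburger time" per customer.

```python
# Same average arrival rate, deterministic vs. bursty arrivals, single FIFO server.

def waits(arrival_times, service_time=1.0):
    """Time-in-system (queueing + service) for each arrival."""
    server_free, out = 0.0, []
    for t in arrival_times:
        start = max(t, server_free)        # wait if someone is still being served
        server_free = start + service_time
        out.append(server_free - t)
    return out

evenly_spaced = [0, 2, 4, 6]         # one customer every 2 time units
bursty        = [0, 0.5, 1.0, 6.5]   # same 4 customers, but 3 arrive almost at once

print(waits(evenly_spaced))   # [1.0, 1.0, 1.0, 1.0]  -> nobody ever queues
print(waits(bursty))          # [1.0, 1.5, 2.0, 1.0]  -> same load, longer average wait
```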
Okay, so does everybody see how it is that white here comes in right after the blue one, but now they're sitting in the queue all this time, and then they get to be served, and then they're done? So white is basically waiting from the point they come in the door until here before they have their hamburger, and blue just waited a short time. So the average waiting time is much longer than in the deterministic case, yes. Even though the average number of people per unit time, even though lambda is the same in the two cases, when there's burstiness in the arrivals the average waiting time goes up, okay? Yes, pretty strange, right? Randomness causes all sorts of weirdness. Now, of course, the other thing is we'll talk about average waiting time, which is really, you know, blue's time from the moment they came in to when they have their hamburger, versus white's time till they have theirs, versus orange's, averaged over the whole system; that's going to be a number that we're going to compute in a moment, all right? Now, requests arrive in a burst, so the queue actually fills up, whereas in the previous deterministic case, with all of the parameters the same, there's never anybody in the queue, right? Somebody comes in, and their queue is really kind of a null queue, because they just have to walk to the counter and they get their hamburger; there's never anybody waiting, ever, okay? So that's a case where the queue is basically not filling up at all, whereas in the bursty case we actually fill up the queue. Here you can see the queue has depth three at this point: when the light blue person has come in, you have white, orange and light blue sitting in line, and then you only have orange and light blue, and then just light blue, and then nobody, okay? Good, I don't want to belabor that point. So, same average arrival rate, but almost all the requests experience large queuing delays, even though the average utilization is low; on average we're not necessarily using all of the hamburger-making capacity we could, but people coming in bursts means they end up waiting in line. And if you think about this, this is really your common experience: when everybody shows up at noon at a Peet's coffee, you have that queuing problem, right? And that queuing problem is because of the burstiness of the arrivals. Now, how do you model burstiness of arrival? The time between arrivals is now a random variable, and there is a lot of elegant math that we're not going to go into in great detail, but one of my favorites is the thing called a memoryless distribution, okay? And so this is: what is the probability that the time between the first guy that arrived and the next guy that arrived is a given value? It has an exponential curve that looks like this; in fact the probability density is lambda e to the minus lambda x, and that's what I've plotted here, okay? Lambda in this instance is the arrival rate, okay? And this shows you the probability distribution of the time between the first guy and the second guy, and the question is, why do they call this memoryless? Well, the reason they call this memoryless is, if you remember your probability: if I were to say, well, I've already been waiting for two units of time, what's my conditional distribution given that I've already waited for two? So I cut off the first two units and rescale everything, and what you see is it's exactly the same curve.
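A quick numerical check of that memoryless property, using Python's exponential sampler; the arrival rate of 0.5 per unit time is an arbitrary assumption chosen only for illustration.

```python
import random

# Memorylessness of the exponential distribution: conditioned on having already
# waited 2 time units, the remaining wait looks just like a fresh wait.

random.seed(0)
lam = 0.5   # assumed arrival rate; mean inter-arrival time is 1/lam = 2
samples = [random.expovariate(lam) for _ in range(200_000)]

mean_wait = sum(samples) / len(samples)
remaining = [x - 2 for x in samples if x > 2]       # condition on "already waited 2"
mean_remaining = sum(remaining) / len(remaining)

print(mean_wait)       # ~2.0
print(mean_remaining)  # also ~2.0: the two units already waited tell you nothing
```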
So the reason they call it memoryless is that the amount of time you've already waited says absolutely nothing about how long you're going to keep waiting, and that's just like buses in Berkeley, right? You've waited for an hour, and that tells you nothing about how much more time you're gonna wait, because it's a memoryless distribution, all right? And the mean inter-arrival time, the average amount of time between arrivals, is one over lambda, okay? There are lots of short arrival intervals, a lot of short ones and a few really long ones, and the tail's really long, okay? So I understand the buses in SoCal are better or worse than in Berkeley; what's the implication there? Worse, SoCal buses are dead. Well, all right, I guess in the memoryless model we're assuming that the bus will eventually come; it may be days later, but at least it'll show up. So anyway, here's what's cool about memoryless distributions. If you don't know anything about the probability distribution for arrivals, but you know that there's a bunch of factors that all feed together to generate the random variable, then you can often model it as a memoryless distribution without knowing anything else, okay? So for instance, here's how we often use it: you have a bunch of processes that are all making disk requests, and they're all random about it, but they're not correlated in any way, and they all submit at random times and what have you; but if you look overall at the rate at which requests are submitted, there's a rate there, so many requests per second. Then you can figure out what that requests-per-second number is and model the arrivals as a memoryless distribution, and that gets you somewhere; it may or may not be perfect, but at least it's a start, okay? And so people often use memoryless distributions to model input distributions when the only thing they have is the rate of arrival, okay? But the thing to realize is that you get lots of short intervals and then some really long ones, so long tails, okay? And so in the simple performance model here, the queue has a rate lambda in and a rate mu out, and when the rate in is faster than the rate out, the queue basically grows; if you think about it, that makes sense, because we have a rate in and a rate out, and the queue is growing on average at lambda minus mu. Now, let me very quickly remind you of some things and then we'll put up a queuing result. One thing to remember: if we have a distribution of service times, so think of this as the disk, how long does it take to get something off the disk? We can talk about a couple of things. There's the average or mean, right? That's the sum over the possible times T of the probability of T times T, and that's the mean, the center point of the distribution, and you can think of this as exam scores, right? And then there's the variance, or the standard deviation squared, okay? And that represents how far the distribution spreads away from the mean. So you could have a peak where everything's at the mean, and then the standard deviation would be zero; otherwise it tells you about the spread, okay? And those two items hopefully are very familiar to you from exams and everything, right? What's the average of the exam? What's the standard deviation? This sigma squared, the standard deviation squared, is called the variance. That's a little easier to compute.
So usually you compute sigma squared and then you take the square root to get the standard deviation. And then the squared coefficient of variation is an interesting one, which I'm sure you probably have never seen, and that's where you take the variance divided by the mean squared, and that's a unitless number, right? And the thing that's funny about C is that, no matter how complicated the distribution is, you can learn a lot about it based on C without knowing anything else, okay? Now let me pause here for a moment, because I'm assuming this is mostly review for you guys, but are there any questions on this simple thing, right? So what's on the x-axis is how long you're waiting for the bus, for instance; each of these little slices underneath is the probability that you're gonna wait this amount of time, or this amount of time, or this amount of time, and there's a way to compute the mean, which is here, and a way to compute the standard deviation, and those tell you both the average amount of time you wait and the spread, okay? And the key thing about memoryless distributions is their exponential shape, okay? It means that you don't learn anything from knowing how long you've already waited, okay? P of t is the probability that you wait time t. So if you were to look at this as a curve where everything sums to one, then pick a t, like here's t, say you waited two hours; the height of the curve there is P of t, does that help? In the continuous case you'd take an integral, so these things that are shown as sums would be integrals, yes, exactly. Correct; now for this memoryless distribution it actually turns out that C is one, okay? Because the variance and the square of the mean are equal to each other, so C is one. So oftentimes when you see a C of one, you actually have something that's behaving like a memoryless distribution. Even the other, weirder things that don't look like this curve but have a C of one will oftentimes behave, from a queuing standpoint, as if they were memoryless, which is kind of interesting. When C is one, the past says nothing about the future. When there's no variance, which is the deterministic case, C is zero; why is that? Well, that's because the variance is zero, and therefore C is zero. And then there's another case: if you have a C equal to 1.5, for instance, and typical disks have a C of about 1.5, that's a situation where the variance is a little wider than memoryless, and so you end up with a slightly different distribution, but that's typically what people see on disks. Okay, so to finish this off: now if you think about queuing theory, we've been leading up to this anyway, you can imagine a queuing system where you draw a box around a queue and a server. You have arrivals and departures. The arrivals on average equal the departures on average, otherwise the system blows up. The arrivals have a probabilistic distribution, the service times have a probabilistic distribution, and what we're gonna do is try to figure out how big the queue is on average. Okay, and so for instance, with Little's law applied to this, we can say that if we know the amount of time spent waiting in the queue, T sub Q, then T sub Q times lambda, the rate at which things come in, will tell us how long the queue is. So if we can compute one of these quantities, we can get the other one pretty easily.
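As a sketch of the arithmetic for mean, variance, and C, here is a made-up discrete distribution of service times; the values and probabilities are assumptions purely for illustration.

```python
# Mean, variance, and squared coefficient of variation C = variance / mean^2.
service_times = {10: 0.5, 20: 0.3, 60: 0.2}   # time (ms) -> probability (assumed)

mean = sum(t * p for t, p in service_times.items())
variance = sum(p * (t - mean) ** 2 for t, p in service_times.items())
C = variance / mean ** 2

print(mean, variance, C)   # 23.0, 361.0, ~0.68
# C = 0 for a deterministic server, C = 1 for a memoryless (exponential) one,
# and disks are often around C ~ 1.5.
```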
So perhaps we're interested in seeing whether we can compute T sub Q, and then we can figure out the length of the queue later, okay, just by using Little's law. All right, so some results. The assumptions are, first of all, that the system is in equilibrium, as we talked about earlier; there's no limit to the queue; and the time between successive arrivals is random and memoryless, okay, on the input. So we're going to go back to our notion that memoryless here represents a situation where you have a bunch of random, uncorrelated things that all sum together and are coming in; we'll call it memoryless with some lambda. And our queuing theory is gonna assume that, okay. The service times are gonna have an arbitrary distribution, but the input's gonna be memoryless, okay. So if you look here, we have an arrival rate lambda, which is a memoryless distribution, and a service process which could have an arbitrary distribution, like a disk drive; T sub S is the average time to service a customer, and mu is one over that, okay. C is gonna be the squared coefficient of variation of the service time. And so in a typical problem you're gonna get a couple of these variables and you'll have to compute the others. So oftentimes, for instance, you might have to figure out what C is, and usually you have a very clear way to figure that out: this is a deterministic service time where it always takes exactly the same amount of time, so C equals zero; or this is a memoryless service time, so you know C equals one; or it's something else and we tell you what C is, okay. So usually you'll be able to do that pretty easily. Notice that if you know the average time to serve something, you can take one over that to get mu, or if you know mu you can take one over that to get the average service time. So these are related to each other, and typically given three of these variables you can get the other two, okay. And so, for instance, a memoryless service distribution gives you what's often called an M/M/1 queue; this is where not only is the input memoryless, but the server is memoryless as well. And in that M/M/1 queue, where C equals one, the time in the queue is rho over one minus rho times the service time. So if your disk on average took a second to service a request and you know what rho is, say rho is a half, okay, then a half over one minus a half is one, which says that the time in the queue is about one second, the same as the service time, right. So that's the very simple M/M/1 result, and amusingly enough, if you have a general service time, which is not memoryless on the server side, you just multiply by one plus C over two. So the only difference is that C now varies if you have something general, and if you notice the difference between the first equation and the second one, if C equals one, then one plus one over two is one, and so the second one merges into the first one when C equals one and you have a memoryless input, okay. Now, yes, 126, there are some similarities here. Fortunately we're not gonna go any further than this, okay. Are the dashes part of the equation? No, I'm sorry, this is a little confusing; the dashes are just part of the PowerPoint formatting here. I realize that's confusing, my apologies. In fact, you know what, I'll fix the slide when I put up the PDF so that it doesn't have the dashes there, because I agree that's bad. So here are some results.
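Those two formulas as a single helper: with C = 1 it reduces to the M/M/1 case, with C = 0 it's a deterministic server. The one-second service time and rho of one half are the example just discussed.

```python
# Queueing delay with memoryless arrivals:
#   M/M/1 (memoryless service, C = 1): Tq = Ts * rho / (1 - rho)
#   M/G/1 (general service):           Tq = Ts * (1 + C)/2 * rho / (1 - rho)

def time_in_queue(Ts, rho, C=1.0):
    """Average queueing delay; C = 1 reduces to the M/M/1 formula."""
    assert 0 <= rho < 1, "utilization must stay below 1 or the queue grows forever"
    return Ts * ((1 + C) / 2) * rho / (1 - rho)

print(time_in_queue(Ts=1.0, rho=0.5))         # 1.0 s: equal to the service time
print(time_in_queue(Ts=1.0, rho=0.5, C=0.0))  # 0.5 s: deterministic service waits less
```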
If we know what the time in the queue is, which we can compute based on this (if you know the utilization and the service time, you get the time in the queue), then from Little's law we can get the length of the queue, all right. We can compute rho by saying it's lambda over mu max, or lambda times T sub S. And so we can work this all out and find, for instance, that the length of the queue is rho squared over one minus rho. I hope you've all noticed this one minus rho in the denominator. That means that as rho gets closer to one, what happens to this equation? Or all of these equations, as the utilization goes to one, what do we see? Infinity, that's right. So this is a curve that blows up, all right, just like you've been seeing, okay. And so rather than the ideal system performance we saw, the moment we have some randomness on the input we no longer have that green curve. Instead, the time in the queue is rho over one minus rho times the service time, and we get this, okay. So the latency goes up to infinity as our input rate gets close to mu max, which is the same as rho getting close to one, okay. And this behavior is because these equations all have one minus rho in the denominator, and rho is lambda over mu max. So as lambda over mu max goes to one, we blow up, okay. And so this is a very funny side effect of randomness on the input, because if we had determinism on the input we would get the green curve, okay. Look at the difference; and obviously we wouldn't be going past one here either, but we would have much less of a blow-up, okay. So why does the latency blow up as we approach 100%? Because the queue builds up on each burst and it never drains out, and so you've got a problem, okay. Very rarely do you get a chance to drain. And so I pretty much think of this curve here as an indicator of all sorts of things in engineering, and in life for that matter: you never wanna get close to 100% utilization on anything, because all of the things you're gonna encounter have this blow-up behavior as you get close to 100%, and that's because there's just randomness in pretty much everything, and just that little bit of randomness causes this weird behavior, so now you've gotta worry about that 100%. And think about it: you've got a bridge that's rated at 100 tons; you don't wanna be running 99 tons over that bridge, because you know a slight bit of randomness in that weight, with some extra wind or whatever, is gonna cause the bridge to collapse and you've got a problem, okay. One thing that's interesting is what we would call the half-power point, which is the load at which the system delivers half of its peak performance, okay. Because keep in mind that what we're seeing here is latency, all right. Latency is the time from when I get in the front door of the McDonald's to when I have my hamburger; that's what I perceive as latency. However, what we do know is that when we look at this half-power point, where lambda is equal to mu max over two, that's the point at which the servers at the counter are basically handling half as many hamburgers per unit time as they could. It doesn't matter that I, as a hamburger customer, see a longer latency; I'm getting a lot of hamburgers out the door if I'm the McDonald's. In fact, as I get closer to one, I'm actually happy as the McDonald's owner, because I'm getting my maximum hamburger rate out the door.
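To see the blow-up numerically, here is the same formula swept across utilizations, along with the queue length; the 20 ms service time is just an assumed example value.

```python
# Latency and queue length vs. utilization for memoryless arrivals and service.
Ts = 0.020   # 20 ms service time (assumed)
for rho in (0.2, 0.5, 0.8, 0.9, 0.95, 0.99):
    Tq = Ts * rho / (1 - rho)       # time in queue
    Lq = rho ** 2 / (1 - rho)       # average queue length
    print(f"rho={rho:.2f}  Tq={Tq*1000:7.1f} ms  Lq={Lq:6.2f}")
# At rho = 0.5 (the half-power point) Tq equals one service time; past about 0.9
# the queueing delay dwarfs the service time.
```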
But from the standpoint of the overall system, this half-power point is often a really good place to be, because it's the point just before latency really blows up, and so it's the point at which the system is operating pretty well. Once you get to the right of that, now you've got problems and you've got to start worrying about there being basically too much load in the system. Okay, and that's when you've got to start thinking about what to do, and you can do lots of things. If I want to make lambda over mu max smaller, I could make mu max bigger, right? What's the simplest way to make mu max twice as big as it was before, in the case of hamburgers? Anybody? Add a server, exactly. Double the restaurant. If you double the number of people cooking hamburgers, you've pulled yourself back from the brink, back toward the half-power point, okay. Order from another McDonald's? Yes, you can do that too; that's another server. So the point here is that we could go for more servers, or we could try to reduce lambda. Those are two ways of improving our current situation. Okay, and I wanted to close this off a little bit. So first of all, I wanted to back up here and show you, actually let's go back to this one. If I know C, I can compute rho, and I know T sub S, then I can come up with T sub q, which with Little's law lets me figure out the length of the queue. So pretty much three items, rho, C and T sub S, or different combinations of these variables down below, give me enough to work out how long somebody waits in the queue, which gives me enough to figure out how many items are in the queue. So the thing to take away from today's lecture is that once you've figured out how to identify these different pieces, you can plug them in and get a back-of-the-envelope estimate of where you are on the curve. Okay, where are you on this curve here? Are you in the reasonable, linear area, where a slight increase in utilization doesn't blow up the time, or are you in the part where a very slight increase in utilization suddenly gives everybody a huge increase in average latency? Okay, that's what you wanna get out of these equations. And so let's take a look here just for a moment to remind you of the deterministic case. Here's a case where something arrives, it gets serviced, another one arrives, gets serviced, arrive, service, and the arrivals are deterministic with no bursts and the service time is deterministic. What I see as a result is that I can compute the average arrival rate and the average service rate: the arrivals come exactly one service time apart, so the average arrival rate is one over the service time, the service rate mu is also one over the service time, and lambda is exactly equal to mu in this deterministic situation, but it doesn't blow up. Why? Because there's no randomness on the input, right? I can service exactly 100% of capacity if the requests are packed end to end and arrive at exactly the right rate. But you can imagine this never happens in reality.
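To put rough numbers on the "add a server" point from a moment ago, here is a tiny sketch with made-up rates showing how doubling mu max pulls the utilization back toward the half-power point:

```python
# Hypothetical numbers: 95 requests arrive per second at a server that can do 100 per second.
lam = 95.0
mu_max = 100.0

rho_one_server = lam / mu_max          # 0.95  -- deep in the blow-up region of the curve
rho_two_servers = lam / (2 * mu_max)   # 0.475 -- back near the half-power point
print(rho_one_server, rho_two_servers)
```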
Instead, we have this, where even though we have the same average arrival rate, we put some burstiness in: a bunch of requests show up, then another bunch shows up, with long stretches of idle time in between. What happens is that when we get our burst, we start servicing the requests as quickly as we can because they're in the queue, and then we have this long tail where nothing happens for a while, and then we get another burst, and so on. And why do we get this response-time behavior as we get close to 100%? Because when we've got burstiness, we've got these gaps, and we never get a chance to make up for the missing time, okay? So that's why burstiness leads to this growth curve. So let me give you a little example here. Suppose the user requests 10 8K disk IOs per second, and the request arrivals and service times are exponentially distributed, so that means C is equal to one; exponentially distributed and memoryless are the same thing, right? The average service time at the disk is 20 milliseconds, which I'm gonna say is controller plus seek plus rotational plus transfer added together, 20 milliseconds on average. And so we can now ask questions like: how utilized is the disk? Rho is equal to lambda times the service time, okay? So what's lambda here? Well, lambda is 10 requests per second, and the service time is 20 milliseconds, which is 0.02 seconds; don't forget to keep your units consistent. And so rho, the server utilization, is just lambda times T sub S. So the utilization here is 0.2, and 0.2 is a low utilization, so I know that I'm doing okay, all right? And so the time in the queue is the service time times, oh by the way, I'll fix this, this should be rho over one minus rho, sometimes people use U for utilization, so the service time times rho over one minus rho is 20 milliseconds times 0.2 over one minus 0.2, and when I compute that I get five milliseconds, or 0.005 seconds. So the time I'm sitting in the queue is only five milliseconds, the service time at the disk is 20, and the total time from when I submit the request to when I'm done is 25 milliseconds, right? That's the sum, okay? And the average length of the queue here is only 0.05, so this queue is really not building up; it's got an average of 0.05 requests in it. If I make the requests arrive much faster, I will very quickly get to the point where the queue completely dominates all of the time here. All right, good. Questions before I move on? And I'll fix this: U over one minus U should be rho over one minus rho here, sorry. I switched my notation to be consistent with somebody else and I missed one. All right, good. So the average time, never forget this, right? How long do I sit in McDonald's? It's my time in the queue plus the time being served. So in this case it's the 20 milliseconds being served plus the five milliseconds in the queue, which gives me 25 milliseconds total. Okay, good. So you're now good to go on solving a queuing theory problem. And there's a bunch of good resources up on the resources page, so you can take a look at some readings and so on, okay? There are some previous midterms with queuing theory questions as well, and you should assume that queuing theory is fair game for midterm three. Now, how do we improve performance if our queue is going crazy? We can make everything faster, okay?
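Before moving on, here is the disk example above worked through in a few lines of Python; the numbers are the ones from the example, and the code itself is just added for illustration:

```python
lam   = 10       # requests per second
T_ser = 0.020    # 20 ms: controller + seek + rotation + transfer, on average
C     = 1        # exponentially distributed (memoryless) service times

rho     = lam * T_ser                            # 0.2   -> the disk is 20% utilized
T_q     = T_ser * (1 + C) / 2 * rho / (1 - rho)  # 0.005 s = 5 ms waiting in the queue
T_total = T_q + T_ser                            # 0.025 s = 25 ms from request to response
L_q     = lam * T_q                              # 0.05 requests waiting on average (Little's law)
print(rho, T_q, T_total, L_q)
```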
Well, we hire a bunch of really crazy hamburger fryers, we give them 10 times the heat on the grill and they have to flip really fast, and maybe that's faster. Or we could add more of them, okay? Steroids, that's right, hamburger flippers on steroids. We could have more parallelism; that's a more reasonable thing to do, right? We could optimize the bottleneck. Well, what is the bottleneck in frying hamburgers? Maybe it's getting the patties from the back. Who knows what it is, but we could optimize that to make the overall service times better. And we could do other useful work while waiting; that's kind of what we do with paging, where we switch to another process and run it while we're waiting for the disk to complete our paging request. Queues are in general good things because they absorb bursts and smooth the flow, but anytime you have a queue you have the potential for response-time behavior that goes like this, okay? And so queues are both a blessing and a curse from that standpoint. Oftentimes what you do is limit the maximum size of a queue, so that when the bursts are too much, you put back pressure on and slow down whoever's generating the requests by explicitly telling them they can't submit any more because the queue's full. So that's a response to a queue being too full, and a lot of systems do that as well, okay? And you can have finite queues for admission control, which is what I just said; there's a small sketch of that idea right after this paragraph. All right, questions? Now, when is disk performance the highest? It's the highest when there are big sequential reads, right? What does that mean? That means I move the head, I wait for the rotation to reach the starting point, and then I just read a whole bunch of blocks off the disk, a whole bunch of sectors, okay? Or when there's so much work to do that you have many requests, and you piggyback them together and move the disk head in a way that optimizes for the whole set of requests that are out there, rather than for individual ones, which might make you move around a lot, okay? And when the disk is not busy, it's okay for it to be mostly idle. So bursts are bad because they fill queues up, but they're also an opportunity, because if we have a bunch of requests we may be able to reorder them and get better overall efficiency out of our disks, okay? And you can come up with many other optimizations here; maybe you waste space by replicating things so that reads are faster. So when we talk about RAID, one of the things we get out of RAID is multiple copies of data, which makes reads faster under high load because we can choose to get our data off any of several different disks at a time, okay? That gives us a way to do parallelism. We may have user-level drivers to try to reduce the queuing represented by software in the kernel. Maybe we can reduce the IO delays by doing other useful work in the meantime. There are many ways of making things faster, okay? But I wanna close out this discussion. I was gonna talk a little bit about the FAT file system today, but I think I'll save that for next time. I do wanna say a little bit about scheduling to make things faster, okay? That's useful from a disk standpoint. So suppose we recognize the fact that the head assembly is stuck together, so we have to move the heads as a unit; how do we optimize for this? Because anytime we deal with mechanical movement, like moving the head or waiting for a rotation to happen, things slow down, okay?
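Here is a minimal sketch of the back-pressure idea mentioned a moment ago: a bounded queue that rejects new requests once it's full, so the submitter has to slow down. The names here are hypothetical, purely for illustration.

```python
from collections import deque

class BoundedQueue:
    """Admission control: cap the queue length and push back when it's full."""

    def __init__(self, max_len):
        self.max_len = max_len
        self.items = deque()

    def submit(self, request):
        if len(self.items) >= self.max_len:
            return False           # back pressure: tell the caller to retry later
        self.items.append(request)
        return True

    def next_request(self):
        return self.items.popleft() if self.items else None
```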
So, back to the disk: if we allow ourselves to queue up a bunch of requests, we could do the obvious thing, which is to handle them first-come, first-served. This is basically saying we go to track two, sector three, then track two, sector one, then track three, sector 10, then track seven, sector two. We take them in the exact order in which they were queued, and that would be okay, I guess, except that we could very easily have to go all the way to the inside of the disk, then all the way to the outside, and back in again, and so on, because we have a set of requests that don't have any locality to them. The alternative is to try to optimize our head movement, okay? One example is the SSTF, or shortest seek time first, option, where you pick the request that's closest on the disk. So if the disk head is here, I might go request one, then request two, then request three, then request four. What I'm doing is reordering my requests into one, two, three, four so that the disk head is kind of spiraling its way out in a single movement, okay? And although this is still called SSTF, today you have to include things like rotational delays in the calculation, since it's not just about optimizing seek; you also have to optimize for rotation. The pro of this is that you minimize your head movement as long as you have a bunch of things queued up. The con is that it can lead to starvation, because if requests keep arriving in the queue and they force the disk head to keep servicing things in one local area, maybe the inner tracks, you may never get to the outer tracks. So SSTF, even as it's limiting disk head movement, can potentially cause some requests to get stuck and never serviced, okay? That's the problem. And this goes back to our CPU scheduling discussion, where we could end up with low-priority tasks essentially never getting any CPU. What's a low-priority read? Well, a low-priority read in this case is one that's far away in tracks relative to the continually arriving requests. Now, another thing we could do, often called the elevator algorithm, is to take the set of requests and, rather than deciding the movement on the fly by looking at the queue, move in a single direction at a time. So we start at a given track, we sweep our way out, then we sweep our way back in, and so on, and as we're doing that, we grab all of the requests that are relevant to our current direction and position, okay? You can see why this is called the elevator algorithm: just rotate this picture on its side and imagine an elevator going up and down. That's exactly what happens; it sort of stops at each floor, services people, and so on. The analogue of which floor you're at is, of course, which cylinder you're on. And we make this work by sorting the incoming requests. Now, one thing we might worry about is that this has a tendency to favor tracks in the middle, because we're going out and coming back in, so a lot more time gets spent near the middle. And so there's something called circular scan, which is nominally a little better, where we always service requests going in one direction and then do a very quick seek back to the start and sweep out again, okay? So that's the circular scan, or C-SCAN. Questions?
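To make those orderings concrete, here is a small sketch, illustration only, with requests reduced to bare track numbers and rotation ignored, of shortest seek time first and circular scan:

```python
def sstf_order(head, tracks):
    """Shortest seek time first: repeatedly service the closest pending track."""
    pending, order = list(tracks), []
    while pending:
        closest = min(pending, key=lambda t: abs(t - head))
        pending.remove(closest)
        order.append(closest)
        head = closest
    return order

def cscan_order(head, tracks):
    """Circular scan: sweep outward from the head, then wrap around to the lowest track."""
    ahead  = sorted(t for t in tracks if t >= head)
    behind = sorted(t for t in tracks if t < head)
    return ahead + behind

print(sstf_order(50, [82, 14, 60, 35, 55]))   # [55, 60, 82, 35, 14]
print(cscan_order(50, [82, 14, 60, 35, 55]))  # [55, 60, 82, 14, 35]
```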
Now you might imagine asking: who does this? Well, clearly the operating system could, right? The operating system could take a look at all the requests it's waiting on and reorder them so as to do either the elevator or the C-SCAN algorithm, and thereby optimize head movement. Remember that this is only useful when we have a full queue; if the queue is empty, it doesn't matter, because we're not overloading the resource. So when there's a queue, we reorder it based on, say, C-SCAN. Now here's the interesting issue: can anybody tell me, with modern disks, whether this is something the operating system even wants to do? What could be a downside of the operating system doing this? Can anybody think of one? I think people are thinking too hard. Very good, we have some interesting comments in the chat. So first of all, the operating system has to know the head location. That's certainly an issue, and we'll talk more about this moving forward, but in modern disks the controller takes in a series of requests and does all of this reordering itself. So in many cases the modern operating system and device driver don't even know exactly where the disk head is, or how the logical block IDs actually map to physical blocks. That's one issue. The second issue is that modern controllers take a bunch of requests in and do the elevator algorithm themselves. So the problem with the operating system, and by the operating system I mean the device driver as well, trying to compute this on the host is that the disk is already doing a lot of it, because disks are much more intelligent than you might think today. While in the old days this kind of disk scheduling was definitely done by the operating system and device driver combination, today some of it is still done there, but it's somewhat redundant with what the disk can do. Okay, so I wanna finish up; actually, I think we'll pick this up next time. So in conclusion, we talked about disk performance a lot last time, and we brought it back today by talking about queuing time plus controller, plus seek, plus rotational, plus transfer time. We talked about rotational latency, right? On average that's half of a rotation. The transfer time has to do with the spec of the disk, how fast it pulls data off the platters. Technically it also depends on whether you're reading from an outer track or an inner track, because transfers are faster on the outer tracks, but usually we give you an average transfer time. This queuing time was something we didn't talk about initially, but devices have very complex interactions and performance characteristics. We talked about queuing plus overhead plus transfer, and the question of an effective bandwidth that varies by device; we talked about that last time. This queue is really an interesting thing, right? The file system, which we haven't quite gotten to, is really gonna need to optimize performance and reliability around a bunch of these different parameters. And the other thing we talked a lot about today is the fact that bursts and high utilization introduce queuing delays. And finally, the queuing latencies for M/M/1, which is memoryless input, memoryless server, one queue, and M/G/1, memoryless input, general server, one queue, are the very simplest cases to analyze.
And basically you can say that the time in the queue is the service time, times this one half times one plus C factor, times rho over one minus rho, and that goes to infinity as the utilization goes to 100%. Okay, next time we'll talk a lot more about file systems. We didn't get to them today, but we'll pick up with the FAT file system, which is still in use today, and then we'll move on to some real file systems that are more interesting than FAT. So I'm gonna bid adieu to everybody. Please vote tomorrow, very important. Try not to be stressed about it. I think it'll all work out well in the grand scheme. Alrighty, you have a good night.