Hey everybody, welcome back to 162. We're going to continue with our discussion of IO. One of the things we were talking a lot about was how a CPU, which is of course running the operating system and programs, talks to a device. We said there are various buses in the system, and we talked particularly about the notion of a device controller. The device controller is a piece of hardware that connects directly to the device and also interfaces with the various buses. These controllers can be on PCI buses, PCI Express buses, USB, et cetera. And notice that in addition to receiving commands over the bus, the controller also has the ability to talk to the CPU and deliver events through interrupts — that's where the interrupt controller connection comes in. So basically all of the communication with the device is between the CPU and the device controller.

As we were getting toward the end of the lecture last time, we were talking about a couple of different ways the CPU can communicate with a device. One was via special instructions — things like inb and outb, or inw and outw, for byte or word. Those special port IO instructions go to a special port address space, which is different from the regular address space. With the Intel processors, this is mostly a backward compatibility thing from the original IBM PCs. What I show here as an example are some ports — 0x20, 0x21, 0x22, 0x23 — which all represent registers inside the device controller that can control the device.

The other thing we talked about was memory-mapped IO, in which certain addresses within the device are actually mapped into the physical address space. As a result, by doing reads and writes, the CPU is able to effect changes on the device. In the example I gave here, maybe we're controlling a screen, and these are local physical addresses — 0x80020000, 0x80010000, et cetera — such that if the processor writes to those addresses (obviously they have to be mapped in a page table first so the writes can go through), things will actually happen on the device. For instance, writing to display memory might cause dots to appear on the screen. Writing to the graphics descriptor queue might let us assemble various items in that queue — effectively triangles for some interesting three-dimensional game or whatever. Then if we write to the command register, we can say, okay, render that in three dimensions. Or maybe we can read from 0x0007F000 to get the status of the device, right? And because these are in the physical address space, we can protect them with address translation. Under most circumstances you'd perhaps only give the kernel access to these addresses, but you could potentially give them to a process whose job it was to control that device, just with an address mapping, all right? And one of the things we did talk about a bit last time, as I recall, is that with modern buses such as PCI and USB and so on, there's an automatic negotiation that happens for the actual absolute values of the addresses, just to make sure that the physical addresses of the devices don't overlap. All right, are there any questions on that before I move forward? So we call that memory-mapped IO.
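Just to make that concrete, here's a minimal sketch of what memory-mapped IO looks like from C. The register addresses and the bit layout are hypothetical stand-ins in the flavor of the slide's example, not a real device's spec; the important part is the volatile pointers, which tell the compiler that every read and write must actually reach the device:

```c
#include <stdint.h>

/* Hypothetical register addresses, in the flavor of the slide's example. */
#define GFX_DESC_QUEUE ((volatile uint32_t *)0x80010000u) /* descriptor queue */
#define GFX_COMMAND    ((volatile uint32_t *)0x80020000u) /* command register */
#define GFX_STATUS     ((volatile uint32_t *)0x0007F000u) /* status register  */

void render(uint32_t triangle_descriptor) {
    *GFX_DESC_QUEUE = triangle_descriptor;  /* assemble work in the queue      */
    *GFX_COMMAND    = 1;                    /* made-up code for "render in 3D" */
    while ((*GFX_STATUS & 0x1) == 0)        /* spin until the done bit is set  */
        ;
}
```

Of course, those stores only work if the physical addresses are mapped into the writer's page table, which is exactly where the protection story above comes in.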
So, transferring data to and from the controller: as I mentioned, we can use either ports or memory mapping, but there's another axis to consider, which is who actually moves the data. One option is programmed IO, where every byte gets transferred by the processor. The processor sits in a tight loop and reads a byte, then the next one, and the next — one byte or one word at a time — and stores it in memory. As you can imagine, that's expensive, because the processor or core is doing all the work. The pros are that it's very simple and easy to program, and there are some low-bandwidth devices that actually interact that way — we showed you last time how programming the speaker directly can get some interesting tones out of it.

But if you really want to transfer a lot of data, the other option is something called direct memory access, where you tell the controller: go ahead and transfer data to or from DRAM and tell me when you're done. The controller can then do all of those transfers on its own. So in particular, here's an example where the CPU is going to do some IO from one of these disks. In the first step, the CPU talks to the device driver in the kernel and says, transfer this for me into some buffer in memory. The device driver will, under some circumstances, program a DMA controller, and that DMA controller will reach out to the controller for the disk; the disk will transfer bytes back to the DMA controller, and the DMA controller will write them through to memory. When all is said and done, the DMA controller will finally cause an interrupt of some sort, and that's the point at which the CPU comes back into the picture. So you can think of a DMA controller as a part of the system that acts like a CPU for the sole purpose of transferring data. These DMA controllers can either be a separate item on the bus that acts like a processor and does the transfer, or the capability can be integrated — and a lot of controllers have this these days, where the DMA aspect is built in. For instance, devices on the PCI or PCI Express bus are able to transfer over the bus directly to memory, okay?

One other thing I'll point out: if you're writing directly to memory, it's quite possible that you're writing memory that is cached in the CPU. That's an issue where we have to be very careful, because we don't want the CPU's version in the cache to get out of date relative to memory, and there are at least two options there — I didn't put these on the slide, I probably should have. One is that the device driver flushes the block entirely out of the cache before it does the DMA. The second is that a lot of systems have DMA hardware that can write to DRAM while simultaneously invalidating the cache. That's another way to make sure things stay coherent. Okay, questions? So direct memory access is an important way to get really high-bandwidth communication between devices and memory, and it leaves the processor out of the picture, okay? So how do we find out that we're done, or that, for instance, the device needs some service of some sort?
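Before we get to that, here's what the programmed IO loop above looks like in code — a sketch only, with made-up port numbers. The inb helper wraps the x86 port-read instruction we talked about, and in real life this would run in a privileged driver:

```c
#include <stdint.h>
#include <stddef.h>

#define DATA_PORT    0x1F0   /* hypothetical device data register   */
#define STATUS_PORT  0x1F7   /* hypothetical device status register */
#define STATUS_READY 0x08    /* hypothetical "byte available" bit   */

/* Wrapper for the x86 "inb" port-read instruction (GCC inline asm). */
static inline uint8_t inb(uint16_t port) {
    uint8_t v;
    __asm__ volatile ("inb %1, %0" : "=a"(v) : "Nd"(port));
    return v;
}

void pio_read(uint8_t *buf, size_t n) {
    for (size_t i = 0; i < n; i++) {
        while ((inb(STATUS_PORT) & STATUS_READY) == 0)
            ;                        /* spin until the device has a byte */
        buf[i] = inb(DATA_PORT);     /* the CPU copies one byte at a time */
    }
}
```

The DMA alternative replaces this whole loop with a few register writes that hand the controller a buffer address and a length, and the CPU goes off to do something else.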
Examples of where the operating system needs to know are when the device has completed a DMA operation — if we ask the disk to do a write, we need to know when that's done — or when an error was encountered. Okay, so the simple thing is for the device to generate an interrupt. We talked a lot about interrupt controllers in the first several lectures of the class; mostly we were talking about timer interrupts, but a device interrupt is similar. It goes through the interrupt controller and causes a dispatch to an interrupt handler that handles that particular interrupt. If it's a disk, for instance, maybe the disk generates an interrupt when it's done transferring. At that point the operating system wakes up in an interrupt handler, and perhaps it finds a process that's waiting for the device transfer to happen, wakes that process up, and puts it back on the ready queue. The pro of interrupts is that they handle really unpredictable events really well, because you don't pay any overhead until it's time for the interrupt. So that's great. The downside is that an interrupt is of course a transfer into the kernel: you change the stack, you have to save a bunch of stuff on the stack, and there's a bunch of other overhead there. So interrupts can be relatively high overhead, and if something is generating lots of interrupts on a regular basis, that can be expensive, and perhaps an interrupt isn't the right mechanism at that point.

The alternative is something called polling. The idea behind polling is that the operating system periodically looks at a register in the device — maybe by using IO instructions like we talked about, or by reading a memory-mapped register in the device controller. It checks periodically, and when a bit is set in some register, it knows the transfer is done and it can continue. So interrupts and polling are duals of each other — different ways of getting information out of a device. The downside of polling, of course, is that if the device isn't ready, you've just wasted time looking at it. The pro is that it's really low overhead, because you don't have to save and restore a bunch of registers; you're just checking a register on a device. The con is you can waste cycles if the device is infrequently ready.

And so actual devices combine both polling and interrupts. A great example of this is a really high-bandwidth network — say 10, 40, or even 100 gigabits per second these days. If you took an interrupt on every packet, you'd be in trouble. So instead, as soon as an interrupt occurs, you enter the network driver, and what it does is pull all of the packets out — including the one that originally caused the interrupt, but all the remaining ones as well; that's a form of polling — and then it re-enables interrupts when it's done. So it takes the first interrupt, then polls to pull all the packets out, and then it continues; a sketch of that pattern is below. This is how you can let something as high-bandwidth as a 100 gigabit per second network not overload a processor. Okay, great.
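Here's the interrupt-then-poll pattern as a sketch — Linux network drivers do something like this under the name NAPI, though every function name below is a hypothetical stand-in for driver and controller operations, not a real API:

```c
/* Hypothetical stand-ins for real driver/controller operations. */
struct packet;
void nic_disable_interrupts(void);
void nic_enable_interrupts(void);
int  nic_has_packet(void);                 /* does the controller hold more? */
struct packet *nic_next_packet(void);
void deliver_up_the_stack(struct packet *p);

void nic_interrupt_handler(void) {
    nic_disable_interrupts();              /* one interrupt got us here...    */
    while (nic_has_packet())               /* ...now poll: drain everything,  */
        deliver_up_the_stack(nic_next_packet()); /* incl. the packet that fired */
    nic_enable_interrupts();               /* re-arm for the next quiet period */
}
```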
So now let's take a look a little bit more. We've seen this picture earlier in the term: if you look at a typical kernel like Linux — or Pintos, for that matter — you'll see the system call interface, which is the dividing line between user code above and the kernel below. The kernel is all in blue, and there are a bunch of different facilities inside it. We talked a lot about process management and memory management in previous parts of the term, and we're actually going to be talking further about file systems — that's our next topic, starting next week — and then there are other things like networking, which we talked a bit about and will cover more in the coming weeks.

There's a question here: does the device or the OS decide whether to do polling or interrupts? That's a good question. It turns out the device typically provides both as an option, and whether you're polling or taking interrupts is really a question of whether interrupts are enabled or not. So the operating system can make that decision. It can decide to always disable interrupts and only poll, or leave interrupts enabled until the first one occurs — and of course the first thing that happens on an interrupt is that interrupts get disabled, and the kernel could choose to keep them disabled for a while while it's polling, et cetera. So it's purely the OS's decision whether to do interrupts or polling. And yes, you can selectively disable interrupts as well — that's a good question. If you take a look at the interrupt controller, there's typically a mask that lets you decide which interrupts are enabled and which are disabled.

If you look at this figure, you see that the top half of it has a standard interface — that open, close, read, write interface, where sort of everything looks like a file in Linux. But then there are a bunch of interesting things below the covers, and we need to talk more, as we go on, about how that lets us get the standardized interface above, okay? So our next topic is going to be some IO devices, specifically ones that can serve as storage devices, okay?

Now, if you remember, the idea behind a device driver — which is going to be something in the lower portion of the kernel here — is that the device driver has the device-specific code in the kernel that interacts directly with the device hardware through the device controller, which we've talked about now. And it supports that standard internal interface up to the higher levels of the kernel. That's important because it makes the higher levels of the kernel much simpler. And if you remember, we had this discussion early in the term: the device driver is typically divided into a top half and a bottom half. The top half is accessed on the call path from system calls down to making a decision about whether the device itself needs to be acted upon. So the top half implements things like open, close, read, write, the ioctl system call, and something called strategy, which is a routine that starts communication with the device itself. What lands in the top half is typically a process that's trying to do some sort of IO: it works its way into the top half to do the IO, and things potentially get put to sleep there if the device has to be invoked. The bottom half runs as an interrupt routine: it gets its interrupts from the device and decides what to do next.
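Here's a skeleton of that split, just to pin down the shape. Every primitive below is a hypothetical stand-in for real kernel machinery — it's not the actual Linux or Pintos API:

```c
/* Hypothetical kernel primitives; a sketch of the shape, not real code. */
struct process;
struct request {
    int block_number;        /* which block on the device     */
    void *buffer;            /* where the user wants the data */
    struct process *owner;   /* who asked for it              */
};

void device_strategy(struct request *r);   /* start the hardware on this IO  */
void sleep_until_done(struct request *r);  /* block the caller; run others   */
void wake_up(struct process *p);           /* put it back on the ready queue */
struct request *completed_request(void);   /* which request just finished?   */
void copy_to_user_buffer(struct request *r);

/* Top half: runs on the system-call path (read, write, ioctl, strategy). */
void top_half_read(struct request *r) {
    device_strategy(r);     /* kick off the device via its controller        */
    sleep_until_done(r);    /* the process sleeps on the device's wait queue */
}

/* Bottom half: runs as the interrupt routine when the device signals. */
void bottom_half_interrupt(void) {
    struct request *r = completed_request();
    copy_to_user_buffer(r); /* hand the data to whoever was waiting */
    wake_up(r->owner);
}
```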
Okay, and I showed you this figure, which should look familiar now. Above the system call interface, in the user program portion, we might decide to make a request. How would we do that? We might do a read or a write system call, okay? That goes across the boundary, and at that point we might ask: can we already satisfy the request? So what might be a situation where we could satisfy the request without ever talking to the device? Anybody have any ideas? Okay, cache, good. It's a specific type of cache — it's caching the device contents — and it's called the block cache. We haven't actually talked about that one yet, but we'll get to it. And if you remember, this interface for reads and writes is a byte-oriented interface, right? You can read five bytes from a file, but the blocks underneath — from, say, the disk, as we're going to talk about in a moment — are all 4K bytes at a time. So we need a place to put the blocks that we've only partially read, and that'll be the block cache. So it could be that we can handle the request straight out of the cache. Otherwise, we send the request to the device driver. The device driver figures out what needs to be invoked, and it's potentially going to put the process to sleep on a wait queue associated with, say, the disk. Then it invokes the scheduler to wake up something that's already on the ready queue, so that something else is running while we're doing the IO, okay? That's the top half of the device driver: it sends commands, invoking the strategy routine to send stuff to the device hardware, at which point the hardware just does its thing, okay? The controller of a disk drive, for instance, will start the heads moving, and at some point the operation will complete and generate a completion interrupt. The bottom half receives the interrupt, figures out who needed that data, wakes them up, transfers the data into the user's buffers, and we complete. At that point we've gone the full circuit from the original request to the response, okay? So hopefully that's familiar to everybody. Do we have any questions?

Okay, and by the way, that decision between polling and interrupts can happen partially in this top half of the device driver. The top half could decide to disable interrupts and start polling the device, asking it for data, in which case we wouldn't go to the bottom half — we'd be working directly between the top half and the device itself. The other case: if the device gives an unsolicited interrupt — because, say, it's a network card and a network packet is coming in — then we come into the bottom half, and at that point there might be a decision made to start polling, okay? And notice that in the network case, what's interesting is that no process has requested anything; instead you have an unsolicited packet coming in. So the bottom half of the network device has to do demultiplexing, where it figures out which socket a packet is headed for, okay? But that's a topic for another lecture.

So, is the device driver part of the device or the operating system? The device driver is definitely part of the operating system. Devices, however, have specific requirements, so a device driver comes with a device, but it's unique to the operating system.
So the device driver for Windows is going to look a little different from the one for Linux or Apple's iOS. Mostly that's because of its interface with the upper levels of the kernel; much of the lower-level logic will be the same. But it's definitely part of the operating system, okay? And the bottom half is not the same as the device controller, okay? Everything I'm showing you on this screen is software. I know this is a software class, but in this instance you need to keep track of the fact that the hardware is the device controller plus, say, the disk, while the top and bottom halves are software in the operating system that interacts with that hardware, all right? Great.

So the goals of the IO subsystem are to provide a uniform interface despite wide-ranging differences between devices. As we already talked about, you can fopen /dev/something — you should all look in /dev sometime. The things in the /dev directory actually are devices, and you can write a for loop reading something directly out of, say, the keyboard by opening the right /dev file. That interface works the same whether you're talking to a keyboard, a network, or other things. That's the standardized interface we're looking for, okay? And it's the device driver — the fact that the device driver provides standardized interfaces facing up — that really allows us to do that. We're going to try to get a flavor of what's involved in actually controlling devices as we go through, but we can only scratch the surface here.

First of all, there are several different types of devices, loosely divided into three categories. The first category is block devices, like disk drives, tape drives, and DVDs. These are devices that present blocks of data to the operating system, okay? That's because the underlying device itself is block-based. If you look inside a disk drive, what you'll see is a bunch of platters — we'll talk about that in a moment — and each platter has a set of sectors, which are combined together into blocks; you can't read a single byte off the disk, you have to read a whole chunk, okay? So that's a block device. Character devices, on the other hand, are fundamentally byte-oriented: you can get a byte out of a keyboard, a mouse, serial ports, et cetera, and some USB devices. So with block devices, yes, you've got open, read, write, and seek, but when you're pulling from the raw device interface, you get a whole block at a time. Character devices have things like get and put, which let you handle single characters, okay?

Now, raw interfaces are not the ones we're used to. On block devices, we're really used to going through a file system. The file system goes the extra mile of making sure that even though the devices have blocks, you can still read three bytes from a file — and that's something that sits above the block device interface. In fact, if I go back for a second to my little green figure: if you notice, under file systems, we've got the block devices down here.
The file system, which we're going to talk about as one of our next topics, takes these blocks — which are potentially scattered all over the disk — and reassembles them into what I'll call bags of bytes that you can then read and write. That's what we think of as files, living in a namespace of files, okay? So that's going to be the file system. The other devices, which are fundamentally serial, have a pretty direct interface up, because they're already byte-oriented like the interface that's provided above the system call interface.

All right, now, the last type is the network device. You might think networks ought to be either block or character devices, but it turns out they're treated as a separate type, mostly because of the way they work, okay? The way they work is they have sockets, which receive things off of networks, and, like we mentioned earlier, unsolicited packets come in and get sorted into sockets and so on. Those interfaces are a little different from both block and character devices. So network devices — Ethernet, wireless, Bluetooth, you name your favorite communication protocol — are considered their own category, and they're pretty much interacted with as FIFOs or pipes or streams of bytes, okay? Or, if you think of them in terms of mailboxes of packets, those packets are not of fixed size, whereas with block devices, the blocks are always, you know, say 4K or something like that, okay?

All right, so how does the user deal with timing, from above the system call interface? Well, up till now you've pretty much been dealing with the blocking interface, which means that if I go to do a read, the read system call waits until the data is back, okay? Basically the process is put to sleep until the data is ready. In the case of a write, this doesn't happen as often, but if there isn't enough buffer space or whatever, it'll put the process to sleep until it can actually do the write, okay? So that's what you're used to — the blocking interface — and that's also what I was describing when we went through that device driver diagram.

There are two other options here, which are often available by calling ioctl with the right parameters on a file you've already opened. One is a non-blocking interface: that's the "don't wait" interface. What happens there is that if you do a read or write and say, I would like five bytes, it will look and immediately return, regardless of how many bytes are available. It may give you back zero if nothing is available, or if you asked for five, it might only give you three. This interface is intended to be used in a polling fashion, where you keep asking until you get what you want, but you never block — you go do something else and come back and ask again if you didn't get everything. So that's the don't-wait interface, okay? And oftentimes you can turn a blocking interface into a non-blocking one with the right ioctl calls, okay?
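Here's a minimal, runnable example of the don't-wait interface. On POSIX systems the usual knob is the O_NONBLOCK flag set via fcntl (some drivers expose the equivalent through ioctl, as mentioned above):

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    /* Turn the (normally blocking) stdin descriptor into a non-blocking one. */
    int flags = fcntl(STDIN_FILENO, F_GETFL, 0);
    fcntl(STDIN_FILENO, F_SETFL, flags | O_NONBLOCK);

    char buf[5];
    ssize_t n = read(STDIN_FILENO, buf, sizeof buf);  /* returns immediately */
    if (n >= 0)
        printf("got %zd of %zu bytes\n", n, sizeof buf);
    else if (errno == EAGAIN || errno == EWOULDBLOCK)
        printf("nothing ready yet -- do other work and ask again\n");
    return 0;
}
```

In a real program you'd wrap that read in a loop that does useful work between attempts — that's the polling style of use described above.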
Finally, there's the asynchronous interface, which is a little different from non-blocking. Asynchronous says: tell me later. What you do there is you give it a buffer and say, I would like 10 bytes, and it returns immediately, regardless of whether the data is there; then later, via something like a signal, it says, hey, your data's ready, and at that point you look in the buffer. So notice how the top two are very similar to what you're used to, okay? The bottom one is very different, in that you've handed over a buffer and only later do you go back and look in it, okay? These three things are the interface from the user to the kernel. The interface between the kernel and the device is what's handled in the device driver, and that's very asynchronous, because it's all event-driven; the notion of blocking versus non-blocking, of putting things to sleep, really lives at the process level above, at the user level, right? Did that answer the question?

So now let's talk about storage devices, because they're our topic now, and we're going to move into file systems afterwards. There are at least two types of storage device that you're going to run into on a daily basis: magnetic disks and flash memory. If this were 20 years ago, I might also say tape, okay? I randomly scattered tape in there to see if anybody would notice, but tapes are much less used than they used to be. The idea of a magnetic disk is storage that very rarely becomes corrupted. It's very large capacity, and it provides block-level random access — I'll tell you about shingled magnetic recording in a moment, which is a little different. Performance is very slow if you access it randomly, though it's still possible, and much better for sequential accesses, okay? And SMR drives have very good storage density, yes, indeed.

Flash memory is slightly different, okay? Flash is becoming increasingly high density — it's still about five times disk cost, but they're converging. Block-level random access is very fast. It has good performance for reads, somewhat worse for writes, and it's got some weirdnesses you probably haven't thought about in terms of how to overwrite blocks, okay? And the most important thing from a flash memory standpoint, I would say, is the wear problem: if you write flash too often, you can actually wear it out, all right? And it'll start losing bits, okay?

So let's look at hard disk drives. A hard disk drive is kind of fun to open up — if you do, make sure you copy all your data first, because you will not only void your warranty, you will void your data. If you look inside, there's a set of platters and a set of heads, okay? I show a picture of a read/write head over here on the far right, and those heads are pretty sophisticated, okay? They move as a whole, in and out, to reach different parts of the platter, and they move together. So you'll have a head on each side of each platter, and they all move together to get to different tracks, which I'll show you in a moment, okay? And what's kind of fun is that the IBM personal computer way back when had about a 30 megabyte hard disk for 500 bucks; I'll show you some modern equivalents, like an 18 terabyte drive, which holds much more data, right? I always like to show this, because it's fun: when I was first starting as a faculty member, these were new drives that had just come out, and this is a form factor for flash for cameras, okay?
It's a larger form factor than you'd get today, but inside this little chip is actually a single spinning platter with double-sided heads. You could plug it into a camera, and the camera wouldn't know the difference between this and a regular flash card — but it's actually a disk drive. At the time you could get four gigabytes out of it, and you couldn't get anything close to that out of flash. So this was a huge increase in density for that form factor — pretty cool. Now, they stopped being made probably around 2004 or maybe 2006, because it got to the point where flash was far more dense, and so this kind of lost its market, okay?

Now, let's look a little more at disks, okay? There's a series of platters, all on a spindle. The spindle rotates as a whole, and it rotates at a constant speed except when starting and stopping. The reason for that is there's a lot of angular momentum here, so it takes a lot of work to spin it up or down — you can't make it go faster or slower while you're using it. You usually spin it up once and leave it, because spinning up and down takes a lot of energy. Then you have the heads, and each head sits over a ring called a track: if you leave the head alone and spin the disk, the full ring that passes underneath is a track, all right? And a cylinder is all of those rings stacked on top of each other — all the tracks at the same position across the platters. Any individual surface has tracks, and then the little chunks called sectors are the minimum transfer unit for a disk. Until fairly recently, these sectors were almost all 512 bytes, and the operating system would combine a bunch of them into something we call a block, which would be 4K. Today, a lot of the really high-density disks have a sector size closer to 4K.

Okay, so disk tracks can be a micron wide, which is close to the wavelength of light. The resolution of the human eye is 50 microns, so you can't even see the individual tracks — and you can get 100K or more tracks on a typical disk, which is pretty impressive. Typically the tracks are separated by unused guard regions, which make sure that while you're writing one track, you're not messing up the data on an adjacent one. Now, the track length, interestingly enough, varies across the disk — that's just because we're talking about circles: on the outside, a track is longer than on the inside. Hopefully that's not too surprising to anybody. What is surprising is the following. Suppose we used time to define our sectors — you write for some fixed amount of time, your 512 bytes, and that's your sector. Can anybody tell me the difference in size of a sector between the inner tracks and the outer tracks? Yeah, the outer sectors would be physically larger. And if I have 512 bytes on an inner sector and I look at an outer sector, would the bits on the outer sector be as close together as they are on the inner ones? Okay, the answer was more space — and it's actually not more space between the bits, but rather the bits themselves are longer.
So that was the way the original disks worked, but it wastes a lot of density on the outside, because what defines the amount of storage you can put on a disk is how densely you can pack the bits into the magnetic media and still get them back later — obviously we don't want our disks to be write-only; that would be kind of unfortunate. So using modern digital signal processing, what happens is that on the outside we actually write the bits faster than on the inside, to keep the density constant, okay? The density in bits per square inch is basically the same across the whole disk surface, and to do that, we write faster on the outside — so there are actually more sectors on the outer tracks than the inner ones, and the bit rate is higher on the outside than the inside. Which means that if we were interested in the highest-performing part of a given disk drive, we could write on the outside tracks instead of the inside tracks, all right?

Now, today disks are so big that the time it takes to pull all the data off is so long that you can't justify backing data up that way — it just takes too long. So a few years ago, companies like Google started doing the following: they would keep archival data on one part of the disk and active data on a different part, just so they could back up the active data, and they wouldn't even use the whole disk for active data, okay? That's purely because the disks are so big that it takes too long to pull everything off.

Now, an interesting variant: the way I've been describing this, every track is separate — it's a set of concentric rings, okay? Shingled magnetic recording is a little different. What we do there is that each track actually writes over half of the previous track, okay? The reason to do this is, A, you get the tracks closer together. Now you might say: wait a minute, now I'm intermingling track N and track N plus one. The reason this can work is basically that a really good DSP can sort it out and figure out what the bits are, okay? However, there's a downside. With conventional recording, I can rewrite individual sectors anywhere I want — I could rewrite this sector, then go over and rewrite that one, and another somewhere else, without disturbing anything else on the disk. With SMR, I get a lot of density, but I have to rewrite whole regions: if I want to change anything in, say, the top track, I have to write it and then rewrite the tracks below it, okay?

The larger rectangle at the bottom here? You're talking about the conventional write at the top here, Nicholas? So this is showing you the difference between a regular system, where the tracks are separated by these gray guard regions, and the shingled ones, which overwrite each other, okay? The overlapping tracks are what we're talking about here. Are you talking about this very bottom one — very left of the diagram? I'm not sure which one. So the larger write rectangle down at the bottom is just showing you what's continuing; this is not saying that we don't overlap that one. At some point, we have groups of these shingled writes, with a bottom track and then a bunch of space, and so on — because that defines sort of the maximum region we have to rewrite in order to change something in the middle, okay? Oh, this one.
This is showing you that when you write, you need a large rectangle, because the write head spans a larger amount of space; the read head can look at a very narrow strip. So that's showing you how much of the disk gets modified — if you look, the writer is the wide element there, all right? Okay.

The other thing I wanted to say that's pretty interesting here: these disks are all hermetically sealed, okay? You can't open them up, and part of the reason is that the platter is spinning very fast and the heads are actually flying on air just above it, okay? They float a little bit above the surface because the speed of the disk causes an effect — kind of like a Bernoulli effect, almost — that lifts the head off just enough so that it's very close, which lets us get very dense recording. Now, today the bits have gotten so dense, the disks have to be so close to the heads, and they spin so fast, that manufacturers have started using helium instead of regular air in there: they pump the air out and put in helium, and that's what's inside those disk drives now. So if you open one up, you're going to completely break it.

Okay, so if we look at a disk now, we can characterize it by, first, the cylinders — all the tracks stacked on top of each other, remembering that the heads move as a group. Then we can talk about the seek time, which is the time to move the head to the right cylinder. So suppose we wanted some sector on the top side of the top platter: what we would do is first move the head in to that track — that's the seek; then the rotational latency is the time we wait for the sector we want to rotate underneath the head; and last but not least, we transfer the bits that pass under the head, and that gives us our data. Okay, so if we want to model the time here, we'd say: look, we've got a queue, we've got the controller, and we've got the disk. The time to service a request is the time it sits in the queue — and I don't know if we'll get entirely to queues today, I think we might — plus the time it takes to get through the controller; that's queuing time, then controller time; and then on the disk itself, the time to seek, the time to rotate, and the time to transfer, okay? As you can imagine, the rotational latency is going to be defined by the probability of where you are on the track when the head arrives. So if I were trying to model rotational latency in this equation, what would I do? Does anybody have any thoughts? Yeah, very good. We'd start with the full rotation time, which is defined by how fast the disk is spinning — typical speeds are 7,200 RPM or 3,600 RPM — and that lets us figure out how long it takes to go all the way around. Then, on average, we'd say it takes half that time, and that's the number we plug in for the rotational latency. Good.

So here are some typical numbers, just so you see. Capacity might be 14 terabytes — actually, I'll show you an 18 terabyte one in a moment that came out literally this month. This older one, from a couple of years ago, had eight platters in a three-and-a-half-inch form factor, which is pretty crazy. The density — the number of bits in a square inch — is more than one terabit per square inch, which is just nuts.
And that's with helium-filled disks and vertical recording, where the magnetic domains actually go down into the platter rather than lying sideways. The average seek time is somewhere from about four to six milliseconds — that's how long it takes, on average, to move the head to where you want it. For the average rotational latency: most desktop drives are in the 3,600 to 7,200 RPM range. The faster you spin, the more energy you use — and that's one of the reasons helium is used: it provides less resistance, so you can spin faster with less power. Server disks typically get up to 15,000 RPM, so you can imagine the server disks use a lot of energy but are faster, okay? At 3,600 RPM, a rotation takes about 16 milliseconds. Controller time depends on the controller hardware. The transfer rate can be somewhere between 50 and 250 megabytes per second coming off the disk, okay? And the transfer size is, at minimum, a sector — 512 bytes to a kilobyte or so — but usually the operating system combines many together, so it will rarely transfer less than, say, four kilobytes in a row at a time, okay? All right, the diameters range from an inch to five and a quarter inches, but really the three-and-a-half and two-and-a-half-inch form factors are the common ones these days. Okay, and the cost used to drop by a factor of two every year and a half; that's slowing down a little bit now, all right?

Now, here's some performance. We'll have to ignore queuing time, because that's going to take a whole discussion of its own, and controller time is easy to imagine. But let's see if we can figure something out here. Suppose the average seek time is five milliseconds. If we have a 7,200 RPM disk, the rotation time is 60,000 milliseconds per minute divided by 7,200 revolutions per minute, which gives us about 8.3 milliseconds to go all the way around, okay? And notice how I've got my units set up — this is something you should remember from high school chemistry: I have milliseconds per minute over revolutions per minute, the minutes cancel, and I end up with milliseconds per revolution, all right? If the transfer rate is 50 megabytes per second and the block size is four kilobytes, then I can put all this together and find that it's about 0.082 milliseconds to get a block out, okay?

All right, now, to read a block from a random place on the disk: notice this is going to be seek time plus rotational delay plus transfer time. If I put those together, that five-millisecond seek is expensive, and we end up at about nine milliseconds — so notice that the transfer time is hardly even in the picture here. It's the seek time and the rotational delay — half of the 8.3 milliseconds — that are really costing us. Reading randomly from the disk this way, we can get about 451 kilobytes per second. On the other hand, if we read from a random place in the same cylinder, notice that we don't have to seek: the rotational delay is about four milliseconds, the transfer time 0.08 milliseconds, and now we're up to about a megabyte per second. Notice the difference — we roughly doubled our bandwidth coming off the disk just by staying in the same cylinder. So you can see it's extremely important to avoid seeks. And as I mentioned earlier, seek times can be up in the eight-millisecond range as well.
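Here's that arithmetic packed into a few lines of C, using the lecture's numbers (the slide's bandwidth figures round a little differently, but the orders of magnitude are the point):

```c
#include <stdio.h>

int main(void) {
    double seek_ms = 5.0;                    /* average seek time             */
    double rot_ms  = 60000.0 / 7200.0;       /* one revolution: ~8.33 ms      */
    double xfer_ms = 4096.0 / 50e6 * 1000.0; /* 4 KB at 50 MB/s: ~0.082 ms    */

    double random_ms   = seek_ms + rot_ms / 2 + xfer_ms; /* ~9.2 ms           */
    double cylinder_ms = rot_ms / 2 + xfer_ms;           /* ~4.25 ms, no seek */

    printf("random block:  %.2f ms  (~%.0f KB/s)\n",
           random_ms, 4.0 / random_ms * 1000.0);
    printf("same cylinder: %.2f ms  (~%.2f MB/s)\n",
           cylinder_ms, 4.0 / 1024.0 / cylinder_ms * 1000.0);
    printf("same track:    %.3f ms  (full 50 MB/s: no seek, no rotation)\n",
           xfer_ms);
    return 0;
}
```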
Reading the next block on the same track — basically no seek time, no rotational delay — gets us the full 50 megabytes per second back. So this tells us something: if we build a file system out of disks, it's going to be extremely important to do as much sequential reading as we possibly can; failing that, to stay on the same track or cylinder; and only if worst comes to worst, to seek. So we're going to want to build our file systems to do a really good job of keeping locality on the disk — otherwise, our performance is going to go way down. And when we start getting into file systems, you're going to see why that's important, okay?

Now, there's also lots of intelligence in the controller. Sectors have all sorts of sophisticated error correction: there are far more bits in the sector itself, including an error-correcting code, than the data you're actually writing, and they help recover the bits when errors creep in. There's something called sector sparing, which takes bad sectors and transparently remaps them somewhere else on the disk without telling you, okay? There's slip sparing, which remaps a whole bunch of sectors, possibly to a completely different track, if there's a problem. We can skew our tracks so that the sector numbering is offset from one track to another. All of this is done by the controller. So although we're going to talk about ways of building file systems that optimize for the physical location of the heads on the disk, there's already a lot of intelligence in a modern controller that's going to be competing with you — and that's something we'll talk about when we get to that point, okay?

Now, hard drive prices over time did really well up until about 2012 or so, and then started flattening out a little. Part of this was that drives were getting so large that there was a much smaller market for the really huge ones. Another problem that was really rearing its head in the early 2000s was that the bits were getting so close together that the random energetics of heat would scramble your bits — you would lose them if you tried to make the bits any smaller. One of the things that made a really big advance on that was vertically recorded domains, which helped a lot in making things denser.

Now, I wanted to show you a current hard disk drive, in case you wanted to know what the state of the art is. The Seagate Exos X18, for instance: this is a server drive, a three-and-a-half-inch form factor with 18 terabytes — nine platters and 18 heads. It's helium-filled to reduce friction. It's got a four-millisecond average seek time. The sectors are four kilobytes. It spins at 7,200 RPM. It's got very fast interfaces — if you get the SAS interface, you can get dual 12 gigabit per second links off of it — and you can sustain 270 megabytes per second coming off the disk, okay? The other thing is that there's actually DRAM cache on the disk itself, 256 megabytes of it, to help make things faster. So in case you were under the impression that a disk was just a simple thing with a bunch of platters and a head on it, in fact, it's much more than that: these controllers are extremely sophisticated — they're miniature OSes in themselves — and there's even caching on the controller. And notice the price for this guy — I just looked it up on Amazon, 562 bucks. That's about 3 cents a gigabyte.
Compare the original IBM personal computer: it had a 30 megabyte hard disk with a seek time of 30 to 40 milliseconds — notice that's a factor of 10 worse. You could get maybe 0.7 to one megabyte per second off of it, so compare that with 270. And the price was about 500 bucks, so the sticker price wasn't all that different — but because the disk was so small, we're talking about $17,000 per gigabyte. So that was a lot more cost per byte. You guys have it easy these days.

Now, let's talk about a different type of disk. But first, are there any other questions about spinning storage? This would be a good time to ask. What's the cache for? Well, the cache, among other things, helps make access to the disk a lot faster. Remember when I said that random reads are really slow? What typically happens is these caches are used for what are called track buffers: when you go to do a read, the drive actually reads the whole track into the cache, and then when you read random parts off that track, you get much faster access. Now, the question is: is this the same as a hybrid disk? And the answer is no. A typical hybrid disk actually has flash memory on it as well, and the good thing about that flash is that writes become really fast and don't have to be committed to the spinning storage immediately. So you get much faster access out of it. All right, good.

Now, solid state disks have been around forever. They started coming out in 1995 as a way of replacing rotating media with non-volatile memory. Originally that was DRAM, okay — DRAM with a battery. If you look at a card like this, there was typically a battery on there that kept the DRAM's contents alive when the power was off, okay? But around 2009 we started getting NAND flash memory, which had a couple of levels per cell, and that made flash dense enough to be interesting as a storage medium in and of itself. The idea behind flash in general is that trapped electrons distinguish between one and zero. When you program flash, you're actually trapping some electrons for one bit value, or not trapping them for the other — that's how you distinguish, okay? And what that should tell you is that before you can write, you actually have to erase everything — get rid of all the electrons — and then selectively write them back. We'll say more about that in a moment.

The positive thing about this is there are no moving parts, so the failure modes are, at least in theory, a lot better than a system with motors. Now, back when people were really starting to put flash disks in laptops and so on — say the 2012 timeframe — because they used so little power and were in theory more reliable, it turned out that some companies had some weird failure modes: a 100 gigabyte flash disk would all of a sudden look like it was only eight kilobytes, and all your data was gone. That happened to me on one of my laptops, where I was an early adopter of flash memory. Fortunately, SSDs are much better now, okay? There have been rapid advances in capacity and cost ever since. The downsides of SSDs: they're good on power, but they're a little slower to write than to read, and they also wear out — the more you write them, the closer you get to losing your data.
And so that's a slight downside to SSDs. Now let me just show you a little bit about how this works. Typically you have a host — the CPU — talking over a data bus like SATA, and in the controller you have a buffer manager, which makes the device look like a disk drive so that the host can ignore what's underneath if it wants to. Then you have the flash memory controller, also in there, which controls all the flash. What the flash memory controller does is read or write four-kilobyte pages, in maybe 25 microseconds or so. What's interesting about that is that even though in principle all the bits are stored individually, you still have four-kilobyte pages coming off — so it looks a lot like a disk from that standpoint, except we never have any seek or rotational latency, because we're not moving a head in and we're not having to wait for things to spin. Okay, so you can imagine that random access is much faster here in general. Our model for latency here is queuing time plus controller time plus transfer time, and this has the highest bandwidth regardless of whether you're sequential or random. That actually has some impact on how you build a file system, because you don't have to do the optimization for locality you did otherwise. And I'm going to make sure to have a couple of slides, when we talk about file systems, about how this changes file systems, because there are some new ones related to this, all right?

Now, writing is a very complex operation, okay? Because in order to write, first of all, we have to have empty pages, okay? We can't write over something that's already been written, because the only thing writing can do is add electrons. The erasing process is a high-energy removal of the electrons, and then you can add electrons back to do the writes. Furthermore, you can only erase in big chunks, okay? The big blocks that you erase might be, for instance, 256 kilobytes, and then you can write in four-kilobyte pages, okay? So you can imagine one tricky part of a file system for this: we need to make sure we have enough erased blocks on hand so that when we're ready to write new data, we can find enough pages for it. And then we have to track the pages in each block well enough to know when every page in a block is dead, so we can go ahead and do the erasing and have the block ready for the next time we need it. So the free-list management on the SSD can get tricky, okay? — because there aren't just pages, there are also blocks, okay? I'll show a little sketch of that bookkeeping in a moment.

The other thing: the rule of thumb on flash is that erasure is about 10 times as slow as writes, and writes are about 10 times as slow as reads. So it's really slow to erase, slow to write, and fast to read — writes are slower, erasures a lot slower — and you have to keep that in mind and try to avoid writing until you really need to. The other thing is that writes take power, okay? The more you write flash, the more energy you're using compared to reading. Okay — and no, writes do not include erasure. You have to do the erasure separately.
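Here's a toy sketch of that page-versus-block bookkeeping — a logical-to-physical page map plus a live-page count per erase block, in the spirit of the flash translation layer we're about to discuss. The sizes match the lecture's example (4K pages, 256K erase blocks), but every function and constant here is made up for illustration:

```c
#include <stdint.h>

#define PAGE_SIZE        4096
#define BLOCK_SIZE       (256 * 1024)
#define PAGES_PER_BLOCK  (BLOCK_SIZE / PAGE_SIZE)  /* 64 pages per erase block */
#define NUM_BLOCKS       1024                      /* toy-sized device         */

/* Hypothetical low-level operations the controller would provide. */
uint32_t alloc_erased_page(void);                  /* pop off the free list   */
void     program_page(uint32_t phys, const void *data);
void     erase_block(uint32_t block);              /* slow! ~10x a write      */

static uint32_t page_map[NUM_BLOCKS * PAGES_PER_BLOCK]; /* logical -> physical */
static uint8_t  live_pages[NUM_BLOCKS];                 /* live count per block */

/* Writing a logical page is copy-on-write: the data goes into a fresh,
 * already-erased physical page, and the old one becomes garbage.
 * (Assumes this logical page was written before; a real FTL also handles
 * first writes, wear leveling, and background garbage collection.) */
void ftl_write(uint32_t logical, const void *data) {
    uint32_t old_phys = page_map[logical];
    uint32_t new_phys = alloc_erased_page();
    program_page(new_phys, data);                /* trap electrons here       */
    page_map[logical] = new_phys;                /* the layer of indirection  */
    live_pages[new_phys / PAGES_PER_BLOCK]++;
    if (--live_pages[old_phys / PAGES_PER_BLOCK] == 0)
        erase_block(old_phys / PAGES_PER_BLOCK); /* whole block reusable again */
}
```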
Now, the architecture: SSDs give you the same interface as hard disk drives to the operating system — you're reading and writing chunks of four kilobytes. By the way, some of that erasure machinery is hidden in the controller, so for ordinary reading and writing the OS can ignore the distinction to some extent; but if an OS really wants to do the right thing, it wants to know about it, okay? The SSD controller helps you a little bit with this. You can only erase data 256 kilobytes at a time, and you can never overwrite a page that you've written before — it has to be erased first. So you might ask: why not just have 256K blocks, erase everything at once, and rewrite the whole block? The answer is that erasure is very slow, and if you're not modifying bits, you absolutely do not want to rewrite them, because you're going to wear the device out, okay? So this distinction between the size of the erasure and the size of the read and write is really something you want to keep in mind when dealing with this.

Now, there are a couple of things that SSDs provide for you. One is that the flash controller contains a layer of indirection — very analogous to what we just came through with virtual memory. There's something like a page table that maps the operating system's view of block numbers onto the underlying SSD's view of which flash pages are in use, okay? That layer of indirection is there and helps hide the weirdness of the flash from the operating system, okay? The other thing it gives you is the ability to do copy-on-write under the covers: when the OS goes to write a page, what really happens is that a different physical page gets written and then remapped, so the old data becomes garbage to be collected and the new data appears at the same logical block as before. So this flash translation layer helps hide the underlying properties of the flash, all right? And — I guess I already said this — with the flash translation layer there's no need to erase and rewrite the entire 256K block on every write; the layer handles a lot of that, okay? And yes, as was said in the chat here, everything in CS is a layer of indirection. What do you do with the old versions of the pages? They get garbage collected in the background, and old blocks that have no active pages in them get erased and put back on the free lists and so on, okay?

Now I wanted to show you some, quote unquote, current SSDs. So here is the Seagate Exos SSD. This is from a couple of years ago, but they haven't actually updated this family yet. It's 15 terabytes, and it has the same dual 12 gigabit per second interface as that Exos hard drive I showed you earlier. Notice that the sequential reads and writes are much faster — writes are fast because they're basically going to blocks that are already erased — about 860 megabytes per second, as opposed to 270. So this is like a factor of three faster. And Amazon's price for this particular disk is $5,495, which gives us about $0.36 per gigabyte, as opposed to the 3 cents per gigabyte we computed earlier — so 36 cents versus 3 cents. And this is my favorite hard-to-believe drive: here is a "disk drive" — and I say that in quotes — in the same form factor as all the other ones you're used to, but it's a hundred terabytes, okay? That's a hundred terabytes. It can do 500 megabytes per second, and it's about $40,000, which works out to roughly $0.40 — 40 cents — per gigabyte, okay?
And what's really interesting about this is that despite the fact that these devices wear out if you write them too much, this company actually guarantees an unlimited number of writes to this drive for five years. Can anybody guess why, even though flash wears out, they could tell you you can have unlimited writes for five years? Why would they even offer that as a warranty? Yeah — the point is that filling up this drive takes way too long. You could be writing at maximum speed for five years and you still wouldn't overwrite things enough to wear them out, okay? So they're comfortable saying: write all you want for five years, and you'll be fine, all right? And notice that part of that is the flash translation layer: every time you write the "same" block — and I say that in quotes — you're really writing different physical blocks. It's doing what's called wear leveling, making sure that as you overwrite things, every one of those pages across all 100 terabytes gets used about equally. So even if you were to write at your absolute maximum rate for five years, you'd never get anywhere close to wearing any of the bits out, and so they can actually make that guarantee. But anyway, that's my favorite ridiculously large drive.

Okay, so let's see. Hard disk costs and SSD costs have basically been converging for a long time, and they're pretty close these days; I'm not going to go through that much more. But I do want to tell you this, which is kind of fun. You're all aware of the Kindle — I'm sure you've seen them before; they're a really cool reading device, and I love them myself. The thing that's cool about them versus pretty much any other LCD device is that you can read them in full sunlight. So if you're a fan of books, get yourself a real Kindle: you can kick your feet up in the sun and just read. And there's an amusing calculation you might ask about: suppose I take an empty Kindle, right after buying it from Amazon, and fill it with books — is it heavier? Okay, that seems like a ridiculous question, but let's answer it. And the answer is actually yes — but not by much.

Okay, so let's go through this. Flash, as I mentioned, works by trapping electrons, so the erased state is actually lower energy than the state where you've put some electrons in there and trapped them. So you've got higher energy for one of the two bit values — it doesn't really matter whether those are the ones or the zeros. The original Kindles came out with four gigabytes of flash. If you imagine that in a full Kindle half of the bits are ones and half are zeros, then half of them are in the high-energy state, and you can compute, for a typical flash transistor, what that high-energy state is — about 10 to the minus 15 joules. You can then use E equals mc squared to convert that energy into mass, and a full Kindle comes out about an attogram — 10 to the minus 18 grams — heavier than an empty one. So it really is heavier, except that, of course, an attogram is unmeasurable: the best scales out there can't measure anything finer than about 10 to the minus 9 grams. And there's a whole bunch of other caveats.
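For the curious, here's a back-of-the-envelope version of that arithmetic with the lecture's round numbers. Depending on how you round the per-bit energy, you land somewhere between a few tenths of an attogram and an attogram — the scale the lecture quotes:

```latex
\begin{align*}
\text{one-bits} &\approx \tfrac{1}{2}\times 4\,\mathrm{GB}\times 8\ \tfrac{\mathrm{bits}}{\mathrm{byte}}
                \approx 1.7\times 10^{10}\\[2pt]
E &\approx 1.7\times 10^{10}\times 10^{-15}\,\mathrm{J}\approx 1.7\times 10^{-5}\,\mathrm{J}\\[2pt]
m &= E/c^{2}\approx \frac{1.7\times 10^{-5}\,\mathrm{J}}{(3\times 10^{8}\,\mathrm{m/s})^{2}}
   \approx 2\times 10^{-22}\,\mathrm{kg}\approx 2\times 10^{-19}\,\mathrm{g}
\end{align*}
```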
There are a whole bunch of other caveats, though. You have to take the Kindle, set it to a constant temperature, fill it with books, cool it back to that temperature, and recharge it, and only then is there a 10-to-the-minus-18-gram difference. In practice this weight difference gets overwhelmed by battery discharge and all that sort of stuff. But it's amusing nonetheless. And my source, by the way, is this guy, John Kubiatowicz. There was a New York Times column in 2011, which was pretty funny. The New York Times called me up and said, we have this question from somebody reading our column, and they'd like to know if Kindles are heavier when you put books on them. And so I wrote about why this is. All right, so this is a great party thing, right? One of the things I love to do in 162 is help you all out with parties. Now, of course, unfortunately our parties are all virtual these days, or they should be, but you can imagine you're on Zoom with the other 50 people at your party, and all of the parties have Too Much Milk, yes, that's true, and then you can say: did you realize that when you fill a Kindle with books, it's heavier? All right, and you'll be the most popular person at that party. Okay, so to summarize SSDs. The pros versus hard disk drives: they're low latency and high throughput, and we completely eliminate the seek and rotational delays. There are no moving parts, so they're very lightweight, the power is low, and they're silent. It turns out they're extremely shock insensitive, so you can drop them without jarring the bits. By the way, don't quote me on dropping a laptop and it being okay; I'm just talking about the SSD. You can read them essentially at memory speeds, although the writes are a little slower. The cons are that the storage is small relative to disks, but as you can see, if you're willing to pay exorbitant amounts of money, you can get very big SSDs, okay? So in fact that small-storage con isn't really true anymore. And the hybrid alternative that was asked about earlier is to combine a small SSD with a large hard disk, okay? What that gives you is really fast writes without having to seek, and for reads the SSD serves as a cache. Some of the other cons, though: there's asymmetric block write performance, so to really change any data in a block you have to read the page, erase the block, and write the page back. And the drive lifetime is a little bit limited: you're limited to about 10,000 writes per page for modern NANDs. So the average failure comes at about six years, with a life expectancy of maybe nine to 11 years, but if you write a lot and you don't have an extremely huge drive like the one I showed you earlier, there really is a danger of losing some bits, okay? Things are changing pretty rapidly, though. Now, one thing I did wanna show you is another option, which is kind of fun, which is nanotube memory. So nanotubes, unfortunately, perhaps my camera image is covering this up, but nanotubes are tubes made out of carbon molecules, okay? And you can put a bunch of them in a pattern and arrange it so they're either randomly scattered or attracted together one way or another, so you can actually get two different resistances that you can detect, and that gives you ones and zeros. And there's a way to clear by erasing, which basically means putting it back into one of the states. And the interesting thing about this is that it doesn't wear out, okay?
Because you're just moving the nanotubes around, it doesn't wear out like flash; it's persistent, so you don't have to worry about losing the contents; and it's as small as DRAM cells, okay? There's a company called Nantero, for instance, which has been working closely with DRAM manufacturers to produce these cells, and this could potentially replace DRAM, because it's as fast and dense as DRAM, holds its contents, and doesn't have a wear-out problem. So that's a pretty exciting possibility to come up soon. I think this is gonna fundamentally change the way people think about memory once it becomes mass produced, and they had already figured out how to produce these pretty well and were working with several DRAM manufacturers a couple of years ago. Of course, who knows exactly what's happening, because the pandemic has sort of screwed everybody up, but this will be fun. All right, so let's shift. Well, unless anybody had any questions on devices, I wanna shift gears to talk about performance. Are there any other questions about devices? So, from the chat: this nanotube memory can actually be patterned three-dimensionally as well, so it will be really dense, okay? The difference between PCIe and SATA3 is that those are two different buses: PCIe is a pretty common interface for plugging in cards and such, whereas SATA3 was set up specifically for disk drives, so they're for slightly different uses. DNA storage has been interesting for a long time, but I haven't yet seen a good proposal for how to make it as dense as regular DRAM. Of course, we all know that DNA is very dense, but that would be fun at some point. Do any of these use fewer heavy, rare, or toxic metals? That's a really interesting question, and I'm not sure of the answer. The nice thing about Nantero's nanotubes is that the biggest ingredient is carbon, which it'd be great to extract from the atmosphere and use, but in terms of things like cobalt and some of these other materials, unfortunately the patterning of chips is not necessarily as environmentally friendly as one might like. I don't have any reason to suspect that this nanotube memory is worse than the others, and it might actually be better. So that's a good question, though. So let's talk about performance for a moment. When we're talking about these disks, or about schedulers, or whatever, there are several terms we might use, and I thought I would just put them on the table for a moment. So, for instance, latency: the time to complete a task. It's measured in units of time: seconds, milliseconds, microseconds, maybe hours, maybe years, right? Response time is the time to initiate an operation and get the response back. So latency is a one-way time, whereas response time is a round trip: from the time the request went out to when the answer came back, okay? And sometimes the ability to issue the next request might depend on when you got the response, because not all systems can handle pipelining of requests. A different thing is throughput. Throughput, or bandwidth, is the rate at which we can push tasks or bytes through something; those are the two common variants. It's measured in things per unit time: operations per second, giga-operations per second, bytes per second, megabytes per second. And in networking, you'll often see megabits per second.
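To make the latency/throughput distinction concrete, here's a hedged sketch of how you might measure both; `do_op` is a hypothetical stand-in for whatever request you're timing, and the only reason average latency equals total time over ops here is that the loop issues requests strictly one at a time:

```c
#include <stdio.h>
#include <time.h>

enum { NUM_OPS = 100000 };

static volatile long sink;
static void do_op(void) { sink++; }  /* stand-in for a real request */

int main(void)
{
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < NUM_OPS; i++)
        do_op();                      /* issue requests back-to-back */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (t1.tv_sec - t0.tv_sec) +
                  (t1.tv_nsec - t0.tv_nsec) / 1e9;

    /* Throughput: completed operations per unit of time. */
    printf("throughput:  %.0f ops/sec\n", NUM_OPS / secs);

    /* Per-op latency -- valid only because nothing overlaps here. */
    printf("avg latency: %.3f us/op\n", secs / NUM_OPS * 1e6);
    return 0;
}
```

Once a system can pipeline requests, throughput stops being simply one over the latency, which is exactly why overhead and queues are about to matter.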
And then another thing which ties into all of these is the startup cost, or overhead, which is the time to initiate an operation. Overhead feeds into latency, of course, but if you can pipeline and send several things at once, sometimes you only pay the overhead on the first one and the rest run at full rate. Now, most IO operations are roughly linear: if you transfer B bytes, the latency is the overhead plus B divided by the transfer rate. So that overhead directly impacts your latency, and I'll show you that in a moment. When somebody talks about performance, the first question you ought to ask is: what am I measuring, and is it relative to something? Performance might be operation time, it might be a rate, it might be any number of things. So you could call low latency high performance, or you could call high throughput high performance, okay? And for whoever asked in the chat about "globs" on the slide: I think that's just a typo, sorry about that. So for instance, in a network, suppose we have a one-gigabit-per-second link. Everybody's got those; you've probably got them on your laptops. The bandwidth of that link is 125 megabytes per second; that's just dividing one gigabit by eight bits per byte, all right? And suppose the startup cost is a millisecond. Then we can look at a graph like this. Notice this is a graph with two vertical axes: it's got packet size on the bottom, latency in blue on the left, and bandwidth in red on the right. And because this is linear, the latency is really the startup cost plus the size of my packet, B, over the bandwidth. What I showed you there for latency is a nice linear graph, and notice that at the zero intercept there's a minimum of a thousand microseconds, a millisecond, because that's my overhead. Now, the raw bandwidth of this link is a gigabit per second, 125 megabytes per second. But if I look at the effective bandwidth, taking overhead into account, I get this red curve, all right? I just take the packet size divided by the latency to send that packet, and that gives me effective bytes or bits per second or whatever I'm measuring. And it has this shape to it: it starts out near zero for small packets, because the overhead is so dominant. Once I make the packet big enough, the effective bandwidth starts getting higher, and at some point it levels out, because no matter how big my packet is, I can't go faster than the raw 125 megabytes per second, okay? One interesting place on this curve is what's called the half-power bandwidth, which is the point at which my effective bandwidth equals half of my peak bandwidth, all right? Here, for instance, if my packet is 125 kilobytes, then my effective bandwidth is half of my full bandwidth, okay? So just because you have a gigabit-per-second link doesn't mean you get a gigabit per second. In fact, you often don't, unless you have really big packets.
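To see where that 125-kilobyte number comes from, here's a minimal sketch of the linear model. Setting the effective bandwidth B/(overhead + B/peak) equal to peak/2 and solving gives B = overhead × peak; the loop just tabulates the curve. The specific numbers match the lecture example, and everything else is illustrative:

```c
#include <stdio.h>

int main(void)
{
    double peak     = 125e6;  /* 1 Gb/s link = 125 MB/s            */
    double overhead = 1e-3;   /* 1 ms startup cost                 */

    /* Effective bandwidth = bytes sent / time to send them,
     * using the linear model: latency = overhead + B / peak. */
    for (double b = 1e3; b <= 1e7; b *= 10) {
        double latency = overhead + b / peak;
        printf("%10.0f-byte packet: latency %7.3f ms, effective bw %6.1f MB/s\n",
               b, latency * 1e3, b / latency / 1e6);
    }

    /* Half-power point: effective bw = peak/2  =>  B = overhead * peak. */
    printf("half-power packet size: %.0f bytes\n", overhead * peak);
    return 0;
}
```

Run it and the effective bandwidth crosses 62.5 megabytes per second right at 125,000 bytes; bump the overhead up by a factor of ten and the same formula moves that point a factor of ten higher, which is exactly what comes next.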
What's also interesting is what happens when the startup cost goes up. If I change it to something more like a disk, say 10 milliseconds, and do the same computation, what you find is that the half-power point is not until the transfers are 1.25 megabytes in size. So I have to have really, really large packets before I come anywhere close to getting half of my native bandwidth. That's a problem, okay? So overhead really matters, and you can see how this huge zero-packet-size latency comes into play. And so when we wanna do a good job of optimizing things, when we start building file systems and networks and stuff on top of devices, we're gonna have to be very sensitive to the overhead. So what determines the peak bandwidth for IO, that one gigabit per second, for instance? Well, it's the hardware. You can look at a bunch of buses: we've talked about things like the original PCI bus at 33 megahertz with a 32-bit data path, PCI-X pushing that to 133 megahertz at 64 bits, and Thunderbolt, which is a USB-C-style connection, at 40 gigabits per second. So bus speeds have been continually getting bigger. The device transfer bandwidth is what gives me my peak bandwidth off of a disk, okay? And that has to do with the rotational speed of the disk, or the read rate of the NAND flash. That peak bandwidth is what I start with in a calculation like this: my peak bandwidth is the one gigabit per second, and then the overhead takes over, okay? So peak bandwidth comes in many forms, and whatever the bottleneck in the path is, that's the thing that limits my peak bandwidth, okay? And we're gonna talk a lot more about this next time. So the overall performance for an IO path, which is where we're gonna wanna get, might look like this: you have a user thread, it makes system calls, its request gets queued, and then it eventually goes to the controller and the IO device. I already showed you this earlier when I was talking about the disk drives. The interesting thing, the elephant in the room we haven't talked about, is this queue. The mere existence of the queue, with randomly arriving requests, causes this curve, okay? And hopefully by the time we get through our discussion of queueing theory, you'll have a much better idea why this curve goes up as we get closer to 100%. For that 100%, we're first gonna have to understand what 100% throughput, or utilization, means, and that's really finding the peak bandwidth it's possible to get through the device. As our requests get close to that, you'll find that latency doesn't increase linearly; instead we get this behavior where, in the model, the curve actually climbs toward infinity as we get close to 100%. We're hopefully gonna explain that, but for the time being, what's important is the fact that this curve is very nonlinear. It's not linear like I was implying with these previous slides. And if it's nonlinear, you're gonna wanna be careful. You're never gonna wanna be operating way over here, because your latency is gonna be ridiculously high just to get a little bit more performance or a little bit more utilization out of the system. Instead we're gonna want something more like a half-power point: the point at which we stop getting a roughly linear gain with utilization and start getting into the rapid growth, okay? All right, and we're gonna explain that more.
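As a preview of where that's headed, the textbook M/M/1 queueing result, one server with memoryless random arrivals at rate \(\lambda\) and service rate \(\mu\), already shows the blow-up. Take this as the standard formula we'll get to later, not something established today:

```latex
\[
T \;=\; \frac{1/\mu}{1-\rho},
\qquad \rho = \frac{\lambda}{\mu}\ \text{(utilization)}
\]
```

At \(\rho = 0.5\) you pay twice the bare service time \(1/\mu\), at \(\rho = 0.9\) ten times, and as \(\rho\) approaches 1 the time in the system grows without bound, which is exactly that nonlinear curve on the slide.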
So, just to start the discussion for next time: sequential server performance is what you think about when you say, well, I have a request, this blue one, that takes time L to complete, and I have a series of them. And as long as the server, be it a disk or whatever, can handle them at the rate they come in, I'm good to go, okay? A single sequential server that takes time L to do a task operates at a rate that's less than or equal to one over L on average in steady state. Notice that I'm getting the maximum behavior out of this server, because I'm squishing these L-sized tasks together as tightly as possible. So for instance, if it takes 10 milliseconds to process something, then the maximum rate I can get out of that server is one over L, about a hundred ops per second. If L is, for instance, two years, I'll only get 0.5 operations per year, okay? And so this latency L to do an operation in the server is something we'll need to compute, and it's related to things like seek plus rotation plus transfer on a disk, or transfer time off of flash, and so on, okay? But as you can imagine, this is looking nice and linear, and that graph I showed you earlier wasn't nice and linear. Another simple scenario, by the way, is the pipeline idea, where you've got three operations to do, each of which takes time L, and I can do them in different stages. So I first do the blue, then the gray, then the green, and I can pipeline those; this probably rings a bell from 61C. In that case, with K pipeline stages, my effective rate goes up to K over L. So if L is 10 milliseconds but I can have four stages going at once, I get 400 ops per second rather than the hundred I had earlier. So we're gonna wanna analyze our systems by asking whether we can get any pipelining out of them as well, okay? And examples of pipelines are all over the place. For instance, a user process makes a syscall, which queues in the file system, which then goes into the upper device driver, which queues there, which goes into the lower device driver, and so on. Or in a network, there's a whole bunch of queues throughout the path. Anything with queues is gonna invoke queueing theory, so we're gonna have to analyze it that way, and you're gonna find out that, unlike what I just showed you, it's not linear; it's gonna have that unfortunate curve to it, right? And we're gonna identify that as we go forward. Unfortunately, real systems have these queues and that nonlinear behavior, so it's not synchronous or deterministic like it was in 61C. All right, I'm gonna let you go, but in conclusion: we talked about notification mechanisms today. We talked about interrupts and polling, where polling means finding out the results by actually asking the status register what's going on, and we talked about how we can combine interrupts and polling to maybe get lower overhead. We talked about device drivers, which interface to the IO devices and give the operating system above a clean open/read/write interface, and which manipulate devices through things like programmed IO, where the processor transfers each byte or word itself, or DMA. And we talked about the three types of devices that device drivers have to deal with.
We talked about block devices, character devices, and network devices. We also talked about DMA, which permits devices to directly access memory. So typically the device driver running in the operating system asks the device: go ahead, please transfer this data to that part of memory and tell me when you're done, okay? And one of the things, actually we did talk about it today: while that transfer is going on, either the operating system has to have pre-invalidated the cache, or the DMA hardware has to invalidate cache lines as it goes. We talked about disks and disk performance: queuing time plus controller time plus seek time plus rotational latency plus transfer time. We talked about rotational latency being half a rotation on average, and the transfer time depending on the rotation speed and the bit storage density, and, as we discussed, on whether you're reading from an outside track or an inner one. Devices have very complex interactions and performance characteristics, and we've just started this discussion. So queuing plus overhead plus transfer time: that's our latency, okay? And we talked about how overhead can make a huge difference and how you need large block sizes to deal with that. Then we talked about how different devices, like a hard disk versus an SSD, basically have different performance measurements, right? And systems, as I've already alluded, are basically gonna be designed to optimize performance and reliability, and that means we need to know something about the underlying devices. So even though we have these interfaces to shield us from the details, we need to know something more about the devices to really use them at their maximum performance, all right? What we're gonna find out next time is that bursts and high utilization introduce all sorts of queueing delays, and that's gonna be the source of that growth without bound in our performance curve from earlier. All right, I think we're good to go for today. I'm gonna let you go. And yes, that "STD" on the slide should be SSD; that's a typo, good catch. So I'm gonna wish everybody good luck on tomorrow's exam. I'm sure you'll all do well, and we'll see you on Monday.