All right, good morning, everybody. Did everybody enjoy a slightly longer weekend than usual? Maybe you didn't — maybe you spent it working on the assignments. So today: this is the last full week of class, and we're going to spend today and part of Wednesday talking about how to make operating systems fast. For those of you — most of you — who are not going to go on to write Linux device drivers or hack on BSD or work for Microsoft in the core OS division, this might be the material that's most useful, because these techniques and approaches to performance enhancement are things you can apply to whatever code you write. Making code faster is always helpful; nobody ever complained that their code was too fast. So we'll talk a little bit about how to measure performance, about different approaches to improving performance and some of the pitfalls there, and a little bit about benchmarking.

First, though: we're coming into the final stretch for assignment submission. You have, what, three weeks? Two weeks? I'm getting bad at counting — somebody help me here. Three weeks from Saturday? Maybe two. Anyway, we're getting close, and I'm harping on this now because when deadlines start to approach, this is when people do dumb things. We've reminded you about the cheating policy on the website. The other thing I want to remind you about is the collaboration policy you've been agreeing to every time you submit an assignment. Part of it is you attesting that you and your partner are dividing the work in some sort of semi-even fashion. Every time you submit, this is what you're agreeing to — both parties. And the TAs have started to notify me about some groups that they feel are particularly imbalanced. So if you're having some sort of issue with your partner — if you've been carrying your partner through multiple assignments, if you're working on an assignment together and they haven't written any code — it's probably time to come talk to us about that now. We have ways of noticing this, and when you submit, you're attesting that you and your partner are both involved. If that's not the case, we need to know about it pretty soon, because if you tell us about it, that initiates a different set of procedures than if we find out about it later. So if you're working in a very dysfunctional partner group, please come talk to us. I think the people in this room are probably OK — maybe I'm talking to the people on the internet, at home in their pajamas. But again, you agreed to this, and the TAs are watching in office hours and elsewhere, trying to determine whether both people are engaged in these assignments. We don't want one person carrying the entire group. It's not fair to your partner, who's trying to learn, and it's not fair to the other students who are working hard and want a grade in the class that reflects their work. So if you have a partner problem, please email us, and we will work on it.
So last Wednesday, we talked about operating system structure. I didn't quite get through the bit on interface design that was tacked on to the end, but we can touch on it in a minute during the review — it's mostly stuff you've heard before. Any questions about operating system structures? We talked about microkernels in particular and the different approach microkernels brought to kernel organization. Any questions before we do a little review? It was so long ago. So who remembers — when people decided to try this radical reorganization of operating system structure, what were some of the things microkernels were trying to accomplish? What were some of the problems with more traditional monolithic kernel designs that microkernels were trying to address? Yeah — so you could potentially do that with a microkernel, right? Although I actually think back when microkernels were getting started, there was less awareness of the problems that device drivers would cause in the future. But that's a good point. Yeah, Jared? Microkernels were modular — yeah, so what's good about being modular in this particular case? OK, that's a good point: there was some sense that with microkernels you could bring different pieces of code in and out at runtime, so you were getting some modularity just from the design. But what feature did we talk about that gives monolithic kernels some degree of that same flexibility? What do most modern monolithic kernels support so that when you plug a device into the system, they're not completely stuck? Sarah? Yeah — and not only can you recompile those binary modules, but what can you do at runtime, Robert? Right, you can load them: these kernels have loadable module subsystems so they can change their code at runtime. This is something that really any application could do — there's no deep engineering reason it's kernel-specific — but it's particularly useful for kernels. Because — again with the Firefox examples, sorry — when Firefox downloads an update it always has to restart, which really irritates me, because I usually have 60 windows open on different desktops and it takes me half a day to restore my setup. It doesn't really need to do that. If your kernel did that frequently, it would get annoying — and kernels still kind of have to: how many people have had to reinstall their VMware tools on their virtual machine because Linux installed an update? So certain types of updates to kernels are difficult to apply, but at least at runtime I can load the parts of the code that let me deal with a certain type of device or a certain kind of file system. That gives me some flexibility. What does that also do to the size of the kernel image itself, Dan? Makes it smaller, right. I don't have to cram everything in — it's not one-size-fits-all. I don't have to bring everything with me and load it all into memory right away; I can leave parts on disk, grab them only when I need them, and unload them later. And as Sarah pointed out, I can update them as well.
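Just to make "loadable module" concrete — here's a minimal sketch of what one looks like on Linux. This assumes the standard kernel build machinery (kbuild); it's Linux, not OS/161, and the names in the body are just illustrative:

```c
/* hello.c -- minimal sketch of a Linux loadable kernel module,
 * built against kernel headers with the usual kbuild Makefile. */
#include <linux/init.h>
#include <linux/module.h>

MODULE_LICENSE("GPL");

static int __init hello_init(void)
{
	pr_info("hello: loaded into a running kernel\n");
	return 0;               /* nonzero here would abort the load */
}

static void __exit hello_exit(void)
{
	pr_info("hello: unloaded, memory released\n");
}

module_init(hello_init);
module_exit(hello_exit);
```

You'd load it with `insmod hello.ko` and remove it with `rmmod hello` — code coming and going from a running kernel, which is exactly the flexibility we were just talking about.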
So what were some other goals of microkernels — something else about modularity? What else do I get from modularity? What was one of the problems with big monolithic kernel designs that we haven't touched on yet? Peng? Different machines don't need all the code — OK, that's a good point. But what else is a consequence of the fact that all this code is running in this privileged address space? Sean? It's messy — yeah, OK. So by forcing modularity on ourselves, we can potentially improve the design of our system. No more of these little hacks — where I realize my interface to the coremap isn't quite right, so I build this weird side channel, or some other piece of VM code modifies data structures it really shouldn't be able to touch directly, but it can, because everything's in the same shared address space. How many people remember — maybe this was when you guys were in diapers — that there used to be talk about splitting Microsoft up into two companies? Does anyone remember this? For antitrust reasons. Because there was this concern: the Windows API, which we haven't really talked about, is big, it's gone through multiple revisions, it's crufty, and there are all sorts of ways to do any one particular thing. So say you work at, I don't know, Mozilla, and there are 13 different windowing commands that do almost the same thing, from three or four different versions of the Windows API, and you're trying to figure out which one to use. What are your options? You want to draw a pixel to the screen or whatever, and you've got all these choices — what's the only real avenue available to you for figuring out which one is the right one, the best-supported one? Cal? You could use the latest one — OK, that'd be a good guess. What else could I look at? What might there be, hopefully, for a big, popular platform? What else could I look at online, Tim? The docs, or something like that. OK, those aren't terrible options. But now let's say I work on Internet Explorer. What can I do? I've got these 13 versions and I'm trying to figure out which one they're going to support in Windows 8. What option do I have that the Mozilla developer doesn't? Jen — yeah, I can pick up the phone, or send an email to the person who wrote those things: hey, buddy, which one of these is fast? Which one have you worked on in the last six months? And that was actually part of the concern at the time: that Windows application developers were getting an undue amount of performance advantage — they were essentially privy to internals of the operating system that external developers weren't — and that gave them an advantage when they wrote their applications.
But part of the reason that happens is that when you have these big, messy interfaces, they get to the point where no one's really quite sure what the right way to do something is — there are several ways, and which one is fast, which one has the side effect I want? Or developers might call up and say, hey, it would be really helpful if in the next version you just added this function. Maybe some of that happened as well. But there's still one more goal here that we haven't gotten to. Jeremy? Well, in a certain way it was more tied down, right — one of the ways microkernels were made fast was by really, really heavily optimizing their code. Sorry — Sam, yeah. What does that buy us, though? Alyssa — what else are application services not going to be able to do that code could do if it was running in the privileged kernel? Yeah, OK, we're getting closer. By taking my file system implementation and moving it into some sort of application service, what can bugs in it no longer do? Affect it how? We're getting so close. What is the thing I'm worried about bugs doing to the system? They can't crash it — they can't crash the whole machine, right? Remember, all that code inside the kernel runs at potentially the highest privilege level, so it has the ability to do all sorts of terrible things to the system. The less code I put in there, the more fault-tolerant my system will be. And there was this idea that if my file server running in user space crashed, I would just restart it and the system would go on running without a hiccup. Sure, you wouldn't have your file system for a few seconds, but you'd restart it and it would be OK. All right, I think we've got most of this. Now, we didn't talk about this too much, but when you start thinking about interface design — and this is something you'll be thinking about during assignment three — good interfaces between components were part of the goal of the microkernel movement, and that's something you can incorporate into a monolithic design without it having to be a microkernel. You don't need some of the microkernel machinery if you really have good interfaces. But what do good interfaces do? The way we teach you to program here stresses interfaces quite heavily, but I think people sometimes forget what they are and what they can do. When you work on assignment three, you're going to find that the assignment is really much more about interface design than anything else. Good interfaces allow your partner — substitute "partner" for "programmer" here — to make correct assumptions about things you've written: if I make this call, this is what will happen. They also allow you to change how that interface is implemented, which is important.
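To make that concrete, here's the flavor of a narrow interface in C — a purely hypothetical sketch, with made-up names, not the actual assignment API:

```c
/* coremap.h -- hypothetical sketch of a narrow component interface.
 * All names and types here are illustrative, not OS/161's real API. */
#ifndef COREMAP_H
#define COREMAP_H

#include <stddef.h>

/* Illustrative physical-address type; a real kernel defines its own. */
typedef unsigned long phys_addr;

/* Callers get exactly these operations and no view of the internal
 * table, so the implementation can change without breaking callers,
 * and the component can be tested by itself across this boundary. */
void      coremap_bootstrap(size_t npages);
phys_addr coremap_alloc_page(void);   /* 0 on failure, by convention */
void      coremap_free_page(phys_addr pa);

#endif /* COREMAP_H */
```

Everything the rest of the kernel is allowed to assume is right there in a dozen lines; nothing else leaks out.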
And finally, good interfaces are usually written in a way that allows individual pieces of the system to be split off and tested by themselves. If I write a good interface, I can test across that interface: I can verify, for example, that my coremap works, that my address-translation component works — that these things work. And then once I start combining pieces I've verified, there's a much higher likelihood that everything will work when I integrate them.

All right, any other questions about this stuff before we move on to performance? OK. So why do we care? Maybe an obvious question — why do we care about performance? We've touched on this before. When I talk about performance, what am I normally trading off? What might I have to give up to get performance — or alternatively, what might I have to sacrifice performance to achieve? Yeah, AJ. Safety, right. Or, broadly speaking, correctness: maybe the less checking I do, the faster my system gets. Jeremy? Yeah — and that's something we always care about: a system that's twice as slow but has 1% fewer bugs potentially isn't a hot-selling system. And this seems to be true — maybe it's getting less true, I don't really know — but over the last 30 or 40 years, not to keep bashing on the guys in Redmond, the most popular consumer operating system for years and years had basic safety problems that didn't prevent it from gaining world domination. The UNIX people were always very contemptuous about this: oh gosh, Windows 2000, wow, congratulations — you've finally implemented memory protection; we had that in the 70s. But that operating system sold, and a lot of people used it, and it crashed, and that frustrated people, but they kept using it. Maybe that's the best example of the fact that people like fast, and they also like having all the apps work. The correctness problems with Windows — and probably a huge number of them have been fixed over time — didn't prevent it from selling and taking over the operating system market. And this goes back to Dijkstra. We talked about Dijkstra's operating system, which was one of the last attempts — until very recently — to make a provably correct operating system design. And it's pretty awesome: the paper he wrote about it is a really fun paper to read. He reports that the only errors that showed up during testing were trivial coding errors, occurring with a density of only one error per 500 instructions, each of them located within 10 minutes at the machine and each of them correspondingly easy to remedy. But this is the mid-1960s, and it's probably the last time people tried to build a system whose logical soundness could be proved from day one.
Ever since then, we've been sacrificing safety for performance on a number of levels. And here's something interesting to think about — something OS vendors have worked on a lot. Would you care as much about the blue screen of death if all you saw was that blue screen for a second and then the system rebooted instantaneously? You'd probably still care, because you were in the middle of typing up this really incredible new VM system you were building for assignment three and suddenly the machine rebooted and you might have lost a little work — but there it is, it comes right back up. One of the things consumer operating system vendors have done is spend a lot of time optimizing boot, because they realize that when things go wrong, if you can get people back up and working again quickly, then maybe they care a little bit less.

All right. Operating system performance isn't so different from improving the performance of other parts of the system. How many people feel like they have a strategy in place for improving the performance of something? If I gave you some code and said, I'd like you to speed this code up, I'd like you to understand what's wrong with it — how many people feel like they could do that? OK, well, this is good to talk about. In outline, it's not that hard, so we'll go over it; this will take us a few minutes. The first thing you do is measure: you need to measure the system so you know something about what you're trying to improve — otherwise, if you made some changes, how do you know what they did? Then you analyze the results and use them to determine what to do: I take some measurements and figure out, OK, this is the slow part. Then step three: I improve the slow parts. That's not that hard, right? Step four is one of the more important ones — the celebratory beer. You should start in the morning and maybe move on to this by mid-afternoon; that's kind of the right way to do it. And then you just repeat: find what's slow, improve it, have the celebratory beer, and keep going. OK, so on Friday we'll talk about virtualization technology. Just kidding. This is a little bit harder than that. Everything's hard here — none of this is straightforward, even though it sounds easy. Well, there's one easy part — although I don't know, it depends on how much you like beer and how difficult it is to determine which celebratory beer to drink. So: measure what? Under what conditions am I going to measure the system? What is the system going to be doing while I measure it? Is the system going to be doing something that provides a useful data point for starting my performance analysis? We'll talk a little bit about that. So now I've got results — and hey, if you actually want meaningful results, you might have to know a little something about statistics.
And actually, I've always been amused by how little statistical rigor there is in computer science. It's kind of like: oh, I computed a mean. All right, sweet — I get two celebratory beers now. And then improving the slow parts: how do I do this? What techniques? I didn't write the code to be slow. Although maybe that would actually be a good strategy for the future: just write slow code, and then when your boss asks you to speed it up — no problem, I know exactly why that's slow, I'm sitting in a while(1) loop spinning; I'll just take that out. The system got faster, you get a raise. Awesome. And yeah, exactly — after I do the statistics, I'm usually ready for my celebratory beer. So let's talk about some of the things that make this challenging. The first is fairly basic — oh yeah, we'll come back to benchmarking. Choosing a benchmark, right — and then someone yells at you for choosing that benchmark. Maybe it's not a good benchmark. Or maybe it's a good benchmark, but "not a good benchmark" because it doesn't make your system look good — so define "good": it might be a very exhaustive benchmark that just doesn't happen to flatter your particular company's system. So at the most basic level — and I'm trying to give you some appreciation for the fact that this is difficult — consider measuring time. How do you measure time on a computer system? Shouldn't that be easy? Especially time at small granularities? This starts to become an incredibly hardware-specific problem, and a lot of what we say about statistics collection is very, very determined by the hardware. For example, if I'm trying to determine how long things take and the thing I'm measuring takes a very small amount of time, the hardware counters I have on my system might not even have sufficient resolution to measure it. They might say it took zero: I took 10 runs, trying to figure out how much time this particular bit of my algorithm took, and the results I got were 0, 0, 0, 0, 0. So that's not very helpful.
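Before trusting fine-grained numbers, it's worth asking the clock what it can actually see. Here's a little sketch using standard POSIX calls:

```c
/* clockres.c -- ask the OS-visible clock for its resolution, and see
 * how far apart two back-to-back reads land. POSIX: clock_getres(2),
 * clock_gettime(2). */
#include <stdio.h>
#include <time.h>

int main(void)
{
	struct timespec res, a, b;

	clock_getres(CLOCK_MONOTONIC, &res);
	printf("claimed resolution: %ld ns\n", res.tv_nsec);

	/* If back-to-back reads often differ by 0, the event you care
	 * about may simply be below what this clock can resolve. */
	clock_gettime(CLOCK_MONOTONIC, &a);
	clock_gettime(CLOCK_MONOTONIC, &b);
	printf("back-to-back delta: %ld ns\n",
	       (long)(b.tv_sec - a.tv_sec) * 1000000000L +
	       (b.tv_nsec - a.tv_nsec));
	return 0;
}
```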
So what systems try to do is build up timers based on the capabilities of the hardware. But at a high level, my timers are determined by the worst-case capabilities of the hardware — the slowest clock on any device the system is going to boot on. I can start to take more advantage of the hardware, and that can potentially be very useful, but it means my benchmarking and some of my performance analysis become very tied to a specific hardware platform. And that becomes important, because the details of that hardware platform may affect my results in other ways. You may have been asked to improve the performance of something in general, and you may end up only improving its performance on one particular device — which is not necessarily a bad thing, but if your improvements on that device don't translate to other devices, or worse yet, make things slower on them, then you haven't really solved the problem. And then it just gets more and more terrible: counters roll over — the counter was big one minute, now it's little — so you have to check for that. I know this doesn't sound very exciting, but it turns out to matter. OK, but let's assume we can get away with this: it might be hard, but we have some counter that lets us measure what we want. Let me ask: what's one way to get around the resolution problem? Say I have a very fast event and a fairly slow counter. How can I use that slow counter to measure it? This could be a first-round interview question at Google or something: I give you a slow counter and something that runs fairly quickly — how do you make meaningful performance measurements of the fast thing? Yeah — run it in a loop, right? Run it 10,000 times and take the time over the entire run. Now, of course, that can cause other problems, which we'll talk about in a second, but in general it's a strategy for using slow counters to measure fast things: measure large numbers of them, repeating the same test over and over.
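Here's the loop trick in code — a sketch, with a hypothetical `fast_op()` standing in for whatever fast thing you're measuring:

```c
/* amplify.c -- measure a fast operation with a coarse timer by
 * running it many times and dividing. fast_op() is a stand-in. */
#include <stdio.h>
#include <time.h>

static volatile long sink;          /* defeat dead-code elimination */
static void fast_op(void) { sink++; }

int main(void)
{
	enum { N = 10000000 };
	struct timespec t0, t1;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (long i = 0; i < N; i++)
		fast_op();
	clock_gettime(CLOCK_MONOTONIC, &t1);

	double ns = (t1.tv_sec - t0.tv_sec) * 1e9 +
	            (t1.tv_nsec - t0.tv_nsec);
	printf("~%.2f ns per call, averaged over %d runs\n", ns / N, (int)N);
	return 0;
}
```

The caveat, as we're about to see, is that running the same thing ten million times in a row is exactly the kind of workload that caches love.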
All right, the next problem: we'd like our measurements to be repeatable. In theory they should be repeatable — well, I don't know if they *should* be, but in practice they're not. You can imagine all sorts of variables that change between runs, but what is one of the fundamental things operating systems are trying to do as you run? The operating system is trying to use the past to predict the future — and you're trying to measure the present. Every time you run a measurement, you're giving the operating system more data points that it will use to make the next run faster. Here's a fairly basic example. I give you a benchmark to run; you boot up the system and run it, and it takes 10 seconds. You run it again: four seconds. Again: four seconds, four and a half, four. Then you walk away from the system for a while, come back, run it again — and now it takes eight seconds. What's happening? Yeah, caching. OK, so caching where, in this particular case? It could be a lot of different places — it could be pages; the operating system has a whole cache hierarchy that's trying to defeat your repeatable measurement. It could be cached very close to the processor, it could be cached in memory, it could be in the buffer cache somewhere. And there's a running joke in the performance-measurement community, particularly in operating systems: there are a bunch of papers where you take a bunch of measurements, you get really high variance, but the mean kind of makes your point — so you put the number in the paper and explain away the variance due to what? Cache effects. "Oh yeah, I don't know, cache effects." That's the really dirty carpet that computer scientists have been sweeping results they don't understand under for a long time. Cache effects — the cache, it's always in the way. Frequently wrong, but still kind of a fun catch-all to explain away things you don't like. Yeah, Jeremy? Oh yeah, absolutely — good benchmarks do a lot to work around caches. Mature benchmark suites will, for example, try to determine at runtime what the sizes of the various caches on your system are, and then try to defeat them in certain ways. For example, L1 caches bring memory in a line at a time; if I make sure my memory accesses are spaced out appropriately, I can essentially defeat that cache. So if you look at really sophisticated benchmarks, they do a lot of this. In certain cases you do want to test warm-cache behavior — that's fine, that's one type of behavior — but you don't want to confuse the behavior of the system when the caches are all warm with its behavior when the caches are cold. That's a problem, because you could be saying, hey, I changed this one line of code and suddenly the system is 50% faster. No, that's not what happened. Cache effects.
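Here's one of those cache-defeating tricks in miniature — a sketch, with sizes that are assumptions rather than measurements of any particular machine:

```c
/* coldcache.c -- sketch of a stride-access pattern benchmarks use to
 * keep caches cold: touch one byte per cache line across a buffer
 * much larger than the last-level cache, so by the time we wrap
 * around, the early lines have already been evicted. */
#include <stdlib.h>

#define BUF_SIZE  (64UL * 1024 * 1024)  /* assumed >> last-level cache */
#define LINE_SIZE 64UL                  /* assumed cache-line size     */

int main(void)
{
	volatile char *buf = malloc(BUF_SIZE);
	if (buf == NULL)
		return 1;

	/* One access per line: maximum misses for minimum work. */
	for (size_t i = 0; i < BUF_SIZE; i += LINE_SIZE)
		buf[i]++;

	free((void *)buf);
	return 0;
}
```

A real suite would discover those sizes at runtime rather than hard-coding them, but this is the basic shape of the trick.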
And the other problem is that we're talking about incredibly complex systems, so they're almost never in exactly the same state they were in the last time you measured. Something is different: the Wi-Fi connection has changed, there's more network traffic. And now you have even more problems, because at least in the past we were maybe benchmarking stuff that ran on a single machine; now you've got all of these effects out in the network itself that are almost impossible to control for. You ran the test in the middle of the night when the network in your building was fairly quiet — or you ran it at 2 a.m. when Demetrios and his students were running some wireless experiment that was basically crashing the entire wireless network in the building, and you can't connect to anything. So doing repeatable experimentation is very, very hard. And then what's another problem with measurement? We just touched on this: measurement affects the thing you're trying to measure. It can affect it because, again, the system is trying to adapt itself to the thing you're doing — but what's another way measurement affects my system, especially performance measurement? What does measurement inherently introduce that I need to be aware of and compensate for? Aggrim — yeah, that's a form of it, sure. Let me pick on someone I haven't picked on today. Minesh: what's different about the system I'm measuring versus the system I care about? What does measurement cause? Yeah — delays, faults; we're building up a broad category here: it causes overhead. It inherently slows down the thing you're trying to measure, and depending on how you do the benchmark, it can slow it down a lot.

And there are a couple of consequences. The first is that when you start to measure something — you've probably had this happen to you already when you've been running your OS/161 kernel. You've had some sort of race condition. And what did you try to do to solve it? It's assignment one, you don't really like GDB yet, so what do you do? How do you debug your system? kprintf, right? You stick kprintfs in various places to try to figure out what's going on — print the value of this variable, something weird is happening here. But what does that kprintf do? What does it cause the system to do, repeatedly? It causes it to block: it sits there waiting for the console device to print. So you had this very tender, delicate race condition you were trying to isolate, and the kprintf made it go away. So maybe the answer is to just leave those kprintfs in there and you won't have the problem anymore? That's not really what happens — you've probably just chased that race condition somewhere else, where you're going to find it later. The point is that frequently — especially with bugs, but this happens with performance as well — instrumenting things in order to take measurements can affect the behavior of the system in ways that are detrimental to observing the problem. This happens a lot with commercial systems. You have a production environment; somebody's running the latest version of your company's software, and they're having this problem — and if you can convince them to run your instrumented version that's 10% slower, they may never experience the problem again. Or it may just become wildly unlikely, because of some change your instrumentation code makes to the ordering of threads or the timing of certain things. So you can potentially destroy the problem you're trying to solve. And then you also need to separate the results — the actual performance of the system — from the noise produced by your measurements, from any overhead caused by the measurement you're doing. Because measurement code runs on the machine too: it consumes cycles, it causes context switches, it causes swapping, it needs memory to store state, it may access the disk periodically to write results out. Essentially, you've taken your system and started running a competing process next to the one you were trying to measure. And this gets worse when you think about how central the operating system is to everything that's happening, and also about the fact that the operating system can produce an amazing amount of debugging output. Say I've got some problem — my page replacement algorithm seems to have a bug in it — so I'm just going to dump every page the system replaces as it runs. That system is not going to be usable anymore.
And it's probably going to fill up your disk with debugging output within a few seconds, because this stuff happens a lot — real systems page quite a bit. So if you're trying to debug certain things, it's possible you won't even be able to see them. All right, so now maybe I've talked you out of this. You were going to say, OK, I've got this new idea, I'm going to implement some code in Linux, check it out, maybe I'll get a patch into Linux — that'll be really cool. But now I've convinced you that you can't measure a real system; you're too scared to measure a real system now. So what else can you do? You still have a point to make, you still have a cool idea you want to test. What are the standard alternatives to benchmarking real systems? There are a couple of them. Dan? Yeah — these are both forms of that. Again, I have this big, complex, ugly, gross system, and instead of doing experiments on it — which are difficult to repeat and require real hardware — what could I do instead? Yeah: I could build a simulator, or I could build a model. These are the two classic approaches to studying real-system performance. And if you're trying to figure out whether you're dealing with a model or a simulator, it's not that hard: if you see equations, you're probably dealing with a model; if you see an alternate piece of code that doesn't actually do anything useful itself, that's a simulator. Usually simulators are their own application that encompasses the environment you're trying to experiment with. And there's history here — people are familiar with RAID, have some idea of what RAID is? The original work on RAID definitely used models. When people started proposing that you could combine multiple disks into one logical disk, they were able to prove some of their points using analytical models: here's how the system would work. There are some equations in that paper, and they aren't too scary. The idea is that they tried to describe, without building the real thing, how the performance and reliability of this device would behave. And that can be very powerful: if you can prove things about your system analytically, then afterwards all you're worried about is getting the implementation right. Let's say your model said, I should definitely get a 25% speedup if I do this — and then you implemented it and you didn't get that 25% speedup. Well, then either your implementation isn't very good, which is possible, or there's some other effect you haven't accounted for that's damaging your results. So again, the nice thing about models is that you can potentially make these strong mathematical guarantees.
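To give the flavor of what a model buys you: the reliability argument that motivated RAID is a one-line calculation. Assuming independent, exponentially distributed disk failures (and with illustrative numbers, roughly the scale of the original paper's example):

$$\mathrm{MTTF}_{\mathrm{array}} \;=\; \frac{\mathrm{MTTF}_{\mathrm{disk}}}{N}$$

So an array of $N = 100$ disks, each rated at 30{,}000 hours, fails on average every 300 hours — about every two weeks. You can derive that, and argue that redundancy is necessary, before you've built anything.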
The bad thing about models is that usually, in order to get to the point where you can actually say anything about a real system, you have to make all sorts of really, really bogus assumptions. Those assumptions are usually unrealistic; sometimes they end up making your result wrong, and sometimes they don't. As an example, there's been a lot of work on ad hoc networking — how can I get a bunch of devices to communicate with each other? — and a fair amount of analytical work in this area uses what's called the disk communication model. It essentially says a node can communicate with any node within some radius of it — maybe with some probability, but let's just say: if you're inside that radius, I can communicate with you, and if you're outside it, I cannot. And the disk communication model is just totally bogus. The real world is way more complicated than that — there's all sorts of weirdness in wireless signal propagation that causes the model to completely break down. Some of the results people showed from that model are still reasonable, and some of those systems worked well — some of them did. But that's an example of the kind of unrealistic assumption you frequently need to make in order to prove something mathematically. Then simulators. Essentially, what simulators do is trade off accuracy for speed. In the best case, if I'm careful and design my simulator well, it does a good job of reflecting the behavior of the real system — it just runs way, way, way faster, and it's repeatable. So, for example, you could do this with one of your page replacement algorithms: you could write a little simulator of your VM system that dirties pages in different patterns, run your page replacement algorithm against it, and see how well it worked — use it to determine the overall page fault rate achieved by different types of page replacement algorithm (there's a sketch of what that might look like below). And in the worst case, if my simulator doesn't reflect reality, or has bugs in it, it can potentially send me down all sorts of extremely bad paths.
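Here's roughly what that little simulator could look like — a minimal sketch, simulating FIFO replacement over a made-up reference string:

```c
/* pagesim.c -- toy page-replacement simulator: run a reference
 * string through FIFO eviction and count faults. The trace and
 * frame count are illustrative; swap in LRU or clock and rerun the
 * same trace to compare fault rates. */
#include <stdio.h>
#include <stdbool.h>

#define NFRAMES 3

static bool resident(const int *frames, int page)
{
	for (int i = 0; i < NFRAMES; i++)
		if (frames[i] == page)
			return true;
	return false;
}

int main(void)
{
	int trace[] = {1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5};
	int n = sizeof trace / sizeof trace[0];
	int frames[NFRAMES] = {-1, -1, -1};
	int victim = 0, faults = 0;

	for (int i = 0; i < n; i++) {
		if (!resident(frames, trace[i])) {
			frames[victim] = trace[i];      /* FIFO eviction */
			victim = (victim + 1) % NFRAMES;
			faults++;
		}
	}
	printf("faults: %d out of %d references\n", faults, n);
	return 0;
}
```

The whole value of a simulator like this is that the same trace can be replayed against every algorithm you're considering, perfectly repeatably — something a live system will never give you.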
Margot has a story about this from when she was a graduate student, working on a simulator for a disk system — I don't remember exactly which system the simulator was built for. The first thing you do when you build a simulator is run experiments where you think you know what the outcome should be, and you hope the simulator gives you results in line with your intuition. If it doesn't, you need to do some debugging. She was getting numbers she didn't understand — with simulators, that's always a good hint — and she found cases where the results didn't match, so she started looking at the code. What she realized was that the simulator was simulating the buffer cache above the file system, and in a certain case it would evict a dirty block from the buffer cache but not write it to disk. It just didn't simulate the write: it removed the block, marked it clean, and let somebody else use it, without simulating the overhead of actually doing the write to disk — which is pretty substantial. So this was a problem, a bug in the simulator; nobody designed it that way on purpose. But people had used this simulator to make points about file system design, and to present numbers that were now invalidated by this particular problem. That's the danger when you start using simulators: if you don't get the simulator right, you get a bunch of bogus results, and a year or two later you end up having to raise your hand and say, yeah, you know that thing we said worked really well? It doesn't work as well as we thought. Maybe it doesn't work at all. All right. So between actually running experiments on real systems, using simulators, and developing models, maybe I have some foothold for getting started in terms of determining what to measure. But now we have to start thinking about metrics: what am I going to use to compare two systems or two components? For example, what would I use to compare two disk drives? Say I'm doing disk drive benchmarking for Tom's Hardware or something — what would I care about? Read time — meaning what? OK, maybe the time to do a single read. What else, Jeremy? Yeah, bandwidth — potentially broken down by access size: I might look at bandwidth for large accesses and for small accesses. And read versus write: read and write bandwidth would be something to look at separately. As Sherroth pointed out, different disks might behave differently on large writes versus small writes, and that can matter — if I'm looking for a disk that has really good write bandwidth and is particularly fast at small writes, because I know that's my workload, then that's what I care about. What about scheduling algorithms? Remember, we talked about this a little when we talked about the Linux scheduler — what metrics would I use to compare scheduling algorithms? How fair it is? Yeah — but define fairness. What do you mean by fair? OK, so we need to come up with some metric of fairness (one standard candidate is below). Jeremy? Yeah — when we talked about Con Kolivas and the Rotating Staircase Deadline scheduler, we talked about interactivity, which is very difficult to quantify; some of the work he did was actually about quantifying it. What's another potential metric? Throughput, right — how much work is the system getting done over time? That might be an easier thing to measure.
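On pinning down "fair" as a number: one standard candidate — Jain's fairness index, a well-known metric, not something we derived in lecture — is

$$J(x_1,\ldots,x_n) \;=\; \frac{\left(\sum_{i=1}^{n} x_i\right)^{2}}{\,n\sum_{i=1}^{n} x_i^{2}\,}$$

where $x_i$ is, say, the CPU time each of the $n$ processes received. $J = 1$ means a perfectly even split; it falls toward $1/n$ as one process hogs everything.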
OK, what about page replacement algorithms? I've got two page replacement algorithms — what do I potentially care about? Bethany? The page size — well, OK, let's say I have a fixed page size, and what I'm looking at is which page I choose to evict when I need memory. Yeah — so I could look at the hit rate: the number of times that when I need a page, it's already in memory, in my cache. What else? I also care about how long the decision takes — and this is true for my scheduling algorithms too. How fast does it run? If it takes a second to choose a page to move to disk, I really don't care how good its choices are, because it's just too slow. What's something else, Robert? Anybody? Jeremy? Yeah — the overall page fault rate generated by this particular algorithm on a heavily loaded system. To some degree, the whole point of memory is to avoid going to disk — so how frequently do I have to go to disk when I use this algorithm? And what about file systems? Two different file system designs, two mounted file systems — what might I care about if I'm trying to do performance measurement on file systems? What are some things I would test in a file system benchmark? Yeah — path resolution: how long does it take me to look up a path name? That's a component of file system performance. What else? I could look at read and write bandwidth, very similar to what I did with disks. What else might I care about here — what's file-system-specific? Yeah — how long it takes to recover from failures: if I can force a failure onto the system and then remount the file system, how long does it take to clear out that failure? And different types of allocation. If you start to look at file system benchmarks, they'll do things like a huge extend of a file: say I take a file that's one megabyte and extend it to one gigabyte — how long does that take, and how fast is access to that file afterwards? That comes down to things like how quickly I can allocate blocks. (There's a sketch of what that extend measurement might look like at the end of these notes.) So I think this is a good stopping point for today. This is sort of the classic picture of everyone touching different parts of the elephant — there's a Far Side cartoon like that, I think. Everything we talked about today is things we *could* measure. What we'll talk about on Wednesday is what the system is actually *doing* during the measurement, because that ends up mattering quite a bit: the page fault rate I achieve with a particular page replacement algorithm is not just a function of that algorithm — it's also a function of the workload the system is actually experiencing. So on Wednesday we'll talk more about benchmarking, and we'll finish our discussion of performance analysis.
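As promised, a rough sketch of that file-extend measurement — POSIX calls, a made-up path, and a real caveat in the comments:

```c
/* extend.c -- time extending a 1 MB file to 1 GB. POSIX throughout;
 * the path is made up. Caveat: on file systems that support sparse
 * files, ftruncate() may not allocate any blocks, so a serious
 * benchmark would actually write data into the extended region. */
#include <stdio.h>
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
	struct timespec t0, t1;
	int fd = open("/tmp/extend-test", O_CREAT | O_RDWR, 0600);
	if (fd < 0)
		return 1;

	ftruncate(fd, 1L << 20);            /* start at 1 MB */

	clock_gettime(CLOCK_MONOTONIC, &t0);
	ftruncate(fd, 1L << 30);            /* extend to 1 GB */
	fsync(fd);                          /* push metadata/allocation out */
	clock_gettime(CLOCK_MONOTONIC, &t1);

	printf("extend took %.3f ms\n",
	       ((t1.tv_sec - t0.tv_sec) * 1e9 +
	        (t1.tv_nsec - t0.tv_nsec)) / 1e6);

	close(fd);
	unlink("/tmp/extend-test");
	return 0;
}
```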