All right. Morning, everybody. Friday, Friday. Everybody ready for the weekend? Exciting weekend spent working on assignment three. All right, so my original plan was that I would finish up talking about performance today, and next week we would talk about virtualization. But as I started writing these lectures about performance, I realized that there's actually a lot to talk about. So hopefully I'm not boring people by going too slow. In terms of the material, which I think I described as a pantheon the other day, of fascinating and important information that you're learning in this class, I want to spend some time here because I think this is important stuff. I also think it's likely that you will be able to use this in other places. Yeah, it's a light crowd today. I feel like the room is more echoey. There's fewer warm bodies in here. All right, so today we're going to do lecture two on performance. And then on Monday, I'm actually going to do another lecture on performance, where we're going to focus on what you actually do, like what are ways to improve the performance of your system. So on Wednesday, we talked a little bit about the challenges inherent in improving performance. Today, we're going to continue to talk about some of the pitfalls of performance improvement, some of the things that you might encounter and need to do along the way in order to figure out where to spend your energy. I mean, you're a limited human being. You only have a certain number of hours in the day. And you want to work on the part of the problem that's going to produce the biggest improvement, the biggest bang for your buck. So we'll talk today about trying to figure out exactly what that is. Work on assignment three, not much else to talk about here. I'm going to send an email out today. I've been meeting with people in my office, and we've talked a little bit about the assignment, and I have some suggestions about ways to approach things in terms of what to implement first. And then we'll also give you some idea of how we're going to test your assignments, right? What programs you should be able to run where, and where the good stopping points in the assignment are, places you might want to get your code to and then maybe tag it there, so that just in case you don't get things completely working, you have something working to submit. So yeah, that'll be out today. All right, so any questions about Monday's material on, wait, today's Friday? Friday, right. Wednesday's material, or Monday's material, doesn't really matter, although I can't even remember what we talked about on Monday. Any questions about the stuff we talked about Wednesday? So Wednesday we started to talk about performance analysis, we talked about the difficulties inherent in measurement, and we started to get into a little bit of benchmarking, and we're going to keep talking about benchmarking today. But any questions on the stuff we covered Wednesday? Wednesday, Wednesday. All right, so just a couple of questions. So who remembers? I mean, I put this up on the board and we thought this was easy. We could all just go home early at 9:30. What's difficult? Who can remind me about some of the challenges inherent in improving the performance of a real system, or even just the performance of any piece of code? I'm gonna start over in the, let's see here. I don't think I know you guys well enough.
Anybody remember, anyone who was here Wednesday remember what was difficult about improving the performance of a real system? Anybody who's got the slides from last time up on their laptop? Can you guess this from here, Alex? I mean, there actually were quite a few things, right? So there should be a lot to pick from here. You don't know. All right, anybody from this side of the room? Left side of the room. Represent. Nothing, you guys got nothing, right? Ooh, okay. All right, right side of the room. Anybody want to show these guys up? Yeah. Right, so experiments aren't necessarily repeatable, right? That can be frustrating. You run an experiment, you get a result, you start working on a piece of code, and then you go back to run the experiment again and it sends you in a totally different direction, right? So the difficulty of experiment repeatability, okay. Now that the right side has contributed, left side, I'm gonna give you guys another chance. Anybody over here wanna throw something into the hat? Something that's difficult about performance analysis and improvement. All right, I'm going back to the well, right side. I feel like this side of the room's a little bit more awake today, okay? Anybody else, any other contributions? Timing, measuring time, right? We talked about how just something as simple as measuring the amount of time that has passed on a system, right, can itself be challenging. Now these guys are really, really giving it to you today. Yeah, Scott. Right, right, right. That's actually kind of what we're gonna talk about today. Finding out what actually affects the results, right? Like finding out which pieces of code are actually contributing to the problem, and then on Monday we'll talk about some, ah, right, right, yes. The systems measurement uncertainty principle, right? Measuring the system itself can create two problems, right? It can either make the system so slow that it's unusable, or it just might destroy some sort of subtle effect that you're trying to understand, right? So yeah, okay, great. So yeah, so this was our basic blueprint of how to do things, but we talked about again some of the things that are so hard here, right? How do you measure the system? How do you even measure time, right? And then today we'll talk a little bit about what the system should be doing when you measure it, right? How do you develop good benchmarks? A good benchmark means that if you improve the performance of the benchmark, you're doing something useful. A bad benchmark means that if you improve the performance of the benchmark, nobody will care, because nobody will actually notice in real life, right? When they're really using your system, all right?
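Just to make the measuring-time point concrete, here's a minimal sketch, assuming a POSIX system, of the usual way to grab elapsed time around a piece of code. CLOCK_MONOTONIC is a common choice because, unlike the wall clock, it doesn't jump around when something adjusts the system time; it still doesn't protect you from timer resolution, interrupts, or scheduling noise, which is part of why measurement is hard.

```c
#include <stdio.h>
#include <time.h>

int main(void) {
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    /* ... the code being measured goes here ... */
    clock_gettime(CLOCK_MONOTONIC, &end);

    /* Convert the two readings into elapsed seconds. */
    double elapsed = (end.tv_sec - start.tv_sec)
                   + (end.tv_nsec - start.tv_nsec) / 1e9;
    printf("elapsed: %.9f seconds\n", elapsed);
    return 0;
}
```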
We'll talk just a little bit today. I mean, this is not a course on probability and statistics, but I think that there's some general level of statistical hygiene that's important for computer programmers to have. How many people have taken a course on statistics? Is that required for the major here? To be a computer scientist and engineer at UB, must one take a statistics class? Does anybody know? What's that? Oh, whoops, the wrong one. Statistics would have been more useful. Okay, no, no, probability is close, right? But statistics would be more helpful. At least I think so, right? So unfortunately computer systems people are kind of notoriously allergic to statistics, and we'll talk a little bit today about the minimal amount of statistical hygiene that might help you out when you're trying to understand and do things. And then, yeah, I don't know why I capitalized the sentence the way I did, but: determining what to improve, and then finding techniques for ways to improve it. So today we're gonna talk about the what to improve, how do we choose the parts of the code that we're gonna spend our time on, and then on Monday we'll go over some classic techniques, different tried and true methods of improving the performance of a piece of code, right? You sit down with a piece of code, it's not performing well, what do you do, right? So that's kind of the last question you have to ask, and in some ways one of the more fun ones, all right? Okay, so remember we got into this whole litany of complaints and problems and challenges about experimenting on real-life systems, right? Either real-life systems that are in active use, which is very difficult to do, right? Because people don't want you to, because you're gonna slow down the system. Or even real-life systems that you're trying to run real benchmarks on, like a real computer, a real piece of hardware. And so we came up with these two other approaches, right? One was modeling, the other was simulation. Who remembers some of the pros and cons of modeling versus simulation? Right, so simulations will produce repeatable results, and that's good, right? That can be in contrast to real systems, on which it is very, very difficult to produce repeatable results, simply because they're almost never in the same state twice, right? With the simulator, again, you're scraping some complexity off the system in hopes that this will allow you to understand it better and make the necessary improvements. What about models? What's something good about a model? Well, okay, so strong, I would say strong mathematical conclusions, right? Like the conclusions that you get out of a model can be very strong and frequently are provable conclusions, right? They're hard conclusions. What's the problem with models? Unrealistic assumptions, right? Normally, to get the system into a state where we can model it and we can say things about it with mathematical certainty means that we have to make so many unrealistic assumptions that the system that we're modeling, the system that we're using these analytic techniques to try to understand, bears potentially little to no resemblance to the system that we're actually trying to improve in the first place. So that's kind of a challenge, right? What's one of the downfalls of simulations? The simulator, right? I've got to write a simulator, right? Now I've introduced this whole extra piece of code that has to be correct, right? It was hard enough before, when I had code that had to be correct, the code that actually had to work, that was running on my real system. Now I've essentially asked the person who's trying to improve the performance of something, hey, by the way, go out and write a bunch of other code that also has to be correct, and then use that code to experiment with the code that you were trying to work on in the first place, right? So yeah, simulators can have bugs, right?
So models: we can make strong mathematical guarantees after making a bunch of unrealistic assumptions, right? Simulations: in the best case, you know, the simulator is simpler and faster to work on than the real system, and in the worst case, well, there's two things that can go wrong with simulators. There's many things that can go wrong with simulators, right? One is that the assumptions that we make, or the complexity that we remove in designing the simulator, render the simulator results meaningless. The other problem is that the simulator itself can have problems, right? All right, any other questions on Wednesday's material before we go bravely onward? Questions about this sort of stuff. You guys feel ready for the first step and a half of performance analysis? Maybe we should add a performance assignment as part of assignment three. You know, you guys have plenty of time. Let me just add a little extra. There seems to be no, yeah, okay. I mean, nobody seems opposed. All right, probably not. What's that? He's just in shock. You guys are like, he's like Stockholm syndrome at this point, you know? All right, so, okay, this is a little bit of a review from last time too, right? We talked a little bit about, you know, what does it mean, right? So now we're getting to the point where we've figured out how to measure, we've decided which approach we're gonna take, modeling, simulation, or, you know, trying to work on a real system. Now the question is, what code do we actually have running in the first place, right? What inputs do we use into the system? And what metrics do we use as comparison metrics, right? So the first part is, what do we use to run on the system? The second part is, what do we use to evaluate the outputs, right? And we talked about how difficult it can be to compare things, right? I mean, what does it mean to compare two disk drives, right? Or two different scheduling algorithms, or, I wish I could have eyes in the back of my head so I could remember what was on the slide without looking, two page replacement algorithms, right? Or two file systems. I mean, these are big complicated things that have many different performance outputs, right? And so finding a way to compare these is really difficult, right? And the other thing we're talking about today is that frequently, you know, you have to be careful that in improving the performance of one part of the system, you don't inadvertently make changes to other parts that destroy those performance gains. We talked, I mean, you guys might remember this a little bit, right? So what system have we already discussed that potentially had this problem? A system that made a really dramatic improvement to a certain thing that the system did, but potentially had all this dirt that was brushed under the rug in order to do it. Who remembers what we've already talked about that had this potential problem? What's that? Log-structured file systems, right? So remember, when we talked about log-structured file systems it was like a revelation, right? We could do all the writes to the disk in the same place. Woo-hoo! And then it was like, oh crap, we gotta clean the disk, and it's, oh, okay. So again, I mean, sometimes you feel like you're working with the proverbial tablecloth that's too small, right? No matter how hard you yank it into one corner, it pops up somewhere else, right?
You just need to be careful, when you're improving the performance of your system, that you're not doing things that are gonna jeopardize the performance of every other part of the system, right? All right, so, and yeah, I put this up just because, you know, I like to have a picture in every set of slides. All right, so let's talk about benchmarks, right? Usually we can break down benchmarks, right? So first of all, let's just define the term, right? What is a benchmark? What is a benchmark? Anybody? What's that? A standard? Okay, there is something standard about it. I will take the word standard from your answer and I will use it. You contributed it to the right answer. What else? Benchmarks are standard what? What's that? Okay, so it's something that we do use as a point of comparison, but why, right? So it's a standard point of comparison, right? If we produce the results carefully, they are comparable, and we would hope that they would be objective, right? But why use benchmarks, right? What's the alternative, right? So imagine I've got the file systems community, right? And without benchmarks, what do people end up doing? Yeah, they end up writing their own tests. Or they might use some sort of application benchmark, right? But you can't compare things, right? Like, you know, you had some sort of performance problem that led you to implement a new scheduling algorithm, and you ran it on your workload and it worked great. You know? And, I don't know, Keith had a different performance problem. He wrote a completely different scheduling algorithm and it works great on his workload, right? But I don't know, you know, I need to pick one of yours and I don't know which one is, quote unquote, better, right? So the ideal benchmark gives us a way to compare apples to apples, right? When you talk about file system benchmarks, the idea is that benchmarks are supposed to represent some sort of representative set of requirements, right? You know, what do most applications' file system usage patterns look like, right? Because if I'm trying to evaluate file systems, I want to pick the file system that works best for a specific set of applications. Now, what's the problem with benchmarks, for this very same reason? What's the challenge with developing a good benchmark? You need a benchmark for your benchmark, right? So different benchmarks might succeed or fail, but what's the existential challenge of benchmarks, right? I mean, on some level, I would love a benchmark that allowed me to compare every different aspect of file system performance, right? Or let's put it this way, I would love a benchmark that reflected every possible application that could ever use the system, right? But what's the problem with that benchmark? It's almost impossible to cover everything in one, right? What's true about applications and processes that use computer operating systems? They're different, right? They have different requirements. And so, you know, it's kind of like, you're like, okay, I had this great idea. I'm gonna come up with a benchmark that allows me to compare any two file systems, right? And then you start looking at different applications. You say, I'm gonna use that application, and that application, and this one. And pretty soon you have something that's so general that it's meaningless, right? It doesn't really reflect any individual application, right? It reflects some sort of, like, well, you know, all the applications do reads, right? Or something, right?
So there's some tension here in benchmarks around generality. Because generality allows benchmarks to be powerful, allows you to say, if I do well on this benchmark, it means that I'm gonna improve lots of different applications' performance, right? But as benchmarks get more general, they also stop being reflective of the applications that are contained in them, right? Because a database application is on some level very, very, very different from a gaming application, right? In terms of how it uses the file system. Trying to come up with a benchmark that, you know, represents the performance of both is very difficult, right? All right, but let's talk about these different categories of benchmarks, right? So who knows, or can guess: what is a micro benchmark? Ben? Right, so micro benchmarks try to isolate one aspect of system performance, right? Very, very, very specific. You're trying to look at one very, very tiny little thing, right? And usually, why are you using a micro benchmark? Why would you be trying to observe or isolate the performance of one specific part of the system? Checking for bottlenecks, but what is more likely to be the thing that you are doing at the time that you are using this micro benchmark? You're working on that part of the system, right? So you wanna be able to isolate your changes from other noise that's affecting other parts of the system, right? Okay, excuse me. Similar vein: macro benchmarks, right? So what's a macro benchmark? Micro benchmark, isolate one specific part of the system; macro benchmark? Yeah, it tries to look at holistic system performance, right? Tries to see how the whole system, as an entire blob, performs, right? Given some sort of general workload, right? Okay, now, let's find some sort of uneasy middle ground here. Application benchmark, right? What is an application benchmark? Or, you know, benchmarks that represent classes of applications. So a macro benchmark is trying to say, oh, okay, over all different types of workloads, this is how the system would respond. A micro benchmark is trying to isolate one specific aspect of system performance. So an application benchmark is doing what? What's the most direct way to measure the improvement provided to an application by a change you're making to the system? What's the most obvious thing to do? Run the application, right? I'm trying to improve the performance of the file server, right? Or, I'm sorry, I'm trying to improve the performance of a web server, right? This is my goal, I'm trying to make changes to the file system specifically to better support web servers, right? So run a benchmark, or run a specific application that is that type of application, right? Run a web server on the system and measure its performance, right? So you have these application benchmarks that can represent specific classes of applications, right? And, you know, how well would a typical database workload run, right? Database is a class of application, right? But you could also just run a standard version of a particular database with a given workload and look at how well it performs, right? So this is one sort of taxonomy of benchmarks. So let's talk about examples. Let's try to come up with a few examples of each one, right?
And let's do this in the context of: we're trying to improve the performance of our virtual memory system, right? So, you know, on Monday or Tuesday, when you guys get done with assignment three, you realize your system's kind of slow and you decide that you wanna make some changes and improve it, right? So what are some micro benchmarks that we could develop to focus on specific aspects of the performance of our virtual memory system? Give me one example. What's one sort of atomic unit of operation, or specific thing that the virtual memory system might do, or a specific part of the virtual memory system we might wanna isolate and be able to measure? Okay, what's that? Paging? Paging is a little broad. So what about page eviction? Page eviction, okay. So, what about page eviction, right? Okay, so I could measure how long it takes to swap a page to disk, right? I would argue that's not really a VM benchmark though, right? That's more dependent on disk performance, right? But let's think, I'm thinking about page eviction. What are the parts of that that are specific to the virtual memory system? What does the virtual memory system have to do before it pages something to disk? It has to decide which one to evict. So what could I benchmark about that? How long it takes to decide, right? Okay, so my first one here that I actually put up was time to handle a single page fault, right? Now this is kind of funny, because this, you might argue, is not even really micro enough, right? Because this potentially involves lots of other parts of the system. And as we'll talk about a little bit, what's the difficult thing about benchmarking overall page fault handling performance, right? If I just ran the system and I measured how long it took to handle each page fault and I computed an average, right? What's the problem with that? Well, okay, we might depend on the performance of other things, but what's another problem? Right, there's so many different things that can happen here, right? There's a fast path where the page is in memory. There's a slightly slower path where I have to load a translation into the TLB. There's an even slower path where I have to evict something. You know, I might have to zero-fill pages. So I've started to lump so many different cases together that this isn't really a micro benchmark anymore, right? So that's that. So, time to look up a page in the page table. Here's another example of a micro benchmark, right? I go to your address space functions, I hand them a page: you know, what's the physical address for this page? How long does it take to figure that out, right? Depending on how you implement your page tables, this will vary, right? And then, how long does it take to choose a page to evict, right? So whatever your page replacement algorithm is doing, however smart it is, and all these different AI techniques that I'm sure you guys are gonna apply to this problem, you know, how long does it take between the point when you figure out that you need to evict a page and when you select a page for eviction, right? How long does this take? And of course, this is a function of how you do it, right? So here are some example micro benchmarks.
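To make the first of those micro benchmarks a little more concrete, here's a rough user-level sketch of timing page faults, assuming a Unix-like system rather than your own kernel: map fresh anonymous pages and time the first touch of each one, which forces the kernel's fault-handling path. Note that even this lumps the fast and slow fault paths together, exactly the problem just described, so treat the average with suspicion. MAP_ANONYMOUS is common on Linux and the BSDs but not strictly POSIX.

```c
#include <stdio.h>
#include <time.h>
#include <sys/mman.h>
#include <unistd.h>

#define NPAGES 10000

int main(void) {
    long pagesize = sysconf(_SC_PAGESIZE);
    char *buf = mmap(NULL, (size_t)NPAGES * pagesize,
                     PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) { perror("mmap"); return 1; }

    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (long i = 0; i < NPAGES; i++)
        buf[i * pagesize] = 1;   /* first touch -> page fault */
    clock_gettime(CLOCK_MONOTONIC, &end);

    double total = (end.tv_sec - start.tv_sec)
                 + (end.tv_nsec - start.tv_nsec) / 1e9;
    printf("average: %.3f usec per fault over %d pages\n",
           total / NPAGES * 1e6, NPAGES);
    return 0;
}
```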
What about macro benchmarks? What about macro benchmarks for your VM? Any guesses, any ideas? Yeah, John. You could do the context switch. Okay, yeah, so I would argue that might even be more of a micro benchmark, but you're right, you're starting to combine multiple things in there, right? So there's starting to be, yeah. So, you know, context switch overhead, right? So that's one, like, maybe, medium benchmark. Yeah. How long does it take to start up the kernel? That's not a bad one. That requires a lot of page faults. Okay, what about, can we recycle something from the list we already have up here? What about, you know, again, aggregate time to handle page faults, right? So I take a system, I run some sort of heavy benchmark that pages heavily, and then I just look at the average time that it takes, or the page fault throughput, right? How many faults per second can I handle, right? And then, you know, there's page fault rate, right? So the page fault rate is an interesting indication of how effective the system is at handling page faults, right? The page fault rate can go up for a variety of reasons, right? One of the reasons might be that I'm doing a better job of keeping pages in core. I'm doing a better job of selecting pages to move to swap. And so I'm handling more faults because the fault handling path is faster and the throughput of the system is improving, right? The page fault rate could also go up because I'm doing a bad job, right? And I'm evicting things in the TLB that shouldn't be evicted, right? So this is why macro benchmarks start to be complex: what causes them to fluctuate is not necessarily clear, right? All right, and then application level benchmarks. So, right, you guys have some of these, right? I can't claim that these are really, like, super realistic, useful applications, right? Like triple sort, right? I'm gonna sort three large arrays of numbers, right? I mean, I guess that's a benchmark, right? I don't know why you would do three all at once, but anyway. So we've given you actually some application level benchmarks, and these can be used. And you can use these, right? If you guys want to experiment with your system, you can run these several times and you can look at the results, right? All right, so what about challenges using these various types of benchmarks, right? So what would be, and I think people have already kind of hinted at some of these, what about the problems with micro benchmarks? What's one problem with a micro benchmark? Yeah, sorry, go ahead. Yeah, so there is some overlap between macro benchmarks and application benchmarks, right? The difference is that macro benchmarks are supposed to reflect some higher-level component of system behavior that you can argue would be potentially important to all applications, right? Whereas application benchmarks are allowed to zero in on components of system performance that are specifically important to a particular application, right? This is why you have these classes, you know, for example, database workloads, right? Because databases use operating systems in many ways that are kind of unusual. Actually, it's been kind of a running complaint by the database community for years and years that operating systems don't do a good job of supporting database workloads, because databases do kind of weird things, right? Like, because of how databases store information, they really almost want raw access to the disk, right? They don't want a file system in the way, right? So even just providing them a file abstraction, they start to get annoyed, right? Because they're like, you know what? Imagine if you have your own data structure that you're laying out in a file, right? Well, now that file is gonna be in different parts on disk, and what you really wanted is: just give me a partition, right? Just give me a chunk of the disk where I can lay out things myself. I know better than you do, right? So this has been a source of research and continued conversation between the database and operating system communities, right? Yeah, there is overlap between those two classes of benchmarks, definitely true.
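Coming back to the page fault rate for a second: if you want that kind of aggregate number, many Unix-like systems will just hand it to you. Here's a rough sketch using getrusage(), where ru_minflt counts faults served without I/O and ru_majflt counts faults that needed I/O; run_workload() is a hypothetical placeholder for whatever benchmark you actually care about. These are exactly the aggregate numbers whose fluctuations are hard to interpret on their own.

```c
#include <stdio.h>
#include <sys/resource.h>

static void run_workload(void) {
    /* Placeholder: the benchmark you actually care about goes here. */
}

int main(void) {
    struct rusage before, after;

    getrusage(RUSAGE_SELF, &before);
    run_workload();
    getrusage(RUSAGE_SELF, &after);

    /* Minor faults were handled without I/O; major faults needed I/O. */
    printf("minor faults: %ld, major faults: %ld\n",
           after.ru_minflt - before.ru_minflt,
           after.ru_majflt - before.ru_majflt);
    return 0;
}
```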
All right, you guys have had time to think. Problems with microbenchmarks. Kind of an obvious problem with the microbenchmark, right? I've got a microbenchmark, I've zeroed in on some very, very specific part of the system, I've doubled its performance, and what? Right, so I think you guys basically said the same thing, which is that I don't know what I did to the rest of the system. Actually, there's two problems with this. We're gonna get to these later. The first is, I don't know what I did to the rest of the system. The second is, who cares? Maybe that part of the system doesn't even matter, right? So yeah, this is my way of saying: you may not be studying the right thing, and the thing that you're improving may actually have deleterious effects on other parts of the system. So your small performance improvement to something that doesn't matter may cause a significant performance degradation to something that does matter. So that's kind of the worst possible case, right? Macrobenchmarks, you're kind of talking about the opposite side of problems, right? You're at such a high level that it's not clear exactly what fluctuations are caused by, like we talked about with page fault rate, right? Where the page fault rate, the page fault throughput, can go up and down for a variety of reasons. Just, right, what actually happened? You're putting me on the spot here. So it's probably gonna be difficult to come up with a convincing example, but I'm trying to think about something in the context of VM, too, because that would be easiest, right? Well, okay, I mean, here's an example, right? It's possible that you're trying to improve the page eviction, like the speed at which you can find a page to evict, right? So you've identified locating a page to swap to disk as a bottleneck in your system, and you're trying to improve that, right? And by doing so, you use a different algorithm that potentially evicts pages in a different order, right? And it turns out that the ordering that you choose is poor, right? And so what happens is you're moving the wrong pages to disk, right? And the overhead now of doing IO back and forth, because you're evicting the wrong pages, overwhelms the small improvement that you made, right? So there's a case where there's a balance between two components of the system, right? You know, the right thing to do, if I wanted to improve overall performance, might be to write a smarter page eviction algorithm that takes longer, right? So, you know, we talked about this a little bit when we talked about VM: it's possible that the extra, you know, thousand, ten thousand cycles that I spend on the page eviction path, if that prevents me from doing one additional IO, it's worth it, right? So yeah, so that's a case.
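To put rough, purely illustrative numbers on that cycles-versus-IO trade-off: at 1 GHz, an extra 10,000 cycles on the eviction path costs about 10 microseconds, while a single disk I/O costs on the order of 10 milliseconds, roughly a thousand times more. So the slower, smarter eviction algorithm breaks even if it avoids just one extra I/O per thousand evictions, and wins after that.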
And yeah, that's a case where you would probably be conscious of that, because, assuming you weren't an idiot, you might think, oh, okay, well, you know, I'm fundamentally altering an algorithm in the system, and that's going to affect other things, right? But if you get too myopic, right? If you get too focused on that one thing and you're not running higher-level benchmarks, you might miss the fact that, oh yeah, by the way, the system is significantly slower now because of the improvement that you made, right? So we'll talk about that more in a sec, all right? So yeah, so macro benchmarks: there's so much going on that it can be difficult to actually come to real, specific conclusions about the results. And so what a lot of analyses do is they start with micro benchmarks. They use micro benchmarks to draw conclusions about some of the changes they've made, and then they use macro benchmarks to kind of prove to you that they haven't, you know, totally destroyed the overall performance of the system as a result, right? And then hopefully they've made some improvements, right? So the macro benchmark is: I've improved the system. The micro benchmark is to prove to you that the changes that I made to a specific component of the system are actually the source of the improvement, and not some other thing, right? Or some other weird interaction between parts of the system, right? And then application benchmarks always have this kind of provinciality to them, right? I mean, that's just your application, right? And who cares, right? So, you know, if I'm trying to improve web server performance and I make all these changes to the file system and the web server performance goes up, you know, by a factor of two or three, that's great, right? But if, you know, 90 percent of, if most of the users that are logged into your system are trying to read email, and the performance of the email clients just went down, then, you know, they may not care that the pages are loading somewhat faster from your web server. What they probably care about is that, you know, pine is really slow, right? So this is an important issue, right? All right, and I think it's worth pointing out a little bit of the psychology of how benchmarks work out in practice too, right? One of the biggest problems with benchmarks, I mean, let me anticipate a problem with benchmarks, right? You're working on improving the performance of your system and, you know, you've figured out how to measure things, you've chosen a benchmark, you run some tests, the benchmark improves and, you know, you're ready to brag about your results, right? So what's the, you know, and I'm not trying to accuse you of anything, but what is it possible that has happened along this path? What might people do when they select benchmarks? Yeah, John. Right, now, I mean, no one would ever do that, right? And they don't necessarily have to be, like, hacks, right? The problem is you get too wedded to your own benchmark. The first thing is, you pick the benchmark, right? So that's kind of like, for example, you know, that would be like me saying, hey guys, how many people would like to have a final where you write the exam, right? And then actually maybe that's what we'll do. Maybe I'll have you guys submit exam questions and then I'll create an exam from the questions you submitted, right?
That will give people an incentive to submit questions too, because if you submit a question then you should probably know the answer. Yeah, that's just an idea. Anyway, I think we can do this in a crowdsourced way and actually have it be cool. But if I actually asked you to write your own exam, right? And then have you come in and take it, that would probably introduce some sort of problem. I would hope you guys would do well. But don't write an exam that you can't get 100 on, please. But yeah, so I mean, people who have good ethics and good scientific practice try to do the right thing, but there's actually a lot of evidence, and I wish I'd looked this up, from earlier scientific eras that this is always a problem, right? And it's not a function of just people who are trying to pass off bad results. It's just a function of people who get so embedded in a problem, right? And so determined to improve things, that they start to see things in their data, and they start to choose experimental methods that are really tailored to what they're doing, in ways that render their conclusions very, very difficult to reproduce, yeah. Yeah, of course, right? I mean, no, the answer's not hard, right? I mean, running a bunch of benchmarks is usually a good idea, right? But again, you'd be surprised at how many papers get published, because running benchmarks takes a lot of time, and it's also equally frustrating when you run another benchmark and the system doesn't improve. Like, oh crap. Maybe I won't use that benchmark in my paper. So, I mean, I remember reading just recently, actually, that there's this shocking number of studies in psychology and medicine that produce really exciting results the first time they're performed, but every time they're reproduced the results diminish, right? And actually there's people who are studying this in the medical community, because they don't know why this is. It's something very strange. But there's all these different claims that people have made that looked fantastic the first time they were published, and then as people have tried to reproduce them, they get harder and harder to reproduce, right? So again, it's not necessarily people with bad intentions. Sometimes it's just some feature of how this happens, right? All right, so yeah. Again, people choose benchmarks that try to justify the changes they wanna make, and also people choose benchmarks and then work on the system to the point where their work becomes coupled to that benchmark, right? It starts to lose a little bit of its mooring in reality, right? And again, I just wanna point out that this isn't just, like, human weakness, right? So what can we say? I mean, this is interesting, because if we get to virtualization, which I hope we will, we'll talk a little bit about virtualization as a solution to this problem, right? But operating systems have this tough job, right? They're trying to support a lot of different applications, right? So if you're an operating system designer, and you've got a client, let's say you're, I don't know, Microsoft or something, and your big client is Oracle, right? And Oracle is databases, right? Like, they are running a specific application on your system and that's what they care about, right? So what is one way to make them happy, right?
What is one way to make your operating system better from the perspective of Oracle? To what? Right, so okay, so my maxim here is that the fastest system is a single-purpose system, right? And to the degree that people continue to work on single applications, the best way, or one way, to improve the performance of the system is just to tailor it more and more and more to a specific application, right? Unfortunately, that cuts against a large class of what operating systems are still used for. Although for a lot of cases, this is okay now, right? Because a lot of times, operating systems are running in environments where they only support one application, right? That's kind of an interesting modern twist on the general purpose operating system. But for the types of things you guys do, right? I mean, you may or may not get really upset if the next Android phone that you got was incredibly tuned for Angry Birds, right? And that slowed down everything else on the system. Like, mail was slow, the web was slow, Angry Birds was super fast, right? Or you might get equally upset if the web was really fast and Angry Birds was slow, right? Whatever. Angry Birds is kind of my catch-all thing to pound on. I don't really use Angry Birds or play it. All right, so, yeah, we're going even slower than I want to, which is fine. So let's talk a little bit. We've spent a lot of time for the first two lectures on performance talking about what not to do, right? What not to do, you know, all these mistakes that you can make. So I want to sort of finish today by talking more about what to do, right? What are good techniques to follow, right? So the first thing that's kind of critical is having a goal, right? Having a specific goal, right? I mean, usually, that's the problem with performance. Even vaguely defined, performance seems like something that's worth doing, right? I'm just gonna make my system faster, right? That actually seems like a worthwhile goal, right? It doesn't feel like you have to be that much more specific. If you actually want to make your system faster, however, it is really helpful to be more specific, right? So it's helpful to say, you know, I want to improve this particular piece of code. And I know we're coming to it very slowly, but we are gonna talk about how to choose what to improve, right? But the more specific you can make your goal, the more likely it is that you can actually improve things, right? And improve things for real, right? Not just happen to run the benchmark on a set of warm caches and have it look faster and then go home, right? And the reason is, the more specific you can make your goal, the more likely it is that you can identify the effects of the changes that you're making to the system, right? If the goal is just "faster", right? And you make some changes to the code? I mean, here's one way of thinking about it, right? Any changes you make to your code base are likely to affect the performance of the system. Unfortunately, it's like flipping a coin, right? It might get faster or it might get slower, right? Absent any sort of analytic techniques. So one way of improving performance without paying any attention to this is simply to make a change and run the system. If it's faster, commit that change, right? If it's not faster, revert that change and try making another change, right? And in theory, you could do this for long periods of time, right?
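If you actually wrote the coin-flip approach down, it would look something like this tongue-in-cheek sketch, where make_random_change() and run_benchmark() are hypothetical placeholders for whatever your setup would use, and the git commands assume the code lives in a git repository:

```c
#include <stdio.h>
#include <stdlib.h>

static void make_random_change(void) {
    /* Placeholder: the monkey edits the code base here. */
}

static double run_benchmark(void) {
    /* Placeholder: rebuild, run the real benchmark, return its
     * runtime. Here we just return a random "runtime" so the
     * sketch runs at all. */
    return (double)rand() / RAND_MAX;
}

int main(void) {
    double best = run_benchmark();
    for (int i = 0; i < 100; i++) {
        make_random_change();
        double t = run_benchmark();
        if (t < best) {
            best = t;                        /* faster: keep it */
            system("git commit -am faster");
        } else {
            system("git checkout -- .");     /* slower: throw it away */
        }
    }
    printf("best runtime: %f\n", best);
    return 0;
}
```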
I don't think that that would work very well over a long horizon, but if anyone wants to try that as a research project, I would love to have somebody experiment with that. It'd be kind of interesting. But if you don't want to do that sort of random, monkey-driven approach, and again, you could just imagine monkeys doing this, right? Make change to code, commit. If code works, evaluate performance. If code is faster, continue. So if you don't want to do that, then being specific helps, right? So when you're using models and simulators, and you guys will probably use models and simulators, validating them before you start is really important, right? And especially with simulators, this can be really frustrating, because when you write a simulator, you spend time writing the simulator, you know? Like, you wanted to improve the performance of the system, and now you ended up writing extra code that you didn't want to write in the first place, that's never going to ship with the product, that's a part of the system that no one is ever going to know about. And then when you're done with that, rather than immediately getting down to cranking out results, someone, you know, some jerk like me, is going to ask you: did you validate the simulator? And you'll be like, of course I did, I wrote it. Right, you know? Like, I understand the code, of course it works, right? So before you make changes, making sure that the simulated behavior matches the real behavior of the system is really critical. It's not something that people do very often, but it's really important, right? Because otherwise, you've created a completely different thing, right? And the changes that you make that improve the performance of the simulator, what evidence do you have that they're going to improve the performance of the real system, right? If the simulated performance doesn't match up before you start making changes, then the simulator is really not that helpful, right? And you can do the same thing with models, right? In particular, simple models should produce results that match your intuition, right? Okay. So, you know, another important thing to do is to pick the right techniques, right? And I'm not sure there's a science to this, right? I mean, this is kind of part of the art of performance improvement, you know? When do you use modeling? When do you come up with analytic techniques? When do you write a simulator? When do you try to do experiments on real code? When do you get a benchmark and use that? When do you use the workload that the customer was running, right? I mean, these are the things that, over time, as systems engineers and as programmers, you guys will start to get a feeling for, right? But you should at least know that these techniques are out there and think about them when you're approaching a problem, right? Think about it. You know, should I model this technique, right? Is this system one that I can model, right? Maybe that's a powerful approach, if you can do it, right? Does a simulator already exist for this system? Can I use it to evaluate this technique? And, you know, on some level, I'm not totally sure that I believe what I wrote in my own slides here, but there might be some hierarchy of these things, right?
For example, if you can't convince yourself analytically that a new algorithm is gonna improve the performance of your system, then don't bother simulating it, right? And if it doesn't work in the simulator, don't implement it on the real system, right? I mean, I think sometimes people, including myself, get so wedded to an idea that they are determined to implement it on a real system and try it out, because maybe it'll work, right? But what people have found out over time is that things that don't work in simulation usually don't work on your real system either, right? So it's just wasted effort. All right, so I think this is a good stopping point. Does anybody have any questions about this stuff? I'm going more slowly than I thought I would, but I'm going as fast as I want to, so. Are people bored by this? Anybody? It's okay, I'm not gonna take it personally. Am I going too slow? I feel like I'm going too slow. Anybody want to claim I'm going too slow? Other than me. Okay, what's that? You think I'm going a little slow? Okay, that's cool, that's cool, I'll speed up on Monday. All right, so Monday, Monday we will get through the remainder of this material and we will talk about a few hints of ways to make your system faster, all right? Good luck with assignment three and I will see you on Monday. Have a great weekend.