Today, we are going to talk about performance and benchmarking. And we'll get through what we can today. This might spill over a little bit to Monday. But I've assigned papers for Monday and Wednesday. I think this is a pretty important topic and something that you guys don't necessarily get exposed to in other parts of the curriculum. It's related to operating systems, of course. But of all the things that we talk about this semester, it's probably one of the things that's most broadly applicable to other things that you guys are going to work on. Because in general, people don't get excited about hiring programmers or developers who write slow code. That's just not something that companies are usually looking for. When you look at their job descriptions, they usually don't say looking for a programmer who can write very slow code. That's not something that people want. OK, so sorry, I don't know why I can't count suddenly, but you have three weeks from today to complete assignment three. Three weeks plus a few hours. So that's a huge amount of time. People are working on it. Somebody is very close to maxing out assignment three. I've been kind of hoping that someone's going to hold a patch that they know works until right before class, submit it, and then just come to class and wave their arms, like, look, it's 100 out of 100, it's me. Anyway, but this is a few hours old. So get to work. OK, so OS performance. Why does this matter? Why does OS performance matter? Why are we talking about performance in a class on operating systems? Yeah, yeah, yeah. So the operating system, remember, is involved in a lot of stuff. And Isaac has said it perfectly: a slow operating system doth a slow computer make. And people also prefer, I would argue, a fast and mostly correct operating system to a slow but provably correct operating system. Now the assignment three auto grader requires that your operating system be both reasonably not slow and correct. But most people would prefer the former. So one of the things I ask people to think about is, let's say that your computer rebooted instantly, and not just rebooted, it somehow restarted and recovered all of the state for the programs that you were using. Would you care if something had gone wrong? You might not even notice. Imagine there was just a little flicker and then your screen popped up again with all the apps that you were just using. Who cares? Something went wrong. It doesn't really matter. Now this isn't quite true, because as I mentioned before, there's been some more recent work on provably correct microkernels. But some of the earliest computer systems, particularly those built by people like Dijkstra, who obviously had sort of a theoretical bent, had correctness as a goal. So he claimed, about the THE operating system, which was designed back in the 60s: we have found it is possible to design a refined multi, I like that, a refined multiprogramming system, because I don't want an unrefined multiprogramming system, in such a way that its logical soundness can be proved a priori and its implementation admits exhaustive testing. The only errors that showed up during testing were trivial coding errors, trivial, occurring with a density of only one per 500 instructions, each of them located within 10 minutes' classical inspection at the machine and each of them correspondingly easy to remedy. And this operating system was quite slow, because everything was slow back then.
So now I've hopefully convinced you that we care about performance. But OS performance isn't necessarily that different than the performance of any other process, any other application. I mean, remember, the operating system is just an application like any other application; well, it's a little bit special. So we can apply the general principles of performance analysis to this application, right? So what do we do? Okay, so we start out, we measure the system. Measure something about the system. We analyze our results at step two. Step three is we improve the slow parts. Then in step four, we drink our celebratory beer. And then if there's more work to do, we just return to the top. At some point I would suggest that you pause after step four and maybe go to bed or relax a little bit, because the celebratory beer may reduce your ability to continue the rest of the process if you return to the top right away. Okay. Okay, so that's it. So on Wednesday we're gonna talk about the paper on... Actually, so today's Friday, right? So Monday we'll talk about the Butler Lampson hints paper and then on Wednesday we're going to talk about the Linux scalability paper. Right? Is that really it? Last year I got somebody to get up and start leaving. Scott Florentino started packing up his stuff. He's like, all right, 10 minute class, no problem. No, it's not quite that simple, right? You guys may wish, of course; it's not summery outside anyway, so who cares, right? If it was 70 degrees outside maybe I would stop there, but it's not, so let's just keep talking. Okay, so, but this is not as easy as I made it sound, right? What part of the system am I going to measure, and how am I going to perform those measurements? The analysis is potentially gonna involve math. I don't like math. I became a computer scientist so I didn't have to do math. That was my plan. It doesn't work out very well, particularly when you do computer systems. How am I going to improve the slow parts of my system? What am I supposed to do here? This part I'm pretty good at, actually; I can offer some advice on these questions. Okay, so let's talk about what's hard about the process that I've described for performance improvement. So the first thing is just measuring stuff. So you would think that it would be easy, potentially, to measure time. All of us have some conception of time, and obviously time is something that's going to be intimately involved in the process of performance analysis. Unfortunately, hardware can frequently limit our ability to measure time, particularly on the time scales needed to do accurate performance benchmarking. So frequently you might find, for example, that when you try to measure some primitive in your operating system or in any program, using the hardware timers that are available to you, you discover that it takes zero time, because it's so fast that it occurs within a single tick of this slow timer. So that's a problem. And the problem with low-level timers is, frequently you can find some sort of clock on the machine that ticks quickly, but it may be very, very difficult to access or it may be very machine specific. So you remember, one of the things that the exokernel was trying to do is actually export some of this information to applications. Applications frequently do not have access to really good timing information on real systems.
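One common workaround when the available timer is coarser than the thing you want to measure is to run the operation many times in a loop and divide the elapsed time by the iteration count. Here's a minimal sketch of that idea; it is not OS 161 code, and `fast_operation` and the iteration count are just hypothetical placeholders for whatever primitive you actually care about:

```c
#include <stdio.h>
#include <sys/time.h>

static volatile long counter;

/* Stand-in for whatever fast primitive you actually want to time. */
static void fast_operation(void)
{
    counter++;
}

int main(void)
{
    const long N = 10000000;        /* run it many times so the total is measurable */
    struct timeval start, end;

    gettimeofday(&start, NULL);
    for (long i = 0; i < N; i++)
        fast_operation();
    gettimeofday(&end, NULL);

    double elapsed_us = (end.tv_sec - start.tv_sec) * 1e6
                      + (double)(end.tv_usec - start.tv_usec);
    printf("average: %f microseconds per call\n", elapsed_us / N);
    return 0;
}
```

Note that this measures loop overhead along with the call, and an optimizing compiler can delete work it can prove is unused, which is why the counter is volatile. For anything serious you would also repeat the whole experiment several times and look at the spread, not just a single average.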
And then the other problem you have with timers is you can have these messy problems with overflow, where you have a counter, maybe it's a 32-bit counter, and maybe it starts rolling over every few minutes, and that gets very ugly. So there are issues here. The other problem, and this I think is a much, much deeper issue with benchmarking and measuring operating systems, is that it's very difficult to do repeatable measurements. So for example, and part of this is for this reason, right? Remember, you're trying to measure something that's happening right now, but what is the system doing? You're trying to measure the present, but the system is using the past to predict the future. So the system is adapting to your benchmark. And as a result, once you've run a particular piece of code, the system is in a new state; you've changed things. What are some things that could change after you've run a benchmark for the first time? What are some things that might cause that benchmark to perform differently, or that program to perform differently, if you ran it again a minute later? Parts of the system are different. Changes have taken place, yeah, Scott. Yeah, so, and this is particularly true when I rerun things inside the same application. Remember, when I start running an application it faults a bunch of stuff in initially, and if I keep running that same loop, now all those pages are in memory, they're in the TLB, and so some of the latency I might have experienced on the first trip through is gone. I also might have pushed stuff into higher level caches. So it might not just be mapped in the TLB, the memory might now be in an L1 or L2 cache where it's really close to the processor and super fast. I also might have things that are being loaded into the buffer cache; maybe my benchmark starts off by reading some data from a file. The first time, that goes to disk; the second time, it's in the cache. So caches and cache behavior are frequently something that benchmarks think a lot about how to either detect or defeat. So there are benchmark suites where the first thing they do is run some tests to try to figure out things like how large the L1 cache is on this machine, because I wanna make sure that my objects don't fit into it. And this was one of those things that for years, during certain decades of computer science research, you'll find when people couldn't understand their results, this was the boogeyman, right? This was the monster under the bed: cache effects. And this was used to explain away all sorts of things that people should have thought more carefully about. Okay, so measurement tends to affect the thing that you're trying to measure. It's sort of the Heisenberg uncertainty principle applied to computer systems. The first thing is the measurement might destroy the problem that you are trying to measure. How does this happen? Has anyone experienced this with their OS 161 system? Measurement or debugging may eliminate the problem, only temporarily of course. What's an example of this? Yeah, yeah, so you may have a race condition that you see periodically. And then what do you do? You think, I don't like GDB, so I put in a printf. That's a natural thing to do, try to figure out what's going on. Voila, the printf solves the problem. If only you could leave the printf in there and have it printf on the client's machine all the time.
But the reason is, the printf has destroyed something about the inter-thread timings that you needed to cause the problem to happen. And this can certainly happen with measurement as well. Measuring and recording things always affects the system. I'm printing things, I'm storing information; however I'm doing it, there's code being executed that was not being executed before. And so the machine is going to behave slightly differently. And for extremely subtle effects, this can essentially totally wipe out the thing that you're trying to measure. And so later you have to actually try to separate the results of your measurement from any overhead that the measurement process introduced. And the people who are having the problem that you're trying to study may not want you to measure their system. The vendor or the client may say, you know, I have this problem, can you help me solve it? And you may say, in order to help you, I need to install a bunch of extra logging on your system. And they may say, no way. My system is already slow enough, I'm already having these problems. I'm not adding extra overhead to the system just so you can try to find out what happened. So measurement can limit your access to production systems. And you can imagine for the OS this is even more problematic, because of how central the operating system is to the performance of the machine. Trying to do OS-level instrumentation can frequently affect everything. Well, I shouldn't say frequently; it will always affect everything that's running on the machine. It's difficult to find places to put the debugging hooks. You're talking about a very complicated code base. And operating systems can generate just gobs of debugging output. When I was at Microsoft years ago, we were doing some work on understanding page fault behavior in Windows, and we did this thing where we tried to instrument Windows to record this sort of information, and you just end up with these massive log files that you have to go through, because these things happen all the time. And of course it slows down the system quite a bit. Now, and this is kind of interesting, you'll find on real systems like Linux that logging and instrumentation is a very, very efficient operation. And the reason is that people want to use it to understand how the system works, but in order for it to be usable, it has to be really fast. So they've done a lot of work on making these code paths very short and very efficient. Okay, so frequently the problems associated with measuring real systems cause people to give up and instead try to measure something different. And we have some other options here. So if I didn't want to measure a real system, what else could I do? What are my other options? You guys have been using one of these for most of the semester. Yeah, you're using a simulator. So there's two options here. I can build a model or I can build a simulator. Now, modeling is something that's typically a lot more abstract. So a model is to some degree a mathematical description of how a particular part of the system is going to work. And that can actually allow me to prove things about how the system would operate under certain conditions. A simulator is different. A simulator tries to replay or sort of simulate. A simulator tries to simulate. That's a deep insight into the world of operating systems right here today. So a simulator might try to simulate how a real system is working. The Sys 161 tool that you are using is a simulator.
It is running instructions from your kernel, which are MIPS instructions, and it is simulating how they would execute on a real machine. That's what it's doing. Now, a simulator usually involves writing a bunch of new code to simulate the system that you're trying to understand. And if you're trying to figure out what's a model and what's a simulator, it's very easy: simulators involve code, models involve equations. Both of these have their pluses and minuses. So with models, the nice thing about a model is I can make a strong argument. I might be able to actually prove things about the system that I'm trying to understand. The problem here is that in order to make the system subject to modeling, I usually have to make a bunch of simplifications that are not realistic. So let me give you an example. In the world of networks, there's a lot of modeling done on how, for example, an ad hoc network might be able to route traffic. However, a lot of these models have one defining assumption, which is usually referred to as the unit disk model. They assume that every node in the network can communicate perfectly, or with some probability, with every node within a certain radius of it, and not with any node outside that radius. The problem is that that's just not anywhere close to being true in practice. In practice, wireless communication is weird and strange and things bounce off walls and all sorts of weird things happen. And so the unit disk model isn't anywhere close to holding for actual real deployed systems. Now, making that assumption allows me to prove a lot of really cool things about the systems, these sort of ad hoc networks that I'm modeling. The problem is those things aren't actually true, because real systems don't behave that way. But in order to make it tractable to say anything about the system at all using this model, I have to make that kind of assumption. So on the other hand, simulations. You might think, why would I write a simulator to simulate a system when I already know how it works? The goal with simulation is to make the right trade-off between repeatability and accuracy. You don't want your simulator of Linux to become Linux. That's not the goal. The simulator is intended to remove properties of the system that are undesirable and allow you to understand how the system works, and, for example, to allow repeatable experimentation. Unfortunately, you have to write a bunch of new code. So here's another opportunity to make a bunch of mistakes. So one of the stories that I didn't tell you from the debate over LFS, and all the names are omitted here to make sure people don't come after me, was that at some point somebody was working with a simulator of a disk, a simulator of a file system, that they were using to run experiments on file system performance. And they had written some papers about results from the simulator, indicating certain things about how one file system or another file system performed. Unfortunately, at some point, this particular student discovered that the simulator had a bug. The bug was that when a dirty page in the buffer cache was evicted, the amount of time that it took to write the page back to disk was zero. So the simulator was forgetting to account for the time that it would take to evict dirty pages from the buffer cache. Now you can imagine this would change your results quite a bit.
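To make that class of bug concrete, here's a tiny sketch of what the eviction path in such a file system simulator might look like. None of this is from the actual simulator in the story; the names and the fixed per-write cost are made up purely to show where the missing charge would go:

```c
#include <stdio.h>

/* Hypothetical cost model for a toy file-system simulator; the constant and
 * names are invented for illustration, not taken from the simulator in the story. */
#define DISK_WRITE_US 8000.0   /* assumed cost of one page write-back, in microseconds */

struct sim_page {
    int dirty;                 /* has this cached page been modified? */
};

static double sim_clock_us;    /* simulated time, not wall-clock time */

/* Evict one page from the simulated buffer cache. The bug in the story was
 * equivalent to omitting the clock charge for dirty pages, so write-backs
 * looked free and the file system looked faster than it really was. */
static void evict_page(struct sim_page *p)
{
    if (p->dirty) {
        sim_clock_us += DISK_WRITE_US;   /* charge simulated time for the write-back */
        p->dirty = 0;
    }
    /* a clean page really can be dropped at essentially no simulated cost */
}

int main(void)
{
    struct sim_page p = { .dirty = 1 };
    evict_page(&p);
    printf("simulated time after one dirty eviction: %.0f us\n", sim_clock_us);
    return 0;
}
```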
So this particular student went to their advisor and was like, by the way, we have this problem. And you can imagine that that got sort of messy. Okay, so yeah, you can have bugs in the simulator, and your simulator may not end up being as simple as you want it to be; that's the other problem. Okay, so moving on to another challenge with performance and benchmarking: what are the metrics that I use to compare things? So how would I compare two disk drives? And I'm talking about performance here. I mean, I can measure things like capacity or whatever, but what are some ways that I might want to compare two disk drives? What are some things that I might want to measure about a disk drive that I would care about? Think about a spinning disk, yeah. Yeah, seek time. I might want to measure things like sustained throughput. What's the maximum throughput I can get from the disk? What's the throughput for reads or for writes? Is that different? What's the throughput for random I/O to various parts of the disk? That starts to involve a couple of different things, right? Because I'm also testing things like the scheduling algorithm the disk uses to figure out where to put the heads. Scheduling algorithms, page replacement algorithms: in most cases, there's a bunch of different metrics I can use here. So remember when we talked about scheduling algorithms, we talked about throughput, but we also talked about latency. Throughput is how fast the system can get a bunch of work done. Latency says something about how responsive the system is under load. Page replacement algorithms, how do I choose? I mean, maybe I have a metric, but maybe I don't, because maybe I also care about how fast the algorithm runs. And then when you start to talk about higher-level things, it becomes even more interesting, right? File systems. A lot of file systems make design choices that cause them to have different trade-offs than other file systems. And so this can end up like the canonical example of a bunch of people touching the elephant; that's from my favorite cartoonist, right? Depending on what part of the system I measure, I come to different conclusions. And I mentioned this before, but some of you guys may have grown up drooling over the old Mac ads, where they were always claiming that their processors were faster than Intel processors, always; every ad that they put out claimed that. And it was always some benchmark they chose: look at us, and here's Intel down here, right? And that was true until it wasn't convenient for it to be true anymore. Okay, so benchmarks are the things that we're actually gonna use to test the system. And when we talk about benchmarks, we can divide them into three different categories. Microbenchmarks are designed to try to isolate some specific aspect of the performance of your system. Just one thing, a microbenchmark. Just one part of the system; I care about how that primitive works. Macrobenchmarks, on the other hand, try to do end-to-end comparisons. They usually try to measure a big chunk of the system all at once, and the performance of that chunk is dependent on a bunch of different things going right. And then there are application benchmarks, which you can really just think of as running some application on the system and measuring how fast it performs.
The difference between application benchmarks and macrobenchmarks is that macrobenchmarks typically try to make generalizations about how a bunch of applications are going to run, whereas an application benchmark doesn't care. It says, I'm a database server, and this is how fast I run on this particular machine. So, here's some examples. Let's say that we're interested in improving our virtual memory system. Just a random example. You know, maybe the auto grader's cutting you off and you're thinking, how am I going to make this faster? So, give me some examples of some microbenchmarks that could be applied to the virtual memory subsystem. What are some small little primitives that you might want to measure? Yeah, so swap-out time, the amount of time it takes to find a page in the core map. Maybe the amount of time it takes vm_fault to run, but that's getting a little macro now. So, just these tiny little operations. How long it takes to map a virtual address to a physical address using your particular mapping data structure. These are examples of microbenchmarks. So, yeah, a single page fault might not be micro enough. Page lookup time. How long it takes my page replacement algorithm to select a page; not which page it selects and how good that page is, just how long does it take to run? What about macrobenchmarks? What are some examples of larger things that I might want to measure? A macrobenchmark for the VM system. Yeah, I mean, maybe aggregate time to handle a page fault. So, if I looked at all the page faults that occurred and graphed the distribution, that might be helpful. Or the page fault rate that my system can sustain, which has to do with how fast I can move things back and forth to disk and things like that. And then the only application benchmarks you guys have really heard of are these sort of silly, stupid things that we gave you. But I don't know, unless you're really interested in repeatedly doing three sorts in parallel. Although, I have to say, one of the more famous benchmark contests, at least to me, that goes on on a regular basis, is something called the Jim Gray sorting challenge. And if you want to enter, it happens every year. There are multiple categories. The goal is to design and deploy a sorting algorithm that can handle billions and billions of records and sort them extremely quickly. And this has been going on for years. You can go find the results online, and it's a very cool little thing that people do every year. I think it started while Jim Gray was alive, but now it continues partly to remember Jim Gray, and also just to give people an excuse to continue to write faster and faster sorting algorithms. That's kind of cool. Okay, so problems with benchmarks. Microbenchmarks: if you deploy microbenchmarks, you have to be really sure that you are zooming in on the right part of the problem. If you deploy a microbenchmark and you're fixing the wrong part of the problem, your fixes are not going to be very effective. We'll come back to talking about this a little bit later. Macrobenchmarks have the other problem, which is that if you use a macrobenchmark, it can be very difficult to isolate or identify the part of the system that you need to change. And application benchmarks are always limited to some degree to the application that they're designed around. So they don't necessarily reflect the performance experienced by a mixture of applications.
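As a rough sketch of what one of those VM microbenchmarks might look like, here's the basic pattern: a tight loop around exactly one primitive, with the timer read outside the loop and the total divided by the iteration count. The "core map" here is a self-contained stand-in, not an OS 161 data structure, and `coremap_lookup` is a hypothetical name; in a real kernel you would time your actual lookup routine the same way:

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

/* A stand-in "core map": physical-page entries indexed by frame number. */
struct coremap_entry {
    unsigned long owner_vaddr;
    int           in_use;
};

#define NFRAMES 4096
static struct coremap_entry coremap[NFRAMES];

/* The primitive being microbenchmarked: find the frame that maps vaddr. */
static int coremap_lookup(unsigned long vaddr)
{
    for (int i = 0; i < NFRAMES; i++)
        if (coremap[i].in_use && coremap[i].owner_vaddr == vaddr)
            return i;
    return -1;
}

int main(void)
{
    for (int i = 0; i < NFRAMES; i++)
        coremap[i] = (struct coremap_entry){ .owner_vaddr = i * 4096UL, .in_use = 1 };

    const int N = 100000;
    struct timeval start, end;
    volatile int sink = 0;               /* keep the compiler from deleting the loop */

    gettimeofday(&start, NULL);
    for (int i = 0; i < N; i++)
        sink += coremap_lookup((unsigned long)(rand() % NFRAMES) * 4096UL);
    gettimeofday(&end, NULL);

    double us = (end.tv_sec - start.tv_sec) * 1e6 + (end.tv_usec - start.tv_usec);
    printf("coremap_lookup: %.3f us per call (average over %d calls)\n", us / N, N);
    return 0;
}
```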
Application benchmarks have that limitation, so unless you're trying to design your operating system to only support one class of application, you may need to run multiple of them to get a sense of the aggregate performance that a bunch of representative applications would experience. Okay, now I would argue that the biggest problem with benchmarking in practice, though, is just bias. So frequently the people choosing benchmarks are using them to try to motivate some change to their system that they think is going to make it faster. And frequently you'll find this when you start looking at computer system designs; it's the sort of no-free-lunch principle: a lot of times the way to make one part of the system fast is to make something else slow. I'm gonna make this other thing slower, but when we talk about this design decision, I'm not gonna talk about the slow thing very much. I'm gonna focus on the fast thing. Check this out, I made this one thing really fast. Now if your application is highly dependent on the fast thing, you may be really happy about that, but if your application got the short end of the stick, you may not be. So frequently benchmarks may hide the fact that there are performance trade-offs being made that the people using the benchmarks are not being particularly honest about. The other thing that happens, of course, is that people choose a benchmark, they do a bunch of work to improve the system according to that benchmark, and they may not even be aware of the fact that the improvements they made have caused other things to slow down. So this is the other one. Okay. And to some degree, even if you want to point out that people are a little bit silly about how they deploy these, there's really a fundamental tension here with how operating systems work, which is that operating systems have always been designed, pre-exokernel, to try to provide an interface that's useful to a variety of different applications. However, frequently making the system faster involves tuning it to the needs of one particular application. To some degree, the exokernel is on some level trying to address this, remember? This is part of their argument. We don't need to make these general-purpose design decisions that try to work for every application. We can allow applications to achieve better performance by making better decisions on their own. But in general, the operating systems that you guys use today are extremely general-purpose systems that try to provide good service to a large category of applications, but you can certainly make them faster by tailoring them and tuning them to the needs of one particular application. Okay, so instead of using benchmarks poorly, let's talk about what are actually some good things to do, some best practices here. So the first thing, when you start trying to improve the performance of your system, is to have a specific goal in mind. That's a little bit more specific than I just want to make it faster. And this helps you start to design a workflow in terms of picking benchmarks and choosing parts of the system to improve. So you usually have to pick a problem, and we'll talk in a few slides about how to do this. If you decide to use a model or a simulator, it's very important to make sure that the model or simulator matches up with the system that you're building.
So going back to my example from before, if somebody had done some side-by-side comparisons of the simulator and an actual disk, they would have probably noticed some performance anomalies that would have caused them to realize that the simulator had problems. So if the performance of the simulator is dramatically different than the thing that it's trying to simulate, then you've got a problem. And use these techniques as appropriate. This comes with practice, figuring out what the right tool is to attack a particular problem. But certainly do this with data. So now we've come to the part where we have to talk about data and math. And this is totally true. So I started out life as a physics major and I just realized I was too dumb to do physics, mainly because it involved a lot of math, and so I became a computer scientist. And so a lot of computer scientists don't really like math very much. I mean, if you talk to the theory people, those are essentially math people just wearing different stripes or something. But the rest of us, the system builders and the networking people, not really; you know, we're a little wary of the hardcore math. And so on a good day, we'll compute an average. This feels like a victory, right? I'm gonna run my experiment a couple of times and okay, here's an average result, right? And if you really push us, I'll put error bars on my graph. Maybe, maybe, okay? That's asking a little bit much; I would prefer just to use the average. So one of the first things to do when you start measuring systems and using statistical methods is to make some predictions before you begin the process. And I always tell my students, before you generate a graph, draw just a little sketch up on a whiteboard of what you think that graph's going to look like, just based on your intuition. This is really helpful because if the graph comes out looking very different, then either you've learned something about your system, your intuition was wrong, or there's something wrong with the experiment that you're running that you need to correct, right? That's what I just said. And particularly if you can make predictions about simple cases, that's a good way to start developing your intuition. So it's not a bad idea to collect some performance results for things that you think you understand. So for example, if you were benchmarking your VM system, you might collect some information about how long it takes to find a page in your core map, because you think that you can reason about that. You think you can reason about it, you think it should fall into a certain distribution, and that's a good way of starting to develop intuition about how this works. So this is one of my favorites, a really common mistake people make when they start to try to use data to do performance analysis, which is that they compute summary statistics really quickly. So they say, okay, I measured this thing, I computed the average and the median, I'm done. And the problem is, summary statistics like averages and medians, I mean averages, are dangerous. I think that the average is a statistical thing that should just be banished from our society. I think the average should be an outcast. There's really no point in talking about averages. People are like, oh, here's the average income in the United States; who cares? No one makes that much money. The mean is a much, much, much more useful measure. I just need to stand up in favor of medians.
Sorry, median, yeah; the mean is the same thing as the average. Medians, please use the median. The median is so awesome. The median is friendly; the median says a lot about your data, the average says very little. So for example, these two data sets right here have the same mean and median, but there's clearly something very different going on between them. If this was your core map, and you measured the time it took to find a core map entry, you might be concerned about this one. What is going on over there? And so look at the raw data; even if by the time you show it to your boss or somebody else you're talking about summary statistics, look at the raw data first so that you have some idea of what's going on. Because if you make claims about the data based on some sort of feeling that it looks like this, and it turns out that this is what's going on, you're gonna feel very dumb. This is frequently a sign that there's some sort of bug in your system, or something very interesting going on in that particular part of the behavior. Okay. Outliers. So a lot of times people run into this problem, particularly people who like to use means, because there's this outlier in the data. I ran it a hundred times, and 99 times it was fine, and then there was this one weird value that I can't explain. You have to think about those values. You can't just say, oh, I'm just gonna delete that row from the spreadsheet, don't need it anymore. Clearly something went wrong, I don't know what happened, I must have just typed in the wrong command or something. Try to understand them; they can be just weird remnants, but they can also be very important. They can have a lot to tell you about how your system works, because if one out of a hundred times a particular operation takes an enormous amount of time, that is going to bother somebody eventually. People like things to be predictable. Okay, so now we come to the part where we actually talk about deciding what parts of the system I should actually improve. And you might think that this is pretty easy: just improve the slowest part. Even if this was true, here's the problem that we have as programmers. Think about your VM implementation, in whatever state it's in. There's probably some part of it that you feel is dodgy, suspect. There's like one piece of code where you just got sort of wedged into a weird place and you had to fight your way out of it with some sort of really nasty while loop that you're not sure terminates, or whatever. I mean, all of us have little bits of code like that in our system. And so if you ask developers, what's the performance problem with your code? That's the part that comes to mind. Or maybe the part that you wrote after Friday happy hour where you had a couple beers, or the part that you wrote in the middle of the night right before the deadline your boss had given you; whatever, we have this intuition about the parts of our system that are problematic. Unfortunately, the intuition isn't very good. In fact, it's terrible. And so if you don't let people measure their system, or you don't force people to measure their system, and you just tell them, go improve the performance of the system, a lot of times they work on things that are irrelevant. They are things that make them feel happy inside, because they got rid of that really ugly piece of code that they were embarrassed about, but they weren't actually the performance problem.
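Going back to means, medians, and outliers for a second, here's a tiny self-contained illustration of why the median is friendlier. The latency samples are completely made up (99 "normal" runs around 10 to 14 microseconds plus one 5000 microsecond outlier), purely to show how far a single outlier drags the mean while barely moving the median:

```c
#include <stdio.h>
#include <stdlib.h>

static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

int main(void)
{
    double samples[100];
    for (int i = 0; i < 99; i++)
        samples[i] = 10.0 + (i % 5);      /* 10..14 us, the typical case */
    samples[99] = 5000.0;                 /* the one run where something went wrong */

    double sum = 0.0;
    for (int i = 0; i < 100; i++)
        sum += samples[i];

    qsort(samples, 100, sizeof(double), cmp_double);
    double median = (samples[49] + samples[50]) / 2.0;

    printf("mean:   %.1f us\n", sum / 100.0);   /* ~62 us, looks nothing like a typical run */
    printf("median: %.1f us\n", median);        /* 12 us, close to what you usually observe */
    return 0;
}
```

The point is not to throw the outlier away: the median tells you what the typical run looks like, and the outlier is the thing you then go investigate separately.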
But let's make this question of what to improve more concrete. So let's say your code has two functions. Foo takes five minutes to execute. Every time I run foo, five minutes go by. Bar takes five seconds to execute. So which function should I work on in order to improve the performance of this piece of code? Yeah, whichever one you use the most. Okay, do we have any other guesses? Yeah, whichever one's easier to optimize. Okay, so you guys are sort of on the right track here. I mean, a lot of people, this is what we do: those are better answers, but a lot of us will say, oh no, foo, foo takes five minutes. I ran foo and I went and got coffee and came back and it was still running, right? So here's the problem, right? What sort of elements have we missed in our decision-making process? You guys just brought them up. First, significance. How much does foo matter? Now it's not quite as simple as which gets run more often, because depending on the system, it's not necessarily which one gets run more often; it's how much of a contribution it's making to the overall performance of the system. But the second thing that somebody pointed out is, how hard is it going to be to change this, to improve it? Let's say that foo is calculating 60,000 digits of pi using the best available algorithm. You are gonna improve on that? I don't know. I'm skeptical. But this is also very difficult to figure out until you get started. So, after running some experiments, the thing that we're likely to know something about is significance. So let's talk about that. However, I would point out that difficulty matters here, and this is something that's very difficult to figure out from running tests. You can run a bunch of tests that will say, improve this function and I'll make this system twice as fast. You go look at it, and there's all these comments, improved performance by 4x, improved performance by 20%, from the programmers that were fired before you were hired to work on this, right? And you might start to think, uh-oh, there is no performance left to squeeze out of this. So, but let's talk about significance. And this comes down to something that I really wish, if there's anything that I wish most people in the world knew, other than not to use means and to use medians, this would be it. It would be Amdahl's law. So colloquially, Amdahl's law says the impact of any effort to improve system performance is constrained by the performance of the parts of the system not targeted by the improvement. How many people have heard some formulation of this before? Oh, goodie, okay, all the graduate students. So now, imagine you have the choice between reducing the execution time of foo from five minutes to one minute, or reducing the execution time of bar from five seconds to four seconds. Because remember, this is gonna matter. Now, note here that the improvement to foo is better. It's better, absolutely. I've improved it by four minutes. I've trimmed four minutes off it. I've also cut its execution time by 80%. So clearly now, foo's the winner, right? So now I know something that I may not even have known beforehand, and now I should be able to say foo, it's time to work on foo, that's what's gonna do the best.
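A quick way to check that instinct is to write Amdahl's law down as an equation and plug in numbers. Here $p$ is the fraction of total run time spent in the part being improved and $s$ is how much faster you make that part; the time split used below (roughly 1.1% of run time in foo, 95% in bar) is the one that comes up in a moment:

```latex
\[
  \text{overall speedup} \;=\; \frac{1}{(1-p) + p/s}
\]
% foo: 1.1% of run time, sped up 5x (5 min -> 1 min)
\[
  \frac{1}{(1-0.011) + 0.011/5} \approx 1.009 \qquad \text{(about 0.9\% faster overall)}
\]
% bar: 95% of run time, sped up 1.25x (5 s -> 4 s)
\[
  \frac{1}{(1-0.95) + 0.95/1.25} = \frac{1}{0.81} \approx 1.23 \qquad \text{(about 23\% faster overall)}
\]
```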
But it turns out, as someone pointed out, if the program spends 95% of its time executing bar but only 1.1% running foo, then the speedup that I achieve from foo is actually smaller than the speedup I achieve from bar. Neither is super exciting, but that one second, just the one second shaved off bar, improves the overall performance of the system by more than cutting the run time of foo by 80% does. Amdahl's law is extremely counterintuitive, and it's something that as developers, I think you guys should be thinking about and coming back to frequently. And this is why, when I worked at Microsoft, I was in the desktop performance group, and the team that we interacted with a lot was the server performance group. And the server performance guys, if they were looking at assembly and found an unnecessary instruction, if they found some way that they could remove one assembly instruction from a critical path, you'd never see them again. They'd be gone for a month in Hawaii, because the system got so much faster from that one instruction, because that thing gets run all the time. And so those guys were really interesting. So another way of thinking about Amdahl's law, if you're not worried about the proportionality, is: fix the thing that's killing your performance. Don't worry about the rest of the code; find the thing that's hurting you, and work on that. The other interesting thing about Amdahl's law, though, is that the more you improve one part of a system, the less likely it is that that part of the system is still your bottleneck. So this is another interesting corollary to Amdahl's law, and it means that as you work on the performance of your system, you have to keep pulling yourself back and saying, what now? Because once you've done some work on a particular part of the system, it may not be the top-order bottleneck anymore. So you need to stop, re-run some benchmarks, redo some profiling, and find a new target. All right, so this time I actually am done a little bit early today. Do you guys have any questions? The last couple of lectures have been a lot of me talking. Yeah. Is it usually relying on hardware, or is it fully software? Yeah, so a fully software simulator. So it's too bad Guru is not here. One of my own students, Guru, who's the actual Linux hacker in our group, works on a full-system simulator. It simulates an ARM instruction set. It is kind of like your Sys 161, but you can actually boot an entire kernel and a whole system image on it. So we boot Android on this thing, and we use it to experiment with new hardware features. Now on one hand, it's super cool, because we can do things with it that you can't do in hardware. So for example, I can come up with new memory architectures and I don't have to build them; I just have to write software for them. Now let me ask you guys a question. How fast do you think this system is? It's like a thousand times slower than an actual system. So when we have to run benchmarks, like for example, you might wanna run a benchmark that opens a webpage: two days later, that thing's still going. So it is a very fun way to do experimentation because, again, you have control over the entire machine all the way down to the metal. I can change anything I want. The price you pay is that it's dog slow, super slow. Like, you can boot Android on it, but you can't interact with it, because it's so slow. You hit a character and then, 10 minutes later, it redraws the screen.
So it's just very, very slow. Any other questions about that? Yeah. Well, so the trick when you're building simulators is to make sure that your simulator has its own notion of things like time. So normally, like again, the simulator that we use to simulate Android systems doesn't think that time is moving that slowly; it just has its own notion of time, which advances very slowly relative to wall-clock time. So we can measure how long things would take on a real system inside the simulator without having to worry about how the performance is affected by the underlying system. So simulators normally have their own notions of time and their own notions of performance. And the goal is to get to the point where, for example, if I had a disk simulator that I used to simulate my file system, and I make a change to the file system that improves performance in the simulator, that performance improvement will translate when I run it on a real system. Does that make sense? That's a great question. So to some degree, too, I would point out that some of the need to build these types of tools has been replaced by virtualization. So when we talk about virtualization starting at the end of next week, which will really blow your mind, virtualization is very cool. But to some degree, virtualization is almost like simulation, except that I try to make the underlying hardware do as much work as possible. So virtualization you can think of as simulating all the parts I need to make it safe, while using the bare metal to run as much as possible. Now, for example, with the stuff we do with Android, we're simulating a different instruction set. So there's no way to do that on top of x86, right? If I sent one of the ARM instructions to the x86 processor, it would have no idea what to do, right? It's just totally different. But when you're trying to virtualize something, that means that the architectures have to be the same. Great questions. Any other questions about performance? So for Monday, please read the Butler Lampson paper. It's a fun paper. It's got a lot of cool ideas for improving performance, a lot of great discussions of the trade-offs. And then for Wednesday, please look at the Linux scalability paper. That's also a really, really cool paper, and I think you guys will like how they did things. It's a great application of these techniques: find a bottleneck, understand what's causing it, fix the underlying problem, and then watch your performance improve. So I'll see you guys on Monday. Have a great weekend.