Did you forget to hit record? Yeah. Are we starting now? OK, that's good. Whatever, who cares? The beginning part's all boring anyway.

OK, so why do we care about OS performance? What does it matter? Yeah? I'm very impatient. You're very impatient? Actually, you're less impatient than you think you are. To some degree, most of the computers that you use today are predicated on the fact that your reflexes are incredibly slow. So you may think you're impatient, but all of this slow, crappy, buggy software — virtual machines running inside of other virtual machines and so on — the fact that this all works at all is predicated on the fact that computers have gotten so much faster than you have.

But maybe I'm not giving you guys a good starting point for answering this question. Why would operating system performance matter? If you were going to pick some piece of the performance puzzle to work on, why would the operating system be a potentially important thing to think about? Yeah? Yeah, so the operating system slows down everything else.

The other thing that's interesting is that correctness is important. We will all acknowledge that. People don't like things to crash. On the other hand, let me propose the following thought experiment. I promise you your system will never crash. The price of that promise is that it runs 10 times more slowly. How many people would take that bargain? I wouldn't. I'd rather let it crash from time to time, right? Because whatever, it's not going to hurt that much, and in the meantime I won't be tearing my hair out going crazy waiting for things to happen.

And here's another interesting point: to some degree, the irritation caused by system crashes is a function of the fact that reboots don't happen instantaneously. So what if reboots happened instantaneously? It wouldn't be called the blue screen of death; it would be called the blue flicker of death, because you would just see it for a second, and then the system would have rebooted. Maybe you lose some state in your applications and have to restart things, but clearly that would improve matters. And operating system vendors like Microsoft have paid a lot of attention to this over the years and tried to optimize boot, because they realize that not only is boot an inherently useless process from the user's point of view, but if the system can reboot quickly, the person has less time to think about how irritated they are that the system rebooted at all.

So this isn't a course on OS history, but this is a really fun early paper about operating systems. This is the paper that introduced one of the synchronization primitives you guys used this semester. Does anyone want to guess which one that was? Semaphores, yeah: Dijkstra, P and V, those semi-made-up Dutch words. This was one of the first operating systems — at least one of the first ones we have some evidence of in the research literature. And this is one of everybody's favorite parts of it, where it says, essentially: our operating system had no bugs, and the ones that it did have, we caught very quickly. And that was about the last time anyone made that claim — well, it's kind of interesting, because recently this goal has reappeared.
But there was a long period of history, spanning decades, when no one cared about being able to prove an operating system was correct. For Dijkstra and these very early operating systems, this was the goal: a provably correct system, where you could construct a mathematical proof that the system was right. And then we gave up on that idea for something like 50 years, because no one cared; everyone just wanted things to go really fast. Now it turns out there's renewed interest in this problem. There are some provably, quote unquote, correct microkernels floating around out there, and there's more activity in the research community about being able to do this again. But even those systems are far from production systems. So at some very, very early point in operating system evolution — the 1950s, maybe? — we gave up on the idea of correctness and said, let's get speed right first, because that's what people care about.

So if we care about performance, the other interesting thing to point out here is that OS performance is a little bit different from other parts of the system — though not so different that we can't apply a standard performance improvement methodology. So here's what we're going to do. We're going to start off by measuring the system. That involves measuring how long it takes for something to happen — easy. Then we're going to analyze the results: take our measurements and try to understand them. Number three, we'll improve the slow parts. Number four is incredibly important — this is the idle part of the loop, where we have our celebratory beer or cocktail or wine or whatever you want. Or tea; if it's early enough in the day, you might want tea. And then you go back to the top. So this is pretty straightforward. How many people have done something like this before? OK, most of you guys. Good.

OK, so on Wednesday, what we're going to talk about is the Butler Lampson paper on hints for system design. Any questions about today? No? It's a short class. All right, I'll see you guys on Wednesday. We're not done? Not done? Last year someone actually started to leave. I was going to let them walk out and see what happened. Yeah, so we're not quite done. OK, sorry. Yeah, people were like, darn it, why did I ask? I should have just played along.

All right, but what's hard about this process? The truth is, pretty much everything is hard about this process. Nothing about it is straightforward. We're going to measure the system — how? And what is the system going to be doing while we measure it? Both of these things are pretty critical. We're going to analyze the results — how do we do that? This might involve math, numbers, statistics: the stuff we all became computer scientists to avoid. Computer science is full of failed mathematicians and physicists — I'm a failed physicist, so I know this. Improve the slow parts — again, how? I mean, I improved them when I wrote them. I wrote them in such a beautiful, elegant way; how could they possibly be improved, right? That seems like an oxymoron. And the last part — well, anyway, you can spend some time thinking about that one too. All right, so let's talk about this process. The first part — how to measure — turns out to be really interesting.
How do you measure time on a single computer? How do I do this? This should be easy, right? Yeah. OK. So now here comes the problem: the ways that I measure time, particularly tiny amounts of time. Large amounts of time are not so bad. But tiny amounts of time — when we're improving parts of the system, frequently the thing we want to improve doesn't actually take that long. That doesn't mean it can't have a huge impact, and this is something we'll come back to later today. During a brief period of my illustrious history, I worked at Microsoft in a group dedicated to improving Windows performance. I was in the desktop portion of that group, where we had all sorts of problems, but there were these server guys. And the server guys — if they could find one unnecessary instruction in a certain part of the code, maybe by making some assembly modifications, they would all get like a month off. They'd all go to Tahiti to celebrate — and we'll talk about why later. So it doesn't necessarily have to be something that makes a big difference; a small amount in the right place can be really significant.

But when I'm measuring time, the way I do it ends up being very dependent on the capabilities of the hardware. A lot of hardware devices don't have high enough clock resolution to measure extremely fast events; the fastest clock on the device might only tick once every hundreds and hundreds of cycles. To some degree I can use cycle counters to measure things, but cycles have their own complexity, particularly on machines where the processor might be changing frequency on me, et cetera. So it's not quite as simple as I want it to be. These low-level clocks also have really nasty hardware-dependent interfaces that vary a lot — a lot more than you want them to. And so there are entire libraries that you end up having to use just to measure certain things in a reasonably cross-platform way. And then there's this wonderful fact, one of those things that feels so sad and ugly that you should never have to bother yourself with it, but I suspect many of you will at some point in your life have to deal with a rolling counter. A good old Y2K-ish effect: you had something to hold a value in, and then, as all counters do, it wrapped around and suddenly you were at zero again. That's an interesting place to be, and there's usually some weirdness that goes along with it if you're not expecting it. I also want to point out that the finer-grained your measurements, the faster those counters roll over. And you might say, well, use 64 bits or something, but a lot of the time the hardware doesn't support it. As James would say, how much time do you have? You should talk to my students about this sort of stuff. It's really pretty terrible.
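To make this concrete, here's a minimal sketch in C of the two usual options — assuming a POSIX system, which isn't something specified here: an OS-provided monotonic clock, and a raw x86 cycle counter with exactly the caveats just described.

```c
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Portable approach: ask the OS for a monotonic clock. Resolution is
 * whatever the kernel and hardware provide -- sometimes tens of
 * nanoseconds, sometimes much coarser -- so very short events can
 * round down to zero. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

#if defined(__x86_64__)
/* Raw cycle counter (x86 rdtsc): much finer-grained, but cycles are
 * hard to convert to wall time when the CPU changes frequency, the
 * counter is per-core, and narrower counters roll over quickly --
 * exactly the problems described above. */
static uint64_t cycles(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
}
#endif

int main(void)
{
    uint64_t start = now_ns();
    /* ... the event being measured goes here ... */
    uint64_t elapsed = now_ns() - start;
    printf("elapsed: %llu ns\n", (unsigned long long)elapsed);
    return 0;
}
```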
Next thing: I want to be able to gather repeatable, meaningful results. Why would I want the results to be repeatable? What is scary about a situation where I've tried to record something about the system — say I take 10 measurements — and they're all completely different? Why does that tend to be a problem? Or why is it frightening? Yeah. Okay, so that's part of it, right? But remember, what am I doing here? I'm trying to conduct an experiment, and one of the things I do when I conduct an experiment is try to control things. If the results are really different, what does that indicate? Yeah — that can be true too, if I change something. Certainly if I have a huge variance, then I change something, and I still have a big variance, I'm in trouble. What else, Ron? Yeah — so remember, when I do a controlled experiment, I'm expecting the result to be controlled. If the result is this out-of-control, untamed thing, then I start to wonder: am I actually controlling all the variables I should be controlling? Maybe there's some sort of background noise interfering with my measurements, or some fundamental property of the system that I don't understand that's causing them to vary. And that is fundamentally a very unsettling thing. Actually, Scott and I have been benchmarking the assignment three solution set to develop some performance targets for you guys to play with, and it's the same thing: you conduct one measurement 10 times, the variance is really high, and we're sitting there scratching our heads trying to figure out what's causing it. Because clearly there's something about the system that we haven't quite understood.

And here's the other problem with measuring operating systems: at any point when you're trying to measure something, the system is adapting to you. The system is trying to use the past to predict the future; you're just trying to measure the present. You can imagine, from some of the things we've talked about — caches and other mechanisms — that the system is trying to learn, to some degree. It's trying to adapt to what you've done. So if I run an experiment 10 times, by the 10th time all those pages might be in the buffer cache or somewhere else, and the results are extremely different from when I started. And in real systems, the state changes all the time. We've been doing some benchmarking on Android, and on your Android devices there are always these little things starting up in the background. So if I do one experiment at time X and another experiment later, it's possible that one of them collided with something else going on on the system, producing noise that makes the numbers quite different. It's very hard to control these real, complex, big systems. On the other hand, if you take away too much of the background noise, to some degree you're making your results unrealistic, because whatever you're trying to improve actually has to improve things on a real system, not just on some toy thing you created in the lab.

Yeah — and this was a classic excuse for years in the systems community: blame the cache. We use caches to solve all our problems, including unrepeatable results. You can still find this phrase — go to Google Scholar and search for "cache effects"; there are probably thousands of papers. You know: "the variance in that table is pretty high, but we think it's due to cache effects." We have no idea what we're talking about. Cache effects are a convenient boogeyman. What does that mean? It means: oh, the cache must have made one of them faster and one of them slower — but no one ever knows. It's just how we point fingers.
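Here's a hedged sketch of a harness that takes this variance problem seriously: run some warm-up iterations first (so any cache effects are at least deliberate), then report the spread, not just the mean. The `workload_under_test()` below is a stand-in for whatever you're actually measuring.

```c
#include <math.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

#define WARMUP 3    /* discard early runs while caches warm up */
#define TRIALS 10

static uint64_t now_ns(void)    /* same helper as the earlier sketch */
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Stand-in for whatever is being measured (hypothetical workload). */
static void workload_under_test(void)
{
    volatile double x = 0;
    for (int i = 0; i < 1000000; i++)
        x += i;
}

int main(void)
{
    double sum = 0.0, sumsq = 0.0;
    for (int i = 0; i < WARMUP + TRIALS; i++) {
        uint64_t t0 = now_ns();
        workload_under_test();
        double dt = (double)(now_ns() - t0);
        if (i < WARMUP)
            continue;           /* warm-up: pages/caches now populated */
        sum += dt;
        sumsq += dt * dt;
    }
    double mean = sum / TRIALS;
    double var = sumsq / TRIALS - mean * mean;
    if (var < 0)                /* guard against rounding */
        var = 0;
    /* A stddev that is large relative to the mean is the
     * "out of control" signal: some variable isn't controlled. */
    printf("mean %.0f ns, stddev %.0f ns over %d trials\n",
           mean, sqrt(var), TRIALS);
    return 0;
}
```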
So, other problem here: measurements have overhead. Measurements affect things. In a real system, a measurement usually involves running some piece of software on the very system being measured, competing with the rest of the system for resources. How many of you guys have had this experience — not necessarily benchmarking, but debugging during this class? You put a printf on something, and suddenly the problem you were trying to solve goes away. Has that happened to anyone? Yeah, of course — because once you start printing to the console, you're generating a bunch of interrupts, and whatever race condition you were investigating is now gone. That's super frustrating. Now I have another source of noise: the measurement harness itself. And in the real world, there's a practical limitation. Imagine you're working at a company, and someone comes to you and says, your widget is running really slowly on our servers and we don't understand why. Your response is: hey, I'm going to come and make it run even slower, so we can understand the problem better. There may be a little bit of resistance to that, right? They already have a performance problem — that's what drove them to contact you in the first place. And with operating systems this is even more problematic, because the operating system is such a low-level piece of software and it affects everything. I might be trying to debug one particularly slow application on a particular operating system. Am I going to install a totally new, instrumented OS on that entire machine, affecting every other application, just to solve this one problem? That's hard to do.

Here's another problem: operating systems do a lot of stuff. Imagine you're trying to trace something like page replacement behavior. Even on the Android devices we've been using — and this seemed like a very obvious thing for us to want — we just wanted to see every context switch. Why is that a problem? It turns out that generates a lot of output. Trying to record that on a device someone's actually using generates so much output that it's not acceptable for most of the kinds of measurement we do, so we do all these hacks to work around it. There are just a lot of context switches, and if you start writing them all down, that process alone creates so much overhead that nobody can use the phone — or the server — anymore.
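One small thing you can do about measurement overhead is measure it. Here's a minimal sketch, reusing the same monotonic-clock helper as before, that times the timer itself — an illustration, not a complete methodology.

```c
#include <stdint.h>
#include <stdio.h>
#include <time.h>

static uint64_t now_ns(void)    /* same helper as before */
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

int main(void)
{
    /* Time the timer: a million back-to-back reads tell you how much
     * of any short measurement is just the cost of measuring. */
    enum { N = 1000000 };
    uint64_t t0 = now_ns();
    for (int i = 0; i < N; i++)
        (void)now_ns();
    uint64_t per_read = (now_ns() - t0) / N;
    /* If the event you care about takes a comparable number of
     * nanoseconds, your measurement is mostly measuring the probe. */
    printf("timer overhead: ~%llu ns per read\n",
           (unsigned long long)per_read);
    return 0;
}
```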
So what do we do? There are some alternatives here when you start talking about the benchmarking space. One: I can build a model. What is a model of the system? A model is, to some degree, an analytical description that allows me to reason about the system mathematically. For operating systems in particular, that usually involves abstracting away a lot of the low-level details of the system. I might model the disk as having a certain path to take between two different points, but developing a model that's useful for actually calculating things about a real system usually involves throwing away so much information that it's not clear whether the resulting model is actually useful.

A more common approach when you talk about systems is building something called a simulator. You guys have been using a simulator all semester, right? That's sys161 — a MIPS simulator. The nice thing about a simulator is that you can potentially get extremely predictable results. For example, when we run certain tests on sys161, if you pin down some of the randomness that's present in the simulator, you should be able to get completely repeatable results: the system works the same way every single time. What's the problem with simulators, typically? First, let me ask a question going back to our discussion of virtualization. What is the difference between a simulator and a virtual machine? This could be a good exam question. Why can't we give you a MIPS R3000 virtual machine to run your kernel in, rather than a simulator? Yeah, want to take a stab at it? What does that mean? You're on the right track. Hmm — now you're getting colder, right? What was the trick to getting good performance out of a VM? Yeah, but what does that mean? That's totally true — so in order to be able to run on bare metal, what has to be true? Well — the VM doesn't either. Both of them create a very self-contained environment, but there's this critical difference. Why can't we give you a MIPS R3000 virtual machine? Yeah — and not just the guest OS, the guest applications as well, right? So to run a VM, you need binary compatibility between the guest operating system and guest applications — everything that runs inside the VM — and the underlying architecture. So let me ask this question one more time: why don't we give you a MIPS R3000 virtual machine? Yeah, Rob — I hope not, that's a crufty old thing, right? Yeah: you guys don't have a MIPS R3000 computer to run it on. If you did, you'd be living in the 1980s. That would be an interesting machine, and it would be kind of awesome if we'd invented virtualization back then — like we'd gone back in a time machine or something.

But anyway, so what happens with the simulator is that the simulator sees this MIPS R3000 instruction, and what does it have to do, essentially? The underlying architecture is x86, probably, for most of you guys. It sees some instruction in MIPS R3000, and fundamentally, what does it need to do? Yeah — it doesn't quite do that, but it has to execute the series of x86 instructions required to have the same effect within the simulator. What effect does this have on the simulator versus a virtual machine? Way slower, right? You guys may have seen this on your own computers: if you run your sys161 simulator with something that burns a lot of CPU cycles, time inside the simulator will pass more slowly than wall-clock time. Yeah — and that's a good point: I don't want to give you the illusion that it's looking up a translation in a table, right?
It's software, so what it needs to do is the necessary work inside the simulator to make it appear as if that instruction had taken place. There is some series of x86 instructions that gets executed, but the instruction isn't being translated — because, as you point out, there is probably no direct translation. It's programmatic; it's not done in a binary-translation way. Does that make sense? It requires this extra layer of actual software. No, no — there's no CPU. There's a simulated CPU.

So that means two things. First, because you're implementing the machine in software, you get to choose the level of detail you want. You can see this in your own simulator: sys161 blends features. It does not actually simulate any real CPU that I think ever existed, because it blends features from the original MIPS R3000 architecture with later features that were required for multiprocessor support. The R3000 was built in a day when there were no multi-core, no multiprocessor architectures, so when that support was added to the simulator, David took some instructions from later versions of MIPS and blended them in. There is no actual piece of physical hardware that you could run this system on. Does that make sense? This is a simulated CPU that is a hybrid of other things. Second, when you simulate more complicated architectures, the simulator can get incredibly slow. This is kind of interesting: you can actually run entire full Android simulations on x86 using an ARM simulator called gem5. It sometimes runs something like a thousand times slower than the machine it models. So you start it booting — you push power, you go home for the night, you sleep, and you come in the next day, and 10 minutes have gone by inside the simulator. Because there's no hardware; there's just all this software, and the expansion required to actually simulate the processor is quite large.
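To make that "programmatic" point concrete, here's a toy sketch in C of the inner loop of an instruction-level simulator. This is an illustration only — it is not how sys161 is actually written, and the decoder is reduced to a single MIPS opcode — but it shows why each guest instruction costs many host instructions, while a VM runs guest code natively until it traps.

```c
#include <stdint.h>
#include <stdio.h>

/* A toy fetch-decode-execute loop, loosely MIPS-flavored. */
struct cpu {
    uint32_t pc;
    uint32_t regs[32];
    uint8_t *mem;
};

static void step(struct cpu *c)
{
    uint32_t insn;                                   /* fetch (big-endian) */
    insn = (uint32_t)c->mem[c->pc]     << 24 |
           (uint32_t)c->mem[c->pc + 1] << 16 |
           (uint32_t)c->mem[c->pc + 2] << 8  |
           (uint32_t)c->mem[c->pc + 3];
    uint32_t op = insn >> 26;                        /* decode */
    uint32_t rs = (insn >> 21) & 31, rt = (insn >> 16) & 31;

    switch (op) {                                    /* execute */
    case 0x09:  /* addiu rt, rs, imm (sign-extended immediate) */
        c->regs[rt] = c->regs[rs] + (uint32_t)(int16_t)(insn & 0xffff);
        break;
    /* ... a real simulator has dozens more cases here, plus simulated
     * TLB lookups, exceptions, and devices on every step ... */
    }
    c->pc += 4;
}

int main(void)
{
    /* One instruction: addiu $2, $0, 5 (0x24020005, big-endian bytes). */
    uint8_t mem[] = { 0x24, 0x02, 0x00, 0x05 };
    struct cpu c = { .pc = 0, .regs = {0}, .mem = mem };
    step(&c);
    printf("$2 = %u\n", c.regs[2]);  /* prints 5 */
    return 0;
}
```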
All right. But again, the nice thing about simulators is that because you build them and control them, you can make them do things that real hardware would never do. You can make them completely predictable. You can eliminate lots of sources of randomness and uncertainty that might be present in real systems, and that can be useful. Now the problem, of course — let me just finish talking about simulators. The goal of building a simulator is usually to make things simpler and more repeatable, and to isolate the effect on the system you're trying to capture. What's the danger? When you build a simulator, you typically rip out some of the reality present in a real system. What's the problem with doing that? Yeah, right — there are two problems. One, you may remove something that creates a performance effect you actually want to see. The other problem is that you can make mistakes: your simulator can be wrong. There are famous stories of people publishing whole papers about the performance of a particular system, and then somebody trying to reproduce those results and realizing, hmm, there's something wrong with the simulator.

So someone once told me a story about a paper she was working on with her advisor. They were trying to reproduce some results, and everything was looking good. And then they realized — remember we talked about buffer caches? This is a good review: when you evict a block from the buffer cache, what do you have to do? The buffer cache is using memory to cache the disk, so if you remove something from the buffer cache, you'd better make sure you do what before you reuse that memory? Zach? Well, remember, the buffer cache isn't actually using virtual memory — the buffer cache is caching file system blocks. So if I have a block in the buffer cache and I'm going to remove it, I need to make sure that I do what? Yeah. Write it to disk. Exactly: if I have a dirty block in the buffer cache, I write it to disk before reusing the memory. Now, it turns out this simulator did not do that. Somebody forgot to put in the timing required to actually write the block to disk. So evicting blocks from the buffer cache looked really fast, because the contents were being destroyed and nobody realized it. This simulator was producing totally bogus results, which made somebody sad when they discovered it, because the particular point they were trying to make was not true.

All right. So anyway, this next part is not that interesting — I'm going to just skip through it; you guys can look at it online. So this gives you a sense of the tools we have at our disposal that are not real systems: simulators, and potentially models, to try to explain system performance. Now we get to the part where I actually have to measure something. Coming up with good metrics for system performance is pretty difficult. I was just laughing with somebody after class last week: did anyone ever have one of the old, old Macs — the ones that had, what was it, Alpha processors in them? Oh really? Nice. So for years — do you guys remember this? Are you old enough to remember that? No? Yes you are, give me a break. At some point Mac did not use Intel processors in their products, in their laptops. They had these — what's it called, Alpha? PowerPC, that's right, sorry, PowerPC. And it was so funny, because for years and years you could pick up these PC magazines — which of course I read all the time — and there'd be these big ads where Apple was touting the fact that their PowerPC-based architecture was faster than Intel at something, something, something. And then, practically overnight, they were like, ah, that wasn't true; we're just going to use Intel processors like everybody else. So this was a case where they were measuring something to produce a result — trying to claim that something was faster — at a time when you had these two competing architectures. This is starting to happen again between ARM and Intel; it'll be interesting to see where it goes. But you can see that if I'm trying to compare the performance of two different things, picking metrics is quite difficult. Do you guys look up performance results? How many people have ever been on, like, Tom's Hardware or something like that? Okay, so you're a little bit familiar with this, and what you've probably noticed is that there are a lot of different benchmarks and microbenchmarks used by these sites, right?
So say they want to compare two disk drives. What are some of the things I might be interested in comparing about two disk drives? What are some of the metrics I might care about? Yeah? Okay, I can measure the RPM. I probably don't really care directly about RPM, right? Who cares, unless it makes noise or something. What do I really care about as a user? Yeah. Okay, read and write speed — can anyone be more specific? Yeah. Usually there are a couple of different types of read and write speed measured by these tests. So what are the extremes? Yeah, Matt. Yeah — so there's the highest throughput you can get from the disk. Imagine I'm reading one contiguous track and then just moving to the next: essentially doing the fewest seeks possible while reading the most data. That's usually quoted as one number. And then there's the worst-case performance, where I'm seeking from one place to another all over the disk: imagine I pick random blocks all over the disk, request them as fast as possible, and see what I get. Clearly, neither of those benchmarks is a perfect fit for a real workload, but they're interesting data points to compare. Seek times factor into that second number: I might have a disk with a great data path, which can read a lot of data really quickly once it gets where it's going, but which has terrible seek performance; and another disk that makes different trades. So even here, things get complicated.
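Here's a hedged sketch, in C, of what those two extremes look like as a benchmark: a sequential pass and a random pass over the same file. The file name is a stand-in, and note the caveat in the comments — without bypassing the buffer cache (for example with O_DIRECT, where supported), the random pass may largely be reading from memory, which is exactly the cache-effects trap from earlier.

```c
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

#define BLOCK   4096
#define NBLOCKS 10000   /* "testfile" must be at least ~40 MB */

static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

int main(void)
{
    static char buf[BLOCK];
    int fd = open("testfile", O_RDONLY);    /* stand-in path */
    if (fd < 0) { perror("open"); return 1; }

    /* Sequential pass: fewest seeks, highest throughput. Caveat: this
     * also populates the buffer cache, so the pass below may partly
     * measure memory unless the cache is bypassed or dropped. */
    uint64_t t0 = now_ns();
    for (int i = 0; i < NBLOCKS; i++)
        if (pread(fd, buf, BLOCK, (off_t)i * BLOCK) < 0)
            { perror("pread"); return 1; }
    uint64_t seq = now_ns() - t0;

    /* Random pass: seek-dominated worst case. */
    t0 = now_ns();
    for (int i = 0; i < NBLOCKS; i++)
        if (pread(fd, buf, BLOCK, (off_t)(rand() % NBLOCKS) * BLOCK) < 0)
            { perror("pread"); return 1; }
    uint64_t rnd = now_ns() - t0;

    printf("sequential: %.1f MB/s, random: %.1f MB/s\n",
           NBLOCKS * (double)BLOCK * 1000.0 / seq,
           NBLOCKS * (double)BLOCK * 1000.0 / rnd);
    close(fd);
    return 0;
}
```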
Scheduling algorithms, same thing. Am I interested in interactivity? Am I interested in throughput? Am I interested in how busy every part of the system is? We could be here all day talking about different ways to compare the same thing. Page replacement algorithms — for which workloads? Workloads with a lot of reuse, or workloads where I touch a lot of pages just one time? That can make a big difference. Now, when you start talking about whole subsystems, this gets even more interesting. File systems have a gazillion things you might care about. How long does it take to create a file? To move things? To do name lookups? What about creating big files? What about extending a file and then truncating it again? You can think of lots and lots of different operations that might be affected by the file system's underlying data structures and that I might want to measure. So this is the classic everybody's-touching-a-different-part-of-the-elephant situation. And when you start measuring systems, it's even more interesting, because some people only care about one part of the elephant. Some people don't care about anything other than the trunk. You might be touching the foot — I just don't care what you have to say about the foot. I just want the trunk.

All right. Benchmarks are the way we usually refer to pieces of software used to test system performance, and they can be broken into a couple of broad categories. Microbenchmarks are designed to target, or isolate, one individual component of the performance of a system. So give me an example for a file system. What's a file system microbenchmark? I just threw out a couple of examples a minute ago. Think about file system operations that I might care about. Okay, so I could do reads and writes. Yeah, what else? What's that? To some degree, reads and writes end up getting the disk involved in a way that I'm not necessarily happy about. What are more file system operations I might care about? Yeah, rename. Do, like, 10,000 renames. Rename is kind of a complicated operation — some day when you guys do assignment four, you'll realize that. So rename turns out to be kind of hard. How quickly can I open a file? Opening is pretty much entirely a file system operation. These are microbenchmarks.

With macrobenchmarks, I might do something like run a benchmark that reflects the file usage patterns of a real application: opening a file, reading it all into memory, making a few changes, and closing it again. That might be a common pattern. In certain cases, the performance of the file system in creating temporary files is important: how quickly can I create a new file, fill it with some contents, do some processing on it, and then destroy it? So with a macrobenchmark, what I'm starting to do is blend together some of the microbenchmarks in a way that's supposed to be representative of something a real application might actually do. And applications themselves can also be considered benchmarks — in some cases, that's the best way to benchmark a system. I don't care about the individual performance of reads and writes and things like that; all I care about is how fast the web server runs. So the best way to measure that is to run the web server: I have to give it some sort of load, and I need to be able to measure something about the application that I care about. In certain cases, this is the best way to evaluate performance.
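As a concrete instance of the microbenchmark idea, here's a minimal sketch of the open() example from a moment ago: time many opens of an existing file, whose name here is a stand-in, and report the per-operation cost.

```c
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

#define N 10000

static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

int main(void)
{
    /* Microbenchmark: isolate one operation. Opening an existing file
     * exercises name lookup with little or no disk traffic, which is
     * why it makes a cleaner microbenchmark than read/write. */
    uint64_t t0 = now_ns();
    for (int i = 0; i < N; i++) {
        int fd = open("somefile", O_RDONLY);   /* stand-in name */
        if (fd < 0) { perror("open"); return 1; }
        close(fd);
    }
    printf("%.2f us per open/close pair\n",
           (double)(now_ns() - t0) / 1000.0 / N);
    return 0;
}
```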
All right, any questions at this point? Do you guys look bored? Sorry, this is a boring lecture — Wednesday will be more interesting. You understand this already? Okay, that's good. All right, I'm just going to blow through this, since it's what we've already talked about together. When we think about benchmarks and how they're used, the problems are potentially kind of obvious. With microbenchmarks, you may end up improving one tiny little part of the system without really contributing much to overall system performance. Macrobenchmarks can be very, very hard to understand, because they are the result of a lot of different variables — you can run the same thing a bunch of times, get quite different results, and be completely confused about why. With application benchmarks, you run into the problem that — at least in the old days, when people ran multiple applications on one system at a time, or nowadays when you talk about interactive systems — maximizing system performance for one application usually comes at the price of hurting performance for other applications, or at least not optimizing it. So if I asked you, for example, what single application your smartphone's entire system performance should be tuned for, that might be a hard question to answer. Maybe not, maybe so. Oops, sorry. The other problem is that people's choices of benchmarks are somewhat problematic.

Benchmarking is an interesting area where you see a lot of effort by various standards bodies. Somebody comes up with a benchmark suite, and people start to use it to evaluate a particular type of system, like a database system. In certain cases, those benchmark suites and organizations are the result of collaboration among a bunch of different database companies: hey, it would be useful if we had something to measure the performance of our systems; let's create an organization to do this, and we'll all participate. In other cases, it's one company marketing a particular product: hey, there's this awesome benchmarking suite out there that you should all use to measure your databases, and oh, by the way, ours turns out to be the best. No coincidence — ours just is the best. I'm always reminded of all those car awards you see on TV. Does J.D. Power give an award to every car every year? It always seems that way. So many people get those awards that they end up being sort of meaningless. So this is an area where you have to be a little bit careful. In other cases, people will find a benchmark and do a lot of work chasing that one particular benchmark in a way that may not be appropriate: it may not be best for all the applications on the system, and it may be done just to improve that one number without really worrying about broader performance.

And the problem here — and this tension, I think, is super interesting, and something you will watch play out over your lives as computer scientists as you watch technology evolve — is this. General purpose computing systems are useful. That's what we've been building for the past 40 or 50 years, and that's what most of you have access to: a general purpose system capable of doing many, many things. However, the fastest system is a system built for one specific purpose. Imagine you take the whole hardware stack — everything about the system, from the processor instruction set on up — with just one application in mind. You can build something that will just scream, because every part of it has been explicitly selected to do this one thing. Whether that's an appropriate choice is another question. There have been a couple of times over the past few years when someone came in to give a talk in the department and said, yeah, we have this one algorithm that we were having a hard time getting to run fast. So what we did is we went out, got $10 million of taxpayer money, spent five years, and built a special computer that runs this algorithm super fast. And you're like, yeah, so what? And then you ask them, well, what was the problem with the performance on the general purpose system? And they're like, we're not sure — we didn't spend a lot of time studying that. We just ran straight off to build a brand new computer that solves this one problem. It's not clear whether that's the right approach. And on Friday, we'll read a great paper that pushes back against that general line of attack — the claim that these general purpose systems don't work very well and we need brand new solutions. Friday's paper is a great counterexample, so I hope you'll read it and come to class.
All right. So here's some good advice for when you're trying to address any sort of problem with a system, certainly a performance problem. The first thing to do is have a goal in mind — have a specific hypothesis. This is part of being scientific. I know that computer science is one of those fields that has to have "science" in the name to reassure ourselves. Someone famous said that any field with "science" in its name isn't a real science; my wife is a political scientist, and she would agree with that statement. But this is a case where you can actually make yourself feel like a real scientist. Put on your little lab coat and have a hypothesis: this is the thing I'm curious about. For example, Scott and I were talking earlier: does the randomness in TLB replacement in the solution set have an effect on performance that's causing it to vary when we run it with different seeds? I don't know — let's find out. So that's our hypothesis: that might be causing the high variance we've observed.

Also: validate things. This is particularly important if you start writing simulators. Any results you get from a simulator are only as good as the simulator itself; if the simulator is wrong, you're in trouble. Writing one of these tools from scratch is a lot of work, and even when using somebody else's tool that's pretty mature, you need to be careful with it — make sure it's measuring the right things. And — oops, I should have increased the font size — use this toolkit as needed. We didn't talk much about modeling today, but there are times when coming up with an analytical model of the system can be really useful and help you think about how the system works, and other times when it's not that helpful. You have a lot of tools at your disposal here. There are also profiling tools and other things built into a lot of the standard toolkits, and those can help too.

Let me see where we are on time. Okay, I think I'm close enough to be done. Do you guys have any questions? I'll pick up here next time. We'll talk about statistics, and we'll talk about Amdahl's Law, which is maybe one of the two or three things I hope you walk away from this class knowing something about. And then we'll go over Butler Lampson's famous and fairly whimsical paper on hints for computer system design. So I will see you guys on Wednesday.