All right, why don't we get started? Again, I'm Carl, and I'll be covering today's material. Nice to meet you all. What we're going to talk about is essentially measurement and statistics. If you took 341 with Chris Chandler, a lot of these themes will be familiar; they're just at a different level here, because there are now a lot more systems in play, and we'll focus more on software than on hardware. I can also speak from very firsthand experience about a lot of what we'll cover today: I have had my own woeful problems making measurements in the Linux kernel. It's a great idea in theory, but in practice, making it actually work can be troublesome. Then again, that's what computer science is all about.

So let's talk about performance and benchmarking. They're certainly related, but they're not exactly the same. Performance is essentially: does your system respond to you? Do you like it? Or do you sit there twiddling your thumbs while the system says "buffering," or just freezes? Benchmarking, on the other hand, is about how we might compare one system to another.

So what do we care about with performance? Jeff makes the point that it's not even so much whether your computer crashes. Those of us from the Cretaceous era remember Windows 95 and the fact that the thing blue-screened every second minute. We don't want that. But suppose you had a choice between a computer that is mostly correct, that crashes once in a while, but that is blazing fast and enjoyable when it isn't crashing, versus the exact opposite: a computer that never crashes but is really, really slow. Which would you prefer? Let's face it: if the entire computer rebooted almost instantaneously, would you really care? That's assuming, obviously, that you're not losing any data. The point to keep in mind is that logical correctness is not the be-all and end-all; we also really do care about performance. That's one of the things you're dealing with on assignment 3.3, with the benchmarks and the timeouts: yes, you need logical correctness, but you also need a level of performance the user can appreciate.

This slide here is a quote; I believe it's from Dijkstra. The author is talking about designing a refined multiprogramming system, which is to say a kernel or operating system that has, for the most part, been shown to have no bugs, or at least bugs small enough that we can hand-wave at them. And this is from a published research paper. From 1968.
Provable kernels, that is, operating systems you know to be correct because you've walked down every logical path and can say "I know for sure XYZ will never happen," are exceedingly rare and exceedingly expensive in practice. Just Google it. If you want a provably correct or, let's say, fault-free kernel, there is a very small set of vendors that will sell you, for a huge sum of money, something mostly correct. For the most part, getting a kernel, or for that matter any sizable program, that you know for sure is 100% correct is an exercise in futility. You're on a wild goose chase. So instead, let's talk about getting it as close to correct, and as performant, as we can for practical use.

So what we're talking about here is performance. You know the drill: we measure our systems. Now, "measure" is a vague word, and we'll pick at it in a few minutes. What are we trying to measure? When we talk about performance, performance in terms of what? It could be responsiveness. It could also be correctness. And maybe there are other factors; maybe you want to throw in an economic facet: performance in the sense of hitting the financial goals of the CFO who told you to design a server cluster to do XYZ, which becomes a trade-off among money, correctness, and speed. These notions are a little vague, and we'll have to refine them.

OK, so we measure our systems and get some statistics back. The classic one is: how fast does it run? We get a wall-clock figure back, and then we need to analyze the results. In general, shorter is better, but watch it; think back to your 341 days. Just because one particular program seems to run faster, is it a better-designed program, or is it running on better hardware? Just because one computer seems better, what test suite are you running on it? A different test suite might give you a completely different result. Something to think about. So: measure (vague term), then analyze. Which is better, or which is worth improving? Again a bit vague, but we have to deal with it.

Then: improve the slow parts. Another slightly vague phrase. What do we mean by "the slow parts"? A computer has many, many parts: the disk drive, the memory, the CPU, the rest of the I/O. Which part or parts do we want to improve? Same with an operating system: are we talking about the memory manager? The file abstraction? Maybe there's a problem with our file abstraction in general, or maybe the problem is in one underlying file system and the abstraction overall is quite good. I don't know; that's something we need to determine. And what do we mean by "improve"? This gets us into Amdahl's law.
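Before we dig into Amdahl's law proper, here's the arithmetic in miniature. None of this code is from the slides; it's just a quick sketch to make the next example concrete.

```c
#include <stdio.h>

/* Amdahl's law: if a fraction p of total run time is sped up by a
 * factor s, the overall speedup is 1 / ((1 - p) + p / s). */
static double amdahl(double p, double s)
{
    return 1.0 / ((1.0 - p) + p / s);
}

int main(void)
{
    /* Ten-fold speedup of a part that is 0.01% of total time. */
    printf("10x on 0.01%% of time: %.5fx overall\n", amdahl(0.0001, 10.0));

    /* Five-fold speedup of a subsystem that is ~2.5% of total time
     * works out to roughly a 2% overall win. */
    printf("5x on 2.5%% of time:   %.5fx overall\n", amdahl(0.025, 5.0));
    return 0;
}
```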
In other words, when we talk about improving the slow parts, what do we mean by slow? Say we were able to speed up one particular facet of our system ten-fold, but that part accounts for only 0.01% of the total user experience. Was that really a big win? Research papers are famous, or infamous, for this: "We dug into Android, found a particularly inefficient subsystem, and built a better mousetrap; that subsystem is now five times faster." OK, what's the overall effect on the user experience? "Well... we measured, and it's about 2% faster." Now, 2% may genuinely be a significant speedup for a real system out in commercial userland, but 2% is a lot different from the five-fold speedup of the subsystem. Everyone with me? So when we talk about improving, the questions are: what's worth improving, and what's the overall bottom line?

And then, the celebratory beer. Knowing Jeff, he would probably recommend one of those craft beers from a microbrewery. Then, if you will, go back up to the top, and we're in an infinite while loop. That's the nutshell version of operating system performance: measure, analyze, improve, repeat. That's a lot of software development, or should I say software maintenance.

Now, this last part is a reminder. What is an operating system? Way back in the first week of class: an operating system is a software program. Nothing more and nothing less. The differences are a question of degree rather than, if you will, principle, so the problems we'll be dealing with are largely problems of degree. That's what's coming in the next few minutes.

So, how can we improve our operating system's performance? In theory this was intended to be the penultimate class, where we say, you know what, let's skip ahead and not even deal with the problem. But let's actually deal with it. What's so difficult here? We've already talked a little about measuring our system. But measuring what, and how? Especially for an operating system, because in userland we can take advantage of nice kernel-provided tools, but if we're trying to improve the operating system itself, we have a reflexive problem: how do we rely on the stability of the kernel while we're changing the kernel? I know that when I instrument Linux and get a bug in there, rather than one user process shutting down, the entire stupid phone reboots. I once pushed out changes I had been testing for literally months on my own test-bed device, where they worked great. As soon as I pushed them out to the alpha-level PhoneLab users, Scott looked over at me less than an hour later and said, "Hey, Carl, my phone just rebooted." And you know what it was? A null pointer dereference right in the middle of fork that I had failed to handle. Kernel problems, as you know, cause a lot more damage, and they're a lot more painful.
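For what it's worth, that bug had the classic shape sketched below. The names here are hypothetical; this is just a userland sketch of the kind of missing check that, inside the kernel, turns into a panic and a reboot instead of a mere segfault.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical fork-path helper: copy some per-process bookkeeping
 * into the child. The check marked below is the kind of line that,
 * when forgotten, dereferences NULL in the middle of fork. */
struct proc_stats { long opens, reads; };

static struct proc_stats *copy_stats(const struct proc_stats *parent)
{
    struct proc_stats *child = malloc(sizeof *child);
    if (child == NULL)      /* the check I forgot: without it, the */
        return NULL;        /* memcpy below dereferences NULL     */
    memcpy(child, parent, sizeof *child);
    return child;
}

int main(void)
{
    struct proc_stats parent = { 42, 1000 };
    struct proc_stats *child = copy_stats(&parent);
    free(child);
    return 0;
}
```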
The same thing is true of measurement. So: we've got measure, we've got analyze (again, statistics), and we've got improve the slow parts. And again, what kinds of parts are we talking about, the operating system or the hardware? Because the operating system ties in very closely with the hardware. And then our celebratory craft beer, please; I wouldn't know much about those, but apparently there are plenty out there.

OK, so how can we actually measure time passing on a single computer? A couple of ways. First, to be clear: when I say time passing, I obviously mean measurement in the metric of time. As we discussed earlier, there are other metrics to take into account, possibly the economic one, certainly correctness, but for now let's focus on making things faster, that is, more responsive in wall-clock time to the user. So let's see if we can measure our guinea-pig system. Since we're talking about the operating system, it needs to be really, really responsive, because remember: the point of running Windows on your machine is not to run Windows; it's so you can run whatever first-person shooter you've got. The operating system is in service to that, so we don't want the system itself taking much time.

So how do we measure the performance of the kernel itself? One way is in software. The problem is that software clocks may not have fine enough resolution. Example: I am currently banging my head against an oak tree trying to measure file statistics in Linux on Android, that is, how often files get opened, read, written, and closed. Literally every time any file on an Android phone gets opened, closed, read, or written, my code sees it, and as you can imagine, that happens a lot on any given phone. Now, one of the metrics I'm interested in is: when a user program calls read(), how long does that take? So within my tracing code I call a clock before and after each syscall. Well, guess what: because I'm inside the kernel, we're talking about nanosecond-scale intervals, and at that scale the granularity of these software clocks becomes very significant. It's like the Millikan oil-drop experiment from physics: the quanta really start to show up. The readings come back in chunks, multiples of fairly big numbers, 400 here, 800 there, because the Linux software clocks I'm using simply have fairly coarse granularity. Now, maybe I could ask for much finer granularity, but there are a couple of problems with that. Number one, I need to make sure it's accurate. Number two, I must not hog resources that other threads in the kernel are using.
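You can see a userland analogue of this granularity problem in a few lines of C. This is just a sketch, assuming a POSIX system with clock_gettime; the step sizes you see depend entirely on your hardware and kernel.

```c
#include <stdio.h>
#include <time.h>

/* Spin on the clock and print the deltas between successive distinct
 * readings. On many systems the values come back quantized into
 * multiples of some fixed step rather than a smooth nanosecond ramp,
 * which is the granularity problem described above. */
int main(void)
{
    struct timespec prev, cur;
    clock_gettime(CLOCK_MONOTONIC, &prev);
    for (int i = 0; i < 10; i++) {
        do {
            clock_gettime(CLOCK_MONOTONIC, &cur);
        } while (cur.tv_sec == prev.tv_sec && cur.tv_nsec == prev.tv_nsec);
        long delta = (cur.tv_sec - prev.tv_sec) * 1000000000L
                   + (cur.tv_nsec - prev.tv_nsec);
        printf("tick %d: %ld ns\n", i, delta);
        prev = cur;
    }
    return 0;
}
```

On some machines the deltas ramp in small steps; on others they arrive in big fixed chunks, which is exactly the quantization I keep fighting.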
I could also attack the problem from the hardware side. But hardware counters have extremely device-specific interfaces. If I want to use the hardware interface, number one, I have to figure out how to do that rather than using the drivers already in place; and beyond that, it's not necessarily going to be portable. In my case I only have to worry about one specific type of hardware, but if I were doing, say, a cross-platform analysis of file access patterns on different Android devices, I'd probably be stuck using some sort of software interface, because the hardware-specific stuff is unique to each device. And what is this course about? Operating systems; it's, for the most part, a software class. You need to be aware and cognizant of hardware, but my point is that one of the main reasons we have an operating system is to abstract away the specifics of the hardware. Think about it: as application writers, you don't want to fiddle with the specifics of this clock on this phone or that clock on that phone. You'd have to go learn new device drivers every time, and manufacturers are very often not forthcoming about the details. That's why we have operating systems.

OK, another thing. Oh boy, look at that, fun: counters roll over eventually. You've already dealt with this; think back to assignment 2 and PIDs. The problem, especially if you used an array for your process structure, is that you don't want to make that array too big, because you'll waste memory. But what's the problem with a smallish array? You start reusing PIDs fairly often. There are ways around that, like an incrementing counter; I think a good half of the people in this class had that conversation at some point maybe six weeks ago.

Well, this happens with other things too. One of the things I'm trying to keep track of is how many files get opened on a phone, and not just individual files but file sessions, meaning each time open() gets called. The thing is, those file descriptors can roll over, and for that matter the process IDs can roll over too. That 32,768 PID limit you see on 32-bit systems is more than enough for OS/161; it certainly isn't enough for a real system like Linux underneath Android. I had to deal with this: to measure how many times a file gets opened, I had to insert additional code using 64-bit numbers to make each occurrence unique. These are things you really do have to handle. If we're trying to measure something, my point is that numbers get really big really fast, and I need to track a whole lot of unique occurrences; I can't be left wondering whether file 20,000 at 8 a.m. is the same as file 20,000 at 4 p.m., or whether the counter rolled over in the meantime.

So, nutshell version: numbers get big, and a software measuring thingamabob probably has a fairly coarse granularity; think of the timestamps you get back from the kernel. Whereas if we deal directly with the hardware, we're limited to the specifics of that hardware and stuck with its interface problems.
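Here's a minimal sketch of that 64-bit uniqueness trick, under my own made-up names. A real kernel version would need an atomic increment; the idea is simply to never identify a session by a number that can be reused.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical sketch: file descriptors and PIDs roll over, so instead
 * of identifying a "file session" by (pid, fd), tag each open() with a
 * monotonically increasing 64-bit ID. Even at a billion opens per
 * second, a 64-bit counter takes centuries to wrap. */
static uint64_t next_session_id;

struct file_session {
    uint64_t id;   /* globally unique for the life of the trace */
    int pid, fd;   /* both of these can and will be reused      */
};

static struct file_session trace_open(int pid, int fd)
{
    struct file_session s = { ++next_session_id, pid, fd };
    return s;
}

int main(void)
{
    struct file_session a = trace_open(20000, 3);
    struct file_session b = trace_open(20000, 3);  /* same (pid, fd)... */
    printf("...but sessions %llu and %llu are distinct\n",
           (unsigned long long)a.id, (unsigned long long)b.id);
    return 0;
}
```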
Questions, comments, complaints so far? All right. Now, another thing. This is computer science, and what's one of the things you always hear about in science? You want measurements and experiments that are repeatable. You write up a research paper; at the very least, if you're ever challenged on the research, you want to be able to say, "This is how I conducted my experiment," and have the challenger be able to put together a similar batch of experiments and get more or less the same results. Well, here's hoping. Part of the problem is this old chestnut: we're trying to measure the present, while the rest of the system is trying to use the past to predict the future. We essentially have a feedback loop that practically guarantees a lack of predictability and, well, a lack of repeatability. And part of the problem, too, is that the measurement code we inject adds its own complexity and its own uncertainty to the whole thing.

Real systems are almost never in exactly the same state they were in last time, due to a whole bunch of factors. You've dealt with this a lot already, especially in assignment 3. You're only dealing with OS/161 and an emulator, but look how much variability comes into play from things like the initial random seed or how processes get scheduled. Most of you have hit this: "I've got a weird bug that shows up every tenth run," or "this works one out of three times," and so on, because there are lots of race conditions, and we're trying to reproduce something that may or may not come to pass. Now try that on a real system, with variability like when a user happens to type something, waiting on disk I/O, and literally several thousand threads in flight at any given moment. Getting any sort of consistency out of that is going to be difficult.

So take a look at this: the classic research cop-out. "I ran my experiment, and I was expecting a pretty graph with a nice Gaussian distribution, but instead I got this thing full of random noise." There's a classic write-up on this by a fellow named Kovar; look it up sometime when you get a chance. It's about measuring the electron band gap of germanium, I believe: a real-world undergraduate science experiment, narrated by a livid undergraduate pointing out that what he was told he would get in his physics lab had absolutely nothing to do with the results he actually got. This happens all the time, and it happens with timing results too. Say I'm trying to measure the time it takes to read a file, and I expect it to take x milliseconds most of the time. But I get x milliseconds sometimes and y milliseconds other times, and I have no idea why. "I'm going to blame the cache. That's what's going on." And you know what? The cache could genuinely be the reason. I'm just saying it's also a little bit of a cop-out, and it's something we often get questioned about, by Jeff, by other researchers: "Wait a minute, do you actually know what caused this particular batch of timing data?"
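By the way, the cache story is easy to see for yourself. A sketch, assuming a POSIX system: time the same file read twice, and the second pass is usually served from the page cache instead of the disk.

```c
#include <fcntl.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

/* Time reading the same file twice. The first read often has to go to
 * the disk; the second is usually served from the page cache, so the
 * same "experiment" returns two very different numbers. Pass a file
 * path as the only argument. */
static long read_file_ns(const char *path)
{
    char buf[4096];
    struct timespec t0, t1;
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    while (read(fd, buf, sizeof buf) > 0)
        ;
    clock_gettime(CLOCK_MONOTONIC, &t1);
    close(fd);
    return (t1.tv_sec - t0.tv_sec) * 1000000000L
         + (t1.tv_nsec - t0.tv_nsec);
}

int main(int argc, char **argv)
{
    if (argc != 2)
        return 1;
    printf("first read:  %ld ns\n", read_file_ns(argv[1]));
    printf("second read: %ld ns\n", read_file_ns(argv[1]));
    return 0;
}
```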
And we really do need to know, because we are going to get asked, by the reviewers of the paper among others. All right, going on: measuring real systems. Look at this; it's the old feedback problem again. The measuring itself tends to affect the thing you're trying to measure. It's like, what is it, Heisenberg? I forget who it's attributed to, but the idea is that trying to measure something can badly perturb the very thing you're trying to measure. So yes: measurement may itself disturb the phenomenon we're measuring.

Case in point: I'm trying to measure file activity. And how do I monitor or log file activity? By writing it out to a log. And what is that log? A file. So what happens when I try to measure files? I write to a file, which causes a file access, which goes back into my log. That was one of the first problems I had to deal with. Whoops: a great case of recursion. You can't log the logging. But here's the rub: I can't log the logging, yet I still need to know how much effect the logging has on the system. How do I know that when I can't log it? There's a certain amount of guesswork you can never get away from.

Related to this, we need to separate out the noise produced by the measurement itself. If there is some sort of delay, what caused it? One of the other researchers in Jeff's lab was struggling with exactly this issue: there were delays that simply could not be accounted for. It turned out the delays were introduced by, guess what, the logging and measurement system itself. He was getting results, but he had initially failed to recognize that the measurement itself has an effect you have to pull out and separate from the data.

Another thing to keep in mind: measurement overhead may limit your access to real systems. Another researcher, I don't know if any of you have met Jinghao, a really super-bright guy in PhoneLab, is right now working on network verification: how can you verify that a device implements a particular network algorithm? One of the problems he's dealing with is that network hardware vendors don't like to be forthcoming about their code. "You've given me the algorithm." "Well, here's your chip. Trust me, I've implemented your algorithm. No, I'm not going to let you see the code that implements it; that's my trade secret." So he has to figure out how to verify that this black box does what the manufacturer claims. Make sense? These are all real-world examples. The measurement itself will sometimes destroy what you're measuring, and at the very least it will very often muck it up or introduce additional jankiness into the system. And sometimes you'll simply have problems with manufacturers not being forthcoming.
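Going back to the logging-recursion problem for a second: the usual escape hatch is a guard that recognizes the logger's own activity and skips it. Below is a toy, single-threaded sketch with hypothetical names; a real kernel tracer would need a per-thread flag, but the shape is the same.

```c
#include <stdio.h>

/* The tracer's own log writes trigger the very hook that does the
 * logging, so the hook has to recognize and skip its own activity.
 * One common shape: a flag set while the logger is running. */
static int in_logger;

static void log_event(const char *what);

/* Pretend this is the hook called on every file access. */
static void trace_file_access(const char *path)
{
    if (in_logger)
        return;             /* this access came from the logger itself */
    log_event(path);
}

static void log_event(const char *what)
{
    in_logger = 1;
    /* Writing the log is itself a file access and re-enters the hook,
     * where the guard flag stops the recursion. */
    trace_file_access("/var/log/trace.log");   /* hypothetical log path */
    printf("logged access to %s\n", what);
    in_logger = 0;
}

int main(void)
{
    trace_file_access("/data/somefile");
    return 0;
}
```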
Questions or comments so far? Moving right along: now let's look at the problem of measuring the OS itself. We need to figure out where we're actually going to insert our debugging hooks, which means we need to understand the kernel itself. And yes: a lot of output. This is another thing. Yours truly has now, several times, melted Jeff's PhoneLab test bed, which is not a pleasant experience, I can assure you. When you log every single blessed file access in the kernel, it generates a lot of output. (Sorry about that.) And what happens then? I want to collect it, and it's being collected on users' phones, and the whole process overwhelms the collection and aggregation system itself. So right now yours truly is trying to use a mallet to shoehorn a lot of data into a small hole, and I'm going to have to move toward doing more summaries. Sometimes I'd like the more detailed data, but my point is that sometimes we have to live with less data, because we're simply not able to collect everything we'd like. And even if we did collect it all, the next step, processing all that data, is another issue.

So benchmarking is hard. Not conceptually hard, I'd say; these are practical problems more than anything else. But they're real practical problems. Is there any way to sidestep them? Ideally we want to measure things in the real world, so we can say, "This was a real-world experiment with real users and real computers." But sometimes you can't. A classic example: if you're in nuclear physics and you want to build, say, a better hydrogen bomb, you're not going to recreate the experiment every second week. You'll have a problem with where to get your test bed, and some island in the Pacific Ocean is going to be very unhappy. So instead of doing the real-world thing, we can build a model. That is, instead of coding something up and running it, we put the phenomenon into mathematical terms. I want to create a really big bang; well, here are the physics equations for how tritium and deuterium interact under these conditions, yada yada. I don't have to turn some poor island in the Pacific into an inferno; I can write down the mathematical equations instead.

Another option is to build a simulator, which is very close. Instead of just writing down the equations, you've probably heard about this a lot with weather modeling: I've got the equations, but I want to take it a step further, put actual data into it, and just let it run. Actually, let me take a step back; I lied to you a little. When I say simulator, I mean using software to simulate, also known as emulate, some other part of the system that I'm having problems with, or that I simply don't want to deal with. For example, a lot of you right now are dealing with assignment 3.3 and eviction, and one of the problems is that it takes a long time to run those tests, because your code is sleeping most of the time.
So what do I do, just for the sake of my own ease of development? Rather than having the code constantly wait on real I/O, I write some sort of simulator for, say, disk access, which says: OK, I'm going to magically advance the wall-clock time by whatever the operation would have cost, and in effect pretend the read or write already took place. That lets me run my experiment and get my results back a lot quicker. I'm simulating a piece of hardware.

So, nutshell version: models are, roughly speaking, mathematical tools. If you see an equation, it's a model; think of the equations for the hydrogen atom, or some equation for the probability distribution of where the disk arm is. Whereas if it's code, it's a simulation: I'm trying to simulate a particular piece of hardware or what have you. Questions on the difference? If we can't do things in the real world, or don't want to for practical reasons, we can either build models or run simulations.

So where does that get us? What are the pros and cons? Models give mathematical guarantees, if the assumptions hold. Something like Newton's laws of motion is essentially true; but in modeling computer systems, our problem is unrealistic assumptions. Say I assume that some particular part of a computer system follows a Gaussian normal distribution, for instance that the disk arm position is normally distributed. Great: I can model it very easily, there are statistical equations and packages out there, and I can get lots of nice results. But the issue is the assumption itself: how do I know the quantity actually follows that distribution? That's something we have to be careful about.

Simulations, meanwhile, can certainly make our data collection quicker and easier. But there we have the issue of whether the simulator or emulator is itself implemented correctly. Here's one you're familiar with: the emulator you're using for OS/161, namely sys161. There was an issue a while back: sys161 has hooks that allow you to collect statistics from the emulator itself, and hooking into them was causing some anomalies. It went upstream to Harvard; I don't know where things finally landed. But my point is that the collection of statistics was itself triggering problems. In other words, you have an emulator, and you're assuming that emulator is correct, and most of the time it is. How many times have you heard, "The emulator is broken!" No, the emulator is not broken; it's your code. But sometimes it really is broken: a bug, not a feature. The point is that something can be wrong, and in this particular case something really was a little wonky.
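To make the fake-the-clock idea from a moment ago concrete, here's a minimal sketch with made-up latency numbers: the simulated disk never sleeps, it just charges a virtual clock for each operation.

```c
#include <stdio.h>

/* A simulated disk doesn't sleep for the seek; it advances a virtual
 * clock by the modeled latency and returns immediately, so a test
 * that would take minutes of real sleeping finishes instantly. */
static long long virtual_time_ns;       /* simulated wall clock */

#define SEEK_NS     5000000LL           /* pretend 5 ms average seek  */
#define PER_BYTE_NS 10LL                /* pretend transfer cost/byte */

static void sim_disk_read(long nbytes)
{
    /* No real I/O, no real sleeping: just account for the time the
     * operation would have taken. */
    virtual_time_ns += SEEK_NS + PER_BYTE_NS * nbytes;
}

int main(void)
{
    for (int i = 0; i < 1000; i++)
        sim_disk_read(4096);            /* 1000 simulated 4 KB reads */
    printf("simulated elapsed time: %lld ms\n",
           virtual_time_ns / 1000000);
    return 0;
}
```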
So it all boils down to this: we're assuming that our shortcuts, whether mathematical models or simulators, are actually correct.

OK, now this is where 341 comes into play, for those of you who have had it. By the way, for the graduate students: 341 is roughly a computer organization class, a quasi-hardware course. How do I compare, say, two disk drives? I know Chris talks a lot in that class about comparing, say, two CPUs; it's the same idea. In operating systems, we're comparing two scheduling algorithms, two replacement algorithms, two file systems, what have you. So how do we actually compare these things? We run into a lot of the same issues discussed in 341. How do we measure, and with what metric? There could be several metrics. Remember, I hinted at this at the beginning of class: if we're talking about a computer overall, what counts as good? Responsive to the user? And what do we mean by responsive to the user? We could also mean cheaper. We could mean that it doesn't crash. Even within "responsive to the user": are occasional really long waits OK as long as it's really, really quick most of the time, or is it better to have a computer that waits a slow-but-never-too-long interval between each mouse click? It really depends on the metric, because depending on which test program we use to assign points to this particular artifact as opposed to that one, we'll get two completely different results. It's the old story about the blind men and the elephant: what exactly are we measuring? Touch a different part and you get a different answer.

So, a couple of types of benchmarks you need to be aware of. A microbenchmark measures one aspect of system performance (wait a minute, next slide). A macrobenchmark measures the entire system. And an application benchmark measures, well, a program: think Microsoft Word or Firefox, and which system actually runs it better. Now let's actually look at this for the virtual memory system, which you're dealing with right now. Microbenchmarks: time to handle a single page fault, or time to look up a page table entry. Those are easily quantifiable and very clear. But what's the problem with these microbenchmarks? They don't tell us a lot by themselves. Macrobenchmarks: say, aggregate time spent handling page faults on a heavily loaded system.
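As an aside, here's roughly what a microbenchmark of the "time to handle a single page fault" flavor looks like from userland. A sketch, and Linux-specific (MAP_ANONYMOUS); keep the lecture's caveat in mind, because this number by itself says very little about what the user experiences.

```c
#include <stdio.h>
#include <sys/mman.h>
#include <time.h>

/* Map anonymous memory, touch each page once to force a soft page
 * fault, and divide the total elapsed time by the page count. */
#define NPAGES 4096
#define PAGESZ 4096

int main(void)
{
    char *mem = mmap(NULL, (size_t)NPAGES * PAGESZ,
                     PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return 1;

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < NPAGES; i++)
        mem[i * PAGESZ] = 1;      /* first touch faults the page in */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    long total = (t1.tv_sec - t0.tv_sec) * 1000000000L
               + (t1.tv_nsec - t0.tv_nsec);
    printf("~%ld ns per soft page fault\n", total / NPAGES);
    return 0;
}
```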
Why does the aggregate matter? Because, let's face it: if we're talking about the time to choose a page to evict, sure, we want our victimization algorithm to run quickly. But why do we need a victimization algorithm in the first place? Because we're out of memory and we want to service the user's request somehow. The user is not going to care whether the page eviction algorithm took one microsecond or two; the user is going to be much more interested in the result, namely whether a good page was picked. So once again, you can have a blazingly fast page-eviction algorithm, but if it makes a lousy choice about which page to evict, it's not a good algorithm overall. That's why we need to look at macrobenchmarks, things like the page fault rate.

Then there are application benchmarks; we've looked at these already. How about triplesort or parallelvm? Those are tests, and they're both good and bad. Think back, again, to 341: you've probably heard that manufacturers sometimes do things verging on dishonesty. They'll code up tests that emphasize the strengths of their hardware and de-emphasize the weaknesses, or vice versa. Or, to take a page (if you'll excuse the pun) out of everyone's playbook right now: what's one of the mantras you've been hearing? Pass the tests. "I don't know why; all I know is that this really kludgy hack works, and I pass the test." What you're really doing there is coding to the benchmark you were given, which, for the purposes of getting through this course, is human nature. But is it a good way to design code in general? No. If you were designing a memory manager for a real system, the real user is not going to be running parallelvm; the real user is going to be running who knows what. Duke Nukem, to date myself completely. The point is, you need to keep in mind what's actually going on.

All right, microbenchmarks: what's the problem? Trees versus forest. We talked about this: we're measuring one particular thing, say page fault time, but that by itself doesn't really help. What we want is the performance that's noticeable to the end user, whether an interactive user or, say, a scientific user waiting for a batch of results. Focusing in on something really, really micro does not tell the whole story. Macrobenchmarks have the exact opposite problem: they might be what we're looking to measure, but how on earth do we define them? What do we mean by "more responsive to the user"? Application benchmarks: well, who cares about your particular application? Improvements for it may harm others. I'm sure you've seen this already: you change around one piece of code so that you're now passing parallelvm, and bigfork goes out the window with the bathwater. That's one reason Jeff gives you a whole body of tests, to try to cover a lot of bases; but even then, it's not perfect.

Questions so far? We've talked a bit about the types of benchmarks and some of the problems with measurement. By the way, these make good exam questions. If you look at previous years, the general flavor is: what is the problem with microbenchmarking? Give an example of how you might game one particular type of benchmark. Give an example of the problems you might find in data collection. Or: manufacturer X comes out with widget Y that improves data collection; how will this change the whole measurement process?

All right: choosing and running the benchmark. Again, what vendors are trying to do is make their system look faster.
Essentially, that's what you're doing right now with assignment 3.3: frankly put, you are deliberately optimizing your system to get through some tests. We don't want that going on in the real world. And here's another thing a manufacturer can do: "I changed something, and I want to sell it, so I'll pick a benchmark that makes my particular system configuration look better," even though this particular computer with this particular software setup might be great for running Photoshop and absolutely horrible at crunching Excel spreadsheets, just because of how things are set up and configured. That's the problem with a lot of these benchmarks.

So there's a fundamental tension. A useful system is a general-purpose system: ideally one that will run anything, cost nothing, and be essentially infinitely fast. The fastest system in practice, though, is a single-purpose system. If you want a system that only processes signals, whistle up your electrical engineer of choice and have said engineer design an ASIC, an application-specific integrated circuit that does nothing but X. And if you look back at the history of computers, with game consoles in, say, the 1970s, you had specific dedicated hardware: a machine that would only play Pong, or a machine that would only play Breakout. Because at the time, the fastest system that was affordable, costing less than, say, a house, had to be dedicated hardware. Obviously, over time, the two have converged. Questions, comments so far?

What to do next? Again, this is almost more 341, but the idea is: don't just flail at improving a particular metric. If I want to improve a specific area, it's almost like test-driven development: I want to validate my model and my simulator before I start changing things, and then actually check my measurements. Do they work? If they don't, I've probably done something wrong. Even before that: does my model for collecting data make sense in the first place, or is it way too specific? Try to think about these things before actually getting into the testing phase. I'm as bad as anyone else: left unchecked, I'll change my models to fit whatever I'm trying to show. Again, human nature.

Other things here: as appropriate, what can I actually improve? Sometimes, frankly, people, it's not even worth it: if my simulator does not show improvement, don't bother implementing. If I can't change something and get results out of it, maybe I need to go in a completely different direction. Example: remember back some weeks ago, the BFS scheduler discussion? One of the points Jeff was trying to get across is that scheduling is really difficult, and measuring schedulers is very difficult, and we've had some really smart people work on it. Sometimes the bottom line is that despite a lot of effort, we should concentrate our efforts elsewhere.
Instead of the CPU scheduler, maybe it's time to focus our efforts on, who knows, the virtual memory system, or some other aspect of the system. Sometimes the result you get from measurement is that this is just not a promising area for improvement. And that holds whether you're in research or in practice. Say you're working for a company, and your boss says, "We need to come out with Widget 2.0 so we can keep selling software updates to our customers." What are we actually going to improve? A classic example, and this is getting a little off topic, is something like Microsoft Word: word processing is a mature product. What really needs to change to improve Microsoft Word? OK, you can make a snarky remark about Microsoft, but my point is that it's a mature product and there's very little left to do; there are probably other areas where the company could invest its dollars and actually get results back. Questions, comments?

Oh, how to make things fast. I had to Google this fellow, but essentially he's emphasizing, if you will, system capabilities, not necessarily performance. What do we want our systems to really do? It's not always going to be about moving one particular metric. Questions?

OK, well, thank you for coming, everyone. It's the last time I'll see you; it's been a pleasure having you, so, sniff sniff. I should mention: Jeff has requested that this week, in lieu of recitations, we schedule some additional office hours for you, in light of things winding down toward 3.3. And then probably next week we'll try to put together some sort of review session; I know Jeff will probably do some review in class, and we TAs hope to offer you something as well. Get 'er done with assignment 3.3, and don't forget this thing called the final. All right? Thanks, everyone. Good luck on the last assignment and the exam.