All right. Good morning. Somewhat more dispersed group today. OK, so today and Friday, we are going to talk about performance and benchmarking. And I think, in the pantheon of incredibly useful information that I have been providing to you all semester, it's possible that this is actually up there, kind of close to the top, all right? Partly because this is something that, when you guys go off and write software after you take this class, will actually matter to you. You might actually have somebody come in and say, this piece of code that you wrote is really helpful. It's really useful. It's great. It's also too slow, right? And that's kind of a happy problem to have, and it indicates that people are using your code, but at the same time, somebody may ask you to fix it. And here's the problem with somebody asking you to improve the performance of your code: it's actually something that they can measure, right? So they can come to you and say, hey, I ran these benchmarks and it takes this long, and I'd like you to reduce that by half, right? And unlike most other software projects that you can work on for long periods of time with some sort of poorly defined output, this is the type of thing where somebody can actually measure it and say, well, you only reduced it by 48%, so you're not going to get your raise this year, right?

So anyway, we're talking about some general approaches to performance and benchmarking. Operating systems, again, are an interesting case study because the performance of operating systems really matters, and over the years the community has invested a lot of time and energy into talking about benchmarks, thinking about performance, coming up with methodologies for addressing the problem, and developing a lot of folk wisdom that's used to help people figure out what to work on, right? So that's what we're going to talk about today. And hopefully this thing is going to continue to work, because if it doesn't, I'll be a sad person.

Ah, all right. So Assignment 3 design documents are due. The Assignment 2 solution was released yesterday; people got a copy of it, and hopefully that's working out great. So any questions about course logistics? Assignment 3 is due two weeks from today at midnight. So it'll be a fun two weeks for you guys. Look, my suggestion is that people start coming by office hours. I had office hours yesterday from 11 to 1, and Calvin was there along with Ethan, and that was it — actually, I think Isaac stopped by at some point. But please come and get help from the course staff. Come get help from me. Come get help from other people. That's one of the things that's going to help you with the assignments: if you come talk to us and get some sense of how to do these things. I've done these assignments before, so I have a little bit of understanding of what the possible pitfalls are, right? But we look forward to seeing your design documents tonight. Any questions about course logistics sorts of things?

All right. So last Wednesday and Monday we spent talking about operating system structure, right? Last Wednesday we talked about monolithic kernels. On Monday we finished up talking about some other kernel designs, including microkernels, exokernels, multikernels — and hybrid kernels, actually. So any questions about this stuff? Just a brief review.
I think this is kind of important stuff. Unfortunately, we're into a portion of the class that's a little more grab-bag-ish, so we're moving a little bit more rapidly from topic to topic. Hopefully that doesn't really scramble up people's brains — or if it does, hopefully it does that in some sort of good way. So, any questions about our last little mini-unit on operating system structure?

All right, so the type of question that you might see on an exam: when people started to build microkernels, what were their goals? What were they responding to in monolithic kernel design? What were they trying to fix? What were they trying to improve? And what was the overall goal of the effort? Can anyone give me one piece of the overall goal of the microkernel movement, if we want to call it that? Oh, one at a time. Ah, Malik. Okay, so — monolithic kernels. Okay, I'm going to get you to refine that answer a little bit. Monolithic kernels already enforce a user-space and kernel-space separation. Oh, wait. You were headed off in a good direction and then you hit reverse. Right, so try flipping that statement around. Right, trying to get stuff out of the kernel. Right — take stuff that is in the privileged kernel code base and get it out of there. So we want to minimize what goes in the operating system kernel. I mean, what do you think of when you hear "kernel"? Think about a kernel of popcorn. It's this little thing, right? And the microkernel guys were kind of like, well, a monolithic kernel is more like a nut or something — a ball, a bolus of junk. So let's try to compact this down and really get to the point where we have a nice little compact kernel of things that have to be privileged. And then we'll toss everything else out and force it to run in user space.

Right, so what's the consequence of this? The hoped-for consequence — why do this? This sounds like a big pain in the butt. You might have a more stable kernel, right? I have a smaller kernel; it's easier to check for errors, to try to verify its correctness, either by human effort or in some other ways. So that's one thing.

The second question is: what ends up in this kernel? What are some of the things that you really just can't get out of the kernel? Things I might want to move out, but I'm really not going to be able to. Right — IPC. On some level, if the kernel doesn't do IPC, there's not going to be a way for processes to communicate, right? So the kernel has to provide some way for processes to communicate. Why does this become even more important on microkernel systems? What's that? Somebody set it up over here — there's a collective answer that came in three parts. Does somebody want to state the whole thing? Right — because what I'm doing is forcing stuff that was in the kernel out into user space. And by doing that, I'm forcing it to use IPC mechanisms to communicate. Before, it could just make direct function calls inside the privileged address space of the kernel. That was great. Now I'm forcing it to communicate using this primitive that the kernel is going to help make safe.
But at the same time, those primitives could potentially be expensive, right? So I need some sort of really fast IPC. What else goes in the microkernel? What else can we just not get rid of? What's that? Yeah — and really, again, this is kind of the exokernel thing too: any sort of resource multiplexing, right? This is one of those things that we just have to let the kernel do. It's like the rest of the processes on the system got together and said, to avoid letting this become a Lord of the Flies situation, we've decided that we're going to elect the kernel to be in charge. And the kernel is going to divide up resources, and then we won't have to be hunting each other all over the island, or whatever happened in that book. It's been a long time since I read it.

All right, so typically in the kernel we've got really, really low-level VM routines, we've got IPC, and we've got protection, right? Protection and multiplexing and those sorts of things are what ended up in the microkernel. And what we've done, as people pointed out, is force everything else out into user space. So we've forced other things to be implemented in user-level processes and to communicate across this well-defined interface. And we talked about some of the things that you can actually implement out in user space, like file systems, right? And actually, file systems are kind of interesting here — I meant to bring this up when we talked about file systems. Again, some of the microkernel design principles lived on, but the idea of forcing these things into user services running outside the kernel didn't really come to fruition. But with file systems, what's an interesting piece of evidence that this is at least possible, the way it would be done on a microkernel? Who knows about a specific feature of Linux that gives you some idea that this is actually possible? Right — Linux actually now supports user-level file systems (this is FUSE), right? So you can write a file system in Linux and run it entirely in user mode. And people have used this to support — there's an EC2, sorry, an S3 file system that's written entirely as a user-level file system. So clearly that's possible, right?

All right, what about good interfaces? Give me some characteristics of — this is, unfortunately, something that we rushed through at the end, but I think it's one of the more important takeaways of last class — what are some characteristics of good interfaces? How do you know when you've written a good interface? Let's start over here with these guys. Alex? What's that? When it can communicate easily? Okay, I think that's a good point, right? The interface technically doesn't do any communication, but what is easy when you write a good interface? Interaction between the components that are on both sides of the interface, right? So if you write an interface in a natural way, you'll find that using it feels right. Using it feels natural; using it doesn't feel strange. And a lot of good interface design is thinking about how this component interacts with other components and what the right set of calls to make is, right? What about you guys? A contribution: good interface design — how do you know you've designed a good interface? Yeah, yeah. So one way you know that you've written a good interface is when someone else can use it, right?
When somebody else can use it to write their code — someone else can take your piece of code and use it and think, oh, okay, that's kind of cool. Less... okay, so maybe if I... well, I don't know, that's an interesting question. So if I reduce the coupling between modules... Yeah, that starts to get a little bit — I'm not so sure about that, right? That's maybe more of a language-level battle. Just a brief aside: my joke about Java is that it always seems like you need to create at least eight objects in order to do anything useful, right? And part of the reason for that is that they have really strict interface definitions and they've abstracted things in a very, very detailed way — but at the same time, it makes it impossible to do anything useful quickly. On the other hand, languages that allow you to do useful things in a single line can end up being really disgusting and ugly. So there's some sort of tension there.

All right, another contribution from this side of the room. Good interface design — how do you know that you've written a good interface, John? Yep. Yeah, so good interfaces, as Butler Lampson put it in a really famous paper of hints for computer system design, give you a place to stand, right? It means that when you want to change the implementation, when you need to rewrite the code — when you're, for example, trying to improve the performance of the code — the interface definition itself can be stable. And that isolates your changes from the rest of humanity, right?

So, again, we basically talked about all of these. The other thing that didn't come up, and I think it's important to emphasize, is that good interfaces are documentation. Remember, when you guys write code modules, people aren't going to read your code. What are they going to read? They're going to read the documentation, and at some level what the documentation is, is the interface specification. So if your interface says something, people might actually believe that your code does that thing, and make assumptions about it doing that thing. And so whatever you put in the interface are things that you have to support. If it's not in the interface, then people aren't allowed to make assumptions about it, right? The example that came to me when I was thinking about this is sort stability. Sorting algorithms have this property that they can be stable or unstable: a stable sorting algorithm preserves the original order of elements in the cases where multiple elements have the same sorting key, and unstable sorts don't guarantee that. The stability of a sorting algorithm is an important property that clients rely on. So if you say, I have a stable sorting algorithm, here's the function to call — then when someone passes you whatever sort of data structure, it had better, in fact, be stable, and it had better stay stable even as you change your implementation, okay? And this is what John pointed out: I allow myself to improve my implementation without requiring clients to make changes, because I can keep the interface the same. As long as I'm implementing the assumptions in the interface, I can do this. And finally, one thing that didn't come up is that it allows me to break things into useful chunks in order to test and verify, right?
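(To make the stability point concrete, here's a minimal C sketch — the record type and data are made up, and note that C's qsort makes no stability promise in its interface, which is exactly the kind of thing a client isn't allowed to assume.)

```c
/* Minimal sketch of the sort-stability contract (hypothetical record type).
 * C's qsort() does not promise stability in its interface, so a client
 * may not assume equal keys keep their original order. */
#include <stdio.h>
#include <stdlib.h>

struct rec {
    int key;      /* sorting key */
    int orig_pos; /* position before sorting, used to check stability */
};

static int cmp_key(const void *a, const void *b) {
    const struct rec *ra = a, *rb = b;
    return (ra->key > rb->key) - (ra->key < rb->key);
}

int main(void) {
    struct rec r[] = { {2, 0}, {1, 1}, {2, 2}, {1, 3} };
    size_t n = sizeof(r) / sizeof(r[0]);

    qsort(r, n, sizeof(r[0]), cmp_key);

    /* A stable sort must keep equal keys in their original order. */
    for (size_t i = 1; i < n; i++) {
        if (r[i].key == r[i - 1].key && r[i].orig_pos < r[i - 1].orig_pos) {
            printf("not stable\n");
            return 1;
        }
    }
    printf("looks stable (for this input, at least)\n");
    return 0;
}
```

If stability is in your interface, a client is entitled to depend on it; if it isn't, a check like this passing today tells them nothing about tomorrow's implementation.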
And testing in chunks is something that — again, I don't think anyone's going to take my advice, but if you wanted to for Assignment 3, you could break Assignment 3 down into chunks that you could test yourself. You could write your coremap, and then you could write some kernel-level tests to test the coremap, bang on it a bit, and see if it works, and you could run those and see if things actually function, right? It always seems like too much work to write tests — it always seems that way until you find yourself debugging for days and days and days, right? And by then you've forgotten that you decided not to write the tests in the interest of time. So, yeah, that's usually a losing argument. Again, this is critical for Assignment 3. One of the things we really want you to write out and think about as you're doing the design document is how things are going to interface with each other, right? What's the interface to the coremap, as an example? (There's a rough sketch of what a test like that might look like below.)

All right, any other questions about this stuff before we talk about performance? All right, so why do we care about performance? Maybe it's an easy question. Why do we care about performance? What's that? No, I like that answer a lot. Because user time is important, right? And on the other hand, the other thing that's interesting here — and this came up a little bit before when we talked about the trade-offs between correctness and speed — is something that people who think a lot about correctness hate: people would rather have a system that is mostly correct and really fast than a system that is really slow and provably correct, right?

And, you know, this is kind of a joke. So this is a quote by a famous — actually a really famous — computer scientist who also did some systems design. He dabbled in a number of different things; you've probably heard of him from something else. He was talking about this great computer system that they built where the logical soundness could be proved a priori. So this system was structured in such an elegant way, and so carefully, that you were able to prove its correctness — you were able to prove that it worked. And, let's see here: look, they were able to uncover these errors — trivial coding errors, right? — with a density of only one error per 500 instructions, and each of them was located within 10 minutes, right? So who thinks this person is talking about Windows? Who thinks this person is talking about Linux? Who thinks this person — well, why not — who thinks this person is talking about macOS? Hands should go up. Come on, there should be some fanboys here. macOS: probably correct. All right, does anyone know who this quote is by? So this is Edsger Dijkstra, and the quote is from 1965, right? It's about THE — the initials of the university he was working at — the THE multiprogramming system. This is a very, very old operating system. It had a layered architecture. We've talked a little bit about this operating system before. Does anyone remember what classic contribution it made to your life and to OS/161? Semaphores, right? Semaphores were an idea introduced by Dijkstra as a way of implementing this provable system correctness, okay? So on some level, this is really the last time anybody claimed that this was even possible, right?
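Going back to that testing-in-chunks idea for a second, here's a very rough sketch of what a kernel-level coremap smoke test could look like. This is only an illustration: it assumes your coremap sits behind the stock OS/161 alloc_kpages()/free_kpages() interface, and the test itself is something you'd wire into the kernel menu yourself — adjust everything to whatever design you actually choose.

```c
/* Rough sketch of a kernel-level coremap smoke test (OS/161-flavored).
 * Assumes your coremap sits behind alloc_kpages()/free_kpages() as in
 * stock OS/161; adjust names to whatever interface you actually design. */
#include <types.h>
#include <lib.h>
#include <vm.h>

#define NTRIES 32

int
coremap_smoketest(int nargs, char **args)
{
    vaddr_t pages[NTRIES];
    vaddr_t again;
    int i, j;

    (void)nargs;
    (void)args;

    /* Allocate a bunch of single pages and make sure they're distinct. */
    for (i = 0; i < NTRIES; i++) {
        pages[i] = alloc_kpages(1);
        KASSERT(pages[i] != 0);
        for (j = 0; j < i; j++) {
            KASSERT(pages[i] != pages[j]);
        }
    }

    /* Free them all; a correct coremap should let us allocate again. */
    for (i = 0; i < NTRIES; i++) {
        free_kpages(pages[i]);
    }

    again = alloc_kpages(1);
    KASSERT(again != 0);
    free_kpages(again);

    kprintf("coremap smoketest: passed\n");
    return 0;
}
```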
Anyway — that kind of claim didn't really come back until very, very recently. We talked a little bit about the provably correct microkernel stuff, but that's for a very, very small system, right? So in general, for a long time, people have put up with a lack of correctness, and part of the reason they've done that is in favor of speed. I'm just going to say this anecdotally, because I couldn't find any actual proof for it, but I read somewhere that they spent all this time implementing the THE operating system while the hardware was evolving really rapidly — remember, this is 1965, right? — and that when it finally booted, it took like several days to boot. And even in 1965, that was a long time in computer terms, right? So all of this provable correctness, at least in my convenient anecdotal world, had a significant performance cost.

And this is one way of thinking about it. I remember people at Microsoft talking about this: if your system just rebooted immediately — if that blue screen of death just came up as a flicker and then it was gone and your system was back alive — would you care that it crashed? I don't know, probably not now, right? Everything you're working on is in the cloud, et cetera, et cetera. So if your system could come back online immediately, then maybe software faults wouldn't even bother you at all, and maybe we would just forget all about correctness and fail right over. So performance really matters, right? And one piece of evidence that performance matters is that we've been favoring it over correctness for a long time.

Okay, so we care about performance. This is pretty easy, right? How many people have ever done any performance analysis and improvement? Ben, I'm figuring yes, because you came from industry, where they care about that; Malik, maybe a little bit. Anybody in a class here at UB? Oh, really, what class? Yeah, yeah, yeah. But most people, no, right? And again, there's probably not too much to this, right? Here's what we're going to do — it's like a little four-step recipe. First thing we do is measure the system: we'll run some experiments, get some data, no problem. Then we're going to analyze the results. So this is like: you show up at 9 a.m., measure your system — that probably takes, well, maybe you check email a little bit, so that takes maybe until 11. Then you have some results. You spend a couple of hours analyzing those and it's time for lunch. You come back from lunch, spend a few hours hacking and improving the slow parts of the system. And then, step four, you go out at 3 p.m. for your celebratory beer, right? This is pretty easy. And then tomorrow morning you get up and you go back to step one. So this is not hard, right? This seems like pretty trivial stuff, right? All right, so on Friday we're going to talk about virtualization technologies. You know, does anyone think this is easy? Okay, good. So we could just stop talking about it — that would be kind of fun. What time is it? It would maybe be the earliest I'd ever finished class. It's still 9:25, right? I can't teach an under-25-minute class. All right, so that was my attempt at ending early. But I think I took so long to get to that slide that people actually maybe took it more seriously than I meant them to. But anyway — okay, so what are we really going to do here, right?
How are we going to get stuck and tripped up by all this stuff, right? So, measurement: measuring your system. This should be the easiest part; this is our starting point, right? But what do you mean by that? How am I going to measure this system? What are the tools I'm going to use? What is the system going to be doing while I measure it? How do I make sure that activity is meaningful? How do I do it without interfering with some poor soul who might be trying to use the computer — like me, right? Then the analysis. So now — oh man, I don't know about you, but I got into computer science because I didn't like math, right? And now you're asking me to compute an average or something, which I think I can do on a good day. But it's possible that an average might actually not be that useful. Statistics, right? Now I've got a problem. Then: how do I improve the slow parts of the system? I mean, I thought my system was fantastic. It was great. I wrote it and I was so happy with it and I was just reveling in how beautiful and perfect it was. And now you're telling me I've got to go and mar its perfection with this gooey performance-improving stuff? How do I do that? And then, of course, this last one is a tough decision as well: especially after you've spent all day doing these other things, you end up in a very over-analytic mode and end up overthinking it, and after you've had the beer, it can be difficult to figure out how to get back to step one.

All right, so we're going to break this down. We're going to talk about basically parts one and two — actually really just part one today; we don't really need to talk about four. And then we'll talk about two and three on Friday. So part one is the measurement part, and this is something that turns out to be, as you guys would probably expect, really, really hard — hard for a variety of reasons. So let's start out by just talking about how we perform measurements on real systems. A simple thing that we might want to measure, that we might care about when it comes to performance, is time, right? The time that's spent. Like you pointed out: time is passing, your life is draining away, and your computer is doing something, right? So time ends up being a pretty fundamental metric, and you can imagine that it's easy to measure time on real systems. Is this easy? Who thinks it's easy? I should have asked the question where the answer was that it was actually easy, because... So when you start talking about time, you end up down in the guts of the computer, unfortunately, dealing with the fact that there are these fundamental limitations to what computers can do and the type of access you have to them. When you're measuring time, you're essentially stuck between two places. You can get these high-level system counters: oh, your program took two seconds to run. And that might be great if you're debugging some really, really slow piece of Haskell or something, because it might give you a starting point. But what if you're talking about events that are happening at a nanosecond scale, or a microsecond scale?
Remember, we're talking about machines that are executing potentially billions of instructions per second, right? So a second is a long time in the lifetime of a computer — maybe a short time in your lifetime, but a long time in computer time. It's going to be difficult to measure this: your high-level counters may not have enough resolution to measure the type of thing that you want to measure. And the lower-level counters end up being really difficult to get your hands on; they can be messy to interface to. And, of course, the lower-level counters start to roll over, which makes this even more fun. Maybe I care about this more than you do, and that's just because I spent an unfortunate part of my graduate career having to actually think about counter rollover. It's just so annoying, right? Just add four more bits to the counter, and then it'll only roll over once every four years instead of once every four weeks, right?

So, okay, let's say that we did some work in the morning. Now it's 10, 10:30 a.m. We've been digging through the documentation. We've figured out which counters we want to use, how to set them up, how to measure things. We've written our code. We've rolled things into loops so that we can measure multiple iterations. So now we're at least ready to go — well, maybe we're not, right? We run the experiment, and we get a result. We start doing the analysis and we pinpoint some spots in our code, and then at some point we decide, well, maybe I'll change the way I do the measurement a little bit — and then I run it again, and I get a completely different result. So, what kinds of things on computers are going to make it difficult to get repeatable measurements of real systems? This is a place where you're essentially coming up against a fundamental principle of computer system design. You're trying to measure something — a particular discrete event. What is the rest of the system trying to do? What's that? Optimization. And how do we say that in this class? What's that? It's trying to use the past to predict the future. And every time you do something on the computer, you add something to the past, right? You give the computer more information about what you're doing. And so it's likely that the next time you do that same thing, the computer will be more prepared. If it weren't, it would kind of be a terrible system, right? If its attitude were, ah, forget the past, man, the past is the past, then it would have a difficult time using the past to try to predict the future. So what are some ways — we've talked about this in multiple places in the class — what are some things that might happen the first, second, third time you run a benchmark or a set of tests that would make the later results different? Remember, this system is a series of caches, right? And as you run things, stuff gets promoted: essentially, one of the things the operating system is doing to try to make the future better is to take things that were used in the past and promote them up into faster and faster caches. So there's this informal idea of cache effects, right? It's funny — cache effects, I don't know what the right analogy is, it's like the boogeyman, right?
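Just to make the timing discussion concrete, here's roughly what that kind of measurement loop might look like in C on a POSIX system — a sketch only, where do_operation() is a made-up stand-in for whatever you actually want to measure:

```c
/* Sketch of timing a short operation by averaging over many iterations.
 * Assumes POSIX clock_gettime(); do_operation() is a stand-in for
 * whatever you actually want to measure. */
#include <stdio.h>
#include <time.h>

#define ITERS 1000000

static void do_operation(void) {
    /* placeholder for the code under test */
    static volatile int x;
    x++;
}

int main(void) {
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (long i = 0; i < ITERS; i++) {
        do_operation();
    }
    clock_gettime(CLOCK_MONOTONIC, &end);

    /* Convert to nanoseconds; 64-bit math so the counter doesn't roll over
     * on us the way a narrow hardware cycle counter eventually would. */
    long long ns = (end.tv_sec - start.tv_sec) * 1000000000LL
                 + (end.tv_nsec - start.tv_nsec);

    printf("total: %lld ns, per-op: %.2f ns\n", ns, (double)ns / ITERS);
    return 0;
}
```

Even this only gives you one number, and if you run it a few times the early runs will usually look different from the later ones — which brings us back to warm-up and cache effects.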
Cache effects are kind of like chaos theory in computer systems research, right? It's like: well, I didn't understand my results — cache effects. You read a lot of papers that essentially take interesting, or potentially uninteresting, noise in their data and attribute it to cache effects. Yeah, we're not sure why we got these results: cache effects. And of course the problem with cache effects is that they're a convenient way to explain away results you don't like, because people are always using them to dismiss the results they don't understand. You never take the results you do understand and say that they're due to cache effects. No, no, no — it's the ones that don't fit into your model, and you're like, oh, okay, that must have been... that was way too fast — cache effects, right? So anyway, this is a problem, and this sort of thing makes it difficult to reason about performance. The general problem here is that real systems are almost never in the same state twice, right? If you think about the state in some very abstract, Aristotelian sort of way, capturing the same state of the machine twice is almost impossible. We're talking about exactly how fast the disk was spinning, exactly where the heads were, which direction the heads on the disk were moving, where every byte of data was on the bus lines between the processor and the cache — it's a total mess, right? So whenever you're taking results like this, you have to accept the fact that the machine is, in this sense, unpredictable. Now, at some level, what we do is use statistical techniques — we'll talk about those a little bit, just a little bit; I'm not going to bore you with too much statistics on Friday — but we do want some degree of statistical rigor to try to understand the distribution of our results. We're not expecting to get the same number every time. What we're trying to do is understand the distribution of numbers we get and the reasons why the distribution looks the way it does. But in general, this property makes it almost impossible to reproduce the same result on a real system. We'll talk about some other ways of getting repeatable results.

So, okay. Last thing about real systems. Any physicists in the audience — anybody who's taken a physics course or has any interest in physics? Right. So, measurement — and this is especially true for operating systems, because operating systems are very low-level, doing a lot of things and trying to do them very rapidly — measurement tends to affect the thing you're trying to measure, right? So, inserting debugging hooks, tracing... I mean, you guys can try this: when you get your Assignment 3 working, put some printfs in your VM fault handling code, and see if you can wait long enough for your system to even reach the kernel menu, right? When you have really, really hot code paths that are being executed millions of times a second, that's not a place to call printf. That's not the place to say — I don't know, maybe this is obvious — hey, I'm gonna make a library call that takes a billion instructions to execute, right? No, no, no, that's not gonna happen.
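If you do need visibility into a hot path, the usual trick is something much lighter than printf: log small fixed-size records into a preallocated in-memory ring buffer and dump them later, outside the hot path. Here's a rough sketch — the names, sizes, and the toy main() are all made up for illustration:

```c
/* Sketch: lightweight in-memory tracing for a hot path, instead of printf.
 * Records go into a preallocated ring buffer and get dumped later, offline.
 * Illustrative only -- names and sizes are made up. */
#include <stdio.h>
#include <stdint.h>

#define TRACE_SLOTS 4096   /* power of two so wrap-around is cheap */

struct trace_rec {
    uint64_t timestamp;    /* e.g., a cycle or nanosecond counter */
    uint32_t event;        /* event id, e.g. 1 = page fault entry */
    uint32_t arg;          /* small payload, e.g. faulting page number */
};

static struct trace_rec trace_buf[TRACE_SLOTS];
static unsigned trace_next;

/* Called from the hot path: a few stores, no I/O (locking not shown). */
static void trace_event(uint64_t now, uint32_t event, uint32_t arg) {
    struct trace_rec *r = &trace_buf[trace_next++ & (TRACE_SLOTS - 1)];
    r->timestamp = now;
    r->event = event;
    r->arg = arg;
}

/* Called later, outside the hot path, to flush what we captured. */
static void trace_dump(void) {
    for (unsigned i = 0; i < TRACE_SLOTS; i++) {
        const struct trace_rec *r = &trace_buf[i];
        if (r->timestamp != 0) {
            printf("%llu event=%u arg=%u\n",
                   (unsigned long long)r->timestamp, r->event, r->arg);
        }
    }
}

int main(void) {
    /* Toy usage: pretend we hit the hot path a few times. */
    for (uint32_t i = 1; i <= 5; i++) {
        trace_event(1000 + i, 1, i);
    }
    trace_dump();
    return 0;
}
```

Even this perturbs the thing you're measuring; it's just a much smaller poke than formatting and printing a string on every event.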
And there are really three problems that come out of instrumenting the system like this, right? One is that it's possible that as soon as you poke the system with your measurement probe, it just stops doing whatever it was doing before. Sometimes you have performance problems that are caused by really subtle race conditions, right? So, say 99.9% of the time page fault handling happens within, I don't know, 30 cycles or something, but that 0.1% of the time, something happens and it can get stalled for a quarter of a second. If you put a bunch of tracing and debugging operations onto those code paths, it's possible that you're just going to eliminate whatever really subtle little interplay between two threads, or two parts of the system, was causing the thing you're trying to measure — and now it's gone, right? That's probably the most existential problem with measurement. Measurement always affects the thing that's measured; but if it affects the thing so severely that you lose whatever it is you were trying to fix in the first place, then you're in trouble, and you need to try something else. The second problem: whenever we do measurement, we have to separate out the results. We have to figure out what the overhead of the measurement itself is and how to get that out of the results so we can understand our system better. The final problem on real systems is that measurement overhead can mean that the people who are encountering the problems you're trying to solve are not willing to help you solve them. You've got a client who comes in with a really, really heavily loaded web server and says, there's a performance degradation from the last version of the code that you sent me, and I'd like you to fix it. And you say, hey, can you run this tracing version of our web server that runs 10% slower? They're going to say no. I don't want to do that, right? So the measurement overhead can mean that these people are less willing to run your instrumented system, and it might also mean that the measurements take an inordinate amount of time, right?

For operating systems, this all tends to be even worse, mainly because a lot of the tracing and debugging you can do on real systems happens at the places where programs interface to the operating system, or it involves making modifications to the operating system to collect information about the programs that are running. When you're trying to do that for the operating system itself, you've kind of bottomed out at bare metal, right? So where is the right place to put these debugging hooks? And they can generate gobs of output. When I was working at Microsoft, we were collecting some statistics on the memory management algorithms, and you can imagine that if you try to trace page fault handling and output data, you can generate terabytes of data within seconds, just because the faults are happening all the time, right?
And mind you, that also really slows down the system, period. But you also have this problem where, say, I've got a gigabyte of log data that I've generated in a quarter second sitting in memory; now I have to stop the system while I flush that out to disk before I can restart it. So this becomes kind of an issue.

All right. So let's give up on real systems, right? I think we've convinced ourselves that measuring real systems is just too tough, too hard, kind of a mess. So what are some alternatives? What do people do when they've decided that, for whatever reason, it's too hard to get repeatable results, it's too hard to figure out how to measure things, it's too invasive, I can't get real workloads — what do we do? What's that? Simulate? I can simulate. What else can I do? Different word, somewhat similar. I can build a model. All right. And modeling — I actually think this is interesting. I have some degree of respect for both modeling and simulating. I've done more simulating than modeling, mainly because modeling always seems like a much more abstract and mathematical pursuit, and I tend to avoid mathematical pursuits. But modeling can be very, very powerful. Modeling is the idea of coming up with an analytic description of your system. Models don't run — you don't implement a model. You develop a model and then you use the model to prove things analytically about the system. The other option is to build a simulator. Simulators involve writing some additional code, so a simulator is a trade-off: it means you've got to write more code than you already have, which introduces a burden on you as a performance optimizer. But the idea is that you're trying to scrape complexity off the system so that you can understand it better. What ends up happening — people don't always think about it this way — is that you take some part of the system that's complex. Let's say you have a radio, like a Wi-Fi radio or something, and those things are really complicated: sometimes you send two packets back to back to the same receiver, one gets dropped, the other gets through. Why does this happen? Nobody knows — the vagaries of the atmosphere. So what you're trying to do is build a simulator that eliminates some of that stuff, where you can say, okay, I want 90% of packets to get through. Is that realistic? No, not really, because packets don't get dropped according to some neat statistical law; they get dropped for a variety of reasons that are just really hard to understand. But normally, when you build a simulator, you're trying to remove some complexity from the system in the hope that you'll be able to understand the code that you wrote better. And the usual way you distinguish models and simulators is: if it has code, it's a simulator; if it's math, it's a model. So if you see equations, you're dealing with a model; if you see code — more code — then you're dealing with a simulator. And this can be a difficult choice to make. It's not necessarily a choice you have to make — the best choice is sometimes to do everything, right? So what are some pros and cons of each? Well, I just put one up on the board, so I'll talk about that one.
So models tend to have this nice feature, which is that models can generate really strong mathematical conclusions. You can use a model to prove something about the system — a bound, right? You can prove that the performance will always be at least this good, or whatever. What's the problem with models? What's that? It's harder. So what do I usually do in the process of creating a model? It's almost impossible to create a model that actually reflects reality, so what do I do along the way? I make all these really unrealistic assumptions, right? Usually, in order to constrain the problem enough that I can actually prove something about it rigorously, I end up having to make so many assumptions that by the time you show the results to the person who's actually working with the real system, they kind of look at it and say, I don't really believe that, you know? I used to work in wireless sensor networks, and you had all these papers that would start off by saying: we assume that nodes communicate according to this fixed unit disk model, where if you're within n feet of the node, it can communicate with you, and if you're outside of that, it can't. And then you're just like, well, if you're starting from that point, I'm going to have a difficult time really believing anything else in the paper, right? It's possible that there's still a lot of good work in there, but those sorts of assumptions turn off people who understand more of the complexity that underlies these systems. So that's a problem, okay?

What about simulations? What's a pro of a simulator? Why would you write a simulator? What's that? Right, so one is — well, let me just ask this question. You guys are working with a simulator this semester, right? System/161 is a MIPS simulator. What is one attractive feature that you guys have hopefully discovered about this simulator? No hardware required — but what else can it do? What's that? You can debug through it. But particularly for experimentation: we saw that on real systems it can be impossible to get the same result twice. What about a simulator? What's that? So I can set up the simulator to — yeah, that's a great point — I can set up the simulator to simulate certain situations. And one pro that I didn't put up on this slide is that I can get repeatable results, right? If you set up your System/161 simulator and give it a random seed, then as long as you don't change your code, it will do the same thing: it will die the same way every time, right? Until you fix that problem, and then it will die some other way. So the point is that simulators have this nice feature that they can produce repeatable results. In the best case, what you do is remove hardware complexity that's not really contributing to your intuition about the problem, and that allows you to produce a result that makes more sense. And the other thing that can happen with a simulator is that you can actually speed things up quite a bit, because you don't have to use real hardware: it's faster to just apply some sort of radio model to a packet transmission than to actually send a packet over a radio to another node and record whether or not it showed up. The problem is that you have to write more code — or, well, a problem; there are a lot of problems with simulator results, right?
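To make that concrete, here's a toy version of the kind of thing a simulator does: a seeded pseudo-random "radio" where a flat 90% of packets get through. The 90% rule is exactly the unrealistic simplifying assumption we were just complaining about, and the fixed seed is what buys you repeatable runs. A sketch only — everything here is made up for illustration:

```c
/* Toy packet-delivery simulator: a fixed seed gives repeatable runs,
 * and the "90% of packets get through" rule is exactly the kind of
 * simplifying (and unrealistic) assumption simulators bake in. */
#include <stdio.h>
#include <stdlib.h>

#define NPACKETS 1000

int main(int argc, char **argv) {
    /* Same seed => same sequence of "random" drops => same result. */
    unsigned seed = (argc > 1) ? (unsigned)atoi(argv[1]) : 42;
    srand(seed);

    int delivered = 0;
    for (int i = 0; i < NPACKETS; i++) {
        if (rand() % 100 < 90) {   /* assume a flat 90% delivery rate */
            delivered++;
        }
    }

    printf("seed=%u delivered %d/%d packets\n", seed, delivered, NPACKETS);
    return 0;
}
```

Run it twice with the same seed and you get the same answer, which is exactly the repeatability you can't get out of a real radio.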
One of those problems is that sometimes the complexity you've scraped off the system is actually the complexity that was producing the features in the performance data that you were having a hard time grappling with, right? But I guess the worst possible thing that can happen is that you have problems with the simulator itself. So there's a famous story — I wish I remembered all of the details better, but it came up when we started to talk about LFS, the Log-Structured File System. Let me tell the story in as vague a way as possible. There was a simulator, at some point, somewhere, that was developed to support assertions that were made about a certain file system. That simulator had a problem: the buffer cache in the simulator did not actually write dirty blocks to disk when they were evicted. You can imagine that this produced a certain degree of performance improvement, right? If, when you evicted blocks from your buffer cache, you didn't have to write them back to disk, that would be a faster system. It also just wouldn't persist data — which is kind of what the file system was for, right? So here's how the story goes. This simulator was used to publish results that supported a certain set of design decisions about how to design file systems. At some point, someone became aware of the fact that the simulator had this problem, and that person decided to fix the simulator and rerun the experiments. You can imagine that the results produced the second time were different. It's possible that they were still favorable to the system the simulator was designed to support, but probably less so, given that it actually did things like write stuff back to disk. And this person who discovered it had since moved on to a different institution, and ended up in kind of an interesting situation vis-à-vis the person they had done the original research with — who was themselves in an interesting position, because the paper they had used the simulator to support was fairly popular and well-cited, et cetera. So you can get into trouble with simulators, right? And this is just one very vague story about how.

All right. So, I have a few minutes left, and I'll just talk about benchmarking before we're done. We've done the how part — introduced you to some idea of how to measure things and what to measure them on: real systems, models, simulators. The almost more philosophical question is what's the right thing to even measure when we start to compare, say, two disk drives. I give you two disk drives and I say, which one is better? Or I give you two scheduling algorithms — same question, which one is better? Two page replacement algorithms? Two Assignment 3s? Which one is better? Two file... oh, file systems — this gets really nasty, right? Which one is better? If I just asked you that — if I just said, Isaac, here, two disks, tell me which one is better. They're both $100. Tell me which one I should buy. At the bottom of this, I'm asking you a question that sounds like it has a simple either-or answer, right? So why is that so hard to answer for these types of systems? What's that? So I need a benchmark — but is it possible that one benchmark is going to settle that?
The reason why this gets hard is that the performance of these types of things is really multivariate, right? Disks have a read speed, they have a write speed — there are all these different factors that influence how disks perform. And so asking you which one is better puts you in the position of having to weight those different factors against each other. Sometimes, unfortunately, some clueless upper-management type is going to come to your desk and say, you know, Calvin, we're making an important decision here. I want you to tell me: is ZFS better than ext4? And you're going to hem and haw, and he's going to say, I don't understand, just tell me which one we should spend a million dollars on. Now, in that situation, what is he asking you to do? He's actually asking you to figure out how to weight the performance factors — and that's a different question, right? But the fact that there are performance weights, and that there are multiple variables that have to be combined to produce a single score for these things, is pretty clear. So these comparisons are difficult. I think there's some similarity here to the blind men and the elephant sort of problem: depending on what part of the system you're measuring and how you're measuring it, you can come to all these different conclusions about it, right? And the thing that makes it worse from the perspective of the elephant is that these people are trying to figure out what to do to the elephant. In the parable, they're just trying to figure out what it is; in the performance version, we're going to tie the trunk in a bow or something. So this poor animal is going to be struggling as people try to do a variety of different things that aren't really addressing any sort of holistic issue.

All right, so I'm out of time. Friday we will finish with benchmarks and we will talk about analysis briefly. And in particular, we will talk about finding the parts of your system to improve, all right? I look forward to seeing you guys next time.