All right, it's nice outside all of a sudden. Thanks for being inside; it's a nice day. All right, so today we'll finish talking about the performance and benchmarking material that we started on Monday, talk a little bit about how to use statistics, and we'll talk about Amdahl's law, which is pretty critical to understand. Really, Amdahl's law is one of those things that maybe comes out of computer systems, but you can apply it to almost any situation, so it's quite widely useful. And then we'll start talking about Butler Lampson's sort of famous paper on hints for computer system design. This is another thing that I would really encourage you to read. I mean, the setup and the way that the paper is structured is a little bit corny, but there are some really, really good, very deep lessons in this paper about how to build software: not just computer systems, but a variety of different types of things that you guys will probably build in the future.

All right, so I checked this morning: the course evaluation percentage is currently at 43. So if you haven't done the evaluation, please do it. The sooner you guys hit the targets, the quicker I release questions from the exam. This is very helpful feedback, so please keep going. If you haven't done it, do it; it doesn't take that long, I don't think. And also, just to make sure you understand: I don't see these results till after grades are released, so there's no way for me to punish you, either collectively or individually, for anything you might say on this. So please be honest, and please do the evaluation.
Okay. We're in the process of finalizing the targets for the third checkpoint of assignment 3. This is the end of the road. One significant difference with the third part of assignment 3 that you guys should be aware of is that, because we started to swap, because we started to involve the disk, the targets run a lot more slowly than they have in the past. Now, some of you are probably going to get frustrated by this, because you've been in an iteration loop that involves running test161, running the target, seeing what failed, and then going back and trying to fix things. That's not a problem, but in this case you will actually have to figure out how to use some of the test161 features, like the ability to run single tests, because if you run the whole test suite it may take ten minutes to complete, and that may be longer than you want to wait.

Same thing when you guys start submitting things to the back end. I don't see any reason why there would be big differences in performance on your machine and on the back end. Once you submit something for grading on the back end, you cannot resubmit until it's done. So if you want to wait for 15 minutes to find out something that you could have found out locally, probably a little bit more quickly, by all means go ahead. But I would encourage you to look at the test161 features and figure out how to use them. And just as a suggestion for future reference: before you ask a question in the forum, run test161 help. That will actually answer a lot of the questions that we've seen in the forum about the tool. There is a help page; it is designed to be helpful. We're thinking about, in the lab, putting together the equivalent of the "let me Google that for you": like, "can I run the command with dash-help for you?" and putting it up live somewhere.
So maybe that'll be a fun little weekend project. But yeah, please explore the capabilities of the tool. There are a lot of features you guys might not be using; use them to shorten your iteration cycle for the swapping part, so you don't get frustrated.

All right, tomorrow there is a distinguished lecture in Clemens 120 at 3:30. I would really encourage you guys to go to this. You may not have gone to some of the other ones, you may have, but this particular speaker is worth it. We're also going to end office hours half an hour early tomorrow, at 3:30, to accommodate this so the TAs can go. Hari Balakrishnan has done all sorts of really important work in computer systems and networking, and what he's going to talk about tomorrow is something that's quite accessible. A couple years ago he took a break from MIT; he's now running, or participating in running, a company called Cambridge Mobile Telematics. That company has actually released an app that you can download and use. And tomorrow, in addition to finding out about what they're doing (there are some very interesting psychological and sociological messages that are part of this project), you also will find out what a fantastic driver I am. And then you will want me to take you places, because I'm such a safe and good driver. Which is true, actually: you may not get there quickly, but you will get there alive. So that's kind of the bonus. Based on what I've seen of this before, this is a very accessible talk, so please come. I think you guys will appreciate it; it's pretty interesting stuff.

Okay, final announcement and shameless plug: I'm looking for UTAs for the course I'm teaching next year. It's a new first-year seminar, a class on the internet. It's going to be awesome.
We're going to do cool stuff. We're going to teach people really interesting things: stuff that you guys probably wish you knew, wish that we had taught you earlier. Now we're going to teach it right in freshman fall. So if you want to participate, please sign up. Look at that, there's even a QR code, so we know we're in the future now. So if you want to load this up on your website and scan it with your phone or something like that... Does anyone use QR codes? Raise your hand if you've scanned a QR code in the past week. All right, two people. Yeah, I don't know, I don't get it. But anyway, there's one, just to show you that I'm capable of creating one. It may or may not work.

So anyway, please sign up. If you're around next year, I think if you've taken this class with me and you did okay, you're a reasonable fit for this. We're looking for people that are excited about the material. You don't have to know the material, because unfortunately this class is new, and so we haven't taught you this before: none of the TAs will have taken the class. So please sign up, and email me if you have any questions.

All right, any questions about the performance and benchmarking stuff we've talked about so far? We talked a little bit about types of benchmarks, we talked about different approaches to measuring something that is not a real system, we talked about some of the challenges of measuring time. Any questions about this before we go on? Okay.

So now let's talk about statistics. How many of you feel like you have an expert grasp of statistics? How many of you have taken a course on statistics? You already took it, you took a class, right? Did they teach you something in that course? Okay, that's interesting. Okay, how many people can explain the difference between a mean and a median? How many people understand why the difference is important?
Yeah, I think there's probably more damage that's been done to society by the mean than anything else out there. I mean, the mean is a terrible, terrible number to use. Don't use means; they're usually wrong. Use a median and it will make a lot more sense. For example, the average income in the United States is a lot less descriptive of what this country is like than the median income.

Anyway, if you're not feeling like you're an expert on this stuff, it might be good to review it, because to some degree, and I think this is increasingly important, your lives are going to be defined by how well you can collect, interpret, and respond to data. So if you don't feel like you can do this, figure it out. Do whatever you have to do: go to Coursera or something like that and learn this stuff, because it's important. But a lot of you guys, including me maybe, became computer scientists because we didn't want to do math. And statistics always feels like math, and so it sometimes turns people off. But it's critical.

So, you know, on a good day (and I have all these struggles with my students and other people I work with, I have these struggles myself), on a good day maybe I can convince you to actually run an experiment a few times and compute a summary statistic over the results. That's what computer scientists frequently feel is A-level work as far as statistics goes: I'll run the experiment twice. And maybe if I really wanted some extra credit, I'll put error bars on the graph. Now the error bars may be a little weird because I only ran the experiment twice, but whatever. This is kind of how we feel: this is good, right? We're doing good here. This is like A-plus, extra-credit statistical work. Unfortunately, that's not really sufficient.
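Going back to the mean-versus-median point for a moment: here's a small sketch in Python of why the mean is so often misleading. The income numbers are made up purely for illustration.

```python
import statistics

# Hypothetical incomes: five modest values plus one very large one.
incomes = [30_000, 35_000, 40_000, 45_000, 50_000, 2_000_000]

mean = statistics.mean(incomes)      # dragged far upward by the single outlier
median = statistics.median(incomes)  # still describes the typical value

print(round(mean))   # 366667: describes nobody in the data set
print(median)        # 42500.0: close to what most of the values look like
```

One extreme value moves the mean by an order of magnitude while barely touching the median, which is exactly the "average income" problem described above.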
So let me offer a different approach to doing these sorts of things. Whenever you run an experiment, it's very useful, and I tell my students this: before you produce a graph, draw a picture of what you expect the graph to look like. What is this experiment? What are the results going to yield? It's a great way of testing your intuition and making sure that once you see the result, you understand how it compares with what you were expecting. A lot of times we have a tendency to look at data and think, oh, okay, that makes sense. But it only makes sense because I'm seeing it. If I actually had to predict what it looked like, what I would have discovered is that my prediction looks quite different from what I found. And that could be a problem.

Again, this gives you a comparison point after you actually gather some real data. Predictions are also a great way to validate those non-real-system tools that we were talking about before. So if you run some simple experiments on a simulator and the results don't match up with your intuition at all, then one of two things is wrong: your intuition about how the system works, which is interesting to debug, or the simulator itself, which is even more interesting, because that error is going to ruin all of the results that you use that simulator to collect. Okay?

So beware the premature use of summary statistics. What do I mean by this? It's a lot easier, once I've done 10 experiments, to just compute an average and move on. Producing things that look like distributions or histograms frequently requires a little bit more work at the plotting level, and also potentially more data. But it's really, really deceptive to converge too quickly to any of these so-called summary statistics that try to summarize an entire data set, particularly before you know what the data set looks like.
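As a concrete version of this warning, here are two made-up data sets (imagine page-load times in milliseconds for the same benchmark) that collapse to identical means and medians even though their shapes are completely different:

```python
import statistics

# Hypothetical page-load times (ms) for the same benchmark, two scenarios.
unimodal = [9, 9.5, 10, 10, 10.5, 11]   # one cluster around 10 ms
bimodal = [4, 5, 6, 14, 15, 16]         # two clusters: a fast path and a slow path

# Identical summary statistics...
assert statistics.mean(unimodal) == statistics.mean(bimodal) == 10
assert statistics.median(unimodal) == statistics.median(bimodal) == 10

# ...but a histogram of `bimodal` would immediately reveal two code paths.
```

If you only ever compute the mean and median, these two systems look the same; only plotting the raw data exposes the second mode.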
So for example, these two data sets, if you collected them, if this was the underlying reality of the thing you were trying to measure: these two can produce the same mean and median. And in fact, I can construct different distributions, one bimodal, one multimodal, and one unimodal, that have all of the same summary statistics, from standard deviations to everything else. So from the summary statistics alone, there's really no way to tell them apart.

Why is this difference important? Let's say you were measuring some part of the system's performance. Why is the difference between these two distributions, in terms of modality, so critical? Why would this plot, if you were able to plot this data, lead you in a very different direction than that plot? What is the plot on the right telling you? Yeah, George. What's that? Okay, so they're different, and the most striking difference here is supposed to be in the modality. Does that make sense? This distribution is all clustered around one point; this distribution is essentially the overlay of two different distributions, one of them clustered around this point, the other clustered around that point.

Why is this interesting from a system perspective? Let's say this was the load time of a particular web page on your server. What is this graph over here, the bimodal one, telling you? What's that? Okay, I'll start with that answer: I have more than one bottleneck, more than one thing going on. There are two different contributors to the page load time, so something different is happening. Let's say this is the same web page, or it's the same code path, the same benchmark.
I'm just running it over and over and over again, and there are two different things happening within the system. I don't know what they are, but those two different things are potentially producing these two different distributions. Who knows what it is? That's up to you to discover. But this data is really interesting, because it tells you that there's something weird going on here. There's some point at which, when I run the same benchmark twice, one run goes down one path and the other goes down some other path, and those two paths are potentially what's producing this distribution. That's extremely interesting to know. If all you do is collect this data, never plot it and look at it, and just compute these summary statistics, you will never learn this, and you'll be missing out on something that's really important about the underlying system.

So examining raw data and looking at these types of distributions is really critical. Don't just compute a mean (that's too lazy), or even a median. Look at the data. Run enough experiments that you can look at data like this, and it will give you a sense of what's going on.

This has actually happened to real people. Margo, who did some of the work on the course, has stories about students coming to her with data. They would say, okay, here, I did a plot, and they would have computed one number for a data set, and she would say, you know, the variance on that looks kind of big; can I see the underlying data? And it turns out it looks like this, and there are actually two different things going on, and they had no idea what's happening. So this is a real thing.

Okay, outliers. So what is an outlier when I collect data? Define an outlier. Colloquial definition.
Yeah, it's very different from the rest of your data. So let's say I run a benchmark: the first time around I get 10, the second time around I get 9, the third time I run it I get 11, the fourth time around I get a hundred. Then I keep going, and all the rest of my results are clustered around 10. So that fourth result is an outlier; it's way over there. If I plotted the distribution, I'd have a bunch of data points clustered here, and then way off at the edge of the graph somewhere over there is one data point.

So what happened? What is the tempting thing to do when you come across a data point like that? What's the easiest thing to do? Yeah, get rid of it. Just ignore it, right? Like, ah, something weird happened. Must be some strange cosmic ray that came down and bounced around the room just right, hit the CPU cache in just the right place, and somehow invalidated an entry. Or: oh, I must have run the experiment wrong, maybe I gave it the wrong parameters, whatever. There are all sorts of ways to explain this away, and that's very, very tempting.

This is related to another problem that you guys have faced in this class. What does this sort of feel like? What's another time you've probably had to work on something in this class where something really unusual happened, just one time, and it was like, whoa, I wasn't ready for that? What is this similar to? I suspect some of you guys have debugged some of these problems. You were like, I am so happy this test works, and I ran it again, and I ran it again, and then at some point what happens?
It fails, right? Some sort of weird race. So it turns out, and I'm sure you guys will be very interested to know this, Scott actually found some bugs in the solution set for assignment 3. We found them because he was gathering some statistics about performance under various parameters; I won't bore you with the details. He ran this overnight, he ran it like a hundred times, and we actually got it to crash a few times. So that's interesting. David always knew there were bugs in the assignment 3 solutions, but it's not clear he knew where they were. Now we do.

So this is like race conditions, and the approach is also pretty similar: when I have a race, I really want to know what happened, because there's something wrong. With an outlier, it's possible that you made one of those mistakes that you're tempted to claim you did, and there are cases where you can remove outliers from the data set safely. But there are also cases where those outliers are really full of information: there may be something very, very bad going on that you're not seeing. Because if you're Amazon.com and you're running your home page, and it turns out that the average load time of your home page is a hundred milliseconds, but the worst-case load time is one minute, you probably want to know what happened in that one-minute case, because that customer is not coming back. Unless you're Amazon.com; there's nowhere else to buy anything anyway, so they have to come back. But they will be mad. They will be sad that your web page took so long to load. All right, so you really need to understand outliers when you're working with them.

Okay, any questions about the statistics part? This was like the fourth part of our very easy progression of how to approach these sorts of performance and benchmarking problems. All right, so now let's say that we've gathered the results.
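Returning to that outlier example for a second: one hedged way to flag, rather than silently delete, a point like that hundred is the median absolute deviation, which, unlike the standard deviation, isn't itself distorted by the outlier. The timings here are made up to match the example above.

```python
import statistics

# Hypothetical benchmark timings: clustered around 10, with one run at 100.
runs = [10, 9, 11, 100, 10, 9.5, 10.5, 11, 9]

med = statistics.median(runs)
# Median absolute deviation: a robust measure of spread.
mad = statistics.median([abs(x - med) for x in runs])

# Flag anything more than 5 MADs from the median for *investigation*.
outliers = [x for x in runs if abs(x - med) > 5 * mad]
print(outliers)  # [100] -- now go figure out what actually happened to that run
```

The point of flagging instead of dropping is exactly the lecture's point: the outlier may be the most informative data point you have.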
We've analyzed them. We have some sense of what's going on with the system. Now the next thing to do is clearly just improve the slowest part of the system, right? It's an obvious next step that cannot possibly be wrong. Right. Okay. I'm just gonna go on.

Yeah, so even if this were true, this is very hard to do. How many people think they know what the bottleneck is, the slowest part of their assignment 3 implementation? All right, how many people, if I asked you how to improve performance on assignment 3, would say, there's this thing I need to fix? How many people know that for sure? Okay. Most of you guys are wrong, probably. And you're wrong for not-bad reasons. It might be: I need to fix the part of the code that I wrote at 1 in the morning or 4 in the morning. I need to fix the part of the code that's really dodgy and weird, that I don't really understand. That's okay; cleaning up those parts of the code isn't necessarily a bad thing. But most of your intuition about performance is wrong. And that's why it's actually really important to go through this whole process, including running benchmarks, gathering data, and being objective about the parts that need to be fixed. Because again, if you ask most developers to go off and improve the performance of their code, and you don't force them to do this, or you don't do it yourself, they've got plenty of things to work on. They've got a task list a hundred items long of things they want to fix and improve, none of which are guaranteed to have any impact on performance, and some of which might actually cause it to decrease.

Okay, so let's go back to the first assumption: I'm going to improve the slowest part. Let's say your code has two functions, foo and bar. foo takes five minutes to execute; bar executes in five seconds. Which function should you work on optimizing, and why? Yeah. Okay, so that's a good point.
Let's say these are independent; there are no call-chain dependencies here. Yeah. So that's part of the answer. There are two aspects to this. The one that we just brought out is significance: how much do these functions matter? Let's say foo is some sort of cleanup function, or some sort of recovery function that's used by your file system after a crash. It may never run, or it may run so infrequently, in cases where performance doesn't matter, that you just don't care. Whereas bar may be executed like every five seconds. So it's possible that almost a hundred percent of the time your code spends is in bar, and a tiny, tiny percentage is in foo. That's part of it.

The other thing, which is also very difficult to gauge, and this is somewhere you have to use your intuition as a software developer, is how much performance there is to be gained by improving these functions. If the reason foo is so slow is that it's actually really dumb, like it's using some sort of linear search or something stupid, it may be that there's a pretty big win there that you can get easily. And it may be that bar has been around for ten years, and every intern this company has hired has tried to improve bar, so there may just not be a lot of performance left to be gained.
So these are things to think about. The thing that you can measure here is the significance; this is something you can measure using standard testing suites, code coverage tools, and stuff like that. The difficulty part is one of the few places in this process where you actually get to apply your intuition and a little bit of knowledge about the code as a software developer.

So here is, like I said before, one of the two or three things I want you to take away from this class, because you're gonna apply Amdahl's law to every part of your life. So here it is. How many people have heard of Amdahl's law before? Okay, awesome. So you're gonna hear it again, and maybe by the fourth or fifth time you've heard it, it'll actually sink in. Okay, so this is the most formal presentation of Amdahl's law, and it says: the impact of any effort to improve system performance is constrained by the parts of the system not targeted by the improvement. What does that mean?
Someone try to translate that into a little bit more manageable terms. Yeah. Right, so the point is that if I have a piece of code that takes a minute to execute, and I work on a part of it that only takes a second to execute, the overall improvement can only be one second. That's the maximum, and it's unlikely that I'm gonna get that whole second, unless the thing is doing something totally useless and I can eliminate it completely.

So here's what Amdahl's law says. Let's say that I was able to conjecture that I could achieve the following performance improvement. Now I'm raising the stakes a little bit here, because not only does foo take five minutes to run, but it's also possible to reduce the runtime of foo by four minutes. So I can reduce the runtime of foo by 80 percent. In contrast, maybe I spend a lot of time on bar and I can only bring it down one second. So the improvement to foo is better both proportionally and absolutely. I'm getting more time back, four minutes, several multiples more, and I'm doing a better job proportionally. So in every possible case this looks like a winner. (Yeah, I don't know where these things come from sometimes. It's nice that they're there; it sort of jolts me out of my groove.)

Okay, so here's the thing, and I think this is really important to understand: it does not matter what those statistics were. I can always construct a case where you're doing the wrong thing if you don't understand the contribution of foo and bar to the runtime of the system. In this case, the program spends 95 percent of its time running bar and only 0.1 percent of its time running foo. And okay, maybe that seems like a contrived example, but it's not really: there are certainly parts of the code that very, very rarely get touched.
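The comparison being set up here can be written down as Amdahl's law: if the improved part accounts for a fraction p of total runtime and gets a local speedup of s, the overall speedup is 1 / ((1 - p) + p / s). A quick sketch using the lecture's numbers (foo: 80 percent reduction, i.e. 5x locally, but only 0.1 percent of runtime; bar: 5 seconds down to 4, i.e. 1.25x locally, but 95 percent of runtime):

```python
def overall_speedup(p, s):
    """Amdahl's law: p = fraction of total runtime spent in the improved
    part, s = local speedup of that part."""
    return 1 / ((1 - p) + p / s)

# foo: a huge local win (5x faster), but only 0.1% of total runtime.
print(overall_speedup(0.001, 5.0))  # ~1.0008: almost no overall effect

# bar: a tiny local win (1.25x), but it dominates at 95% of runtime.
print(overall_speedup(0.95, 1.25))  # ~1.23: a much bigger overall improvement
```

The locally "worse" improvement wins overall, which is exactly the trap the lecture is describing.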
How many people have used a code coverage tool before when they're doing testing? Please, next year, more hands up. So with code coverage tools, the idea is I run a bunch of tests, and the code coverage tool tells me what percentage of the code was actually touched by the tests I ran. And how hard is it to get those numbers up to a hundred percent? It's very, very hard. So it turns out that even if you write a bunch of tests, there are all these weird code paths that your code never goes down, and guess what is down those code paths: bugs, performance problems. If the code isn't hit by the test suite, it is not tested. So if your test suite only achieves 50 percent coverage, there is a whole bunch of your code that you have no idea whether it works or not. Like, zero. So that's a problem.

So again, there is code that is infrequently executed, even by really, really aggressive testing suites that are trying to test all sorts of things that could happen. Maybe some of those code branches are impossible; they may be due to conditions that will never happen in the real world. Although sometimes the way adversaries attack code is by creating those impossible conditions, getting code down paths it's not supposed to go down.
That's another way to attack software: find parts of the software that don't get used, find bugs in them that nobody caught because the test suite didn't hit them, and create the conditions necessary to lure the code down that code path, at which point I can exploit some sort of buffer overflow or something like that.

All right. So in the case that I just talked about, the speedup that I get from foo is actually less than the speedup that I get from bar, simply because foo is just not executed very often. And that's even though, keep in mind, foo's improvement was four minutes, 240 seconds, versus one second: two orders of magnitude more time saved locally by my improvement to foo, and I improved foo by 80 percent. But regardless of what those numbers look like, I can always create this case. There's also a really fun, what's the right word for it, a fun pathological example here: it's possible that foo is never run at all. In that case, no matter how much work you do on it, the performance of the system will never change. So that's another interesting thing to think about.

And like I said before, this is why, when you work on certain performance problems, you try to remove even one instruction. A lot of times this is done by everybody who's writing hand-coded assembly for courses in our department, and if you're wondering, why am I doing this?
I'm also wondering that. But I will point out that it is possible sometimes to use your assembly skills to actually improve performance one cycle at a time. So what's a code path where I would be very happy to find a cycle? Give me an example from operating systems of a code path where I would take a month of vacation to Tahiti if I could find one cycle. On what code path? Say VM fault, the fault path into the VM code, context switches. Yeah, those are all good answers. That code gets hammered, especially the VM fault stuff, all the time: boom, boom, boom, I'm hitting that all the time. So that stuff has probably been pored over, looked at over and over and over by the Linux people, Windows, and all those groups, just looking for: is there any way to reorder a few instructions and trim one instruction from this path?

All right, here's another way of thinking about Amdahl's law: fix the thing that's hurting you. Other parts of the code may be ugly. Other parts of the code may be messy. Other parts of the code may use crappy algorithms, linear searches, whatever; they may do things in a dumb way. You may know about those problems, and those problems may keep you awake at night, because you're worried that you really didn't do a very good job of implementing that particular part of the interface. But who cares? If it's not used, it doesn't matter. So find the part that's contributing to the problem and fix it, and don't worry about the rest of the stuff.

Here's another unfortunate corollary to Amdahl's law, which is pretty interesting: the longer you work on a particular part of the system, the less likely it is that you're still working on the right problem, even if you found the right problem to begin with. So I found part of the system that was really hurting performance, and I improved it. The problem is, I've now reduced the contribution of that particular code path to the overall system
performance. Now all these other parts of the system are rising up as the new problem to solve. So you have to make sure that this is an iterative process: finding a problem using benchmarking, analyzing statistics, fixing it, and starting over quickly, so you don't get too locked in to polishing this one particular part of your program. Because, like I said, the longer you work on that part, the less likely it is that you're doing the right thing.

Okay, any questions about this at this point? Makes sense overall? I mean, you guys will have to do this at some point in the future. Even given the fact that computers are really fast, if you build something that's really successful, if you go to companies that build these big systems, they very quickly get to the point where it is really important for them to squeeze every ounce out of the machines that they run. Because it's expensive to run thousands of machines in data centers: you've got to cool them, and you've got to keep them online. So at some point, if you could take Facebook's workload and reduce it by like 10 percent, you could tell them you could support the same quality of service on 10 percent fewer servers. I think they would find that interesting. And you might find it interesting.

All right, okay, so let's talk a little bit, and we'll probably continue this on Friday, about this very famous paper by Butler Lampson. The structure of this paper is sort of interesting. But first I want to make a bold claim, and I think Butler Lampson would agree with me, which is that computer systems are actually way more complicated than algorithms. One way to think about it...
Maybe a way that's more fair to the algorithms people is this: computer systems are essentially a complex interaction between hundreds and hundreds and hundreds of different algorithms that do different things. Different functions, little pieces of functions, running various algorithms; to some degree every function is an algorithm. But because of how computer systems are written and used, we're not doing big-O analysis here. We're not going to prove anything about the performance of a system; that's very, very difficult to do. The best we can hope to do is measure things and improve the slowest parts. Okay?

So Butler Lampson sort of leads off with this. He says designing a computer system is actually harder than designing algorithms. Why? What are the things that make designing computer systems more difficult? What are some of them? Yeah. Yeah, there are more internal interfaces; there's a lot more complexity in building real systems. Right, I actually have to think about how real computers solve problems, rather than just some abstract algorithm that I can prove things about.

What Butler Lampson says is that the external interface is less precisely defined, more complex, and more subject to change. What is the external interface? Can you give an example of an external interface to a system? Yeah, what's that? You know, we're talking about a computer system here; that's computer hardware. So computer hardware has interfaces as well, and you're right to point that out, but give me an example: where do you find a lot of interesting system interfaces now? An example of a system interface: you guys have seen, and implemented parts of, one this semester, right? Which one is that?
The system call interface: that's the interface between applications and the operating system. Now that's kind of an interesting case, right? Because to some degree, do you feel like the operating system interface at this point falls into this category? Is it less precisely defined? Is it more subject to change and more complex?

Complex, maybe. Even the Unix interface, which has tried to remain pretty thin, has gotten big. There are lots and lots of different system calls that you have to implement in order to implement the POSIX API. That's just hard. If you wanted to build a new operating system today and you wanted it to be POSIX compliant, it would take a lot of work. There's a lot of effort that goes into doing everything that POSIX requires.

Okay, what about subject to change? Does the OS interface change a lot, or change infrequently? What do you think? Pretty infrequently, right. So we don't get "subject to change." Less precisely defined? Again, no, not really. So the OS interface scores one out of three here: it is quite precisely defined. POSIX has a definition, and that definition does not change very often. The rate of change of the OS API is important to applications, because if it changes frequently, I have to keep rewriting my application, and that's annoying. Complex? Sure.

So is this idea relevant anymore? First of all, why does the OS interface not fall into more of these categories? Compared with other interfaces that we're going to talk about in a minute, the OS interface is a lot... what? Yeah. Okay, that's fair, but at this point the OS interface may not change as much. Why? Yeah, okay, so that's true, but I think if there were a lot of value in changing the OS API, we'd still do it. So why is the OS API so stable?
Compared with, say, the API provided by a new company offering some web service online, the OS API is... what? Yeah, that's true too, but: it's old. At some point, once you've worked on an interface for a while, it gets old, and once it gets old, hopefully it gets stable. It gets stable, it gets well defined. Over time we've built up lots and lots of documentation about these interfaces, because they've been around for a while and they have stopped changing so often. If you went back in time to the '50s and '60s, I think you could make this argument about the OS interface, but that's not true anymore.

So what interfaces does this apply to? Give me some examples of modern interfaces, because if this doesn't apply to anything, why are we talking about this paper? Who cares, if there aren't any interfaces that fall into all three of these categories anymore? What modern interfaces do still fall into these categories? Yeah.

Right: every Silicon Valley company that has a REST API is addressing these problems. The authentication service that we used this semester to give you access to the course tools has a back-end API. I had to write tools against that API, and I'm happy that the API did not change during the semester, because that would have been a pain. But who knows? The company is a couple of years old; they're probably free to change their APIs. And it's another case where these are new interfaces. The interface to an authentication provider: what is that? What is the right interface? How do I design it? Because it's not clear exactly what it should be, and we don't have 50 years of experience with it, it may change. It's certainly complex; there are lots of different features and functionality an authentication provider might want to offer.
And I would argue that this still applies to a lot of systems. In fact, for most of the systems that we build, whether it's an internal API used by the tools within a company or an external API presented to the world, this is still true. So this paper is still pretty relevant. Whenever you write a piece of software, that software will have an API, and you will have to solve this problem. At some point, if someone else starts to use it, you'll have to worry about it in a different way, because then you care about someone like me, who spent a day writing a tool against your API once and really doesn't want it to start failing. Maybe it is failing; I don't know. The scripts I wrote haven't died, so that's something.

Okay, so the other reason systems are more complicated, as someone pointed out before, is that they have a lot more internal structure. Think about these big web services that have APIs: there are lots of internal APIs that you don't see, which are used by one part of the system to communicate with another, and that's something else that is hard to get right.

And then, right, the measure of success is a lot less clear when I'm designing an interface. There's no performance notation for interfaces. Their quality has to do with how they're used, how intuitive they are to people, and whether they accomplish the things that people want them to accomplish. So that's a lot trickier.
I can't prove that one interface to a video hosting service is better than another; it depends on how they're adopted and what people do with them.

Okay. So, Butler Lampson: I think he could rewrite this paper today and it would probably be even more awesome, but at that point he had participated in a bunch of different software projects. This paper is not a research paper in the sense that it presents novel results; he makes no claim to do that. He says these are, to some degree, folk wisdom that emerged during the time he worked on those projects, and all the lessons in this paper are drawn from experience. Unfortunately, if you read this paper (and I would encourage you to), you will find that some of the things he brings up are a little dated. The examples are from a couple of decades ago, and it may be a little hard to transport yourself back to the point where people cared about some of the software systems being built then. But the lessons themselves are quite good.

All right, so there are three goals that Butler Lampson focuses on when you're designing systems like this. Does anyone remember what they are? Does anyone know? There's a big fancy table in the paper that organizes everything. All right: functionality. What does functionality mean for a system? Okay, yeah: can the system accomplish the goals that you set out to accomplish? That's reasonable.

Speed, or performance: how quickly does it accomplish those goals, or how many resources are required to accomplish them? Because today, sometimes the latter is more important. I may be willing to sacrifice speed, which nobody using my API cares about, in exchange for not needing 10x the servers, which would cost me a lot of money. Okay, so speed is performance and overhead: how many resources does it require?
Fault tolerance. What does this mean? What does it mean for a system to be fault tolerant? What would it mean for your OS/161 kernel to be fault tolerant? There's a three-word answer. Okay, yeah, more or less: it keeps working. The easiest way to describe a fault-tolerant system is that it keeps working; over time, it has a history of continuing to work. That can be accomplished in a variety of ways: by handling lots of different inputs, by making sure you control the inputs the system receives, by failing and restarting really quickly, whatever. But the defining property of a fault-tolerant system is that it keeps running, it keeps working, it keeps providing the functionality that you want.

So those are the goals. Now, Lampson breaks the design task into three parts. The first is ensuring that the system is complete. This is very related to functionality: is the system I'm about to build actually going to accomplish everything it set out to accomplish? And if you don't think about this carefully... How many people have had the experience, maybe in this class, of writing a piece of software and realizing, maybe halfway through, that there was some important goal the software was just not going to be able to meet, based on how you'd designed it? This has definitely happened to me. It happened to me when I took this class: I said, oh, I want to do copy-on-write. Whoops. And you might think about it: given your VM data structures, could you actually do copy-on-write? The answer for some of you is no, right?
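To make the copy-on-write question concrete, here is a hypothetical sketch in plain Python (not OS/161 code; all the names are invented) of the data-structure support COW needs: shared pages must carry a reference count, and a write to a still-shared page must copy it first. If your page tables can't express "shared, copy me on write," you can't retrofit this later.

```python
class Page:
    """One physical page frame, possibly mapped by several address spaces."""
    def __init__(self, data):
        self.data = bytearray(data)
        self.refcount = 1  # how many address spaces map this frame

class AddressSpace:
    def __init__(self):
        self.pages = {}  # virtual page number -> Page

    def fork(self):
        """COW fork: share every page and bump refcounts.

        A real kernel would also mark the shared mappings read-only
        so that writes trap into the handler below.
        """
        child = AddressSpace()
        for vpn, page in self.pages.items():
            page.refcount += 1
            child.pages[vpn] = page
        return child

    def write(self, vpn, offset, byte):
        """On write, copy the page only if it is still shared."""
        page = self.pages[vpn]
        if page.refcount > 1:
            page.refcount -= 1
            page = Page(page.data)   # private copy of the shared frame
            self.pages[vpn] = page
        page.data[offset] = byte

parent = AddressSpace()
parent.pages[0] = Page(b"hello")
child = parent.fork()
child.write(0, 0, ord("H"))      # triggers the copy
assert bytes(parent.pages[0].data) == b"hello"   # parent unchanged
assert bytes(child.pages[0].data) == b"Hello"
```

The design point is that the refcount lives in the page structure itself; if your assignment's data structures tie each physical page to exactly one address space, there is nowhere to hang that information, which is exactly the "system is no longer complete" trap.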
You'd have to go back and start over and redesign your data structures. Sometimes this happens: if copy-on-write was one of the things you wanted to accomplish, and you designed the system a certain way, you can get to a point where it is just impossible, and the system is no longer complete.

The second part is choosing interfaces. Again, this is something we don't talk about enough as software developers, I think: the interface you're going to provide to someone who's going to use your system. What is the API going to be? There are two things that can go wrong here. Either the interface isn't complete, and it doesn't allow me to accomplish everything I want (so even if the system provides the functionality, the interface doesn't give me access to it), or the interface is really bad: it makes it hard to do things that should be easy. You want interfaces that make it easy to do things that should be easy, and possible to do things that someone would expect to be hard. That's the goal.

And then finally, considering different implementations. This is related to both speed and fault tolerance. How am I actually going to build the system? What are the implementation decisions I'm going to make along the way that will ensure the system is performant and keeps working?

So this is his catch-all table that unifies all the different pieces of advice in this paper. As a software developer, take this table. You can print it out however you want; you could print out this slide, cut it out, and, I don't know, paste it to your laptop, right?
Forget a stupid Apple sticker or the Android robot or whatever; that's not going to help you. Looking at this will actually help you, because all of the slogans here are really good advice, and if you understand what they mean, you will be able to write very, very good software. How many people have heard one of these slogans before? Yeah, Rob, which one? "Separate the normal and worst case." Yeah. Anyone else?

Actually, I'll just pause here, because I have a great example for "separate the normal and worst case," and maybe that's where we'll stop today. You can start packing up if you want. Do you remember healthcare.gov? Remember that crappy website? Yeah, an awesome example of how not to design a website. A friend of mine was working for one of the government technology agencies at that point, and it turned out that the way the group that came in and fixed healthcare.gov did it was to separate the normal and worst case. No joke. What they realized is that there was this one really, really terrible piece of software in the middle of the whole system that was causing the whole thing to just suck. It was bad; it went down all the time; you'd make an API call and never get a response. But they realized that a lot of the people who came to the site didn't actually need that complicated piece of logic.
They could be satisfied in a much simpler way. So what the site started to do was ask you a few questions (I've never used the website, so I don't understand exactly how it works). If you were the normal case, it would bypass that terrible component entirely, and it was able to offer you the plans you were eligible for right away. If you were the worst case, it would tell you to come back later: it would send you an email, and then it would just bang on that API over and over again with your request until it got a response. So the normal case is fast; those people go straight through. The worst case I make asynchronous, so I don't make you sit there waiting while the page spins. I say: by the way, I'll send you an email tomorrow when your results are ready. Go have a nice night.

All right, so these things actually work in the real world. We'll talk more about them on Friday, and we'll talk about scaling Linux to many cores. See you then.
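As a postscript, the shape of that healthcare.gov fix is easy to sketch. Everything below is invented (the real system surely looked nothing like this); it just illustrates "separate the normal and worst case": the fast path answers synchronously, and the worst case is queued for asynchronous retry instead of blocking the user.

```python
from collections import deque

# Worst-case requests wait here; a background worker would retry them
# against the flaky backend and email the user when done.
pending = deque()

def handle_request(request):
    """Dispatch on case: fast path for normal requests, queue for the rest."""
    if request.get("simple", True):
        # Normal case: answer immediately, bypassing the slow component.
        return {"status": "done", "plans": ["basic", "standard"]}
    # Worst case: make it asynchronous so the page never just spins.
    pending.append(request)
    return {"status": "queued", "note": "we'll email you when it's ready"}

assert handle_request({"simple": True})["status"] == "done"
assert handle_request({"simple": False})["status"] == "queued"
```

The key design choice is that the two cases get different code paths entirely, so the common case is never slowed down, and the rare hard case can afford an expensive retry loop because nobody is waiting on it interactively.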