All right, let's get started. OK, so welcome to the first lecture of Computer Science 162, Operating Systems. I want to start first with an introduction. We have two instructors for this class. So I'm Professor Anthony Joseph. I'm over in the RAD Lab on the fourth floor of Soda Hall. And starting on the 16th, I will have office hours on Mondays and Tuesdays from 10 to 11 AM. A little background about me. My current research areas are in cloud computing. I'm part of the Mesos project, where we're trying to build a data center operating system. So the idea is to take the concepts you'll see in this class and apply them to a data center running on 50,000 machines. I'm also part of the secure machine learning project. So in that project, we look at how do you design machine learning systems to be secure against adversaries? So a very simple example is Google's page-quality ranking. They have to constantly deal with the fact that search engine optimizers are trying to game the machine learning algorithms that they use to figure out how to rank the quality of pages. And then the third project that I'm part of is the DETER cybersecurity testbed. It's the world's largest public cybersecurity testbed. It sits out, actually, behind Soda Hall next to the volleyball court. Other projects I've worked on are peer-to-peer networking, a project called Tapestry. And I also like everything mobile and wireless. There are lots of wireless gadgets, mobile computing, wireless networking, and cellular telephony. So also co-teaching this class is my colleague, John Canny. Hi. So I am a researcher and also doing a lot of work in big data. I am in 637 Soda. And I'll be having office hours starting this week, 3 to 4 on Thursdays and Tuesdays. So we have a library for large-scale data analytics, which is not necessarily cloud-based, called BIDMach. I also work on the design aspects of data analysis. How do people that do things with data approach the analytics tools? How do we link them with human-computer interaction, and so on?
I also do some work with human learning: analysis of MOOC data. And I'm teaching a grad course on that also this semester. Also, I've done a lot of work on health technology and tech for developing regions. And OS has underlain all of these projects. And you'll see some of those relationships through the course. Yeah, I forgot to mention, at the same time I'm also teaching the graduate version of this class, 262A. OK, so we also have four TAs. Matthew Fong. And the TAs have their office hours posted. Their office hours will start next week, since we don't have section until next week. Second TA is Kevin Klues. Third TA is Alan Zhao. Oh, he's not here. Oh, yep, he is in the back. And George Yu. So those are our four TAs. So what are we going to talk about today? So first, we're going to talk about what we're going to learn in this class and why it's important, so lots of motivating examples of why operating systems and systems is something that you ought to study. We're going to talk about what is an operating system. This is rather contentious. It's been decided in part through court decisions and other political processes. We're going to talk about some of the different ways of defining an operating system. Then we'll talk about some of the logistics of this class. It's a large project-oriented class, and so there are a lot of moving parts that we'll tell you all about. And the most important, I think, aspect of this class is interactivity. So it is a large class, but it's great when we get lots of questions. I can guarantee you, if you have a question, there's probably at least 25%, 30% of the students in the class who have the exact same question, but they're too timid to ask that question. So please ask questions. OK, so what's the goal of this course? The goal of this course is to learn how systems work. And we're also going to talk about some of the primary challenges that we face in building systems.
And as a result, we're also going to talk about some of the principles of system design. This is how we are going to address the challenges that we run into. Now, we'll also go through some examples where people were building very large systems, didn't follow the principles, and didn't end up with a functional system. They spent a lot of taxpayer money, usually. And we're going to learn how to apply these principles to building systems. So the end goal of this class is that you come away understanding how large systems work and how to actually go out and build those large systems. So let me give you an example. We can ask, what happens when I do a search query on my iPhone? So I search Bing or I search Google. There are lots of moving parts here, and there's a complex interaction that occurs between parts that are owned and operated by different administrative domains. So the first thing that happens is I make a domain name service request when I type in my search: where's CS162 today? And I want to know using Google, so I have to find www.google.com. So I make a DNS request from my phone to a local Berkeley server right here on campus. If I'm over in Soda Hall, it's actually down in the basement of Soda Hall. And then that DNS server is going to contact lots of other DNS servers to get that answer. Once it has that answer, it's going to give me back a response. It's going to give me the internet protocol address of www.google.com. So now I can use that to route across the internet through many different machines, many different administrative domains, until I reach one of Google's data centers. Let's say down in Sunnyvale. And my request gets routed to their front-end server, a load balancer that's going to spread the load across the tens of thousands of machines that they have in that data center. So that machine is then going to contact one of their search index machines, which is now going to contact hundreds of other search index machines.
They're all going to search their local in-memory and on-disk copies of Google's index and generate a result. Now, because Google is a for-profit company, it's also going to contact their ad server and generate a relevant ad and create the search page, a result page rather, which is then going to be returned back and displayed on my phone. Very complicated process. Lots of different protocols involved: DNS, IP, TCP, HTTP, and perhaps, if I'm using encryption, HTTPS, SSL, TLS. At the end of this course, you're going to understand all those acronyms and how to build an application like this. That's a big system. So some of the motivation for studying computers and systems is because we have computing devices everywhere. We have cell phones. This is the Mars Pathfinder, I think, a mote, PCs, cars, data centers, supercomputers. Anybody have an idea of what's the single largest use of microprocessors? Embedded systems, 10 billion microprocessors. Now, what's an embedded system? Well, there's a microprocessor in my remote. There's a microprocessor in the projector. There's a microprocessor that controls the heating, ventilation, and cooling in this room. I don't think it's working very well. Same with the lighting controls, same with the AV controls, and so on. So they're all around us. Many of them are networked. So that projector is probably on the network. By the way, the second largest use of microprocessors? Automobiles. Automobiles have on the order of 40 to 50 processors in them. OK, this is another interesting graph that I like to put up that shows how computing has really changed over time. So Professor Canny, he works on user interfaces. And one of the things that has really enabled user interfaces is this trend that we have of fewer people per computer. When we started out, we had very, very expensive mainframes. We had thousands of people using a single computer. People were inexpensive relative to the cost of the mainframe.
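Going back to the search-query walkthrough for a moment: the very first step, resolving www.google.com to an IP address, is something you can try from any machine. Here's a minimal sketch using Python's standard-library resolver, which hands the query to the locally configured DNS server; the raw DNS protocol itself is hidden behind this one call.

```python
import socket

def resolve(hostname):
    """Ask the locally configured DNS resolver for the IPv4
    addresses of a hostname. That resolver may in turn contact
    many other DNS servers to produce the answer."""
    results = socket.getaddrinfo(hostname, 80,
                                 socket.AF_INET, socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, (ip, port));
    # collect the distinct IP addresses.
    return sorted({entry[4][0] for entry in results})

# resolve("www.google.com") would return one or more addresses of
# Google's front-end load balancers; here we just resolve localhost.
print(resolve("localhost"))
```

With the IP address in hand, the phone can open a TCP connection to it and send the HTTP (or HTTPS) request, which is the next leg of the journey described above.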
So people sat and waited, submitted their jobs, and maybe a day later you get back a result. But over time, we see the deployment of minicomputers. So now we have a couple dozen people per computer. PCs, now it's a one-to-one relationship. And we can start to actually provide an interactive, really nice experience for users. Graphical user interfaces come in up here. And now we're in a domain where we have many computers per person. So each one of you probably has a half dozen microprocessors. If you have a car, multiply that out. If you have a set of devices, it becomes even larger. OK, now what has enabled all of this is an observation that Gordon Moore made back in 1965. So in 1965, people asked him, what's the density of transistors in processors going to look like? And plotting just a few points on a log-linear scale, he projected out that the density would double every 18 months. Now, it's pretty amazing, because I think there are like four or five data points that he used here to plot his curve. But what's amazing is we have very closely followed that curve in terms of the actual number of transistors and this doubling every 18 months. Now, a lot of people confuse the doubling of transistors in Moore's law with a doubling in performance. Those are actually two completely separate concepts. So here's a graph that shows performance. Performance has primarily been a function of frequency, being able to continually increase the frequency at which we clock microprocessors. There were also some architectural changes. So it was 25% per year from '78 to '86. This was during the era of the VAX. And then when we saw the introduction of RISC and the x86 microarchitecture, we see this 52% growth in performance every year. So if you were a software developer, 1986 to 2002 was the golden era, because all you had to do was sit back. And every year, your program got 50% faster. You didn't have to do anything. Just look at it, 50% faster, each calendar year.
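Those two growth rates are easy to check with a little compound-interest arithmetic. The sketch below just replays the numbers from the lecture (an 18-month doubling for transistor density, 52% per year for performance over 1986 to 2002); it adds no data of its own.

```python
def growth_factor(rate_per_year, years):
    """Compound annual growth: (1 + rate) ** years."""
    return (1 + rate_per_year) ** years

# Doubling every 18 months is a factor of 2 ** (12/18), about 1.59x
# per year, which compounds to roughly 100x per decade in density.
density_decade = (2 ** (12 / 18)) ** 10

# 52% per year over the 1986-2002 "golden era" (16 years):
perf_golden_era = growth_factor(0.52, 2002 - 1986)

print(f"density gain over a decade: {density_decade:.0f}x")   # ~100x
print(f"performance gain 1986-2002: {perf_golden_era:.0f}x")  # ~800x
```

So a program left untouched in 1986 would run on the order of 800 times faster on 2002 hardware, which is exactly the free ride the lecture describes.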
That was awesome. But something happened here in 2002. Does anybody know? Somebody who took 152, maybe? Yes. What's that? The heat wall. Yes. We had a heat wall, a capacitance wall, a power wall. The amount of power that we could dissipate in an air-cooled microprocessor was around 150, 160 watts. When we hit that wall, we couldn't keep increasing the frequency. So after that, growth has really tailed off instead. So this really led to a sea change in the way we think about designing processors. Because now, we can't just sit back as programmers and have things go faster every year. But we still have Moore's law going, at least for the next decade or so. And so that means we're getting more and more transistors we can put on the chip. And instead of trying to make them run faster, we're adding more functionality. So we add more cores. So this is really the rise of the many-core architectures. So back in 2007, Intel did a test chip. This wasn't something they were intending to sell or produce in volume. It's called Polaris. And they were trying to see how many cores they could fit on a standard process. And they were able to fit 80 very simple cores that were basically two floating point engines and a simple ALU core, interconnected with a mesh network, 100 million transistors. In 2010, they released to academia the Single-chip Cloud Computer. So this consists of 24 tiles. Each tile contains two cores and some caches and a router. So again, we have this mesh network that links these cores together. And there's hardware support for message passing. This is a revolutionary way of doing processors. Up until now, we've always thought of processors as having some fraction of the cache hierarchy be shared between the cores. So you communicate through memory. Now, you have to communicate through message passing. The same approach we typically use to communicate from one computer to another is now what we're seeing on a single die. So this is the future.
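Here's a tiny sketch of what that message-passing style of programming looks like. It uses Python threads and queues purely as stand-ins for the SCC's cores and hardware message channels; the point is that the "cores" share no variables and communicate only by sending explicit messages.

```python
import queue
import threading

def core(inbox, outbox):
    """One simulated 'core': it receives messages on its inbox and
    sends results on its outbox. There is no shared state -- all
    communication is an explicit message, as on the SCC."""
    while True:
        msg = inbox.get()
        if msg is None:          # shutdown sentinel
            break
        outbox.put(msg * msg)    # do some work, send back the result

to_core, from_core = queue.Queue(), queue.Queue()
worker = threading.Thread(target=core, args=(to_core, from_core))
worker.start()

for n in [2, 3, 4]:
    to_core.put(n)               # send a message to the "core"
results = [from_core.get() for _ in range(3)]
to_core.put(None)                # tell the core to stop
worker.join()
print(results)                   # [4, 9, 16]
```

Compare that with shared-memory threading, where the worker would just write into a common array: here every interaction has to be designed as a send and a receive, which is exactly why the model is harder to program.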
It's a much, much harder model to program. This is what we're going to have to deal with. Now, what is many-core? It's hard to say. Is there an exact boundary? Is it 16? Is it 32, 64, 128? It's a lot of cores. And one of the biggest challenges I think we're going to face is that in the past, we could just sit back and our programs got faster and faster. Now, we have to figure out how to divide our programs up to make them run in parallel instead. And as we're going to see in the next few weeks when we work on synchronization, and when you work on the project on synchronization, that's really hard. Now, how do we program all these cores in parallel? Well, maybe we'll use a couple of them for doing video and audio so we can listen to some MP3s or podcasts while we're working. Of course, we're working away, so we have our word processor, another for our browser. And then, I don't know, if it's Windows, we can use the rest for virus checking. Okay, so this is really what we're going to have to deal with, and we're going to have to exploit this parallelism at all levels. And we're going to see how we overlap computation and I/O. We'll see how we try and subdivide a program, and the challenges we'll face with dividing up a program. So this was sort of all hypothetical. You couldn't go out and buy a Single-chip Cloud Computer, but you can certainly go out and buy a modern Intel processor or AMD processor, and these are already many-core. So this has eight hardware threads running on it. If you buy this in a dual-socket configuration, you've got 16 threads that you have to figure out what you're going to do with. In the supercomputer world, or the cloud computing world, it's very easy to keep all of these threads busy. In the desktop world, it's a lot harder. So that's the challenge. Okay, some other trends to look at. So this is retail hard drive capacity in gigabytes, as a function of year on the x-axis, with capacity on a logarithmic scale.
And again, you can see we have exponential growth in capacity. So storage is growing. Storage is also getting much cheaper. In one of the lectures where we talk about storage, we're also gonna see a really fundamental trend that's happening right now: shifting away from rotating storage, hard drives, to solid-state, flash-based storage. And probably, if I teach this class in five or six years, I probably won't even mention hard drives, or I'll mention them in passing, and I'll only mention SSDs. Another thing is a tremendous exponential growth in bandwidth. So this is years again on the x-axis, and I/O rates on the y-axis on a log scale, and we can see server I/O doubles every 24 months. So in data centers, people are already looking at deploying 40 gigabit and 100 gigabit, and we just put in a proposal to the National Science Foundation for the campus to have 100 gigabit connectivity to the rest of the world, because we're saturating our 10 gigabit links already. So you always want your core network to grow faster than your edge, your servers. Any questions? So at the same time that we see all these technological trends, more people are going online. So the Internet Systems Consortium, twice a year, every six months, does a ping of all of the hosts that they can, all the IP addresses, and sees how many hosts respond back. So they did this in January of this year and found 963 million responses. Now this is a floor on the total number of internet-accessible hosts, because if you have a router running NAT, it's only gonna respond back with one ping, even though you might have a dozen devices or more behind that router. There are whole ISPs that are behind NAT, like for smartphones, and so those aren't counted. They did the same survey just a couple of months ago in June and found 996 million. So that's growth of over 30 million in just six months. Now backing all of this up, behind these devices, is people. So over 2.4 billion internet users.
But that's not the number I find amazing. The number that I find amazing in this chart is the growth over the last decade for Africa, for the Middle East, for Asia, and for Latin America. And I bet if you were to actually look at just the last, say, five years, the growth rates are even greater. A lot of this is being driven by smartphones. It's a very inexpensive way to get internet access in the developing world. So the exciting thing about this, when you're building applications: Facebook right now is thinking about how do they get the next billion users? Well, the next billion users are coming one way or another. The question is, can you capture them with your application? Can you scale your application to support another billion users? Okay, so the important thing is, it's not just PCs that are connecting to the internet. I tried to find newer numbers, because the numbers have become even more lopsided than they were in 2011. It's not just the case that smartphone shipments have exceeded PC shipments, they've eclipsed them, far exceeding PC sales. But in 2011, it was 487 million smartphones versus 414 million PC clients. Now in this survey, they actually lumped tablets in with the PCs. I really would lump tablets in with smartphones, since they often share the same operating system, either Android or iOS. So that number, you could argue, is already eclipsed. Another, I think, kind of interesting number: back in 2011, 25 million smart TVs. So again, thinking from a systems standpoint, there are lots of security implications and lots of maintenance implications when you actually have an operating system running in a television set. Overall, there are four billion phones in the world, and more than half of those are now smartphones. And the conversion from feature phones to smartphones is increasing rapidly. So pulling all of this together, when we think about building systems, we need to think societal scale.
So all the way from MEMS, this is a little MEMS accelerometer developed over in Cory Hall, all the way up to billions of devices connected together. And think about building applications that span across that. There's a tremendous power that you can get when you can take a device that has a limited amount of computational capacity and leverage off of all of this infrastructure. So for example, I run an application on my phone that records what the accelerometer is doing. So you get this 3D accelerometer, a very noisy signal, and then it takes all that data, uploads it to the cloud, and then figures out, A, where I am, and B, what I'm doing. Was I walking? Was I running? Was I cycling? Was I driving? All from the accelerometer data. Now if I tried to do that processing on the phone, my battery life would be about half an hour. But instead I can go all day, because that work gets pushed off to the cloud. And this is really why this class has evolved to become much more of a systems class: because we really think about these kinds of problems, not just in terms of an isolated device, but in terms of a collection of systems that all have to operate together. Okay, any questions? So let me talk about some logistics. So as you all know, class is now from 5:30 to 7:00 in 145 Dwinelle. It's very important that you attend class, because the interaction, that's a big part of this class. The class is slide-cast, so you can certainly listen to us talk and watch the slides, but you can't ask questions when you're sitting in front of your computer. When you're here in class, you can ask questions, and you can also hear other people asking questions. We'll repeat the questions, but there's nothing like being here for the real experience. Now, 5% of your grade is dependent on participation in class, that's attending class and asking questions, attending section and participating in section, and also participating in the newsgroup on Piazza.
5% might not seem like a lot, but at the end of the semester, it's a difference in a grade. So please keep that in mind. The waitlist: so we had a bit of a rocky start with our class. We're at 180 students. The department has set a target for the class of 184. They're going to continue processing the waitlist today, and then, because of a holiday, they won't be processing it on Thursday and Friday. They'll process it for the last time on Monday. So if you're still on the waitlist, there is still a possibility you'll get into the class, because people are still trying to figure out their schedules. But beyond Monday, the waitlist will be dropped. That said, we were able to accommodate a large fraction of the students that were on the waitlist. Question? Ah, good question. So the question is, are the quizzes during lecture or in discussion? They may be either. Typically what we'll do is have three or four quizzes and drop one. So that way, if you miss a quiz because you had an interview or an athletic event or something like that, you won't be penalized. But beyond that, you would get a zero for that quiz. Any other questions? Okay, so sections. It's very important to attend the sections, because lots of information, especially about the projects, is going to be presented in the sections. In lecture, we're going to be at the 100,000-foot level. In sections, you get to go down to ground level, ask lots of detailed questions, and work through some of the problems related to what we've talked about in class. And it's where you're going to hear a lot about the projects. It's very important that you attend each section. You've been assigned a section automatically by Telebears; ignore that. We're going to have you form project groups of four to five people, and everyone in a project group has to be able to attend the same section. It's not good enough to have the same TA, you have to actually be in the same section.
Attend your preferred section next week, and then look for the sign-up link on the website, and we'll get you scheduled into sections starting on the 17th and 18th. Yes? So the question is, does everybody need to attend the same section on Telebears, or attend the same section? Ignore what Telebears says. We're not going to use Telebears at all. We'll schedule you into a section based upon what sections you're able to make for your group. I'll talk more about that in a couple slides, but we'll do our own scheduling of people to sections and load balancing across the TAs. We have a website, very important to go check it out. All the lecture notes are on the website, so you can see them, you can read ahead. They're in PowerPoint format, and also in PDF, four-up. A lot of people like to print them out and annotate them in class while they're listening, so you can do that. We also have a newsgroup; we're using Piazza. There are great clients for Android and iOS, so please register, sign up, ask questions there. That's the main way you should ask questions. If you email us with questions, chances are 10 other people are emailing us with the exact same question. So please don't be surprised if we email you back to say, please ask this on Piazza so we can answer it once. And we're going to try and be very diligent about answering your questions as quickly as possible. Finally, there is a webcast; it's an audio podcast with slides. So if you go to webcast.berkeley.edu, you can find the class. It usually takes them a day or two to put the lectures online. Okay, I'll say it again. Our lecture goal is lots of interactivity; we want questions. I like it when you ask questions. I'm more than happy to drop some of the material or finish it next week if we get lots of questions and end up running long. All right, so what are we going to cover in this class?
So as I've tried to convey with the motivation, it's important to look beyond an individual node's operating system. And so we're actually going to look across systems and look at end-to-end and system design. So that means there's going to be some networking in this class. So if you haven't taken EE 122 yet, we're going to cover the basics of how networking works. We're going to have some database material, so you can understand how systems use databases for reliable structured storage. We're also going to talk about security concepts. So again, this is part of the end-to-end understanding of how something like HTTPS actually works. And the projects will reflect this emphasis. So we have a long-term goal within the department to make CS162 really a mezzanine course between the lower division and the upper division systems and networking classes. So ultimately, this is going to be the gateway class to the database class, the security class, software engineering, networking. And we now have a new advanced operating systems class that Professor Kubiatowicz taught for the first time this spring. And we'll be offering it again next spring. So if you really enjoy this class and you want to learn even more and hack a real operating system, you can do that in the spring. So to reflect this change, we've changed the balance of material. So we now have 14 lectures on core OS topics. We have three lectures on networking topics, two on databases, two on security, and one lecture on software engineering. And then finally, we have a capstone lecture that pulls everything together and goes through that example that I talked about at the very beginning, and goes through every little detail of it, and you should understand exactly how every little detail works. Questions? Okay, so we have a textbook for this class. The textbook is the Silberschatz Operating System Concepts book.
The ninth edition is the latest version, but we also allow you to use the eighth edition, and the website actually lists the readings by both the ninth and the eighth edition. So you can pick whichever version is most cost-effective for you and purchase that version. There's lots of online information that we also have for this class. If you go to the information link, you can see other books that you might be interested in buying. I think there's a C book, there's a Unix BSD book, a Java book that's actually free, you don't have to buy it, and other books. Also, for the networking, database, and software engineering topics, that material's primarily gonna be limited to the lecture notes. So grading, everybody always wants to know about this. So the rough grade breakdown, and this may be subject to change, is that we will have two midterms; there's no final exam, so you don't have to worry about December 20th. 45% of your grade will be from the two midterms; that should be 22.5% each. Four projects, 50% of your grade. So half your grade comes from the projects, a little less than half comes from the midterms, and participation and quizzes make up the remaining 5% of your grade. There are four project phases in this course. The first project is a threads project using Nachos. The second project is a multiprogramming project using Nachos. And the third and fourth projects are key-value stores. First we start with a single-node key-value store, and then we make it distributed and look at how to do reliable updates. We treat you as adults in this class. So we give you four slip days to use for your code deadlines. A slip day is yours. You don't have to ask us if you want to use it. You don't have to justify its use. You can use it for whatever reason you want. You have four of them. There are four projects in this class. I recommend you use your slip days wisely. That means don't use all four on the first project.
If you do, and you do need additional slip days, it costs you 10% of your project grade. Slip days can only be used for code. They cannot be used for your design documents. Computing facilities: at the end of lecture, the TAs are gonna, oh, I'm sorry, question? Yes, so the question is, if you use a slip day for your project, do all of your partners have to also? This is shared fate. So the answer is yes. Choose your project partners wisely, very wisely. Your grade is a shared grade. That said, after each project, there's a group evaluation. And you have to say who was responsible for what parts of the project. We treat it as a zero-sum game. So if someone's not contributing to a project, their grade will go down, and people who contributed more to the project, at the end of the semester, their grade will go up. If we see a trend, it's not on an instantaneous basis, but across the projects, if we see a trend that someone's not contributing, their grade will suffer. Question? Yes, the question is, are the projects related? So the first two projects, absolutely, they build on each other, on Nachos. The second two projects, yes, they build on each other: you'll first implement a single-node key-value store, and then you'll extend that to make a distributed key-value store. So the question is, if we build a really awful single-node key-value store, will we be able to get a really awesome working distributed key-value store? Unfortunately, probably no. The same is true with the threads. If you don't get your threads working, multiprogramming is not going to work well. That said, in the more than a decade that I've been teaching this class, everybody gets it working. The TAs that we have are really good, and they'll help you get it working. But it starts with a good design, and that's why in this class, before you write any code, you're gonna start by doing a design.
And it's during the design process that hopefully we'll catch a lot of the algorithmic bugs that make life miserable if you implement it, and then it doesn't work, and you have to go back and figure out why it didn't work. So keep that in mind. When you're racing to write your code, start with your design first. Yes? The question is, will you be getting feedback on your grades for the projects? Yes. So with your design review, your TA will give you feedback on the quality of your design and any problems in your design. And then also with your code, you'll get feedback about your code. We use auto-graders, so you'll get some automated feedback, both during the development process, and then also after you submit your code, we'll run it against additional tests and give you additional feedback. Other questions? Really good questions. Okay, so computing facilities. So we use the instructional machines for submitting your work, and we also use them for our grade book; you can use glookup. So everybody needs to get an account form, and the TAs will hand those out at the end of lecture. They have a list of everybody who's in the class. So if you were just added to the class from the waitlist today, they won't have you on the list, because BearFacts is 24 hours behind. So you'll have to get it tomorrow instead. You can do all of your work on your own machines. You can set up your own nice little Eclipse environment and be perfectly happy on your desktop or laptop. You just need to actually do the submission on one of the instructional machines. Make sure you log into your account this week so we can create the grade book. You'll also need to log into your account before we can assign you to a group. So as I said, we work in group projects in this class. It's to simulate the rest of your life in 14 short weeks. This is really how it's gonna be in the real world. It's very unlikely you're gonna go off in the real world and work by yourself.
You're gonna work in a team, probably not of four or five people, but of four or five hundred people if you're at a large company. And so we wanna give you some of the experiences and the skills that you need to operate efficiently in a large project group working on a large software project. Communication is critical. We'll have groups that are people who have been best friends since elementary school, and by the end of the semester, they're worst enemies, because they don't talk and they don't tell people what they've done until 11:50 p.m. on the night that something is due. And that tends to make people really upset, let's just say, if you do that. So communication is really critical. Communicating with your project partners about what you've done, what you're expecting from them. People are not mind readers. So if you get behind on your part of the project, it's very important to tell everybody else, I'm behind and could use some help, because somebody else may have already finished and be waiting for your part. Better to let them know early than at 11:50. We're gonna make you document your work, because that's very important as part of the communication process. Then we're gonna make you communicate with a boss. In this case, your boss is your TA. You're gonna have to explain your decisions, your design, why you did something a particular way. And you'll get feedback, very critical feedback, about that, so you have to tell them what the plan is, what each team member is responsible for. The TAs are gonna be looking to see, do we have group dynamics problems? Like four of the people in this five-person group are doing all the work and not giving any of the work to the fifth person. And so you're gonna have to provide them with progress updates so they can keep track of what's going on. Okay, so signing up for projects. Look for the project sign-up link. That'll appear shortly, now that we actually have all of our sections scheduled on Telebears from the central campus.
Because, as I said, it's four to five people per group, everybody in a group has to attend the same section. Ignore Telebears; we're gonna do our own assignment process. Now, we're giving you plenty of time to form your groups. You have until Thursday of next week to find project partners. Everyone has to log in to their account before they can submit the project form. And it's ideal if you list at least three potential sections, minimum two. We're trying to load balance 184 students across all of our different sections. We don't wanna have 20 students in one section and 80 in another. So please provide as many potential sections as you can. The fewer you provide, the less likely we're gonna be able to schedule everybody into their top choice. Okay, we'll try and post your new section assignments next Saturday. It may take us a day or two to do some hand tuning. We use a little greedy algorithm that runs to try and give everybody their first choice, and then we have to go back and do some hand optimization of that. You'll start attending your newly assigned sections as of the 17th and the 18th. So, oh, this should have actually been filled in. I believe Tuesday one to two and two to three are the additional sections that have been allocated. So the 107 and 108 sections are Tuesdays, one to two and two to three p.m. It's updated on the website, and I'll update the lecture notes after lecture. Okay, were there any questions? Okay, so the format of this class is we'll go for about 45 minutes, and especially in a room nice and warm like this, I find attention spans tend to wane, and so we'll take a five-minute break. Just for your information, I will unfortunately be away next week, so John will be leading the lectures on his own next week. And also I'm heading to the airport right now, so he's going to finish the rest of this lecture. All right, everybody. All right, let's get started. All right, so we're going to resume in a minute.
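The "little greedy algorithm" for section assignment mentioned above can be sketched in a few lines. This is a toy illustration, not the staff's actual tool: group names, section numbers, and capacities are all invented, and each group simply gets its highest-ranked section that still has seats, with unplaced groups left for hand tuning.

```java
import java.util.*;

// Toy greedy section assignment: each group lists ranked section
// preferences; we give each group its highest-ranked section that
// still has capacity. Illustrative only -- not the staff's tool.
public class SectionAssigner {
    public static Map<String, Integer> assign(
            Map<String, List<Integer>> prefs,       // group -> ranked sections
            Map<Integer, Integer> capacity) {       // section -> seats left
        Map<String, Integer> assignment = new LinkedHashMap<>();
        for (Map.Entry<String, List<Integer>> e : prefs.entrySet()) {
            for (int section : e.getValue()) {
                if (capacity.getOrDefault(section, 0) > 0) {
                    capacity.put(section, capacity.get(section) - 1);
                    assignment.put(e.getKey(), section);
                    break;                          // take first choice with room
                }
            }
        }
        return assignment;                          // unplaced groups need hand tuning
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> prefs = new LinkedHashMap<>();
        prefs.put("groupA", Arrays.asList(101, 102));
        prefs.put("groupB", Arrays.asList(101, 103));
        Map<Integer, Integer> cap = new HashMap<>();
        cap.put(101, 1); cap.put(102, 1); cap.put(103, 1);
        System.out.println(assign(prefs, cap)); // {groupA=101, groupB=103}
    }
}
```

Greedy order matters here, which is exactly why the staff follows it with hand optimization: a group processed early can take the last seat a later group needed.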
Right before we do that, we have one announcement from the Blueprint group. Hi guys, I'm Brian. I'm from Blueprint, as are Michelle and Stephanie, who are in this class as well. We are a student organization on campus, and we very closely follow the 169 class. We provide technology for non-profit organizations. We work with mobile and web technologies, and you'll work with a single client throughout the semester on some kind of web or mobile project. That's a really fast overview. For any more information, you can come to one of our two info sessions: tomorrow at 7 p.m. in HP Auditorium, or, if you can't make that, on Monday in 220 Wheeler. There are flyers in the back already, and if you want any more, I have extras. Thank you. All right, let's continue. All right, so Anthony gave you a high-level view of some of the issues with operating systems design and some of the differences in scale. We'll look at those briefly again right now, and we'll take some of these challenges and talk at a high level about how we try to address them. So we have challenges of scale, of size of machine, number of machines, and heterogeneity, meaning differences in type of architecture. These days we have devices connected to the internet that go all the way from very tiny medical devices, cell phones of course, and appliances, through to extremely powerful cloud servers. So they're running different operating systems, working at very different rates, and so on. We have to find ways for all of those things to cooperate effectively and be robust. So the CPUs in very small devices are typically single CPUs. Modern graphics processors are in the hundreds, in fact the thousands, of cores these days, about 3,000 for the current Nvidia Titan. For clusters, the practical limit right now is about 10,000 machines in a single cluster.
People are trying to push that a lot further, and it requires some fairly substantial redesign of the way the services and naming services are built. Let's see, the network itself has a wide range of speeds and latencies. So latencies, the delay it takes packets to propagate, go from nanoseconds for Ethernet, moving at nearly the speed of light along a wire, up to satellites or planetary explorers, which are in the seconds or even minutes. Earth-orbiting satellites are generally less than that, but communication in deep space is even more. So anyway, you get the idea of the range of things. Storage is one of those elements of computing that's still growing exponentially. Disks are in the many terabytes per disk now; smaller flash memory devices are only megabits, so you've got six orders of magnitude there, and finally access times also span about six orders of magnitude. All right, if you compare that with the diversity in automobiles, there is a big range in power between consumer vehicles and high-performance vehicles like a Bugatti Veyron or something similar. High-performance cars are in the 500 to 1,000 horsepower range; that's still only a 20x difference. The speed is about a fourfold difference, from 100 kilometers per hour up to around 400, which is about 250 miles an hour. There's a large range of weights as well. Still, all of these numbers are much smaller than the diversity you see in computing systems. All right, there's a challenge of complexity, because user applications are trying to do a lot of different things. They're dealing with a user interface, they're dealing with data sources, and they're often dealing more directly these days with devices like sensors and novel kinds of input devices: the touchscreen, 3D sensing like the Kinect, and so on.
So we need ways of taking those very complicated peripheral devices and providing an interface to the programmer that's fairly simple and manageable. Along with the complexity come additional vulnerabilities of security and failure. Security meaning somebody else getting into your system and doing something bad, like hijacking a family's web camera and scaring their infant, as recently happened. And failure, which can cause a variety of disasters. Luckily, we've actually been pretty lucky in information technology; there have been some serious failures, but they haven't cost that many lives. Anyway, if you look at graphs of the rate of threats to computer systems, it's growing very fast, and so we have to be ever more vigilant about defending against those. So systems get more challenging; it's harder to anticipate all of the things that can go wrong. And so it becomes extremely important to use a design philosophy that's in a certain way conservative and is likely to lead to a safe system. Okay, all right. And you can see graphically here the growth in the complexity of core operating systems. And along with that growth, the rate of compromises and fixes and patches to operating systems has generally grown as well. It doesn't quite track this graph; it did for a while, and it's luckily tapered off after an extremely high number of compromises at the peak about two years ago. Things are getting a bit more stable now. Part of that is that the evolution of Windows has slowed down a little bit, and also people are getting a better handle on some of the highly vulnerable sub-components like Flash and Java. But you can see things are enormously complicated: 50 million lines of source code for an operating system.
Far more than most people need, but it's there. It's built in to provide flexibility, extensibility, and often a lot of glittery services, multimedia services that make the product look nice and make an impressive demo in the store where the PC is being sold. Unfortunately, there's a cost in complexity and maintainability. Okay. So these days most computers communicate within the computer through main memory. Most devices are memory mapped: they have a certain amount of the memory address space devoted to the device, and that simplifies the communication. The CPU normally has access to everything; particular devices have access to a subset. But it allows the CPU, in a fairly simple way, to communicate with the device, saying, okay, put some data in this place, come back to me, maybe raise an interrupt when you're done. When that happens, the CPU can drop down and read the data under its own control. So it's a simple model, and it's used in virtually all modern operating systems. All right, so let's look at an interesting example of these challenges made very concrete, which is the Mars explorer named Pathfinder. That project was launched back in '96. It was mostly successful, but it had a fairly famous failure along the way. All right, so first of all, some of you may not remember, because you were about five, but back then this kind of processor was actually pretty mediocre for the time. Pentiums were already around, and processors were actually about an order of magnitude faster than this. This is actually a characteristic trait of NASA, the aviation industry, and also the military, which is that the computing infrastructure is often several generations behind. Does anyone know why that is, typically? Yeah, it's very stable. They have an enormously complicated system that can't fail, and they co-design the computing infrastructure with all of the mechanical pieces and put it through all kinds of tests.
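The memory-mapped device model described above can be sketched as a small simulation: device registers live at fixed offsets in a shared address space, the CPU issues a command by writing to a register, and the device signals completion through a status register (the analogue of an interrupt). The register layout, the command code, and the sensor value are all invented for illustration; real MMIO happens in hardware, not in a Java array.

```java
// Simulation of memory-mapped I/O: device registers are fixed offsets
// in a shared "memory" array. The CPU writes a command register; the
// device fills the data register and raises a status flag (standing
// in for an interrupt) when it's done. Layout is invented.
public class MmioSim {
    static final int CMD = 0, STATUS = 1, DATA = 2;   // register offsets
    final int[] mem = new int[16];                    // shared address space

    void deviceStep() {                               // what the device does
        if (mem[CMD] == 1) {                          // command 1: "read sensor"
            mem[DATA] = 42;                           // pretend sensor value
            mem[STATUS] = 1;                          // done: raise the flag
            mem[CMD] = 0;
        }
    }

    int cpuRead() {                                   // what the CPU does
        mem[STATUS] = 0;
        mem[CMD] = 1;                                 // issue command via memory
        while (mem[STATUS] == 0) deviceStep();        // poll (or take an interrupt)
        return mem[DATA];                             // pick up the result
    }

    public static void main(String[] args) {
        System.out.println(new MmioSim().cpuRead()); // prints 42
    }
}
```

The point is that both sides speak the same language, loads and stores to agreed-upon addresses, which is why the model stays simple across wildly different devices.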
You have to keep the validated system the same, or all the other parts would have to change. It's extremely difficult to upgrade in the middle of the project, and the project itself often runs for many years. There are also often some advantages in terms of resistance to radiation from the older, larger feature-size processes. But it was a real computer anyway. It ran a real-time operating system called VxWorks. So real-time, and we'll talk a lot more about that, but real-time just means that the operating system is typically mostly interrupt-driven. It has various timer interrupts as well, and it's designed in a way to be able to respond very quickly to most sensor events, most device events. And it also typically has some fail-safe features. Explicitly, in this one, if one of the critical threads was blocked for too long, it would reset the whole processor and reboot the software, and that's exactly what it did when it failed. Luckily though, because of the design, it was able to recover from that failure. Oh, by the way, when you have generations-old hardware, this was a big problem for the space shuttles, because they had generations-old hardware when they launched the first ones, and then they ran for about two decades. So believe it or not, it's true, they actually did it. Where would you buy hardware that's like 30 years old? What's that? Any other ideas? eBay, all right. So NASA apparently did a lot of shopping on eBay for shuttle computer infrastructure. All right, so part of the function of the real-time OS is that you can't hit the remote button easily: it's between a four-minute and a 20-minute one-way trip for the signal. So if you're about to hit a rock, it doesn't help. The system has to be smart enough to recognize the threats. It also has to be able to reboot if necessary. I mean, one of the things that might get frozen is the network stack, so it can't receive messages anyway.
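The fail-safe described above, reboot if a critical thread is blocked too long, is a watchdog pattern. Here is a minimal sketch of just the decision logic, not VxWorks's actual mechanism: the critical thread stamps a heartbeat, and the watchdog compares the latest heartbeat against a timeout. Times are passed in explicitly so the behavior is deterministic; the timeout value is invented.

```java
// Sketch of a watchdog fail-safe: a critical thread records a
// heartbeat timestamp; the watchdog declares a fault when the
// heartbeat goes stale. "Reboot" is just a boolean here.
public class Watchdog {
    private long lastHeartbeat;

    Watchdog(long startMs) { lastHeartbeat = startMs; }

    void heartbeat(long nowMs) { lastHeartbeat = nowMs; }

    // True when the critical thread has been silent too long --
    // standing in for "reset the processor and reboot the software".
    boolean shouldReboot(long nowMs, long timeoutMs) {
        return nowMs - lastHeartbeat > timeoutMs;
    }

    public static void main(String[] args) {
        Watchdog w = new Watchdog(0);
        w.heartbeat(40);
        System.out.println(w.shouldReboot(60, 50));   // false: recent heartbeat
        System.out.println(w.shouldReboot(200, 50));  // true: blocked too long
    }
}
```

Real watchdogs are often hardware timers that the software must keep resetting, so even a wedged kernel can't suppress the reboot.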
But it was programmed to check all of the processes, and if one of the critical ones died, it would restart. Yeah, and in particular, you want a restart in place in case the communication process dies. There were many instruments and many different processes: navigation processes, telemetry and so on, and communication. So you want those things to keep running. And if there's a bug in one, hopefully not, but they still had to protect the different processes from each other. And antenna positioning, obviously: if it didn't have the antenna pointed the right way, it wouldn't be able to communicate either. So that's an important one too. So the software, as it turned out, did crash. And fortunately, it did recover, it did restart, and it was doing fine the next day. But it does raise a challenge, which is often a challenge especially with real-time embedded systems, which is how do you diagnose a fault when it happens? Obviously not with GDB, with a 20-minute round trip. That's worse than the instructional machines during a project deadline, right? So no, you need something that will work a bit better remotely. What they actually did with this is that they had a full working mock-up of all the computing infrastructure and the telemetry on Earth, which was able to dump the complete data, because they couldn't dump it directly from Mars. And so they actually were able to reproduce the fault and discover what happened. In this case, it was a priority inversion problem, a rather complex interaction between three processes which froze a high-priority process: it was trying to get a resource that some low-priority process had and wouldn't give up. So luckily, the system was designed to tolerate that, and it successfully recovered. So that's, in a nutshell, a whole bunch of the issues that we care about in systems design. All right, let's see.
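The priority inversion just described has a simple shape: a low-priority task holds a shared resource while a high-priority task blocks waiting for it; on the rover, a medium-priority task then kept the CPU busy so the low task never ran long enough to release the lock. The sketch below shows only that structure, with invented task bodies and timings. Java thread priorities are just hints to the scheduler, so this does not reproduce the starvation itself, and the classic fix (priority inheritance, which VxWorks supports as a mutex option) happens inside the OS, not in application code.

```java
import java.util.concurrent.locks.ReentrantLock;

// Shape of the Pathfinder bug: a low-priority task holds a shared
// resource while a high-priority task blocks waiting for it.
// Thread priorities here are only scheduler hints.
public class PriorityInversion {
    static final ReentrantLock busResource = new ReentrantLock();

    static String demo() {
        Thread low = new Thread(() -> {
            busResource.lock();                 // low-priority task grabs the bus
            try { Thread.sleep(100); }          // ...and holds it for a while
            catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            finally { busResource.unlock(); }
        });
        low.setPriority(Thread.MIN_PRIORITY);

        Thread high = new Thread(() -> {
            busResource.lock();                 // high-priority task must wait
            busResource.unlock();
        });
        high.setPriority(Thread.MAX_PRIORITY);

        try {
            low.start();
            Thread.sleep(10);                   // let low acquire the lock first
            high.start();                       // high now blocks behind low
            low.join();
            high.join();
        } catch (InterruptedException e) { return "interrupted"; }
        return "both finished; high had to wait for low";
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

With priority inheritance, the low task would temporarily run at the high task's priority while holding the lock, so the medium task could no longer starve it.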
Yes, and the time-critical aspect is somewhat specific to this real-time system, but it also recurs in other systems. So what are some techniques for dealing with this kind of complexity robustly? We already mentioned the diversity in the CPUs, memory, devices, and so on. Let's see, so what are the pros and cons of writing a single program to perform all of these different activities? I'm just rephrasing this question slightly, because this is asking, do you really need to write this? You've got to write something. So what are the trade-offs between writing one program versus many different programs? So imagine you wrote the Pathfinder control software as one program, which they sometimes did for other space vehicles. What are some disadvantages of doing that? So yeah, it's monolithic, so, exactly, there may be interactions between the different pieces; even if it's got some modular character, it's hard to control unnecessary interactions. Any others? Any advantages to doing it this way? Yeah, some things get easier. Communication is often easier, because you can control the order in which things happen. Most of the time, things are faster as well, especially if you have extremely tight loops. A third thing is that you can often anticipate a little bit better what kinds of state the system will get into, because they'll be explicit in some kind of flowchart or state machine that you've built. So there are some advantages to this. The offsetting disadvantage, though, is that typically the complexity explodes so much that it's intractable to really get this to work. So the vast majority of real-time systems do use an operating system that basically allows you to separate the software that's running the telemetry from the software doing the navigation, from the software managing the batteries and solar panels, and so on. All of those things are separate modules.
They all interact in real-time with the hardware, but the OS sits in the middle and tries to negotiate between the components so that everybody gets what they need without too much delay and without causing a problem. Yeah. All right. So the second question is, what if the software is hardware-aware? What are the pros and cons of having hardware-aware software? Yes. Yeah, that's an almost perfect answer; I should just say a perfect answer. The two main advantages are that you can fully exploit the hardware, communicating as efficiently and as fast as possible, and you can use the full functionality of that specific device. The disadvantage is that if the device gets changed, you lose that flexibility, and it's much more difficult to take that system and evolve it as there are upgrades to the hardware. Okay, last question. All right, a faulty program can crash everything. So that's the last criticism of this single-program approach, which is that if you have a single program and it goes awry, goes wrong, there's no simple way for the operating system or anything else to recover and restart your program somewhere else. At least with multiple different application programs, the OS can say, well, this has gone wrong, but I'll keep running my other processes and try to recover. Yeah. Wow, all right, that's a complex, almost a marketing question, in the sense that the only reason I can think of is that there are advantages to hardware and software vendors in forcing people to upgrade the OS and making the experience progressively slower. I mean, there are suspicions that that's happened between Intel and Microsoft; I think that's only suspicion, though. It's more realistic that software inevitably becomes more complex and slow. In terms of hardware forcing people to upgrade, I don't know, I can't honestly think of something like that.
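The isolation argument above, a faulty module shouldn't take the others down, can be sketched as a tiny supervisor: the "OS" runs each task behind a boundary, and when one throws (crashes), it restarts just that task while the others keep running. Task names and the fault are invented; real operating systems enforce this boundary with hardware protection between processes, not a try/catch.

```java
import java.util.*;

// Sketch of process isolation: run each "process" behind a boundary,
// restart only the one that crashes. A toy stand-in for what the OS
// does with real hardware-enforced processes.
public class Supervisor {
    static String runIsolated(String name, Runnable task, Runnable fallback) {
        try {
            task.run();
            return name + ": ok";
        } catch (RuntimeException crash) {
            fallback.run();                      // restart / recovery path
            return name + ": crashed, restarted";
        }
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        log.add(runIsolated("telemetry", () -> { }, () -> { }));
        log.add(runIsolated("navigation",
                () -> { throw new RuntimeException("bug"); },  // faulty module
                () -> { }));
        log.add(runIsolated("comms", () -> { }, () -> { }));
        System.out.println(log);
        // [telemetry: ok, navigation: crashed, restarted, comms: ok]
    }
}
```

In the single-program design there is no such boundary: the navigation bug corrupts shared state, and telemetry and comms go down with it.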
I think Apple at various times has tried to upgrade hardware in a way that would produce a better experience. They pushed FireWire for a long time, and they kind of pushed it away from what the PC industry could do, to give them a competitive advantage, but that's not really forcing something to change in the OS. Okay. So a pretty central concept in modern operating system design is the virtual machine, and they come in two main flavors, which I'll get to in a second. So a virtual machine is an emulation of an abstract machine. It's one machine putting on the skin of another and appearing to be something else. It also gives users the experience that they own the whole machine. In other words, they have a machine to themselves, even if they're physically sharing the machine. If it's a machine in the cloud, they have the experience that they're sitting on one physical machine, even though it may be only a third or a quarter or, in Amazon's case, some other odd fraction of the real machine. Okay, so there are two types of virtual machine. A system virtual machine is a full emulation of an operating system and machine, such as Unix or Windows or Mac OS, typically running on a physical machine. The main vendors are VMware and Parallels Desktop, and there's a fairly large open source project called Xen. So you install one of these packages on your machine, copy a virtual machine image from somewhere, and all of a sudden you feel like you're running that machine. You may be running a Mac emulation on a PC, or vice versa, and with some of these systems, like Xen, you can actually flip between several different machine images quite fast and have the experience of shuttling between physical machines. This ability to basically create an emulation has also been a key part of the Unix world for a long time.
A process VM is a more specialized virtual machine. Canonically, the Java virtual machine is the example most people talk about. It implements an abstract virtual machine, this Java byte code interpreter, and provides an even higher-level abstraction of the hardware than the operating system does. So Java tries to give you the same virtual machine experience across Windows, Mac, Linux, and so on. The implementation is a lot simpler, because the process virtual machine is just an application program running on the native operating system. Okay, so what the process VM does is allow you to simplify the program: each process thinks it has all of the memory and CPU time. Normally all of the devices appear available to the user as well. You can simplify the interface to the different devices; people don't need to know what display driver they're running. And the device interfaces for things like mice and pointers are normally much more abstract than the physical mouse or pointer device. In terms of networking, you can also provide higher-level primitives; in Java you have serialization, the ability to stream objects very easily across the network. A critical element of the virtual machine framework is to isolate problems. The basic thing is that you isolate these different processes, allow failure, provide a lot of freedom to each process to access virtual devices, but provide the protection that the other instances of the virtual machine will keep running. And critically, you don't want any of those processes to be able to crash the physical machine underneath. And these systems do a pretty good job of that, typically not perfectly, but about 90%. Okay. All right, so here's an example of the system virtual machine layout for several virtual machines running on top of Linux. So up to this line here, it looks like a normal operating system.
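The serialization primitive mentioned above, streaming objects across the network, looks like this in Java. The round trip below goes through a byte array instead of a real socket, and the `Point` class is just an invented example; the same `ObjectOutputStream`/`ObjectInputStream` pair works over any stream, including a network connection.

```java
import java.io.*;

// Java object serialization: turn an object into a byte stream (as
// you would to send it over the network) and rebuild a copy from it.
public class SerializeDemo {
    static class Point implements Serializable {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    // Serialize to bytes and read back -- a network send in miniature.
    static Point roundTrip(Point p) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(p);                      // "send"
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Point) in.readObject();          // "receive" a copy
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Point copy = roundTrip(new Point(3, 4));
        System.out.println(copy.x + "," + copy.y);       // prints 3,4
    }
}
```

This is exactly the kind of higher-level primitive a process VM can offer uniformly across operating systems, because the JVM, not the host OS, defines the object and stream formats.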
There's Linux, which is a kernel plus various system processes running, and then there's the physical hardware sitting below it. Sitting above can be emulations of Windows XP, NT, or a Unix emulation. From the user's perspective, they see what appears to be a standard desktop in one of those systems. All right, so that's the system virtual machine. Just take a minute: are there any questions about that concept? So those are the basic virtual machines; they're a key element of cloud computing. I'm just curious, how many people have run a virtual machine? All right, good. I mean, in a sense you should answer yes if you've ever run a Java program. All right, so the system that we'll be using in this class is called Nachos: Not Another Completely Heuristic Operating System. It's a self-contained operating system specifically designed and engineered for teaching operating system concepts. This particular flavor of Nachos is Java-based, somewhat paradoxically. So you will actually write some low-level code, which you compile into MIPS assembly, which will call up into the operating system. It's almost an upside-down version of an OS. But having the actual system kernel and code in Java makes it a lot easier to understand how the system works, and it allows you to focus on what it's doing rather than how it does it. It's a pretty nice system, pretty easy to get your head around. Much easier than thousands of lines of C code in a typical kernel. All right, so just to capture these ideas around operating systems, what they really do, what they're for: one metaphor that the textbook uses is that the operating system is the government. And in their frame of mind, they feel that governments don't really do anything but control users. But what are some good things that governments do, that all of you people I hope can think of? Well, all right, what are good things? What do people do? Okay, what do governments do with the taxes?
Yeah, okay, good. Bridges. Well, wait, slow down. Why is that a good thing? What about public education? All right. They're from Ivy League schools, so they forgot that there's public education. Your parents mostly rescued you from that experience. Anyway, I actually like this metaphor, because there are some significant things that governments do for you that you couldn't do yourself. One of them is public education. One of them is infrastructure, like bridges, like roads, and so on. It's impossible for most of us to get 100 machines to run a big calculation on. Yet with virtualization, cloud operating systems allow us, for a modest fee, to start a bunch of virtual machines up, even a fairly large cluster, run them for a while, and give them back. Okay, you know, it's kind of like how you can use bridges and freeways: you don't need to use them all the time, and you share them with other people. But anyway, it's more than just collecting taxes and telling you what to do. They kind of spoil it with two-hour speeches, but still, those key infrastructure projects are a good metaphor for what OSes do. There are other metaphors, like the traffic cop: regulating the behavior of different processes, settling conflicts over resources, and preventing errors. I think those are quite useful as well. And in fact, some operating systems are, what's the word, I guess, stateful, and they actually penalize and reward processes for errant behavior, so they do have a sort of memory as far as the behavior of your process goes. The facilitator is another metaphor, which is essentially helping people get their job done, providing libraries that are convenient. Those millions of lines of code are potentially useful libraries, most of which you won't use, but every once in a while they can make your life much, much easier. All right, okay. And so, you know, some elements of the operating system fulfill both roles.
The file system is needed by everybody: you need facilitation to access it, because it's quite complex, and you also need the traffic cop function to make sure that your accesses to the same file are exclusive and correctly ordered. Okay, so finally, in practical terms, here are some of the almost universal elements of an operating system. Up to, say, email, most of these would be considered a natural part of the operating system. Okay, what about the file system? I think most people would say yes these days. Once upon a time, the file system and the network were considered separate, and in fact operating systems would sometimes support several different file systems, NTFS and others. UNIX does that to some extent, but now they're typically more heavily integrated. Multimedia is an interesting boundary case, because it's so challenging in terms of real-time performance. It typically does require some real-time hooks to be built fairly deeply into the OS, so it's hard to disaggregate that one. User interfaces, any thoughts? What's that? Yeah, I mean, there are clearly different philosophies. In UNIX, it's almost completely separate; you can run UNIX without a user interface at all. In Windows, it's extremely tightly coupled, so there are really different versions of this. All right, is this only interesting to academics? Who else gets very excited about this sort of thing? Yes, well, specifically, who at Microsoft? The lawyers, right? The lawyers and the marketing people, because they would like to incorporate essential elements, especially those that are related to monetizing, like the browser, maybe like multimedia players, so that they can better monetize them. But anyway, normally the top part is considered the boundary of the OS. All right. So, remember, we've got to pick up the account forms, so just wait one minute; we're gonna hand them out, literally, in one minute.
So, you can consider the operating system to be everything that ships in the OS package or on the packaged machine. Operating systems, though, almost universally have a single process that's always running, called the kernel. So, if there is a heart of the operating system, that's what it is. In Windows and in Linux, you'll typically see hundreds of processes running that form the operating system in the large, but there's almost always a single one running that's the core of the OS. Okay, so here's a summary of what we talked about today. I just wanna make sure we have enough time to hand out the account forms, because we're right at seven. So what I'd like you to do, and it will go a little bit faster if we go roughly alphabetically, is this: the GSIs are gonna stand at the front of the room with the account forms, and if you could come up in alphabetical order, roughly A to E, F to M, N to P, and then Q to Z over there, that'll help us. Yeah, sorry, questions, sorry. Last names, yes, last names, it's by last name.