Hi, so it's hump day, I guess, Wednesday. Just want to point something out, if you guys haven't noticed, which is that it's really nice outside. One of your enemies this semester, when you're working on the assignments, you can imagine that the world is against you. Your kernel will crash. Clearly, it's our fault. Or something will go wrong, and the one time we rerun the test on your kernel, it won't work, or whatever. But you have a new enemy this semester, which is the weather. The weather is normally kind of an enemy, but it normally doesn't really start to kick in until April, when it gets really nice outside. And you think, I've been inside all winter, and I don't want to write kernel code indoors anymore. I want to go outside and play. But the problem is it's 60 degrees now. So I would suggest you guys finish all of the assignments by next week, because then you can enjoy the weather. OK, anyway, just wanted to point that out. Yeah, all the assignments, including the ones we haven't released yet. We started talking about fork last time. Today, we're going to finish talking about fork. And then, rather than slog through the process-related system call interface, what we're going to do is stop and talk about synchronization. One of the reasons to do this is that Assignment 1, which we'll release later this week, does a lot with synchronization. So it's helpful to talk about it now and do some examples in class, so that when you get to Assignment 1 you have some idea what's going on. This is one change we've made to try to synchronize the course content a little better with the assignments. So we'll talk about fork. And to some degree, this is not a bad place to stop and talk about synchronization, because fork is one of the things that creates synchronization problems on computer systems. It's not the only thing, uh-oh. It's always one thing you forget. But it's one of them.
Are you going to come back up? No. One more try. It's so weird. Like it wants to, but it's not quite there. Oh, there it is. OK. Thanks, Chrome. I don't know why it wants to be over there. OK, done. Makes me want to listen to that song again, but we won't. OK, so just to announce, OK. I lost my train of thought, clearly, but we're going to introduce you to synchronization today. So just quick announcements, please. If you're working on Assignment 0, it's clear that people are, which is awesome. How many, well, anyway, I'll ask these questions in the forum later, to get a sense of where people are and whether people are stuck on some particular thing. People have been in office hours, which has been great. OK, so I have a couple questions about the setup this year. Is Discourse working out for people? Are people happy with that? Seems to work. Please add a picture and your actual name to your profile, so I don't have to look at your UBIT name and try to guess what your name might be, because that's kind of a fun thing to do, and I have better things to do. So set up your real name, or something you want to be known as, on the forums, I guess. OK, what about the screencasts? Does anyone watch those? OK, some people have seen them. Yeah, are they useful? You might want to turn them on in, like, HD, because otherwise it's kind of blurry. I've never done this before. Next time I'm going to make the fonts a lot bigger, but right now the type's a little small. And the last thing, I put something up on Discourse. It occurred to me that Stack Overflow would be, like, a great tool to provide for you guys to use for the class. Does anyone think that would be nice, like a Stack Overflow-type forum? So Discourse is kind of nice, but it's like these conversations or whatever. And Stack Overflow has this benefit where the good answers sort of float to the top, right?
It's not Piazza, where there's like one answer and you guys have to sort of collaborate to create it, which is ridiculous. But yeah, so I think it would work. The problem, anyway, we'll talk about this more on the forum. The problem is, I was thinking, oh, this is great, I'll just install the Stack Overflow code, it won't be a problem. It's not that simple, right? There's some vetting process that we have to go through. But I think if we cooperate, we can probably get through it. Yeah. Screencasts are on the assignment. They are on the assignment. So if you reload the assignment, they're just embedded into it. Yeah, so you get to hear my surprisingly whiny voice. I didn't realize how whiny my voice was until I started listening to it. I was like, oh, terrible. I don't know what to do about that. Probably take some lessons or something. I can start talking in an artificially deep voice. OK, so any questions about the material we covered on Monday? So Monday we got through talking about fork, what fork does. Kind of an important moment in the history of your process, the moment it's created from something else. Any questions about fork? Just a few slides of review. So remember, this is the updated process model that we got to, where we introduced a level of indirection so that we could share some things between the parent and child after fork, right? And we had these three levels of indirection, from file descriptor to handle to object, right? Again, this is something you guys will know a lot better after Assignment 2. And that allowed us to share a couple pieces of state separately. So the file table, which the file descriptors are indexes into, is private to the process. The file handles, on the other hand, can be shared after fork between the parent and child. They don't have to be. And the child doesn't have to use the file handles that are shared with its parent.
If the child process wants to disown its parent, doesn't want to communicate with it, doesn't care about it, it can just close everything in its file table and move on with its life, right? There's nothing that requires it to share file handles with its parent, but it can. And then the low-level file objects hold state that's used by the kernel; that's file-system-specific state. That stuff is shared sort of transparently between multiple processes, and it's managed by the underlying file system layer. Processes don't even have to worry about it. OK. So remember that fork copies the state of the calling thread. Fork copies, by default, the calling thread, the address space, and the process file table. And so after fork, my child has whatever thread called fork, an identical address space to the parent, and an identical file table. That means the file table points at the same file handles that the parent had open when it called fork. OK. So fork returns twice, remember? It returns zero once. What's the other thing it returns? The process ID of the process that was just created. That's what's returned to the parent, and this is how you distinguish between whether you're the parent or the child when you use the fork call. OK. So any questions about fork before we plunge on? Yeah. So with fork, you said that it returns zero if it's the parent, or the process ID of the parent? Can you explain again what fork returns? Ah, OK. Good question. So fork returns two values. If fork returns zero, I am what? The child. If fork returns, OK. So I just want to point out one thing, it's good to be accurate about this: this is on success. If fork fails, then fork will return some negative value, I think it's negative 1, and it sets errno, and that's done by the C library for you. So this is on success. On success, fork returns zero to the child, and to the parent it returns what? The child's process ID.
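To pin down the two-return behavior, here's a small sketch in C, assuming a Unix-like system. The function name and the exit status 42 are made up for the example; the waitpid call at the end is the wait mechanism we're about to discuss, used here just to collect the child's status.

```c
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Sketch: distinguish parent and child by fork's return value.
   Returns the child's exit status, as seen by the parent. */
int fork_demo(void) {
    pid_t pid = fork();
    if (pid < 0) {               /* failure: no child was created, errno says why */
        perror("fork");
        return -1;
    }
    if (pid == 0) {              /* fork returned 0: we are the child */
        _exit(42);               /* exit with a recognizable status */
    }
    /* fork returned a positive value: we are the parent, and that
       value is the child's process ID. */
    int status;
    waitpid(pid, &status, 0);    /* reap the child using its PID */
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Note that both processes run the same code after fork; the if on the return value is the only thing steering them down different paths.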
And this becomes important because there is a system call that the parent can use where it has to identify its child by the child's process ID, specifically wait. So if I want to wait for my child to exit, I need to pass the child's process ID to the wait system call, and that's how the parent identifies it. Does that make sense? Cool. All right, any other questions about fork? Fork returns twice. I think that's so cool. It's just weird to think about. All right, so now let's go back and look at one cool way that we can use fork. And this is a very common use of fork: to create these process pipelines that we talked about last week. So a canonical way of programming on Linux and Unix-type systems is to create pipelines of processes, where the standard output from one process is passed into the next process's standard input, and that's done without the processes really having to do anything. Again, some of you guys are so used to this because you've been doing it, but it's a really nice model. It's a very nice way of being able to build up more complicated programs from fairly simple programs. OK, so another system call, another piece of functionality that the kernel provides to processes, is something called pipe. The pipe system call creates an anonymous pipe object. Now, there are ways to create what are called named pipes. Has anyone ever used a named pipe before? That's when you know something is really bad, right? Like, when you have to start using named pipes, there's just something about the world that's not right. So a named pipe is like a file, but it's actually a pipe. There are no contents to the file. It just allows you to communicate between two processes in a pipe-like way. But by calling pipe, what I get is an anonymous pipe. So it acts like a file: what pipe returns is a file descriptor, essentially, that can be passed to read and write and other things.
And actually it returns two file descriptors, because a pipe has two ends. So on one end I write, and on the other end I read. And I think people have corrected me and I haven't corrected the slides: I think Unix pipes are now bidirectional. So technically there are two ends, but there's no read end or write end. If you read from one end, you will get the contents that were written to the other end, and vice versa. Does that make sense? So you put things in one direction and they come out the other direction. So ignore the write-only label on the slide. I think you can set up a one-way pipe, but I think there are also bidirectional pipes. So anything written to the write-only end is immediately available at the read-only end of the pipe. And to allow processes to use this effectively, the kernel will buffer a certain amount of information in memory. So when I write to a pipe, that write can complete before the next process calls read. And this is interesting, because this is not necessarily something that's always true of this type of communication primitive. For example, Go provides channels that don't have this feature by default: if I write to a channel, the other side has to read it before the write will complete. The buffering allows the two processes to act more independently. OK, so why is this useful? Well, let's look at how we can use fork and pipe to set up a simple pipeline between two processes. And this is essentially what's done by the shell every time you run a shell pipeline. So here are the steps. First, I call fork. But before I call fork, I create a pipe object. So now both the parent and the child have this pipe object. Remember, pipe creates two file descriptors. So those file descriptors are in my file table, and so they're shared after I call fork. So the parent creates the pipe. It's got two entries in its file table now. It forks. The child has those same two entries, and they point to the same pipe.
And now I have to do this little delicate dance, where the parent has to close one end of the pipe and the child has to close the other. And what will be left over is a sort of one-way pipe that leads from the parent to the child. So it allows the parent to write data that the child can then read. And the nice thing is these look exactly like files. So if I want to take input from a file, I can do that easily. If I want to take input from this pipe as part of a shell pipeline, I can also do that very easily. Happily, I have a diagram here, because this would be hard to understand otherwise. So here's my pipe. And let's just imagine I have a read end and a write end. The parent has opened this, so the parent's file table is updated with references to this pipe. Now I fork. So this is what the child has: the child has these same two file descriptors, which point to the same file handles that identify each end of the pipe. And now I just need to make sure that the parent and child close the right ends. And here's what's left over. So I have this nice anonymous buffer that the kernel is maintaining in memory that carries data from the parent to the child. Are there any questions about this? Yeah. So I don't have to, right? If I left it like this, this would actually allow both the parent and child to communicate data to each other, right? So the parent could send data to the child, and the child could send data back to the parent. If you think about how most Unix command pipelines are structured, and my example here is sort of based on that, they're one-directional, right? I start with one command, I pipe its output into the next command, which pipes its output into the next command, right? The stuff doesn't flow upstream. But no, as far as I know, there's no reason why you couldn't leave both ends open, allow both the parent and child to write and read data from each other, and just use this as a form of IPC. Yeah, Ron.
Yeah, I don't think so. I think the semantics of a pipe are that when I write to end A, it emerges at end B, right? And if I write to end B, it emerges at end A, right? So there's no loopback here. I'm confused. Yep, which end? And then you've designated which end to write to. So that's the point, right? I mean, the data will flow, right? So if I write to this end, it will be readable on this end. If I write to this end, it'll be readable on this end. If I write to this end and try to read from the same end of the pipe, I'm not gonna get the data that I just wrote. If I don't close those up? Is that true? Ah, that's right, okay. Now, you're totally right. So this is going back to your point. You guys have successfully flummoxed me. So that's completely true. What the two questioners have pointed out is that if I don't close those ends properly, if the parent wrote here and read there, it would get back the data it just wrote, right? That's how the pipe works. So in order to allow this to work, I do need to make sure that each process closes one end of the pipe. Now, coming back to your question about whether I can use this to exchange data both ways: yes, right? Because remember, there doesn't have to be a write-only end and a read-only end. I can read or write. This is a file descriptor, right? So it's like any other file. I can read and write from it. So for example, let's say this is what I have right now: if the parent writes to this end, the child will read it here. If the child writes to its end, the parent will be able to read it there, right? So this allows me to exchange data both ways. Yeah, sorry, that's... Which end is the read end? So again, this is a legacy of my slides. There's no read end and write end, right? The only thing that's critical is that the parent and child close different ends of the pipe, right? So for example, the parent has to set things up so that when the child runs, it knows which end to close, right?
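The pipe-fork-close dance we've been diagramming can be sketched like this. A caveat: this sketch follows the POSIX convention, where fds[0] is opened for reading and fds[1] for writing, rather than the bidirectional behavior discussed above; the function name and message are made up for the example.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/wait.h>

/* Sketch of a one-way pipe from parent to child.
   Returns 0 if the child saw exactly the bytes the parent sent. */
int pipe_demo(const char *msg) {
    int fds[2];                     /* fds[0]: read end, fds[1]: write end (POSIX) */
    if (pipe(fds) < 0) return -1;   /* both ends now sit in my file table */

    pid_t pid = fork();             /* child inherits both ends */
    if (pid < 0) return -1;

    if (pid == 0) {                 /* child: the reader */
        close(fds[1]);              /* close the end the parent will write to */
        char buf[128] = {0};
        ssize_t n = read(fds[0], buf, sizeof(buf) - 1);
        close(fds[0]);
        _exit(n > 0 && strcmp(buf, msg) == 0 ? 0 : 1);
    }

    close(fds[0]);                  /* parent: the writer, closes the read end */
    write(fds[1], msg, strlen(msg) + 1);
    close(fds[1]);                  /* closing signals end-of-data to the child */

    int status;
    waitpid(pid, &status, 0);
    return WIFEXITED(status) && WEXITSTATUS(status) == 0 ? 0 : -1;
}
```

Notice that each process closes the end it doesn't use; that's exactly the "each process closes one end" requirement from the discussion above.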
Because, for example, if the child and parent close the same end, let's say the child closes the same end as the parent, then the pipe essentially isn't attached anymore, right? It's a good question, yeah. What sort of information would be sent through the pipe? So it could be anything, right? I mean, this is just an unstructured stream of bytes, right? Typically you guys are probably used to programs that send text through pipes like this, right? Standard Unix commands output text. It doesn't have to be text, though. As long as the parent and child agree on the format of the data, it's just like a file, right? It's a stream of bytes. How you interpret those bytes is totally up to you, right? Yeah, good question, though. Any other questions about this? Okay, so here's an example, and I think this could almost work. I run, ooh, sorry, thanks. So I run fork right here, and now I've got two processes, right? So who's executing here? The child, because my return code is zero. So I close one end of the pipe, right? Pipe essentially returns two file descriptors into this array, and now I can read some data. And the parent is gonna close the other end and write some data, okay? And this is unstructured data, right? Unstructured character data that's being exchanged between the two. Okay. All right, so before we talk about synchronization, I just wanna point out a couple of problems that you guys might have observed with fork. Or maybe you guys can help me observe some of these problems. So fork ends up having to copy a lot of state, right? In particular, what is probably the most expensive thing that fork has to copy when it forks and creates a new copy of the parent? What do you think creates the most overhead? Yeah. Yeah, the address space, right? Remember, when the child starts executing, its memory contents should be identical to the parent's, but is that memory shared with the parent?
Better not be, right? Yeah, so when the child begins to execute, its address space is supposed to be identical, but the contents are supposed to be distinct. So if the parent forks, and the parent and child both write to the same location in memory, those two writes should end up in different locations in physical memory. We'll come back to this when we talk about virtual memory, but yeah. So copying the address space is quite expensive. Why is this particularly problematic? So remember, the only call I've given you so far is fork, which creates a new process. But how interesting would your machine be if all you could do was create more copies of the first program that ran, right? You'd just have lots of them. It would be like a more broken version of those old Apple computers, right? Like, you can create more copies of the same game that you started the computer with, but you can't create anything else, right? So clearly there's a way for processes to change. When does that frequently happen? Yeah, well, before... yeah, I mean, it's in between fork and exit, right? So if you think about what your shell is doing, your shell is repeatedly calling fork, and then trying to execute a command that represents the thing that you typed. That's what it does over and over and over again. It calls fork, and then it tries to execute that command. So here's what fork does: I copy all sorts of state, I've copied all this memory, and then the first thing the child does is say, I don't want that memory anymore. Because one of the things that happens when you run exec is that exec completely destroys the address space and loads a new program. So for example, when I forked the shell, all that shell code that got copied is completely replaced by whatever the shell is trying to execute. So this ends up being pretty wasteful.
Does this problem make sense to people? So fork spends all this time copying all the contents, all the code, for bash, right? That was one of the things bash had in its address space when we looked at it using pmap. And then it's trying to run /bin/true. So then /bin/true says, oh, by the way, I don't want any of that code anymore. I'm /bin/true, I want my own code, and I have to wipe all that stuff out and start over. So there are a couple of different ways that systems have worked around this problem. One is a very clever memory management trick that you can play, and we'll come back to this when we talk about VM, because it's a really neat way to convince yourself that you understand how virtual memory works. It turns out that I can actually set up the parent and child so that they do share memory after fork, but it's safe, because if they try to modify that memory, I make a copy. This is something that's called copy-on-write. Now, the common case here is the shell forks, and the first thing it does is run exec, so it doesn't actually modify any of those shared pages, and therefore there's no overhead, right? The other thing I can do: there's this system call that was introduced at some point called vfork, right? And vfork is the variant of fork that you use when the first thing you're going to do is call exec. And so what vfork will do is, unless you immediately call exec, it'll fail, right? Because, and you can kind of imagine what vfork didn't do, right? vfork is like fork's lazy cousin, right? So what did vfork not do? It didn't copy the address space, right? Or didn't copy very much of it, right? So the point is, if I tried to start doing anything that would require that address space, it would say: no, no, no, you can't do that, I didn't set you up with an address space that allows you to do it.
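The fork-then-exec pattern the shell repeats for every command can be sketched like this, using /bin/true as the program since it was the example above. The function name is made up, and I'm assuming /bin/true exists at that path, which it does on typical Linux systems.

```c
#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

/* Sketch of the shell's core loop body: fork a child, have the child
   replace itself with a new program via exec, and wait for the result. */
int run_true(void) {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        /* exec wipes out this address space and loads /bin/true;
           on success it never returns. */
        execl("/bin/true", "true", (char *)NULL);
        _exit(127);                 /* only reached if exec failed */
    }
    int status;
    waitpid(pid, &status, 0);       /* parent collects the exit status */
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

All the memory copied by fork here is thrown away the instant the child calls exec, which is precisely the waste that copy-on-write and vfork are designed to avoid.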
Okay, yeah, and then, as I pointed out, over the years, as the system call API on Linux and Unix systems has evolved, you've seen a lot more fork-like variants. So clone is something that you can use now to create threads within your own address space, as well as new processes, and clone gives you a lot more control, right? It allows you to determine which parts of the parent and child are shared. So for example, if I want to specify an entry point for the new process or the new thread that I'm creating, I can do that using clone, so they don't have to start executing in the same place. So it's a lot more flexible. This is something that came along a bit later. Yeah, and if you look up the man page for clone, or any of these system calls actually, that's a good way to find out more about them, more than you would ever want to know, right? Okay, so one last little game with a standard Unix utility. So fork establishes this relationship between a parent and a child process, and this starts right when the system boots. And there's no real way for the child to change that. The child can choose to ignore its parent, as many of you have potentially chosen to do, and you can choose how much you want to pay attention to your parent, how many things you want to share with your parent, but that relationship doesn't go away. Your parent's your parent. And so if you run this utility on a system, maybe inside your Vagrant virtual box, this is kind of a neat thing that will show you the tree that's essentially been generated by all the forks that have taken place and the resulting execs that happened later, right? So by looking at the output, you should be able to tell what is the parent or grandparent or great-great-grandparent of every process on the system: init. And you may wonder, where does init come from? I mean, it's like a chicken-and-egg problem, right? How do I get a process before I have a process?
So init is set up specially by the kernel, right? Once your kernel finishes booting, it sets up, and this is not true of your OS/161 kernel, but it is true of real kernels, it sets up the first process by hand, and then that process is responsible for creating everything else, right? So, in this case, what are most of the things that init created? If you guys are sort of familiar with Unix or Linux servers, yeah. Yeah, so these are daemons, right? I mean, you guys may have experience with /etc/init.d, right? That's where all your init scripts are located. That's what init does, right? Init is responsible for starting things like the SSH daemon that allows you to log into the machine, web servers... I had git-daemon running on this machine, which provides public access to Git repositories. This is a mail server. All sorts of stuff that's running on the machine, right? Once these services get started, then potentially you have the ability to log in. Again, I've logged in over SSH, and you can see that pstree, of course, like any process, has to report itself. So it's in this tree: its parent is bash, and bash's parent is sshd, and I don't know why sshd... maybe sshd forked off a couple of copies of itself to accept new connections, right? All right, any questions about fork? Yeah. Oh, good question, okay. So what are some of the things that you think could cause fork to fail? What's an obvious one? Yeah, if I'm out of memory, if I don't have enough memory on the system to actually perform the fork, sure. What else? Yeah, so hopefully, if you ran a fork-bomb-like experiment, I don't think this will happen in your virtual machine, but hopefully if you ran it on a well-maintained system, what will happen is that your user cannot create an unlimited number of processes, and that's for obvious reasons.
So what would happen is, eventually, that fork bomb would start to fail, because there's a limit that's been applied to the number of processes, and you start to hit that. Now, the problem with a fork bomb is that it will still generate so much system call traffic that it can still be a pain. It's still chewing up a lot of cycles and creating a lot of overhead, because it's kind of like a denial-of-service attack. I mean, even an unsuccessful connection to a website consumes resources, right? But yeah, so fork might fail for that reason, or you might have given it bad arguments, right? I mean, there are all sorts of reasons that fork can fail. So if you want to find out all the different reasons that fork could fail, what would be a good way to do that? What's that? Looking at the source code. Okay, so that would be an awesome way to do it if you're in an operating systems class and/or have a lot of free time, right? So I would certainly encourage you to do that. By all means, go read the source code for Linux fork. I mean, it's out there, it's readable, maybe. What's potentially a little bit more of an expedited way of doing it? Yeah, what's that? Oh, wow, that's worse, actually. So he's gonna hook up a debugger to Linux, which is also an awesome thing to do, and something that is totally possible. And fun, right? But I will see you in a couple of days, yeah. What's that? Look up the error codes. Look up the error codes, and where would I do that? Yeah, man fork, right? Exactly. And actually, this would be very interesting: if you look at man fork, it'll show you the error codes, right? It may not show all of them, but it'll probably show some of the common errors and what causes them. Great question. Any other questions before we go on? Okay. All right, so now, like I said, we're sort of jumping ahead a little bit.
We're getting out of the flow of talking about system calls, but this is a good moment to talk about a problem that we've created with this whole fork idea, which is that we now have multiple things running on the system at the same time. And this is normally a really good thing. So I want to give you a primer. We're gonna come back and talk about threads in a week, maybe a little more. But what you need to know to start thinking about synchronization, and particularly to start working on Assignment 1, is this idea: until very recently, concurrency was always an illusion, right? Why is that? Yeah. Yeah, so until relatively recently in the history of computing, machines had one core, and that core did all the processing on that machine. And then, right around 2000, I think, you started to see the first multiprocessor machines, right? Now, that's actually an important distinction. So today's processors have multiple processing cores on the same die. The early multiprocessing machines had two processors in them. So you've got one processor that did one thing, and the way that you got multiple things to run at the same time is you just bought more of those, right? So if any of you guys have put together a machine, imagine that you'd buy, like, four Core i7s or something like that, right? Which you would never do, because A, I don't think you could get a motherboard for that, and B, you would be out several thousand dollars, which is probably more than you wanted to spend. So yeah, those machines actually had two processors, but at that point concurrency went from being an illusion to being real, at least to some degree, right?
Despite that, your computer, unless you have a gazillion cores, probably still provides you with the illusion that there's a lot more going on than it can actually do at the same time, right? At this point, you know, here we are in 2016, and your smartphone now has four cores. So concurrency on modern systems is very real, but the illusion of concurrency is this idea that the system can be in the process of doing a lot more things than it can actually be doing at once, even if the number of things it can actually do at once is greater than one, right? So threads are the abstraction that I use to multiplex the CPU, and I think you guys are sort of familiar with the idea that there can be multiple threads running, either within the same process or across multiple processes on the system. Once I have two processes, I have at least two threads, right? It's kind of an interesting idea: what would a process be like without a thread? Just a blob of memory, I guess, right? Kind of boring. Okay, so now, how many people have done a lot of multi-threaded programming in the past and feel really comfortable with some of these concepts? Okay, not enough of you guys. So let me just rant for a minute here: concurrency is super, super important for you guys to understand, and it's not just because of multi-core. It's also because concurrency allows you to make even single-core machines go way faster, because your computer has slow parts and really fast parts. So give me an example of a slow part. The disk. The disk is super slow. Give me an example of a really fast part. The processor, right? I mean, that thing is zipping along, and the disk, okay, so disks have gotten better, but we're still talking, like, Earth to Pluto, right, in terms of the distance between how fast your processor is and how fast the disk is.
And so if you can't write concurrent software, every time you have to do something with the disk, your process is going to sit there for, like, decades; ages will pass, the universe will cool and warm and cool again, before you get done, all right? So being able to write code that is concurrent, which is also not quite the same as parallelism, we'll get to that in a minute, is super important today. So if I were you, living in 2016, I would say: I need to learn how to write concurrent software right now, okay? And this class is going to help, all right? And that's the biggest thing today, right? You also have multiple cores out there. So you're telling me: hey, I wanna work as a software developer for you; you have a four-core smartphone; I'm sorry, I can only write software that runs on one of those cores. What are you gonna do with the other three, right? I don't know what the phones do with the other three, to be honest with you, and I study this stuff. But what's hard about writing concurrent software, at least on some level, is this: a lot of times, until you've written software in this style, you're used to a model of programming which is step by step by step. Here's how my program runs: it starts in main, and then it runs the next line, and the next line, and the next line, and maybe there's an if statement, so maybe it goes here or there, and maybe I call a function, so it goes there and comes back. But you've gotten used to thinking about your program as doing one thing at a time. Once you unleash the hounds of concurrency, all sorts of things start to happen at once, and it requires a lot more coordination to make sure that things happen correctly. And so the challenges when writing concurrent software boil down to a couple of things, right?
One is coordination, which is essentially: how do I split up what I'm trying to do so that I can do it efficiently and make use of all the system resources, okay? This is an interesting problem and something we will talk about; it's something operating systems have a little bit to do with, in terms of how they schedule resources. The second thing is correctness, and this is what people usually struggle with more when they start to write concurrent software. Correctness problems manifest themselves in a bunch of different ways. One is the system crashes. That's actually pretty good, because it's easy to detect. The other possibility is that something doesn't happen properly, or everything stops happening, right? And your software just sort of sits there. So concurrency creates a lot of issues, and what we're gonna talk about for the next couple of lectures are primitives that enable correct, efficient concurrent code, all right? We're gonna focus mostly on correctness, because that's the most frustrating thing to get right, but we will come back to coordination, all right? Now, you might be thinking: why are we talking about this in this class? Well, there are two reasons. One is it's almost too late, right? You guys should really learn this stuff before you graduate. The second reason is that the operating system is kind of important when it comes to concurrency. So why? Yeah, right, exactly. Remember, the operating system is a program. That program provides an interface that's used by processes. How many processes are there on the system? More than one, right? Hopefully a lot, on a good system. So the operating system is inherently concurrent. It has to be able to handle a bunch of different requests from a bunch of different processes, or the threads inside those processes, at the same time.
So you can look at certain user programs and say, well, this program doesn't really need to be concurrent; concurrency is just a performance optimization. For the OS, though, it's inherent: a non-concurrent operating system can only support one process, right? And if you look at the development of Linux or other systems, one of the things the people who develop those systems spend a huge amount of time worrying about is this idea that they might have to stop everything to do something. In Linux, for years, there was something called the big kernel lock, right? The big kernel lock stopped everything, and part of the goal was to reduce use of it. Because if I have to stop everything in order to handle one type of request, all sorts of processes that are trying to do things get slowed down, okay? All right. And a lot of times, even internally, the operating system has a lot of threads. There are certain threads created in the operating system to respond to requests from user programs, but the operating system itself, in order to do things in the background, frequently spawns off a bunch of threads. So even when it's not handling requests from processes, the OS is multi-threaded, okay? And then, as Ron pointed out, there's lots of shared state, right? All the shared state that records stuff about the file system, all the resources that I'm multiplexing, whether it's the CPU cores or memory, all of that has to be tracked in the operating system. And negotiating access to it between multiple processes requires a lot of synchronization, okay? And the final reason this is super important is that if I get it wrong, it's bad, right? If I get it wrong, it's the blue screen of death, or the system gets really slow and unresponsive, whatever. If a user program messes up concurrency, that's okay: it'll just crash, and we'll just restart it, right? No problem. If the OS crashes, it's bad, right?
A lot of things get affected. Okay. So probably the most coherent statement of a distinction I'm not sure I even really understood at first: how many people have heard of Go? Oh, that's awesome. Where did you guys hear about Go? Google, yeah, okay. So Go is Google's new programming language. How many people have programmed in Go? Oh, okay. So Go is awesome, you guys should learn it. Go is a language that's really designed to let programmers unlock concurrency. And there's a whole talk by this guy named Rob Pike about concurrency versus parallelism in Go, and I think it's an important distinction, so let me point it out. What he's getting at is that parallelism means multiple things are actually happening at one time. Without multiple cores, for example, I can't really have parallelism; it's not physically possible. Whereas concurrency is, as they put it, about dealing with multiple things at one time. So, for example, you could have an operating system that was concurrently handling a bunch of requests from a bunch of different processes despite the fact that it was only running on one core. What that means is that it's switching back and forth between things and trying to make sure everything makes forward progress, right? The lecture slides will be up soon and they will have a link to this video; I would encourage you to watch it, it's very cool. And the language itself is extremely cool, right? If you like C code, and of course you will love C code by the time you're done with this class, and you want a modern version of C that isn't terrible, Go is it. It fixes all sorts of things about C; it's just really nice. Okay, so those are the things that are good about concurrency; that's sort of why we have to do it.
Concurrency is an incredibly powerful tool for making things perform better and unlocking all these system resources that you have, right? But concurrency creates problems, and this goes back to your mental model of how your program executes, okay? When you start having multiple threads, or in the case of the operating system, multiple processes, what you need to keep in mind is that those threads in general can be run in any order, can be stopped and restarted at an arbitrary moment in time, and once they're stopped, can remain stopped for arbitrarily long periods of time, okay? And again, the normal reason this happens is that the operating system, or the thread scheduler for your application, is trying to make good use of resources by switching back and forth between things and allowing a lot of things to happen at once, right? But these realities create challenges that make it hard to get things right, okay? So let me see if I have time to get through this example before we're done. Oh yes, okay, perfect. All right, so this is probably the canonical example of a synchronization problem, and it's designed to make you care about it because it involves money. Maybe I should have put more dollar signs or more zeros on the end of the dollar amounts or something, but this is me, okay? I care about a thousand dollars, right? Okay, so here is the code that you wrote, or that I wrote, you can blame me if you want, to update the balance in this particular piece of banking software. What does this do? It calls a function that retrieves the balance of my account, updates that balance by adding the amount you're depositing, and then writes the value back. And then there's this little thing down here that, say, sends me a little text message letting me know that you've deposited money, right? So let's imagine that this is what's happening, right?
So one of you guys is giving me a thousand dollars, right? This is like B-level material, right? I said I had an unlimited number of A's, so if I have an unlimited number of A's, then in theory I can charge whatever I want for them, right? Or maybe in theory that means the price for them is driven to zero, right? I don't know, anyway. And one of you guys is depositing $2,000, okay? That's better, not perfect, you know? So what can happen here? Now remember, this code looks fine, right? And until you have multiple threads, this code is not a problem. The problem is that once you start having multiple threads, this code can go wrong in some unfortunate ways, all right? So, just to jump to the punchline here: how many different values can my balance have after this code runs? Two? Someone wants to say two. Do I hear three? All right, do I hear four? Okay, let's see. All right, so here's the best-case scenario, right? And this is pretty much what would happen if you didn't have multiple threads. Say this is not a multi-threaded application and the two calls to this function proceed one after the other. The A student deposits the $2,000; now I've got $3,000. The B student deposits $1,000; now I have $4,000, okay? So I'm good. This is what we want to happen. This is the version in which no money is inexplicably destroyed, right? An invariant here should be that there's the same amount of money before as after; it just moves around, right? Okay, so a different option. What happens if the A student gets to this point in the code? Remember, one of our assumptions was that your code can be stopped at any point in time. And it turns out this is the point at which the OS chooses to stop your program and run something else, and the thing it runs is the other deposit. So now what's going to happen?
So now the B student runs, and this is the point of the code they're at, and, actually, let's say we do the B student's deposit first, right? So now I've got $2,000. What's my final balance gonna be in this example? $3,000. So you guys deposited $3,000 into an account that had $1,000 in it, and the result was $3,000. $1,000 has gone missing. Okay, now what else can happen? What's the worst-case scenario? I get caught for selling grades, I guess. Let's say that doesn't happen; this is just slightly worse, right? The A student gets started, the B student gets started. Now what happens? A puts the balance, okay? But then B puts the balance. Oops, right? This one's funnier, I guess, because the balance actually went down, right? But it's equally wrong: now there's $2,000 that has gone missing, right? And look, like I said, this piece of code works fine in a single-threaded environment. If this code is never executed concurrently, there's nothing wrong with it. But once you start executing it concurrently, and once you have to worry about all the different ways two threads can interleave with each other, this code has a problem, right? So, my last statement for today: this is something you guys may have heard of, called a race condition. When the output of a program depends on some detail of how the threads are executed, in terms of what order they run in, that's what we refer to as a race. This is not a feature, this is a bug, okay? And on Friday we will talk about how to fix it. I'll see you guys then.