All right everybody, welcome back to the third lecture of the virtual CS162. Tonight we're going to dive right into some material and try to give you a programmer's viewpoint of it. For the first several lectures we're basically going to talk about what you as a user-level programmer might see from the operating system, before we get really in depth into how the operating system provides that view. So those are our goals for today: we're going to talk about threads, what they are and what they aren't and why they're useful; we'll give you some examples; and we'll also talk about how to write programs with threads, and some alternatives to using threads.

If you remember, last lecture we talked about four fundamental concepts, where the focus was really on virtualizing the CPU. One of them was the thread, which is essentially an execution context; it fully describes an execution state, with program counter, registers, execution flags, stack, etc. We talked about an address space, which is the set of memory addresses that are visible to a thread or a program; we'll talk more about that. We also talked about a process. I did see some questions on Piazza asking "is something a thread or a process?" That's the wrong question. As I responded there, a process is basically a protected address space with one or more threads in it. So you typically talk about a process with threads, and the question of whether it's just a thread or a process usually involves protection. The final thing we finished up with was an essential element of modern operating systems, which is hardware that can do dual-mode operation and protection. That really boils down to there being two states, a kernel state and a user state, where certain operations are available only in kernel state, and as a result the kernel is able to provide a protected environment.

Again, recalling from last time, we talked about how to take a single processor or single core, which is pretty much what we'll assume for the next several lectures, and give the illusion of multiple virtual cores. The programmer's viewpoint is going to be that there's a set of CPUs all talking through a shared memory, even though there's only one actual CPU. We're going to think of a thread as a virtual core, and multiple threads are achieved essentially by multiplexing the hardware in time. We talked briefly about this idea of three virtual CPUs executing on the single CPU by loading and storing registers from memory as we go: we load magenta's registers and run for a while, then the cyan ones, then the yellow ones, and so on. A thread is executing on a processor when it's actually resident in the processor's registers, and it's idle or asleep when it's not. We'll talk a lot more about the states of a thread once we get into the internals of the operating system, but each virtual core or thread has a program counter, a stack pointer, and some registers, both integer and floating point in most cases. And the question of where it is: well, it's either on the real physical core, or it's saved in a chunk of memory we call the thread control block, and the difference between those two is whether it's actually running right now or in a suspended state.
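To make that concrete, here's a minimal sketch of what a thread control block might hold. This is a simplified, hypothetical layout for illustration, not Pintos's or Linux's actual structure; real kernels keep considerably more state.

    #include <stdint.h>

    /* A simplified, hypothetical thread control block (TCB).
       When the thread is running, this state lives in the CPU's registers;
       when it's suspended, the kernel saves it here. */
    typedef enum { RUNNING, READY, BLOCKED } thread_state_t;

    struct tcb {
        uint64_t regs[32];      /* saved integer (and, in practice, FP) registers */
        uint64_t pc;            /* saved program counter                          */
        uint64_t sp;            /* saved stack pointer                            */
        uint64_t flags;         /* saved execution flags                          */
        thread_state_t state;   /* running, ready, or blocked                     */
        struct tcb *next;       /* link on the ready queue or a wait queue        */
    };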
The other thing we talked about was the idea of an address space, which is the set of all addresses that are available to a given processor or thread at a given time. We talked about how 32-bit addresses give you about 4 billion addresses, and 64 bits gives you about 18 quintillion, on the order of 10 to the 18th. Some of you may know 10 to the 18th as an exabyte. That's the general idea of an address space. What's more interesting for our purposes is the virtual address space, which is the processor's view of memory, where the address space the processor sees is independent of the actual physical memory. In most cases that involves some explicit translation: the processor puts out virtual addresses, and they go through some translation to get to physical addresses. For the purposes of this particular lecture, here's a mental model you can use throughout: translation through a page table. Again, we said 61C did talk about that, and we'll talk about it in a lot more depth. Here we have two virtual programs, or threads, or however you want to think of them, and they're operating in their address spaces with code and data and heap and stack and all of those things. What happens is that addresses come out of the processor and go through a translation map; how that works, we don't really care about for the moment. After translation they become physical addresses, so the code of this blue thread or process gets translated into this particular chunk of physical memory, whereas the green code gets translated into this other chunk. Now, the question in the chat here is: are virtual addresses handled by the OS or by the CPU hardware? And the answer is yes. In reality, these little translation maps are usually part of something called a memory management unit, in hardware, but the operating system is responsible for configuring them by setting up something called a page table, and we'll talk a lot about that, so don't worry if you don't remember the details; in fact, 61C hardly talked about it. For now, imagine there's a translation map that takes the addresses from virtual processor one and turns them into the blue ones, and takes the addresses from virtual processor two and turns them into the green ones. Notice that if we set up the translation right, blue can't even touch anything that green is addressing, because there's no way for an address in blue's address space to get transformed into something that addresses the green physical memory. Notice also that I have the operating system down here, which is kept completely separate as well. So this simple idea of translation gives us quite a bit of protection. All right, are there any questions on that? Are we good? And let's try to keep the chat to actual questions, so we don't lose questions from people. So keep this mental image of page translation and how it protects green from blue, and the operating system from both of them. The question here about where we store the page table: that's actually going to be stored in the operating system itself, in a way that is not addressable by either green or blue. And the reason parts of user space can't address all of physical memory is exactly what you see here: take all the addresses that are possible from the blue side, and they simply don't translate to anything that is green or white, which prevents the blue processor from even addressing green or white memory.
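Here's a tiny sketch of that mental model, just the lookup logic. This is a toy; it's nothing like a real MMU or a real page-table format, and the sizes are made up for illustration.

    #include <stdbool.h>
    #include <stdint.h>

    #define PAGE_SIZE 4096u
    #define NUM_PAGES 256u            /* toy address-space size */

    /* One translation map per process: virtual page -> physical frame.
       An invalid entry means "this virtual address simply doesn't exist
       for you", which is exactly why blue can never name green's memory. */
    struct page_table {
        uint32_t frame[NUM_PAGES];
        bool     valid[NUM_PAGES];
    };

    /* Returns true and fills *paddr on success; false means a fault,
       and the OS (not the user program) decides what happens next. */
    bool translate(struct page_table *pt, uint32_t vaddr, uint32_t *paddr) {
        uint32_t vpn    = vaddr / PAGE_SIZE;   /* virtual page number */
        uint32_t offset = vaddr % PAGE_SIZE;   /* offset within page  */
        if (vpn >= NUM_PAGES || !pt->valid[vpn])
            return false;                      /* page / protection fault */
        *paddr = pt->frame[vpn] * PAGE_SIZE + offset;
        return true;
    }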
Okay, we'll talk more about this as we go; let's move on. Now, if you remember, we then talked about processes. Again, a process is basically a protected environment with one or more threads; here's one with a single thread. The Pintos projects you're going to be dealing with essentially have a single thread per process, but real operating systems, or I'll say more sophisticated ones, can have multiple threads. So a process is really an execution environment with restricted rights: one or more threads executing in a protected address space, which owns a chunk of memory, some file descriptors, and some network connections. A process is an instance of a running program; if you have the same program running twice, it'll be running in two different processes. And why do we have processes as an idea? Because those two processes are protected from one another, and the OS is protected from both of them. So this idea of the process is really the essential protection idea that we'll be talking about in the early part of this class. In a modern OS, pretty much anything that runs outside of the kernel runs in a process, for now.

The last thing we talked about, as I mentioned, was dual-mode operation, where processes execute in user mode and the kernel executes in kernel mode. But as folks on the chat have pointed out a couple of times: think about this translation for a moment. If blue is able to alter its own page tables, then all bets are off for protection, right? If blue can alter the translation so that some virtual address which was previously not valid can somehow map to green or white, then all the protection is broken. So we really need some way to make sure that user code can't alter those page tables, among other things, and that's where dual-mode operation comes into play. Processes running in user mode are running with a bit in the processor that says user mode, and the kernel runs when that bit says kernel mode, and it's only when you happen to be in kernel mode that you can modify things like page tables. We'll get into much more detail on that as we go.

Now, this is still not quite enough, because you have to make sure that user-mode apps can't just randomly flip themselves into kernel mode and execute anything they want; otherwise what's the point, right? So we talked briefly about very carefully controlled transitions between user mode and kernel mode. Those carefully controlled transitions make sure that the only way to go from user mode to kernel mode is when doing things that the writer of the kernel supports. And putting things in kernel mode is typically done only with extreme care, because things running in kernel mode have control over all of the hardware, so typically only the operating system developer puts things in kernel mode. We'll talk about some variants of that that are a little different as we get further into the course, but for now, pretty much only the developer of the operating system puts things in kernel mode. And here's an example of something we'll talk about today: here's the user process running in user mode, with the mode bit at one, making what we're going to call a system call, which goes into the kernel, executes some special function, and then returns to user mode.
That system call is very restricted. Yes, it flips the mode bit, to a zero in this case, saying that we're in kernel mode, but it only does that if the code you're calling is one of a very small number of entry points. An example might be opening a file, or, as we'll talk about today, starting a new thread or a new process, so this could be a fork system call. That dual-mode operation involves this extreme restriction.

So what are threads? A thread is a single unique execution context; we talked about that. It provides an abstraction: a single execution sequence that represents a separately schedulable task, and that's also a valid definition. Threads are a mechanism for concurrency, and we're going to talk a lot about that: because of threads, you can have multiple simultaneous things that overlap each other, and that can be very helpful. And protection is completely orthogonal. So again, the question "is this a thread or a process?" is the wrong question: the process is the protected environment, and the threads run inside of it; that process includes, for instance, an address space plus a translation map through a page table. By the way, there's a question about the mode bit: let's just say that a mode bit equal to one is user mode and zero is kernel mode, but this is completely dependent on the particular piece of hardware, and in fact on x86 there are even more than just two options. For now: there's user mode and there's kernel mode; that's what we need to remember.

Now, what are threads for? Protection is this orthogonal concept, but let's dive into the motivation for why we even bother with threads. And yes, I'll say one other thing, since the topic is coming up in the chat: one way things get added to the kernel is device drivers, and we mentioned last time that those are typically weak points in reliability. Device drivers get added only if you're a supervisor and you've made a decision that you're willing to add this to the kernel and risk that driver. What I mean by protection being orthogonal, again, is that the process is the protected environment and the thread is the execution context; those are different things, and a process has one or more threads in it. For now, processes contain their own threads and don't access other processes' threads except through communication mechanisms.

So what's our motivation for threads? Operating systems, as you can imagine, need to handle multiple things at once: processes, interrupts, background system maintenance, keystrokes, me moving the mouse around, things being drawn on the screen. There are multiple things at once, MTAO; by the way, I made that acronym up, but we're going to use it for this lecture. Operating systems need to handle MTAO, and how do we do that? We do it with threads. Some examples: network servers have to handle multiple things at once, because many network connections come in at once. Parallel programs, by definition, need to do multiple things at once if you have a bunch of CPUs and want to run something in parallel, and threads could be a way to do that; when you talked about parallelism in the 61 series, one of the ways to do it was with threads.
Now, programs with user interfaces invariably need MTAO. That would be, again, like I said, mouse movement, the keyboard, the microphone if you have a voice interface, things getting drawn on the screen; these are all independent things that can happen, and having threads available to let them happen in parallel is important, and threads are going to make them really easy to program. Network- and disk-bound programs also have to handle MTAO, because you have to hide network and disk latency. The question on the chat: as I mentioned, multiple things at once is a term I just made up; it's up top here. The point is that if you're waiting for something to come off the disk or from the network, you want to have a thread that's just sitting there waiting without blocking everybody else, while another thread does something else. And let's keep the chat down to things we're actively talking about; the question of how processes communicate with each other is a much more interesting, extended one, so don't worry, we will get to that. Not today, but we'll get to it.

So threads are basically our unit of concurrency provided by the operating system, and each thread can represent one thing, one task, one idea, one chunk of code. That's going to be our model in this lecture. Let's define some terms you've heard thrown around as you've come up learning about computers. Multiprocessing is sometimes used when there are multiple CPUs or cores acting together on the same task. Multiprogramming is something similar: multiple jobs or processes, not necessarily running simultaneously. The distinction between processing and programming sometimes gets at parallelism versus concurrency. Multithreading is just multiple threads in a process. So what does it mean to run two threads concurrently? I know in the 61 series they tried to get at this idea of concurrency versus parallelism, but let's take another stab at it. What it means for things to run concurrently is that the scheduler is free to run the threads in any order and any interleaving; a thread may run to completion, or be time-sliced in big chunks or small chunks or whatever. So concurrently means overlapping, with no control over how that overlapping goes.

Here are some examples. Here's multiprocessing, where we have A, B, and C as threads, and because, let's say, there are three cores in this system, all three of them are actually running at the same time; so not only are A, B, and C concurrent, they're also parallel. Here's a different view, where we have the same three threads but only one core or processor. In this instance we can't actually have things running simultaneously, so one thing that could happen is A runs a while, then B, then C, where we're actually running A to the end, then B to the end, then C to the end. Or we could interleave them: A runs a while, B runs a while, C runs a while, then A, then B, then C, then B, etc. And notice that these two options could happen interchangeably on the same system, depending on what the scheduler does, or on whether other running processes are using up cores; maybe if you have enough things running you get this interleaving, and if you only have one thing running you get multiprocessing.
And the very important thing to note here is that the moment we move into this idea of concurrency, we have to design for correctness. We can no longer just throw up our hands, write a bunch of code, and hope it works, because any code we write has to work regardless of what the scheduler decides to do with the interleaving. Let me say that again: the moment we have more than one thread in a concurrent system, we have to start thinking about correctness. You could skip thinking about correctness, just write a bunch of stuff and keep changing it until it sort of looks like it works, and I guarantee that is a bad idea, because it will stop working at three in the morning. Or you can design for correctness, with the proper locking schemes or parallelism constructs or whatever, and we'll talk a lot about that as we go; then you can be sure that no matter what the scheduler throws at you, your code will do the right thing. Questions? We're going to try to teach you how to design for correctness; that's going to be our goal. And again, the difference between multithreading and multiprogramming is perhaps somewhat historical: multiprogramming came up in the days of the original Unix systems, where there was only one thread per process, so a process had a single thread of control and an address space associated with it. Multithreading comes up in the era where you can have more than one thread per process. So multiprogramming might be one thread per process; multithreading might be more. We'll talk about the advantages in a bit, so hold on to that question.

So: concurrency is not parallelism. Look here, this is parallelism: A, B, and C running together at the same time. This is not parallelism, but all of these are concurrency; they're the possibility for overlap. Concurrency is about MTAO, multiple things at once; parallelism is about doing multiple things simultaneously. Simultaneously means that if I were to take a slice across here and look at a given cycle on that multicore processor, I would see an instruction from A and an instruction from B and an instruction from C all running at the same time, whereas if I have only one core, I see only green, or pink, or blue. So for example: two threads on a single-core system are executing concurrently but not in parallel; each thread handles or manages a separate thing or task, but those tasks are not necessarily executing simultaneously. And I'm not actually talking about Amdahl's law, which got brought up in the chat, because Amdahl's law is about the ability, when you have parallelism, to actually use it successfully. If you notice here, green might run a little bit and then have to wait for pink and blue to finish before it can do anything; by Amdahl's law this might be very poor, because the serial section is large. We'll talk about parallelism a bit more as we go on.

Now here's a silly example for threads. Remember my favorite number, pi. Here's a program where main computes pi to the last digit and then prints the class list. So what's the behavior here, anybody? First of all, this is going to run forever, until we unplug it or hit control-C or something. What about the class list? Yeah, the class list will never get printed. So this particular instance is an example where running the first task to completion before the second means the second one never runs; there's a small sketch below of this version and of the threaded fix we're about to discuss.
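Here's a hedged sketch of both versions. The worker names ComputePiToFile and PrintClassListToFile, and their file-name arguments, are hypothetical stand-ins for whatever the slide actually used.

    #include <pthread.h>

    /* Hypothetical workers, matching the lecture's pseudocode. */
    void *ComputePiToFile(void *arg) {
        for (;;) { /* compute the next digit, append it to (char *)arg */ }
        return NULL;                     /* never reached */
    }
    void *PrintClassListToFile(void *arg) {
        /* print each student to (char *)arg */
        return NULL;
    }

    int main(void) {
        /* Version 1: one thread. The first call never returns,
           so the class list never gets printed:
             ComputePiToFile("pi.txt");
             PrintClassListToFile("classlist.txt");          */

        /* Version 2: two threads. Now the scheduler is free to
           interleave them, so both make progress. */
        pthread_t t1, t2;
        pthread_create(&t1, NULL, ComputePiToFile,      "pi.txt");
        pthread_create(&t2, NULL, PrintClassListToFile, "classlist.txt");
        pthread_join(t2, NULL);
        pthread_join(t1, NULL);   /* never returns, in this silly example */
        return 0;
    }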
Furthermore, if you think about it, we have not told the system that it can interleave these, because we haven't introduced any threads: this is a process with one thread, and all it can do is first run compute-pi and then run print-class-list. So using threads correctly starts with giving the system notice of what can actually run concurrently, and then the scheduler can start doing different things for you. For instance, here we could add some threads. create_thread here is just a general abstraction for however you create threads in your system, but if this somehow creates a thread computing pi with argument pi.text, and this somehow creates a thread printing the class list to classlist.text, then we've introduced concurrency to the system in a way that allows it to start scheduling things in an interesting way. Again, create_thread is some abstraction for spawning a new thread; I'll give you pthreads later in this lecture as one concrete instance. But this should now start behaving as if there are two virtual CPUs in the system, and as a result we'll see digits of pi showing up in pi.text interleaved with the class list getting printed. Why? Because we've created two threads, and now the scheduler can interleave them and make progress on both. And notice that the previous version, even on a multicore machine with a hundred cores, is still going to behave the same way, because we haven't told the scheduler there are multiple threads that can run; there's only one thread in that code.

Now let's do some administrivia. As you know, homework zero is due tomorrow, and you should really get going on it. Homework zero is particularly important because it gets you set up with all of the infrastructure for CS162: it gets you set up with your GitHub account and so on, it gets you set up with your virtual machine, it gets you familiar with the CS162 tools, and it reminds you a bit of programming in C. I'm also hoping that most of you went to the C review session yesterday; I think some videos came out of that, so you should be able to look at them. But remember, homework zero is due Thursday, tomorrow. Project zero was released yesterday, and you should be working on it; it's due next Wednesday, and project zero, like a homework, should be done on your own. By the way, I'm very happy to hear that the review session went well; that was our intention. C is a language you probably don't have enough familiarity with yet; you'll have plenty by the end of the term, and it's good to get moving on it.

The other thing we mentioned, of course, is slip days. Because of the complexity of being virtual, you have four slip days for homework and four for projects; that's a little more than we normally give, but I would say bank those, don't spend them right away. Save them for the end of the term, because when you run out of slip days and turn things in late, you don't get any credit. I don't have a direct estimate of project versus homework effort, but again, project zero is like a homework, so get moving on it. The other thing, which I hope everybody realizes, is that Friday, two days from now, is drop day, so you need to make a decision about whether you're going to keep the class.
This is an early drop class, and it's very hard to drop afterwards. We had a student a few years ago, who will remain nameless, who didn't realize they were still in the class; they had kind of stopped paying attention, and about halfway through the term they realized they were still enrolled. They went to drop it, found out it was an early drop class, and ended up petitioning their department (they weren't in EECS) to allow them to drop it, and last I heard that didn't go so well, because I think they'd already used up the one late drop you get, so they were basically stuck. Don't be stuck; that's bad for you. This is an awesome class, I like to think the most awesome class, though perhaps I'm overstating it, but if you don't want to be in the class, please drop it and let other people in. And by the way, as of tonight we're probably going to let the rest of the folks on the wait list into the class, as well as concurrent enrollment, so I think at that point everybody's in, unless you don't want to be, in which case you'd better drop. Any questions on administration? I just wanted to make sure I told that story about early drop. For DSP-related policy, you can talk to me individually about it; I have everybody's letters and so on.

As far as the collaboration policy goes, I've said this before, but I want to state it again, and then I'll stop saying it every lecture: be careful about your collaboration, watch it carefully. Explaining a concept to somebody in another group is fine. Discussing algorithms or testing strategies is fine. Discussing debugging approaches is fine. Searching online for generic algorithms, like hash tables or whatever, is also fine. Notice that these are not details about projects or homework; these are higher-level ideas or concepts, and that's fine. What isn't fine are things like sharing code or test cases with another group or individual, including on homeworks. I know there was a proposal on Piazza to have homework study groups, but in CS162 the homeworks are actually graded, and they're part of our checking policy to make sure nobody's sharing code, so make sure to do your homeworks on your own. And by the way, we've chosen the homeworks carefully to help you with the projects, so that's another reason it's very important to do them yourself: they will help you along with ideas in the projects. You can discuss high-level concepts, but no details; nothing like "well, I would do this" or "I'd have a variable that did that." You can't do any of that. Copying or reading another group's code or test cases: not okay. Copying or reading online code or test cases from prior years: not okay. Helping somebody in another group debug their code: not okay. We compare project and homework submissions against prior-year submissions and online solutions, and we take action if we see significant overlap. And don't ask your friends; don't put them in a bad position by asking them to give you an answer to a homework. That has happened, it got caught, and it's bad for both parties.

All right, let's go back to the topics. If you have a negative number on the wait list, I honestly have no idea what it means;
I think it means we're basically allowing everybody in at this point, but we may or may not let new people in. So let's go back to threads, which is our big topic, and we'll get to processes as well. Back to Jeff Dean's "numbers everybody should know": I brought this slide up the first day just to show you the huge range of numbers, everything from fractions of a nanosecond up to seconds. The ones up here, in the seconds or even the millisecond range, can be problems: disk seeks take tens of milliseconds, and you can't wait tens of milliseconds before you do something else. So you want ways of overlapping IO and compute, and this set of numbers gives you, right off the bat, a very good motivation for threads: handling IO in a separate thread to avoid blocking other progress; threads masking IO latency. A question from the chat: does disk seek also include SSDs? We'll talk a lot about disks and SSDs a little later in the term. An SSD typically doesn't have a seek time like a disk does, because SSDs are solid state (that's what the SS stands for), but there is an access time, and even that access time is time you could spend doing something else. It won't be as big as 10 milliseconds, but it'll be microseconds in which you might want to do something else.

So, threads are typically in one of at least three states, and when we get into schedulers and the internals of the operating system you'll see more about these. Roughly speaking, a thread can be running, which means it actually has a processor or core and is getting CPU cycles out of that hardware; or it can be ready, which means it's eligible to run but not currently running; or it can be blocked. If you remember that picture I showed you earlier, let me pop it back up, because it's an easy way to say this: if we can run A, B, and C, what that means is that while A is running, B and C are ready but not running; they're on the ready list, and you'll see more about this as we go. As soon as the scheduler decides that A is done, in this instance, it picks B off the ready queue. Or, in the instance where we're alternating between A, B, and C, we have A running while B and C are ready, and so on. We're going to show you a lot more about how that actually works, but for now what's useful is this idea of running, ready, or blocked. Blocked is the new one: that thread went off to do an operation, it made a system call to the kernel saying "read from disk" or "read from the network," and it's not on the ready queue at all; it's ineligible to run. And this is where the true power of threads comes into play, because if we have two threads, one of them can be blocked, off the ready queue, while the other one's running. Now, a question: can a core run only one thread at a time? Yes, by definition a core has one hardware thread it's running, and that thread got pulled off the ready queue. I will talk soon about simultaneous multithreading, where perhaps this gets a little fuzzy, but that's for later. The other question, how do you get from blocked to ready, is basically that the operating system notices that a thread is blocked on IO; when the IO comes in, it puts the thread on the ready queue
and takes it out of the blocked state, because it's ready to run: the thing it was waiting for has arrived. We'll get to that in much more detail, though not today. So once the IO finishes, the OS marks the thread as ready, and as a result we're going to have multiple virtual CPUs going forward, where any given core has one thing actually running and the scheduler has the rest of the things that are ready. Here's an example: if no threads perform IO, then essentially they're always either on the ready queue or running, so with two threads, while magenta is running, cyan is on the ready queue, and while cyan is running, magenta is on the ready queue, and so on. If we put IO in here, we get something more interesting. Here's an instance where magenta runs, and at some point it does an IO operation, and that's going to take it completely off the run queue and the ready queue and put it on a wait queue. So it's not running, and it's not on the ready queue; it's on a wait queue associated with that IO. And now the blue item, which we're assuming is just computing pi or something, gets to keep running, and there's no reason to switch, because there's no other thread on the ready queue that could run. Then eventually, when the IO completes, magenta is put back on the ready queue, and at that point it's available to be run. A question here: can it go directly from blocked to running? No, it doesn't happen that way, because the scheduler gets involved and needs a chance to run its policy; again, we'll talk more about that. Just because something's on the ready queue, it may or may not be the next thing to run, so we typically go from blocked to the ready queue, not immediately to running.

Now, perhaps a better example for threads than computing pi, although given that pi is a cool number I couldn't imagine a better example, might be the following: we create a thread to read a really large file, maybe of pi digits, and we create a thread to render the user interface. And what's the behavior here? We still respond to user input even while we're doing something large. This first thread maybe runs part of a windowing server, or its event loop, and the other thread is doing something that's either IO- or computationally intensive. This is a great use for threads: threads for user interface plus threads for compute is a common pattern. Everybody with me on that?

Now, hopefully having done homework zero, you know how to compile a C program and run the executable. That typically creates a process executing the program, and initially this new process has only one thread, in its own address space, with code, globals, etc. The question we might ask is: how do we make this a multithreaded process? Well, I've kind of shown you this in pseudocode: once the process starts, it can issue system calls to create new threads, and these new threads become part of the process and share its address space. And once you have threads in the same address space, they can read and write each other's data, and therefore they can collaborate with each other, whereas threads in separate processes can't talk to each other easily. I know that question came up in the chat a couple of times.
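Here's a minimal sketch of that sharing: two threads in one process see the very same global variable, with no copying and no messages. (The join is there only so the read happens after the write; we'll see why unsynchronized sharing needs more care later in this lecture.)

    #include <pthread.h>
    #include <stdio.h>

    int shared = 0;     /* a global: lives in the one address space all threads share */

    void *writer(void *arg) {
        shared = 42;                   /* visible to every thread in this process */
        return NULL;
    }

    int main(void) {
        pthread_t t;
        pthread_create(&t, NULL, writer, NULL);
        pthread_join(t, NULL);         /* wait, so the write has happened */
        printf("%d\n", shared);        /* prints 42 */
        return 0;
    }

Two separate processes running this same code would each get their own copy of shared; to exchange the 42, they'd have to explicitly communicate.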
The whole point of processes is to make it more difficult for them to share information; that's the protection component, and the only way you share in that case is by explicitly deciding to communicate. Okay, so let's talk about system calls; we mentioned this earlier in our dual-mode discussion. Typically, if you look at an operating system, we've got this narrow waist idea, the hourglass kind of design, where the boundary between user and system is the system call interface. Things running above it typically run in user mode, and that's all the stuff you start out writing; things in the kernel run in system mode, or kernel mode. So: user mode above, kernel mode below, and of course hardware underneath. What sits at the interface is system calls, and the way to get from user mode into the kernel through this interface is by making a system call. Now, many of you are probably saying, "but I've never seen a system call." Well, the operating system libraries issue system calls, and language runtimes issue system calls, so in many cases the system calls are hidden below your programming interface. A very interesting question in the chat: are system calls standardized across all sorts of operating systems? The answer is no, in general; Windows and Unix and OS/2 and iOS and all these different operating systems have different system call interfaces. But there has been at least one attempt to standardize: the so-called POSIX interface, and the POSIX system call interface is shared, at least partially, across a bunch of different operating systems. Another question from the chat: if you're an administrator, are you running in user mode? Yes, most of the time; however, you're allowed to do things that take you into the kernel where you might not otherwise be able to, and we're going to tell you how that works, but you'll have to hold off on it for a bit. For now, let's say user mode is where programs run, kernel mode is where the operating system code runs, and the way you get across the boundary is an extremely carefully controlled transition.

Here's another way to look at it. We have a bunch of processes running on an OS, say an application, a login window, a window manager, and all of them typically have an OS library linked into them; those applications use the operating system by making library calls. If you haven't gotten familiar with this already, you will soon: libc, the standard library for C programmers, has a bunch of system calls in it that have been wrapped, and I'll show you in a moment what I mean by that, in a way that makes it possible to make an ordinary procedure call that then makes a system call into the operating system. When you do that, the system call is the thing that makes the transition to kernel mode, but the function makes it easy to use, and this is why many of you who haven't taken an operating systems class have never actually looked at system calls directly. Is libc standardized, someone asks in the chat? Mostly. If you were to look at the distinctions between Linux and Berkeley Software Distribution (BSD) Unix and other versions, the libcs aren't always exactly the same; they mostly have all the same things in them, but the arguments might be a little different.
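For a concrete picture of what such a wrapper boils down to, here's a hand-rolled stand-in for libc's write() wrapper. This assumes Linux on x86-64 (other OSes use different numbers and conventions): the syscall number goes in rax, arguments in rdi/rsi/rdx, and the syscall instruction traps into the kernel.

    /* A sketch of a libc-style wrapper, assuming Linux x86-64. */
    long my_write(int fd, const void *buf, unsigned long count) {
        long ret;
        __asm__ volatile ("syscall"              /* trap into the kernel     */
                          : "=a"(ret)            /* rax: return value        */
                          : "0"(1L),             /* rax: write is syscall #1 */
                            "D"((long)fd),       /* rdi: file descriptor     */
                            "S"(buf),            /* rsi: buffer              */
                            "d"(count)           /* rdx: length              */
                          : "rcx", "r11", "memory"); /* clobbered by syscall */
        return ret;
    }

    int main(void) {
        my_write(1, "hi\n", 3);   /* same effect as libc's write(1, "hi\n", 3) */
        return 0;
    }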
But pretty much every libc has almost all the same things in it. Let's think about similarity rather than difference for the rest of the lecture, because it'll be very easy to get lost in slight distinctions. Now, the library I want to talk about for threads is called pthreads, and the p stands for POSIX. POSIX is that attempt to standardize a set of system calls across a bunch of operating systems, and it includes a semi-standard threading interface, which you can look up, called pthreads. Perhaps the most interesting call to start with is pthread_create, a function you can call from C that creates a thread for you. Typically you have several arguments, which are pointers to structures, and I'll show you an example of how this is used in a moment: a thread handle comes back, through which you can control that thread, stopping it and starting it and so on; some attributes of the thread, which we won't use much here; and also a function to call and some arguments. So what does pthread_create really do? Ignoring all the noise in the argument list, it starts a thread running on a procedure of your choice. And about that procedure: I thought I would talk you through this, because everybody ought to hear it once. What the heck does void *(*start_routine)(void *) mean? The way you understand declarations like this is to go from the inside out: start_routine is, moving left, a pointer (that's the star) to a function that takes a void * argument and returns a void *. So start_routine is a pointer to a function that takes a void * and returns a void *. Isn't that fun? There's also pthread_exit, which a thread calls to exit if it wishes, although if the thread routine just returns, the thread is done. And pthread_join says: given a thread handle, wait until that thread is done, and then go forward. So join is a way to let, say, a parent thread that has created a bunch of threads wait for all of them to complete before it goes forward.

What you should do, when you're running in a Unix-style container, including the ones you've set up, is try man pthread. man is the manual command; this is the Unix way to access manual pages, and man pthread-whatever will tell you about pthreads, or man ls, or man anything else. What's fun about this (or not, depending on your notion of fun) is that you can actually type "man pthread" into a Google search and it'll work, and there are also lots of websites out there with information about pthreads. But let's use this to get some ideas about system calls, and even an example of using pthreads, since we're trying to talk about what a user sees. So what happens when pthread_create is called? What I see here is a routine, pthread_create, that I could call in my C code from main or wherever. Well, remember that we're making system calls but hiding them, in many cases, from users, since we don't want regular users to have to worry about system calls. So pthread_create is really a function that, if you were to look inside the library you've linked with, is not quite a normal function.
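To pin down that inside-out reading from a moment ago, here's a small, runnable sketch (compile with -pthread); the hello function and its argument are just illustrative.

    #include <pthread.h>
    #include <stdio.h>

    /* The declaration we're decoding, as in man pthread_create:
         int pthread_create(pthread_t *thread, const pthread_attr_t *attr,
                            void *(*start_routine)(void *), void *arg);
       Reading void *(*start_routine)(void *) from the inside out:
         start_routine                    the name
         *start_routine                   ... is a pointer ...
         (*start_routine)(void *)         ... to a function taking a void * ...
         void *(*start_routine)(void *)   ... and returning a void *.        */

    void *hello(void *arg) {                     /* matches that shape exactly */
        printf("hello from %s\n", (char *)arg);
        return NULL;
    }

    int main(void) {
        void *(*start_routine)(void *) = hello;  /* such a pointer, by itself  */
        pthread_t t;
        pthread_create(&t, NULL, start_routine, "a thread");
        pthread_join(t, NULL);
        return 0;
    }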
What you'd see is that it's really a special type of function, not written entirely in C, that does some work like a normal function and then has some special assembly in it that sets up the registers in a way the kernel is going to recognize, and then executes a special trap instruction, which is really a way of jumping into the kernel. Think of it almost as an error, and then the kernel says, "oh, it's not really an error, it's a system call." By jumping into the kernel this way, we've transitioned out of user mode into kernel mode, because it's an exception, and the place we jump to very carefully figures out which system call you want. So we jump into the kernel, the kernel knows this is the create-thread system call, it gets the arguments, it does the creation of the thread, and then it returns; there's a special place to store the return value, and you're all going to become familiar with this. That return takes us back to user mode and to the bottom of this function, which grabs the return value and then returns like a normal function. So this function isn't a normal function; it's a wrapper around a system call, but as far as the user is concerned, it looks like a function you've just linked against. And a system call can take on the order of a thousand cycles; it depends a lot on what it's doing, and you also have to save and restore a bunch of registers when you go into the kernel and come out again. We'll talk more about the cost of that, but doing system calls is not cheap; this transition from user mode to kernel mode is more than just setting that bit, there's a whole bunch of machinery around it, and we'll cover that in another lecture.

Now, when you create threads, you're basically creating, at least initially here, a schedulable entity, and at that point multiple things can be running. Whether we switch to the new thread during creation is a different story, which we'll get into when we get to actual scheduling. Another idea I'm going to introduce briefly for this lecture is the fork-join pattern: a parent thread creates a bunch of other threads that run for a while (these little squiggly lines are threads), and then they all exit, but the parent wants to wait until they're all done with their jobs, maybe because they're running in parallel. Eventually we join, namely we wait for every one of them to end, and then the single parent thread continues after all of them are done. Now, a good question here, which I want to address briefly: once we enter this assembly code, are we context switching? No. All C code compiles into assembly; it's just that here we're writing some special assembly that's a little outside what a C compiler usually produces, which is why it's typically written directly in assembly language. The other thing, again: don't get too worried about multicore, because everything we're talking about works perfectly well if there's only one core in the system. Keep that in mind; it will all still run.

So, now that we've got fork-join parallelism, let's tie everything together. Here's some code; I bet you thought you were going to get out of this lecture without some complicated code.
What we've got here is a main function, the start of the program, with some malloc statements, some thread creates, and some joins. We could ask ourselves: how many threads are there in this program? Does the main thread join with the threads in the same order they were created? Do the threads exit in the same order they were created? And if we run the program again, will the result change? So let's look for a moment. This main program has been set up to take an argument; if there's an argument, we use it for the number of threads, otherwise we use two. Assuming there's an argument of some sort, we malloc an array big enough to hold the handles for all the threads; these are pthread_t items. Then we print some information, like where the stack is, and some other information, like where this common item lives. Then we go through a loop and create a bunch of threads, n of them, and for each thread we keep track of its handle in that array. So let's say there are four threads: we go through, create all four, and store handles to them, and the reason we do that is so we can join at the end. Take a look at the pthread_create call: what you see here is the thread function, which, as I mentioned before, is a function that takes a void * and returns a void *, and by putting the thread function's name there we've implicitly said "put a pointer to that function here." So each pass through the loop creates a thread that calls the thread function, and finally we go through the pthread_joins to finish.

If we run this with an argument of four, the first thing it does is tell us where the stack is, "main stack," and notice that what I did was take this t variable, a local variable of main, take its address, cast it, basically turning it into a long, and print it. So here's an address, 0x7ffee2c6b6b8, that represents the stack for main. And what's interesting is that we do the same thing in each of the thread functions when they run: we have this tid local variable, and we print out its storage location, and notice how they're all a little different. Each thread has its own stack. Notice also that they run in different orders, and that's because we create a bunch of them and they get interleaved. So, how many threads do we create? It depends on the argument. Do they join in the same order they were created? Yes, because we do a join on the threads in order zero, one, two, three, and therefore the main thread waits for thread zero to finish, then thread one, then thread two, then thread three; if a thread exits early, then when we go to join it, the join just finishes really quickly. If we run the program again, will the result change? Yes, the scheduling is going to be different, so the threads may not wake up in the same order. Are there five threads here total? Yes: the four we created with pthread_create, plus the original main thread; there's always one thread created when you start a program. And of course pthread_exit: when a thread exits, it allows the join on it to move forward. Note that this join is not joining "with NULL": we're joining with this particular thread, and the NULL is just an argument we're not using on pthread_join.
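Since you can't see the slide in a transcript, here's a reconstruction of roughly that program; identifiers like thread_fun, common, and the exact printf strings are my guesses at what the slide used.

    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>

    int common = 162;                         /* a global, shared by all threads */

    void *thread_fun(void *arg) {
        long tid = (long)arg;                 /* which thread am I?              */
        printf("Thread #%ld stack: %lx\n", tid, (unsigned long)&tid);
        pthread_exit(NULL);                   /* implicit if we just returned    */
    }

    int main(int argc, char *argv[]) {
        long t;
        int nthreads = 2;                     /* default if no argument          */
        if (argc > 1) nthreads = atoi(argv[1]);

        /* malloc an array big enough to hold all the thread handles */
        pthread_t *threads = malloc(nthreads * sizeof(pthread_t));
        printf("Main stack: %lx, common: %lx (%d)\n",
               (unsigned long)&t, (unsigned long)&common, common);

        for (t = 0; t < nthreads; t++)        /* fork: create the threads        */
            pthread_create(&threads[t], NULL, thread_fun, (void *)t);
        for (t = 0; t < nthreads; t++)        /* join: wait for each, in order   */
            pthread_join(threads[t], NULL);
        free(threads);
        return 0;
    }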
And there are four threads created by the for loop because, in this instance, the argument was four, and we took that argument to decide how many to create; nthreads equal to two is only used if we don't have an argument. So, what about thread state? If you don't call pthread_exit, which we could easily forget, then the thread function returning calls pthread_exit implicitly, without you having to do it. All right: the state shared by all threads in the process's address space includes the contents of memory and the IO state. The state that's private to each thread, in some sense, is a thread control block in the kernel (that's why I drew it in red), the CPU registers, which are either in the processor or in the thread control block depending on whether the thread is running or not, and a stack. And what's on the stack? Parameters, temporary variables, return PCs, and so on. So one view of what we just did: there's a bunch of shared state for the threads, which is the heap, global variables, and code, and then the per-thread state is a thread control block, plus a stack and saved registers for each thread.

Now, just to quickly get on the same page with 61C material: if you remember what stacks are good for, they hold temporary results and they permit recursive execution. Notice here I have some pseudo-C, and these labels over here represent the memory address this if statement is at, or the address this call to B is at; if the if statement is at A, then the call to B might be at A+1. This is just a loose idea, so don't get too hung up on it. If we call A(1), what happens is that A comes in and we create a stack frame for the call to procedure A: tmp is 1, because that's the local argument, and the return address takes us to exit. Why? Because when we return from this invocation of A, the next thing is exit, and we're done. Those go on the stack. Now we ask: is tmp less than 2? Yes, it's 1, so in that case we're going to run B. What does B do? B creates a stack frame for itself, but notice there aren't any local variables, so the only thing in it is the fact that when B returns, we go back to A+2. Why? Because we call B here, and when we return, we return to here; that return address gets put on the stack. When C runs, it creates a stack frame, and eventually we call A(2), and notice that we're now calling A again, recursively. The first invocation of A is here on the stack, but by the time we get to the second invocation, we're down here. Is tmp less than 2? No. So at that point we print tmp, which is 2, and then we return. Where do we return? To C+1, which is down here, and C+1 returns from C, and then eventually we get back to A+2, we print our 1, we return, and we're done. So there you go: that's a stack.
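Here's that walkthrough reconstructed as real C; the comments mark the labels A+1, A+2, and C+1 that the return addresses point at.

    #include <stdio.h>

    void B(void);
    void C(void);

    void A(int tmp) {
        if (tmp < 2)             /* A:   the test                          */
            B();                 /* A+1: call; B will return to A+2        */
        printf("%d\n", tmp);     /* A+2: print                             */
    }                            /* returns to caller: exit, or C+1        */

    void B(void) {
        C();                     /* B:   call; C will return to B+1        */
    }

    void C(void) {
        A(2);                    /* C:   recursive call, returns to C+1    */
    }

    int main(void) {
        A(1);                    /* prints 2 (inner call), then 1 (outer)  */
        return 0;                /* the "exit" the first frame returns to  */
    }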
Now, the question of whether it's possible for one thread's stack to crash into another: absolutely. If you look at the layout with two threads, we have different stacks in the same address space, and if this stack grows too far, it's going to mess up the blue stack. So we start having to ask some interesting questions: how do we position stacks relative to one another, how big are they, and so on. One of the things we'll be able to talk about in a few lectures is that we can put in what are called guard pages, such that if this pink stack runs too long and grows into this empty space, it actually causes a trap into the kernel, which can then decide whether to allocate more memory or kill off the thread. And the reason there are no protections in place between the stacks is that with multiple threads running in a process, the process is the protection boundary. This is both good and bad, right? It's a liability if you run infinite-Fibonacci-style things whose stacks run into each other, because we all know everybody wants to do that all the time, as you learned in 61A; and it's a benefit, because yes, the stacks are in the same address space, but these two threads can easily share data. I'll get to the sharing in just a second. As for how to allocate more memory: often, if a thread really is running out of stack space, there's an argument you can use when creating it to say "I need more stack space," but this becomes an interesting debugging question that we'll save for another lecture.

What I do want to say here is that the programmer's abstraction is one of lots of threads all running kind of at the same time, on what looks like an infinite number of processors, whereas the reality is that some of them run and some of them don't, and it alternates. And that's the idea: we have to create our code so that it runs correctly despite the scheduler's interleaving. In fact, I like to think of the scheduler as a Murphy's Law scheduler: it's going to pick the interleaving that screws up your code the most, so you need to design for all interleavings, which really means you have to do the correct thing with respect to locks. The programmer's view here might be x = x + 1, then y = y + x, etc., but in reality, one execution could run them one right after another, and another could run x = x + 1, then go off and run a different thread for a while, and then continue; or run the first two, go off for a while, and continue. Let's not worry about reordering so much as interleaving. So there are many possible executions, and I think I've hammered that point home already, but you need to keep it in mind. And before you give up and think this is impossible: proper locking discipline will take care of you here and make sure your code runs correctly under all interleavings. Our job over the next couple of weeks is to give you an idea of how you might design things so they work under every interleaving.

So correctness with concurrent threads has this nondeterminism component: each time you run, there's a different interleaving. The scheduler can run the threads in any order, and it can switch threads at any time, and that makes testing difficult; in fact, testing all possible interleavings isn't even possible in principle. There are folks in the department who know how to test up to a certain depth of interleaving, and there are some pretty elegant results in that vein.
But there's one instance where things are straightforward, and that's when the threads are independent: they don't share any state, and they're, say, in separate processes. Then it really doesn't matter what order they run in, because you'll always get the same answer; that's a deterministic result. With cooperating threads, running in the same process, suddenly we've got this nondeterminism, and we have to worry about it. So if you could somehow make everything always independent, you'd have deterministic behavior and be in good shape. Of course, even when you think things are independent, they're all running on top of the same operating system, and we all know that an operating system crash or bug can screw up pretty much anything, but let's not worry about that for now. The goal is correct by design.

Just to point this out, we have some race conditions. What if initially x is 0 and y is 0, and we have two threads, one of which sets x = 1 and the other sets y = 2; what are the possible values of x when we're done? Well, that's not even very interesting, right? It must be 1, because thread B doesn't interfere. More interesting, of course, is this one, where thread A does x = y + 1, and thread B does y = 2 and then y = y * 2. What are the possible values of x there? It could be 1, 3, or 5, nondeterministically. That's because we're essentially racing A against B, and this is bad code: yes, it has nondeterministic answers, but you wrote code that should never have been written this way. We're going to try to avoid race conditions.

Now let me show you a good reason for sharing; there were some questions earlier. Threads can't share stacks, and the reason, fundamentally, is that the stack represents the current state of an execution; if you had two threads on the same stack, they'd just screw each other up, and you'd lose that state. Go back through my stack example and think that through for a moment. So each thread has to have its own stack, and yes, every thread has its own stack within the process. Now, here we have an instance of, say, a red-black tree, which you probably ran into in 61B; maybe thread A does an insert and thread B does an insert, and again, if you just wrote code like this, that tree would get screwed up. This particular combination of thread A and thread B is absolutely not going to work; you're liable to get a wrong result.

So, some quick definitions, which we are again going to go through in much more detail in subsequent lectures. Synchronization is coordination among threads regarding some shared data, in a way that tries to prevent race conditions and prevent you from getting the wrong answer. Some ideas: mutual exclusion basically ensures that only one thread does a particular thing at a particular time, so one thread excludes the others from a chunk of code; it's a type of synchronization. A critical section, for this lecture, is code that exactly one thread can execute at a time; it's the result of mutual exclusion. And a lock is an object that only one thread can hold at a time; it's used to provide mutual exclusion. We're going to talk about these things in much more detail, and we're actually going to show you how to build locks, which will be an interesting discussion in a couple of lectures, but for now, a lock is going to be a way to get mutual exclusion.
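To make that second race concrete, here's the example written out. This is deliberately bad code (a real data race), kept only to show why x can end up as 1, 3, or 5.

    #include <pthread.h>
    #include <stdio.h>

    int x = 0, y = 0;                 /* shared; initially both zero */

    void *thread_a(void *arg) {
        x = y + 1;                    /* reads y, writes x */
        return NULL;
    }

    void *thread_b(void *arg) {
        y = 2;                        /* if A reads y before this:     x = 1 */
        y = y * 2;                    /* if A reads between the writes: x = 3 */
        return NULL;                  /* if A reads after both writes:  x = 5 */
    }

    int main(void) {
        pthread_t a, b;
        pthread_create(&a, NULL, thread_a, NULL);
        pthread_create(&b, NULL, thread_b, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        printf("x = %d\n", x);        /* 1, 3, or 5, nondeterministically */
        return 0;
    }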
Now let me show you a good reason for sharing, since there were some questions earlier. First, threads can't share stacks, and the reason, fundamentally, is that a stack represents the current state of one execution; if you had two threads on the same stack, they would just scribble over each other and you'd lose that state. Go back through my stack example and think it through for a moment: each thread has to have its own stack. Here, though, we have an instance of a red-black tree, which you probably ran into in 61B, and maybe thread A does an insert while thread B also does an insert. If you just wrote the code that way, the tree would get corrupted; this particular version of thread A and thread B is absolutely not going to work, and you're essentially guaranteed a wrong result. (And yes, every thread in the process has its own stack.)

So, some quick definitions, which we'll go through in much more detail in subsequent lectures. Synchronization is coordination among threads regarding some shared data, in a way that tries to prevent race conditions and keep you from getting the wrong answer. Mutual exclusion ensures that only one thread does a particular thing at a time, so one thread excludes the others from a chunk of code; it's a type of synchronization. A critical section, for this lecture, is code that exactly one thread can execute at a time; it's the result of mutual exclusion. And a lock is an object that only one thread can hold at a time, used to provide mutual exclusion. We're going to talk about these in much more detail, and we're actually going to show you how to build locks, which will be an interesting discussion in a couple of lectures, but for now a lock is a way to give us mutual exclusion.

Locks have a very simple interface: you can acquire the lock and you can release the lock. When a thread tries to acquire the lock while some other thread currently holds it, the acquiring thread is put to sleep, and when the holder finally releases the lock, one and only one of the waiting threads is allowed to acquire it. This mutual exclusion given by locks, namely that only one thread can hold the lock at a time, is going to let us start building correct code even with a lot of parallelism and concurrency in there. Don't worry about how to implement this; we'll cover it in great detail later. How would we use it in the tree example? The two threads would acquire a lock on the whole data structure, say at its root: thread A acquires the lock, inserts 3, and releases it; then thread B acquires the lock, inserts 4, and releases it. There's an elegance to how you distribute your locks that you'll get to start thinking about. You could have a single lock at the root, so that if A grabs the lock, it knows B can't be anywhere in the data structure and can just do its insert, and when it releases, B knows A isn't in the data structure, and so on. Or you can distribute locks throughout and do something more sophisticated, where you grab one lock, then another, and so on. For the purposes of this lecture, think of grabbing a single lock at the root; that's going to clean things up for us.

Now, there's an interesting question in the chat about single-instruction operations on shared variables: those are special hardware atomic operations we're going to talk about, where you don't actually need a lock. And yes, there are plenty of different types of locks, which we'll also talk about as we go forward. Pthreads, p for POSIX, has a locking infrastructure where the thing we just described is called a mutex. You initialize a brand-new mutex once, and then the different threads in the system use lock and unlock on it; as long as they all use the same mutex, they get exactly the locking behavior I just described: pthread_mutex_lock grabs the lock and pthread_mutex_unlock releases it. A mutex is just another name for a lock in this instance, and you'll get a chance to use these in homework one.

So here's an example thread function for our multiple threads. Our critical section is where we increment a common integer, a global variable shared by a bunch of threads. If you try to increment a global variable with the simple, unlocked version of the increment, it gets all screwed up when multiple threads run it; by grabbing the lock, incrementing, and releasing the lock, you make sure that shared variable doesn't get corrupted. Now, before I say a little about processes while we still have time: what it means for a thread to hold a lock is that the thread executed the lock acquire operation, here pthread_mutex_lock, and it succeeded; the thread that succeeded and was allowed to continue is the one holding the lock.
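Roughly, that thread function and mutex usage look like this; it's a sketch along the lines of the slide, not its exact code, and I'm using the static initializer in place of an explicit pthread_mutex_init call:

    #include <pthread.h>
    #include <stdio.h>

    static int common = 0;   /* the shared global we're protecting */
    static pthread_mutex_t common_lock = PTHREAD_MUTEX_INITIALIZER;

    static void *thread_fun(void *arg) {
        (void)arg;
        for (int i = 0; i < 100000; i++) {
            pthread_mutex_lock(&common_lock);    /* acquire: only one thread gets past */
            common = common + 1;                 /* the critical section */
            pthread_mutex_unlock(&common_lock);  /* release: one sleeper may acquire */
        }
        return NULL;
    }

    int main(void) {
        pthread_t t[4];
        for (int i = 0; i < 4; i++)
            pthread_create(&t[i], NULL, thread_fun, NULL);
        for (int i = 0; i < 4; i++)
            pthread_join(t[i], NULL);
        printf("common = %d\n", common);   /* 400000, every time, with the lock */
        return 0;
    }

Try deleting the lock and unlock lines: with several threads hammering on the counter, you'll typically come up short of 400000, which is exactly the lost-update race we've been talking about.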
In this example, because the increment is now a critical section, only one thread is ever allowed past the lock at a time, so only one thread can be inside the critical section at a time, and we say that thread holds the lock. If a thread tries to acquire the lock while it's already held, it's put to sleep until the lock is released, and then it's allowed through; so only one thread is in the critical section at any moment. And keep in mind this thread function is run by many threads simultaneously; we're talking about a scenario where many threads are running at the same time.

So let's talk about processes briefly before we run out of time. How do we manage process state? We've been talking about multi-threaded processes, where each thread has a stack and some register storage, and then there's the global code, data, and open files shared by all of them. And to answer the question in the chat one more time: what constitutes a critical section is the piece of code being protected by the lock, the piece where only one thread is allowed to execute at a time; it could be many instructions, many things in there.

Now I'm going to move on to processes. If you remember, the life of a process is that the kernel execs the process, which we kind of talked about last time, and when it's done, it exits. Rather than threads, we're now talking about creating a brand-new address space and moving into user mode. Once we're in user mode, there are several ways back into the kernel: system calls, which we talked about; interrupts, which we will talk about, where an interrupt might involve accessing some hardware; and exceptions, like a divide by zero or a page fault. But this lecture is still about user mode, so: how do we create new processes?

Processes are always created by other processes. So how does the first process start? This is like asking about the Big Bang, right? The first process is started by the kernel; it's often configured as an argument to the kernel before the kernel boots, and it's often called the init process. That init process then creates all the other ones, in a tree, and from that point on every process in the system is created by another process. Notice that the init process really is a process running in the system; you can typically find it if you know where to look, and if it ever exits, the system typically crashes and goes away. By the way, this init process is completely different from the .init section in an ELF executable.

Now, we're only going to have time for a couple of the process management APIs. The first one, which is easy, is exit. Here we have main; the process got created, we execute exit, and it ends the process. That's maybe not especially interesting to you, except that every process has an exit code, which can then be grabbed by its parent, where the parent is the process that created it. Exit takes an argument: zero means a successful exit, whereas anything non-zero means unsuccessful, and the parent process can find that out.
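As a tiny sketch of that convention (the input file name here is made up, just to give us something that can fail):

    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        FILE *f = fopen("input.txt", "r");   /* hypothetical input file */
        if (f == NULL)
            exit(1);     /* non-zero exit code: tell our parent we failed */
        /* ... do the real work of the process ... */
        fclose(f);
        exit(0);         /* zero: successful exit, visible to the parent */
    }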
So what if we let main return without ever calling exit? In that case you actually get an implicit exit as well: the OS library calls exit for you, with a successful code. The entry point of the executable is in the OS library; when you compile and link, the library arranges for main to be the function that gets called. You can almost think of the first thread as calling main, and when main returns, the library calls exit, which kills off the process. So the exit code and main's return value end up doing essentially the same thing.

All right, let's look at something more interesting. Unfortunately we don't have a lot of time for this, but hopefully you can stick around for five more minutes, because I want to talk about fork. Fork is one of the most interesting, strange things we'll discuss in process management: it's sort of a legacy operation in some sense, but it's also the backbone of a lot of the way UNIX operating systems work, and it's what you'll be looking at as well; Pintos is going to be similar. Fork is used to create a brand-new process, and what it does is copy the current process entirely: if you imagine one process with all of its address space, fork copies the whole thing into another process, another address space, and then starts running there too. So when fork is done, you have two identical copies running where before you had one.

This is going to be a little weird, which is why I'm hoping you'll give me the extra five minutes. The return value from fork is one of three things. If it's greater than zero, you know you're running in the original parent, and the return value is the process ID of the new child. If you get back zero, you know you're the new child. And if you get back less than zero, there was an error. (pid here means process ID.) The state of the original process is duplicated in both the parent and the child: pretty much everything, address space, file descriptors, and so on.

So here's a good example: we're running along and we call fork, and at the moment fork returns, a very weird thing happens: there are now two processes running, and those two processes are identical except for the value that comes back from fork. In one of them we get a value greater than zero, and in the other we get zero. Only when fork fails, say because the system has run out of memory, does just one process come back, and we report that fork failed. There was a question about a fork bomb: that's the case where something forks so many times that there are so many processes running that memory runs out and we're toast, and that's usually because of a bug in a program, or someone doing it on purpose. But notice that in the case where things work, the original process does not get killed; it's happily running, and it comes back with cpid greater than zero, while the child comes back with cpid equal to zero. So let's take a look: we call fork, and now suddenly two things have returned from fork, in two different processes.
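Here's roughly what that code pattern looks like; the print statements are mine, but the three-way check on fork's return value is the thing to take away:

    #include <sys/types.h>
    #include <unistd.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        pid_t cpid = fork();      /* one process calls this... */
        if (cpid > 0) {
            /* ...but two processes return from it; this is the original */
            /* parent, and cpid is the pid of the brand-new child        */
            printf("parent: child's pid is %d, mine is %d\n",
                   (int)cpid, (int)getpid());
        } else if (cpid == 0) {
            /* identical copy of the address space; only fork's return   */
            /* value tells us we're the child                            */
            printf("child: my pid is %d\n", (int)getpid());
        } else {
            perror("fork failed");   /* e.g., out of resources; only one */
            exit(1);                 /* process returns in this case     */
        }
        return 0;
    }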
One of them, the original parent (that's what the P stands for), gets cpid greater than zero, namely the pid of the child, so it can look up its own pid and say "I'm the parent of that child"; the other gets zero and can say "here's my pid, I'm the child."

Now, a question from the chat about memory allocated by other threads: typically the memory is going to be duplicated, but you're only going to have one thread running initially in that other process. And if you fork, and then fork again, you could indeed end up with a tree, that was another question, except that you could also have the parent fork again while the child doesn't, and then you'd have three processes; so it may be a tree, but it doesn't have to be a binary tree.

So let's make sure we leave with this rather strange concept: once we execute fork in the original single process, when it's done there are two of them, identical except that when one runs, fork returns a value greater than zero, and when the other runs, fork returns zero, and that is how those two processes know whether they are the parent or the child. You're thinking about this too hard if you try to reason about one of them somehow being created "inside" the other; what actually happens is that the memory space is exactly duplicated, and the kernel's process table records which one is the parent, so we get the right value back. The processes are put in a tree inside the kernel, because the parent has linkages to all of its children, and if the child calls fork, it becomes the parent of whatever it just created. Everything is duplicated, including the stack, and they are not the same address space: they're duplicated address spaces that start out holding the same values.

Now, lest you go away from this lecture thinking that sounds ridiculously expensive, how can that possibly be the right thing to do, I'll tell you that you play tricks with page tables so that you don't actually copy everything: you copy the page table, mark the pages read-only, and copy only on demand. That's called copy-on-write, and it's a topic for another, more fun discussion. And yes, there are also calls along the lines of posix_spawn that create a new process without doing this copying, but we'll get to that later; I want you all to understand fork for now.

And here is a race for you. Look at what happens if we fork, and in the parent we loop with i starting at zero and counting up, printing as we go, while in the child we count down. What gets printed? Does anybody want to make an argument? Does it get confused, with i going up a little and then down and then up? I see somebody says infinite loop: yes, great, and the i's are different, because the processes are completely separate, so i is completely different in each. The parent counts up and the child counts down, and they don't interfere with each other; the only thing that varies is the interleaving of the output, based on scheduling, because both prints go to the same standard out.
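Here's a sketch of that example; the slide's version loops forever, so I've bounded both loops to keep it runnable, but the point is the same: each process has its own i, and only the interleaving of the output lines varies from run to run:

    #include <sys/types.h>
    #include <unistd.h>
    #include <stdio.h>

    int main(void) {
        pid_t cpid = fork();
        if (cpid > 0) {
            for (int i = 0; i < 5; i++)     /* parent's copy of i counts up   */
                printf("parent: %d\n", i);
        } else if (cpid == 0) {
            for (int i = 0; i > -5; i--)    /* child's separate i counts down */
                printf("child:  %d\n", i);
        }
        return 0;   /* both processes share only the output stream */
    }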
We'll pick this up next time because we're out of time, but for exec, by the way, here's the idea: the way we create a brand-new program is that we fork a new process and then call exec, which immediately says "throw out all of my address space and replace it with this new program," and that's how a new program is created.

All right, so in conclusion. And yes, it's true that the global variables are copied as well; the two processes have completely separate address spaces with no interaction, because they're separate processes, not threads, so the only race is over I/O ordering on the same output screen, nothing to do with any of the computations. So: threads are the unit of concurrency, an abstraction of a virtual CPU. A process is a protection domain, an address space with one or more threads in it. And we've seen the role of the OS library, and that system calls are how we control access and entrance to the kernel. Finally, there was a question: if the parent gets killed, does the child die? No; if the child is still running when the parent is killed, the child gets re-parented up the tree, and ultimately init inherits it.

All right, I'm going to say goodbye to everybody. Sorry for going over a little, but I wanted to make sure we talked about fork. May you all have a great holiday weekend; remember there's no class on Monday, and also remember that Friday is drop day, so if you want to be in the class, great, and if you don't, please drop. Ciao all, and have a great weekend. Bye!