 Hello everyone, welcome to the eighth lecture in the course design and engineering of computer systems. In this lecture, we will continue our discussion of operating systems concepts. Specifically, we will be discussing the concepts of threads in this lecture. So, let us get started. So, what are threads? So, we have seen what are processes right and sometimes it may so happen that a process may want to run multiple copies of itself on the same machine. Why? For example, consider something like a web server that is handling requests from multiple clients at the same time. Then it may so happen that one of the web server processes has blocked because the user requested a file and it had to read the file from the disk and it made a blocking system call. In that case, we want another copy of the web server process around to service other requests from other users. Similarly, you may want to have multiple copies of a process to run in parallel on multiple CPU cores, right. One process can only run on one core. If you had multiple copies, you can use the multiple CPU cores efficiently. So, there are many reasons why a process may want to have multiple copies of itself. But if you simply just create multiple processes and run them on the same hardware, then we are wasting memory, right. Every process, the code, data, the memory image of the process unnecessarily multiple copies of those are existing in RAM and they are occupying memory unnecessarily. We want to avoid this, right. So, if you have such a situation where you want to run multiple versions of an application but want to keep only one copy of the code in memory, then a better option to do that is to use what are called threads. So, you can think of threads as nothing but lightweight processes, right. A process can create multiple threads of execution and these multiple threads will run on the same memory image of the process but they will run independently, right. 
These multiple threads will share the same memory image but they will run independently like separate processes. So, if one of the threads blocks, the other can run, and these threads can run in parallel on different CPU cores, right. So, therefore, it is useful for a process to have multiple threads. If you do not do anything, if you do not create threads, then the default is just a single threaded process, but most real life programs and applications are typically multi threaded for these reasons. So, let us understand clearly what is a thread, ok. Multiple threads of a process share the code, all the global and static variables, and the heap, ok. If you look at the memory image of a process, there is the code and there is the compile time data and the heap. All of these are common. If you have multiple threads, they all share all of these elements of the memory image of the process, but these threads will execute independently on the process code. So, one thread could be running some part of the code, another thread could be running at another location in the code, right. Each thread will run independently on the process code. Therefore, each thread will have its own separate CPU context, right. So, for thread 1 the program counter could be pointing to one location and for thread 2 the program counter could be pointing to some other location, executing a different piece of code, right. So, each thread's program counter and registers will all be different because they are all running independently on the same process code. As a result, because they execute independently, they will make function calls independently, they will push elements on the stack independently, and so each thread will have its own separate stack. So, the user space stacks are different for different threads because each thread runs independently, calls functions independently and so on. 
And inside the operating system a thread also has its own separate data structure, which is usually referred to as the thread control block or TCB. This is like the PCB: if a thread is blocked, its context will be stored in the TCB and so on, right. Information about the thread, the context of the thread, and other things will be maintained in the TCB. And if a process has multiple threads, all these TCBs will somehow be connected up with the PCB also, because details like the memory image and everything are common, right. So, each thread does not have its own memory image. So, that information you will get from the common PCB. So, the summary that you have to understand is: threads share a lot of code and data, but they have their own separate stacks because they run independently, and they have their own separate CPU context because they run independently. So, how do you create threads in a program? So, there are many libraries available for different programming languages that let you create threads. In Linux a very popular library is what is called pthreads. This stands for POSIX threads, right. So, this pthreads library allows you to create multiple threads and also provides various functions to deal with these threads. So, we will just give you a very simple example of how pthreads works, but you are encouraged to look up the exact syntax of this library online. So, if you look at a simple program, normally a program will have just one default thread of execution, right. There is just one way in which you can go through a program code, but if you call pthread create, this creates a separate thread of execution in your program. So, here is your program code, there is one thread that is running through the program code in a certain order. If you create a separate thread, that separate thread will execute a different part of the program's code. 
So, if you create a thread t1 over here, this thread is given a start function and this thread t1 starts running the program from this point. It executes this function, it can make other function calls, it can do other things on the memory image. Similarly, a thread t2 is given another start function f2 and this thread t2 starts executing the memory image from here, okay. So, each thread starts at a different location and runs through the program's code independently. And after they are created, the threads run independently from the parent, okay. And the parent can also wait for the threads to complete. So, pthreads also provides an API called join. While the threads are executing in their corresponding functions, the parent can wait for the threads to complete using this function pthread join. This is similar to the wait system call. Though with threads, this is optional; it is not necessary for a parent thread to wait for the threads it has spawned to finish, okay. So, the other thing to understand is that with pthreads, these separate threads that are created are treated as separate entities by the OS scheduler. That is, if a process creates two threads, then this main process and the two different threads are scheduled independently by the CPU scheduler. They are treated almost like separate processes that can run independently under the CPU scheduler, right. So, these are called kernel level threads. Why? Because the operating system is aware that these threads exist. But not all threading libraries give you this functionality. Some threading libraries will let you create a thread but the OS is not aware of it. And all these threads are scheduled as part of the same process, right. They are not scheduled independently. So, such threads are called user level threads, right. Then why, you might ask, do we even have such threads? 
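The create-and-join pattern just described can be sketched with the pthreads API. This is a minimal illustration, not the program from the lecture's slide: the start functions f1 and f2 and the helper spawn_and_join are placeholder names chosen here.

```c
#include <pthread.h>
#include <stdio.h>

/* Hypothetical start functions; each new thread begins executing here. */
void *f1(void *arg) { printf("t1 running f1\n"); return NULL; }
void *f2(void *arg) { printf("t2 running f2\n"); return NULL; }

/* Create two threads and wait for both; returns 0 if every call succeeded. */
int spawn_and_join(void) {
    pthread_t t1, t2;
    int rc = 0;
    rc |= pthread_create(&t1, NULL, f1, NULL);  /* t1 starts at f1 */
    rc |= pthread_create(&t2, NULL, f2, NULL);  /* t2 starts at f2 */
    rc |= pthread_join(t1, NULL);               /* optional: wait for t1 */
    rc |= pthread_join(t2, NULL);               /* optional: wait for t2 */
    return rc;
}
```

On Linux you would compile such a program with the -pthread flag; note that join is optional, unlike wait for child processes.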
There are many reasons, ease of programming and so on, why threading libraries may provide you the abstraction of threads, but actually these threads do not run independently on the underlying CPU hardware. But with pthreads, these threads that you create are scheduled independently by the OS scheduler and therefore they are called kernel level threads. So, now let us understand a little bit more about how threads share the memory of a program. So, threads of a program share all the code in the memory image as well as all the global variables, static variables, and heap data, right. Except for the data on the stack, everything else is shared between the threads of a program. So, now let us understand what happens when two threads simultaneously, concurrently access the same data, right. You might wonder what will happen, they will just access the data, but that is not the case. There are some challenges here. Let us understand that with an example. So, again it is the same setup. I am just giving you a small example using the pthread API. Here is a program that has created two threads T1 and T2 and both these threads execute the same start function here, right. And what are these threads doing? There is a global counter which is initialized to 0 and each of these threads executes the same function and terminates. And what is this function doing? It is incrementing this counter a thousand times. So, thread 1 runs this function and increments the counter a thousand times, thread 2 also increments the counter a thousand times, and both these threads as well as the parent process run independently, you know, they can run simultaneously on different cores, the same core, whatever. But by the time all of this finishes, what do you expect the final counter value to be? You expect it to be 2000, right. Two threads have incremented it a thousand times each. 
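The shared-counter program just described can be written out roughly as follows; this is a sketch of the example, with the names run and run_both chosen here for illustration. Because of the race discussed next, the returned value is not guaranteed to be 2000.

```c
#include <pthread.h>

long counter = 0;   /* shared global: both threads update it with no lock */

/* Each thread increments the shared counter a thousand times. */
void *run(void *arg) {
    for (int i = 0; i < 1000; i++)
        counter = counter + 1;   /* not atomic: compiles to load, add, store */
    return NULL;
}

/* Spawn two incrementing threads, wait for both, return the final count.
   With the race, the result can be anything up to 2000, not always 2000. */
long run_both(void) {
    pthread_t t1, t2;
    counter = 0;
    pthread_create(&t1, NULL, run, NULL);
    pthread_create(&t2, NULL, run, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return counter;
}
```

If you run this repeatedly (and especially with larger iteration counts), you can observe values below 2000 in practice.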
But if you write a simple program like this and actually run it, in reality the value will not be 2000, it will be smaller than 2000, okay. So, what is happening here, has the computer suddenly forgotten how to add two numbers? No, that is not the case. In fact, something very tricky is happening here. So, let us understand what the challenges are when two threads simultaneously access a shared data variable. So, now let us take a deep dive into this example, right. Both the threads incremented a counter. They did something like this: counter equals counter plus 1. But this is just C language code. When this is compiled, in the memory image it is in the form of CPU instructions, and this line of C code will actually be three different CPU instructions: you will load the value of the counter from memory into a register, then you will increment the register, and then you will store back from the register into the counter, right. This is how this line of code is implemented in your compiled executable. Now, when two threads try to run these instructions concurrently, what happens? Here is one possible scenario. Of course, one thing that can happen is thread one fully runs and increments the counter, then thread two runs fully and increments the counter, then again thread one runs, again thread two runs. If they run exactly one after the other, what will happen? The counter will be incremented once, twice and so on and you will get your value of 2000 that we have seen in the previous slide. But sometimes it may not happen like that. Sometimes if the executions of the threads, so this is one thread's execution, this is the other thread's execution, if they overlap like this, something weird will happen. What is happening here? Initially your counter value is 0 and thread one first ran, it loaded this counter into a register, then this register value is incremented. 
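The single C statement `counter = counter + 1` compiles down to roughly three machine instructions along these lines (the register name and mnemonics here are illustrative pseudocode, not any particular ISA):

```
load   R1, [counter]   ; read counter from memory into register R1
add    R1, R1, 1       ; increment the register
store  [counter], R1   ; write the register back to memory
```

A context switch can occur between any two of these instructions, which is exactly what the interleaving below exploits.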
So, now your register has a value of 1, from 0 to 1. At this point, a context switch occurred: thread one has stopped, all its register values have been saved, and now thread two starts, right. You have a concurrent execution. At this point, when thread two starts, what is the counter value? The counter value is still 0 because thread one has not stored it back yet, it has not completed. So, now thread two once again begins with a counter value of 0, loads it into the register, increments the register to a value of 1, and writes back a counter value of 1 into memory; thread two is complete. Now thread one runs again, it resumes where it left off, remember. So, now once again this register value is 1, it is written into this counter variable, and the counter value is once again rewritten to a value of 1. So, what has happened here? Two threads have run, each has run the code to increment a counter, and at the end, after incrementing the counter two times, the value is still 1. Note that this may not happen all the time; sometimes they will run one after the other and the increment will happen correctly, but sometimes the increment may not happen correctly because of this concurrent execution. So, such incorrect execution of code that happens due to concurrent execution is called a race condition, right. The two threads are somehow racing each other and interleaving in their executions. If they run one after the other in an orderly fashion, there is no problem, but when they interleave like this, when they race each other like this, that is when sometimes bad things can happen; that is why it is called a race condition. And this is due to the unfortunate timing of context switches, right: you do not know where a context switch can happen, it can happen at a very inconvenient location. And note that user code cannot disable interrupts or context switches and say that I do not want to have a context switch anymore, right. This is not in the control of the user program. 
So, therefore, this can happen at any point and it can happen with any data structures, not just counters, right. Shared data structures can be updated incorrectly due to race conditions. So, when can race conditions happen? They can happen anytime we have concurrent execution on shared data. For example, threads in a process that share the memory of the process, that is one place where you can have race conditions. The other common example is when processes go to kernel mode, even single threaded processes, when they are in kernel mode and they share OS data structures. So, there are common OS data structures and two different processes both go to kernel mode, say on different CPUs, and they try to access the OS data structures, right. So, the operating system is shared, the kernel mode execution is shared across all the user processes. So, in such cases also, two processes in kernel mode accessing the kernel data structures can result in race conditions. Note however that if you just have single threaded processes in user mode, they do not share anything. So, therefore, between two different processes you may not have any race conditions. And how do we fix these race conditions? We require a property called mutual exclusion. That is, whenever any shared data is being accessed in some parts of the code, concurrent execution should not be permitted, right. We do not want concurrent execution at all. And parts of the program that need this mutual exclusion are called critical sections, right. So, some parts of multi threaded programs, some parts of the OS code, they are called critical sections. Note that of course, the entire multi threaded program need not be a critical section, right. Sometimes if the two threads are working on different pieces of data, they can run concurrently, they can run in parallel, it does not matter. 
But only when they are accessing shared data, common data, that is when we need mutual exclusion and such parts of the code that have access to shared data structures, they are called critical sections. And inside a critical section, we need mutual exclusion, we need only one thread at a time or one process at a time accessing these critical sections. So, how do we ensure this? We ensure this using a mechanism called locking, right. So, this is intuitive. So, if you do not want two people to enter a room at the same time, what do you do? You put a lock and the person who enters the room closes the door, locks the door and when that person comes out, the next person can enter, right. That is the simple concept that we all know. Similarly, for threads also, we have a mechanism called locks. So, various threading libraries will provide these locks to you, right. For example, P thread also provides locks. So, locks are nothing but they are just special variables that you know one thread can acquire the lock or you know lock the lock and the thread when it is done can unlock the lock and as long as a thread has the lock, no other thread can execute the critical section. For example, let us see this example of two threads being created T1 and T2 and both these threads run this function. So, now suppose thread T1 starts running this function, it acquires a lock. You have a P thread mutex lock variable that is there and you can you know acquire or lock this lock. So, once thread T1 locks this lock and you know proceeds to the critical section. At this point, if another thread T2 also tries to run the same piece of code, T2 will block here. T2 cannot proceed here because it does not have the lock and once T1 finishes and releases the lock at this point, then T2 can enter, right. After T1 has released the lock, then T2 will be able to acquire the lock and enter the critical section. 
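In pthreads, this lock-then-enter pattern looks roughly like the following; the function name start is a placeholder, but pthread_mutex_lock and pthread_mutex_unlock are the real pthreads calls.

```c
#include <pthread.h>

long counter = 0;                                       /* shared data   */
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;       /* protects it   */

/* Start function run by both T1 and T2, as in the lecture's example. */
void *start(void *arg) {
    for (int i = 0; i < 1000; i++) {
        pthread_mutex_lock(&lock);     /* blocks if another thread holds it */
        counter = counter + 1;         /* critical section                  */
        pthread_mutex_unlock(&lock);   /* now a waiting thread can enter    */
    }
    return NULL;
}
```

With the mutex in place, the interleaving problem disappears and the final value is reliably 2000.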
So, any thread, when entering a critical section, before entering it will acquire the lock, and after the critical section it will release the lock. This kind of programming will ensure mutual exclusion in the critical section and this is a very important thing to keep in mind when writing multi-threaded programs or when accessing operating system code in kernel mode and so on. So, how are these locks implemented? We have seen how locks are used, right: before entering a critical section you acquire a lock, after the critical section you release the lock, right. But how is this lock implemented, say by the pthread library? So, this is not easy, this is quite tricky. So, let us see why, right. So, here I have tried to show you a very simple implementation of a lock. What I am doing is I am using a simple Boolean variable to indicate the lock status. If nobody has taken the lock, this Boolean variable is false, and a thread that tries to acquire this lock will check: is this lock acquired by somebody else, is this locked variable true? While it is true, what do you do? This is a busy while loop, it does nothing. You keep going back: is it locked, is it locked, is it locked? You keep on checking, you just keep spinning in this while statement doing nothing, you just wait here. And if nobody has this lock, if this locked variable is false, then there is no need to wait. You immediately proceed in the lock function, you set this locked variable to true, so that nobody else can take this lock after you, and then you go away. Now, you have acquired the lock: you have checked that nobody else has the lock, you have acquired it, you have set it to true, and now whichever thread completes this code has the lock with itself. Now, until this thread releases the lock, sets it to false again, any other thread that starts will basically get stuck here. 
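The naive Boolean-flag lock just described can be sketched as below. As the next part of the lecture shows, this implementation is incorrect: the check and the set are two separate steps, so a context switch between them lets two threads both acquire the lock.

```c
int locked = 0;     /* 0 = free, 1 = held; shared between threads */

/* Naive acquire: spin while held, then take the lock.
   BROKEN: a context switch between the while loop and the
   assignment below lets a second thread slip in. */
void acquire(void) {
    while (locked)
        ;           /* busy-wait: keep checking until the lock is free */
    locked = 1;     /* take the lock (too late to be safe!) */
}

void release(void) {
    locked = 0;     /* free the lock */
}
```

In a single thread the functions behave as expected; the bug only appears under an unlucky interleaving of two threads.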
Any other thread that tries to acquire the lock will keep on waiting here. That is the idea of this simple implementation of a lock using a simple Boolean variable, or you can use any other integer or whatever you want. So, this seems to work. For example, this thread T1 is the first thread trying to acquire the lock; it does not spin here because this while condition returns false, the lock is free. Then it sets the lock to true, it enters the critical section, all good. Now, while it is in the critical section, another thread that comes along is just simply spinning here, it is checking, locked is true. So, while true, while true, while true, it is just in this while loop all this time. Once the lock is released, this while stops executing, the condition returns false. Now, after thread 1 has released the lock, thread 2 has acquired the lock and it enters the critical section. In this way, they are entering the critical section one after the other. So, we are happy. Then why are we calling this an incorrect lock implementation? Because once again, bad things can happen if context switches happen at unfortunate locations, which we have no control over. For example, look at this sequence of execution. Thread T1 first checks: is the lock free? The lock is free, it is the first thread coming in. At this point, it has decided that the lock is free and the program counter has moved on to the next line. It has decided to acquire the lock, it has moved there. At this point, before it can set this to true, a context switch has unfortunately happened, the program counter value is saved, all the context is saved and T1 stops execution at this point, T2 runs. Now, T1 has not yet set the lock variable. So, now T2 checks, the lock is free, it will set it to true, it will enter the critical section. 
Now, at a later point of time, T1 resumes execution, the program counter is here, it is going to run this instruction, it will not go back. Why should it go back? It executed that while loop already. So, it will resume from here. It will once again set the lock to true, lock the lock, enter the critical section. Now, what has happened? We have two threads both running inside the critical section. We tried to implement a lock, but we did not achieve mutual exclusion, right. So, therefore, it is the same problem. If you understand, it is the same problem as the previous example. We needed locks to avoid these unfortunate context switches, but what if these context switches happen in the implementation of the lock itself? It is the same problem, right. We are not solving the problem at all here. So, how do we solve this problem? We have some special hardware support in modern systems to solve this problem, right. We need a way to check a variable as well as set its value and do both these steps atomically, that is, in one block. I want to check, is the lock free? If it is free, I want to lock it, I want to set it to true. And in between these two steps, I do not want a break. I do not want to check a lock, think it is free, not set it, and then somebody else comes and takes the lock, right. I do not want that. So, I want these two steps to happen atomically. But how do I do that? As I said before, user programs have no control over context switches. A user program cannot say, oh, please do not context switch me out for some time. That is a huge security violation, right. Suppose there were a system call to tell the OS not to context switch a process out; then a process could just do that and completely take over the CPU, right. It could just disable all context switches at any point of time. That is a very bad thing. It starves the other processes in the system. It violates isolation. 
So, we do not want to ever give user programs that control, saying all context switches are disabled for you for some time. So, then how do we solve this problem if a context switch can happen and break this atomicity? The solution is to use what are called hardware atomic instructions. So, these are instructions that do two things at the same time in one CPU instruction itself, ok. So, we have what is called a test and set instruction. In one instruction itself, you can set the value of a variable and you can check its old value, right. And because it is one hardware instruction, it obviously happens atomically. You cannot have a context switch in the middle of a CPU instruction; you can only have a context switch between two CPU instructions. So, if you do both these steps in one CPU instruction itself, in what are called atomic instructions, then your problem is solved. So, for example, here is an example of a lock implementation using test and set, right. Instead of directly just writing C language code to set this lock variable, I am using this test and set instruction to set the lock variable to true, right. So, what does test and set return? If you say test and set of locked with true, it will write the value true into this variable and it will return the old value, ok. And if the old value returned is true, what does this mean? This lock was already held by somebody else, locked was true. It is held by somebody else and you simply tried to write the value true into it again. It means you have not acquired the lock, the lock is with somebody else and you have to wait. So, if you do test and set on this lock variable, trying to set it to true, and the old value returned is also true, it means you do not have the lock, you will wait. 
On the other hand, if you set a value of true, but the old value returned is false, that means this variable was false, nobody else had the lock and you have managed to set it to true; that means you have acquired the lock, you own this lock now. The lock was free and you have acquired it, and this whole transition, checking if it was free and acquiring the lock, happened in one step. And if test and set returns false, it means that the lock is free, you have acquired it and then you can break from this while loop, right. So, this is the code to acquire a lock using a hardware atomic instruction. And as you can see, this is a single CPU instruction. Therefore, there is no way in which you break atomicity here, there is no way where you check the lock to be free and then forget to set it, and come back later to set it incorrectly; all of that is not going to happen here, you cannot be interrupted. So, all modern threading libraries provide locking primitives and all of these locking primitives are implemented using hardware atomic instructions. So, before CPUs supported such instructions, so this is a complicated instruction, right, this is not your regular load store add kind of instruction, it is a special, complicated CPU instruction. Before CPUs supported these kinds of instructions, a lot of work was done on how to write software algorithms that can do locking, but all of those are not very easy to get right, as we have seen in the previous slide. And therefore, today all lock implementations use these hardware atomic instructions. So, now in the previous examples of locks that we have seen, if a thread does not acquire a lock, what does it do? It simply keeps spinning in a while loop, right. So, if the lock is not free, you just do nothing, you keep spinning, right. 
So, such lock implementations are called spin locks, that is if T1 has acquired the lock, T2 also calls the acquire function, it will keep on busily spinning while, while, while, while, right. It will keep on checking this condition in a while loop. And this is a spin lock, but you can see that this is not very efficient, right. Why is this thread even running when the lock is not free? Why is it wasting CPU cycles for a long time? So, a better implementation can be a thread that realizes that the lock is not free, can just go to sleep, can be blocked, okay. And when the lock becomes free, it can run again, right. Its context can be saved and it can restart. So, such implementations are also possible for locks. Such locks are called mutexes or sleeping mutexes, right. You can either have a spin lock or you can have a sleeping mutex. So, whenever you use any threading library and there is a lock API provided, you can check in the description of the library. Does it implement it as a spin lock or does it implement it as a sleeping mutex? And both these are valid implementations, right. If a thread T1 has a lock, the thread T2 can either busily spin or go to sleep, whatever it is. Both these are correct implementations of mutual exclusion, but they just differ in terms of efficiency. And there are cases where both can be used. For example, if T1 will hold the lock only for a very short period of time, then T2 can just wait busily spin and then get the lock, right. The spin lock is useful. But on the other hand, if T1 will hold the lock for a very long time, then it is better for T2 to go to sleep, be blocked, let some other process run on the CPU and come back again later. So, it depends on your application, how long your critical sections are. Based on that, you can choose either a spin lock or a sleeping mutex for your locking variables. So, there are some guidelines for using locks, which is as we have seen the reasons before. 
Whenever you write multi-threaded programs with shared data structures and critical sections, you must follow locking discipline. Every shared data structure should be protected by a lock, right. And it is up to you, the programmer, to ensure that you acquire the lock before access and release the lock after access. And these locks can be coarse grained or fine grained. What does this mean? Suppose you have a big array, right. You can protect this whole array by one big lock, saying, you know, acquire this lock, access any element of the array and then release the lock. You can do that, or you can have individual locks for each of these array elements, right. I want to access this array element, so I will acquire this lock, access this element, release this lock. You can do fine grained locking or coarse grained locking, right. It depends on your underlying software architecture. Coarse grained locking is easy to do: you do not have to think about which lock to take, you just have one big fat lock. But on the other hand, it is inefficient. It does not allow parallel, concurrent execution, right. So, there are trade-offs here and this is an aspect of application design that the programmer must think about. And also, are locks only for reading or writing? So, it is a good habit to acquire locks both for reading as well as writing data. You might say, what is the harm in multiple threads reading the same variable? All of these issues are happening only if you are trying to modify a variable, right. When we are incrementing a counter, a problem is happening. But if I am reading a counter, there is no issue, right. So, two threads can read the counter; why do I need locks? You need locks because while you are reading, somebody else might be writing. And therefore, you do not know, right. Unless you take a lock, you cannot guarantee that nobody else is also updating the data at the same time. 
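The array example can be sketched both ways; all the names here (big_lock, elem_lock, and the update functions) are illustrative, not from the lecture.

```c
#include <pthread.h>

#define N 4

int arr[N];   /* shared array protected two different ways below */

/* Coarse grained: one lock for the whole array. */
pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;

/* Fine grained: one lock per element, so threads touching
   different elements can proceed in parallel. */
pthread_mutex_t elem_lock[N] = {
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER
};

void coarse_update(int i, int v) {
    pthread_mutex_lock(&big_lock);      /* serializes ALL array accesses */
    arr[i] = v;
    pthread_mutex_unlock(&big_lock);
}

void fine_update(int i, int v) {
    pthread_mutex_lock(&elem_lock[i]);  /* only serializes access to element i */
    arr[i] = v;
    pthread_mutex_unlock(&elem_lock[i]);
}
```

The coarse version is simpler but forces threads working on different elements to wait for each other; the fine version allows that parallelism at the cost of more locks to manage.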
Therefore, it is good practice to take a lock whether reading or writing shared data. But if you think taking a lock for reading is inefficient, a lot of libraries provide separate locks for reading and writing. So, if multiple threads just want to read a common variable, they will be allowed, if they say we want a lock only for reading; but if another thread wants to write, then these readers would not be allowed in at the same time, right. So, you have threading libraries that provide separate locks for reading as well as writing. And if you are using any other third party libraries, implementations of other data structures in your multithreaded programs, you should also check that all those other libraries are thread safe. That is, if in your program you have locks, but you use another implementation of, say, a hash table or a queue, and inside that library they are not using locks, then you are still in trouble, right. Your multiple threads will access that shared data and get into trouble. So therefore, whenever you are using a library, you should check if the library is thread safe. That is, if the library's implementation will work correctly when multiple threads access the library concurrently, okay. So, these are all the things to keep in mind as a programmer when you are writing multithreaded programs. So, that is all I have for today's lecture. To summarize, we have studied what threads are and why they are needed for concurrency and parallelism, but using threads is not easy. You have race conditions, you have critical sections that have to be protected by mutual exclusion, and locks are a mechanism to do that. We have seen how to use locks as well as how to implement locks, and we have studied the concept of hardware atomic instructions, which are useful for implementing locks. So, as a simple programming exercise for you, please try to write a simple multithreaded program, right. You have many tutorials available online. 
The P-threads API is particularly simple and easy to use. Try to write a program that has multiple threads and all of these threads are updating a shared counter like the example we have seen. Try to observe race conditions in practice and try to fix them using locks. This is a small exercise that will give you a hands-on feel for whatever we have studied in today's lecture. Thank you all and we will continue our discussion in the next lecture.