lecture in the course Design and Engineering of Computer Systems. In the previous five weeks, we have seen the various building blocks of computer systems: how hardware works, how the operating system manages the hardware, and how computers communicate over the network. So in this week, we are ready to start putting all the pieces together and designing a system end to end, okay. We have a couple more building blocks to cover in the first couple of lectures of this week, where we understand how the different threads in a process and the different processes in a system communicate with each other. After that, we are going to put together an end-to-end design of a somewhat realistic computer system. So let us get started. This week, as I said, we will cover end-to-end design of computer systems. One thing to note is that computer systems in real life are not monolithic, okay. They are composed of multiple components that are distributed across several machines. Each of those components will have some hardware, an OS running and managing that hardware, and some applications running on top of it, and all of these components communicate over the network. That is how real-life computer systems are composed. And all of these building blocks, the hardware, the OS, the system call API for user applications, communication over the network, we have already seen. So the next thing we are going to see is how to put all of these pieces together. Even within a single system, you do not have just one process or one thread. Typically your application will have multiple threads or multiple processes working together with each other. So before we put together an end-to-end system, these are the two things we need to understand.
We need to understand how multiple threads in a process work together and how multiple processes in a system work together. We will do that in the next two lectures, and in the lecture after that we will start putting together an end-to-end system design. So first let us understand how you design applications that have multiple threads in them. Suppose you have a web server or any other application server that is talking to various clients and providing some service. Typically a lot of these servers are multi-threaded. Why is that? We have seen the reason for this in one of the previous lectures. If you consider a web server, it needs to have multiple threads because the server is managing multiple sockets: the web server has one listen socket that is listening for new connections, and once a connection is established with a client, it has multiple connected sockets, each talking to a different client. The server needs to manage all of these sockets, the listen socket as well as the connected sockets. One way to do it is to use a one-thread-per-connection design. The server will have multiple threads: the main thread of the server listens on the listen socket and handles new connection requests. Once a new connection is established, the server creates a new thread, gives it the connected socket, and that thread handles that client: it reads the client's request, processes the request, and replies back with whatever is needed. For each client C1, C2, and so on, you have a thread at the server; that is the one-thread-per-connection design. The main thread blocks on the accept system call, and the per-client threads block on reading from their sockets. This is something that we have seen before; that is why most application servers in a client-server model need to have multiple threads.
So one subtlety is that instead of creating a thread every time a client connects and destroying the thread later on, that is, instead of having this churn of threads, we can have a pool of threads. The server has a master thread and a pool of worker threads. The master thread listens for new connections, and whenever a new connection comes, it hands it to one of the worker threads and says, okay, you handle this connection. When a worker thread finishes handling the connection, it comes back to the master and says, okay, I am done, and the master gives it more work. This is how real systems use thread pools, and it is primarily to avoid the overhead of constantly creating and destroying threads. Now, when you have a thread pool, you need some way to coordinate between the master thread that is giving out work and the worker threads. Typically there is some kind of queue or shared buffer where the master thread keeps putting requests, saying, hey, a new client has come, a new request has arrived, and the worker threads keep taking requests from this queue and servicing them one by one. And of course, if there are any shared data structures, we have seen that we use locks for mutual exclusion to avoid race conditions. So now you can visualize this master-worker model: work coming in and being distributed to multiple workers. But there is still a complication if you think about it closely: how do these worker threads know that work has arrived in the queue? There is a master that is creating work and there are worker threads waiting for work.
So how will a worker know that a request has come and it should go check the queue? One way is for all the workers to constantly read the queue: is there work, is there work, is there work? This is like the polling-based model, but it is clearly inefficient; if work comes only once in a while, there is no reason for everybody to keep checking constantly. A better way is for the master to somehow wake up the threads: the threads can be sleeping in the blocked state, not running, and the master can wake them up and tell them, hey, look, some work has arrived for you. So we need a coordination mechanism between threads, where one thread can tell another thread, hey, do this, and that thread can say, okay, I am done with this. That is what we will study in this lecture: what mechanisms are available for threads to coordinate in this way with each other. Threading libraries provide many mechanisms for thread synchronization; in this lecture we will not have time to cover all of them, but the simplest and most useful of them is a mechanism called condition variables, which is useful for threads to do simple coordination between each other. For example, let us consider a concrete example: there are two threads in your application. Thread T1 does some task, say it accepts a request and puts it in a queue, and only after that do you want thread T2 to take the request from the queue and process it. T1 does something, and only then does T2 do something. We do not want T2 to keep running and wasting CPU time before T1 has done its share of the work, so we need some mechanism for patterns like this, and condition variables are useful here. And note that locks alone are not enough: threading libraries that provide mechanisms for creating threads also provide locks, like pthread locks, but we need a different mechanism here.
For this kind of signaling, this kind of coordination, you need a different mechanism, which is condition variables. So how do condition variables work? A condition variable is like any other variable, like a lock, that you create in a program, and on a condition variable you can call two functions: a wait function and a signal function. Whenever a thread calls wait on a condition variable, it is blocked. Every condition variable internally maintains some sort of list or queue of threads that are waiting, and a thread can say, okay, I also want to join the queue, I do not have work to do now, my turn has not come, my task has not arrived. So a thread can call wait, and another thread that wants the sleeping thread to run can call signal. The thread that calls wait is blocked and added to the list; a thread that calls signal wakes up one of these sleeping threads and makes it ready to run again. That is how wait and signal work. So let us revisit this example: we want the semantics that T1 does some work, and only then does T2 do something else. How do you implement this using condition variables? Without condition variables, T2 is always checking: is it done, is it done, is it my turn? That is inefficient; we do not want that. A better way is something like this: T2 checks, and if the work is not done, it calls wait on a condition variable; and T1, when it finishes the work that will unblock T2, calls signal on the condition variable. For example, suppose T2 runs before T1. We do not want T2 to proceed; we want T1 to do its work first. So if T2 runs first, it checks some variable, say done, in the shared data structures, sees that the work is not done, and calls wait. At this point T2 blocks, it is context switched out, and it will not be run. Then when T1 calls signal on the condition
variable after doing the work, T2 is marked as ready to run and the CPU scheduler will resume it at a later point in time. So now whatever work T2 has to do after T1, that will be done here. This is an easy way for threads to synchronize, and wait and signal are of course implemented with operating system support, moving threads between the blocked and ready states. The wait function puts the calling thread into the blocked state; the signal function wakes up one of the waiting threads and makes it ready to run. Now, what if T2 did not run first and T1 ran first? Then T1 does the work, and at this point T2 does not have to wait. That is why there is usually some variable, some condition you check: if the condition is false, you call wait, but if the condition is already true, the work is already done, and there is no need to wait; you proceed. So in this way, check the condition and sleep only if the condition you are waiting for is not yet true; if the condition is already true, there is no need to call wait, and T2 will proceed. Now, there is a slight complication with this wait and signal; there is another scope for race conditions, so let us understand it carefully. If you do not do this wait and signal carefully, you might end up in a very bad state, in what are called deadlocks. What is a deadlock? A state where a thread is sleeping forever, not doing any work. So let us check what these conditions are. I am describing that condition here in this figure, where some bad things can happen. For example, we had this code: T2 checks the condition, and if it is true, it will not wait; but if the condition is not yet satisfied, if T1 is not yet done, then it has to call wait.
Now suppose T2 has checked the condition and figured out that it is not yet satisfied: T1 is not yet done, so T2 has to wait. T2 has decided to wait and has moved its program counter to this line. At that point, just before it calls wait, a context switch happens, unfortunately. This is always the problem with threads and processes: a context switch can happen at an unfortunate time. Now T2 has decided to sleep, but it has not yet called wait; it has not yet added itself to the list of threads waiting on this condition variable. At this point a context switch occurs, T1 runs, does the work, and calls signal. Will this signal wake anybody up? No, because T2 has not yet joined the queue, so the signal wakes nobody up, and T1 is done. Now T2 resumes; its program counter is here, so T2 executes the wait statement and goes to sleep. Will anybody wake T2 up? No, because T1 has already signaled. So T2 will sleep forever; this is a deadlock. Why did this happen? Because all of this should be done atomically: checking the condition and going to sleep should happen in one go; we do not want anybody to interrupt in between. That is why, to protect this atomicity of sleeping, we use a lock along with condition variables. The way condition variables are used is that the thread that wants to sleep acquires a lock, checks the condition, and goes to sleep, passing this lock to the wait function; and after the thread has fully gone to sleep, the lock is released by the pthread library or whatever library is implementing the wait function. After T2 sleeps, the lock is released, so that the lock is now available for T1. Before calling signal, T1 also takes the lock, then calls signal, then unlocks. Now T2 has slept, and T1 has taken the lock, signaled, and woken T2 up.
Now when T2 returns from wait, the lock is reacquired by the library, so T2 returns holding the lock, and it can unlock it and do whatever it wants. So we are using a lock to protect the atomicity of checking a condition and going to sleep. Because of this lock, the bad situation cannot occur: you cannot have the signal run in between. Why? Because the signal also needs the lock, and T1 does not have the lock here; all of this is a critical section, and no other critical section can be squeezed in here. Therefore, even if T2 is context switched out, it is okay, because T1 does not have the lock and cannot run the signal. All of this will happen first, and only then will the signal happen. So you have to use locks along with condition variables. The implementation is taken care of by the libraries, but you have to understand that every time you call wait on a condition variable, you also have to acquire a lock and pass that lock, and whenever you call signal, you also have to hold this lock. This ensures that the wait and the signal will happen either before or after each other, but not interleaved in this problematic manner. So what we have seen is an example of a situation called a deadlock. When you have multiple threads in a process, sometimes these threads can get stuck in a very bad state where they are all sleeping, not doing any work, and your system is stuck. That is called a deadlock, and in general we do not want deadlocks to happen, because we want life to go on. Another problem that can occur with threads is what is called a livelock, where the threads are running but still not doing any useful work. The difference between deadlock and livelock is that in a deadlock the threads are blocked and never woken up, while in a livelock the threads are running but not making progress.
These are related but slightly different problems. So we have seen one example of a deadlock: a thread sleeps by calling wait on a condition variable, but no other thread calls signal, so this thread sleeps forever. Another example: even without condition variables, with just locks, you can have deadlocks when you have multiple locks in your system and you acquire them in inconsistent orders. Consider this example with two threads, T1 and T2. Both of them need two locks: they acquire the two locks, do some work, and release the two locks. Now T1 acquires them in the order lock A, lock B, while T2 acquires lock B, then lock A. What will happen? Sometimes all is well, life is good: T1 acquires both locks, does its work, releases the locks, then T2 acquires both locks and does its work; one after the other they work, and that is okay. But sometimes, with some unlucky interleaving, you will have a situation where T1 has acquired lock A, then there is a context switch, T2 acquires lock B, then there is again a context switch and you are back to T1. T1 is waiting for lock B; will it get it? No, because T2 has it. T2 is waiting for lock A; it also will not get it, because T1 has it. Will T1 finish its work and release its locks? No, it will not, and similarly for T2. Both of them hold one lock and are waiting for the other, so neither will finish. If T1 could get both locks, it would finish its critical section and release the locks, but we are not there yet, and similarly for T2. So this is a deadlock. Even without condition variables you can have deadlocks, when you have multiple locks and you acquire them in different orders across different threads.
So what are some best practices in your user programs to avoid deadlocks? There are many techniques; two simple ones are these. First, whenever you acquire multiple locks, make sure you acquire them in the same order across all threads. If you do lock A, lock B somewhere, then do lock A, lock B everywhere in your program. Do not use a jumbled-up order; that will cause deadlocks. Acquiring locks in a jumbled-up order causes what is called a circular wait, where one thread is waiting for a lock held by another thread, which in turn is waiting for a lock held by the first. Second, when you have condition variables, whenever you call wait or sleep, ensure that the code that has to wake you up can actually run. Whenever you call wait on a condition variable, always check that the code path that calls signal can also run. For example, if that code path needs some lock or some other resource to run, make sure that path is open, so that whenever you are sleeping, somebody will eventually wake you up. So now we have seen what condition variables are. A common design pattern found in computer systems is what is called the producer-consumer situation. You have some threads that are producing items and some threads that are consuming them, and there is a shared buffer of some fixed, bounded size; the producer threads add items into this buffer and the consumer threads read items from the buffer and consume them. This is a common pattern in which threads in an application interact with each other. So in such a scenario, let us see how we can write the code for programs that have this producer-consumer pattern using condition variables. Here is some simple code for producer and consumer threads. What the producer thread does is first check: is there space in this buffer to produce more items?
Of course, everywhere you have a lock; any access to the shared buffer has to happen with the lock held. The producer checks: if there is space in the buffer, it produces an item; if there is no space in the buffer, it waits on a condition variable until space is freed up. And who wakes up this producer? Whenever the consumer consumes an item, it calls signal on this condition variable, and that wakes up the producer. So the producers keep producing; at some point the buffer is full, and the producer threads wait; later the consumer threads run, consume items, free up space in the buffer, and signal the producer threads, which wake up and produce again. So you have a nice sort of tango going on here: producers are producing and waiting, consumers are consuming and telling the producers, hey, produce more; producers put items in the buffer, consumers consume them. You have nice coordination happening between the threads in the system, where both are doing their work. Similarly, the consumer waits if there are no items in the buffer. If the producers have not yet produced anything, there is no point in the consumer running, so the consumer checks, and if there are no items in the buffer, the consumer waits. Note that this is another queue for the consumers, another condition variable. And when the producer produces an item, it signals the consumer and wakes up the consumer thread. So we are using two condition variables because, roughly, there are two types of waiting, two types of queues: one for producer threads that are waiting for space in the buffer, and one for consumer threads that are waiting for items to consume. Therefore you use two condition variables.
And of course, there is a mutex lock that has to be held while accessing the shared data structures or while waiting; you can use the same lock for all of these purposes. So this is a simple example of producer-consumer code: the producer locks, checks if there is space to produce, and otherwise waits, giving the lock as the argument to wait. The producer thread goes to sleep and the lock is released. At a later point, when the producer is woken up, it produces an item and signals the consumer. The consumer code is symmetric. So this is a very common pattern in multi-threaded applications, where some threads do some work and then signal other threads, which do the rest of the work. In a multi-threaded server, a TCP server, a web server, or any application server, this producer-consumer pattern shows up. For example, when you have a thread pool, you have one or more master threads producing work: they put the requests in a shared buffer, and the worker threads take requests from the shared buffer and handle them. In a TCP server, whenever a new client comes, the connected client socket is put in this queue, and a worker thread takes that socket from the queue and services that client: reads the client's request, replies back, and so on. So this is a very common pattern in multi-threaded servers that have a thread pool to do the work. And now a question comes up: you have one master thread producing work and multiple worker threads doing the work in a thread pool, and whenever a worker thread finishes its work, it comes back to the queue, takes the next item, handles that request, and comes back again.
So in this scenario, the question comes up: how many threads should I have in the thread pool? What is the optimum value? This has to be tuned carefully in your particular system; there are pros and cons. If you have too few threads, what happens? The producer is putting in a lot of requests, and there are not enough worker threads to handle them. Then your queue builds up, your clients may not be served on time, and you do not have enough threads to utilize all your CPU cores. So you usually cannot have a thread pool with just one or two threads; that does not make sense. You need some respectable number of threads to process all the requests quickly, to use all the CPU cores for parallelism, and so on. At the same time, you cannot have millions of threads: each thread occupies some memory, there is a control block the scheduler has to take care of, and every thread adds some overhead. So you cannot have an infinite or very large number of threads in your system; that is also counterproductive. This optimum value usually has to be tuned. When we study performance, we will revisit how to tune all of these things for good performance, so we will come back to this question later. Now, the other design aspect you have to consider when building multi-threaded servers is this: a worker thread takes a request; a client has connected to the server, and that client is assigned a worker thread, which handles the client's request. Here again there are different design choices.
Either you can do a run-to-completion model, where each worker thread fully handles a client: each worker thread is given a connected client socket, it reads from the socket, handles the request, sends a reply back, the client says something again, and the entire work needed for a request is handled by just one thread. That is called the run-to-completion model. The other model is a pipeline model: the master puts some work in a queue, some worker threads do one part of the work and then put it in another queue, and other worker threads do the next part of the work. You can have a pipeline of stages: one worker thread does something, says, okay, my part is done, and hands it off to another worker thread; between stages you keep going back to queues. Each worker thread is specialized for a particular task; it does that task and then hands the item over to somebody else. All of these can of course be part of the same thread pool. So these are the run-to-completion and pipeline models, and both are valid design choices. Now, after all this discussion, you might be thinking: what if I use an event-driven API? We have studied this: if I use something like epoll, I do not need to block on any socket read; I can have a single-threaded server process handle multiple clients. So the question might come up: why do I need all this multi-threading, worker threads, thread pools, and all of that, if I use event-driven programming?
But even with event-driven programming, in real life you will need multiple threads. In real-life systems you have to handle a large amount of traffic, and you are running on powerful high-end servers that have multiple cores. So you cannot just use a single-threaded epoll server: it cannot use multiple cores, and it also cannot handle blocking operations like disk IO. If you have a single-threaded server running an event loop, epoll_wait, a request comes, handle the request, then epoll_wait again, then this one thread cannot block on a disk read, for example; if it did, all the other events would be left without anybody to handle them. Therefore, in practice, even event-driven servers have multiple threads. For example, you can have multiple threads on multiple cores, each of them calling epoll: each thread calls epoll on its core and handles a set of events, a set of clients, another thread on another core handles another set of events, and so on. So you can have multiple threads each doing epoll in order to use the multiple cores in a system. Or you can have a master-worker model with epoll, where there is one thread doing epoll and collecting all the new requests that are coming in, and then you have a pool of worker threads handling the work generated by this epoll master thread. These worker threads can handle disk IO and can block, but the epoll master thread should not block. Why? Because events are continuously coming up from the kernel and you have to handle them. The other threads can block.
So the summary is that whatever APIs you are using, blocking APIs or event-driven APIs, real-life servers on multi-core systems typically end up having multiple threads, and all of these threads need some mechanism for coordination, like condition variables. In this lecture I have only spoken about condition variables, but different programming languages and different libraries give you many other such mechanisms with which you can build these sorts of pipelines and thread pools, where one thread does some work, signals some other thread, and then that thread does something else. There are many mechanisms for all of this. So when you are building computer systems, when you are building a multi-threaded server, it is important for you to understand what these mechanisms are and how to use them in a real system. So that is all I have for today's lecture. We have seen how to build multi-threaded servers, whether using blocking APIs or event-driven APIs, and we have seen how mechanisms like condition variables are useful in such situations. Of course, there are many more such mechanisms in other programming language libraries. We have also briefly studied bugs like deadlocks that can occur in multi-threaded programs, where we have to take care that our threads keep making progress and are not blocked forever in some bad state. As a programming exercise, I request you to try to write a client-server program where the server has a thread pool to handle the requests coming in from multiple clients. You can use condition variables from the pthreads library, and this will help you understand the concepts of this lecture better with a practical example. So thank you all. That is all I have for this lecture, and let us continue this discussion in the next lecture.