Hello and welcome to the next lecture in the course on introduction to computer and network performance analysis using queuing systems. I am Professor Varsha Apte, a faculty member in the Department of Computer Science and Engineering, IIT Bombay. Today we are going to start something new: closed queuing systems.

So far we have been studying open queuing systems. What does that mean? The source of a request, that is, where the request really comes from and where it goes afterwards, is not explicitly modeled. All we care about is that there is some rate of arrival, some rate lambda into the server, there is some service rate, and we compute various metrics based on this view of the system. In some cases, however, it is actually important to model the sources from which requests come. For example, you may be modeling a lab in which, say, some 50 students are sitting and interacting with the course website, or a call center where some 200 agents are creating complaint or sales tickets on a particular web application.

There is one more very interesting example. Consider a computer: there is a CPU, and there is a set of threads, say the thread pool of a web server. The computer of course also has IO: there is some secondary storage, and there is a network. What each of these threads does is run in a kind of loop: it uses the CPU for some time; when it is done, it makes an IO call, either to the network or to the disk, and then it has to wait for that IO. So the thread occupies the CPU for a while, then leaves the CPU and waits, still inside the computer of course, for a disk IO request or a network IO request to complete; when that finishes, it comes back to the CPU. A web server often has a fixed number of threads, and they cycle through this CPU-burst-then-wait-for-IO loop. Here too you can think of the system as one where the source is modeled: the source of requests coming to the CPU is the thread pool, and the sink, where a request goes after it leaves the CPU, is also the thread pool, with the thread in a state where it is doing disk IO or network IO.

So whenever there is a fixed number of requests that come to a certain resource, use that resource for some time, leave it, remain idle or do something else for a while, and then come back to that resource, the system is considered closed. The reason it is called closed is that if you represent the whole thing with this kind of picture, this is the source, and in a way it is the sink as well: it is where requests come from before reaching the server and where they go back afterwards. Why is it both source and sink? Because these are the clients.
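As a concrete picture of that loop, here is a minimal sketch of one worker thread's life in the pool. The functions handle_request and do_io are hypothetical placeholders for illustration, not any particular API:

```python
def worker_loop(handle_request, do_io):
    # One thread in the pool, seen from the CPU's point of view.
    while True:
        result = handle_request()  # CPU burst: the thread occupies the CPU
        do_io(result)              # blocking disk/network IO: the thread
                                   # leaves the CPU, waits, then loops back
```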
In the examples I gave, the clients are either student 1 to student m, or agent 1 to agent m, or thread 1 to thread m; those are just examples. The generic view is this: there is a server, and the server part is identical to what we have seen in open queuing systems; there is a buffer, there is a service time, and there are C servers. The one thing we additionally assume in closed systems is that the buffer can always be sized to be greater than the number of these so-called clients: since the number of clients is known and fixed, why would you make the buffer size smaller than that? Advanced models can of course allow the buffer size to be smaller than m, but in this course we will only study systems that make the simplifying assumption that the buffer size is greater than m.

So this is the basic structure: there is a client station and a server station, or we can call them the client node and the server node. The client station models the users, often actual human users, who issue the requests. If the user is a student or an agent, they probably have a browser in which they click on a link; when that click happens, the request goes to the server, the server fulfills the request, and the response goes back to the user. If the client is a thread, then whenever the thread is done with its IO, or its wait for IO, it goes and uses the CPU, then comes back and sits in the thread pool. So the client station models the set of students, or the agents, or a thread pool, and there can of course be many other scenarios of this kind; we will be looking at examples.

So, to repeat: the source and sink of requests, which is the users, are explicitly modeled. The users are in a think, then request-and-response, loop with the server. What is the "think"? It is the time between when a response is received and when the user, whether student, agent, or thread, sends the next request. The user appears idle during this time, but it is called think time precisely because it is not actually idle time. In the example of a student interacting with the course website, or the call center agents interacting with the complaints application, what is actually happening is that you are processing the response: you read it, or think about it. If I click on a link and some page comes to me, I am going to spend some time reading that page before I click on the next link; that reading time is the think time. You could say it is really a read time rather than a think time. In the case of threads, obviously threads are not people, so they are not thinking; they are actually somewhere else, waiting for some network or disk IO. With respect to the CPU they are doing something else, that is all: they are not issuing a request, nor waiting to use the CPU; they are preoccupied with something else. The same holds for the student or the call center agent: when they get a response they are preoccupied with reading it, and when they finish reading it they will issue the next request.
So this is very important: the users are in a think-request-response loop with the server. At the end of your thinking period you issue a request; the server takes its time to send back the response; once the response comes you again go into think mode, meaning you are reading the response; and then you send the next request.

Let us draw a user timeline. Say the user starts by sending a request. After the request is sent, some amount of time passes in getting the response; this is the time the server needs to send the response. Then the user thinks, possibly for a longer time, and then issues the next request. The request-issuing activity itself is not assumed to take any time: you just click on the link, and that is considered essentially instantaneous. Then you again wait for the response, and the loop continues.

An important assumption for our simple model is that each user issues only one request at a time. This may not be very realistic, because when browsing a website we often click on many links at once and open many tabs; those situations call for advanced models. For the simple model we study in this course, we assume only one request is issued at a time.

So this is the basic model. Its parameters are: the number of clients, which we usually denote by m; the think time, which we denote by h; and at the server side the parameters are the same as before: there are C servers, tau is the service time, and mu is equal to 1 over tau. Note that the arrival rate is no longer a parameter for a closed queuing system. We are also not considering a buffer size; advanced models can include one, but the models we study in this course do not.

For this system we again ask: what are the metrics going to be? As far as the metrics go, there is basically no difference; you are interested in the same things. The response time is the most important: when the user clicks on the link, when is the user going to get a response? Then the waiting time: how much time did the request spend waiting in the queue? These two are important to the user. The throughput is important to the owner of the server system. Similarly, the server utilization is a system metric important to the system owner, and of course the number of customers in the system and the queue length are related metrics the system owner will be interested in.

One clarification about one more assumption: although there seems to be some travel of the response from the server back to the client station, and of the request from the client station to the server, and in reality there might be a network on which these travel, in the simple models we study in this course we do not assume any network bottleneck or network delay. Whenever the server finishes a request, it goes back to the user in zero time.
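Restating the timeline above in symbols (using R for the response time and h for the think time just defined): each user alternates between waiting R for a response and thinking for h, so

```latex
\text{one user cycle} \;=\; \underbrace{R}_{\text{wait for response}} \;+\; \underbrace{h}_{\text{think}},
\qquad
\text{per-user request rate} \;=\; \frac{1}{R + h}.
```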
So these are the metrics and the assumptions. Summarizing the metrics: as usual we have the number of jobs in the system, which counts this whole part, and the queue length, which counts just the queue; the waiting time and the response time have their usual definitions; and we have the throughput and the server utilization. R_cycle is a new metric for closed systems: it is the time for a request to complete a whole cycle. We can pick any point, say the point at which the request was issued. Then, since the transit time is zero, the cycle includes the response time and then the think time, so the cycle time is always going to be equal to the response time plus the think time. Again, I have explicitly noted that since we are not modeling a limited buffer, we are not considering blocking probability or loss probability at the buffer.

So the question is: how do we calculate all of this? Let us take a simple example, because it is always easier to understand things through an example. Suppose a single-threaded server, it could be a web server, is running on a one-core CPU; this is just to make it a single-server system. Suppose there are 180 clients; again, these could be students in a computer lab, and the server could be some kind of course management server. A student clicks on a link, the web server does some processing, with a service time of, let us say, 5 milliseconds, and sends back a response. Once the response comes, let us assume that whatever the students are doing, they need only one second to read the response before sending the next request. What a realistic think time is depends on the setting: sometimes you might need many seconds or minutes to read and process a response in your mind; sometimes you are doing something routine, just waiting for the page to load and clicking the next link. We want to find all our usual metrics: the throughput, CPU utilization, waiting time, response time, queue length at the server, and number at the server.

Remember the laws we have at our disposal: the Utilization Law and Little's Law; these are the two main laws. The question is how to reason with them here, because there is no longer any such thing as a fixed arrival rate. The arrival rate now depends on how many users are thinking versus how many are at the server, either waiting or being serviced. For example, out of these 180 clients, and remembering that the 5 millisecond service time is just an average, suppose a lot of requests with large service times have ended up queued at the server, so that almost 100 of the 180 requests are queued there and only 80 clients are thinking. The rate at which we can expect new requests to arrive in that situation is very different from a situation where only 20 requests are at the server and 160 clients are thinking: with 160 clients about to finish thinking and send a request, the expected arrival rate is much higher. So the point is that the arrival rate is not constant.
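Since the arrival rate is state-dependent, we cannot simply plug a lambda into our open-system formulas. Before developing the analytical trick used in the next step, here is a toy discrete-event simulation we can use to sanity-check the model. It is a minimal sketch, not part of the lecture's method: it assumes a single FCFS server and exponentially distributed think and service times with the given means (the lecture only specifies the averages):

```python
import heapq
import random

def simulate_closed_queue(m, think_time, service_time, num_requests=100_000, seed=1):
    """Toy simulation of a closed system: m clients in a think/request
    loop around a single FCFS server. Think and service times are drawn
    from exponential distributions with the given means (an assumption)."""
    rng = random.Random(seed)
    # Heap of times at which each client finishes thinking and issues a request.
    events = [rng.expovariate(1.0 / think_time) for _ in range(m)]
    heapq.heapify(events)
    server_free_at = 0.0
    total_response = 0.0
    finish = 0.0
    for _ in range(num_requests):
        arrival = heapq.heappop(events)        # next request reaches the server
        start = max(arrival, server_free_at)   # it may have to wait in the queue
        finish = start + rng.expovariate(1.0 / service_time)
        server_free_at = finish
        total_response += finish - arrival     # response time = waiting + service
        # The client gets the response in zero time, thinks, then re-issues.
        heapq.heappush(events, finish + rng.expovariate(1.0 / think_time))
    return num_requests / finish, total_response / num_requests

X, R = simulate_closed_queue(m=180, think_time=1.0, service_time=0.005)
print(f"throughput ~ {X:.1f} req/s, mean response time ~ {1000 * R:.1f} ms")
```

For m = 180 this reports a throughput somewhat below the saturated rate of 200 requests per second that we derive below, which is exactly the behaviour the asymptote analysis predicts.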
So the standard tools we had, where we could write the Utilization Law as utilization equals throughput multiplied by tau, and then for an open system simply substitute the arrival rate for the throughput, are not available to us, because we do not know lambda. So a completely different trick, a very interesting approach, is used to model closed systems, and it is this: a very innovative definition of the Little's Law region.

The Little's Law region is drawn as a funny shape: you draw a boundary around the whole system, both the client station and the server, but leave a little gap at the server's exit. What this represents is that a request which leaves the server briefly leaves the region and almost instantaneously enters it again: it instantaneously exits and re-enters the system. Why do this? Because to apply Little's Law you need a region through which something flows, with something coming in and something going out. If you look at the whole system without the gap, the 180 clients are either thinking or at the server, so it looks like nothing ever leaves and nothing ever enters unless you make some kind of artificial boundary. That is the trick, and you will see how useful it becomes. Remember that the exit and re-entry are instantaneous, so you do not expect anything to spend any amount of time outside the boundary. The flow through this region, the request rate through it, is exactly the throughput.

Now let us apply Little's Law, N = lambda R, to this region; I will write it in a different color here. The R has to be the time spent going through the region. What are N, lambda, and R in this case? The number inside the region is always 180: the clients, or equivalently the requests, are either at the client station in think mode or at the server being queued or serviced. One way or the other, there are 180 requests inside the region, because the transit outside is instantaneous. The lambda is the flow through the region, and as the picture shows, that is the same as the server throughput; we do not know its value yet. And the time through the region is the response time plus the think time. We do not know the response time yet, so we will just call it R, but we do know the think time: it is given as 1 second in this example.

So this is what Little's Law gives us: 180 equals the throughput through the region multiplied by the response time plus the think time. Summarizing: the number of requests in the Little's Law region is 180, and the throughput through the region is some lambda we do not know yet.
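In symbols, writing X for the throughput through the region, R for the response time, and h for the think time:

```latex
N_{\text{region}} \;=\; \lambda_{\text{region}}\, R_{\text{region}}
\;\;\Longrightarrow\;\;
m \;=\; X\,(R + h)
\;\;\Longrightarrow\;\;
180 \;=\; X\,(R + 1)
\;\;\Longrightarrow\;\;
X \;=\; \frac{180}{R + 1}.
```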
The time through the region is the response time plus the think time, and according to Little's Law the number inside has to be the throughput multiplied by the time through the region. That is what Little's Law gives us, and you can rearrange it and write it as: throughput equals 180 divided by (R + 1). But we still do not know lambda and R; as usual when we use Little's Law, we often end up not knowing the throughput or the response time individually, but we can relate them.

So let us first try to find the asymptotes; for the asymptotes we can actually find the values of throughput and response time. Let us start with the low-load asymptote, m = 1, that is, only one user. If there is only one user, remember there is one CPU, the service time is 5 milliseconds, and the think time is 1 second. With only one user there is obviously never going to be any queue, so the response time is going to be equal to the service time, 5 milliseconds. Now that we have the response time and we know the think time, we can write the throughput: by the same Little's Law relation, with m = 1, the throughput is 1 divided by (0.005 + 1) requests per second.

The waiting time is of course 0: with only one user there is no queue. The queue length at the server is likewise 0; there is never going to be any queue there. The CPU utilization, by the Utilization Law, which always applies, is throughput multiplied by tau: (1/1.005) multiplied by 0.005. The number at the server we can get by Little's Law applied just to the server; remember that the earlier values came from applying Little's Law to the region shown in red, but applying it to the server alone gives the number at the server as throughput multiplied by the time spent at the server, which is R. So that is (1/1.005) multiplied by 0.005, the service time written in seconds. These are the low-load asymptotes, and we were able to get values for everything.

Now let us check the high-load asymptote: what happens as the number of users keeps increasing? One thing we can say right away is that the throughput is going to go to mu, as usual; it should go to the maximum. If the number of users keeps increasing, the most the server can do is its capacity, which is 1 over tau, that is, 1 over 0.005 seconds, or 200 requests per second. The CPU utilization clearly goes to 1 when the server is running at full capacity. What about the response time and the number at the server? Again let us use Little's Law applied to the whole region. Inside the region there are always m clients; m is large now, but let us keep it as the symbol m. So m is equal to the throughput multiplied by (R + 1). But here we know the throughput goes to 200, so we can write R + 1 = m / lambda = m / 200, which implies R = m/200 − 1.
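Before finishing the high-load case, here is a quick numerical check of the low-load (m = 1) values derived above; the variable names are just for illustration:

```python
# Low-load asymptote (m = 1): a single user, so there is never a queue.
tau = 0.005          # service time, seconds
h = 1.0              # think time, seconds
R = tau              # response time = service time (no queuing)
X = 1 / (R + h)      # Little's Law on the region: m = X * (R + h) with m = 1
U = X * tau          # Utilization Law
N_server = X * R     # Little's Law applied to the server alone
print(f"X = {X:.4f} req/s, U = {U:.5f}, N_server = {N_server:.5f}")
# -> X = 0.9950 req/s, U = 0.00498, N_server = 0.00498
```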
So we are not able to get an exact number, because clearly as m goes to infinity, R goes to infinity; there is no doubt about that, since there is no limited buffer. That is not in question. But we know a little bit more about how it goes to infinity. Think about it: R as a function of m is R = m/200 − 1, which means R is linear in m with slope 1/200 seconds. This is a very nice result: we can express R in terms of m with a very simple relation.

Once we have R, the waiting time is nothing but R minus tau, so we can substitute and get, in this case, m/200 − 1 − 0.005, everything written in seconds. The number at the server we can again get by throughput multiplied by R, which is 200 multiplied by R; multiplying out, this is m − 200. So we can see that if m goes to infinity, the number queued at the server will also of course go to infinity, but just as with R, we know a little bit more about how it goes to infinity, and similarly for the throughput.

So, summarizing everything we just learned: for the low-load asymptote the throughput is 1/(1 + 0.005), and you can go over all of the values we just derived; and as m goes to infinity, the response time is m/200 − 1, as we just derived, the number at the server is R multiplied by the throughput, and so on. We will take a pause here, and we will continue this analysis for general values of m. We will also discuss a very important metric for closed systems called the saturation number. Thank you.
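As a final numerical illustration of the high-load formulas just summarized, here is a sketch that evaluates them for a few values of m; keep in mind that these are asymptotic expressions, valid only once m is large enough for the server to be saturated:

```python
# High-load asymptote: the server saturates, so X -> 1/tau = 200 req/s,
# and m = X * (R + h) gives R = m/200 - 1 (linear in m, slope 1/200 s).
def high_load_asymptote(m, tau=0.005, h=1.0):
    X = 1 / tau           # saturated throughput, 200 req/s here
    R = m / X - h         # response time from m = X * (R + h)
    W = R - tau           # waiting time = response time - service time
    N_server = X * R      # number at the server; equals m - X*h = m - 200
    return X, R, W, N_server

for m in (250, 500, 1000):
    X, R, W, N = high_load_asymptote(m)
    print(f"m = {m}: R = {R:.3f} s, W = {W:.3f} s, N_server = {N:.0f}")
```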