Hello and welcome to this lecture in the course on Introduction to Computer and Network Performance Analysis using queuing systems. I am Professor Varsha Apte and I am a faculty member in the Department of Computer Science and Engineering, IIT Bombay. Let us continue where we stopped in the previous lecture: we were looking at closed queuing systems, and we had taken an example and done some analysis. Now we are going to generalize all the metrics that we calculated, and we will also look at one interesting metric called the saturation number. Just to recall: closed systems are used when you have a small number of clients, whose think time is also small, and who are in a request-response loop. The clients can represent students in a lab, call center agents interacting with a website, or threads interacting with the CPU. The basic loop is that a client thinks for some time, then issues a request; the request is serviced, and the response returns to the user. Then the think, issue request, wait for response loop begins again and continues. The number of clients, the think time, the average service time, the service rate, and the number of servers are the parameters of the system; the number of clients and the think time are the new parameters of a closed queuing system, relative to the open systems we saw before. The metrics are very similar, as we went over last time; the only new one is the cycle time, R_cycle, which represents the entire time taken around the loop from any point back to the same point. If you start measuring from the instant a request is issued, the cycle is: waiting for the response, the response coming back, thinking, and then the next issue. This whole loop is response time plus think time, and that is the new metric called the cycle time; otherwise every definition is the same.
Previously we actually calculated the low load asymptote and the high load asymptote of the metrics for a specific numerical example. All I want to do now is write the same things symbolically, which is not very difficult, so let us do that. Remember that the main trick used to reason about this system was to draw a Little's law region of a somewhat odd shape: you enclose the whole system, but you draw the boundary so that a request exits at one point and instantly re-enters at another. The time spent in this transit is 0, but it gives you a point at which to define a throughput: the exit rate across this boundary is the throughput, and it is nothing but the throughput of the server. So now you have a way to talk about the throughput, and the biggest advantage you get is that inside this region the number of clients is fixed, so the number of requests is fixed. There are exactly M requests in this region no matter where they are: some could be at the server and some could be thinking, but together they always total M. Remember the assumption: each user has at most one outstanding request at any time. The user may be thinking, in which case the user has not issued a request; but once a request is issued, the user will not issue another before getting the response. This is a very important assumption, and because of it you know that the total number of requests that are either in think mode, waiting for service, or in service is always M. This is a very important and very clever trick for us to use.
Given this, Little's law for this region becomes M = Lambda (r + h): the time through the region is the response time r plus the think time h (remember we said any network delay is assumed to be 0), the throughput through the region is the server throughput Lambda, and the number of requests in the region, given a certain number of clients, is always fixed at M. At M = 1, which is how we reason about the low load asymptote, this becomes 1 = Lambda (r + h); and at M = 1 the response time r equals the service time tau, so 1 = Lambda (tau + h). So for the low load asymptote we have an actual value for the throughput: Lambda = 1 / (tau + h). This is kind of obvious: if there is only one user in a request-response loop, one request gets done every think time plus response time, and with one user the response time is just tau. Once we have the throughput, we can calculate the server utilization and the number in the system. The server utilization is Lambda tau, which is tau / (tau + h). What else do we need? We can get n, the number of customers at the server, from Little's law: n = Lambda r, and since r = tau, this is also tau / (tau + h). So we have calculated the low load asymptote for all of these symbolically; we will show a table later to reinforce all of this.
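As a quick sanity check, the low load (M = 1) formulas just derived can be written as a short Python sketch. The function name and the example numbers here are mine, not from the lecture:

```python
def low_load_metrics(tau, h):
    """Low load asymptote (M = 1), from Little's law: 1 = Lambda * (tau + h)."""
    throughput = 1.0 / (tau + h)    # Lambda = 1 / (tau + h)
    r = tau                         # a lone request is never queued behind anyone
    rho = throughput * tau          # server utilization = tau / (tau + h)
    n = throughput * r              # number at the server, also tau / (tau + h)
    w = 0.0                         # a single request never waits
    q = 0.0                         # so the queue is always empty
    return {"throughput": throughput, "r": r, "rho": rho, "n": n, "w": w, "q": q}

# e.g. 100 ms service time, 1 s think time
ll = low_load_metrics(tau=0.1, h=1.0)
```

Note that utilization and the mean number at the server coincide here, since with one customer the server holds either 0 or 1 request.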
The waiting time, of course, when there is just one customer in the system is 0: there is never going to be any waiting. Similarly the queue size q will also be 0, since there will never be any requests in the queue. So we have n, q, r, w, the server utilization, and the throughput: the basic metrics for the low load asymptote in symbolic form. Now let us write the same thing for m going to infinity, that is, for large m, symbolically again. Let us also define mu = 1 / tau. Remember we are doing all of this for a single server; we are not going to do this analysis for multiple servers. For a single server, as m tends to infinity, the throughput, capital Lambda, goes to mu; this much we know. And again the basic Little's law region applies here, with response time r and think time h, so the formula m = Lambda (r + h) always holds; it is just that at low load and at high load some of these symbols take specific values in terms of the parameters. As m tends to infinity we have m = mu (r + h): that is the one difference, the throughput can now be replaced by the parameter mu. So we can write r = m / mu - h. This is actually a very important formula. It tells us that the response time r is linear in the number of clients when the number of clients is large: every client you add increases the response time linearly, with slope 1 / mu, that is, tau. So if you plot r versus m, at m = 1 the value is tau; initially it will increase in some way, but we know that the high load asymptote is linear and its slope is tau.
It turns out that the middle part curves smoothly between the low load asymptote and the high load asymptote. So the response time in a closed system has a very interesting behavior: the high load part is linear, and that makes some calculations very interesting. Then, once we have the throughput, we can clearly write that the utilization at high load is going to be 1; there is nothing much to think about there. What about n and q and so on? From Little's law we can write n = Lambda r = mu (m / mu - h) = m - mu h. Similarly, the waiting time is w = r - tau, and we can do the calculation for q in a similar way: q = Lambda (r - tau); we had Lambda r = m - mu h, and Lambda tau = mu tau = 1, so q = m - mu h - 1. So this is how the whole calculation goes. We started with m = Lambda (r + h), the Little's law region. The basic starting point is that the throughput converges to mu when m tends to infinity; that gave us r. From r we got n; then from the throughput we get the waiting time, by Little's law or by taking r minus the service time; and then we get q by Little's law. These are all the same calculations we did earlier, just done symbolically, so let us just show that.
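The high load derivation can likewise be sketched in Python. The function name and the numbers in the example are illustrative, not from the lecture:

```python
def high_load_metrics(m, tau, h):
    """High load asymptote for a single server: as m grows, throughput -> mu = 1/tau."""
    mu = 1.0 / tau
    throughput = mu            # server is saturated
    rho = 1.0                  # utilization at high load
    r = m / mu - h             # = m*tau - h, linear in m with slope tau
    w = r - tau                # waiting time = response time - service time
    n = throughput * r         # Little's law at the server: n = m - mu*h
    q = throughput * w         # Little's law at the queue: q = n - 1
    return {"throughput": throughput, "rho": rho, "r": r, "w": w, "n": n, "q": q}

# 5 ms service time, 1 s think time, 300 clients (well above saturation)
hl = high_load_metrics(m=300, tau=0.005, h=1.0)
```

A quick consistency check: n and q should differ by exactly 1, the one request in service at a saturated server.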
So here we have, collected together, what I wrote earlier. For the low load asymptote: throughput = 1 / (h + tau), CPU utilization = tau / (h + tau), waiting time = 0, queue length = 0, response time = service time = tau, and the number at the server n = tau / (h + tau). For the high load asymptote we did all these calculations: throughput = 1 / tau = mu, CPU utilization = 1, waiting time = m tau - h - tau, response time = m tau - h, queue length = (m - 1) - h / tau, and the number at the server = m - h / tau. For a general m, where you can assume neither low load nor high load, everything is basically just what follows from the Little's law relation m = Lambda (r + h). The response time is r = m / Lambda - h; the throughput is Lambda = m / (r + h), which is just this Little's law relation rearranged; the CPU utilization is the throughput multiplied by tau, that is, m tau / (r + h); the number at the server is Lambda r = m r / (r + h), where we take the throughput and keep r as a symbol; and the queue length is the throughput multiplied by the waiting time, Lambda (r - tau). One point to recognize for the general m, which is neither low load nor high load: these non-asymptotic metrics are interrelated, so we do not really know any of them exactly. If we know r then we know the throughput, and if we know the throughput then we know r, but we do not know either one of them on its own. So we need other methods to find them: if we have a measurement of r, we can get the throughput; if we have a measurement of the throughput, we can get the response time; and so on.
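As the lecture notes, at a general m the metrics are all tied together through m = Lambda (r + h), so a single measured quantity pins down the rest. A small sketch of that bookkeeping; the function name and the measured response time in the example are hypothetical:

```python
def metrics_from_measured_r(m, tau, h, r_measured):
    """Given a measured mean response time r at a general m, recover the
    remaining metrics from the Little's law relation m = Lambda * (r + h)."""
    throughput = m / (r_measured + h)   # Lambda, rearranged Little's law
    rho = throughput * tau              # server utilization
    n = throughput * r_measured         # number at the server (Little's law)
    w = r_measured - tau                # waiting time
    q = throughput * w                  # queue length (Little's law)
    return {"throughput": throughput, "rho": rho, "n": n, "w": w, "q": q}

# 5 ms service, 1 s think, 180 clients, with a hypothetical measured r of 10 ms
g = metrics_from_measured_r(m=180, tau=0.005, h=1.0, r_measured=0.010)
```

The same relation works in the other direction: a measured throughput gives r = m / Lambda - h.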
So, it is useful to have these relationships, but only at the asymptotes do you get the values of the metrics in terms of the parameters; remember that m, tau, and h are the parameters here. All of the asymptotic expressions are in terms of the parameters, but the non-asymptotic metrics are not. So that summarizes what we have seen so far. There is another interesting metric when we are talking about closed queuing systems. Whenever we are talking about any server system, the basic question in the mind of the people designing it is: what can it support? For open queuing systems that question was: what is the maximum arrival rate supported by the system? And that was nothing but C mu: if mu is the service rate per server and C is the number of servers, the capacity of the system is C mu. If mu was 100 requests per second and C was 5, we could say that the maximum this system supports is 500 requests per second. This can of course be asked about a closed queuing system also; if we are considering a single server system, then mu is what the maximum throughput is. But the point is that the load is not being measured in terms of an arrival rate; we are not talking about an arrival rate lambda. The measures of the load are m and h. And the question to ask is not going to be what think time this system can support; that is not a very natural question. The natural question is: how many users can this system support? That is, what is the maximum m the system can support? For this there is a very famous number called the Kleinrock saturation number, or the Kleinrock saturation heuristic, and that goes as follows.
So, you have already seen that the plot of r versus m for a closed queuing system looks like this: the minimum is tau, and the high load asymptote is linear with a slope equal to tau. Now, any intuitive system designer will know that the maximum number of customers this system should support is somewhere around the bend of this curve, whatever that number is; this number is denoted m star. We do not want to be far to the right, because there the system is 100 percent utilized, and we know we do not want to be there. And we do not want to be far to the left either, because there the system is underutilized. So a heuristic, not a very rigorous one but a useful indication, is that operating somewhere around the bend is a good place to be, because just after that the system reaches full capacity and we do not want to go there. The idea of the Kleinrock saturation heuristic is that this number is given by the intersection of two lines. What are these two lines? One is the low load asymptote, which is just a flat line, and the other is the high load asymptote. If you extend these two lines, the intersection point you get is a good indicator of where the response time is beginning to become linear and therefore where you should not go beyond, and it seems to intersect at the right place. This is also called the knee of the curve: we want to be at the knee, and this intersection seems to give us the knee. So we have a method: if we can find the intersection point of these two lines, we have our m star. We just need to set the two expressions equal to each other. The first line is simply r = tau, the response time at the low load asymptote, and we know that the high load asymptote is r = m tau - h.
Remember, m tau - h is the high load asymptote we derived earlier. We want to find the m at which the low load asymptote and the high load asymptote are equal: tau = m tau - h. Rearranging, m tau = tau + h, so m = 1 + h / tau. This is the m star we wanted to find, and this is the equation that gives us m star. Now, again, what is the story of this? As I have always said, the story of any formula should be thought about. This says m star = 1 + think time / service time. It is very intuitive in a way; we can also look at it as 1 + service rate / think rate, if you call 1 / h something like a think rate. Let us put in numbers: if the think time was 6 seconds and the service time was 100 milliseconds, then m star = 1 + 6 / 0.1 = 61, so this system supports 61 users; that is how to calculate it. A 100 millisecond service time corresponds to a service rate of 10 requests per second, and a 6 second think time corresponds to a think rate of 1 request per 6 seconds, which is around 0.16 requests per second. This is not really the rate at which a user issues requests; it is the rate at which a user could issue requests if the response time were 0. So in a way you are taking 10, the total rate at which this server can go, and dividing it by the rate at which each user could go if the response were instantaneous: if the server gave basically instantaneous responses, the user would issue requests at the rate at which the user can think.
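The intersection calculation reduces to one line of code. A minimal sketch; the function name is mine:

```python
def kleinrock_saturation_number(tau, h):
    """Kleinrock saturation heuristic: intersect the low load asymptote
    r = tau with the high load asymptote r = m*tau - h.
    Setting tau = m*tau - h and solving gives m* = 1 + h/tau."""
    return 1.0 + h / tau

# 100 ms service time, 6 s think time -> 61 users, as in the lecture's example
m_star = kleinrock_saturation_number(tau=0.1, h=6.0)
```

Equivalently, m* = 1 + mu / (1/h): one plus the ratio of the service rate to the "think rate".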
So, you can see that this is quite intuitive: 10 divided by 0.16 is roughly how many users the server can support, and then there is a plus 1 which is not explained by this reasoning, but it is in that sort of range. The server goes at 10 requests per second, a user goes at 0.16 requests per second; how many such users can the server support? Roughly 10 divided by 0.16, and then we add 1. So that is one way to understand it. There is one more explanation of the Kleinrock saturation heuristic, and it is an interesting one: to think of it from the server timeline angle, and I will tell you what that is. Consider a depiction of the server timeline. Suppose one user's request comes in at our origin, time 0. As soon as that request came in, the server would be busy for an average of tau seconds. Suppose this was user one, and suppose that user came to an empty queue, so the server instantly started working on the request and took tau seconds to process it. Now the user has gotten the response and is going to think for a think time equal to h, so we know that the next request from this user is not coming for h more seconds. The question then is: how many other users' requests can the server process while this user is busy thinking? That is nothing but how many tau's fit into this time h, which is h divided by tau.
So, this is an interesting way of thinking about it: those requests can be issued by some user 2, some user 3, and so on up to some user k, and the total number of users is h divided by tau plus this first user. In fact, this is exactly 1 + h / tau, which is exactly the Kleinrock saturation number from the previous slide: m star = 1 + h / tau. So this is an interesting way of understanding, and one visual for, why the maximum number of users that a single server can support turns out to be 1 + h / tau. Just as an example: in the example we saw earlier we had a 5 millisecond service time, a 1 second think time, and 180 clients. The saturation number for this system is 1 plus the think time in milliseconds, 1000, divided by the service time, 5 milliseconds: 201 turns out to be the maximum. So the example we had taken was just a little below the saturation number, and those were the metrics we had found for it. This brings us to a close of the basic theory and simple examples for closed queuing systems. In the next lecture we will look at some more examples, and we will go back to our case study of a web server; this time we will look at closed load measurements. Thank you.
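To see why 180 clients sits comfortably below the knee, here is a rough back-of-the-envelope estimate. This approximation is mine, not a calculation from the lecture: below m* queueing is light, so the response time stays close to tau and each client completes roughly one request every tau + h seconds.

```python
def below_knee_estimate(m, tau, h):
    """Rough throughput/utilization estimate for m below the saturation
    number m* = 1 + h/tau, assuming r stays close to tau (light queueing),
    so each client cycles roughly once every tau + h seconds."""
    m_star = 1.0 + h / tau
    if m > m_star:
        raise ValueError("estimate only makes sense below the knee")
    throughput = m / (tau + h)   # Little's law with r approximated by tau
    rho = throughput * tau
    return throughput, rho

# tau = 5 ms service, h = 1 s think, m = 180 clients (m* = 201)
lam, rho = below_knee_estimate(180, 0.005, 1.0)
```

With these numbers the estimated utilization is about 0.9, consistent with being just a little below the saturation number rather than at it.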