Hello, and welcome to the next lecture in the course on Introduction to Computer and Network Performance Analysis using queuing systems. I am Professor Varsha Apte, a faculty member in the Department of Computer Science and Engineering, IIT Bombay. Today we are going to talk about what are called Jackson open queuing networks, which are basically queuing networks with branching and feedback. Last time we had studied only tandem queuing networks; today we are going to study these kinds of generalized queuing networks. So, what do we have in these general queuing networks? We have multiple queuing stations, as you can see: S1, S2, S3 are multiple queuing stations, and a request can go from one station, which we sometimes also call a node (node and station I will use interchangeably), to another with some probability. So, for example, here there are in fact three branches: once a request finishes its service at station S1, there are three possibilities. It can finish, let us say with probability 0.3; it can go directly to station S3, let us say with probability 0.5; or with probability 0.2 it goes to server station S2. So, this is the branching that can happen. Furthermore, there can be this feedback, which is that after a request received service at S1, suppose it took this branch and went to station S3; it may need one more service at server S1, and so it comes back there. That is called feedback. So, what are the applications under which this is possible? We had discussed this in the last lecture also. For example, threads or processes may visit the CPU or IO multiple times. Certainly a process will not visit the CPU only once and the IO only once; it may have multiple bursts of CPU and IO, because this is how interactive processes are. So suppose this is the CPU and this is the IO, and let us say this third server is not there; it is just CPU and IO.
This is a typical visiting routine that any process does: some CPU, some IO, some CPU, some IO. Then consider a typical web application; now we can actually have a third server here. Let us say this is not a CPU or IO, but a web server, this is a database server, and this could be an authentication server. So, it could be that some requests need authentication, so they go here and then go to the database, and it could be that some requests do not need authentication, so they can go directly to the database. And then of course, after visiting the database, most requests will actually need to come back to the web server to do some processing. So, this is very common, and we do need generalized queuing networks. So, again, what are the parameters? The first is the number of queuing stations. Then there is the external arrival rate. Why is it called external? That is as opposed to internal. There is also an arrival rate at each station, but there is nothing coming here from outside, so in fact we have a name for it: we call these the effective arrival rates, denoted by small lambda, so this is λ2, and there will be some arrival rate here also, λ3. These are not external arrivals; this one is the external arrival, and there is only one external arrival rate that we consider, which is the overall arrival rate into the entire queuing network.
Of course, we have our usual τi, the average service time per visit. Now we have to use this phrase "per visit", because we have multiple visits. So τ1, τ2, τ3 will continue to denote the amount of service that a request needs at that service station in one visit; μi as usual is just 1/τi, and ci is the number of servers at station i. Generally speaking, in the remaining lectures on queuing networks in this course we are only going to deal with examples with single servers, but actually the general Jackson queuing network theorem does apply to multiple servers. Now, vi is a very, very important new parameter for open queuing networks: the average number of visits to station i before departure. Without this we cannot really analyze or understand the queuing network and its throughput, station utilizations, response times and so on, because once we are talking about a request visiting a server multiple times, we need to know how many times it did. So, for example, since there is feedback here, server 1 is obviously going to be visited more than once on average: every request that comes from outside visits server 1 once, and then there is some probability that it comes back. Of course, after one visit itself it can leave, but there is a non-zero probability that it may come back, and then the average number of visits to S1 will actually end up being greater than 1. We need to know that number, so this is an input to the queuing network description. And then di is not a direct parameter; it is a parameter that depends on vi and τi. This is called the service demand, a very important new term we have to learn. What is service demand?
So if a request is now visiting a server multiple times, there has to be a parameter or a number which tells us the total work that a particular service station has to do to fulfill any one request. For example, if τ1 is 5 milliseconds, but a request on average makes 5 visits to server 1, then the total amount of work that server 1 has to do to fulfill one request is, on average, 25 milliseconds. So it is kind of intuitive that we need to know this number to be able to figure out what the utilization of this server is going to be, what the maximum throughput of this server can be, and so on; this is an important number. So now that we have seen the parameters of open queuing networks, as usual let us look at the metrics. This is the entire table of metrics. System throughput we have seen in the tandem queuing network, and it has the same meaning: there is an entry, the only difference is that now requests will visit the various servers of the queuing network multiple times, but at some point they will be done, and the rate at which they are completed is the system throughput, Λsys. System response time again has the same definition: time from entry, where the request starts, to exit. Suppose entry was at time t1 and exit at time t2; then Rsys is t2 minus t1, the total time required for a request to complete its multiple visits, get its service and leave the system. Throughput of station i continues to be the same: for example, the throughput of server S1 is Λ1, the throughput of server S2 is Λ2, and the throughput of server S3 is Λ3. These will be different from the system throughput; that is what is different between tandem queuing networks and these open queuing networks.
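The service demand calculation above can be written as a one-line sketch; the function name here is just illustrative, and the numbers are the ones used in the lecture:

```python
# Service demand: total work a station must do per request, on average.
# d_i = v_i * tau_i  (average visits per request x service time per visit)

def service_demand(visits, tau_ms):
    """Service demand in ms, given average visits and per-visit service time."""
    return visits * tau_ms

# Lecture example: tau_1 = 5 ms, v_1 = 5 visits per request
d1 = service_demand(visits=5, tau_ms=5)
print(d1)  # 25 ms of work at server 1 per request
```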
The system throughput will be determined by the request completion rate, while the throughput of a server (or node, or station) will be determined by how many times a request visits that server; we will see all of this. Then we have the server utilization, which has the same meaning: S1 will have some utilization ρ1, this will be ρ2, and this ρ3. Then we have something called the effective arrival rate at station i. Again, because there are multiple visits, even though there is some external arrival rate λ, the arrival rate right at stations S1, S2 and S3 will be different, and that we denote by small λ1, λ2 and λ3. It is kind of intuitive that this arrival rate is going to be nothing but the overall external arrival rate multiplied by the number of visits there. We will see examples, but a quick one: if, say, 10 requests per second are coming to the system and each request visits server S1 twice, obviously we are going to get 20 requests per second at S1. So, that is how it works. Ri, Wi, Ni, Qi have the usual meanings: R1, for example, will be the whole response time at S1, W1 will be just the waiting time, and then correspondingly we have the queue length and the number of customers at server S1, similarly for S2 and S3. So, let us start calculating these. Let us start with system throughput; as usual, throughput is always the easiest one to reason about. Again, let us first define the bottleneck throughput, and of course the server that is going to determine it is the bottleneck server. The bottleneck throughput for queuing networks with branching and feedback has to depend on the service demand; it is going to be determined by the service demand, not the service time.
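The quick example above, effective arrival rate as external rate times visits, can be sketched like this (the function name is mine, not standard notation):

```python
# Effective arrival rate at a station: lambda_i = lambda_ext * v_i

def effective_arrival_rate(lambda_ext, visits):
    """Arrival rate seen right at a station, in req/s."""
    return lambda_ext * visits

# Lecture example: 10 req/s external, each request visits S1 twice
print(effective_arrival_rate(10, 2))  # 20 req/s arriving at S1
```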
So, we first define the bottleneck service demand: take the number of visits multiplied by the per-visit service time at each station, take the maximum of that, and that is db, your bottleneck service demand. The server which has to do the most work per request, across all the visits, is going to be the bottleneck server. That is the one that determines the maximum output capacity of the queuing network. So, the bottleneck server Sb is the server with the maximum service demand db, and the bottleneck throughput is nothing but 1/db. So, if a server needs to do some 50 milliseconds of total work per request, that is, db equals 50 milliseconds, then we know that this system cannot produce more than 20 requests per second; it just cannot happen. That is what the bottleneck throughput is. Further, let us look at the high load asymptote: what happens as λ starts going to infinity? Of course, the system throughput is going to converge to the bottleneck throughput. The system cannot do more than its slowest server can; the slowest station is going to determine what the system as a whole can do. So the system throughput converges to the bottleneck throughput, but in general what we have is that the system throughput is equal to the minimum of the external arrival rate and the bottleneck throughput. This is similar to how, for a single node, we had Λ equal to the minimum of λ and cμ. The cμ there was the maximum that a single node with c servers could do; here that maximum is determined by the bottleneck service demand, and it is the reciprocal of that bottleneck service demand. Let us take an example, because there is nothing like an example to reinforce this point.
So, suppose this server S1 has a τ of 5 milliseconds, S2 is 10 milliseconds, and S3 is 4 milliseconds. Clearly μ1 is 200 requests per second, μ2 is 100 requests per second, and μ3 is 250 requests per second. Here it looks like μ2 is the slowest service rate, but remember that S2 may not be the bottleneck, because it may not get as many visits. If server 1 gets a lot of visits, then even though it is faster than server 2, it may end up being the bottleneck. Even server 3, which seems to be the fastest, will slow things down anyway if requests make many, many more visits to it. That is intuitive; it really depends on the visits. So let us take an example of what the visits are: here v1 = 5, v2 = 2, v3 = 3. Now we write the service demands: d1 is 5 multiplied by 5, 25 milliseconds; d2 is 2 multiplied by 10, 20 milliseconds; d3 is 3 multiplied by 4, 12 milliseconds. So now it is clear that even though server 1 has less work per visit than server 2, the overall work required to be done by server 1 to fulfill a request is more: 25 is the most, 20 is next, 12 is the least. So actually the bottleneck server is going to be S1, and the bottleneck throughput is going to be 1000/25, which is 40 requests per second. This is what this system can do: 40 requests per second. Just as an example, if the external arrival rate is 30 requests per second, this is less than 40, so the system throughput will also be 30 requests per second. If the arrival rate is 50 requests per second, the system throughput will converge to 40 requests per second. Now let us look at utilizations; throughput is clear. So, remember that each request results in an average of vi visits to node i.
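This whole bottleneck calculation can be sketched in a few lines, using exactly the lecture's numbers:

```python
# Bottleneck analysis for the lecture's open queuing network.
# Station order: S1, S2, S3; times in milliseconds.
tau = [5, 10, 4]   # per-visit service time (ms)
v   = [5, 2, 3]    # average visits per request

demands  = [vi * ti for vi, ti in zip(v, tau)]  # service demands d_i (ms)
d_b      = max(demands)                         # bottleneck service demand
lambda_b = 1000.0 / d_b                         # bottleneck throughput (req/s)

def system_throughput(lambda_ext):
    """Lambda_sys = min(external arrival rate, bottleneck throughput)."""
    return min(lambda_ext, lambda_b)

print(demands, d_b, lambda_b)  # [25, 20, 12] 25 40.0
print(system_throughput(30))   # 30   (stable: below the bottleneck)
print(system_throughput(50))   # 40.0 (saturated at the bottleneck)
```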
So the effective arrival rate at node i is an important quantity; it is similar in spirit to the service demand. There are two ways one can account for the additional work that a node has to do because requests now visit the node multiple times. Either you inflate the arrival rate and say that the effective arrival rate is λi = λvi, and then at each visit the node does τi amount of work; or you just keep the external arrival rate λ and say that each new arrival needs a service demand amount of work. So in the case of service demand, we inflate the work required from the server; in the case of effective arrival rates, we inflate the rate at which requests come to the server. Both achieve the same purpose. So anyway, the effective arrival rates are denoted by λi = λvi. The arrival rate here is going to be λv1, the arrival rate here is λv3, and the arrival rate here is λv2. Now, for a stable queuing network, which is where the external arrival rate is less than the bottleneck throughput (only then can the system be stable), the server utilizations are actually trivial. We can use the normal server utilization formula, ρi = λi τi, and then substitute λi with λvi, so ρi = λ vi τi. And if you do this, then you can see that we are getting vi τi in this formula, which we can rewrite as di, so ρi = λ di. As I said, there are two ways to look at it: either the arrival rate is inflated and then multiplied by the per-visit service time, or the arrival rate is the external arrival rate and the service required at the station is inflated to the service demand. So this is how utilizations are calculated.
And for the open queuing network with branching and feedback, we will actually not do this calculation for the high load asymptote; we will do all the calculations only for a stable queuing network. Let us look at an example as usual: 5, 10, 4 are the per-visit service times, but the visits are 5, 2, 3. So actually this inflates to d1 = 25, this inflates to d2 = 20, and the S3 service inflates to d3 = 12; that is really the total service required from each station by each request. Now let us look at the bottleneck server. This we had done earlier; it is the same example. So S1 is going to be the bottleneck server and the bottleneck throughput is 40 requests per second; we are carrying on the same example. For λ = 30 requests per second, which is less than 40, the network is stable. For ρ1 we can use this formula: 30 multiplied by 25 divided by 1000 (the 1000 is because 25 is in milliseconds and the rate is in requests per second), so we get 0.75. For ρ2 I take the same λ and d2, which is 20, divided by 1000; this is 0.6. Again, 30 multiplied by 12 divided by 1000 is 0.36. So these are the utilizations. Now we can look at response times. Here there is a very, very important result called Jackson's theorem that is required; actually we cannot reason about response times without this result, and because of this theorem all queuing networks of this kind are called Jackson queuing networks. So what is it that makes a queuing network a Jackson queuing network? First, branching is memoryless. What does that mean? It means that the branching is purely by probability. So there is a p1 here, there is a p2, and then the third branch has to be 1 − p1 − p2. What memoryless branching means is that every time a request exits the first server station, it is as if it does not remember how many times it has already visited it.
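The utilization numbers just computed follow from ρi = λ·di; a minimal sketch with the same figures:

```python
# Utilizations for the stable lecture example: rho_i = lambda_ext * d_i.
lambda_ext = 30.0          # external arrival rate in req/s (below the 40 req/s bottleneck)
demands_ms = [25, 20, 12]  # service demands d_i in milliseconds

# Divide by 1000 to reconcile req/s with milliseconds of demand.
rhos = [lambda_ext * d / 1000.0 for d in demands_ms]
print(rhos)  # [0.75, 0.6, 0.36]
```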
It is completely memoryless, and the probabilities p1, p2, 1 − p1 − p2 do not change. They do not change based on whether this is your fourth visit to S1, your tenth visit, or your hundredth visit: you will exit with probability p1, you will go to server S3 with probability p2, and you will go to server S2 with probability 1 − p1 − p2. This does not change, and this is very important. The other assumptions are that service times are exponential and that external arrivals are Poisson with rate λ. Why are these assumptions needed? For all the mathematical proofs to work out; as usual we are not going to do proofs in this class, I am just stating them. But they are important, because sometimes this may not be realistic. Especially consider the example where this is a web server and this is a database server. Depending on the code, sometimes it is very clear that you are making, say, 3 database calls or 2 database calls; it is a fixed number of calls. So after the first visit to server station S1, with probability 1 you will go to server S3; after the second visit to the web server is done, you will again go with probability 1 to server S3; and if exactly 2 database calls are made by this code, then after the third visit you will not go: p2 is actually 0 and even 1 − p1 − p2 is 0. So this is not something that can be captured by memoryless branching, but nonetheless many realistic systems are random, and memoryless branching may actually capture reality. So: memoryless branching, exponential service times, and Poisson external arrivals with rate λ; these are the assumptions you need to prove the result. What is the result? It says that each node i behaves like an M/M/ci queuing system with effective arrival rate λi = λvi.
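With memoryless branching, the visit counts vi are determined by the branching probabilities through the traffic equations vj = ej + Σi vi·pij. As a sketch, the S1 branch probabilities (exit 0.3, to S3 0.5, to S2 0.2) are taken from the lecture's figure, but the rest of the routing (S2 always to S3, S3 feeding back to S1) is an assumed completion of that figure, not something the lecture specifies:

```python
# Visit counts from branching probabilities via the traffic equations:
#   v_j = e_j + sum_i v_i * P[i][j]
# Routing matrix P[i][j] = probability of going from station i to station j.
# Row S1 is from the lecture's figure; rows S2 and S3 are assumptions.
P = [
    [0.0, 0.2, 0.5],  # from S1: to S2 w.p. 0.2, to S3 w.p. 0.5, exit w.p. 0.3
    [0.0, 0.0, 1.0],  # from S2: always to S3 (assumed)
    [1.0, 0.0, 0.0],  # from S3: feedback to S1 (assumed)
]
e = [1.0, 0.0, 0.0]   # all external arrivals enter at S1

# Fixed-point iteration v <- e + v P (converges since requests eventually exit).
v = e[:]
for _ in range(500):
    v = [e[j] + sum(v[i] * P[i][j] for i in range(3)) for j in range(3)]

print([round(x, 4) for x in v])  # [3.3333, 0.6667, 2.3333]
```

Note how the feedback loop inflates v1 well above 1, exactly as argued earlier: the exit rate 0.3·v1 equals the external rate, so v1 = 1/0.3.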
So here, as I said earlier, we have an effective arrival rate λ2 here, λ3 here, and here also λ1. The external arrival rate is just what comes into the overall system; because of feedback and branching, the actual arrival rate at each of these server stations will be different. So, for example, server S2 behaves like an M/M/1 system: we assume the service time is exponential, and if the arrivals were Poisson and the service time exponential, this would be an M/M/1 queue. Remember that we have to use the words "behaves like", because this is not really an M/M/1 queue; these internal arrivals are not Poisson. Even then, you can use the formulae and results for M/M/c queues for response times and so on. That is a very, very useful result, and it allows us to easily calculate the metrics of open queuing networks. The other kind of corollary of this theorem is that the queuing network's average metrics, especially the response time, are the same as those of an equivalent tandem queuing network in which the single-visit service times are equal to the service demands of the original queuing network. So what this follow-up of Jackson's theorem says is that this queuing network is actually equivalent in all these respects to a tandem queuing network: there is the external arrival rate, then one server whose service time is the service demand d1 = v1τ1 (with the v1 and τ1 of this network), then one with d2 = v2τ2, and here one with d3 = v3τ3, again with the values of this network. And if these are equivalent, then the response time can be written as the sum of the response times through this tandem queuing network, which is nothing but the summation over i of di / (1 − ρi).
So each term is just the response time of an M/M/1 queuing system with arrival rate λ and service time di. This is a very useful result. Let us look at the example as earlier: again the service times are 5, 10, 4, the visits are 5, 2, 3, so d1 is 25, d2 is 20, and d3 is 12; these are the products 5 multiplied by 5, 2 multiplied by 10, and 3 multiplied by 4. Again, we will do all of this only for a stable network. So λ = 30 is less than 40; remember the bottleneck throughput here is 40 requests per second, which comes from the maximum service demand. The utilizations we had calculated earlier: 0.75, 0.6, 0.36. So if the arrivals are Poisson, we can treat this like a tandem queuing network where the external arrival rate is 30 and the service time at each station is the same as the service demand here, and it is just a single sequential run through this queuing network. The response times will be given by d1 divided by 1 − ρ1, d2 divided by 1 − ρ2, and d3 divided by 1 − ρ3, and this is what you get. So this concludes our open queuing network topic, and in the next class we will study some examples and exercises, not only from open queuing networks but also from the other topics that we have covered so far. Thank you.
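Putting the whole lecture example together, a minimal sketch of the equivalent-tandem-network calculation, with the same numbers used throughout:

```python
# End-to-end response time of the lecture's Jackson network, via the
# equivalent tandem network: R_sys = sum_i d_i / (1 - rho_i),
# where d_i = v_i * tau_i and rho_i = lambda_ext * d_i.
tau = [5, 10, 4]    # per-visit service times (ms)
v   = [5, 2, 3]     # average visits per request
lam = 30.0          # external arrival rate (req/s), below the 40 req/s bottleneck

d   = [vi * ti for vi, ti in zip(v, tau)]           # [25, 20, 12] ms
rho = [lam * di / 1000.0 for di in d]               # [0.75, 0.6, 0.36]
R   = sum(di / (1 - ri) for di, ri in zip(d, rho))  # total response time (ms)

print(rho)
print(round(R, 2))  # 168.75 ms  (= 100 + 50 + 18.75)
```

The bottleneck term dominates: the 0.75-utilized S1 alone contributes 100 of the 168.75 milliseconds.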