Hello, and welcome to the next lecture in the course Introduction to Computer and Network Performance Analysis using Queuing Systems. My name is Professor Varsha Apte and I am a faculty member in the Department of Computer Science and Engineering, IIT Bombay. Today we are going to talk about open queuing networks, so let me start introducing that. So far we have looked at a queuing system with one kind of server, or one set of servers, representing something like a single node, just one resource type. But what if a request needs service from multiple of what are called stations? For example, consider packets going through multiple links on a network path. Suppose this is a network: let us say these are some routers, with some connections here, all coming together at this one router. Packets coming to this router will then go on this link to this router, then to this router, then to this router, and only then, perhaps, reach a destination here. So they will go through multiple routers and links before they reach the destination. Similarly, consider the server side of, let us say, a web application. No web application today is supported by just one node. Of course we have the web server, but we also have, say, the database server, and any request that comes to the web server will very likely have to go to the database server as well before it is done; only then can it go back to the client. So that is another example. Next, look inside a server machine itself.
If you look inside a machine as such, you have the CPU, which I like to draw like this, and you can have a disk. A thread could be doing some CPU work: it will need the CPU for some time, then it might need some I/O, and only then is it done. So far we have been completely focused on a request that needs just one service and is then done; but in fact, after it is done with one service, it may need another, and that is why we need the concept of queuing networks. So from queues we go to queuing networks: after this exit, we can add another queue to which the request goes, and from there it can leave and go to yet another queue. That is what a queuing network is. As a first, very simple introduction, we will look at something called a tandem queuing network. What is a tandem queuing network? It is one where requests visit multiple nodes in sequence and then exit: a request goes to this node, then this one, then this one, and then exits. As opposed to what? As opposed to a network where there can be branching; for example, the request goes here with, say, probability 0.5 and to some other queuing system, some other server, with probability 0.5, and then maybe exits. That is called branching. Another thing that can happen, in fact in 2 of the 3 examples I gave you on the previous page, is something called feedback: after server 2, with some probability, the request needs another service at server 1. Say this probability is 0.1 and this one is 0.9. This happens when we consider, again, the web server and database server example.
With real scripts and dynamic web applications running on the server, it is rare that a request just visits the web server, then the database server, and then exits; it is actually almost never like that. Almost always, once the request has visited the web server and the database server, it will need another service at the web server, then it may need to go to the database server again, then back to the web server, and this may happen many times. Similarly, in the CPU and I/O example, it is almost never the case that a process goes to the CPU once, needs I/O once, and then exits the system. It is actually going to need many visits to the CPU, alternating: a CPU burst, then an I/O burst, then a CPU burst, then an I/O burst. This is something you would have learned in operating systems. So different types of resources are often needed by requests multiple times to fulfill their service. But as a simple example in the beginning, we are not going to consider any of these possibilities; we will just imagine a queuing network that is completely in sequence, where a request never comes back to any server, and where there is no splitting and no branching. This is just to keep things simple in the beginning as we reason through queuing networks. This kind of queuing network is called a tandem queuing network. Actually, a packet network path is a good example of a tandem queuing network. Unless it is going in a loop, a packet is rarely going to come back to a router; it really should never come back to a router, and if it does, something is wrong. Also, we can imagine that between a source and a destination there is exactly one path, so no branching happens. So a packet network path is a good example of a tandem queuing network.
So, what are the kinds of parameters and metrics that we can define for such a network? We can have n servers in the tandem queue; here we are seeing an example with 3 servers. Of course, we have the tau i. All of the tau and the other metrics that we had defined for a single node now get a subscript i, indicating that they correspond to server i. So we label the servers S1, S2, S3; tau 1 is the service time at server 1, tau 2 is the service time at server 2, tau 3 at server 3. And we have one external arrival rate. We call it external because it is external to this whole system. In a way, this arrival here is also an arrival to this queue, and there is going to be an arrival here, to server S3; but these are internal arrivals, because they come from server S1 to S2 and from S2 to S3. There is only one external arrival rate, and that is the actual work coming to this system. So the parameters are the lambda and the tau i's, and we are going to consider only infinite buffers for these queuing networks; we are not going to consider K i and so on. We could have multiple servers per station, but for this example we will consider just one server each: C1 = 1, C2 = 1 and C3 = 1. Then we have a new metric called the system throughput. Again, we need to distinguish individual throughputs, so lambda i is going to be the throughput of station i, because here also there is an exit rate: this is lambda 1, the exit rate of S1; this is lambda 2; and lambda 3 will in fact be equal to lambda s, the system throughput. So the exit rate out of the system is the system throughput, and the exit rate out of each station is the node throughput or station throughput.
Then we have the system response time, which is this whole time: the time from entering the system, the queuing network, to completely exiting it, from here to here. But we will of course also have a response time at each station or node: the response time here will be R1, this will be R2, this will be R3. Server utilization makes sense only for each node; there is no such thing as a whole-system utilization. So we have the utilization of the server at this station, rho 1, at this station rho 2, and at this station rho 3. We will have the number of customers: again, the number of customers in the whole system is not a very interesting metric, so we will have N1 for this node, N2 for this node, N3 for this node, and the same for the number of customers in the queue, and similarly for the waiting time. A more interesting metric that we ask for in a queuing network is what is called the bottleneck throughput. What is the bottleneck throughput? The bottleneck, in any kind of connected system, is the resource that slows down, or determines, the overall capacity of the system. If you have, for example, a road network, there can be a wide road that feeds into a small narrow road, which then feeds into a big road again. If that is the kind of road system you have in actual road traffic, it looks like the neck of a bottle; that is why it is called a bottleneck, and the narrow road is the one that will actually determine the throughput of cars in this road network. Hence the term bottleneck throughput. Remember that this was irrelevant when we talked about a single queue: if you are only talking about one station, then the capacity of that station determines the maximum throughput of the system.
But now that you have multiple stations, you have to ask: is S1 the slowest station, is S2 the slowest, or is S3 the one that limits my capacity? That is, which is the bottleneck server, and what is its bottleneck throughput? This is the maximum rate of requests per second that the queuing network can support. So let us start calculating, beginning with the system throughput, both its non-asymptotic behavior and its asymptote. Take an example: suppose tau 1 is 5 milliseconds, tau 2 is 20 milliseconds and tau 3 is 10 milliseconds. I will write mu i as usual for the service rate. So mu 1 for station 1 is 1000/5, that is 200 requests per second; mu 2 is 50 requests per second; and mu 3 is 100 requests per second. Now, if lambda is less than all of mu 1, mu 2 and mu 3, then it is clear that the system throughput, which we denote lambda s, is going to be lambda. For example, if lambda is 30 requests per second: 30 requests per second come here, and since this station allows 200 requests per second, 30 go through. This next station's capacity is 50 requests per second, so its output will also be 30, because the laws of throughput apply individually to each of these queuing systems: if the capacity of a queuing system is 50 requests per second and the arrival rate is 30, then it will do 30, and so it will be 30 all the way through. So if the arrival rate is less than the service rate of every station, the system throughput equals the arrival rate. And what is the asymptotic value, when lambda goes to infinity?
For high load, as lambda goes to infinity: remember that for a single-server node, as lambda goes to infinity, the throughput capital lambda goes to mu. What happens here is that the maximum throughput this system can produce is determined by the slowest node. Imagine that lambda in this example is 500 requests per second. Then the first server will be able to do 200, because that is its capacity. Now, 200 requests per second come to server S2, whose capacity is 50, so it will only be able to do 50. And 50 is less than mu 3 = 100, so the third station will also produce 50. So essentially the value that the system throughput reaches is the minimum of all the service rates; that should be obvious. As lambda tends to infinity, lambda s tends to the minimum over i of the mu i's, which is the minimum over i of 1/tau i. This is also how the bottleneck throughput is defined: the bottleneck throughput is the minimum of the mu i's, and as lambda tends to infinity, lambda s tends to the bottleneck throughput. So much for throughputs. Now we can look at the server utilizations, again both the non-asymptotic case and the asymptotes. Take the same example: tau 1 is 5 milliseconds, tau 2 is 20 milliseconds, tau 3 is 10 milliseconds, which means mu 1 is 200 requests per second, mu 2 is 50 requests per second and mu 3 is 100 requests per second. What will the utilizations look like? Again, first take the case of lambda being less than all of the mu i's. This also means that the whole network is stable, a stable queuing network (QN).
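The throughput reasoning above is easy to check mechanically. Here is a minimal sketch (my own illustration, not from the lecture slides): each station's service rate is mu_i = 1000/tau_i requests per second when tau_i is in milliseconds, and the flow entering each station is capped by the capacities of everything upstream.

```python
# Sketch: system throughput of a tandem queuing network.
# A station cannot emit requests faster than its service rate mu,
# so the flow is capped station by station as it moves downstream.

def system_throughput(lam, taus_ms):
    """Throughput (req/s) of a tandem network, external arrival rate lam (req/s)."""
    flow = lam
    for tau in taus_ms:
        mu = 1000.0 / tau       # capacity of this station in req/s
        flow = min(flow, mu)    # output rate is capped by this station's mu
    return flow

taus_ms = [5, 20, 10]           # mu = 200, 50, 100 req/s, as in the example

print(system_throughput(30, taus_ms))    # 30: below every mu, passes through
print(system_throughput(500, taus_ms))   # 50.0: capped by the bottleneck, mu 2
print(1000.0 / max(taus_ms))             # 50.0: bottleneck throughput = min mu i
```

This reproduces the two cases in the example: lambda = 30 passes through unchanged, and any overload is capped at the bottleneck rate of 50 requests per second.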
So, if the whole network is stable, that is, the arrival rate is less than the service rate of each station in the queuing network, then each node's throughput is going to be that lambda, the arrival rate, because lambda is less than each of the service rates. In that case the utilizations are trivial: rho 1 is, as usual, lambda tau 1, rho 2 is lambda tau 2 and rho 3 is lambda tau 3. The case becomes much more interesting when this is not satisfied: what if lambda is greater than one of mu 1, mu 2 or mu 3? It is best to take an example here. First take lambda equal to, say, 150 requests per second. Then capital lambda 1 is going to be 150 requests per second: if the input here is 150, the output is also 150, because 150 is less than 200. Now, 150 is greater than 50, so station 2 will actually be an unstable station and rho 2 will be equal to 1. Rho 1, of course, is 150/200, that is 0.75. What will capital lambda 2 be? Station 2 is going at full capacity, so capital lambda 2 is actually going to be 50 requests per second. Since capital lambda 2, which is the arrival rate to station S3, is now 50, station S3 is actually stable, and rho 3 will be 50/100, which is 0.5. So utilizations really depend on what the arrival rate to that particular station turns out to be. And finally, what is lambda s going to be? Lambda s is again determined by the minimum here, so lambda s is actually 50 requests per second. The final throughput was 50 requests per second, but server 1 was still utilized at 75 percent.
Remember, if 50 requests per second had been the arrival rate itself, then rho 1 would have been 50/200; but that is not the case here, and rho 1 is actually 0.75. Let us take another example. Again, this is 200 requests per second, this is 50 requests per second and this is 100 requests per second, and now let us take lambda equal to 90. Now 90 is less than 200, so rho 1 is going to be 90/200; but then 90 arrives here, so again rho 2 will be 1, the exit rate out of here will again be 50, and rho 3 will again be 50/100 = 0.5. You can see that because there is a slower server before this server, the utilization of station 3 will simply never reach 1. No matter what happens, even as lambda tends to infinity, the output of the previous server goes to 50, and therefore the utilization asymptote of this server will always be 0.5. This is a very interesting observation, and we need a general rule to express it. Consider a station i and the high-load asymptote of its server utilization, as lambda tends to infinity: the arrival rate to station i will end up being the minimum of the maximum rates of all the previous stations, that is, the minimum of mu 1, mu 2, mu 3, up to mu (i-1). As lambda tends to infinity, the minimum of the capacities of the previous stations ends up being the arrival rate to station i. Just as here: as lambda tended to infinity, mu 1 was 200 and mu 2 was 50, so station S3 would never get more than 50, while station S2 could get 200. We will try to generalize this later. But let us look at one more set of metrics: the response time and the queue length. The response time is as we saw in the previous lectures.
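The rule just stated, that the arrival rate reaching station i is capped by the capacities of all stations before it, can be sketched in a few lines. This is my own illustration (the names station_metrics and taus_ms are not from the lecture); it propagates the flow through the tandem and caps each utilization at 1.

```python
# Sketch: effective arrival rate and utilization at each station of a
# tandem queue. The rate reaching station i is min(lambda, mu_1, ..., mu_(i-1)).

def station_metrics(lam, taus_ms):
    """Return (effective arrival rate, utilization) per station; lam in req/s."""
    out = []
    flow = lam                     # rate actually reaching the current station
    for tau in taus_ms:
        mu = 1000.0 / tau          # station capacity in req/s
        rho = min(flow / mu, 1.0)  # utilization saturates at 1 if unstable
        out.append((flow, rho))
        flow = min(flow, mu)       # downstream stations see at most mu
    return out

taus_ms = [5, 20, 10]              # mu = 200, 50, 100 req/s
for i, (arr, rho) in enumerate(station_metrics(150, taus_ms), start=1):
    print(f"S{i}: arrival rate {arr:g} req/s, rho = {rho:g}")
```

For lambda = 150 this reproduces the example above: rho 1 = 0.75, rho 2 = 1 (unstable), and rho 3 = 0.5, because S3 receives only the 50 requests per second that S2 can emit.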
First of all, we will only define it for a stable queuing network, and we have to assume Poisson arrivals. Remember that the throughputs and utilizations did not require us to assume Poisson arrivals. We will also assume exponentially distributed service times, because otherwise some of the rules here do not hold: if a station is a general M/G/1 queue, its departure process will not be Poisson, so the arrivals to the next station will not be Poisson; only if the station is M/M/1 will the arrivals to the next station be Poisson. So we will assume that the service times are exponential, so that we can write a formula. And if that is so, remember that formula: R1 will be equal to tau 1 divided by (1 minus rho 1), and similarly R2 will be tau 2 divided by (1 minus rho 2), R3 will be tau 3 divided by (1 minus rho 3), and R s, the system response time, will just be the sum of all of these. Once we have these, then of course, by Little's law, N1 will be lambda R1, N2 will be lambda R2 and N3 will be lambda R3, and we can do the same with waiting times and queue lengths. So let us summarize all the metrics; these are the metrics we were after. For a stable network, which means that lambda multiplied by the maximum of the tau i's is less than 1, that is, the arrival rate is less than the service rate of the slowest station in the network, the system throughput is nothing but the arrival rate. And the bottleneck throughput, which is defined purely by the queuing system, nothing to do with the arrival rate as such, is 1 over the maximum over i of tau i; the bottleneck server is simply the slowest server. The utilization is simply lambda tau i, because it is a stable system. And if we can assume exponential service times and a single server at each station, then the node response times are given by this formula, the system response time is just the sum of them all, and we get the rest by Little's law. The asymptotes are a little more interesting.
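The formulas just summarized can be put together in a short sketch for the stable case, under the stated assumptions (Poisson arrivals, exponential service, single server per station). The function name tandem_response is my own; it computes R_i = tau_i / (1 - rho_i), the system response time as the sum of the R_i, and N_i = lambda R_i by Little's law.

```python
# Sketch: response times and mean customer counts for a STABLE tandem of
# M/M/1 stations. rho_i = lambda * tau_i; R_i = tau_i / (1 - rho_i).

def tandem_response(lam, taus_ms):
    """Per-station (R_i in ms, N_i customers); lam in req/s, taus in ms."""
    stations = []
    for tau in taus_ms:
        rho = lam * tau / 1000.0           # utilization (tau converted to s)
        assert rho < 1, "M/M/1 response-time formula needs a stable station"
        R = tau / (1 - rho)                # response time in ms
        N = lam * R / 1000.0               # Little's law, N = lambda * R
        stations.append((R, N))
    return stations

lam = 30                                   # req/s, below every service rate
per_station = tandem_response(lam, [5, 20, 10])
R_system = sum(R for R, _ in per_station)  # system response time = R1 + R2 + R3
print(f"R_system = {R_system:.2f} ms")
```

For lambda = 30 and the 5/20/10 ms example, this gives R2 = 20/(1 - 0.6) = 50 ms at the slowest station and a system response time of about 70 ms.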
As I said earlier, as lambda tends to infinity, the throughput will of course be the minimum of all the service rates. As for server utilization, we know that the utilization of the first server must go to 1 as lambda goes to infinity, because eventually lambda will exceed mu 1. But for the downstream servers, the next servers, their utilizations are determined by what comes out of the previous server. So each utilization is, of course, capped at 1.0, and otherwise it is the minimum of the upstream service rates, which becomes the effective arrival rate, multiplied by tau i. We have already done a few examples, but I will show a couple more. Lambda equal to 30 we in fact did, so that just reinforces what we did with the throughput and the service rates. Now we have a slightly different example: tau 1 is 10 milliseconds, tau 2 is 20, tau 3 is 5, so the service rates are 100, 50 and 200 requests per second respectively; just a small change so that we can see one more example. The throughput at lambda equal to 30: it is less than all three service rates, so the throughput is 30. The server utilizations will be rho 1 = 30 × 10/1000, rho 2 = 30 × 20/1000, rho 3 = 30 × 5/1000. And the response times, I will just show it for R1: it is going to be 10 divided by (1 minus 0.3), and similarly for R2 and R3. Now let us see all these metrics for lambda equal to 70. Remember, it is 100 requests per second for S1, 50 requests per second for S2 and 200 requests per second for S3; these are the service rates. So compare 70 with 100, and 70 with 50. If you compare 70 with 100, 70 is less.
So the throughput of this server is going to be 70; but 70 comes to server S2, whose service rate is 50 requests per second, so its output is only going to be 50, and then 50 requests per second come to S3, which can do 50 requests per second. So the output here is going to be 50, and that is why, of course, the overall throughput is 50. But each server's utilization depends on its own service rate: rho 1 is going to be 70/100, which is 0.7; rho 2 is going to be 1, because 70 is more than 50, so that server is fully utilized; and rho 3 is going to be 50/200, which is 0.25. Those are the server utilizations. Now what about the response times? The system response time, we know, is going to be infinity, because S2 is now an unstable server. But server 1's utilization is less than 1, so server 1 will have a finite response time: R1 is going to be tau 1, which is 10 milliseconds, divided by (1 minus 0.7), which is 33.3 milliseconds. R2 is of course infinity, as it is an unstable server. Server 3 is stable, so R3 is 5 divided by (1 minus 0.25), which is about 6.66 milliseconds. So now we have seen a small lambda, 30, which is less than the entire tandem queuing network's capacity, and 70, which is more than some servers' capacities and less than others'. Now let us take lambda to infinity. Of course the bottleneck throughput remains 50 requests per second, because that is the capacity of the slowest server here, so the throughput remains 50 requests per second. Now let us go station by station as lambda goes to infinity. Of course rho 1 will go to 1, because lambda will eventually exceed 100. And when the lambda here exceeds 100, the output here will become 100; but 100 is greater than 50, so server 2's utilization will also be 1.
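The whole lambda = 70 calculation can be combined into one sketch that allows unstable stations (rho capped at 1, response time infinite). This is my own illustration; the function name tandem_report is not from the lecture.

```python
import math

# Sketch: utilization and response time per station of a tandem queue,
# allowing unstable stations. The flow is capped station by station, and
# an unstable station (rho = 1) has an infinite M/M/1 response time.

def tandem_report(lam, taus_ms):
    """Per-station (rho_i, R_i in ms); R_i is math.inf if the station is unstable."""
    flow = lam
    report = []
    for tau in taus_ms:
        mu = 1000.0 / tau
        rho = min(flow / mu, 1.0)
        R = tau / (1 - rho) if rho < 1 else math.inf   # M/M/1 response time
        report.append((rho, R))
        flow = min(flow, mu)       # arrival rate seen by the next station
    return report

for i, (rho, R) in enumerate(tandem_report(70, [10, 20, 5]), start=1):
    print(f"S{i}: rho = {rho:.2f}, R = {R:.2f} ms")
```

This reproduces the numbers above: rho 1 = 0.7 with R1 about 33.3 ms, rho 2 = 1 with an infinite R2, and rho 3 = 0.25 with R3 about 6.67 ms.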
But when the arrival rate to server 2 is 100, its output rate will go to 50; its throughput will be 50. Since 50 is less than 200, we will again have rho 3 here as 50/200, which is 0.25. Now there are two servers that are unstable: both S1 and S2 are unstable. So R s is infinity, R1 is infinity, R2 is infinity; R3 is the same as before, because S2 again sends the same arrival rate to S3 as when lambda was 70: 5 divided by (1 minus 0.25), which is 6.66 milliseconds. This last slide just summarizes everything we derived; you can look at it later. So in this lecture, what we saw was only the kind of structure where requests go from one server to the next, then to the next, and then leave the system. But sometimes you can have requests that have to return after a service; for example, after server 2 they have to return to server 1. So in the next lecture we will look at generalized open queuing networks. Thank you.