Hello and welcome to the next lecture in the course on Introduction to Computer and Network Performance Analysis using queuing systems. I am Professor Varsha Apte, a faculty member in the Department of Computer Science and Engineering at IIT Bombay. In the previous lecture we discussed the theory of Little's law and built some intuitive understanding of it. Today we are going to work through some examples, and I will also start a case study where I will show you the results of some experiments done with a real server, and how we can apply the queuing theory of open queuing systems, which is what we have been learning so far, to understand those measurement results. So let us continue. Here again, as a recap, is our standard table, given for your reference. As usual we have asymptotic values for many of these metrics. Away from the asymptotes we actually do not have values for a lot of the metrics, especially with a finite buffer; when there is a finite buffer we may only know upper bounds and so on. But for the waiting time and the number of customers in the system, what we do have is a relationship, and that relationship is Little's law, which is what we learnt in the previous lecture. To remind everybody, Little's law says n = λr, where n is the number of customers in a region, λ is the throughput, and r is the response time. If we are talking about just the buffer, then the relationship is that the number of customers in the buffer equals the throughput multiplied by the waiting time, which is the time spent in the buffer. So let us move ahead and look at some examples. This recalls example 5 from lecture 7, where we did a lot of practice examples on the utilization law, asymptotes, and so on; you can go back and revise that lecture. I have taken that example here for continuity.
The example is a single-threaded server running on a one-core CPU, with buffer size 10 plus 1 in the server, so the total number of requests that can sit at the server is 11. The request processing time is 5 milliseconds and the request inter-arrival time is 6 milliseconds, so this is a stable system, and we had discussed its various asymptotes. The only reason I am showing this now is to bring home the point that we wrote all these asymptotes before we studied Little's law; they have nothing to do with Little's law, but you will find it nice that if you apply Little's law to these numbers, they match. For λ going to 0 it is trivially applicable, which you should check yourself, but let us check the high-load asymptote. First, n = λr: n here is 11, and on the other side the throughput is 200 requests per second and the response time is 55 milliseconds. Converting milliseconds to seconds, 200 × 55/1000 = 11000/1000 = 11, which you can trivially check matches. Same thing for Q = λW: on one side we have 10, and on the other side 200 × 50/1000 = 10000/1000 = 10. It is interesting, because we had not used Little's law in deriving these asymptotes, yet you can see they do follow from Little's law as well. Now, in the same example we had also tried to write something at a point that is neither at low load nor at high load, at some kind of medium load.
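The high-load asymptote check just described can be written out as a few lines of Python, a minimal sketch using the numbers from the example:

```python
# Check the high-load asymptotes of example 5 against Little's law.
# 11 requests fit in the system (10 in buffer + 1 in server); at high
# load the throughput saturates at 200 req/s, the response time is 55 ms
# and the waiting time is 50 ms.
throughput = 200    # requests per second
response_ms = 55    # high-load response time, milliseconds
waiting_ms = 50     # high-load waiting time, milliseconds

n = throughput * response_ms / 1000   # Little's law: n = lambda * r
q = throughput * waiting_ms / 1000    # applied to the buffer: Q = lambda * W

print(n)  # 11.0 -> matches the 11 requests the system can hold
print(q)  # 10.0 -> matches the buffer size of 10
```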
Given these numbers, λ is around 167 requests per second (1000/6), and all we can say is that the throughput is going to be something less than 167 requests per second, because the buffer is finite and we have not yet covered the background needed to calculate the probability of loss. So let us suppose the throughput is given, say measured: suppose we are observing this server and we have measured the throughput to be 150 requests per second. Given this, we can write the CPU utilization: utilization equals throughput multiplied by τ (we have just one core, so there is no division by C). This is 150 × 5/1000, because 5 is in milliseconds, which is 0.75. So from here we get the utilization. Now, between queue length and waiting time, or between number in system and response time, we do not yet know how to calculate both standalone; we only know the relationship between n, r, and λ, and we need two out of those three to calculate the third. Throughput is given; suppose the queue length is also given, say Q = 8. Then we can write W: we know Q = λW, so W = Q/λ = 8/150 seconds = 8000/150 milliseconds = 800/15 milliseconds. Similarly, if the queue length is given and we also know the utilization, we can write the number in system as follows. n, which is an average (remember these are all averages), is equal to the average number in the queue, which is Q, plus the average number in the server. Note that the average in the server is not 1; given that there is one server, it is basically the utilization.
Why is that? Because with probability ρ there is 1 in the server and with probability 1 − ρ there is 0 in the server; ρ, the utilization, is the probability that the server is busy, so the average number in the server is nothing but ρ. This gives another interpretation of utilization for a single-core, single-server queue: it is the average number of customers in the server getting service. So n = Q + ρ = 8.75, and now that we have n we can get r: again n = λr, so r = n/λ = 8.75/150 seconds = 8750/150 milliseconds = 875/15 milliseconds. So this is what we can do with Little's law; we are not calculating the loss probability, since that is not something Little's law helps us with. Repeating the calculations: 8/150 is 53.3 milliseconds and 8.75/150 is 58.3 milliseconds. The response time is of course always equal to waiting time plus service time; we did not calculate it that way, but we ended up with the correct relationship: 58.3 milliseconds is 53.3 milliseconds plus the 5-millisecond service time. So that is one example of how we can apply Little's law to a straightforward queuing system. Now let us go to an interesting example. I want to give you some applications, not just artificial queues, to think about Little's law. I am going to read out this example: consider a database server and a web server. I will draw this as I read it. So suppose there is a database server; I am going to draw it as a server kind of machine. And there is a web server, a different machine.
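The medium-load calculation above can be sketched in Python, using the measured throughput, the service time, and the given queue length to recover the remaining metrics:

```python
# Medium-load example: measured throughput 150 req/s, service time 5 ms,
# given average queue length 8. Little's law fills in the rest.
lam = 150                      # measured throughput, requests per second
tau_ms = 5                     # service time in milliseconds
Q = 8                          # given average queue length

rho = lam * tau_ms / 1000      # utilization law, one core: 0.75
W_ms = Q / lam * 1000          # Q = lambda * W  ->  W = Q / lambda (~53.3 ms)
n = Q + rho                    # avg in system = queue + avg in server (= rho)
R_ms = n / lam * 1000          # n = lambda * r  ->  r = n / lambda (~58.3 ms)

print(rho, W_ms, R_ms)
# R comes out as W plus the 5 ms service time, exactly as the lecture notes.
```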
They do not have to be different machines, but let us assume they are, and that the web server calls the database server. There are web server threads here and database server threads there; each machine of course has a disk and a CPU, but we are not really operating at that level of detail. All we know is that there is a database server and a web server that calls it: there is a call and there is a response. Suppose for a certain long duration, from t to t + T (where T is large), you have access to the database server's timestamp log. On its disk, let us assume the database server writes some kind of log with timestamps t1, t2, t3. What is it a log of? It is a log of completed requests, completed queries: the web server makes a query to the database, the database sends a response, and whenever it sends a response it writes the timestamp at which it sent that response back. Suppose you have this log from t to t + T. In that same time the web server also records the response time of all the queries: the elapsed time of each query, r1, r2, r3, from t to t + T, recorded by the web server in some kind of log. So these two things we have. Can you estimate how many queries are either queued or being processed at the DB server? The duration t to t + T is given to be long, so we can assume steady state. We will also have to assume stability, and that the scheduling policy at the DB server is consistent with the assumptions of Little's law, but for the moment let us not complicate things; let us think of this as a real problem where we have this kind of multi-tier web server and database server setup.
And why are we trying to figure this out? Maybe we need to work out the memory requirement of the queries that are waiting at the database server, so that we have enough buffer space there; or maybe there is some bottleneck and that is why we need to know how many queries are waiting or being served at the database server. So as a first cut, can we estimate the number of queries there? The essential question is: if you think of the database server as the Little's law region, then we are asking for the average n inside this region. It is not very difficult. We already know the time through this region, the time between a query entering and exiting; if we assume the network delay is negligible, then the web server is recording exactly this time, so we can average it and find an average R. Now, what else do we have? If the database server is writing a timestamped entry for every completed request, we can count the number c of requests completed between t and t + T, and then the throughput is nothing but c/T. So we can find the average response time from the web server's response time log, and the throughput from the database server's completed-queries log. That is it: we have λ and we have R, so n will be λR. So yes, we can estimate it. We do have to make some assumptions: requests are not disappearing somewhere, a request does not spawn other requests; all those Little's law assumptions we are of course implicitly making.
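The log-based estimate just described can be sketched as a small function; the log formats and names here are made up for illustration:

```python
# Estimate the average number of queries at the DB server from two logs:
# db_completions  -- completion timestamps from the DB server's log
# web_response_times -- per-query response times recorded at the web server
# Both cover the same long window of length T (seconds).
def estimate_queries_at_db(db_completions, web_response_times, T):
    c = len(db_completions)          # c = number of completed queries
    lam = c / T                      # throughput = c / T
    R = sum(web_response_times) / len(web_response_times)  # average R
    return lam * R                   # Little's law: n = lambda * R

# Toy numbers: 3000 completions in a 60 s window (50 req/s) and a 40 ms
# average response time give about 50 * 0.04 = 2 queries on average.
```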
Since we are recording the elapsed time, the response time, at the web server, we are also assuming that this time is dominated by the time at the database server, and that the network delay and all other delays are negligible. But I wanted to give this example to show that in real life, when you are trying to apply queuing systems, you do have to make some reasonable assumptions. This is fine on a LAN: if the web server and the database server are on the same local area network, this is a reasonable assumption and a reasonable way to estimate how many queries are at the database server. If the two servers are not on the same LAN, if they are separated by a big network, then our model and our approach are not correct, and we would have to use some more advanced reasoning. This slide just summarizes what I already said; you can revise it offline. So let us go to the next problem. This is an interesting setup; again I am giving a slightly different scenario for Little's law so that you see how it gets applied in very different situations. Assume that emails come to my inbox at the rate of 10 per hour. I am going to draw something here: a laptop, and this is me reading my email. Let us assume that my inbox is on my laptop and emails come in at the rate of 10 per hour. An email sits in my mailbox on average for one hour before I read it and immediately delete it. So emails are basically queuing on a disk there, and the time from when an email arrives at my laptop to when I read and delete it is one hour. Now suppose the average size of an email is 200 kilobytes. Assume that my disk space is large enough that there is always space to store an arriving email, that is, the mailbox never gets full.
There is effectively an infinite buffer on the laptop's disk, so I do not have the problem of mail getting dropped or anything; we are assuming an infinite buffer. On average, how much disk space am I using to store my emails? Assume that I do nothing other than reading and deleting email all the time, and that I can do so at a rate of more than 10 per hour. This is given to establish stability: this rate is my μ, and all it says is that μ > 10, so this is a stable system. We never want to apply Little's law when the system is not stable. Now you might ask: what is the queue here? What is the server? Here I am the server, and the service I am giving is to read and then delete. My Little's law region is basically me and this laptop, and the deletion of an email is the exit: emails come in, and deleting is the exit. What I have been given is that this journey takes one hour, so r = 1 hour, and since the system is stable I know that the throughput is 10 per hour, so λ = 10 per hour. What is being asked is how much disk space I am using; obviously, what I need to find is the average number of emails queued or currently being read in my mailbox. If I find this, I can multiply it by 200 kilobytes and get the space taken. So I have r and I have λ: n = λr = 10 × 1 = 10. Now I just have to calculate how much space this takes: 10 × 200 KB = 2000 KB, which is around 2 megabytes. So again I can use Little's law to answer this question. In both these examples, I hope you realize how you have to apply Little's law somewhat innovatively.
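The email example reduces to a two-line calculation:

```python
# Email inbox as a queuing system: I am the server, deletion is the exit.
arrival_rate = 10        # emails per hour
time_in_system = 1       # hours each email spends before being deleted
email_size_kb = 200      # average email size in kilobytes

n = arrival_rate * time_in_system    # Little's law: average emails stored
disk_kb = n * email_size_kb          # average disk space used

print(n)        # 10 emails on average
print(disk_kb)  # 2000 KB, about 2 MB
```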
You have to figure out what the region is, what the journey of a request through the system is, what the server is, and where the response time is; often Little's law really works out and is very useful in situations where you might not imagine it applies. This slide again just shows that solution. The last thing I wanted to do today was to start a case study: an experimental performance measurement of a web server. So far in this course we have learnt a lot of theory, but the theory is not for its own sake; it has a very practical application, which is to understand the performance of, typically, web servers, multi-tier networked servers, networks, and other resources in computing systems (even a log file can be a resource, as we discussed in our very first lecture). This is our world, and we want to be able to understand this world of computing through queuing systems. We have had a lot of theory and a lot of toy examples, but if we really measure a web server, if we actually measure all these metrics on some server, will they be close to what we learnt? Will they match what we would have predicted theoretically? That is what this case study is about. So what is the setup? We have a server machine on which a web server runs. Modern web servers either spawn multiple processes or are multi-threaded, and the web server here runs a CPU-bound script. For the sake of a simple experiment we have kept the script simple: CPU-bound means just a tight CPU loop that does nothing other than use the CPU for a given amount of time. Now, if we want to study the behavior of this web server under some kind of load, we need load coming to it.
Since it is an experiment, we need to generate that load artificially, and you can get software which generates artificial web server requests: there are threads that behave like users, and these threads send requests to the web server. The web requests are sent, the web server sends responses, and the load generator notes the response time. Throughput is also measured at the load generator, and at the server we can use operating system utilities to measure things like CPU utilization. So we have requests sent from the client to the web server at a specified arrival rate. For each request, the web server executes a CPU-bound script and sends a small response; we do not want to load the network in this experiment, so the response is kept small. In this first experiment the execution time of the script is known: it is approximately 50 milliseconds. There is one core in the server and one thread, so it is basically a single-threaded system, a G/G/1 queue. Actually there is a limited buffer, though we do not know what it is, because in any real server there will be a maximum number of processes or a maximum number of requests that can queue at the web server; there is no such thing in real life as a truly infinite buffer. So here there is a finite buffer. The execution time of the script is fairly deterministic, so we could think of this as a G/D/1/K queue. And of course, this is all done on a LAN, so we are ignoring all the other delays. Let us see what the throughput of this system is going to be. Given that we know the service time of the system and that it has one core, we can actually draw the throughput graph: throughput versus arrival rate.
We know that 50 milliseconds is the service time, so μ is going to be 20 requests per second. This is very easy to draw: up to 20 requests per second the throughput is just a straight line equal to the arrival rate, and theoretically it flattens out at 20 requests per second. That is our theoretical prediction. Let us see what happens in the measurement. This is the measured graph: again throughput in requests per second on the y-axis and arrival rate in requests per second on the x-axis. You can see that at 5 it is 5, at 10 it is 10, at 15 it is 15; what we predicted theoretically is pretty much exactly what was measured. At 20 it is also just about 20. The orange line is the measured result and the blue line is the predicted, calculated one. After 20 there is some divergence; you will rarely see a perfectly flat line in real systems, and we will discuss the reasons why later. Today I am just showing this to you. Similarly, we can do utilization, which is also very easy to predict. Let us draw our own graph first: ρ versus arrival rate, with ρ from 0 to 1. We know that at 20 requests per second we should have 100% utilization, and until then it should be linear, with slope τ. Let us see what the actual measurement was. As you can see, the orange line again shows the measured results: at 20 it is exactly 100%. At 0 there is no measured point, but obviously it would be 0. For all the other points, you can extend the line and see that it is almost a theoretically exact straight line; all the orange points are measured, with a slight difference here and there, but otherwise they match the theoretical prediction quite well. So this is just a first glimpse of some results.
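The theoretical curves we just drew can be sketched in a few lines; note that this simple prediction ignores loss at the finite buffer, which is one reason the measurements diverge slightly past saturation:

```python
# Theoretical throughput and utilization curves for the web server
# experiment: service time 50 ms on one core, so mu = 20 req/s.
tau = 0.050                      # service time in seconds
mu = 1 / tau                     # service rate: 20 requests per second

def predicted_throughput(lam):
    return min(lam, mu)          # tracks arrival rate, then flattens at mu

def predicted_utilization(lam):
    return min(lam * tau, 1.0)   # linear with slope tau, capped at 100%

for lam in (5, 10, 15, 20, 25):
    print(lam, predicted_throughput(lam), predicted_utilization(lam))
```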
This is the response time. The individual values are not visible at this scale, but all the values at low load are actually 51 milliseconds. Remember, the low-load asymptote is supposed to be τ, the service time, which was 50 milliseconds, so this also matches quite well. Then there is some increase that happens at higher load, and this is something we cannot yet predict. So what about comparing the predicted response time with the measured one at higher arrival rates? So far, in theory, we have only learnt the relationship n = λr; we have not learnt how to calculate either r or n standalone, so we do not yet know these. That is what we will be doing in the next class: we will derive some results for M/G/1 queues, where we actually can calculate the response time, and we will also go over some properties of memoryless arrivals. Then we will come back to this case study and see whether we can compare the theoretical response time with the measured response time. Thank you.