Hello and welcome to the next lecture in the course on Introduction to Computer and Network Performance Analysis using Queuing Systems. I am Professor Varsha Apte, a faculty member in the Department of Computer Science and Engineering at IIT Bombay. So far in the course, we have done a lot of theory on open queuing systems and looked at numerical examples. In the previous lecture, we saw formulae for the response time of M/M/1 and M/D/1 queues, along with some simple numerical examples and plots. But this course is of practical value: we have to look for practical applications of the queuing formulae and the queuing laws that we are studying. The whole goal of the course is that whenever people measure or otherwise analyze systems, they should be able to apply queuing-system laws as well. In keeping with that, we are going to continue the case study we started in the lecture before the last: the experimental performance measurement of a web server, studied under an open generation of load, because we are studying open queuing systems. As a recap, which I give for reference every time, we have Little's law, our asymptotes, and some metrics for the non-asymptotic region. Recall the setup: there is a client load generator and a web server; at the server we measure utilization, and at the client we measure response time and throughput. The server has one core and one thread, and the execution time of the script at the web server was 50 milliseconds. Recall also the throughput and utilization graphs we saw in the lecture before the previous one: the match of predicted versus measured was pretty good for both throughput and utilization, diverging only a little at high loads.
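As a minimal sketch of the open-system predictions being recapped here (not the students' actual code), the throughput law and utilization law for the one-core, 50 ms setup can be written out directly; below capacity, throughput equals the arrival rate and utilization is λτ/C:

```python
# Sketch: predicted throughput and utilization for the open one-core
# web server with a 50 ms script (values from the lecture's setup).
TAU = 0.050          # service time in seconds (50 ms script)
C = 1                # one core, one thread
CAPACITY = C / TAU   # maximum throughput C*mu = 20 requests/sec

def predicted_throughput(lam):
    """In a stable open system, throughput equals the arrival rate, capped at C*mu."""
    return min(lam, CAPACITY)

def predicted_utilization(lam):
    """Utilization law: rho = lambda * tau / C, capped at 1."""
    return min(lam * TAU / C, 1.0)

for lam in [5, 10, 15, 19]:
    print(lam, predicted_throughput(lam), round(predicted_utilization(lam), 2))
```

These two straight-line predictions are what the measured throughput and utilization curves matched so well at low and moderate loads.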
We had left the previous case-study lecture wondering about response time: we had the measured graph for response time, but could we make any prediction for it? In the meantime, in the previous lecture, we have learned the formulae. So let us go ahead and assume that the web server is an M/D/1 queuing system. Whether the arrivals are really memoryless is not very clear; you can specify exponential inter-arrival times to the load generator, but it may not be able to achieve that very precisely. We assume it anyway, one reason being that this is the only formula we have, and we can then ask: if I use this formula, how far from the real measurement is its prediction? The "D" part, the deterministic part, is not bad; as we noted earlier, the execution time of the script is not expected to be variable, since it is a fixed-size loop and should take a fixed amount of time. The M/D/1 response time we derived in the previous lecture is R = tau + lambda * tau^2 / (2(1 - rho)). The graph shows the predicted R using this formula compared with the measured R, and you can see how it matches up. At low loads the match is close; as the load increases, the measured R shows a strange behavior, almost dipping a little before shooting up sharply. Still, as a trend and a visual comparison, the predicted response time does quite a good job of capturing the nonlinearity of how response time increases, and the point at which the sudden increase happens is also predicted quite well by the formula. So, considering how beautiful and simple-looking the formula is, my claim is that it still does a very good job at predicting something like response time.
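The M/D/1 prediction above can be sketched in a few lines; this is an illustrative computation for the 50 ms script, not the students' analysis code:

```python
# Sketch: M/D/1 response time for the 50 ms script,
# R = tau + lambda * tau^2 / (2 * (1 - rho)), with rho = lambda * tau.
TAU = 0.050  # deterministic service time, seconds

def md1_response_time(lam):
    rho = lam * TAU
    if rho >= 1.0:
        raise ValueError("system unstable: rho >= 1")
    return TAU + lam * TAU ** 2 / (2 * (1 - rho))

for lam in [1, 5, 10, 15, 19]:
    print(f"lambda={lam:>2}/s  R={md1_response_time(lam) * 1000:.1f} ms")
```

At lambda = 19 requests per second (rho = 0.95) the predicted R is already above half a second, which is exactly the sharp nonlinear rise the measured curve shows near capacity.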
The strict match may not be that accurate at the higher loads; in this part, for example, there is obviously a big gap. But these are not preferred regions of operation anyway. If I am operating a web server whose capacity I know is around 20, I will not want to operate it at an arrival rate of 19 per second, or even 18 or so. I will plan my server system such that each web server gets load at most at moderate levels. So the claim I am making here is that once you are operating in a region you would not want to be in anyway, all bets are off: response times will show huge differences from measurement to measurement, because it is a very unstable region of operation, and at that point the matches are typically not good. The next thing is the number of requests at the server. Now that we have response time, and we also have throughput from our measurement, we can simply calculate the number of requests we think are at the server. But then again, the question is: can we actually measure it and see how well that calculation matches the measurement? It turns out we can, with some clever measurement: we can measure the number of active processes of the web server, which are the requests actually being processed there. For the particular kind of web server in this experiment, this does correspond to the number of requests currently at the server, because the queue is actually zero and every request corresponds to one process. So if you measure the number of active processes, that is basically how many requests are currently at the server.
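The calculation being described is just Little's law, N = X * R. A minimal sketch, using illustrative (not actual measured) throughput and response-time pairs:

```python
# Sketch: Little's law, N = X * R.
# X is throughput (requests/sec), R is response time (sec);
# N is the mean number of requests at the server.
# These (X, R) pairs are hypothetical stand-ins for measured points.
measurements = [
    (10.0, 0.055),
    (18.0, 0.120),
    (19.5, 0.525),
]

n_values = [x * r for x, r in measurements]  # Little's law at each point
for (x, r), n in zip(measurements, n_values):
    print(f"X={x:5.1f}/s  R={r * 1000:6.1f} ms  ->  N={n:.2f} requests in system")
```

The calculated N at each load level is then compared directly against the measured count of active server processes.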
So this is what was done in this particular experiment. Again, the blue line shows the calculated values and the orange line the measured ones. At low values the match is amazing, and at high load there is some divergence, but it is just something like 240 versus 270, which is really not bad if you needed this number to do some sizing of, say, memory or some other requirement for that many requests at the server. Again, Little's law is so beautiful: look at this little formula, such a small and elegant-looking formula, and it matches such complicated measurements. That is one point I want to emphasize: there are a lot of assumptions, and it feels so unrealistic, but the match these formulae give is actually pretty good. Because there is so much overlap here, I have also shown this on a log scale; on the log scale the differences at low load also become apparent, but they are not so great either, and in fact this looks even better at the log scale. So again, the moral of the story, the takeaway, is that it is a pretty good match between predicted and measured. Now I want to show one more experiment in largely the same setup: there is a web server and a load generator, and the main difference is that the script is different, taking 42 milliseconds instead of 50, and the server has 4 CPUs and lots of threads, so the threads are not a bottleneck. The main queuing system we will be worried about is the cores, and as usual we ignore the other delays; everything else remains the same. This was an actual experiment, again done by a pair of students as part of a class project, and I am just showing some of those results.
Let us look at throughput first, and let us recall it is a 42-millisecond script running on a 4-core machine at the server. Initially, up to a certain point, there is a fairly clear x = y line: at arrival rate 20 the throughput is 20, at 40 it is 40, at 60 it is 60, almost exactly the theoretical x = y line we expect for throughput. Then at one point, around 70 requests per second if you look at a rough average, the throughput seems to flatten out. That means our C*mu here is 70; since C is 4, mu is 17.5 requests per second, and taking the inverse of that, our tau estimate from this graph turns out to be 57 milliseconds. I hope you realize what I am doing: I am going backwards from the throughput graph. The maximum throughput achieved is 70 requests per second; theory tells us that should be C*mu, so our C*mu is 70, and back-calculating from there, my tau turns out to be 57 milliseconds. So if I try to estimate tau from the throughput, it does not match the 42 milliseconds that well: I got 57 here, but 42 from the script. Just note this issue. Now let us look at CPU utilization. Again we have a fairly linear curve, but it flattens out at less than 100%, at around 92%, which is not according to theory: when the arrival rate is at its maximum, the CPU is supposed to go to 100%. Also, the flattening out happens at around 80 requests per second, which is also weird; we expect it to happen at C*mu, and from the throughput graph C*mu looked like it was 70.
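The back-calculation from the throughput plateau can be sketched as follows; the plateau value of 70 requests per second is read off the graph as in the lecture:

```python
# Sketch: back-calculating the service time tau from the observed
# maximum throughput, using X_max = C * mu and mu = 1 / tau.
C = 4            # number of cores
X_MAX = 70.0     # observed throughput plateau, requests/sec (from the graph)

mu = X_MAX / C   # per-core service rate: 17.5 requests/sec
tau = 1.0 / mu   # implied service time: about 0.057 s (57 ms)
print(f"mu = {mu} req/s per core, tau = {tau * 1000:.1f} ms")

# Compare with the measured script running time of 42 ms:
print(f"implied C*mu if tau were 42 ms: {C / 0.042:.0f} req/s")
```

The mismatch is immediate: a 42 ms script on 4 cores should support roughly 95 requests per second, not 70.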
And in fact, if we take 80 requests per second as the C*mu at which maximum utilization happens, then we get a tau of 50 milliseconds: 80 divided by C = 4 gives mu = 20 requests per second, and the inverse of that is tau = 50 milliseconds. So that is yet another estimate of the service time. Now let us look at some individual points on the utilization curve; measured utilization should also give us an estimate of tau. We are basically using rho = lambda * tau / C, so tau = C * rho / lambda. If you plug in one measured point you get approximately a 46.3-millisecond estimate, and another point gives a 48.28-millisecond estimate. So two different values of rho yield estimates of tau that are close enough; this is not bad, and the average is 47 milliseconds, which is closer to the running time of the script of 42 milliseconds. Note that if 42 milliseconds is the tau, then C*mu is actually not close to 80 requests per second but much greater than 80: with 50 milliseconds we have a capacity of 20 requests per second for one core and 80 for 4 cores, so with 42 milliseconds we should have much more. Now let us look at response time. The low-load estimate of response time here is around 50 milliseconds; if we use that as our tau, then mu is 20 requests per second and C*mu is 80 requests per second. But what we are seeing is that the response time actually rises sharply at 70, not 80. So it looks like 70 should be close to C*mu, and if you use that, again we are back to the 57-millisecond tau. So this is a kind of a confusing experiment. What is happening here?
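The utilization-law estimate can be sketched like this; the (lambda, rho) pairs below are illustrative stand-ins for points read off the measured curve, chosen so they land near the lecture's 46–48 ms estimates:

```python
# Sketch: estimating tau from measured utilization points via the
# utilization law rho = lambda * tau / C  =>  tau = C * rho / lambda.
C = 4

def tau_from_utilization(lam, rho):
    return C * rho / lam

# Hypothetical (arrival rate, measured utilization) points:
points = [(50.0, 0.58), (60.0, 0.72)]
estimates = [tau_from_utilization(lam, rho) for lam, rho in points]
for (lam, rho), t in zip(points, estimates):
    print(f"lambda={lam}/s rho={rho} -> tau ~ {t * 1000:.1f} ms")
print(f"average tau ~ {sum(estimates) / len(estimates) * 1000:.1f} ms")
```

Two different operating points give consistent tau estimates, which is exactly the cross-check the lecture performs.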
To recap: the throughput curve showed a C*mu of 70 requests per second, which corresponded to a tau of 57 milliseconds. The utilization curve, on the other hand, shows that the capacity after which there is a flattening out looks like around 80, corresponding to a tau of 50 milliseconds, while the actual script running time, which the people who did this experiment measured, was 42 milliseconds. So what is going on here? Actually, nothing dramatic, and this class is not where I am going to provide a very detailed explanation. Our focus here is just to study whether our queuing-theory laws are matching our predictions or not, and the outcome of this study is simply to say that there is something wrong with this experiment, something we are missing. If we were really trying to find out what is happening, we would need to understand the system better and maybe model it better. So the summary remarks from this case study: in some setups, the metric values predicted by queuing-theoretic formulae will match very closely. That was experiment one, where every chart showed an essentially perfect match, consistent with the service time and with what we would have theoretically predicted. But in some setups, the predicted versus measured numbers may not match exactly. The amount of mismatch among the 57, 50, and 47 millisecond estimates might be fine, but there is still some mystery as to why the throughput flattens out at 70 requests per second. If the tau is actually around 40 milliseconds, then mu is 25 requests per second and C*mu is 100 requests per second. So with a 42-millisecond script, our C*mu should be close to 100, maybe 90 or so; it should not be 70.
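The inconsistency the lecture is pointing out can be made concrete by computing the implied capacity C*mu = C/tau for each of the tau estimates discussed; a small sketch:

```python
# Sketch: consistency check across the tau estimates from this experiment,
# showing the capacity C*mu = C / tau that each one implies.
C = 4
estimates_ms = {
    "throughput plateau  (57 ms)": 57.0,
    "utilization plateau (50 ms)": 50.0,
    "utilization law     (47 ms)": 47.0,
    "measured script time(42 ms)": 42.0,
}
for label, tau_ms in estimates_ms.items():
    capacity = C / (tau_ms / 1000.0)
    print(f"{label} -> implied capacity {capacity:5.1f} req/s")
```

The implied capacities range from about 70 to about 95 requests per second; no single tau reconciles all three charts, which is the "mystery" the lecture leaves open.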
So what the theory is helping us do is all these calculations, and at least to detect that there is something wrong in this experiment: one chart is telling me one thing, another chart is telling me another thing, and a third chart yet another. So I should look for what the problem is, and there are a couple of leads on how to look for it. Sometimes there is really just measurement error: maybe the throughput is being measured wrongly, and everything is actually consistent, but something is going wrong in our calculations, our experimental setup, or the way we measure. The other possibility is that the model genuinely does not capture the complexities of the system. For example, there may be some other bottleneck; the CPU may not be the bottleneck, even though we had written a CPU-bound script expecting the CPU to determine the capacity of the system. Now, what do I mean by bottleneck? Bottleneck is a word used in systems when there are multiple resources: one resource could be the CPU, another could be the I/O, and a request that comes into the system may use this resource and that resource. We have so far looked at requests from the point of view of only one resource, but requests may need multiple resources, and maybe the throughput is being limited not by the CPU but by the I/O. So there could be something else going on that we need to understand about the system and model a little better, and in fact that leads us to the next topic for this class, which is going to be open queuing networks. So, thank you.