Welcome to the next lecture in the course on Introduction to Computer and Network Performance Analysis using queuing systems. I am Professor Varsha Apte, a faculty member in the Department of Computer Science and Engineering at IIT Bombay, and today we will be talking about an introduction to queuing systems. A quick recall of how we ended the last lecture: we were talking about the various ways in which performance analysis of computing systems or networks can be done. One was measurement, then we talked about simulation, and we said that analytical methods, which are basically mathematical methods done with pen, paper and reasoning, are the focus of this course. They give the most insight and give you quick answers, but they require a little bit of expertise, the kind that you will learn in this course. Measurement, by contrast, is very flexible, you can create any scenario with it, and it can be more realistic, but it is costly; mathematical models may need assumptions that are sometimes not realistic. Continuing the recall, we went through a table of metrics and parameters that were specific to particular resources: CPUs, cellular channels, web server threads, wireless media. These were specific to those resources, but we found that at a fundamental level they were all very similar. The main insight was that everything seems alike, there is a jobs-per-second here and a calls-carried-per-second there, so is a generalized model not possible? That brings us to our next focus, queuing systems, which is precisely that: a universal mathematical model that represents any resource-and-user pair of the kind we discussed in the last couple of lectures, in a generic way. We will use general terminology, but it can represent everything that we showed in the previous table.
So, let us start with how one describes a queuing system: what does it consist of? I will go through some standard parameters, and the first thing I am going to talk about is something called an open queuing system. What is that? This is how we generalize the model. First of all, let us assume just one server, one resource, say one CPU or one printer. We show the resource by a circle, and we show a queue in front of it. In the terminology of queuing systems, we call the circle a server, and the queue may also be called a buffer or, sometimes, a waiting room. Then we draw an arrow into the queue to show arrivals. The most general word we use for the things that arrive is customers. So, we have customers, a server and a queue, and that is what basically describes a queuing system; we will go into more detail shortly. If there are more servers, say four CPUs or two network links, we show them by drawing multiple circles. We also show departures with an outgoing arrow: whatever comes in gets service and leaves. And if there are many servers, suppose it is a 64-core CPU, it is hard to draw them all, so we just draw a dotted line between the first and last circle and label them 1 and 64. This is the standard way an open queuing system is drawn. Now, what do we mean by open? We mean that there is this server system and there are external arrivals to it: customers come to the system from somewhere outside, and they can also depart from it.
So, requests come to the server system from outside and also leave the system; that is why it is considered open. We will of course learn soon what closed means. Now, we cannot just draw a picture and be happy; there were many things we discussed, the resources, the parameters and the metrics, so let us start with the parameters. On the server side, first there is the number: how many servers do we have, in queuing system terminology? Then there is the service time. This is a very specific term: it is the length of time for which the resource is exclusively held by the customer. You really have to remember this terminology. Service time includes only the time that the customer is actually using the server. If the servers are threads of a web server, then it is the amount of time that an HTTP request actually holds a thread; if the server is a CPU, then it is the amount of time that a thread actually executes on the CPU. The service time is described by its average, and we can also have its probability distribution; once you have the probability distribution, that is the most detailed description you can get of a service time. That is about the servers. What about the customer arrivals? It is very similar: we know that many customers are going to come, and the customer arrivals are described by an inter-arrival time distribution. Again, by distribution I mean probability distribution.
So, arrivals are described by an inter-arrival time distribution, and once you have the distribution, you also have the average inter-arrival time. The reciprocals of these averages become rates: you can describe the arrivals by an arrival rate and the service speed by a service rate. The arrival rate is 1 divided by the average inter-arrival time, and the service rate is 1 divided by the average service time. Another very important parameter is the size of the buffer. Sometimes, if it is very large, we actually assume that it is infinite. Now, there is no such thing in reality as an infinite buffer; it is just that if the buffer is so large that it is okay to assume it is infinite for the reasoning, then we consider it infinite. So, we have the number of servers, the service rate, the queue size and the arrival rate. Last but not least, you cannot analyze a queuing system unless you know the scheduling policy. That is: if on arrival all the servers are busy, then when one of the servers finishes its processing and becomes idle, which of the waiting customers is it going to pick? That is basically what a scheduling policy is. We know some standard scheduling policies: first come first serve; there is also last come first serve, which for some reasons is sometimes used; there is priority; and in your operating systems course you would have learned round robin. All of those are scheduling policies. The next picture just shows everything we talked about in a clean way. To repeat: we have the service time, which has an average and a distribution; recall that it is the time for which one customer holds one server. One divided by the average service time is the service rate.
We also have the overall service rate: if we have multiple servers, the overall service rate of the system is the number of servers multiplied by the service rate of one server. If one server goes at a certain speed, more servers can go at a faster speed overall. Then we have the waiting room, the customer inter-arrival time with its average and distribution, the arrival rate, which is 1 divided by the average inter-arrival time, and the queuing discipline. Let me give some example numbers. Say the average service time is 10 milliseconds; that is typically the time a web server request might take for some computation or other work. Then the service rate is 1/10 requests per millisecond, which is 100 requests per second. If there were two servers, the overall service rate would be 200 requests per second. The waiting room size could be, say, 200 requests in the buffer; if it were really 20000 or so, we could just call it infinite. This is actually a modeller's decision: the person reasoning about the queuing system decides whether a very large buffer shall simply be called infinite. And say the customer inter-arrival time is 1 every 20 milliseconds; then, doing the calculation, the arrival rate comes to 50 requests per second. These are the kinds of numbers you can have. I am not yet talking about the distributions; we will come to that in a moment.
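The arithmetic of this worked example can be checked with a short Python sketch (purely illustrative; the function and variable names are my own, and the numbers are the ones from the example above):

```python
# Converting average times into rates, using the example numbers above.

def rate_per_second(avg_time_ms: float) -> float:
    """Reciprocal of an average time given in milliseconds, expressed per second."""
    return 1000.0 / avg_time_ms

avg_service_time_ms = 10.0   # one request holds a server for 10 ms on average
avg_interarrival_ms = 20.0   # one arrival every 20 ms on average
num_servers = 2

service_rate = rate_per_second(avg_service_time_ms)    # 100 requests/second
overall_service_rate = num_servers * service_rate      # 200 requests/second
arrival_rate = rate_per_second(avg_interarrival_ms)    # 50 requests/second

print(service_rate, overall_service_rate, arrival_rate)  # 100.0 200.0 50.0
```

The same reciprocal relationship is what turns a measured average service time or inter-arrival time into the rates used throughout the rest of the lecture.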
Similarly, there is another formulation of queuing systems called closed. In a closed queuing system, we are not only concerned with the server system; we also want to include the sources of the requests. Where are the requests, the customers, coming from? We want to include that in our modeling, and then the system becomes closed: there are no external arrivals, because the arrivals come from the users inside the system, and there are no departures outside the system. The way we typically draw this, again with a single server: the server part is the same, with arrivals and departures, but the arrivals now come from another place which we call the client station, and after service the customers go back there. What does this represent? Think of a fixed number of students in a lab, on a fixed number of PCs, interacting with some institute server. What are these students typically doing? Say they are interacting with a website of the institute. A student clicks on a link, and the request goes to the web server. It may queue for a thread, it gets processed, and then the response, the web page, goes back to the user. Then the user does what we call think: when the student is looking at the web page, she will spend some time actually reading it, since something will be there on that page.
She will look at it, click on the next link, the request for that page will go to the server, the page will be sent back, and she will read the web page again. So, this is a request-response loop. In a closed system, the users are in a request-response loop: request, response, then reading, which in our terminology we call think, and then the next request. There is nothing coming into this system from outside and nothing going out. So, we draw a boundary and say: our entire system is this, and it is closed. A request is either in the user's head, or in the queue, or actually executing, but it does not go out, and nothing comes in. That is why these systems are called closed. So, these are the two types of queuing systems. Again, the next picture shows the same thing cleanly. The earlier parameters for the server system are again applicable for the closed system; everything just repeats here. But how are the clients described? We need some quantitative way to describe them. We have the number of clients, which is fixed, and then the think time, which again has an average and a distribution. Whenever there is a time in queuing systems, we will have an average for it and a probability distribution. So, to bring everything together, the descriptors of a queuing system are: the number of servers, the waiting room and its size, the service time distribution, the inter-arrival time distribution, the number of users (only if the system is closed; this is not required if it is open), and of course the queuing discipline. These descriptors have some standard parameter notations.
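The request-think loop of a closed system just described can also be sketched as a tiny discrete-event simulation. This is only an illustrative sketch under my own assumptions (5 clients, one FCFS server, exponentially distributed think and service times); none of these choices come from the lecture:

```python
# A minimal closed queuing system: N clients in a think -> request ->
# service -> think loop with a single FCFS server. Nothing enters or
# leaves the system; the N customers just circulate.
import heapq
import random

random.seed(1)

N = 5                # number of clients (fixed, as in a closed system)
MEAN_THINK = 2.0     # assumed average think time
MEAN_SERVICE = 0.5   # assumed average service time
SIM_END = 10_000.0   # simulated time horizon

events = []          # min-heap of (time, kind, client_id)
for c in range(N):   # every client starts out thinking
    heapq.heappush(events, (random.expovariate(1 / MEAN_THINK), "arrive", c))

queue = []           # clients waiting for the server (FCFS order)
server_busy = False
completions = 0
now = 0.0

while events and now < SIM_END:
    now, kind, c = heapq.heappop(events)
    if kind == "arrive":                 # a request reaches the server system
        if server_busy:
            queue.append(c)
        else:
            server_busy = True
            heapq.heappush(events, (now + random.expovariate(1 / MEAN_SERVICE), "done", c))
    else:                                # service completes: client thinks again
        completions += 1
        heapq.heappush(events, (now + random.expovariate(1 / MEAN_THINK), "arrive", c))
        if queue:                        # start the next waiting client, FCFS
            nxt = queue.pop(0)
            heapq.heappush(events, (now + random.expovariate(1 / MEAN_SERVICE), "done", nxt))
        else:
            server_busy = False

print("completions:", completions, "throughput:", completions / now)
```

Notice that the event loop never injects or removes customers: every completion schedules the same client's next request, which is exactly the closed-loop behavior drawn in the picture.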
The number of servers we typically denote by C, and it can take the values 1, 2, 3 and so on. The size of the buffer we denote by capital K. Note that K can be 0: sometimes there are queuing systems with no queue at all. Remember the cellular network example: when we place a cellular call and all the channels are busy, we just get a network-busy signal; we are not put in any queue. So K can go from 0 upwards, and remember that we can also assume infinity if the buffer size is very large. For the service time and inter-arrival time distributions, there are various standard distributions that people assume. There is of course the constant case: if the service time is just fixed for every request, you can assume it is constant; the formal word for it is deterministic. Then there are distributions like the exponential and the uniform, which you would have learned in your undergraduate mathematics, and general, which means it need not be any standard distribution. Then we have the symbol lambda for the average arrival rate and the symbol tau for the average service time. The reciprocal of the average service time is the average service rate, so mu = 1/tau. And then we have the system service rate, which is C times mu. Now, since we were talking about probability distributions: we are not going to go very much in depth into probability in this course; every now and then, whenever we need some specific piece of knowledge, we will revise it. Since I was talking so much about probability distributions, I thought I would just refresh them a little. What is a probability distribution? Let X be a continuous random variable. There are several words here. Random variable means a quantity that can take varied values with some randomness to it; it is not constant, it does not always take the same value.
So many things in life are random variables. And what is continuous? Continuous means the variable does not take values in a discrete or countable set. For example, the temperature in Mumbai. It is a random variable which, in an extreme case, with a lot of climate change, could range from, say, 5 degrees Celsius to a maximum of 50 degrees Celsius. Of course, the probability of a 5 degree Celsius temperature in Mumbai is almost 0, and so is that of 50. But around 30, I would say it is there half the time, 30 plus or minus. Not exactly 30, though; that is the whole point about continuous. We cannot say it is exactly 30; we can say the probability of the temperature being between, say, 28 and 35 is more than half. That is what makes it a continuous random variable: it can take any value, like 28.013 or 35.961. For a continuous random variable, we define the cumulative distribution function, the probability that the random variable takes a value less than t, and we denote it by F_X(t). Then recall that small f_X(t) is the derivative of this CDF, and it is called the probability density function. It is not exactly a probability, but it is something that goes up and down with the probability of the random variable taking a value near t. So, note that the PDF is not a probability; it is just the derivative of the CDF. Examples of probability distributions are the uniform and the exponential. Again, I just want to give an intuitive feeling and a recall. Uniform, if you remember, just means that there is no bias: it is not the case that certain values are more probable than others. The continuous uniform distribution is usually shown with two parameters, a and b.
So, when we say that X is a uniform random variable taking values between a and b, the PDF looks like this: it is 0 everywhere except between a and b, where it has a constant value, and that value is in fact 1/(b - a). You do not have to remember all of this exactly; I am just giving a feeling for and a recall of what distributions are. Then we have the exponential distribution, whose PDF f_X(t) starts high and decays as t increases; that is just what its PDF looks like. For a constant, we can also draw a CDF, but constant just means that the variable takes a fixed value with probability 1, so it is actually a discrete random variable. This was just to recall what probability distributions are. When we talk about queuing systems, we have to make some assumptions about these things; that is how we can do the reasoning. Now, queuing systems have one more way of being described, called the Kendall notation. It has six spots, 1 through 6, which you fill with certain symbols. The first spot is the inter-arrival time distribution. The distributions I just talked about are denoted by D for deterministic, M for exponential (and that is because the exponential is something called memoryless, which I am going to go into in the next lecture), or G for general, which means we are not making any assumption. The second spot denotes the service time distribution, using the same symbols. The third spot is the number of servers, the fourth is the buffer size, and the fifth is the population size, which is for closed systems only. The last spot is the discipline. So, for example, if I show you the whole notation M/G/4/50/2000/LCFS, it means the inter-arrival time is exponential and the service time distribution is general.
That is, we are not making any standard assumption for the service time. The number of servers is 4, the buffer size is 50, this is a closed system with 2000 clients, and the discipline is last come first serve. Here are some examples. Say we have a web server with 256 threads and a buffer size of 256, modeled as an open queue, and we make no assumptions about the inter-arrival time or service time distributions. Then we have G/G/256/256, and if the discipline is FCFS, we can drop it from the notation: we do not have to write the extra slashes, and if nothing is written, it means FCFS. Then we have the cellular channels. Suppose there are 20 channels in a cell, no buffer, the call duration is exponentially distributed, and we make no assumption about the inter-arrivals. That is G/M/20/0: the number of channels is 20 and the buffer size is 0. The last example is the CPU queuing system itself: 256 web server threads using a 4-core CPU. These threads are not going anywhere; they are a fixed number of threads, so this becomes a closed system, and the 256 threads are like users which sometimes use the CPU and are sometimes idle. We assume a fixed (deterministic) service time, and we have 4 cores. We do not assume a short buffer; anything that can hold 256 is going to be enough, so it is okay to assume a large buffer size. Then 256 is the population, and processor sharing is the queuing discipline; processor sharing is close to round robin, and it is what is often used for CPUs. So, with this we are done with the introduction to queuing systems, and now we will see how we get some metrics out of them.
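The Kendall notation just described can be captured as a small data structure. This is only a sketch; the class and field names are my own, and the formatter simply omits trailing "infinite" fields and the default FCFS discipline, which matches the dropping convention used in the examples above but is not a complete grammar for the notation:

```python
# A sketch of the Kendall notation A/S/C/K/B/discipline as a Python dataclass.
from dataclasses import dataclass
from math import inf

@dataclass
class KendallQueue:
    arrival: str              # inter-arrival distribution: "M", "D", "G", ...
    service: str              # service time distribution: same symbols
    servers: int              # number of servers, C
    buffer: float = inf       # buffer size K (inf when "large enough")
    population: float = inf   # population size (finite only for closed systems)
    discipline: str = "FCFS"  # dropped from the notation when it is FCFS

    def notation(self) -> str:
        parts = [self.arrival, self.service, str(self.servers)]
        if self.buffer != inf:
            parts.append(str(int(self.buffer)))
        if self.population != inf:
            parts.append(str(int(self.population)))
        if self.discipline != "FCFS":
            parts.append(self.discipline)
        return "/".join(parts)

# The web server example from the lecture: 256 threads, buffer 256, FCFS.
web = KendallQueue("G", "G", 256, buffer=256)
# The cellular example: 20 channels, no buffer, exponential call durations.
cell = KendallQueue("G", "M", 20, buffer=0)

print(web.notation(), cell.notation())   # G/G/256/256 G/M/20/0
```

Writing the examples this way makes the positional meaning of each slot explicit: distribution symbols first, then the counts, then the discipline.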
So, these are some examples of queuing systems in the Kendall notation that we just saw. But the whole point of defining a queuing system is to analyze it; we do not just want to describe the parameters and be done with it. So, what are the metrics of an open queuing system? Remember that when we talked about the parameters and metrics table, we said that we can have a general model, and the general model is not only for the parameters, it is for the metrics also. The first thing we can talk about: when a customer comes into the system, how long is the customer going to wait? Depending on the queuing discipline, the waiting time could be the time from arrival into the queuing system to the start of service; that is one way to define it. But if the scheduling discipline is preemptive (for example, round robin is a preemptive discipline, meaning that even after a request starts in the server, it may actually get preempted and put back into the queue so that another job can start), then the waiting time is defined as the total time spent in the queue, because you may arrive at a certain time, start service, and be put back into the queue again. So waiting time is really the total time spent in the queue. That is one metric. Then we have throughput, which is basically the rate at which requests complete successfully; the departure arrow shows a successful completion, and the rate at which requests complete is the throughput. This can represent the bits per second carried by a link, the calls per second carried by a cellular network, and so on. Then, for the servers, we have a very important metric called utilization, which is the average number of busy servers divided by the total number of servers.
So, on average, 50 percent or 80 percent of the servers may be busy; that is the server utilization. Then we have response time, which is waiting time plus service time: the absolute total time the request spends in the whole system. If we have a finite buffer, a request may come in, not actually get space in the buffer, and be dropped; we capture that behavior with a metric called the blocking probability. Then there are two more metrics: the queue length, that is, how many customers are to be found in the queue, and a related metric, how many customers are found in the whole system. For all of these metrics, we usually talk about the average, which is the simplest and most common statistic of the metric we are interested in. We may also talk about the variance, and the most detailed thing we may want is the distribution: say, the probability distribution of the queue length, or the CDF, the cumulative distribution function, of the response time. These are the kinds of things we may want from a queuing system. The same picture is repeated here, just to remind you that the queue length, the utilization and the throughput are system metrics, things that the system owner is interested in; remember when we talked about system and user-perceived metrics earlier. The other ones, the waiting time, the response time and the blocking probability, are metrics that the actual customer will experience and that can be measured by the user. So, that is the classification of the performance metrics that we saw. This concludes our introduction to queuing systems, the parameters and the metrics.
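As a concrete recap of these metrics, here is a tiny simulation sketch of an open single-server FCFS queue with an unbounded buffer. The distributional choices are my own for illustration (exponential inter-arrival and service times, i.e. an M/M/1 queue with arrival rate 0.5 and service rate 1.0), chosen so the estimates can be compared against the well-known M/M/1 formulas:

```python
# Estimating waiting time, response time, utilization and throughput
# for a single-server FCFS queue with an infinite buffer (M/M/1 here).
import random

random.seed(42)

LAM, MU = 0.5, 1.0        # arrival rate < service rate, so the queue is stable
N_JOBS = 200_000

clock = 0.0               # running arrival time of the current job
server_free_at = 0.0      # time at which the server next becomes idle
busy_time = 0.0           # total time the server is busy (for utilization)
total_wait = total_resp = 0.0

for _ in range(N_JOBS):
    clock += random.expovariate(LAM)      # this job's arrival time
    service = random.expovariate(MU)
    start = max(clock, server_free_at)    # FCFS: wait if the server is busy
    server_free_at = start + service
    busy_time += service
    total_wait += start - clock           # waiting time: arrival -> start
    total_resp += server_free_at - clock  # response time = waiting + service

makespan = server_free_at
print("avg waiting  :", round(total_wait / N_JOBS, 2))   # M/M/1 theory: 1.0
print("avg response :", round(total_resp / N_JOBS, 2))   # M/M/1 theory: 2.0
print("utilization  :", round(busy_time / makespan, 2))  # M/M/1 theory: 0.5
print("throughput   :", round(N_JOBS / makespan, 2))     # M/M/1 theory: 0.5
```

Note how each metric falls out of the definitions above: waiting time is arrival to start of service (FCFS is non-preemptive, so the two definitions of waiting time coincide here), response time adds the service time, utilization is the fraction of time the server is busy, and throughput is completions per unit time.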
We also mentioned one particular property called memorylessness when we talked about the exponential distribution. And like I said, there is some background from probability that we will revise as needed, and one such topic is going to be memorylessness. That will come next. Thank you.