Welcome to my lecture on experimental design in simulation. We have to remember that once the model building is over and we have a validated model to simulate with, the job isn't really done. The whole point of the experimental design phase is to recognize that we are going to analyze the results of the simulation. We're going to use the simulation to generate data and learn even more about the system than we did during the model-building phase. When we analyze the data, we'll try to extract as much information as possible from it. Of course, the information has to be in there to be found, and that's where the design comes in. The goal of the design of the experiment is to make sure the data contains as much relevant information as possible. And since simulation projects have real time and budget constraints, we'll make sure we're practical as well as ambitious.

It's always interesting to compare simulation experiments to statistical experiments in what we might call the real world, sometimes called field experiments. There we have to go out into the world and collect data, whether it involves human subjects or machine parts, and we have to collect enough of it to test the response variables we're interested in. That often requires a large amount of data, frequently data we can't get easily, maybe not at all. Simulation experiments, by contrast, allow us to manipulate and control randomness. We can control the environment, the random phenomena, so that we have exactly the same conditions at every single design point. That's something you can't do in the real world, and it makes designing experiments for simulation very interesting.

Experimenting in the real world is also expensive. Each data item collected adds expense to the study, so we try very hard to keep real-world experiments small, only just large enough. In simulation analysis, on the other hand, most of the expense goes into building the model in the first place. Once the model is built and we can use it to generate data, it's a simple thing to create many different factor levels and design points, and even to examine many different response variables, without adding much to the cost of the experiment. That's unlike the real world, where every data value adds considerable variable cost.

Once we know the objective of the simulation study, we can start to identify the variables we're interested in. Some will be control variables, independent variables, which we call factors in experimental-design terminology. These are variables whose values we set beforehand, values we're interested in studying; we're typically setting the values of independent variables rather than just measuring them. The response variable is the dependent variable, and that's the variable we're really after, the reason we're doing the study in the first place. Whatever we want to learn will have something to do with the response variable, the dependent variable: what you may have been calling the Y variable, and we'll keep calling it that.
Factors can be quantitative or qualitative. Quantitative factors naturally have numeric values, like the number of machines in a workstation or the number of tellers in a bank. Qualitative factors represent structural assumptions and are categorical. For instance, the queue discipline could be LIFO, FIFO, priority-based, shortest job first (there's a good one), and so on. Qualitative factors are typically coded; we might refer to the queue-discipline variable as having values 1, 2, and 3 for its different settings.

There are two other types of variables in experimental design that are not quite factors and certainly not responses. We worry about them less in simulation, because these are variables that can't really be controlled in real life, whereas in simulation everything is controlled. Nuisance variables affect the behavior of the system but can't be controlled directly. Intermediate variables are affected by the factors, by the independent variables, but we don't set them; we measure them. Between measuring and monitoring, in the real world we do the best we can with these variables that are not quite factors and not quite responses.

Sometimes the goal of the simulation implies that a particular experimental design must be used; sometimes it means there are several experimental designs that could achieve that goal. Some of the more commonly used designs, or better, the goals and objectives of a simulation study, are on this slide, and they're what we'll be looking at for the rest of the lecture. We might want to estimate a parameter. We might want to compare the value of the estimated parameter across different systems. We might want more than a one-way comparison, which means factorial designs. We might want to examine many alternative systems, one-way, two-way, whatever, and optimize the value of the response variable: find the best system for what we're studying. Sometimes we want the best, the second best, maybe the third best, and then we're ranking the alternative systems. Sometimes we're more interested in isolating the important factors than in the response variable alone; that's factor screening. And sometimes we want to study the entire relationship between inputs and outputs, between factors and responses, which we can do very nicely with a simulation metamodel.

Parameter estimation is one of the most common goals of any experiment, including the simulation experiment. By this point in the course you've done some simulation homework of your own and estimated performance measures, measures of effectiveness, from the results of simulation runs, so this probably speaks to you very clearly. Parameter estimation is often even taken as the generic case: when we talk about simulation output analysis, we assume we're outputting the values of a measure that will be averaged at the end of the run and used to estimate a parameter of the real-world system. We don't know the real-world system when we're simulating, much as we don't know the population when we collect a sample and use it to estimate a parameter. Simulation is a little different because we can make the model more or less complex as we wish.
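To make that concrete, here is a minimal sketch, in Python, of estimating a parameter from independent replication averages with a t-based confidence interval. The replication values are made up for illustration; nothing here comes from the lecture's slides.

```python
# A minimal sketch: estimate a performance measure from n independent
# replication averages and attach a 95% t-based confidence interval.
import numpy as np
from scipy import stats

rep_means = np.array([6.11, 6.14, 6.09, 6.13, 6.12])  # hypothetical per-run averages

n = len(rep_means)
point_estimate = rep_means.mean()
half_width = stats.t.ppf(0.975, df=n - 1) * rep_means.std(ddof=1) / np.sqrt(n)

print(f"estimate: {point_estimate:.3f} +/- {half_width:.3f}")
```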
We can also control runs by controlling the randomness, the random numbers and the random variates that we feed into the model. That makes the experiment a little more interesting, but in the end, if we're estimating a parameter, we're estimating a parameter.

Probably the second most typical experimental design in simulation is one where we're comparing alternatives. It might be alternative systems we're thinking of building, or alternative strategies, things we may be thinking of changing. If we have an automobile repair shop or an airplane maintenance center, we could be asking how many servers we need, or how much machinery we need to process the large vehicles that come in. We can try different configurations of the system before we build it, and it's certainly better to do that in a simulation than in the real world; often it isn't even possible in the real world. There are also all kinds of behavioral issues in the real world. Suppose we want to test two tellers versus five tellers in a bank, with various other inputs. Hiring people and moving them around just for the purpose of an experiment, having to train them: sometimes it's simply too much, and we're changing too much to run the experiment in the real world. In simulation, that's not a problem.

Now you might think, just as in the real world, that if I run my system with one strategy one week and with the other strategy the next week, I can't compare anyway, because I'd have a different pattern of demands coming in, and perhaps different service times. The thing is, with simulation I can control all of that, because I control the input distributions and the input variates generated from them. I'm controlling the arrivals, the customers, the entities, the cars, trucks, and airplanes that come in, and I can make sure that every time I run a new configuration, the same sequence of arrivals comes in.

Let's look at a small example. Suppose we're interested in comparing two alternative systems. In system one we arrange multiple queues, one in front of each server. In system two we have a single queue, and the individual at the front goes to the next available server. We have run 26 independent replications of each system alternative, and the response variable we're examining is waiting time: how long does an average customer have to wait in the queue before being served? So here we're looking at it from the customer's perspective; in other studies we might instead examine, say, the utilization of each server under the two arrangements. At any rate, you see the data for this example on the left, the averages from each run, and on the right the summary statistics. The sample size is 26 for both. The mean for system one is 6.12; the mean for system two, not that much different, is 6.18; and the standard deviations are 0.016 and 0.013. So the question is: are these different, or are these basically two samples from the same underlying population? The latter would mean there's really no difference in waiting time, that the population parameter is the same no matter which queuing arrangement we use.
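Before we look at the spreadsheet output on the next slide, here is a minimal sketch of the same pooled-variance t test in Python, using only the summary statistics just quoted. The function and its arguments are SciPy's; the numbers are the lecture's.

```python
# Two-sample t test with pooled (equal) variances, computed directly
# from the summary statistics of the two-queue-arrangement example.
from scipy import stats

result = stats.ttest_ind_from_stats(
    mean1=6.12, std1=0.016, nobs1=26,  # system 1: one queue per server
    mean2=6.18, std2=0.013, nobs2=26,  # system 2: single shared queue
    equal_var=True,                    # pooled-variance version of the test
)
print(result.statistic, result.pvalue)  # t is about -14.85, p-value is tiny
```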
Here are the results from MS Excel, Microsoft Excel; you could use just about any statistical software for this. It's just one example of what you might use to analyze this with a t test assuming equal but unknown variances. About halfway down you see what's called 't Stat': that's the computed value of the t statistic. If you know anything about t tests, a t statistic of -14.85 is huge. We hardly even need to look at the p-value; we know there's a large, significant difference between the two systems, even though the data didn't look very different when we first saw it. Don't forget the standard deviations were quite small. The p-value for the two-tailed test is 5.7 x 10^-20; there are a lot of zeros in there. If you're working at the 0.05 level of significance, it's way, way less than 0.05. And even if you look up the critical value from the t table with 50 degrees of freedom at alpha = 0.05, it's about 2.00. So the conclusion is that there is a significant difference between these two systems with regard to time spent waiting in queue.

A factorial design is similar to these comparison experiments, except that those are one-way, or at least we usually think of them as one-way. A factorial design is not one-way; it's a crossed study. For example, in the simple case of two control variables and one response variable, we might be comparing a single queue with multiple queues, one in front of each server. We might also be looking at traffic in the system: customers may arrive very quickly, with very small inter-arrival times, or more spread apart. Say we look at three levels of average inter-arrival time. That gives us a 3 x 2 factorial design, so six cells, and in each cell goes the result of a run of the simulation. Let's say we're looking at time in system for a customer, so each cell holds the average output from a single run. We could then also have replications in each cell. That's the essence of a factorial design, but we're going to look at the different types of factorial designs that are used in general and in simulation specifically.

There are several specific types of factorial designs studied in this field. For the most part, they try to allay the main disadvantage of a complete factorial design: the number of cells grows very quickly. You saw on the previous slide that a simple two-variable 3 x 2 design gives six cells; 3 x 3 would be nine. And what if we had another variable? What if we had five control variables? The number of runs, the sample size, grows very quickly, and that's with only one run in each cell. What if we want replications? It turns out, though, that this is more of a problem in the real world than in the simulation world, as we saw before, because the main expense of a simulation comes before we actually generate the runs. Sometimes a simulation is very complicated and takes a very long time to run, and then generating extra runs does make a difference, but for the most part this is more of a problem in the real world.
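As a sketch of how quickly the cells pile up, here is how you might enumerate the design points of a full factorial crossing in Python. The factor names and levels are hypothetical, echoing the 3 x 2 example; run_simulation is a placeholder, not a real function.

```python
# Enumerating the cells of a full factorial design with itertools.product.
from itertools import product

queue_layout = ["single queue", "multiple queues"]
mean_interarrival = [0.5, 1.0, 2.0]  # three traffic levels, in minutes

design_points = list(product(queue_layout, mean_interarrival))
print(len(design_points))  # 2 * 3 = 6 cells

for layout, iat in design_points:
    # a call like run_simulation(layout, iat) would go here,
    # once per cell, plus replications if we want them
    print(layout, iat)
```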
However, because of the goals of typical simulation projects, we do often use specific types of factorial designs, and we're going to look at 2^k factorial designs and fractional factorial designs.

First, let's look at what we might call the one-at-a-time approach. We have more than one factor, and we want to look at the contribution, the effect, of each of these factors on the response variable of interest. We could do this one at a time: take each independent variable, each control variable that we're calling X, take X1 and analyze it separately, then take X2 and do the analysis, and so on. There are some statistical issues with this, but probably the most important is that we won't be able to test for an interaction effect; we're looking at the factors separately, in terms of their main effects only. In addition, we'd need to make many, many simulation runs to get the precision we need. But the more important issue overall is this notion that we may want to test for interactions. We'll see this in a little more depth on the next slide.

Here's an example. Suppose we're looking at two control factors in a queuing type of system. One is how experienced the server is, and we have only two design points for it: experienced servers, those with five or more years of experience, versus inexperienced servers, the trainees with less than one year of experience. The other factor we're interested in is traffic: low traffic, a system that's not terribly crowded, as opposed to heavy traffic. From our study of queuing systems we know we can identify a low-traffic system as one with large inter-arrival times, where the span between successive customers entering the system is large, while heavy traffic means small inter-arrival times. The response variable here is a time, in minutes, and let's say we're looking not at time in system but at server performance: the actual service time, how long it takes the server to complete the transaction.

Now, if you look at the data and just average each factor while ignoring the other, the experienced-versus-inexperienced totals are very, very close; you won't see any difference between experienced and inexperienced servers with regard to service time. Low traffic versus heavy traffic, same thing; looking at the totals, you won't see much difference between service times in the low-traffic and heavy-traffic systems. And yet there is a difference, and the difference is all in the interaction. The experienced servers in low traffic take longer to complete their transactions; perhaps they're chatting, enjoying the social interaction with the customers. Yet in heavy traffic they speed up to move things along, averaging 5.5 minutes of service time. The inexperienced servers, on the other hand, are about where they should be in low traffic, at around 6.1 minutes, compared with the low 5.5-minute average the experienced servers manage in heavy traffic. So the inexperienced workers take 6.1 minutes on average in low traffic, but all of a sudden, when the system gets crowded, they start to panic.
They know they're supposed to be moving people through, but they're not quite sure how to do it, and they don't have the self-confidence to do it quickly, so they slow down to make sure they get it right. There's the interaction, and without testing for an interaction effect it won't be picked up.

A better approach, and one that still retains some efficiency, is the 2^k factorial design. For every control factor you have, you choose two levels, much as we did in the previous example: two levels of each X, of each control factor. You set them far enough apart that you'll be examining the effect in a realistic manner, with enough room to look at the interactions. So you choose two levels for each factor, and all together the combinations are the design points. You assume that the response is approximately linear over the range of each factor, because when you interpolate, you're relying on the data to follow your assumptions. If the actual response is not linear, then since you're looking at only two points per factor, you might be missing something important.

Here's what a 2^3 design, where k is 3, might look like; we'll also sketch it in code in a moment. This is called a design matrix. We have three factors, factor 1, factor 2, and factor 3, and we have the response. Factors 1, 2, and 3 are the X's; the response is the Y, and remember, in every statistical study it's the response we're interested in. That's what we're studying. With three factors, each studied at two points, a low and a high, signified by a minus sign and a plus sign, we end up with eight design points. The first design point has factors 1, 2, and 3 all at the low level. The second has a plus and then minus, minus: factor 1 at the high level, factors 2 and 3 at the low level. You can see that basically this enumerates all the combinations, with the eighth design point being plus, plus, plus, the high levels of all the factors. It's a very nice, simple way to set up a factorial design in which we study all the main effects and the interactions while still collecting our data efficiently.

You can probably see from this example that as k gets large, the number of design points pretty much explodes. So you might start to think: can I reduce the size of my experiment so that I have fewer design points and still get practical information out of it? When you do that, it's called a fractional factorial design; you're not running the full 2^k design. You're probably giving up the opportunity to study some of the interaction effects, but you'll still be able to study certain ones. This takes much more expertise, to reduce the size and complexity of the design of the study while staying within budget.

Sometimes we're specifically examining the factors. We not only want to know what makes our response variable best, what gives the best performance for the measure of effectiveness; we're specifically interested in which factors contribute the most information about the response variable. That's called factor screening.
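To make the design matrix concrete, here is a minimal sketch that generates the eight design points of a 2^3 design in Python and estimates one main effect by contrasting responses at the factor's high and low levels. The response values are made up; only the structure of the design comes from the slide.

```python
# Generate the 2^3 design matrix and estimate a main effect.
from itertools import product

# Eight design points: every combination of low (-1) and high (+1)
design = list(product([-1, +1], repeat=3))
print(design[0], design[-1])  # (-1, -1, -1) ... (+1, +1, +1)

# One response per design point, e.g. an average from a single run.
# These numbers are hypothetical.
y = [6.2, 6.0, 6.3, 5.9, 6.1, 5.5, 6.2, 5.4]

# Main effect of factor 1: mean response at its high level
# minus mean response at its low level (four points each).
high = [yi for (x1, _, _), yi in zip(design, y) if x1 == +1]
low = [yi for (x1, _, _), yi in zip(design, y) if x1 == -1]
effect_1 = sum(high) / len(high) - sum(low) / len(low)
print(f"estimated main effect of factor 1: {effect_1:.3f}")
```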
We might be using 2^k factorial designs, fractional factorial designs, or other types of factor-screening designs, but the ultimate goal is to determine which control factors are important and must be kept in, and which can be left out and treated as part of the random variation, the noise.

Optimization is one of the most common objectives in simulation. By optimization we mean finding the levels of the control factors that produce the best response, the best value of Y. If you imagine a regression equation, you pretty much get the idea, and you've done this in regression, I'm sure, although I have to say we're not usually imagining a linear function for simulation output. In nonlinear systems, in simulation systems, in large data settings, this may be called response surface methodology. We're not going to look at it from a practical point of view here, but if you want to go into it, at least you know what it's called and can learn more about it. We can also do this with a regression model; that's the topic of regression metamodels, which we'll look at in its own lecture. And let's not forget that we could also simply be doing what-if analysis here. It may not be the most sophisticated approach, but it's certainly doable, easy, and something everyone understands.

Just as in real life, we may want to optimize and find the best possible solution, but we may be satisfied with the best solution within our price range, within our budget. So we may not only want the optimum; we may want to rank the alternative systems with regard to the response variable of interest, so that we can look at the second best or the third best, and certainly we can satisfice. That's exactly what the design called ranking and selection of alternative systems does.

Most of the experimental designs we've looked at here assume an underlying linear model, which on closer examination may or may not be true of our simulation data. A metamodel is where we explicitly fit a model, linear or nonlinear, to represent our simulation. We know that a simulation is a transformation of inputs into outputs, right? If we can take our simulation inputs and outputs and fit an analytic model (think of a regression model, although the form might be slightly different), then we can learn a lot more about the relationships between the inputs and the outputs, about the interactions among the X's, and even among the Y's, since there may be more than one response variable. We can learn a lot just by looking at the form the equation takes. The metamodel itself, as an experimental-design technique, gets its own lecture coming up soon.

From the experimental designs studied here, and from your earlier statistics courses, you may get the impression that we always look at only a single response variable, that there's always just one Y we're studying. It might be waiting time in queue, total time in system, the utilization of the servers, or the maximum queue size. Whatever it is, it seems like we have to pick one. But we know that's not realistic; that doesn't really happen in the real world. In the real world there's always more than one thing we want to study, for one thing. These are complicated systems.
We're putting a lot of work and a lot of money into these systems, and we might as well study more than one variable for the money. So now we have to consider: can we just go along merrily doing what we've been doing, only repeatedly? I have the analysis I want to do, the design I've created for a single response, so I'll just do it over and over again: the same thing for waiting time, the same thing for queue size, the same thing for server utilization, with independent sections in our report for each analysis.

Well, the fact of the matter is that several things are wrong with this. The most obvious is the significance level of these tests. If you think you're performing each test at a particular significance level alpha, where alpha is the probability of rejecting the null hypothesis when you really should not reject it, you're actually operating at a different level, because you have to look at the entire experiment. These individual univariate tests are not independent, because they're all run on the same set of data. And if you increase the number of tests, you naturally increase the probability of a false rejection, so you become very likely to reject something. If you do many, many univariate tests, each at a probability of Type I error of, say, 0.1, the chance that you'll reject something just keeps increasing. You can compute the experiment-wise alpha rate, but that alone doesn't help you. If all you can do is compute it and report it, that's something, but it's not the end of the matter. Let's see what happens on the next slide.

If we know the alpha error we'd like to be working at, we can compute the experiment-wise error rate. What we'd really like to do is use that experiment-wise error rate, take the number of univariate tests we're doing, and work backwards, algebraically or by the Bonferroni approach, to the significance level alpha we should use for each individual univariate test; there's a small code sketch of this arithmetic at the end of this section. That's one way to do it. It still leaves something to be desired, because it doesn't look globally at the inter-correlation among the various Y variables. Now, if you've taken any course in multivariate statistics, you've seen that there are techniques developed for exactly this reason. And even if you haven't taken such a course yet, you really should, or you should try to study this on your own if you can, because that is the solution to the multiple-response problem. Just about every univariate technique we've discussed, or that you've studied in your statistics courses, has a multivariate equivalent.

Sometimes a researcher wants to avoid using multivariate techniques, which I don't recommend, and one way to do that is to say: let's take all of the responses and make one out of them. Combine all the possible responses into some sort of formula, one overall metric that represents the system; it could be some sort of utility function. You lose a lot of interesting and important information this way, but it is one way of simplifying and solving the problem. Although, as I mentioned a minute ago, the multivariate techniques created for exactly this situation give you much more information about the responses you've obtained.
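Here is that error-rate arithmetic as a minimal sketch in Python. The first formula assumes the m tests are independent, which, as noted above, they really aren't when they share the same data, so treat it as an approximation; the Bonferroni adjustment holds regardless.

```python
# Experiment-wise error rate and the Bonferroni adjustment.
m = 5          # number of univariate tests (hypothetical)
alpha = 0.05   # intended per-test significance level

# If each of m independent tests is run at level alpha, the chance
# of at least one false rejection across the experiment is:
experimentwise = 1 - (1 - alpha) ** m
print(f"experiment-wise rate: {experimentwise:.3f}")  # about 0.226

# Bonferroni: to keep the family-wise rate at or below alpha,
# run each individual test at level alpha / m instead.
per_test = alpha / m
print(f"Bonferroni per-test level: {per_test:.3f}")   # 0.010
```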
As far as is possible and practical, you want to validate the experimental design even before you generate the simulation runs that collect the data. You can do that by making assumptions: use constants and see what happens; try your design with made-up data for all of your design points. You can perhaps make one single simulation run and use that data to come up with values for testing. It's like stress-testing the experimental design: you're looking for issues that might come up when you get your actual data. One way of stress-testing a design is to use extreme numbers. Put in very, very large values; imagine you got very large metrics, large MOEs; imagine they were small; imagine some were zero. Then feed that into the experiment as if it were the data you came up with, and test. These are some of the ways we validate an experimental design.

In this lecture, we looked at experimental design as one important phase of a simulation project. We design the experiment so that the data will contain as much information as possible, and when we actually do the data analysis, that's when we extract that information from the data. We looked at several experimental designs typically used in simulation experiments, and at issues such as the variables involved, the responses and the factors, and whether we have a single response variable, the univariate case, or multiple response variables, the multivariate problem. And even though this is a very long lecture, what it mostly did is point you in directions where you can go and study further, relating this material to whatever statistics courses you may already be taking or have taken. Thank you for joining me in this lecture.