Welcome to this lecture on statistical considerations in simulation experiments. The ideas we look at in this lecture are things we have seen before; just about everything you learn here will have appeared earlier in this course, perhaps in a slightly more superficial way or in different locations. Here I try to pull things together and to look at some of them in a little more depth as we go along. The important point is that when you talk about simulation, you are talking about experimentation. A simulation is a run of a particular model that represents reality, and one run doesn't do it. We're running the model for a reason, and that reason includes building an experiment around the simulation model. If you have only the modeling aspect, the simulation is incomplete; if you have only the experimentation aspect, it's incomplete too, because attention has to be given to the model as well. When it comes to the empirical, experimental aspect of a simulation project, what kinds of things are we talking about? Some of them affect every single part of the simulation study; those are what we call strategic considerations. Others are decisions that have to be made as you go along on different aspects of the study; those are tactical. The strategic considerations will of course affect the tactical decisions that have to be made later on. The strategic considerations are the ones you want to take care of at the beginning of your simulation study, before you've done a lot of other things that you might have to change depending on these decisions. You want to examine and build the simulation model early. There's really no special ordering among your decisions about the simulation model, the simulation type, the experimental design, and even the choice of the response variable.
Each time you make one of these overarching decisions, you may have to go back to the others and reconsider. We've looked at simulation model building previously. There are a lot of aspects to constructing the simulation model: there's the conceptual model, and then there's the model implementation. All of that sits in the context of the larger simulation study involving the statistical aspects, which is what we're looking at here. You could think of the simulation model as its own activity. By the time you finish creating a model, implementing it, and validating it to make sure it matches the world as you know it, many people would say that's good enough: you've gotten a lot out of the simulation, you understand the system, and you can move on. If you don't wish to experiment with the simulation model, that's fine. But most of the time you do want to take the simulation model itself and work with it once it's built and validated, and those are the types of simulations we're looking at here. We have also looked at simulation type previously. The slide you're looking at now is an overall summary of the kinds of things we examined in the lecture on simulation type. Basically, your simulation can be of two sorts: a terminating simulation or a steady-state simulation. A steady-state simulation is non-terminating, which brings its own issues. With a terminating simulation, there is a particular event known to end the simulation, so there's no question of how long to run it; you know when it will end. Often you also know how it will begin, and you have to set initial conditions at the start of the simulation. With steady-state simulations, which are not inherently terminating, you have to decide externally how long to run the simulation.
And when you're thinking in terms of steady state, and for the most part you are, you're looking at the results of the simulation and waiting until it reaches a state of dynamic equilibrium, at which point you can start collecting the output measures. One of the most important strategic considerations here, one we haven't examined elsewhere in this course, is: what are we studying? One of the most important decisions is to come up with the objective of the simulation, and part of that is to determine the output variable we're interested in studying. It's usually going to be a measure of success or a measure of effectiveness of the system we're studying. The examples we've seen are pretty much averages, like the average queue size. But it doesn't have to be an average. It can be a median, a probability distribution, or the top 20% of the longest waits, if you're talking about waiting time in a queue. In fact, as researchers have discussed, you can look at two system configurations that are the same with regard to means but whose medians are very, very different, and the median might be more important. The entire distribution might be different. When we're dealing with means, especially with the material ingrained from our statistics courses (and this is a statistical experiment), we think in terms of the central limit theorem, where everything somehow magically reduces to a normal distribution. But most of the systems we end up studying in simulation are very far from normal; in fact, they're not even symmetric. Just think of an exponential distribution, and you'll see what I mean. So we should really think more carefully about the possibilities for the response variable, the variable that we're studying. And as we do that, why are we thinking of only one? There is no reason in the world that we can't examine multiple response variables.
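To make this concrete, here is a small sketch (the distribution and its mean are hypothetical stand-ins for simulation output) showing how the mean, the median, and an upper-tail measure can tell very different stories for skewed output such as exponential waiting times:

```python
import random
import statistics

random.seed(42)

# Hypothetical waiting times from an exponential distribution with
# mean 5 minutes -- a stand-in for output from a simulation run.
waits = [random.expovariate(1 / 5.0) for _ in range(10_000)]

mean_wait = statistics.mean(waits)
median_wait = statistics.median(waits)

# An alternative response variable: the average of the top 20% longest waits.
waits_sorted = sorted(waits)
tail_mean = statistics.mean(waits_sorted[int(0.8 * len(waits_sorted)):])

# For an exponential distribution the median (mean * ln 2) sits well
# below the mean, and the upper tail is far above both.
print(mean_wait, median_wait, tail_mean)
```

The three numbers differ substantially, which is exactly why the choice of response variable is a strategic decision and not an afterthought.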
In other words, multiple Ys. Remember that the simulation is an input-output transform: an algorithm that takes inputs and produces outputs. You don't need only one Y variable as output; we could have several. If we do that, recall from your other courses that you'll have to use multivariate, multiple-response techniques in your statistical analysis. This brings us to the topic of experimental design. Do we have one response variable or many? Do we have one X? Probably not; do we have many? And if so, what kind of experimental design are we modeling here? That will help us understand what kind of output we need from the simulation in order to do the analysis. Experimental design is such a broad topic that we're going to consider it in its own lecture later in the course. Now we'll move on to examine some of the more tactical considerations in a simulation study. These are the ones that are often affected by the decisions we made among the strategic considerations, and this is only a small list; there are potentially many more tactical considerations you can think of. For example, we need to make decisions about the initial conditions of the simulation run, and we need to think about initialization bias. How are we going to replicate the results of the simulation, and what methods of replication will we use? What are we going to do about the length of the run? That's a decision we have to make if we've decided on a steady-state, non-terminating simulation. We need to decide on sample size, which goes back to methods of replication: how are we going to collect the data, and what are we considering an item in the sample? And finally, of the ones we're considering here: is there a way to reduce the variance of the sample? Can we reduce the spread of the data in order to make our statistical techniques more efficient?
And we're going to study that topic in another lecture. As you can see, we're pulling together, here in this lecture on statistical considerations in simulation, some of the material that we studied in more depth earlier. Here again we have a summary slide of issues relating to initial conditions and initialization bias. When we build and run a simulation, we have to take into consideration how that run starts. What time does it start? If it's a terminating simulation, we know when it ends, and perhaps we know when it starts. Is there a condition that starts the simulation? If so, what are the values of the state variables at that point? Sometimes this falls very naturally into place, especially if we're simulating a terminating system with a terminating simulation. If it's a retail establishment, we can collect data on the number of customers waiting at the door at opening time, say 9 o'clock. How many customers are typically waiting there? It will be a distribution, and we can sample from that distribution; in any case, we can depend on the real system for this data in the simulation. In a non-terminating simulation, on the other hand, we generally want to run the simulation until we get into a particular state: a steady state, also called dynamic equilibrium. So even though the initial conditions are certainly important, we could start with the empty-and-idle state: time is zero, the system is empty, and nothing happens until an event boots it up, such as a customer coming in or a machine arriving to be repaired. You can reduce that initialization bias by using more typical starting conditions, though you have to know what they are first, obviously. All of that comes into play in a non-terminating simulation. The size of the initialization bias will determine how long we have to run the simulation until we can be reasonably sure it is in a state of dynamic equilibrium.
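As a minimal sketch of initialization bias, assume an M/M/1 queue simulated with Lindley's recursion, W[n+1] = max(0, W[n] + S[n] - A[n+1]); the rates and the deletion point below are illustrative, not prescriptive:

```python
import random

random.seed(1)

# Empty-and-idle start: the first customers wait very little, which
# biases the early observations downward.
lam, mu = 0.9, 1.0        # arrival rate, service rate (illustrative)
n_obs = 200_000
w, waits = 0.0, []
for _ in range(n_obs):
    waits.append(w)
    # Lindley's recursion: next wait = current wait + service - interarrival
    w = max(0.0, w + random.expovariate(mu) - random.expovariate(lam))

warmup = 10_000           # deletion point -- chosen by eye in practice
biased = sum(waits) / len(waits)
truncated = sum(waits[warmup:]) / len(waits[warmup:])

# The biased estimate includes the atypical empty-and-idle start, so it
# tends to fall below the truncated (post-warm-up) estimate.
print(biased, truncated)
```

Deleting the warm-up observations is exactly the "throw away the early statistics, keep the state" idea: the run continues, but the sample starts only once the process looks stationary.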
And certainly one way of doing that is to do one long run first and eyeball it: just graph the output, as we have here, and get an idea of where the warm-up ends. That's a good place to stop the simulation, throw away the statistics collected so far, keep the state of the system as it is, and continue from there. Now let's start looking at the different ways to collect a sample of data when we're doing a simulation study. Often we need more than one simulation run, and one way of doing that is the method of independent replications, which is very much like any statistical experiment using empirical data. We have to collect a sample of a particular size, and we want the data values to be IID: independent and identically distributed. If we have already decided on a terminating simulation, this happens naturally; our method of replication will be independent replications, and we'll do as many runs of the terminating simulation as necessary to reach our sample size. This is a nice technique, and it matches everything we've already studied about statistical experiments. With a terminating simulation we don't need to worry about the length of the simulation run. Even if we decide to do independent replications of a non-terminating simulation, we do have to decide on the length of the run, but we do that once and then collect as many independent replications as we choose. That's a little less efficient, obviously, because a lot of computer time and power is invested in it, but with the cost of computer time and storage going down, it's certainly doable. The batch means method of collecting a sample is based on one long simulation run, which is partitioned into a series of non-overlapping sequences, batches of equal size, and then a mean is computed of the values in each batch.
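The method of independent replications for a terminating simulation can be sketched as follows, assuming a single-queue system that closes after an 8-hour day (all parameters are hypothetical):

```python
import random
import statistics

random.seed(7)

def one_replication(horizon=480.0, lam=1.0, mu=1.2):
    """One terminating run: the day ends at the 480-minute closing event.
    Customer waits follow Lindley's recursion; returns the day's average
    wait -- a single observation for our IID sample."""
    t = random.expovariate(lam)               # first arrival
    w, waits = 0.0, []
    while t <= horizon:
        waits.append(w)
        s = random.expovariate(mu)            # service time
        a = random.expovariate(lam)           # gap to the next arrival
        w = max(0.0, w + s - a)               # next customer's wait
        t += a
    return statistics.mean(waits) if waits else 0.0

# Each replication is a fresh, independent run, so the observations are IID.
reps = [one_replication() for _ in range(30)]
xbar = statistics.mean(reps)
half = 2.045 * statistics.stdev(reps) / 30 ** 0.5   # t(0.975, 29) = 2.045
print(f"mean daily wait: {xbar:.2f} +/- {half:.2f} minutes")
```

Because each run restarts the random-number stream and the system state, the standard IID confidence-interval machinery applies directly.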
So now, instead of a long run of values as they're generated, as we saw before, we have a sequence of batch means; the idea is to cut down on the autocorrelation in the data. Of course, there are considerations about the size of the batch, and depending on that size, the correlation between adjacent batch means will go up or down. We want to make sure that the autocorrelation is as small as possible, and we're going to test for independence, as you can see from the algorithm on the right. But we also want the method to be relatively efficient: we want to collect the smallest amount of data possible for the biggest bang for the buck, as always. Here's a method in which the considerations of run length (how long the simulation run will be) and sample size (how much data we're collecting, what n is) come together in a very important way. Each decision is made separately, and yet each one affects the other in this method of replication. A lot of it is an art; you'll find a lot of material published on this in the scholarly literature and in textbooks. We do the best we can. One way to do it is iteratively, in other words sequentially: select the batch size, compute the batch means, and then test them to see whether they are correlated. Basically, we're trying to claim that we have independent and identically distributed data; if there's autocorrelation, we don't. So if the test for independence fails, try another value for the batch size and do it again. The nice thing about data that we generate with the computer, as in simulation studies, is that we can do this repeatedly and work towards a solution.
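The select-batch-size, compute, test loop just described might be sketched like this, using an AR(1) process as a self-contained stand-in for autocorrelated simulation output; the 2/sqrt(k) cutoff is one common rule of thumb for an independence check, not the only choice:

```python
import random

random.seed(3)

# Stand-in for one long run of autocorrelated output: an AR(1) process.
n = 100_000
x, series = 0.0, []
for _ in range(n):
    x = 0.9 * x + random.gauss(0.0, 1.0)
    series.append(x)

def lag1_autocorr(values):
    m = sum(values) / len(values)
    num = sum((a - m) * (b - m) for a, b in zip(values, values[1:]))
    return num / sum((v - m) ** 2 for v in values)

def batch_means(data, size):
    k = len(data) // size
    return [sum(data[i * size:(i + 1) * size]) / size for i in range(k)]

# Iterate: if the batch means still look correlated, double the batch
# size and test again.
size = 100
while True:
    means = batch_means(series, size)
    k = len(means)
    if k < 10 or abs(lag1_autocorr(means)) < 2 / k ** 0.5:
        break            # batch means pass the independence check
    size *= 2

print(size, k, lag1_autocorr(means))
```

Note the trade-off the lecture describes: larger batches reduce the correlation between adjacent batch means but leave fewer of them, so the independence test itself becomes noisier.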
One further refinement that has been advised: not only do we form sequential, non-overlapping batches, but we also make sure they're not completely consecutive. You eliminate some data values between the batches, and that cuts down on autocorrelation even further. If you have worked with time series data before, you've probably been listening to this lecture and thinking: why in the world are we bothering to make believe we have independent replications, when we have methods for analyzing time series data directly? And that's exactly one of the possibilities. Let's not try to squeeze time series data into some other mold. We not only take into account, we embrace the fact that there's autocorrelation in the data: we make use of it and apply time series methods to do our estimation or hypothesis testing, however we've decided to run the simulation; the same design considerations still apply. Of course, the calculations are considerable, and the expertise required may be different from what the simulation researcher has; it might involve going to a consultant who's an expert in the area. But it certainly makes sense to take the inherent autocorrelation in data like this and analyze it by making use of it. In general, the run length and sample size issues go together. The conflict is very much like what you've studied in your introductory statistics course. Suppose we are estimating a parameter and we're going to construct a confidence interval estimator. We want to achieve a particular level of precision for this interval, and yet we don't want to generate so much data that the cost of the experimentation becomes prohibitively high. That's a very common trade-off that we see all the time in statistical experiments.
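One simple way to embrace the autocorrelation rather than fight it is to estimate the variance of the sample mean directly from the autocovariances of the run. This sketch again uses an AR(1) stand-in for correlated simulation output (the truncation lag K is an illustrative choice):

```python
import random

random.seed(5)

# Autocorrelated output: an AR(1) stand-in for one long simulation run.
n = 20_000
x, series = 0.0, []
for _ in range(n):
    x = 0.8 * x + random.gauss(0.0, 1.0)
    series.append(x)

mean = sum(series) / n

def autocov(lag):
    # Sample autocovariance at the given lag.
    return sum((series[i] - mean) * (series[i + lag] - mean)
               for i in range(n - lag)) / n

# Var(xbar) ~ (gamma0 + 2 * sum_{k=1..K} gamma_k) / n, truncated at lag K.
K = 50
var_mean = (autocov(0) + 2 * sum(autocov(k) for k in range(1, K + 1))) / n
naive = autocov(0) / n   # what the IID formula would (wrongly) claim

print(var_mean, naive)
```

For positively autocorrelated output the naive IID formula badly understates the variance of the mean, which is exactly why pretending the observations are independent is dangerous.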
With regard to simulation, though, we don't begin to collect our sample until we consider the notion of run length. For terminating simulations, run length is organic: there is a terminating event that ends the simulation. This may also involve decisions about the beginning of the run, which we've discussed previously. For non-terminating simulations, the run length will depend on things we've discussed already; interestingly, many of these are the other tactical statistical considerations threaded through this lecture. We may be looking at the size of the initialization bias: a large initialization bias will increase the length of the run, and the time it takes to achieve steady state is related to that. The size of the batch, if we're using the batch means method; how much thinning; the size of the autocorrelation: all the other things we've talked about go into determining the run length. It's difficult or impossible to order these considerations into an algorithm and say "first do this, then do that," because they affect each other in many different ways. Whether we're doing a terminating simulation or a steady-state, non-terminating simulation, sample size is going to be an issue either way. It may come up as: how many replications of the entire simulation will we run? How many batches will we collect? What is the size of each batch? Sample size is going to be a question no matter what. Why? Because this is a statistical experiment, and sample size always matters. We can determine sample size in a fixed manner or a sequential manner. The nice thing about simulation is that we have the opportunity to do it sequentially; it's easier with simulation than with, say, a field experiment. With a fixed method, we assume a known variance, and there can be good reasons for doing that.
We may have some previous research, or we may be simulating a system that's similar to another system that does have a known variance, and so on. So we have a variance we're willing to assume, and we have the precision we want: the width of the confidence interval. We use those to determine the size of the sample, very much as we've done in other courses. A sequential method is one where we don't know the variance. We treat the sample size n as a random variable, and we keep generating data and examining the measures that come out of it until we decide we're close enough to the sample size we would like. How do we do that? One method is on the right side of your screen: a collect-data-and-test cycle. We're collecting data in the simulation run, so it's a run-and-test cycle: we collect data and we test it. If the test succeeds, we're finished; if it fails, we go back and collect more data, which presumably means increasing the length of the simulation run at the same time. Now, what kind of test are we doing here? We're testing to see whether the data meets some pre-specified criterion. Maybe we have an idea of the analytic value we're aiming for, perhaps because we have some sort of deterministic approximation of the metric. Maybe we have historical output data that we're trying to come close to because, presumably, that's the output from the real system we're simulating, and so on. Whatever we use as the criterion, the test will either succeed or fail, and if it fails, we go around, collect more data, and test again. We know how important variance is. Sometimes we're interested in the variance itself as one of the measures that come out of the simulation run.
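A common version of the collect-and-test cycle uses confidence-interval precision as the stopping criterion. Here is a hedged sketch (the target half-width, the pilot size, and the stand-in output distribution are all illustrative):

```python
import random
import statistics

random.seed(11)

def replicate():
    # Stand-in for one simulation run's output measure.
    return random.expovariate(1 / 10.0)

# Fixed method, for contrast: with a known sigma and desired half-width h,
# we would just take n = ceil((z * sigma / h) ** 2) up front.
target_half_width = 1.0
z = 1.96                                   # normal quantile; fine for large n
data = [replicate() for _ in range(30)]    # pilot sample first

while True:
    half = z * statistics.stdev(data) / len(data) ** 0.5
    if half <= target_half_width:
        break                              # test succeeded: precision reached
    data.append(replicate())               # test failed: collect more, retry

print(len(data), statistics.mean(data), half)
```

Here n is a random variable, exactly as the lecture says: how many observations we end up with depends on the variability the data itself reveals along the way.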
But most of the time we're looking at means, and our data analysis of those means will be much more efficient if we can reduce the variance in the data. There are a lot of techniques for doing that, especially in a simulation study. We're not going to look at them here; we'll examine the various methods more carefully in another lecture on variance reduction techniques. Thank you for attending my lecture. In this lecture we have looked at statistical considerations in simulation, because we acknowledge that simulation is an experimental technique and follows the same guidelines as any statistical study. In particular, in a simulation study some decisions have to be made before others. These decisions impact everything else in the study; they are what we call the strategic considerations, and they include the type of the simulation (is it terminating or steady state?), the simulation model itself, the experimental design, the choice of the variable that we'll be studying, and so on. There are obviously more concerns than the ones we list here, but these are definitely the major ones we want to look at. Then there were the tactical considerations: the ones that pretty much can't be settled until you've already straightened out the strategic decisions. Those include issues of run length and sample size, initial conditions, initialization bias, and of course variance reduction techniques.