All right, so we're recording. Let me go back to sharing. Okay, so you can see the screen, the simulation metamodel slides, right? Okay, great. Thank you. Last time we did a problem with a t-test, a two-sample t-test. I don't think it presented any difficulty for you; it's a pretty basic introductory statistical technique, and I'm sure you've done t-tests in the past. I don't know if anybody here has done analysis of variance, I should have created a poll for that. Whereas with a t-test you might have two groups, two samples, with analysis of variance you might have several Xs, several factors. Sometimes you see the analysis of variance laid out as part of the regression output. And you've done regression, most of you have, and some of you have done more intense work in regression. If you have a regression equation with several Xs, where each X has a coefficient and then there's a constant term, that kind of equation can represent just about any statistical technique you've worked with. Give a quick shout out if you've used or heard of the term "general linear model." Yeah? Okay, at least I got one, so I'll take that as a yes. Basically, the general linear model has the form of a regression model, but it can be used to refer to just about any statistical technique you've learned, and most of them assume an underlying relationship of linearity in the data and in the world. That doesn't mean it's true. Very often we violate that assumption of linearity. And in fact, most simulation models, models based on queuing systems or inventory systems or production systems, engineering types of systems, are not going to be linear.
So even when we do simple statistical tests like t-tests, ANOVA, and regression, we're assuming linearity, and maybe we shouldn't be. That's one of the things I want you to keep in mind throughout this lecture, though we're not going to address it until maybe two thirds of the way through. This you've seen before; I grabbed it from the experimental design lecture notes. Basically, when you design an experiment, you want to make sure that as much information as possible is in there, and when you analyze the results, that's when you pull out the information you made sure was in there at the design phase. And this is exactly what the simulation metamodel is made for. Until you know more about what it is, which is coming on the next slide, you can really think of the simulation metamodel like a regression model. All right, here we are. This should really be drawn as a cloud, right? If we look at this as a depiction of whatever the real system is, there are inputs and there's some transform function, something in the real world that transforms the inputs into the outputs, the response variables. The outputs are in terms of the parameters of the system, and the real system is something we don't know. That's why it's big, full of complexity, the largest box in this picture. This Greek letter phi is the function that transforms the set of inputs into the set of outputs. The simulation model is built to represent the real system. We take data from the real system, we observe its behavior, and basically we build the simulation model to represent whatever our real system is. The simulation model is also an input-output transformation, where you have some input variables and some output variables. The input variables are control factors; these are values you set.
Sometimes they might be values you measure because you have no control over them, and you just attach them to the rest of the data. The Ys are the responses you get out of the simulation, just like you get the output variables from the real system. This f over here is a program: it's your simulation program, your simulation model. And epsilon is randomness we throw in to model the fact that the real system has much more complexity than the simulation model. There are a lot of things in the real system we're ignoring, because that's the point of a model: you ignore a lot of things, you highlight others, and you throw in an error term to represent everything you're not studying. What are these m's? For those of you who have had or are taking the multivariate course, m indexes the response variables. So you'd have a separate simulation model, you might say, for each response variable, although we all know we don't really have a separate model for each one; we use one multivariate function, the simulation program, input all the factors, and output the various responses. So the simulation model is slimmer, with less complexity and less randomness than the real system, but it's still trying to represent the real system. Then the metamodel, which in this case we're showing as a linear additive model, a regression model in other words, takes the same Xs as before and puts them into the form of a linear regression model, with several equations really, one for each response. And the linear additive metamodel is, again, smaller than the simulation model, which was smaller than the real system, because the metamodel is an actual function, an equation. It's not this big model that you can't really examine all at once and have to run.
With the simulation model, you can put in your inputs and get your outputs, but you can't really tell how one turned into the other. When you have a mathematical formula, you have your inputs and your outputs and you can actually talk about the coefficients, about the relationship between the inputs and the outputs. So it's a valuable addition to the simulation. Now, I just want to point out that if we call this metamodel the GLM, the general linear model, then basically it's an umbrella for every possible statistical technique we might have done on our data. We have the real system, which we can't get at; we have the simulation model, which we build to represent the real system; we generate data from the simulation model; and we use that data in a t-test or ANOVA or regression, something that falls under the umbrella of the general linear model. And it helps us understand what's going on in the simulation and, in addition, what's going on in the real system overall. I just repeated this slide so I don't have to keep flipping back and forth. The simulation metamodel is not only convenient and powerful, it helps us interpret. When we do a t-test, that helps us interpret too, but with a metamodel, a formula, you can look at relationships more concretely and more exactly than you can with, say, a t-test. And it's used very, very often. It makes the implicit model explicit: every one of those statistical tools you've learned is based on an underlying linear model, which I've been calling the implicit model. The metamodel just takes that implicit model and makes it explicit. It's been there all along; we'll just be working with it directly now, assuming we want a linear additive model here. Okay, each one of these is a way of taking input and turning it into output, right?
And it helps us better understand what we're doing at every stage of the game. All right, we're going to look at how this is done using an example. This is an M/M/1 system, which could easily be changed to an M/M/s system, but right now we're keeping it M/M/1. There's an order taker using an old-fashioned black phone, I don't know why, but okay. This is a call center: calls come in, completed calls go out, and there might be some waiting time. We simulate the system, we validate it, we run it. It's a very simple simulation of an M/M/1 system. And here's the output data we got: 10 replications for each system variant. How many system variants do we have? Well, here's the design matrix we used. The arrival rate could be 9, 12, 15, or 18, with two nines. The service rate could be 10, 12, 16, or 20, with two twenties. You know the basic rule of thumb is that the arrival rate has to be less than the service rate, and so these design points were chosen very carefully. Servers is listed as an X variable, but it can be ignored because it's the same all the way down. So each of these five system variants had 10 replications. Ten isn't a lot, as you know, and still we found some interesting results. We're looking at the three basic measures of effectiveness: L, the number of calls in the system; W, the amount of time each call spends in the system, from arrival to departure; and the utilization of the server, it's only one server. With arrival rate nine, and I'm thinking these are minutes, I don't remember if it said, so let's say minutes. With arrival rate nine and service rate 10, we had 8.84 calls in the system on average. With arrival rate nine and service rate 12, that goes down dramatically. Well, that's to be expected, right? And these are the standard deviations.
With arrival rate nine and service rate 10, the amount of time a call spends in the system is 0.988. I'll be consistent and say minutes; I really should go back to the original data and check. And here are the standard deviations. Again, the time in system drops once the system is less crowded. Utilization doesn't change as much, but it starts out at almost 90% for the first system, the very crowded system. Okay, so this is our simulation data; you're used to it. There's a lot you could do with this data. You could even build a regression, right? These are the Xs: arrival rate is X1, service rate is X2, and we can ignore X3. And these are our three response variables, our three Ys. So let's say we do that. The regression model, or what you might call the general linear model, has Y equal to B0, some constant, plus the sum of the factors times their coefficients (that slash is not supposed to be there), plus an error term. And Y could be L, W, or utilization; those are the three response variables. What's the main assumption here? There's more than one, but what's the most critical and most troubling assumption if we do it this way? The most troubling assumption is that this M/M/1 system is linear: that a linear additive model can represent the data in the simulation, which means it can also represent the data in the real world. Now, no one has ever said that an M/M/1 queuing system is a linear additive system, because it's not. You might be able to fit a regression model to it, because you can fit a regression model to anything.
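To make that naive approach concrete, here's a minimal sketch of fitting the linear additive model by least squares. The design points are the ones from the design matrix above, but the responses are stand-ins: since the class's simulation output isn't reproduced here, I'm using the analytic M/M/1 long-run averages (L = ρ/(1−ρ)) in its place.

```python
import numpy as np

# Design points from the lecture's design matrix
X1 = np.array([9.0, 9.0, 12.0, 15.0, 18.0])    # arrival rates (two nines)
X2 = np.array([10.0, 12.0, 16.0, 20.0, 20.0])  # service rates (two twenties)

# Stand-in responses: analytic M/M/1 mean number in system, L = rho/(1-rho)
rho = X1 / X2
L = rho / (1.0 - rho)

# Fit the linear additive metamodel  L = b0 + b1*X1 + b2*X2  by least squares
A = np.column_stack([np.ones_like(X1), X1, X2])
b, *_ = np.linalg.lstsq(A, L, rcond=None)
print(b)  # estimated coefficients b0, b1, b2
```

Note that the fit is not exact: even on this tiny stand-in data set, a linear additive model leaves residuals, which is the troubling-assumption point above.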
Of course, you'd want to test it for fit to make sure the model is a good fit to the data. I'm not even going to bother with that, because for the kind of system we have, the typical queuing type of system, I know from the research that it's not linear. It's much more likely to be some kind of multiplicative model. Let's just look at the top row here for a minute. Each one of the Ys is an MOE, a measure of effectiveness, so I'm calling it MOE right now. If alpha is the constant, X1 is the arrival rate, X2 is the service rate (we're ignoring X3), and nu is some kind of error term, then basically we're hypothesizing a multiplicative model: MOE = alpha · X1^beta1 · X2^beta2 · nu. We can hypothesize this model based on other people's research, and we can also try it empirically and see how it works; we should be able to test it. And it turns out that a multiplicative model works on a lot of different levels. One way it works is that we can actually turn it into a linear model, and regression is so easy to do. We know how to create regression equations; we don't really know how to create non-linear equations. So one way to do this is to say: here's my hypothesized model, the one on top. If I take logs of both sides, I haven't changed anything, but I've turned it into a linear model. The natural log of an MOE, the natural log of one of the Ys, is equal to the sum of these pieces. Remember, when you have a multiplicative model and you apply logs, you add or subtract, depending on whether you're multiplying or dividing. So you end up with an alpha term, a beta one times something, except instead of the something being X1 it's the log of X1, a beta two times the log of X2, and then the error term (call it the error term or the log of the error term; it doesn't matter to us anymore). Here's our change of variables, just for the purposes of the regression.
The log of L we're calling Y1, the log of W we'll call Y2, and the log of the utilization we'll call Y3. And this is what I was showing you before in the equation: the betas stay as betas, and this thing, the log of alpha, we'll just call beta zero. And there's a different one for every m. Then we have X1, the log of the arrival rate; X2, the log of the service rate; and the error term. So all of a sudden we look at this resulting equation and go: oh, hey, I have an easy thing here, I know how to do regression. And that's exactly what I did, and exactly what you would do. This is about the simplest kind of metamodel imaginable, by the way. The only thing is, you have to test to make sure you were right about your hypothesized relationship, and we'll take a look at that. All right, here's the result of the regression: the estimated coefficients B0, B1, B2 for the response variables Y1, Y2, Y3, and of course the standard errors of the coefficients. Each one of these was tested separately for significance, and one of them was basically not significantly different from zero. If you've taken a multivariate class, you know you can also set up your multivariate regression model on a higher, more overall level than just looking at three individual regression models, Y1 equals, Y2 equals, Y3 equals. So the Wilks' lambda statistic, which every package has now, and there are much better ones today (as you can see from the typeface, this output is a little old), tests the significance of the model itself on a multivariate level. You don't even have to go past this huge F statistic, but if you do, the p-value is definitely way smaller than 0.01. Testing X1 in the multivariate model, same thing; X2, same thing. And for lack of fit: what do we want when we're testing for lack of fit? We want the model to fit, right?
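The change of variables and the regression fit just described can be sketched in a few lines. Again, the class's replication data isn't reproduced here, so the analytic M/M/1 value of L stands in for the simulated response; under that assumption the fitted exponents come out in the same neighborhood as the coefficients on the slides.

```python
import numpy as np

# Design points from the lecture; analytic L = rho/(1-rho) stands in for the
# simulated response (an assumption -- the class data isn't reproduced here)
lam = np.array([9.0, 9.0, 12.0, 15.0, 18.0])    # arrival rates
mu  = np.array([10.0, 12.0, 16.0, 20.0, 20.0])  # service rates
L = (lam / mu) / (1.0 - lam / mu)

# Hypothesized multiplicative model:  L = alpha * lam^b1 * mu^b2 * error
# Taking natural logs gives the linear form the regression can handle:
#   ln L = ln alpha + b1*ln(lam) + b2*ln(mu) + ln(error)
A = np.column_stack([np.ones_like(lam), np.log(lam), np.log(mu)])
(b0, b1, b2), *_ = np.linalg.lstsq(A, np.log(L), rcond=None)
print(b0, b1, b2)  # intercept ln(alpha) and the two exponents
```

With this stand-in data, b1 comes out approximately equal to −b2, meaning L depends on the arrival and service rates only through the ratio λ/μ, which is exactly the multiplicative structure the lecture found.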
Okay, when you're doing a lack-of-fit test, you want to see that it fits, so what you want is a very small F and a probability greater than whatever level you're working at, 0.05 or whatever. So all this did was a different kind of test for the model: is the model significant, and how is the fit? Remember last time we talked about experiment-wise error? This illustrates it exactly: if you have three Y variables and you think you're working at 0.05, you're not; you're really working at about 0.143, since 1 − 0.95³ ≈ 0.143. And that's why the Wilks' lambda was done. I know I'm doing this very quickly, but it's not the part I wanted to focus on. All I want to say from here to here is: yes, the model was good. We tested it, it was significant, and the fit was good. So we have our equation, we have our metamodel. The linear additive metamodels we have are these three. Instead of saying Y1, Y2, and Y3, I went back to the natural log of L, the natural log of W, and the natural log of utilization, and instead of X1 and X2, we have the log of the arrival rate and the log of the service rate. Okay, so what did I accomplish here? You might say: okay, fine, I have a nice little equation. With the logs I took something that was not linear and turned it into something linear just so I could apply regression to it, because I know regression. Fine, but what's the big deal? Well, we can't really use these directly. You want to use regression to predict, right? Say we want to take a different value of X1, one we didn't necessarily have in the experiment, and a different value of X2, a different arrival rate and service rate, and predict what L and W and utilization would be. That's very hard to do with these logs in the linear additive model. We want to go back to the predictive model, because that's the one we're actually working with.
Okay, so this is just a repeat of what we saw before. So what do we do? We take antilogs. What's an antilog? It's just e to the power. So we take antilogs of these equations: the antilog of the left-hand side is e to the log of L, which leaves just L. You apply e to the right side, do a little algebra, and we get this. And it looks familiar, right? We're back to the multiplicative model we had before, but now we have the coefficients, what used to be the coefficients in the linear model. The first equation turns out to be L: L can be computed as the arrival rate divided by the service rate, which we know is just lambda over mu, raised to the power 5.8, times e to the 2.8. That's one equation, and of course there's always a little wiggle room, because there should be an error term. With W, what we have is e to the 2.8 times the inverse of the arrival rate, times lambda over mu raised to the 5.8. Utilization looks very familiar, right? What's utilization in an M/M/1 queuing system? If anybody is courageous enough to unmute themselves: what is utilization? It's just lambda over mu, right? Arrival rate over service rate, that's utilization. And 0.97 is how the exponent turned out from the data. 0.97 is very close to one, but the data said 0.97, and regression is very much a data-based technique: you can take anything and create a regression from it. But we happen to know, and you don't always know this, but we happen to know because we know the M/M/1 system very well, that arrival divided by service raised to the power of one is exactly utilization. Another thing we can do when we look at the equations is refine L, redo L, because utilization is arrival over service, right?
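The back-transformed, predictive form can be wrapped up as a small function. The coefficients 2.8, 5.8, and 0.97 here are the rounded values quoted from the slides, so treat the outputs as illustrative rather than the exact fitted metamodel.

```python
import math

# Predictive metamodel after taking antilogs (rounded lecture coefficients):
#   L    = e^2.8 * (lam/mu)^5.8
#   W    = e^2.8 * (1/lam) * (lam/mu)^5.8
#   util = (lam/mu)^0.97
def metamodel(lam, mu):
    """Predict mean number in system, mean time in system, and utilization."""
    ratio = lam / mu
    L = math.exp(2.8) * ratio ** 5.8
    W = L / lam                 # same as e^2.8 * (1/lam) * ratio^5.8
    util = ratio ** 0.97
    return L, W, util

# Try it at the most congested design point (arrival 9, service 10):
print(metamodel(9.0, 10.0))
```

At (9, 10) this predicts L in the neighborhood of the simulated 8.84 calls in system, which is the kind of agreement the external-validation table quantifies later.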
You could say that L is e to the 2.8 times the utilization raised to the power of roughly six. And you could also say that L is equal to lambda times W: take W = e^2.8 · (1/λ) · (λ/μ)^5.8 and multiply by lambda; the 1/λ cancels and you're left with exactly the equation for L. So we can relate L and W through lambda, the arrival rate. Isn't that interesting? Well, somebody else figured this out, and it actually has a name: it's called Little's Law. Now, why am I so excited about this? Let me go back up a minute. Here's why I'm so excited. If the real system is our M/M/1 queue, then Little's Law holds in the queuing system; it's actually a law that comes from an examination of the mathematical models used in queuing. That's the real system, okay? And if I skip the simulation model and look at my data, and my data actually came up with this thing called Little's Law, L equals lambda times W, what does that say about my whole process? It's kind of like saying I've just validated my simulation model, because I used data from the real system to create the simulation model, I used data from the simulation model to create my metamodel, and all of a sudden I can show that my metamodel has a characteristic that's supposed to be true of the real system. It's a very, very interesting finding, an incredible finding. It makes you feel good about the whole simulation process, not to mention about the multiplicative model that you took logs on and applied regression to, where you might have said, I have no idea why that works. And then all of a sudden you go: oh, hey, this actually is a feature of the real system, and I know it's been proved repeatedly.
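Little's Law is also easy to check numerically against the textbook M/M/1 formulas; the only assumption in this sketch is that we trust the standard results L = ρ/(1−ρ) and W = 1/(μ−λ).

```python
# Little's Law check for M/M/1: L should equal lam * W at any stable (lam, mu)
def mm1_L(lam, mu):
    rho = lam / mu
    return rho / (1.0 - rho)   # mean number in system

def mm1_W(lam, mu):
    return 1.0 / (mu - lam)    # mean time in system

# Verify L = lam * W at every design point from the lecture
for lam, mu in [(9, 10), (9, 12), (12, 16), (15, 20), (18, 20)]:
    assert abs(mm1_L(lam, mu) - lam * mm1_W(lam, mu)) < 1e-12
print("Little's Law holds at every design point")
```

Algebraically this is just λ · 1/(μ−λ) = (λ/μ)/(1−λ/μ), which is why the check passes at any stable (λ, μ), not only at the design points.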
So what do we gain from this? Imagine you're trying to study a particular real system. You build a simulation model, and you want to do all those different kinds of things from the experimental design lecture: you might want to estimate, you might want to compare systems, you might want to optimize, to ask what values of the Xs give the best value of Y. All of those things are easier to do with an equation than with a simulation program. If you take the simulation program, generate data according to some table of design points (so you need a good design for that), and use that data to build a model, an equation, then no matter what the form of the equation is, you could throw away the simulation. You don't have to go back to the simulation to work with it anymore. So let me just go through these; I don't know if they're in any particular order. Yes, I have simplified my model tremendously. I can explore and interpret the model better than I can with the simulation: I can look at the slope terms, things I can't look at in the simulation model. I can optimize, and I'm not limited to the system configurations I built the metamodel on, although we all know the danger of extrapolating versus interpolating. I get a better understanding of the relationship between the input and the output, which is all we really mean by system behavior. I can generalize to models of other systems: if I have an equation, I can say, oh look, this was a linear model, or this was a multiplicative model. What do other systems have? And there has been research about other systems, not necessarily simulations.
So once you know the form of your metamodel, you can relate what you've learned to the research on other systems. You can do sensitivity analysis; it's much easier using an equation than using simulation, since you don't have to keep doing repeated runs. Same with what-if questions: what if I try this particular level of X1 and that particular level of X2? It's easier to answer inverse questions, which I mentioned before with the optimization: if I want a particular value of Y, what X do I need? And of course, I can test many hypotheses without making costly additional runs. How do we validate a simulation metamodel? Well, we kind of saw one way, in a sense: if you can come up with something like Little's Law, you've basically validated it, although you can't count on that kind of luck all the time. You want to look at internal validity and external validity. Internal: is the metamodel a valid representation of the simulation model? At the very least, you want to ask, can I show that my equation really does represent the simulation? And you do that with the fit test and by testing the model. Then you want to ask about external validity: is this metamodel a valid representation of the real-world system? Here's where we may want to actually collect data and test it against the real-world system, or, in the case of the M/M/1 queue, we know the true averages of the real-world system, and we're going to do that in a minute. Okay, so the simulation metamodel is data-based. One of the best ways of testing your regression model (I don't know if you've studied this), because remember, when you do regression you will always come up with a model, so the question is how you know it's the right model, is to take a holdout sample. You have your data; don't use all of it for your regression. Use two thirds of the data, let's say.
Then the last third that you held out, you use to test: see if you can predict the Ys with those Xs, see if you can predict what you already got. That's what you do with a holdout sample. What some people do is split their data in half and use each half as a control for the other: develop a regression equation with one, test it with the other, then switch and see what happens. We can look at the mean absolute percentage error to compare the prediction we should have gotten with what we actually got. And this is very elementary; I would hope that some of you, if not all of you, have learned techniques like this in your regression class. But with the metamodel, we not only have the data we used to build the model, we may also have information about the real system, and that's what we want to use for external validation. We not only want to show that the metamodel is a good representation of the simulation, we want to show it's a good representation of the real system, so that any conclusions we draw from the metamodel can be applied to the real system, because that's what we're actually studying. We're not studying the simulation model except insofar as it helps us understand the real system. So you could take historical data and test the metamodel by using the Xs to predict the Ys. But in this particular example, we know the real system: we know everything about the M/M/1 system that was simulated. So here's an example of external validation when you have a theoretical system to compare to. Here are various levels of arrival rate and service rate, and notice it's not only the design points we used to build the metamodel. We made sure the arrival rate was always less than the service rate, that much we did. And here are the analytic values.
When you use the queuing formulas and put in a particular arrival rate and service rate as lambda and mu, here's what you get for L, for W, and for utilization at each combination. And when we take the metamodel, and I'll just remind you for a minute what the metamodel looks like, it's this: a separate equation for L, a separate equation for W, a separate equation for utilization. If you take the metamodel and apply the arrival rate and service rate, you get these values for L, W, and utilization at each combination of X1 and X2. So now we need a metric to say something like: when the arrival rate is eight and the service rate is nine, the value of L should be eight, and I got 8.06; how good is that? The measure we used is average absolute error, which is basically metamodel minus analytic: the deviation between the output of the metamodel and the analytic long-run average, divided by the analytic value. So what you're looking at is the error as a proportion of the true value; if you got exactly the analytic value, the error would be zero. I think I left it here as a proportion instead of a percentage, a proportion out of one. So, depending on how far the metamodel value was from the analytic one, over all the system configurations in the range studied, the average absolute error for L was 0.0771, for W it was 0.0772, and utilization was really good, 0.005. And I'm missing something: there should be a criterion for what counts as good. Obviously these were good; I think since they were less than 0.1 it was considered good, but I'm going to have to track that down and put it back into the set of slides.
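The average-absolute-error metric can be sketched like this. The metamodel side uses the rounded coefficients quoted from the slides and the analytic side uses the standard M/M/1 formulas, so the numbers it prints are illustrative rather than the exact 0.0771 / 0.0772 / 0.005 from the table.

```python
import math

def mm1_analytic(lam, mu):
    """Analytic long-run averages for M/M/1: L, W, utilization."""
    rho = lam / mu
    return rho / (1 - rho), 1 / (mu - lam), rho

def metamodel(lam, mu):
    """Metamodel predictions using the rounded lecture coefficients."""
    ratio = lam / mu
    L = math.exp(2.8) * ratio ** 5.8
    return L, L / lam, ratio ** 0.97

# Average absolute error: |metamodel - analytic| / analytic, averaged over
# the configurations in the studied range (0 would mean a perfect match)
configs = [(8, 9), (9, 10), (9, 12), (12, 16), (15, 20), (18, 20)]
errs = [[], [], []]                      # one list per MOE: L, W, utilization
for lam, mu in configs:
    for i, (m, a) in enumerate(zip(metamodel(lam, mu), mm1_analytic(lam, mu))):
        errs[i].append(abs(m - a) / a)
avg = [sum(e) / len(e) for e in errs]
print(avg)  # average absolute error for L, W, utilization
```

Notice that because the metamodel already satisfies L = λW exactly, L and W end up with the same relative error at every configuration, just as the 0.0771 and 0.0772 in the table nearly agree.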
So if you need it, you should be able to find it in a week or two, but basically this was good. What I don't have here is the criterion for how I knew it was good; remember, the metric is metamodel minus analytic as a proportion of the analytic value, so small is what we want. I've got to get back to you on that. It's not like I didn't review this before class, but I missed it. Basically, if you learn nothing else from me all semester, you'll learn that it's okay to say "I don't know"; that's another one of my favorite lines. This is just a summary. You've got the real system, which takes in the Xs and outputs the true averages; you've got the simulation model, which takes in the Xs, the control variables, and outputs the responses; and then you've got the metamodel, which does the same thing but in a much simpler manner. Something on the slide is overwritten, yes, I see that, thank you. This is looking at the residuals, and it's kind of the same thing, residual over Y. But yes, it is overwritten; I'll have to fix that. You can see this is a work in progress, but I hope it at least interested you and gave you an idea of, number one, why an explicit metamodel might be a very valuable tool, and number two, how any of the statistical techniques you're using anyway to analyze your simulation data are really, in a way, applying an implicit metamodel, because they all assume an underlying linear relationship. Thank you very much. Any other errors that you find, and of course there won't be any, but just in case, please send them to me. I'm going to stop recording now so that if anyone is shy, they won't mind asking. There we are.
Professor, could we look at slide 10, at the Wilks' lambda results? The p-value is below 0.01 for the model, X1, and X2.
Right, the p-value comes out of whatever statistical software you're using, but it was very, very low. Basically, when you look at research papers, they'll usually just say something like this; it was so small that I didn't even bother, it was less than 0.01. So that means the model has validity: the model has a significant effect, it has explanatory power. Right. I don't remember offhand exactly how the null hypothesis is stated here; have you had the multivariate course? Okay. I'm not going to turn this into a multivariate course, but you're right, it's not a bad idea. The slide says "tests of multivariate hypotheses"; I'm going to put that together and add it to this slide. Thank you, it is a good idea. And the lack of fit having such a high p means the model fits? Yeah, basically you're looking for the high p in that case. Yeah, that's the only one where we want to fail to reject: the null there is that the lack-of-fit term is zero.