So, when working in environmental engineering, there's often a series of processes, for example the process where we clean water. In these processes there are a number of independent variables that you can change, and those independent variables will ultimately affect some sort of dependent variable. For water, the dependent variable might be a measure of water quality, such as turbidity, and you might adjust a number of different variables to affect the overall turbidity at the end of the process. When we're working with these processes, we might want to make some changes and see how those changes affect the overall output. So we want to improve a process, we have several proposed changes, and we want to see which of these affect the process in the desired way. Now, the standard scientific method for determining the relationship between an input, an independent variable, and an output, a dependent variable, is to hold everything constant except for the thing that you're interested in looking at. However, if you have multiple proposed changes, you may not have the time or the resources to test every single one of them. So the idea here is to discuss an approach that is efficient and yet still gives you some indication of the information that you're looking for. This can be particularly difficult if some of your input variables interact with each other.
For example, if you change two things, there may be some sort of interaction between them, so that they either counteract each other or multiply each other in a way that's unexpected.
So we're going to use an example here: a process for ice cream. There's an ice cream manufacturer, and this manufacturer is finding that ice crystals are forming in the ice cream. If any of you have ever had ice cream that has quote unquote freezer burn, you'll know that ice crystals in the ice cream make it not taste as good. It's not mixed the right way, and it generally tastes much worse once it has these ice crystals. So the staff sits down and comes up with some ideas: maybe it's being stored at too high a temperature, maybe it's sitting in the freezer case either too long or not long enough, or perhaps the sugar content is too low and additional sugar will help reduce the crystals. There are some ideas about the chemical processes, so maybe they could change some of those things. Now obviously, they have limited ability to test all of these things, because every test wastes a certain amount of ice cream, and creating an entire process to add new sugar content could be pretty expensive. So they're trying to figure out an appropriate and efficient way of seeing whether these things improve the overall quality of the ice cream. They look at some reasonable changes and say, okay, the low parameters are going to be what they consider the bad quality situations, and the high parameters are what they're going to assume improves the quality. Now that may not necessarily be the case, but we're going to go with that.
So they assume that if they lower the storage temperature, that should improve the quality, or they think it might. If they allow more time in storage, they're going to assume that improves the quality; it may or it may not. And similarly with the sugar content, they're going to try raising it to see if that actually improves the quality of the ice cream. So they identify three things. And notice in this case, they're creating what's called a binary system for each of these things. They could try a whole series of temperatures, 30, 29, 28, 27, 26, and test all of those. They could do the time in the store at one week, then two weeks, then three weeks, or they could add days in between. They could try the sugar content at 10% and 10.5% and 11% and 11.5%. But obviously every one of those variations adds more possible tests. What they really want to see is just whether or not they get improvement, and how much, for any of these changes. Or maybe it makes things worse, and they'd like to know how much worse. So for each of these changes they're going to keep only two levels: what we're going to call the baseline, which is the way things are done now, and the high end, which represents the proposed change they're going to make to see how it affects the condition of the ice cream.
So the simplest test here is to start with the baseline. The baseline will be represented by the number one and the changes will be represented by the number two. In other words, the low values are number one and the high values are number two. So the first thing they can do is a test where all the variables are low, all at the baseline. One, one, one is what we have now.
And then they could go through each one and increase it. We'll increase the first variable to two and see how that affects things. We'll increase the second variable to two and see how that affects things. And we'll increase the third variable to two and see how that affects things. That's the simplest case, and it follows our standard scientific method: you leave everything else the same and you change just one thing at a time.
There are some problems with that particular test. The first is that each condition is only tested once. Let's say there's something that we're not controlling for, some variation that happens naturally and we don't know why. By only testing each thing once against the baseline, you have no sense of how much variability there is in your results, even from slight changes in how you measured something. And you can't really use statistical tests to see which change is more important than another. To resolve this, typically what we would need to do is repeat the tests. As you know from any science class, if you really want good measurements, you repeat them three, five, seven times; some number of repeats gives you a better sense of what your output is. But if you do each of these three times, you're running at least 10 tests: one for the baseline, plus three times three, three tests for each of the three variations. The other problem is that each condition is only tested against the baseline of the other conditions, so you don't know if there's some sort of interaction between them. It's really difficult to see those interactions because you haven't varied things together; you've only varied each thing on its own. Basically we're looking at each test compared to the baseline. We're comparing the 111 to the 121, the 111 to the 211, and the 111 to the 112.
In other words, all we're doing is changing one variable in each case. Well, we could be a little more complete than that. How about we test all the possible switches? Think about each of these cases as a switch with a low and a high setting, in other words, one or two. We have three of those switches. We have one switch for the sugar content; that's this switch here. We have one switch for the time in the freezer case; that's the second switch. And we have one switch for the storage temperature; that's the third switch. And so we can think about all the possible combinations of switches being on two versus switches being on one. Either they're all down, 111, or we switch one of them on, or a different one, or we switch two of them on, or all three. Well, that's not too bad. That's only eight tests, which is a little bit better than the 10 tests we needed to get some sense of variability. And the benefit here is that you're actually doing multiple tests of each change. I have four tests where I've increased the sugar content and four tests where I have not. For each variable, you have four tests where it's increased and four where it's not, which should give you some sense of the variability there.
The problem with doing all of the variations is not a big deal when you only have three variables. But if you have six variables, now it's two to the sixth power. Two to the third is eight, two to the fourth is 16, two to the fifth is 32, and two to the sixth is 64. So very quickly, even if you're just using a binary system where each thing is either on or off, six parameters means 64 tests.
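To get a feel for how fast the full set of switch combinations grows, it can help to enumerate it. The following is only a sketch in Python; the function name `full_factorial` is made up for illustration and the levels 1 and 2 follow the baseline/change coding used above.

```python
from itertools import product


def full_factorial(n_params, levels=(1, 2)):
    """Return every combination of levels for n_params two-level switches."""
    return list(product(levels, repeat=n_params))


# All combinations for the three ice cream switches (111, 112, ..., 222).
runs = full_factorial(3)
for run in runs:
    print("".join(str(level) for level in run))

# How many of the runs have each switch raised to level 2.
raised_counts = [sum(1 for run in runs if run[i] == 2) for i in range(3)]
print(raised_counts)

# The number of runs doubles with every extra parameter.
run_counts = {n: len(full_factorial(n)) for n in range(3, 7)}
print(run_counts)
```

Each switch is raised in half of the runs, which is where the four-versus-four comparison comes from, and the run count doubles for every parameter you add.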
And so doing a full, complete test becomes very difficult or very expensive in terms of resources and time. So there's a famous design of experiments proposed by Taguchi, which sets a very specific layout for experiments. It limits the number of experiments to one more than the number of parameters. So if you have three parameters, you would have four experiments. However, the layout is also very specific: the number of parameters ends up being two to the n minus one. In other words, you might have three parameters, which is two squared minus one, or seven parameters, which is two to the third minus one. Two to the fourth minus one would be 15. So typically, you're looking for conditions where you have three parameters or seven parameters that you're going to vary. If you have some number in between, you would use the layout for seven parameters and just have something else that stays constant in that case.
Setting it up this way for the three-parameter example, we're still testing the baseline. But now, instead of moving one thing up at a time, we test cases where we adjust two of the variables at once. For our second experiment, we raise the second and third parameters. For the third experiment, we raise the first and third. And for the fourth experiment, we raise the first and second. It's a very simple idea: we're not just adjusting one thing and comparing it, we're adjusting two things at a time.
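The four-experiment layout just described can be written down and checked directly. This is a minimal sketch, assuming the rows are the four experiments in the order given and the columns are the three parameters; the helper names `level_counts` and `pair_counts` are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Rows are experiments, columns are parameters.
# 1 = baseline level, 2 = proposed change.
L4 = [
    (1, 1, 1),  # experiment 1: baseline
    (1, 2, 2),  # experiment 2: second and third parameters raised
    (2, 1, 2),  # experiment 3: first and third parameters raised
    (2, 2, 1),  # experiment 4: first and second parameters raised
]


def level_counts(array, column):
    """Count how often each level appears in one column."""
    return Counter(row[column] for row in array)


def pair_counts(array, col_a, col_b):
    """Count how often each combination of levels appears in two columns."""
    return Counter((row[col_a], row[col_b]) for row in array)


for column in range(3):
    print("parameter", column + 1, dict(level_counts(L4, column)))

for col_a, col_b in combinations(range(3), 2):
    print("parameters", col_a + 1, "and", col_b + 1,
          dict(pair_counts(L4, col_a, col_b)))
```

Every parameter sits at each level twice, and every pair of parameters sees each of the four level combinations once. That balance is what lets the four runs stand in for the eight.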
Now that's going to make things a little more complicated, because if you do see a change, you're not necessarily going to know whether it was a result of the first parameter that you raised or the second. But the idea is that you can compare across experiments. Let's say we're talking about parameter two. Parameter two is raised in two experiments and not raised in the two others. The goal is to compare the not raised to the raised by averaging the results in each group.
So let's take a look at what happens for the ice cream. Here is the L4 matrix, the orthogonal matrix, for the ice cream. We only have to do four experiments. But because each of the parameters is raised in two of those experiments, we'll get some variability that we can account for, and yet we're still keeping some efficiency by only having to do four. In this case, you see we have two experiments at each storage temperature, the high and the baseline; two experiments at each amount of time in the freezer case; and two experiments at each sugar content. If we go ahead and do the experiment, you'll see our results. The result is some measure of the quality of the ice cream, where that measure probably has to do with the number of ice crystals in the ice cream: more ice crystals means lower quality, and there's some objective measure here.
So we find out with these various combinations that the best quality of ice cream comes in the condition where we have a low storage temperature, a shorter time in the freezer case, and a higher sugar content. That combination gives us the best result, but our overall question is which of these things is the most important; let's say you can only change one of them. What we do in that case is compare the results that go along with our low values to the results that go along with our high values. For example, take the storage temperature. Our low results are 10 and 12, which is an average of 11. Our high results are 15 and 9. Notice that in one case it went up and in the other case it went down; if we'd only done one measurement, we wouldn't necessarily know. The average there, 12, is a little bit better. So the storage temperature improves things: if we move from our higher storage temperature to the lower storage temperature, quality improves by one unit. If we do a similar thing for the time in the freezer case, our low case of 10 and 15 averages 12 and a half, whereas our high case at three weeks ends up being 10 and a half. So leaving it in the case longer doesn't help us. It actually makes it worse, a difference of negative two in the quality, and that makes some logical sense. So you don't want to leave it in the case longer. If we look at the sugar content, however, we see that the increase in sugar content makes the biggest difference in ice cream quality. We don't necessarily know that it's the only thing that improves it. We've already seen that the storage temperature does improve it, but sugar does seem to give the most improvement.
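The averaging above can be sketched in a few lines. Two things here are assumptions rather than something stated outright: the column order (storage temperature, time in freezer case, sugar content) and the assignment of the four quality scores to the four experiments, which is inferred from the averages quoted for temperature and freezer time. The sugar averages are not quoted in the talk; they simply fall out of the same four numbers under those assumptions. The function name `main_effect` is made up for illustration.

```python
# Assumed L4 layout: 1 = baseline level, 2 = proposed change.
L4 = [
    (1, 1, 1),
    (1, 2, 2),
    (2, 1, 2),
    (2, 2, 1),
]
names = ("storage temperature", "time in freezer case", "sugar content")

# Quality score for each experiment, inferred from the quoted averages.
quality = [10, 12, 15, 9]


def main_effect(array, results, column):
    """Average the results at each level of one parameter and take the difference."""
    low = [r for row, r in zip(array, results) if row[column] == 1]
    high = [r for row, r in zip(array, results) if row[column] == 2]
    low_mean = sum(low) / len(low)
    high_mean = sum(high) / len(high)
    return low_mean, high_mean, high_mean - low_mean


effects = {name: main_effect(L4, quality, i) for i, name in enumerate(names)}
for name, (low_mean, high_mean, diff) in effects.items():
    print(f"{name}: low {low_mean}, high {high_mean}, difference {diff:+}")
```

Under these assumptions the differences come out to +1 for storage temperature, -2 for time in the freezer case, and +4 for sugar content, which lines up with sugar being the change that matters most.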
Now, it seems here that we probably could have gotten more information simply by doing the additional four experiments, and then we would have covered all of our possibilities. And that's definitely true: when we're only talking about three parameters, there isn't a major efficiency gained. Here's an example of our graphs of each thing, and you can see how we plot this. In this case, we have the temperature plotted here, and we can see the difference in the temperature; usually we would also plot some sort of mean that goes in between those two things. We can see that there's definitely a difference in how the time in the freezer case affects things, but that difference is in a direction we don't consider positive. And then we can also see the difference made by the sugar content. But again, we probably could have gotten more information without too many more experiments.
Where this really comes into play is when you have a larger number of parameters, up to seven. In that case, you'll see that each parameter is again split four and four across the experiments. But these are distributed in different groups so that each parameter gets combined with different ones in different experiments. So you'll notice, for example, that parameter four is increased first with parameters five, six and seven. Later, it's increased with parameters five, three and two. Later, it's increased with parameters three, one and six. And finally, it's increased with parameters one, two and seven. So it does get increased with each of the other parameters in turn at different times. In fact, it gets increased together with each of the other parameters exactly twice, but it isn't increased with all the possible combinations.
For example, we do have one being increased with three and four, but there are some combinations that are not included here. However, this test matrix should still give us enough variability to see whether or not we have major changes in our quality based on all seven parameters, and to tease out which parameters are the most effective.
So one of the things we're potentially going to ask you to do in the class is to consider an experimental design for water treatment. We can keep it relatively simple and consider three different variables of interest. In a case like this, if you have limited resources, for example if you're trying to filter water at home, you might be able to work with other people in your class to generate the different experiments. And because we have a limited number of experiments, we can usually do this in a class setting. You may or may not be asked to do this piece. It depends on what you have actually been provided in your home kits.
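Going back to the seven-parameter layout for a moment, the groupings described for parameter four can be checked against the standard eight-run, two-level orthogonal array. The slide itself is not part of this transcript, so the exact row order below is an assumption; it is the commonly tabulated L8 array, and it does reproduce the groupings mentioned. The helper names `raised_with` and `times_raised_together` are hypothetical.

```python
from itertools import combinations

# Assumed L8 layout: 8 experiments (rows) by 7 parameters (columns).
# 1 = baseline level, 2 = proposed change.
L8 = [
    (1, 1, 1, 1, 1, 1, 1),
    (1, 1, 1, 2, 2, 2, 2),
    (1, 2, 2, 1, 1, 2, 2),
    (1, 2, 2, 2, 2, 1, 1),
    (2, 1, 2, 1, 2, 1, 2),
    (2, 1, 2, 2, 1, 2, 1),
    (2, 2, 1, 1, 2, 2, 1),
    (2, 2, 1, 2, 1, 1, 2),
]


def raised_with(array, param):
    """For each run where param is raised, list the other raised parameters."""
    groups = []
    for row in array:
        if row[param - 1] == 2:
            groups.append([p for p in range(1, len(row) + 1)
                           if p != param and row[p - 1] == 2])
    return groups


def times_raised_together(array, a, b):
    """Count the runs in which parameters a and b are both raised."""
    return sum(1 for row in array if row[a - 1] == 2 and row[b - 1] == 2)


print(raised_with(L8, 4))

together = {pair: times_raised_together(L8, *pair)
            for pair in combinations(range(1, 8), 2)}
print(set(together.values()))

# Eight runs stand in for the 2**7 runs of a full test of seven switches.
print(len(L8), "runs instead of", 2 ** 7)
```

In this layout every pair of parameters is raised together in exactly two of the eight runs, which is the kind of balance that lets you average out the other parameters when you compare the raised and not-raised groups for any one of them.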