 the recording. So welcome back, people on YouTube and also on Moodle, wherever you're watching this. Just a couple of words about block designs. Last time, two weeks back, we talked about randomization, and I wanted to go into a little more detail on block designs. It's going to be a couple of slides, so it shouldn't take too long, but we'll see how far we get.
All right, so first I want to talk a little bit about nuisance variables, then about blocking, and especially blocking versus randomization, and then just run through a couple of block designs. The assignments part, data from HABA, is dropped: this used to be a lecture contributed by a student, and I kept part of the lecture but moved the assignments out. So you can ignore the assignment part.
All right, so a nuisance variable is a random variable that is fundamental to a certain probabilistic model. We're building a model and we're trying to model some kind of event, but a nuisance variable is defined as something which is of no particular interest in itself, or which is no longer of interest because we already know the effect it has on our measurements. You can think of, for example, the temperature outside. Often when we do things like microarrays or sequencing, the outside temperature influences the experiment, but that is not of interest. We don't really want to study the effect that temperature has. Or, for example, we have a field design: we are planting potatoes across a field, and we know of course that every part of the field is a little bit different. There might be a little more rain, or it might be a little sunnier on some parts of the field, but that is not of interest, right?
We want to study the yield of the potatoes, and we want to study, for example, how different types of potatoes perform on the field we have selected. We're not really interested in the fact that there are little variations between parts of the field. So it is information that is not of direct interest, but which needs to be taken into account when you analyze your data.
So why do we want to do blocking? Blocking can be used to reduce or eliminate the contribution of nuisance factors to experimental error. So a nuisance factor is something that we generally want to block out. Say we have a field and we know that every part of the field will be a little bit different. What we do is divide our field into smaller subfields, and inside each of the smaller subfields we assume that the conditions are more homogeneous, because instead of one big football field we're now just looking at 5 by 5 meter, so 25 square meter, areas. By doing that we have blocked, for example, the north-south axis of the field. So we use homogeneous blocks in which the nuisance factor is held constant and the factor of interest is allowed to vary.
So we have a whole football field. We divide it from north to south into several subfields, and we also do that from east to west. Then we know that some of the blocks we created are on the south side, some are on the north side, and some are in the middle. We assume that the effect of our nuisance variable is more or less the same for all the blocks on the north side of the field, and likewise for all the blocks on the south side. Within these blocks we can then compare our factor of interest, for example the different types of potato that we are planting, and investigate the yield.
And all the variance that comes from some plants being on the north side and other plants being on the south side will be captured by this nuisance variable, so by the blocking factor that we are introducing.
So what are the advantages of a block design? The advantages of a randomized complete block design are that it is flexible: you can have any number of treatments and any number of blocks. It provides more accurate results than the completely randomized design, due to the grouping. The statistical analysis is relatively easy, even when data is missing. And it allows you to calculate unbiased errors for specific treatments, because we don't have, for example, a sunshine effect which might be bigger on the east side of the field than on the west side. So compared to completely randomized designs, block designs are more powerful and more accurate, because we take something into account by blocking it instead of trying to randomize it away.
The disadvantages of a randomized complete block design are that it is not suitable for a large number of treatments, because the blocks become too large. Every block that we introduce has to contain every treatment. So if we have seven different treatments and we divide our field into three blocks, then of course we need 21, so three times seven, plots to have all seven treatments in each of the three blocks. It's also not suitable when complete blocks contain considerable variability, so when our blocks are too big, right?
If instead of looking at a five by five meter area we actually look at a hundred by a hundred meters, then that is of course not really blocking out the variation that we want to get rid of. And of course there might be an interaction between the block and the treatment effect. For example, we might have one type of potato which grows better on the south side of the field than on the north side. So if there's an interaction, if the potato type and the amount of sunshine influence each other, then blocking can actually increase the error in our analysis. So we have to be very careful when we use blocking compared to when we use randomization.
So blocking is the arrangement of experimental units into groups that are similar to one another, and the blocking factors are a source of variability that is not of interest to the experiment. We can't block the thing that we're interested in. An example of a blocking factor is sex. Males are different from females, so we say our study is blocked by sex. We take a hundred males: fifty get the treatment, fifty don't. We take a hundred females: fifty get the treatment, fifty don't. When we do the analysis, we do it across males and females, but we treat sex as a blocking factor, so the differences between males and females will be caught by the sex factor that we are blocking on. And of course by blocking on this source of variability we get better accuracy in our results in the end.
The general rule when you talk about blocking versus randomization is: block what you can, randomize what you cannot. Generally at the start of an experiment you know that some things are going to be of influence, but there are also always things that you cannot anticipate beforehand.
So the things that we cannot anticipate beforehand we try to randomize out, but the things that we know will cause differences, like males versus females or the north side of the field versus the south side, those we try to block. We make sure that our treatment is assigned equally between males and females, and that we don't end up taking a hundred males and a hundred females and having all of the females get the treatment by randomization. Because randomly that might come up: if we really just randomize, we could in principle end up with a hundred females getting the treatment and a hundred males getting the control, and then of course we have not blocked. So blocking is important because we can then say: no, within each block we are going to randomly assign the treatment, instead of randomly assigning the treatment across the whole study group.
So the completely randomized design means that subjects are randomly assigned to a treatment, and there are no blocking factors. For example, we have a thousand people and one vaccine. We randomly assign 500 people to get the vaccine and 500 people to get a placebo, and we randomize to control for nuisance variables. When we randomly pick the 500 people that get the vaccine and the 500 people that get the placebo, we hope that the vaccine group ends up with about the same percentage of smokers as the placebo group. The same goes for other kinds of nuisance variables, such as people taking a certain drug: we assume that by randomizing we have the best chance of ending up with the same amount of nuisance in both groups.
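
To make the difference between randomizing across the whole group and randomizing within blocks concrete, here is a minimal sketch in Python. The function names, the dictionary layout of a subject, and the group of 100 males and 100 females are assumptions made purely for illustration; they are not taken from the lecture material.

```python
import random
from collections import Counter


def complete_randomization(subjects, rng):
    """Shuffle everyone and give the first half the treatment."""
    shuffled = subjects[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {
        s["id"]: ("treatment" if i < half else "control")
        for i, s in enumerate(shuffled)
    }


def blocked_randomization(subjects, block_key, rng):
    """Split into blocks first, then randomize separately inside each block."""
    blocks = {}
    for s in subjects:
        blocks.setdefault(s[block_key], []).append(s)
    assignment = {}
    for members in blocks.values():
        assignment.update(complete_randomization(members, rng))
    return assignment


def count_by_block(subjects, assignment, block_key):
    """Count how many subjects of each block ended up in each arm."""
    return Counter((s[block_key], assignment[s["id"]]) for s in subjects)


rng = random.Random(1)  # fixed seed only so the sketch is repeatable
# Hypothetical study group: 100 males followed by 100 females.
subjects = [{"id": i, "sex": "male" if i < 100 else "female"} for i in range(200)]

crd = count_by_block(subjects, complete_randomization(subjects, rng), "sex")
blocked = count_by_block(subjects, blocked_randomization(subjects, "sex", rng), "sex")

print("complete randomization:", dict(crd))
print("blocked randomization: ", dict(blocked))
```

With complete randomization the split of males and females over the two arms is left to chance, while the blocked version always gives 50 treated and 50 control subjects within each sex.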
And so we assume that on average external factors will affect treatment and control equally. Of course this is not guaranteed by randomizing. Randomizing might, in the extreme case, still leave us with 500 smokers in the vaccine group and 500 non-smokers in the placebo group. But normally, if you have a thousand people and you assign them randomly, you would expect not to get an overrepresentation of smokers in one group versus the other. And this also holds for medicine use and age and weight and BMI and all of these things. By just randomly assigning, we hope that most of these effects will be equally divided over both groups.
In the randomized block design, we first divide the subjects into subgroups. So again the same example as before: imagine that we have a thousand people and one vaccine, and we now block on sex. What do we do? We randomly assign 250 males to get the vaccine and 250 males to get the placebo, and 250 females to get the vaccine and 250 females to get the placebo. This is just a very common-sense thing to do. Everyone knows that males are different from females, so when we test the vaccine we block on sex.
But of course both structures have their advantages and disadvantages. You can see that if we're testing the vaccine at 10 different dosages, then by also splitting on the dosage we end up with a problem, because we start making groups which are really small: 25 males getting the vaccine at a certain dose, 25 males getting the vaccine at a higher dose, and so on for 10 dosage levels. We now have one blocking factor, and if we introduce another factor, the two multiply together. So now we have sex as a blocking factor.
But if we add another one, then we increase the number of groups that we have to define, not so much exponentially but multiplicatively, which makes each group smaller.
So the model for the randomized block design looks more or less like this. Our measurement Y_ij is determined by the global mean, so the general location parameter mu, plus T_i, which is the effect of treatment i, plus B_j, which is the effect of being in block j, plus some random error. So that is the measurement for treatment i in block j. This accounts for only one nuisance variable. Of course, we could add a term for a second blocking factor, so we can have multiple blocking factors if we want, but we still have to randomize within the blocks. This is the general model that you use to analyze block designs.
We can also do a matched pairs design, which is a special case of a randomized block design. It is usable when the experiment has only two conditions and the participants can be grouped into pairs. Within each pair, the participants are then randomly assigned to the two conditions. Again, looking at the example where we have 1000 people and one vaccine, we can use matching to block out the effect of things like sex or age. So we take two women who are both 21 years old: one of them gets the vaccine, the other one does not, and we randomly assign who gets the vaccine. Then we take another pair, for example two men who are both 34 years old: one gets the vaccine, the other one does not. By creating 500 pairs like this, we remove a lot of the age variance as well. So matching is more or less the same as defining blocks, but a match is always a pair of two. I hope that's clear, because it's kind of a block, but it's not really a regular block, right?
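
The matched pairs idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `matched_pairs`, the six example subjects, and the rule that two people only match when sex and age are exactly equal are all made up for the sketch. In a real study you would more likely match within an age range.

```python
import random


def matched_pairs(subjects, rng):
    """Pair subjects with identical sex and age, then randomize within each pair."""
    groups = {}
    for s in subjects:
        groups.setdefault((s["sex"], s["age"]), []).append(s)

    pairs, unmatched = [], []
    for members in groups.values():
        # Shuffling decides at random who in a pair gets the vaccine.
        rng.shuffle(members)
        while len(members) >= 2:
            first, second = members.pop(), members.pop()
            pairs.append({"vaccine": first["id"], "placebo": second["id"]})
        # Anyone left over has no partner with the same sex and age.
        unmatched.extend(members)
    return pairs, unmatched


rng = random.Random(7)  # fixed seed only so the sketch is repeatable
# Hypothetical participants, invented for this example.
subjects = [
    {"id": 1, "sex": "female", "age": 21},
    {"id": 2, "sex": "female", "age": 21},
    {"id": 3, "sex": "male", "age": 34},
    {"id": 4, "sex": "male", "age": 34},
    {"id": 5, "sex": "male", "age": 34},
    {"id": 6, "sex": "female", "age": 50},
]

pairs, unmatched = matched_pairs(subjects, rng)
print("pairs:", pairs)
print("unmatched ids:", sorted(s["id"] for s in unmatched))
```

Each pair is a tiny block of size two, and subjects without a partner are simply reported instead of being forced into a bad match.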
With matching we create a whole bunch of little sub-blocks, where every block has just two people in it. Of course, we still block on males versus females, because we won't pair a 21-year-old male with a 21-year-old female. But out of each block we take two individuals, and we match these two individuals on some other nuisance factor, for example age.
Then we have something called the randomized complete block design, which is the standard design for agricultural experiments, where similar experimental units are grouped into blocks. It is used to control variation in an experiment by accounting for the spatial effects that occur in a greenhouse or in a field. If the field is big enough, there will be variations in the fertility of the soil, there will be differences in drainage, and there will be differences in sunlight. The randomized complete block design, the RCBD, is there to account for all of these small variations that occur across an experimental field. By using it, we can block out most of the effects that come from these kinds of nuisance variables. Generally, we're not really interested in the fertility of the field or in the drainage of the field. We are interested in which type of potato has the highest yield on a certain soil, for example. So we might have three different soil types, or three different fields, one of them on clay for instance, but within the clay field there will also be variations in fertility. So by blocking across the field, we reduce the effect of the variance coming from fertility or drainage or sunlight.
So how does an RCBD work? We take a space, so that means a field. It is divided into uniform units to account for any variation, so that the observed differences are largely due to the differences between the treatments.
And treatments here can of course be real treatments, but they can also be two different strains of potatoes or tomatoes. The treatments are assigned at random to the units within each block. The defining feature of a randomized complete block design is that each block sees each treatment exactly once.
So how does this look? The layout for a randomized complete block design looks like this. This is our field. In this case, we divide our field from east to west into five different blocks, and within each block we make sub-blocks. Here we don't correct for the north-south effect; we only use the blocks to correct for the east-west effect. Within each block, we randomly assign the treatments to the sub-blocks. So we have, for example, potatoes here, here, here, here, and here. Once we have put potatoes in all of the blocks, we say: okay, now we have four treatments. For example, we spray the potatoes with four different chemicals. In block one the order is treatment one, treatment three, treatment four, and treatment two, and that is the randomization. So the east-west axis is blocked, and along the north-south axis the treatments are randomized within each block. Is that clear?
Then we have Y_ij is mu, plus the effect of the treatment that we apply, plus the effect of the block that you're in, plus some random error, and the block term corrects for the east-west effect. So we have a big field, we divide it into five more or less square blocks, and within each block we define rectangles which each have 100 plants in them, and all of these 100 plants get the same treatment.
Then there is a special type of design, which is called the Latin square design, which is used a lot.
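
The randomized complete block layout described above, with five east-west blocks and four treatments, can be sketched as follows. The function name `rcbd_layout` and the treatment labels are assumptions for illustration only.

```python
import random


def rcbd_layout(treatments, n_blocks, rng):
    """Return one randomly ordered copy of all treatments per block."""
    layout = []
    for _ in range(n_blocks):
        order = list(treatments)
        rng.shuffle(order)  # independent randomization inside every block
        layout.append(order)
    return layout


rng = random.Random(42)  # fixed seed only so the sketch is repeatable
treatments = ["T1", "T2", "T3", "T4"]  # e.g. four different sprays
layout = rcbd_layout(treatments, n_blocks=5, rng=rng)

for block_number, order in enumerate(layout, start=1):
    print(f"block {block_number}: {order}")

# Defining feature: every block contains every treatment exactly once.
complete = all(sorted(order) == sorted(treatments) for order in layout)
print("every block complete:", complete)
```

The position inside a block is random, but the block itself is fixed, so the east-west differences end up in the block term of the model rather than in the error.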
The Latin square design is a special case of a row-column design for two blocking factors. If you do a field experiment, you generally use something like the randomized complete block design we just saw. But often you don't just want to block one axis, like the east-west axis; you also want to block the north-south axis. So it is a special case of the randomized complete block design. Take, for example, a plot of land: the fertility of the land might change in two directions due to soil or moisture gradients. So often you want to block both the north-south axis and the east-west axis of the field.
So how does it look? It looks much the same as the other one, but now each row sees each treatment only once, and the same thing holds for the columns. So there is no free random assignment anymore. We just say: in row one we put A, B and C. Because we have now already put A in row one, column one, the rest of column one can only contain B and C. We could flip those two around, but once you've chosen a certain layout, the layout for the other cells is fixed. So you have A, B and C, then B, C and A, and then C, A and B. If you look at each row, each row sees each treatment once, and the same holds for each column. That is called a Latin square design.
The Latin square design is a very, very powerful design and is used a lot in field research. So you have a field and you divide it up. If you have three treatments, you divide it into a three by three square, and if you have four treatments, you divide it into a four by four square, so you have 16 sub-blocks.
The Latin square design you analyze a little bit differently. We say that the measurement we have is generated by mu, which is the overall fertility or mean of the field, and then we have the row effect for being in row i.
We can calculate the column effect for being in column j, and then we have the treatment effect for receiving treatment k. Because each row and each column sees each treatment once, we can calculate the treatment effect very accurately, because we are correcting for both the row effect and the column effect that we have in our field.
When we have more than two blocking factors, we can do things like Graeco-Latin squares or hyper-Graeco-Latin squares. These are all very complex designs, and they all come with very complex statistical analyses. But if you are interested in field designs, or in how to design experimental layouts for fields, then I would definitely advise you to read Fisher's book The Design of Experiments, which dates back to 1935 and whose eighth edition was published in 1966. There he explains how to design experiments to get the most statistical power out of them. And of course, experimental design is a field of study on its own. You can make a whole career out of not doing experiments, but designing experiments and studying different designs for the same experiment.
But when you have two blocking factors, people generally go for the Latin square design. If you have one nuisance variable which you know is going to have a big influence, you generally try to block it out. For example, when you test vaccines, you generally take males and females, and then you do not randomly assign across the whole group, but you randomly assign the males and you randomly assign the females separately from each other. So that is what blocking is: identifying things which have an influence but are not of interest, and blocking them away by saying: no, I'm first going to divide into blocks before I start the random assignment.
All right, that's everything I wanted to say about block designs and about blocking factors.
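
Coming back to the Latin square for a moment: the property that every row and every column sees each treatment exactly once is easy to check in code. The sketch below builds a square by cyclic shifting and then shuffles whole rows and whole columns, which keeps the property intact. Both function names are hypothetical, and this particular construction is an assumption chosen for simplicity; it does not sample uniformly from all possible Latin squares.

```python
import random


def latin_square(treatments, rng):
    """Cyclic Latin square with randomly permuted rows and columns."""
    n = len(treatments)
    # Row i is the treatment list shifted by i positions.
    square = [[treatments[(i + j) % n] for j in range(n)] for i in range(n)]
    rng.shuffle(square)  # permute whole rows
    cols = list(range(n))
    rng.shuffle(cols)  # permute whole columns
    return [[row[c] for c in cols] for row in square]


def is_latin(square):
    """True if every row and every column contains each symbol exactly once."""
    n = len(square)
    symbols = set(square[0])
    if len(symbols) != n:
        return False
    rows_ok = all(len(row) == n and set(row) == symbols for row in square)
    cols_ok = all({square[i][j] for i in range(n)} == symbols for j in range(n))
    return rows_ok and cols_ok


rng = random.Random(3)  # fixed seed only so the sketch is repeatable
square = latin_square(["A", "B", "C", "D"], rng)
for row in square:
    print(" ".join(row))
print("valid Latin square:", is_latin(square))

# A 3 by 3 layout with rows ABC, BCA, CAB is valid, while ending on CBA
# is not, because B would then appear twice in the middle column.
print(is_latin([["A", "B", "C"], ["B", "C", "A"], ["C", "A", "B"]]))
print(is_latin([["A", "B", "C"], ["B", "C", "A"], ["C", "B", "A"]]))
```

A check like `is_latin` is cheap to run on any hand-drawn field plan before planting, which is where layout mistakes are still free to fix.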
Because we talked about randomization before, and about what good randomization is, I thought that we should have at least a couple of words about blocking. Since blocking is used very, very much in any kind of experimental design, I think it's important that you know what a blocking factor is, and that you know how it is different from randomization. Both are techniques to get rid of unwanted variation in your experiment. But a block design is just much better at getting rid of variation, because you know that something has an influence, so you are now going to control for it.
All right, that's it for today. Are there any questions about block designs, or about creating our packages, or more questions about the assignments, or questions in general? Then speak now, or wait until next week Tuesday, when we will have the hour for talking about the assignments, where you can ask questions about the different assignments. I will put the assignments on Moodle as well. I will also put my package, so the "your package name" package, on Moodle, so you can download it from there and look at it. Right, it doesn't seem like there are any questions now. So thank you for being with me.
Is the book still actual? So, is the book still up to date? Yes. It's science. The stuff that Newton did is still relevant, right? Newton's theory of gravity didn't become invalid when Einstein posed a new theory. That's how science works. The things that you find out in science are laws of nature, so they stay accurate, and that includes the way to design experiments and get the most statistical power out of them. Of course, there are new designs that have come up since 1966. But in 1966 we already had a very, very good idea about the different types of designs that you could build for two or three or four factors. And of course, there has been progress as well.
But the progress that has been made doesn't invalidate the old knowledge; that is still valid. So it's not the most up-to-date book, but it's still valid, and it's still a good book to read.
And do we not use a lot of designs from the 19th century? We actually do. If you do field research, then field design hasn't really changed that much since Mendel. Mendel also blocked his peas, and he also randomized. And that's generally the thing: I think nowadays people forget how good the classics were, in a way. In the 18th or the 19th century, research was done, and that research was done very well and was very valid. In many cases it was a lot better than what we are doing nowadays. Nowadays we generally rely on statistics to save a poor experimental design, while at the beginning of the 1900s they would not. Nowadays we do things which are very expensive. We do, for example, whole genome sequencing, but because this is so expensive, we only do 15 animals, or 50 animals. In the 19th century, when we first started doing genetic analysis, we used very cheap techniques, but then we would use a sample size of 25,000. Nowadays no one does an experiment anymore where you put 25,000 plants on a field and then treat them with 10 different treatments.
So in a way, science used to be much more accurate, because we just used larger and larger sample sizes to overcome all of these nuisance effects. Nowadays we have climate-controlled chambers, greenhouses, lighting which automatically turns on and off, and all of these things, so in a way we are controlling the environmental variables better. But because we are doing that, we are also reducing our sample size accordingly, because then the experiment itself becomes cheaper.
But of course, from a statistical point of view, sample size is everything. It's better to have a design out on a field, where there's rain and weather and all kinds of other effects, but with 25,000 plants, than to put 250 plants in a greenhouse. In a greenhouse the environmental effects are smaller, but the sample size is only a tenth or a hundredth of what we used to have. And that is of course a big drawback, because in statistics the power comes from sample size and from proper design. I've seen people nowadays do field experiments without blocking for north-south or east-west effects, and then in the end being surprised that they don't find any effects, or finding it very puzzling to figure out what's happening. That is because they didn't read the classic books and didn't decide to use a Latin square design, which is kind of what you want to do. You want to use a proper experimental design, one which is tested and validated. So that would be my answer.
It's kind of the same as when you look at space shuttle development, or space technology in general. We don't put the newest generation of computer chips in the equipment that we send to space. Why not? Because it's better to use technology from the 1980s, which has been used a lot and where we know all of the ins and outs and quirks and weirdness, than to put a very powerful CPU in there and then have it screw up because of some weird bug that we haven't encountered yet, because it only occurs once every five years. Then it's better to use the old stuff. So I would say that reading the classics never hurts. And it is science: even though new things might come up, they never invalidate the old results. The theory of gravity that Newton had is still valid today. It's just that Einstein's is more broadly applicable, so it works for a satellite as well.
But if you're on Earth and you're doing experiments where you're dropping stuff from certain heights, then Newton's theory is perfectly accurate and perfectly valid to use. So I hope that answers your question. I'm going to stop the recording here as well. So people on YouTube, see you next week, and people on Moodle, thank you also for watching on Moodle.