So, welcome back everyone, I hope you enjoyed the fish break with the nice animated GIFs. For the next step, we need to start organizing more. The nice thing is that in this case we have three clear dimensions: the type of fish (the fish name), the different lakes, and the fact that these lakes were measured in three different years. So we have a really nice 3D matrix structure. We can use 3D and 4D and 5D matrices in R, and I think this data set is a really good example of why you would want to use one. So, a little bit more organization using a 3D matrix. Since we have three very clear dimensions, I'm going to use them and put all of the data into one big variable. But first we have to put missing values in, because having a zero there is not accurate. If you think about how they did the electrofishing, they probably caught a lot, but not seeing a fish in a certain lake doesn't mean it isn't there — it just means you haven't observed it. So a question to you: if we have 19 lakes, 26 types of fish, and three different years, how many NAs do we need to put in? (19 × 26 × 3 = 1,482.) It's one of these things you need to figure out beforehand when you want to build a matrix like this. The way I did it is to make a nice three-dimensional matrix. First I make a single vector with all of the values that I want to put in this matrix, and initially I want the matrix to be empty. So I say: repeat NA, take the length of F names — the fish names that we previously defined — multiply that by the length of the lakes, and multiply that by three, which is the number of years we have. And then I use the array function.
The array function takes the content as its first parameter — those are the NAs that I just defined. Then we have to specify the three dimensions that we want. My first dimension is the names of the fish, the second dimension is the lakes, and the third dimension is time, which is just three. And because this is R, we can use names: row names, column names, and names for the third dimension as well. So I pass dimnames — dimension names — which is a list. The first element of the list is the names of the first dimension, which is of course F names; then we have the names of the lakes; and then I combine 2017, 2018, and 2020 so the years are addressable by name. This creates a three-dimensional matrix which we can now query. Of course, we have to fill it first with the data that we obtained, but I think this is a really good example of why you would want a three-dimensional matrix, and it will simplify our code a lot — we will see that at the end. Three-dimensional matrix, three dimensions: a fish dimension, a lake dimension, and a year dimension. By putting the data in a 3D matrix, we can switch between dimensions very easily by using things like the apply function, which can work along any of the three dimensions. But first, let's fill it up. So what kind of fishies did we get? What I'm going to do is make a for loop: for f in the fish names — so f is the current fish — and then for l in lakes — so l is the current lake we're looking at. And then how am I going to fill it up? I just say: fishy, the three-dimensional matrix, for this fish, for this lake, in 2017 — and now I just have to compute how many there are. I see that I still use the old structure here, because I put this in a function.
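A minimal sketch of this setup. The variable names f_names, lakes, and fishy are assumptions standing in for whatever the lecture code actually uses, and the fish and lake names are toy values:

```r
# Toy stand-ins for the vectors defined earlier in the lecture.
f_names <- c("Barsch", "Hecht", "Rotauge")
lakes   <- c("Lomar", "Hopples")
years   <- c("2017", "2018", "2020")

# One NA per cell: fish x lake x year, so length(f_names) * length(lakes) * 3 values.
fishy <- array(rep(NA, length(f_names) * length(lakes) * length(years)),
               dim      = c(length(f_names), length(lakes), length(years)),
               dimnames = list(f_names, lakes, years))

dim(fishy)                        # 3 2 3
fishy["Hecht", "Lomar", "2017"]   # NA, until we fill it
```

Because every dimension is named, cells can be addressed by fish, lake, and year instead of by numeric index, which is what makes the later code so readable.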
So we had this fix names function, but the slide is not updated — it still uses the plain gsub, because I didn't figure out until yesterday evening that there was also this ie/ei spelling issue in there. But what do we do? We take from f2017 the rows where the lake is the current one, and we grab the fish names — the fixed names that we created. So I take the whole column with fish names out and then I use grepl. grepl means: do a grep, so match the fish name that we have against all of the names in this column — and we do that for 2017, 2018, and 2020. The l in grepl stands for logical. If I do grep, it returns 1, 2, 3, 4, 5 — the indexes of the rows that match. But when you use grepl, it gives you back a TRUE/FALSE vector. And since we want to combine two conditions — I want only the fish which are in the lake that we're currently looking at, because we have a double for loop — I need grepl, because of the ampersand here. This gives me back one TRUE/FALSE vector, that gives me another TRUE/FALSE vector, and where both elements are TRUE, that is a fish which lives in the lake we are currently filling. Then I ask which, and then length, to get the number of fish that were in this lake in 2017 having the name f. I had the gsub here to standardize the eszett (ß) to ss. So: this returns a TRUE/FALSE vector, we do the logical AND with the lake match, which turns the TRUEs and FALSEs into indexes, and from that we just ask the length, because that is the number of fish we caught at this lake with this name. And now we come to the first real example where we can really easily use the three-dimensional structure of our matrix.
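A hedged sketch of the fill step. The data frame f2017, its column names, and the fix_names body (ß → ss, as described) are all invented for illustration — the real survey files will look different:

```r
# Invented example rows; in the lecture this comes from the survey file.
f2017 <- data.frame(fish = c("Hecht", "Hecht", "Rotauge"),
                    lake = c("Lomar", "Lomar", "Hopples"))
f_names <- c("Hecht", "Rotauge")
lakes   <- c("Lomar", "Hopples")
fishy <- array(NA, dim = c(2, 2, 3),
               dimnames = list(f_names, lakes, c("2017", "2018", "2020")))

fix_names <- function(x) gsub("ß", "ss", x)  # spelling standardization, as on the slide

for (f in f_names) {
  for (l in lakes) {
    # grepl returns one TRUE/FALSE per row (grep would return row indexes),
    # so the fish match and the lake match can be combined with &:
    hit <- grepl(f, fix_names(f2017$fish)) & grepl(l, f2017$lake)
    fishy[f, l, "2017"] <- length(which(hit))
  }
}

fishy["Hecht", "Lomar", "2017"]   # 2 matching rows
```

The same loop body would be repeated (or parameterized) for the 2018 and 2020 data frames.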
So the first dimension is fish type. Let's see what we dragged up: I'm going to calculate total fish, which will be the total amount of each fish species that we caught. I apply over our 3D matrix, where I fix the third dimension to 2017. So this goes from a three-dimensional matrix — I write comma, comma, so first dimension anything, second dimension anything, third dimension 2017 — down to a two-dimensional matrix. We go from a three-dimensional box to a two-dimensional matrix. And then over the first dimension of that matrix, compute the sum. Do this again for 2018 and for 2020. Then I row-bind these together, because for each year I get a vector; I bind the three vectors together and put the row names on, saying this was 2017, this was 2018, and this was 2020. So when I type total fish in R, this is what I get. And now we start seeing something interesting, but also something which worried me a lot, because some fish have very low observation numbers. If we look at the first one, the Hecht: in 2017, across all of the different lakes, we fished up only 56 of them, which is not a lot but enough for statistics. But in 2018, we only observed 18 of them, and 18 is not really enough to start doing statistics. If you want reliable statistics, you want roughly 30 observations of a fish. The Ukelei, for example, is even worse: in 2017 there were only two of them caught across 24 different lakes, so statistically speaking we can't say anything about this fish. Funnily enough, in 2020 they actually caught 324 of them.
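The per-year totals described above can be sketched like this; the counts are toy values, and only the shape of the apply/rbind calls matters:

```r
# 2 fish x 2 lakes x 3 years, filled with arbitrary counts.
fishy <- array(c(10, 2, 40, 6,    # 2017
                 12, 1, 38, 5,    # 2018
                 30, 9, 50, 8),   # 2020
               dim = c(2, 2, 3),
               dimnames = list(c("Barsch", "Hecht"),
                               c("Lomar", "Hopples"),
                               c("2017", "2018", "2020")))

# Fix the year (third dimension), then sum along the fish dimension (margin 1):
total_fish <- rbind(apply(fishy[, , "2017"], 1, sum),
                    apply(fishy[, , "2018"], 1, sum),
                    apply(fishy[, , "2020"], 1, sum))
rownames(total_fish) <- c("2017", "2018", "2020")
total_fish   # years in rows, species in columns
```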
So in 2020 we have more than enough statistical power to say something about the Ukelei, but we can't compare the 2020 Ukelei catches to the 2017 and 2018 ones. Species dominance and catchability play a role — larger fish are harder to catch with electrofishing gear — but you can't tell me that you caught 324 while in the preceding years you caught almost none. We'll get back to that. The Kaulbarsch, for example, has numbers you can't do any statistics on. The Steinbeißer has 37 in 2018, which is enough, but you have to remember that these are summed across all of the different lakes, which means a lot of lakes will have had no Steinbeißer caught at all. Some species are really good, though: the Rotauge and the Barsch have really good numbers, so those are fish species we can do statistics on. The other ones are really hard. In 2020 we can do statistics on the Ukelei, but not in 2017 and 2018. The nice thing is, we can do the exact same thing for lakes, and the code only changes in one position: I change the one into a two. Instead of saying give me the sum over the first dimension, I say give me the sum over the second dimension. That's it — the rest of the code stays the same; I'm just changing the dimension. And now it gives me the lake overview. In 2017, in the Cothamster Colk, 40 fish were caught; in 2018, 46 fish; but in 2020, 360 fish were lifted out of this lake. And here we see the issue coming in. Statistically speaking, this is really, really hard to deal with: because the numbers of fish caught per lake are so different, we probably can't compare some lakes from one year to the other. We can't compare the fish caught in the Cothamster Colk in 2018 with the fish caught there in 2020.
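The per-lake version is literally the same call with the margin changed from 1 to 2 — a sketch, reusing toy counts:

```r
fishy <- array(c(10, 2, 40, 6,
                 12, 1, 38, 5,
                 30, 9, 50, 8),
               dim = c(2, 2, 3),
               dimnames = list(c("Barsch", "Hecht"),
                               c("Lomar", "Hopples"),
                               c("2017", "2018", "2020")))

# Margin 2 = the lake dimension; everything else stays identical.
total_lake <- rbind(apply(fishy[, , "2017"], 2, sum),
                    apply(fishy[, , "2018"], 2, sum),
                    apply(fishy[, , "2020"], 2, sum))
rownames(total_lake) <- c("2017", "2018", "2020")
total_lake   # years in rows, lakes in columns
```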
Because the total amount caught is so different, the percentages are also going to fluctuate heavily. Being one fish out of 46 in a lake means you are at roughly 2%, but being one fish out of 360 means less than a third of a percent. Fortunately, there are some lakes where we have relatively stable and relatively high numbers, so we can look at those. But this is difficult to deal with statistically — there are things you can say about it, but it's just difficult. Fortunately we have at least three such lakes here. The question, though, is: do we want to deal with fish numbers, or with fish percentages? Like, "5% of the fish in this lake is a Barsch" versus "47 of the fish caught were a Rotauge". That's something very fundamental in the analysis: whether we deal with numbers or with percentages changes the whole downstream analysis, so that's something we want to investigate. But the first thing we have to do is define what we consider enough observations. We want to select fish that are fishy enough — fish we have enough observations of. The idea is to recreate our three-dimensional matrix based on these filtering parameters, and to make space for a group called other. Because if we don't have enough observations of a fish, we don't throw it out — it's still a fish that we caught, so it still contributes to the total number. So we define this other group, and every fish species that is not fishy enough, where we have too few observations, we merge into a group called other, or by-catch, or whatever you want to call it.
The way I did this is to first figure out which fishies are fishy enough and which lakes are lakey enough, so to speak: we need to know which fish are consistently measured at high numbers and which lakes consistently yield a large number of fish, so that we can do statistics. So we load the data again and we compute how many fish there are in the other group. We can do that easily, because I'm going to shrink my 3D matrix — create a new, smaller 3D matrix — and we already calculated the total number of fish per lake: we have this variable called total lake with those totals for each of the years. So I load the data in again, like before, with the double for loop, leave the other group empty, and then fill the other group by computing: we know how many fish were caught in this lake in this year, and we subtract from that the sum, in the newly reloaded matrix, of all of the fish in the groups that we deemed to be okay. So first we want to filter and create a new 3D matrix. We want thresholds saying: above this number I'm happy with the count for a fish, and above this number I'm happy with the count from a lake. I chose 30. Why 30? Thirty is the number you need to do correlations with: a reliable correlation estimate requires about 30 observations of x and 30 observations of y. So that's my general rule of thumb when you're not sure which kind of statistics you can use. If you are very certain about what kind of distributions you have, you can go lower. But I just saw this data set for the first time.
So for me it seemed logical to say: a fish species qualifies if we have at least 30 observations of it for every year — for every year, I want to have seen the fish 30 times in this matrix. So here we would say that the Barsch is good enough, because every year we saw at least 30 of them, but the Hecht is not, because in 2018 we only saw 18 of them, which is below 30. You do have to keep in the back of your mind that there are many different lakes, so even by demanding 30 observations, we could still run into a situation where one fish is observed in a single lake and the other 29 in another lake. But for me, 30 was the rule of thumb that I applied. So I create a variable, fishy enough: I take total fish, ask which values are above 30, and then apply all through the columns — for each column, are all of the values above 30? Then I take the names, because the names are the names of the fish that I want. And then I add a new fish type called other, so we have fishy enough combined with other. Then I do the exact same thing for the lakes: I take total lake and demand more than 30 fish observed in a lake consistently across all three years — all three years need at least 30 observations — and take the names of those lakes. Then I create my new 3D matrix: again the content is NAs, so take the array, fill with NAs, and now the dimensions have changed, because on the one dimension we have fishy enough plus the other group, and we have lakey enough.
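The threshold step might look like this; the totals below echo a few numbers mentioned in the lecture (Hecht 56/18, Ukelei 2/324), while the rest are made up, and the variable names are assumptions:

```r
total_fish <- rbind("2017" = c(Barsch = 120, Hecht = 56, Ukelei = 2),
                    "2018" = c(Barsch = 95,  Hecht = 18, Ukelei = 0),
                    "2020" = c(Barsch = 140, Hecht = 40, Ukelei = 324))

# A species qualifies only when ALL of its yearly totals exceed 30:
fishy_enough <- names(which(apply(total_fish > 30, 2, all)))
fishy_enough                              # only "Barsch" passes every year
fishy_enough <- c(fishy_enough, "Other")  # keep room for the catch-all group
```

The lake filter is the same pattern applied to the total-lake matrix instead.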
So: the lakes which have enough observations, and again we still have the three years. And of course I add the names to it as well. Then I refill the 3D matrix with the exact same double for loop that I had: for f in fishy enough, for l in lakey enough, and then I load it in again the same way. Then I have to compute how many fish went into the other group, and fortunately we already computed total lake. So for each lake in lakey enough, I take the total amount of fish caught in this lake in 2017, minus the sum from the matrix that we just reloaded — and that is the number of fish in the other category for this lake in 2017. The same for 2018 and 2020. All right, is this clear? Let's make sure that my R code is at the same point, so I can show you the notepad window as well. Yes, it's the same code: we load in the data files, get the latitude and longitude, make the table, do the loop that gives us the nice plot which we can overlay on Google Maps, make the 3D matrix, compute total fish and total lake, and then define what is fishy enough and what is lakey enough. The first conclusion is that we have enough measurements for around eight fish species and around 11 lakes. So we can use 11 lakes and eight fish species which are consistently measured at high enough amounts to start doing statistics on them. So let's remodel our data: the exact same code as before, just a copy-paste, and then I fill in the other group. Let's select all of this up to the top, switch to R, and it'll take a little bit of time — it's reloading the data, calculating, making the nice plot, filling in the fishies.
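The "other" row is then just a subtraction per lake and year — a minimal sketch with invented counts and assumed names (fishy2, total_lake):

```r
# One kept species plus the catch-all row, for a single lake and year.
fishy2 <- array(NA, dim = c(2, 1, 1),
                dimnames = list(c("Barsch", "Other"), "Lomar", "2017"))
fishy2["Barsch", "Lomar", "2017"] <- 25
total_lake <- rbind("2017" = c(Lomar = 40))  # total caught in that lake that year
kept <- "Barsch"

for (l in "Lomar") {
  # Everything not accounted for by the kept species goes into "Other":
  fishy2["Other", l, "2017"] <-
    total_lake["2017", l] - sum(fishy2[kept, l, "2017"])
}

fishy2["Other", "Lomar", "2017"]   # 40 - 25 = 15
```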
Then we have here all of the observations, we do the same thing again, and then we compute the other group. Now when we look at fishy, it looks like this. Here we look at the 2017 slice. We have our species — Barsch, Rotauge, Aal, Schleie, Brasse, Rotfeder, Neunstachliger Stichling, Giebel — and then the other group; and here we have the lakes. And of course we have the exact same thing for 2018. So we reduced our data set. We can still see that there are issues: in 2018, we observed only one Rotauge in the Cothamster Colk, and 41 fish which were not of any of the types that we kept. That's a little bit difficult to deal with, but that's one of the issues with real data — real data always has these kinds of challenges. For example, we can see that in the Visor de Meir the Barsch is the most dominant species, with 109 caught in 2018. Funnily enough, if we look at the Visor de Meir in 2017, we see that although Barsch was the dominant species, we also caught Rotauge and Brasse and Rotfeder — which is a little bit strange, because we didn't catch any of those species a year later. So the consistency from one year to the next is very strange; it doesn't seem very unbiased, but we'll have to look at that more. That's the way I treat my data: I reduce each of the dimensions to the entries which have enough observations, so that I can do something with it. So, conclusion: using 30 observations per year for a fish, and 30 observations per year for a lake, we have enough data on eight fish species and 11 lakes. And then it looks like this for 2017. All right — next we also want to compute the percentage of fish caught.
For each of the species that we have, we want to make some plots — pie charts — so we can visually see how many fish we caught and what percentage a certain fish made up versus the other ones. So we calculate the percentage of total fish for each species: how many Barsch did we catch out of all of the fish that we caught? The percentage is of course 100 times the sum over a certain fish dimension, divided by the total number of fish caught, which is the sum of those sums over the fish dimension. So here we first sum over the first dimension of the 2017 slice — we take out 2017, sum across the rows — and then we do the same thing again and sum everything together, which is just the total number of fish. I call this P fish — P for pie chart — so when I want to do a pie chart visualization, I can use P fish for the plotting. And I do the same thing again for the lakes, and the only thing which changes, again, is the dimension in the apply function: instead of the first dimension, the fish dimension, I look at the second dimension, the lake dimension. The mathematics doesn't change, the code doesn't change; the only thing which changes is that instead of a one, we now have a two, because we want to look at the second dimension. So let's see the results from the first dimension. I make a pie chart, use the nice colors that I already have, plot the fish, and note that this was 2017. In 2017 we can see that the majority of the catch was Barsch and Rotfeder; the Giebel was one of the lowest, together with the Aal and the Schleie and the Brasse.
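The percentage-per-species computation and the pie call can be sketched like this; p_fish is an assumed name, the counts are toys, and the color choice is just an example:

```r
fishy <- array(c(10, 2, 40, 6), dim = c(2, 2, 1),
               dimnames = list(c("Barsch", "Hecht"),
                               c("Lomar", "Hopples"), "2017"))

species_sums <- apply(fishy[, , "2017"], 1, sum)
p_fish <- 100 * species_sums / sum(species_sums)  # percent of the total catch
p_fish

# Visualize it; rainbow() here is only an example palette.
pie(p_fish, main = "2017", col = rainbow(length(p_fish)))
```

The per-lake pie chart uses the same code with margin 2 instead of margin 1.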
But we see that there are three or four fish species which are caught in relatively high amounts. So in 2017 we caught a lot of Barsch, a lot of Rotfeder, a lot of Rotauge, and some Neunstachlige Stichlinge; then we have some minor fish species, and then we have the other group, which contains all of the other fish that we caught. Of course I can do the exact same thing using P lake, and here we see the contribution per lake — I'm using 2020 here. In 2020, the lake which gave us the most fish was the Keysteig Brellinge; together with the Cothamster Colk and the Steadorfer Baggersee, those are the three lakes from which a large amount of fish was taken. And we see that the Meitzlerzee is relatively minor — not a lot of fish were caught there. But that's just the way it is. Of course we can do the — oh, that's way too short. See, that's what you get when you give a presentation for the first time: you might run through your slides way too quickly. I was actually expecting to spend 40 to 50 minutes on this part, but I only used 27. So let's do some questions, remarks, or discussion, because you probably have your own ideas about how you would approach a problem like this. No, no — we still have part number three; this is part number two, and I went through it way too quickly. We're not going to do part three yet. We could, because then I'm home earlier for my birthday and I can eat cake again. But I do think it's good that you ask some questions. So: any questions? Do you think using a 3D matrix is good, or would you say, no, I would just use the f2017 variable, or the f2020 variable? Because we started off having three different variables, which we also used to fill it.
But I try to always gather my data into one big data structure which has most of the interesting properties that I want. The nice thing is that it actually helps you a lot with coding: as we saw, just changing the dimension — the one into the two — gives you the same analysis along another axis, done properly. There are of course a lot of different ways that you could do this; it's not the only way to analyze it. But as a statistician I always hammer on sample size. If you catch one fish from a lake, that fish is not going to be representative of all the fish in the lake. If you catch five of them, is that representative? — When we lack enough sampling data for a lake or a fish species, would it make sense to group them? — Group what, lakes? So, take two different lakes and group them together. Let's look a little bit at this table: this is the fishy data for 2017. If you look at a lake like Lomar, we caught 82 Neunstachlige Stichlinge there, but in a lake like the Cothamster Colk we caught zero. Here we caught about 30 fish in the whole lake; there we caught around 90 fish in the whole lake. So — if we assume that we did a good job and caught a random sample from each lake, which is the underlying assumption of all statistics, that you sample randomly — we would say that we cannot merge Lomar with the Cothamster Colk, because in Lomar almost 90% of the fish living there are Stichlinge, while in the Cothamster Colk 0% are. We can't just squeeze these two things into each other, because they are different things: they are different lakes, and you can only merge lakes when they are similar.
And that's just the way it works. If I have a lake and I throw in 500 turtles, then this lake will be different from a lake which has no turtles, and I can't group a lake with turtles with a lake without turtles. Statistically, it's very shaky to say I'm just going to group them all together. We will start grouping, of course, because part number three is about seeing how we can do that. All right, let's assume that Hopples and Lomar are managed by a fishing club XYZ and some other lake by a club ABC. So let's see, Hopples and Lomar — they might be similar. Do you see that in 2017? Let's look at the whole thing; we can do that relatively easily. Let me switch you to the R window and look at fishy. You want to look at the lake called Hopples, so let's take Hopples and take Lomar. In 2017 we had 24 Rotaugen in Hopples, zero in Lomar. The Neunstachlige Stichlinge are apparently the dominant fish species in 2017. But a year later they are not the dominant species anymore: the dominant species in Lomar is now the Rotfeder, and in Hopples the Brasse. A year after that, the Stichlinge are dominant in Hopples again, while in Lomar the Rotfeder is still dominant. So can we group these lakes together? Statistically speaking, I don't know. Let's take one year — say 2020 — and do a correlation between these two lakes. Just correlate them together. Then you can see that there is no correlation in 2020 between these lakes. And if there's no correlation between the fish that you got in the first lake and in the second lake, then of course we can't group them. These are two completely different lakes.
Also, we have to deal with the fact that there is a lake effect. One lake might have a very high CO2 content; another might have a very high nitrite content; and this might influence the type of fish that live there. The thing to remember is that you can only group things together when they are similar, and similar generally is defined as not being different — and not being different is something we can statistically test. We can do a statistical test asking: is this lake different from the other lake? If we do the test here — cor.test — I have to separate the columns out, because it wants two vectors: give me Hopples, and give me Lomar. And it tells us that these lakes are very different, statistically speaking: we get a non-significant p-value, meaning there is no evidence of correlation between them, so we have no support for treating them as equal. Hopples and Lomar are not interchangeable, so grouping them will not work. And if you would group all of them together anyway, write a paper, and submit it to a journal, and one of the reviewers happens to be a statistician, they would just reject your paper, saying you are putting things together which do not belong together. You're putting apples and pears together and analyzing the taste, and then comparing that to putting apples and bananas together. Every lake is a single experiment, more or less; having multiple lakes means you do the same experiment multiple times, and you can group them when you have some indication that they are equal. So one of the things that is striking here — let me take just one lake. Are we recording? Yeah, we're recording, no worries.
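The comparison between two lakes can be sketched like this; the per-species counts below are invented, not the real survey numbers:

```r
# Invented per-species counts for one year in the two lakes:
hopples <- c(24, 82, 0, 3, 1, 0, 5, 2, 7)
lomar   <- c(0,  1, 60, 2, 0, 4, 0, 9, 3)

# cor.test wants two separate vectors; a non-significant p-value means
# no evidence of association, hence no support for grouping the lakes.
ct <- cor.test(hopples, lomar)
ct$estimate
ct$p.value
```

Note the direction of the inference: a small p-value would indicate association, while a large one only says we failed to find any — it never proves the lakes are equal.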
And Twitch saves it as well. So if we just look at the Lomar lake — not only 2020, but all three years — we see that between 2017, 2018, and 2020 there is a big shift in the lake. We go from the Neunstachlige Stichlinge being the dominant species, almost 90% of the 2017 catch, to the picture flipping completely around in 2018: now the Rotfeder is dominant, with more than 90% of the catch. And in 2020 we see the same thing. So if we do a correlation on this: in theory, since we have very few values — three vectors of length nine — we can't really use Pearson correlation, but just for the sake of argument, let's use it. If we use Spearman correlation, which is actually the one we should use because we have a limited sample size and don't want to be sensitive to outliers, then we see that 2017 is highly correlated with 2018. But why? It's highly correlated because most of the fish species do not occur in any of the years. So the years seem relatively well correlated while they are actually not. The real correlation estimate for 2017 versus 2018 ranges from as low as 6% to as high as 96%. This is a massive issue of zero inflation: in this case the Pearson correlation gives you the lower-bound estimate and the Spearman correlation gives you the upper-bound estimate. But if we just look at the data from a distance, we would say: Lomar in 2018 is probably groupable with Lomar in 2020 — most of the dominant species didn't change; we just caught a little more Stichling in 2020 than in 2018.
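Computing both estimates side by side shows the effect; the vectors below are invented, with many shared zeros like the real species counts:

```r
y2017 <- c(82, 0, 0, 0, 1, 0, 3, 0, 0)
y2018 <- c(2, 75, 0, 0, 0, 1, 0, 0, 0)

# The shared zeros produce many tied ranks, which can push the rank-based
# (Spearman) estimate up while the product-moment (Pearson) estimate stays low:
cor(y2017, y2018, method = "pearson")
cor(y2017, y2018, method = "spearman")
```

With zero-inflated counts like these, neither number should be over-interpreted; the two methods are better read as a rough lower and upper bound on the real association.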
The data there is relatively consistent. But 2017 doesn't belong with it: if we compare the fish population in 2018, we cannot really use the fish population in 2017 as a comparison, because it is very different. So we have Lomar Lake 2017, and then Lomar Lake in the other two years — the other two years you can most likely group together, but 2017 probably not. We can formalize when we are allowed to put these things together, and we'll do that in the next hour, where we analyze this a little bit better. But I didn't draw any conclusions from this yet. You know what — we still have some time left, so we're going to do three more slides, and then we'll take the break. Let's switch back to the PowerPoint. The number of fish caught per year, in percent: let's do a pie chart again. I take 2017, go across the rows, take the sum, and then divide by the sum of the sums — that is just calculating the percentage for the different years — and make a nice pie chart out of it. You can see that from one year to the next there is a big, big difference across all of the lakes in fish population: you go from having a massive amount of Rotfeder, to even more Rotfeder, to only 25% Rotfeder. It is the dominant species in 2017 and in 2018, but in 2020 it suddenly is not anymore — that is then actually the Barsch. If we do the same thing for the lakes, we end up with something like this. And now we start seeing what the issue is.
The Steadorfer Baggersee contributed less than 10% of the total fish count in 2017, while almost 40% of the fish caught in 2018 came from it. So the impact that this lake has on the total amount of fish changes year by year. And of course that means we can't really do any of this grouping, because we would mostly be measuring how intensively each lake was fished, not what actually lives in it. Think of it this way: if you have two lakes, and in year one you fish the first lake, in year two the second lake, and in year three both of them, then the years are simply not comparable. You can't compare year one to year two because you fished a different lake, and you can't compare either of them to year three because in year three you fished two lakes while in the other years you only fished one. The conclusions I wrote down are these. There are significant differences in fish composition per year. Are these due to fishing the lakes differently? Yes: in different years, the same lake was fished differently. As such, the contribution of a lake to the total amount of fish varies a lot between 2017, 2018, and 2020. This means we are probably not allowed to draw any conclusions across years and across lakes, unless we limit ourselves to a single lake. But even there we run into issues, which we saw in the Lomar situation, where 2017 is just completely different from 2018 and 2020. So it is really difficult, because this massively hurts our power to detect any global changes in fish stocks across the years. Are there solutions? Yes: we can limit ourselves, for example to one lake and one fish species at a time.
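The per-lake contribution that exposes this problem can be computed the same way. Again a sketch on an invented toy array, not the real survey data:

```r
# Toy [fish, lake, year] counts, made up for illustration only.
fishy <- array(c(10, 0, 5,   1, 2, 0,     # 2017: Lomar, then Steadorfer
                 2, 30, 1,   20, 25, 4,   # 2018
                 1, 8, 3,    2, 6, 5),    # 2020
               dim = c(3, 2, 3),
               dimnames = list(c("Rotfeder", "Barsch", "Hecht"),
                               c("Lomar", "Steadorfer"),
                               c("2017", "2018", "2020")))

# Total catch per lake per year (lakes x years), then each lake's share
# of that year's total. A lake whose share jumps wildly between years was
# most likely fished with varying effort.
lake_by_year <- apply(fishy, c(2, 3), sum, na.rm = TRUE)
lake_share   <- sweep(lake_by_year, 2, colSums(lake_by_year), "/") * 100
round(lake_share, 1)
```

This is another payoff of the 3D array: switching from the "per year" view to the "per lake per year" view is just a different margin in `apply`.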
But then we run into different issues later on. That is always the case in experiments like this. Statistically speaking, it is better to limit the number of lakes that you fish and sample those lakes more intensively, than to fish more lakes and get a lower number of fish from each. And if you do multiple years and want to compare your data from year one to year two, you should make sure that you do the same thing every year. If I am working in a lab, for example, we have the same issue: if I use a certain chemical in 2017 and redo my experiment in 2018 with a completely different chemical, then of course I cannot compare the results from 2018 to the ones from 2017. So there are significant drawbacks in these kinds of experiments, where you look at different lakes and fish each lake differently across the years. Statistically speaking this causes a lot of issues, because every new thing that you do, every treatment you add, every lake you add, adds another dimension to your data, which means your total number of measurements gets subdivided into smaller and smaller groups. In the end you end up with a single fish in a single condition, in a single lake, in a single year. And a single measurement is no measurement: you need at least 30 or 40 measurements before you can start calculating things like means, ratios, and standard deviations, and for statistical testing that is the important part. All right, so first these three slides, then we will take a short break, unless there are more questions or remarks.
If there are not, then I will see you guys in seven or eight, probably ten minutes, so just after four we'll continue with the last part of my initial look into this Baggersee project data. So see you in ten minutes. Let me stop the recording.