All right, welcome everyone, including the people watching this on Moodle and perhaps on YouTube. I don't know yet whether it's going to be on YouTube; we'll have to see how it goes. This is the first time I'm talking about this project, and it's not my project. The data I will be talking about today was given to me by Fritz, who worked for Professor Arlinghaus. It's a multi-year project, and as I always say: if you have a nice dataset, just send it to me and I can make a lecture about it. That's what we did, and I really, really like this dataset. There's a lot of interesting stuff in there, especially for teaching.
So, the Baggersee project. I called it "fishy data", which has a bit of a negative connotation. I don't mean it negatively; I just like fishies. I also used my new pen, so I could make all kinds of drawings. As I said, I spent a lot of time on it, so we only have around 60 slides, 67 or something like that, but I hope you guys like it. In total, let me check, scroll, scroll, scroll: I produced 245 lines of code. So Fritz, it's coming your way. I hope you like it and that there's something you can do with it. There are also some points of concern on my side, but we'll just go through them.
So today: fishy data, the Baggersee project. Many, many thanks for allowing me to work on the dataset, and I hope you guys enjoy the lecture. It won't be as smooth as the other lectures, since this is the first time I'm giving it, and the others I've already done a couple of years in a row. This is going to be the new style. For the layout you have to think away the fishy data and the other stuff, but I'm thinking about keeping a style like this for next year or next semester, because I think we will still be in lockdown, so in-person lectures will probably still be hard.
So that's my estimate for the coming semesters, and I'm thinking about just continuing to use this style. "Big time bueno." Thank you, thank you. Yeah, and the drawing is really nice as well; I'm really glad with the new toys that I bought. I hope it works well and that you guys like it too.
All right. I wanted to do kind of a story thing, like "2000 years ago" and ancient things, so let's just start. One day I received one email. When I got it, it looked a little bit like a phishing email, because it said: "Denny, please find attached", and then the word fish and the word abundance. I was like, okay. But I saw that there were 17 attachments, which is not common for a fishy phishing email. Although 17 attachments is a lot, they could be divided up. There was one PowerPoint presentation, which I quickly scrolled through; it had a lot of nice photos of people measuring fish, and some graphs and stuff, so I thought, all right, that's okay. There were six R scripts attached to the email. I have to confess, I did not really look at the R scripts at all, because I'm going to do my own, right? That's how we roll: you give me data and I analyze the data. I have some comments on the coding style in the different R scripts, but we won't go into detail about that. But Fritz, we do have to talk about the whole thing afterwards, I think.
Attached to this one nice fishy email were also nine different CSV files, so I was happy about that. The first thing I did was click "download all", extract everything and put it in a folder. That's the way I generally do it: if I receive an email with data, I just put the data files somewhere. Then I start looking at the structure, because first I need to understand what I got.
And since I'm more or less a statistician, that's how I've been trained. I don't want to know too much about the data, because the more you know, the more researcher bias you bring into the thing. Only when I have very fundamental questions about a dataset will I start asking the people who gave me the data: what does this mean, or what do you mean by it? Generally, I just look at the data and let the data speak to me.
In this case, when I looked at the data file names, the structure was a little bit weird. Three files were called "DATA" in capital letters, followed by something: Rotauge, zero, and Rotfeder. I'm not a native German speaker, but I googled it and it turned out that those were fish names, or at least two of them were. I don't know exactly what "zero" meant, but I just loaded in the files. Besides that, there were five data files with the structure: something, then 20xx, which is a year, and then something behind it as well. So there were files called Jungfisch 2017 fish, Fang Jungfisch 2018, Jungfisch 2020 fish. So there was some structure in the data.
The first thing I always do when I get data is open a new file and add a header: fishyanalysis.R, data by, code by, and "catch them all", because that's the idea. We want to catch all of the fish in the files. One thing I found really funny is that there's an Umweltdaten1.csv file, which makes me think there's also an Umweltdaten2.csv, or an Umweltdaten.csv. I'm just curious. But I looked at all of them, as I normally do. So: open a new file, add a header, and then the next thing is loading in the data files.
Loading in the data: we haven't done this hundreds of times, but we have done it a couple of times during the different lectures. So the first step after step zero is to load in the data. We first do a setwd to where we stored all of our files. Then we just do a read.csv on filename.csv and store the result in a variable, and from this variable we look at the first five rows. I could have used the head function, but since it's a table, why not just use matrix-style indexing? You can also use head.
But when I tried this, it didn't work. If only things were so easy. The data is actually not a comma-separated file. If you make a file and name it docx, I expect it to be a Word document. If it's a pptx file, I expect a PowerPoint presentation. And when people send me CSV files, I expect those to be CSV files. But they were not. I was hoping for: read.csv, give the file name, look at what's inside. But no, I had to open them in a text editor and go through them manually.
Most of the files use a semicolon as the separator. Some files used the European notation for numbers and not the computer-readable one, or at least not the way R expects numbers to be written, so numbers were written as 1,2 instead of 1.2. And funnily enough, I had to go through all of the files to figure out what people considered missing values. There are cells with nothing in them, there are columns where NA is used, sometimes an X is used, and sometimes three question marks. I might have missed some of the NA values, though. So that was my first thought: the naming in these files is really creative.
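As a minimal sketch of how these three problems (semicolon separator, decimal comma, several spellings of "missing") can be handled in one call: the file content, the column names and the list in `na_values` below are made up for illustration and are not the real project files.

```r
# Hypothetical stand-in for one of the ".csv" files: semicolon-separated,
# decimal commas, and several different ways of writing a missing value.
raw <- "ID;Fischart;Laenge
1;Hecht;14,8
2;Rotauge;???
3;Barsch;x
4;Aal;NA
5;;1,5"

# Every string that should be read as missing (assumed list, extend as needed).
na_values <- c("", "NA", "x", "X", "???")

# sep and dec describe the file format; na.strings maps all variants to NA.
fish <- read.csv(text = raw, sep = ";", dec = ",",
                 na.strings = na_values, stringsAsFactors = FALSE)

fish[1:5, ]   # first five rows, matrix-style indexing
str(fish)     # check that Laenge was read as a number, not as text
```

With a real file, `text = raw` would be replaced by the file name. `read.csv2` uses `sep = ";"` and `dec = ","` as its defaults, so it is an alternative for files that use both conventions.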
And that makes it really hard for a computer to understand. After doing a little bit of analysis, I found that they were using 41 different ways to write down 26 different fish species. This is something I really want to stress: if something is a single thing, like a single fish or a single fish species, then write it the same way all of the time, because a computer can't easily figure this out. Otherwise you're going to have to write a lot of code, or very smart code, to get around all of these different namings. I even found a fish species which was called "fish". I don't know what kind of fish species "fish" is, but it's not a Rotauge or a Rotfeder. No idea. It might be that they fished up something they could not identify and just said: well, this is a fish.
There were some mysterious columns, and one of them I actually mailed them about, because I thought it might be very interesting, or rather, that it might be very impactful on the data analysis. It was this mysterious column called end.u.m., and it contains a very strange mix of values. I found values like 2.5, 1,5 and 0.55, but there were also a lot of entries like 2nd of June, 1st of May, 18th of February, and these kinds of things. I didn't really know what to do with this column, so I mailed them, and based on the response I decided that this is not the most important column and I have nine different files to work with. So let's not get into it.
But one of the things I want to stress: if you want to have your computer analyze data for you, make sure that you structure it and that you keep to this structure. In this case, the thing I was missing the most was a simple Word document which explains what is in which file and what things mean.
Have a kind of metadata file which says: we have these columns, and they contain these values. This will come back again and again, because it trips you up at every step of the analysis. It forces you to write a lot of code which you should not have had to write if someone had just sat down and harmonized the data across all of the files. But again, this is normal data. I have never received a dataset which did not use this kind of creative naming, wrong separators or different number notation. So it's very general: all of the data that you get has its own limitations. But yeah, creative naming: 41 ways of writing 26 different fish species.
So first, I come up with some basic questions. How many fish species are there? Because the fish live in different lakes: are there differences or similarities between the lakes? And of course, we want to do some modeling on the fish population. One of the biggest things in the dataset, which I figured out later, is that there is a treatment column. So they also did some intervention studies, which is really, really interesting.
But my first question, whenever I get a dataset with animals, is where these animals live. Can I go there? Can I touch them? If I get data from mice or cows or something, I want to know where they live and whether I can go and see them. So this is the first question that we will be answering during the first part of the lecture.
As I told you guys, there are difficulties with the NA values. So the first thing I did is the setwd, to move to where I downloaded the files. Then I create a special variable which contains the missing-value strings that I encountered.
Every time I encounter a new value which I think means missing data, or not-available data, I add it to the list. Then I started loading. The first file is the environmental data, called Umweltdaten1. This has a semicolon as the separator, for the NA strings I just pass the NAs that I have, and this file turned out to use the comma as decimal separator, so the wrong one for R. Fortunately, after setting these parameters, I could load the data file. Then there were the young fish files for 2017, 2018 and 2020. I just read them in, and these had the proper separator. I loaded them into three variables called f2017, f2018 and f2020, so they are easy to use. Since it's a presentation, I always try to keep the code short so that it fits on a slide.
And of course, I always look at the first five rows. So let's quickly hop into R and look at the first five rows of all of them together. Let me make it a little bit bigger. So this is the 2017 data. We have an ID, which I don't really know what to do with; I think this is just a row number. Then we have a Gewässername, which my perfect German directly translated to lake. Then we have the sampling point; I think these are the different positions in the lake where they fished. Then there is Fischart, which is of course the species of the fish. Then total length, the length of the fish. But looking at the numbers, I was immediately confused about the unit of measurement. Is this centimeters or millimeters? Because a fish which is 148 centimeters, that's a big fish. And 162 is an even bigger fish. That's huge.
So I was really wondering. And again, yeah, very big fish. Since they sometimes have these strange column names anyway, why not just add the unit of measurement to the column name? General Gulak says: if you can catch a Hecht which is 148 centimeters, you're going to be famous. I wouldn't know. That's the thing, right? I'm just a poor statistician looking at this data, and for me it made perfect sense that it was 148 centimeters. I watch Discovery Channel, where they do this monster fish fishing, so 148 made total sense to me, and I just assumed it was centimeters. Then we have a Bemerkung column, with remarks like "formal proba". I don't know what that means, so I just ignored it, but it's still in there.
That was 2017. If we look at the data from 2018, we already see that they start being creative in their naming. Now, all of a sudden, we have the name of the fish, then a space, and then between brackets we see that this is a Cobitis taenia or something like that. Is this important for the data analysis? I would think that someone who works with fish knows the Latin name of the fish they are working with. But this directly creates a problem: I cannot compare the fish caught in 2017 with the fish caught in 2018. The fish were still 143 centimeters long, so apparently it is possible to catch a fish of 143 centimeters. But there's a discrepancy here: the 2017 data is coded differently from the 2018 data. The total length seems comparable, because the values here are more or less similar to the values in the other file. And when we go one lower, we see that in 2020 they used the 2018 way of writing down the fish names. But now, all of a sudden, the second commentary column is missing.
So this is the first data that I got, and that's really nice. Let's look at the Umweltdaten as well, because the Umweltdaten contain the stuff that we are probably interested in: we are looking for covariates for our analysis. You can see it scrolling across the screen, so there's a lot of data in here. To give you an idea, let's first look at the column names. They are things like Tabelle Gewässer, Gewässer ID, Gewässername, Gewässerkürzel, Verein, Breitengrad, Längengrad. All right. Now I was interested, because now I can start looking at where the fish are.
An umlaut is no problem for R, by the way. R doesn't care. You can use Chinese characters as well. You can even use these things in variable names. As a little funny thing, since I only have 60 slides, let me find the skull and bones, the UTF thingy. All right, let's see if we can do this. In R, we can use this skull and bones symbol as a plotting symbol. If I say plot 1 to 10 and then say pch is this symbol, it works, although it messes up the terminal a bit. You can even make the symbols a little bit bigger with cex, which needs an equals sign. You see that it starts messing up in the terminal, right? But you can use different plotting symbols as well, so that's interesting. Now it starts messing up too much. But you can use any of these, and the nice thing is that if they're in a file, you can just use them. For some things in mathematics, you can name variables by their symbol, so you could use π as the variable name. That's one of the things which is really nice about R.
But if we look at the Umwelt table, we see that they measured a lot of things: abiotics, weather, and there are comments in there.
There are all kinds of concentrations in the water, like how much calcium or magnesium is in there. And here they actually did mention the units, so I don't know why they didn't do that for the length of the fish. It might be that the length of the fish is in millimeters and not in centimeters. There's a lot in there, but what I wanted to do was visit the fish and see where they live. For that I need the Breitengrad and Längengrad, which are the latitude and longitude.
So the first thing that we did, let me switch back to the presentation, is just read it in. Reading it in, for me, means showing myself the first five lines and getting familiar with the data, seeing what I can understand and what I can't. So spend some time on that. It took me about half an hour to get familiar with the data and where everything was.
The first thing I wanted to do was create some structure. I told you guys that in every file I have, fish names are coded differently. I quickly looked into one of the R files and saw that they manually renamed the fish, which I thought was a bit overkill: they wrote down all of the fish names, it took about six lines of code, and then they tried to harmonize it that way. But we are using a computer and we're using R, so we want the computer to fix this for us.
I'm not going to change the data itself. Normally, when people send me data, I am not going to touch it at all. In our group, the data file that I downloaded generally goes onto the network-attached storage, the big NAS that we have, and is then put into read-only mode so that no one can touch it. There is no chance that anyone will delete the file or that other stuff will go wrong. Raw data is raw data: it gets put somewhere and gets put into read-only mode.
So the first thing I did is take all of the Fischart columns from the different years. I took the ones from 2017, 2018 and 2020, and then I say c, so I combine them all together. That gives one big vector with fish names, and then I ask for the unique names.
There were some minor things. Sometimes they used the eszett symbol (ß), and sometimes they used ss, just the two letters s next to each other. So I use the gsub function, the global substitute function. It takes a vector and substitutes the ß by ss, just to harmonize, because sometimes the fish name was written with ss and sometimes with ß. Then there were typos. Sometimes it was called Lysian and sometimes Lysen, and I just said: I want to have everything called Lysen. I don't know if that's the correct fish name, but I needed to harmonize them, because all of the other letters were the same, but in one case they used an i and in the other case ie. So I made a little function for that. If I throw a fish name into this function, it will replace every ß with ss, then change every Lysian to Lysen, and that harmonized some of the fish names.
Then we were still left with the issue of the Latin name at the back: we have the name of the fish, then a space, and then between brackets the Latin name. That was done in 2018 and 2020, but not in 2017. For that I did this call: first fix the names of the fish, and then do a strsplit, which splits each string into parts at the space. So whenever you encounter a space, split it up.
And then, because strsplit, when given a character vector with multiple elements, returns a list, I only take the first part of each element. I want the fish name to be Hecht, or Rotauge, or Rotfeder, and I'm not too interested in the Latin name. In the end, someone can back-translate that for me. So what I did is: strsplit by space, and then lapply to the list that we get back the function "[", which is the selection operator in R, with index 1, so select the first element. Then unlist it and take only the unique values. Then I looked at the length of the fish names, and now there were 26 different fish names.
Let's switch to R so that you guys can see that as well. This is more or less what I did, and it seemed to work really well. So the names: we have Hecht, Barsch, Rotauge, Ukelei, Aal, Zander, Gründling, the Dreistachler, Hybrid. Unfortunately, you sometimes also see entries like "recheck". I don't think recheck is actually the name of a fish. But what would I know? It might be that there's a fish species called recheck. And then we also have "fish", which is kind of general. So there's still some cleaning to do, and I'm still not really sure that there are 26 fish species. I actually think there are 24. But I just assumed there were 26 and continued with that, because we still need to do some filtering. Some of these typos will fall out when we start demanding that there are enough observations to do statistics.
That was the first thing I spent about 25 minutes on: writing the little function to fix the names, and then doing the strsplit and getting the values out. All right, after that I wanted to know how many lakes there were. There were some issues there as well, but it was actually easier.
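The name clean-up described above (gsub to harmonize spellings, strsplit on the space, `"["` with index 1 to keep only the first word, then unlist and unique) can be sketched as follows. The three input vectors and the typo "Rotuage" are invented for illustration; the real columns and the real typo differ.

```r
# Hypothetical species columns for three years. 2018 and 2020 append the
# Latin name in brackets; spellings vary between eszett and "ss".
names2017 <- c("Hecht", "Rotauge", "Steinbei\u00dfer")
names2018 <- c("Hecht (Esox lucius)", "Steinbeisser (Cobitis taenia)")
names2020 <- c("Rotuage (Rutilus rutilus)", "Hecht (Esox lucius)")

# Harmonize spellings before comparing names.
fixNames <- function(x) {
  x <- gsub("\u00df", "ss", x, fixed = TRUE)       # eszett -> ss
  x <- gsub("Rotuage", "Rotauge", x, fixed = TRUE)  # made-up typo as a placeholder
  x
}

allNames <- c(names2017, names2018, names2020)
parts <- strsplit(fixNames(allNames), " ", fixed = TRUE)  # list, one entry per name
fnames <- unique(unlist(lapply(parts, "[", 1)))           # keep the first word only
fnames
length(fnames)
```

Keeping only the first word assumes that no common name consists of two words; a name like that would be cut short and would need its own rule.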
Since I wanted to keep it on the slide, I first defined which column I wanted to look at. I take the Gewässername column from 2017, 2018 and 2020, combine them, take only the unique values and store this in lakes. Then I do something which is a little bit strange, but I will show you guys why. If we just look at the unique lakes, let me copy-paste that in, then what we see is that there are normal lake names, and then there is something strange with the Donner-Kiesgrube, because the Donner-Kiesgrube has numbers behind it. So the first 20 looked like normal lake names, but here at position seven you have the Donner-Kiesgrube number three. I wanted to get rid of those. So what I did is: take only the first 20 and then throw away the seventh entry. Then I wanted to know how many lakes there were, and we had 19 lakes. These are the names of the lakes that we then have: Cothamster, Colg, Lohmore, and Salzdorf. That looked okay to me.
All right. So now we've more or less answered our first question: how many different fish species are there? Well, there are 26 or 24. I don't know exactly, but I'm just assuming that there are 26. And there are 19 lakes that we can work with.
The next question was of course: where do these fishies live? So I took the Umweltdaten and looked for the Breitengrad and Längengrad. Then we can draw a map and say: that's where the fishies live. The issue here is that the Breitengrad and Längengrad were written in the DMS format. That's a format in which you have the degrees, then the minutes, then the seconds, and then the milliseconds.
So it's using the way of writing down positions on the globe that we used in, like, the 1800s. Google doesn't even understand it anymore. If you go to Google Maps and you input a location in DMS format, it will not understand that, because Google and a lot of other navigation equipment use decimal degrees. So we have to convert from one to the other. Fortunately, this conversion is relatively easy. The decimal degrees, which Google understands and which I can fill in on Google Maps, are equal to the degrees, plus the minutes divided by 60, plus the seconds divided by 3600. That's the way we convert from the DMS format into decimal degrees.
So let's do the conversion. Here you see the little block of code that I had to write for it. It is this long because, again, there were some issues with how some of these numbers were entered. One of them was that a single quote was sometimes used where they actually meant a dot. The dot stands for seconds, and this other one here is the degree symbol. There were also issues with additional spaces, so I had to gsub those out as well.
But this is the code that I used, and I want to go through it. Here we have this little function that does the conversion for us: DD is the degrees plus the minutes divided by 60 plus the seconds divided by 3600, and that's what's written here. So we say as.numeric of x[1], which is the degrees, x[2] holds the minutes, and then we have the seconds. I want to run through this whole statement with you guys and show you how I come up with these things. I look at the data, and I can show you how I do that. So let me switch to R and show you the Umwelt Breitengrad. It looks like this.
Let's show just a couple, like 1 to 10. So this is the notation that is being used: 53 degrees, 14 minutes, 20 seconds, 29 milliseconds north. There are some minor issues when you look at the whole thing, because sometimes there's a space after the degree symbol, like here. The computer can't understand this by itself; it really needs to be told how to modify the original input data to get to data that it can actually understand.
So let's run through this whole big statement. Fortunately, it's the same for the latitude as for the longitude. But of course: how the hell do you come up with something like this, a gsub of a gsub of a gsub, then a strsplit, then an lapply, then another lapply, and then an unlist? Let me show you guys how I build up such a statement. First I look at the data that we have. Then I do three times gsub. By now you should be getting used to reading code, and knowing that you read code from the inside to the outside, because the inside is what happens first and the outside is what happens last. That's why I'm also using this color coding. The three gsubs work at the same level, so they are more or less applied at the same time to your data. After the gsubs there is a strsplit, then an lapply and another lapply, which are more or less on the same level again. And then we have the unlist, the top-level function, which unlists everything into a single vector.
So here we have the data input format: 53, the degree symbol with a space, 21, then this single-quote thingy, 56.64, and then the double-quote thingy and N.
So three times gsub, and gsub takes the from argument and the to argument. The first thing I'm saying is: take these single-quote symbols and convert them to dots. If I apply that to this, you see that the only thing that changes is that the first symbol changes to a dot. I do this because it makes splitting easier: I only have to split on the dot and the degree symbol. Otherwise I would have to split on the dot, the degree symbol and the single floating comma thingy. Then I fix the space after the degree symbol. The next thing is to substitute out the north and east part, which was in the text. Then I end up with something which looks like 53, 21, 56, 64.
The next step is to split this, because it is one big character string and I want to make individual numbers out of it. I want to chop it into four parts: first the degrees, then the minutes, then the seconds, and then the milliseconds. I just use a strsplit for that. So the result of the three gsubs goes into the strsplit. And what do I split by? I split by the degree symbol and I split by the dot. I end up with a long list with a lot of elements, and each element contains something which has a length of four. Well, not really, because there were still some other errors in there, so I could not just say that the length always has to be four. Sometimes I ended up with a length of five. So I want to make sure that I always get the first three elements back, because I'm only going to use the degrees, the minutes and the seconds, and nothing else. So what do I do then?
Well, I lapply over the thing that we just had. To the list, whose elements have four or five parts, I apply the "[" selection function again. What do I want to select? The first, the second and the third element, so I just say 1:3. After I have these three elements, I do an lapply again: I go through each element of the list and call my little conversion function, which computes x[1] divided by 1, plus x[2] divided by 60, plus x[3] divided by 3600. Then we can unlist the whole thing, and now we have a numeric vector, which is something we can use. So some entries split into five parts by error, but we only need the first three; I just look at the first three and ignore all of the others.
All right, so I made a plot, because I wanted to check. I plotted the computed latitude and longitude, and this looked pretty okay. We see all of the different lakes here. I can show you in R how to do this, and I will make the code available as well. When you see this, every dot should be a lake. So the first thing I want to do is make sure that they are lakes, because fish do not live on land; you can't fish on land. I need to make sure that the positions they wrote down correspond to lakes. If that is correct, then I have done the conversion correctly as well.
So I checked some coordinates. Something called the Schleptruper See was located at this decimal degree position. I filled this in into Google, and we ended up smack in the Forellensee, which is a lake. Fish live in water, so I probably did the conversion correctly for this one. I checked two more just to be sure.
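The nested statement walked through above can be sketched step by step, with one intermediate variable per layer instead of a single nested call. The two input strings are invented examples in the format described in the lecture, and the function name `toDD` is an assumption; the real data contain more variants than these two.

```r
# Decimal degrees = degrees + minutes / 60 + seconds / 3600.
toDD <- function(x) {
  as.numeric(x[1]) + as.numeric(x[2]) / 60 + as.numeric(x[3]) / 3600
}

# Hypothetical coordinates: one with a space after the degree sign, one without.
dms <- c("53\u00b0 21'56.64\"N", "52\u00b014'20.29\"N")

step1 <- gsub("'", ".", dms, fixed = TRUE)              # single quote -> dot
step2 <- gsub("\u00b0 ", "\u00b0", step1, fixed = TRUE)  # drop space after degree sign
step3 <- gsub("\"[NE]", "", step2)                      # strip trailing "N or "E
parts <- strsplit(step3, "[\u00b0.]")                   # split on degree sign and dot
first3 <- lapply(parts, "[", 1:3)                       # degrees, minutes, seconds
dd <- unlist(lapply(first3, toDD))                      # one numeric vector
dd
```

As in the lecture, the part after the seconds is dropped, which shifts a position by less than one arcsecond. That is precise enough to check whether a point lies in a lake.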
And all of the coordinates that I checked actually ended up on Google Maps as a lake somewhere. So that was nice. One of the things which was really nice: here it says Lohmoor, which is the description that they gave me, and on the map there is actually a Gewässer labelled with the Lohmoor name. So that made me think that the conversion we did is working. Now we kind of know where they live. Of course, we still don't really know where the fish live, right? We can look at an individual coordinate, drive there and take out some of the fish, but we want to do this in a more automated way. So we want to create an overview. The first thing I wanted to make is a little matrix which has the lakes in the rows. Like I told you guys, we have 19 lakes. For each lake I want the longitude, I want the latitude, and I want to assign a color to each lake, so that I can tell the lakes apart visually on a map and overlay this on something like Google Maps. Creating an overview is more or less just filling a matrix. So first I create the lake and location table: lake, longitude and latitude. I took the column with the Gewässer name and asked which rows are in lakes, because I had already defined the lakes variable before, when we took all of the unique lakes. There were some more lakes in the Umwelt table that were not in the fish table. I don't know exactly why, but it's just the way that it is. So first I have to figure out which of the lakes in the Umwelt CSV file are actually in the measured data, the fish data from 2017. Then I create the table: I cbind the longitude, the latitude, and I put the lake name last.
And then I just say unique, because if I call unique on a matrix, it removes all of the duplicate rows. And there will be duplicate rows, because the Umwelt data has been measured over a long period of time, so the same lake is in there multiple times. By doing the unique, I go from a matrix with something like 100 rows to a matrix with 19 rows, which is the number of unique lakes. Then I have to deal with the third column. At this point I don't have a numeric matrix, because I'm cbinding together a numeric value, another numeric value and a character value. As we know, in R a matrix can only have one type, so this matrix converts itself automatically into a character matrix. But if we use the third column as the row names, we end up with a matrix that looks like that, and then we can just drop the third column, because the names are now in the row names. Then we make it numeric: apply to the matrix without the third column, over the rows, the as.numeric function. And then we transpose it, because this apply actually flips the matrix around, but that's not the biggest deal. So then we have lake locs, and lake locs is our matrix, which now has two columns. Just to be clear, I also set the column names, so that we know the first column is the longitude and the second column is the latitude. And of course, make sure that you comment every step of your code. So let me show you guys how I got that far. I can also show you the Notepad window. Here we have our massive statement producing the latitude and longitude. Then I do my plot.
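As a rough sketch of the table-building steps just described (filter to the measured lakes, cbind, unique, row names, as.numeric, transpose), something like the following should work. The input data, the column names and the variable names `env`, `keep` and `lakelocs` are invented for illustration and are not taken from the actual scripts.

```r
# Made-up stand-ins: `lakes` mimics the unique lake names from the fish data,
# `env` mimics the Umwelt table, where the same lake appears several times.
lakes <- c("Lake A", "Lake B")
env <- data.frame(
  name      = c("Lake A", "Lake A", "Lake B", "Lake C"),
  longitude = c(7.9, 7.9, 10.9, 9.5),
  latitude  = c(52.4, 52.4, 52.3, 52.9),
  stringsAsFactors = FALSE
)

# Which rows of the environment table belong to lakes that were actually fished?
keep <- env$name %in% lakes

# cbind numeric and character values: the result is a character matrix.
# unique() on a matrix removes the duplicate rows.
lakelocs <- unique(cbind(env$longitude[keep], env$latitude[keep], env$name[keep]))

# Move the lake names into the row names, then drop the third column.
rownames(lakelocs) <- lakelocs[, 3]
lakelocs <- lakelocs[, -3, drop = FALSE]

# apply() over rows returns the result flipped around, so transpose it back.
lakelocs <- t(apply(lakelocs, 1, as.numeric))
colnames(lakelocs) <- c("longitude", "latitude")
print(lakelocs)
```

The `drop = FALSE` is a small safety net so that the matrix structure survives even if only one lake remains after filtering.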
And then here I make the lake locs mapping table, for going from a lake name to the latitude and the longitude. Let me show you how this looks in R. So we go to R and just reload all of the data quickly. Why not? I'm getting some errors. Why is it not loading the 2017 data? I don't care. When we look at lake locs now, the table looks like this. We filtered down to the lakes which are in the measured data, so we have 19 lakes, and for each lake we have a latitude and a longitude. What we can do now is add the color. For the color I use the RColorBrewer library, which is one of my favorite libraries, because it just has very beautiful colors. A color set is called a palette, and with brewer.pal you can just specify it by name: Set3 and Set2 are just sets of colors. The biggest issue here is that we need 19 colors for 19 lakes, but the sets that I generally use only go up to something like 12 colors per set. So I'm just combining two of them: I take 12 colors from Set3 and 7 colors from Set2, combine them, and call this my palette. Those are the 19 different colors that I'm going to use for the 19 different lakes. And of course, I can just cbind this to the columns that I already had, so I cbind the colors that I just selected to lake locs. Now we have the matrix that we wanted: all of the different lakes, with the longitude, the latitude and the color. So let's switch to R and do that, just so that you guys know what it looks like. I load the RColorBrewer library and I take my palette: give me 19 different colors.
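A minimal sketch of the palette step, assuming the RColorBrewer package is installed. The names `palette19` and `withcol` and the two-lake example matrix are made up here, and the `rainbow()` fallback is only added so that the sketch still runs without the package; it is not part of the lecture.

```r
# Small made-up numeric matrix standing in for the 19-lake table.
lakelocs <- matrix(c(7.9, 10.9, 52.4, 52.3), ncol = 2,
                   dimnames = list(c("Lake A", "Lake B"),
                                   c("longitude", "latitude")))

# 12 colors from Set3 plus 7 colors from Set2 gives 19 colors in total.
if (requireNamespace("RColorBrewer", quietly = TRUE)) {
  palette19 <- c(RColorBrewer::brewer.pal(12, "Set3"),
                 RColorBrewer::brewer.pal(7, "Set2"))
} else {
  palette19 <- grDevices::rainbow(19)  # fallback, not the lecture's colors
}

# Adding a character column turns the whole matrix into a character matrix.
withcol <- cbind(lakelocs, color = palette19[seq_len(nrow(lakelocs))])
print(is.character(withcol))

# So the coordinates need as.numeric() again before they can be plotted.
lon <- as.numeric(withcol[, "longitude"])
print(lon)
```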
And now, when I cbind this palette to the lake locs that I already had, so to the matrix that we just created, you can see that we have Neumanns Kuhle located at this longitude and this latitude, and now it has a color assigned to it. What you can also see is that the matrix that we just turned into a numeric matrix has turned back into a character matrix, because we added a column of colors, and colors are character values according to R. So the whole matrix becomes a character matrix again, which we just have to keep in the back of our minds: when we plot it, we have to call as.numeric on the longitude and latitude columns so that we can use them. All right, almost there. We can almost go to Google Maps and find our fishies, or plot our routes so that we can visit all of the different lakes that they fished. So we now need to create a nice new plot. The thing here is the flol variable, which holds the first letter of each lake name. Some of the colors that we selected are very similar, so when I make the plot I also want to have the name of the lake, not the whole name, just the first letter, and I want that letter drawn in the color of the lake. So what am I doing? First I define main, which is the title of my plot, saying where do the fishies live. Then I compute flol, which is the substring of the row names of lake locs from the first to the first character. It just takes the whole name of the lake and keeps the first letter. And then we set up the plot: we want to set some margins, and then we want to do the plot.
For the plot I looked at the numbers, and the lowest number was like seven point something, so I'm just saying: on the x-axis go from 6.9 to 11.4. I do the same thing for the y-location. So first we have the longitude, and the longitude of the lakes they fished was between roughly 6.9 and 11.5. Then there was the latitude, which ranged from about 52 to 53. I'm drawing my own axes, because I want a nice axis system there and I don't want R to do that for me. So I say: do not plot anything, do not give me an x-axis, don't give me a y-axis, don't put any labels on there, just give it a title. That makes an empty plot with only a title. Then I take the axis function. The first axis is the x-axis, and I say it goes from 6.9 to 11.4 in steps of 0.5, and these are the numbers to put on it. The same thing for the y-axis. Just make your own x- and y-axis, because you have more control. Then I add the data points, which is just points with the longitude and the latitude columns of lake locs. Of course, I have to call as.numeric on these, but it didn't fit on the slide, so I'm leaving that out. I make them a little bit bigger. And the pch, the plotting symbol to use, is this flol, so the first letter of the lake. Then give it a color, which is the color that we assigned to that lake. And then we need to add a legend, because otherwise no one would know what the different colors and letters mean. So we make our own legend. I put it at the top right, give it the row names of lake locs, make it a little bit smaller, and use flol as the plotting symbol. The colors come from the color column of lake locs. And then there are three columns.
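Put together, the plotting steps described above could be sketched like this in base R graphics. The lake names, coordinates and colors are invented, and the function name `plot_lakes`, the margin values, the y-axis step of 0.25 and the `cex` values are assumptions, since the lecture does not state them.

```r
# Made-up character matrix in the same layout as the table with colors.
lakelocs <- cbind(
  longitude = c("7.9", "10.9", "7.4"),
  latitude  = c("52.4", "52.3", "52.9"),
  color     = c("red", "blue", "darkgreen")
)
rownames(lakelocs) <- c("Lake A", "Other B", "Some C")

plot_lakes <- function(lakelocs, main = "Where do the fishies live?") {
  flol <- substr(rownames(lakelocs), 1, 1)   # first letter of each lake name
  lon  <- as.numeric(lakelocs[, "longitude"])
  lat  <- as.numeric(lakelocs[, "latitude"])

  op <- par(mar = c(4, 4, 3, 1))             # set margins, restore them on exit
  on.exit(par(op))

  # Empty plot: no points, no axes, no axis labels, only a title.
  plot(c(6.9, 11.4), c(52, 53), type = "n", xaxt = "n", yaxt = "n",
       xlab = "", ylab = "", main = main)

  # Hand-made axes for full control over the tick positions.
  xt <- seq(6.9, 11.4, by = 0.5)
  yt <- seq(52, 53, by = 0.25)
  axis(1, at = xt, labels = xt)
  axis(2, at = yt, labels = yt, las = 2)

  # Each lake is drawn as its first letter, in its assigned color.
  points(lon, lat, pch = flol, col = lakelocs[, "color"], cex = 1.5)

  # Legend at the top right, spread over three columns.
  legend("topright", legend = rownames(lakelocs), pch = flol,
         col = lakelocs[, "color"], cex = 0.8, ncol = 3)
  invisible(flol)
}

flol <- plot_lakes(lakelocs)
```

Wrapping the steps in a function keeps the `par()` change local, so the margin settings do not leak into later plots.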
So rather than having one big list, the legend is divided into three columns. When we do this, we get a plot that looks like this. And of course, I can just go to R and show you guys that this really works. There's a little bit of fiddling around because we have to do the as.numeric thing, but when I do the plot, it looks like this. And this already starts becoming a little bit better, because now we can see that here we have the S, which is in yellow, so the yellow S is Saalsdorf. And here we have the N in green, which is Neumanns Kuhle. Now that we've created this map, we want to overlay it on Google Maps to see how accurately we recreated the positions. So that's what I did. Lohmoor and Schleptruper See, I have no idea where these things are. So I went to Google Maps and said: give me this region. You can zoom in and zoom out, and Google Maps will tell you in the URL which longitude and latitude you're looking at. When I do this, it looks like this. It's the same plot as what we had before; I just removed the legend and overplotted it on Google Maps. I cut out the map from Google Maps online and put it in my plot, and we can see that we can go from Lohmoor to Saalsdorf to Schleptruper See to Moormerland. So I figured out that this is in the north of Germany, kind of the northwest, and that there were also some lakes which were relatively far away. You can see all of the other lakes are still here on the map. It's a 28-hour bike ride, yeah. And that's just four of the lakes. So imagine how much time they spent driving from lake to lake, taking their rowing boat or whatever they used to fish up the fish.
If you want to walk it, it's like a 600-hour walk, I think. But a 28-hour bike ride, let's go. True. So I was really, really interested in that. So now we know where the fish live. If we take a bike, we can visit four of the lakes and we can start touching the fish, right? Because that's what we actually want: we want to touch the fish. All right. Time for a coffee break. That actually worked out really well. Let me stop the recording.