Right, I think we should be live, and I think people should be able to hear me as well. So this is the audio check. Let me know if you can hear me: just throw something in chat like "we can't hear you" or "we can hear you". Wait, the first one is going to be impossible, because if you can't hear me then you can't hear my request. All right, the moderator says yes, it's working. Perfect, perfect, perfect.

So we have a couple of minutes left. Check, check, double check. Yeah, it seems to be working. We will discuss the assignments, of course, but are there any questions about the assignments before we start? Questions, remarks, praise, complaints, too hard, too easy? Let me know. Unfortunately we couldn't do it in person today, so I'm a little bit sad about that. I think assignments should be done in person so that people can get direct help. One thing that I would like to mention, if you are a student and you are having issues with the YouTube chat...

"For some reason, I think I missed a step where Notepad++ and R were connected or something?" Well, they're not connected. You just have to open two windows. The way that I always show it to you guys is just the way it looks on stream. I have the two individual windows open, and I generally write my code in Notepad++ and then copy-paste it into R to have it run. That's the way I do it. But yeah, we had some remarks last stream as well about the fact that it is kind of confusing. "Okay, so do we use them separately?" Yes. And from chat: my Notepad++ doesn't color what I type.
It's all gray. Okay, so this is one of these issues which occurs when you don't save the file with the correct extension. Let me see if I can make a nice screenshot of it. I will make a screenshot, open Paint, select a little part of it like this, and then say File, Save, and put it in there. Good, so now we have the screenshot and I can make a slide about that. So thank you, I will show you how to have it color the text. If you save the file with the extension .R, so not saving the file as answers01.txt but as answers01.R, then Notepad++ will use the proper highlighting for it. If not, you can force it to do the proper highlighting: if you look at the Notepad++ menu bar, there's File, Edit, Search, View... "Yeah, you can also set your language via the syntax menu." Yeah, so that's the Language option.

Let me just quickly... no, I can't edit it because we're already streaming. Is there another way of doing this? Oh, no, I can still edit it, good, because we're still in the first slide layout. So I have a small screenshot, which I'm just going to put there.

"What did I miss?" You didn't miss anything yet. We just started, so I'm just taking random questions from random people. Oh yeah, the sound check. You missed the sound check, which obviously is working, otherwise you would not hear me. We didn't do a music check, though. But yeah, Jolyson, I updated the slide for the assignment, so I will show you guys how you can do that. I don't think I can show you live, because I think the screen capture that I'm doing for Notepad++ doesn't capture all of the menus and stuff.

"No visual of Danny." Do you guys want to see me? That's not an issue.
We can just switch to the lecture layout and then I'm here. So: visual Danny. But now you have to watch me for like four minutes. Generally I don't want to start the stream too early, but it's good to make sure that everything works beforehand.

"How do I save my R console?" Do you mean the output of the R console, or just the session that you did? For saving the history there's an option in R. Let me see, there's File... Yeah, I always advise people to not save anything from R. But you can, and sometimes it is useful if you're dealing with big data and you need the data on like 10 different computers because you're using a cluster.

Frogfrog: "I sent you an email, but I still can't join the Moodle group even with the password." Okay, that's something that we can fix. Frogfrog, are you from the HU? Let me check my email. Did you send it to my Gmail or to my HU account? Let me search through my Gmail, and I will also log into my webmail so that we can see if the email is there. All right, I can't find your email. Could you quickly send it again? Because I can add you to Moodle manually if you're part of the HU. If you have an HU registration, then I should be able to add you easily. "I sent it to your Gmail." All right. Moodle is not the best search term for this. I need to know your real name to be able to do this, because "Frogfrog" is not going to work when searching through my Gmail, and apparently I talk a lot about Moodle: when I just search for Moodle, I come up with like 150 different emails that I have to go through. So do send the email again and just say "I'm Frogfrog", so that I can at least figure out that you belong to the email. And then we can just do that.
And then we can get you into the Moodle, because in theory everyone should be able to join the Moodle, even if you're not from the HU. All right, just got your email, thanks. No, that's not your email. All right, so we have another person signing up for the course, apparently. Yes, you can, and we're doing the lecture now, so please join us at... let me get the link of the live stream. All right, so it's two o'clock, so officially we have started now, but I'm just going to do a couple more things. Let me go to my own channel. It's always difficult: the first couple of lectures always have a lot of these housekeeping things that you run into. Oh, that's horrible. Don't give me the secondary sound.

"Yeah, I joined as guest." Yeah, R course 2022. And then you have to have the right link. So do you have the link for people that are joining as guests? That would be good, because I think the link that I sent around is the link for people who can log into Moodle. For people who can't log into Moodle there's another link. I'm heavily looking through my email to see where the link is. If I go to Moodle myself and log in to get the link to my course, that's this one. This is my course, and then I open a private browser window and open the course there. Yeah, so it's the enrol link. If you go to this link and then use the password R course 2022, exactly like Jolyson put it in chat, then everything should work. You do have to use the lowest link. Let me show you that, because I do of course have the opportunity to show you guys Firefox. So let's show you Firefox. Ah, crap.
I have to log out, of course, because if I'm logged in then it doesn't work. So you go to the link that I just put into the chat, and here you have the course. Then you have the self-enrolment, but for self-enrolment you need to be part of the HU. If you just want to see the Moodle, you can go to Gastzugang (guest access), and here you enter the key, so R course 2022, and then you click Speichern and you should be going into the course, although I typed it in wrong. And then it should say here "Sie betrachten diesen Kurs gerade als Gast", so you're currently viewing the course as a guest, but then at least you can get to the files and get the books and these kinds of things. So yeah, do try.

"Would it be possible for you to copy your console into e.g. Notepad++ after each session and upload it to Moodle?" No, I'm not going to do that. I'm going to give you my answers. We'll go through my answers and then I will upload the R file to Moodle, and you can just download my R file. I generally don't use the console; I'm not interested in the output. The output is just the answer, but it's not about the answer. It's about writing the code. So the code is the thing.

Okay, good, so Frogfrog was able to enter, and we're all here. I think like 21 or 22 of us, which is still not everyone, but at least enough. So let's start.

First slide. I hope everyone appreciates the effort that I put into this beautiful drawing. I showed it to my colleagues and their first reaction was: what is the magic mushroom doing on the introductory slide? It's not a magic mushroom. It's the summer semester, so it's a beach umbrella. It looks like a magic mushroom because the sun is also looking a little bit sketchy, but anyway.

Very quick recap of what we did last week. So the quick recap of last week is: we used R as a calculator, right?
So you can just type in things. And then we discussed very extensively all of the different types of data that there are in R. A logical is something which is either TRUE or FALSE. A numeric is a numerical value, so that can be 1, 2, 3, 4, but also something like 5.6 or 7.9. Characters are what other programming languages call strings, so those are character constants: all of the things between double quotes.

Furthermore, we have vectors and lists. A vector is a more or less statistical type where everything in the vector is of the same type. So you can have a logical vector, a numeric vector and a character vector. Lists, like we discussed, are things in which you can store anything. It's a kind of general array that holds different things: in the first element I can put a matrix, in the second element another list, and in the third element a vector. A matrix is again a mathematical type, so it's for example a numeric matrix containing only numbers. And then we have data frames, which are related to the list. A data frame is a two-dimensional data type like a matrix, but every column can have a different type.

And then we talked about indexing data, so you can get the first element from a vector by using square brackets. We also talked about the fact that the same thing works for matrices: you use single square brackets, and then you can select the first column or the first row, or 10 columns or 10 rows. And then the double square bracket.
We discussed that this is for lists, so only lists have this double square bracket system. Furthermore, we went through several functions which allow you to generate sequences of numbers, things like seq for generating a sequence and rep for repeating a single element or multiple elements. And then we also talked about the double point operator (the colon), which is kind of the seq operator but always with the step size set to one. So 1:20 is a vector of 1 to 20. That's the short recap from last week.

All right, so the assignment. We had a question, so I added this little screenshot here. If you want to have code highlighting you can click on Language, and then you scroll down and there should be R as a language. Of course you can click any other language, but then it will use different colors. So if you want to use the same color scheme as me, you have to use the R highlighting for your language.

Again, I wanted to mention: if you have any chat issues on YouTube, then just join the Zoom meeting that we have running in parallel. The link is the same as the previous time; every week it will be the same Zoom link. So there's no magic there, nothing that is generated, and I'm monitoring both chats. We currently have one person in the Zoom, which is okay, but you can ask your questions there as well if, for example, you don't want your questions to pop up on YouTube later on.

About the room issue: I wanted to say a little bit about that, but unfortunately I didn't get any solution yet. I hope that next week we can do an in-person lecture, because that would be great. We have to figure out the logistics of it, but I will mail you guys when I know more. Unfortunately I'm more or less stuck with the Verwaltung (the administration) here, and if the Verwaltung is ignoring my emails then I can't really do anything.

Good. So with that being said and done, we will discuss assignment one.
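Before moving on to the answers, the recap of last week can be condensed into a small sketch. The values and variable names below are made up for illustration and do not come from the slides; only the functions and operators (c, list, matrix, data.frame, square brackets, seq, rep and the colon) are the ones named in the recap.

```r
# Basic types (illustrative values, not taken from the lecture)
is_ready <- TRUE       # logical: TRUE or FALSE
height   <- 5.6        # numeric
name     <- "Danny"    # character, written between double quotes

# A vector holds one type; a list can hold anything
num_vec <- c(1, 2, 3, 4)
my_list <- list(matrix(1:4, nrow = 2), list("a", 2), num_vec)

# A data frame is two-dimensional and each column has its own type
df <- data.frame(id = 1:3, label = c("a", "b", "c"))

# Indexing: single brackets for vectors and matrices, double for lists
num_vec[1]                   # first element of the vector
m <- matrix(1:6, nrow = 2)   # 2 rows, 3 columns, filled column-wise
m[1, ]                       # first row
m[, 1]                       # first column
my_list[[3]]                 # third element of the list

# Generating numbers
one_to_twenty <- 1:20                  # colon: step size is always 1
by_twos       <- seq(2, 10, by = 2)    # sequence with a chosen step
repeated      <- rep(c(1, 2), times = 3)
```

Pasting these lines into R one block at a time and then typing a variable name shows what each construct produces.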
So I will not read out the entire question. I will just let you guys know what the question was and then what my answer was. Let me first show you my answers. Here I have my Notepad++; you can see that by all of the little icons at the top. Besides that, it is saved with the file name answers01-introduction.R, because the short name of last week's lecture was "introduction". Of course answers01.R would be fine as well, but make sure that when you make a file, the file name is meaningful, because in programming it's very important to be concise and precise. Naming things "answers" or "answers to the lecture" is not descriptive enough. Besides that, make sure that the extension is .R, because then Notepad++ will automatically detect it and you don't have to set your language manually.

I always start with a header, just for me to know who made the file and when I last edited it. But not only that: make sure that you add a copyright statement. In programming you are writing code, it is your code, and in future you might run into issues where people steal your code. By having a copyright statement at the top you can make sure that you're at least kind of covered there. If you really want to be covered you of course have to share your code in some way, so generally you use a version control system for that, to make sure that you have an official timestamp saying that at this point in time I wrote this code. It hasn't happened to me a lot, but I did have some issues in the past with people using my code and not crediting me in the publications that they wrote with it. It is not very common in academia, but it does sometimes occur, so be aware of that.

Good. So the first couple of questions were to use R as a calculator, right?
In this case, I always make sure that I have a comment saying which question I'm answering. So this is question 1a, and the first question was to add 1234 to 1567, so I just do that by writing 1234 + 1567. And then for me the question is answered. In this case I'm not even going to copy it into R, because I know that this should work, so I'm not going to check it at this point. The next one was to do a subtraction and an addition, just some numbers: 100456 minus 3350 plus 23. And then 1c was to take the natural logarithm. This is normally the ln function, but in R it's called log, because the log function by default has e as the base of the logarithm. If we wanted to take the base-10 logarithm, we would write log10; if we wanted the logarithm with base two, we can use log2. But in this case it's just log, because that's the natural logarithm. 1d is a division, so take a number divided by another number. 1e is to take a number and multiply it by another number. And then we go into Euclidean division. Euclidean division is division into whole numbers, so it's integer division (the %/% operator in R). It will never give you a number which has a fraction, so 5.7 is not an option, but 7 or 8 is. And then in 1g I ask you: how do you take the square root of a negative number? Because R has built-in support for taking the square root of negative numbers.
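Written out as a script, the answers to question 1 might look like the sketch below. Only the numbers for 1a, 1b and 1g were mentioned in the lecture; the arguments used for 1c to 1f are placeholders chosen for illustration, and the variable names are hypothetical (the lecture just writes the bare expressions).

```r
# Question 1a: addition
answer_1a <- 1234 + 1567

# Question 1b: subtraction followed by addition
answer_1b <- 100456 - 3350 + 23

# Question 1c: natural logarithm (the argument 100 is a placeholder)
answer_1c <- log(100)
log10(100)    # base-10 logarithm, for comparison
log2(8)       # base-2 logarithm, for comparison

# Question 1d and 1e: division and multiplication (placeholder numbers)
answer_1d <- 144 / 12
answer_1e <- 12 * 12

# Question 1f: integer (Euclidean) division (placeholder numbers)
answer_1f <- 17 %/% 5    # whole-number quotient, the remainder is dropped

# Question 1g: square root of a negative number via complex numbers
answer_1g <- sqrt(-8 + 0i)
```

Each line starts with a comment naming the question, which matches the habit described above and keeps the file runnable from top to bottom.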
So, working with complex numbers. The way to do this in R is to say sqrt(-8+0i). You have to add this plus 0i, saying that there is no imaginary part to this number, because only then will R realize: oh, you want to use complex numbers. So it switches internally and says, well, now I can take the square root.

So let's go to R and copy-paste in the answer. This is the reason why I don't like saving my R session or taking my R history: when I copy-paste from R it will always put these greater-than symbols in front of it, and then my code won't run, because the greater-than symbol is just the prompt that R uses to signify that it is waiting for input.

Question in chat? Okay, that's not a question. You don't need to be sorry, Shia, that's okay. Internet issues are never a big issue.

All right, so let me go to R. I'm going to press the button and then I'm just going to copy my answers to question 1a. So I go to Notepad++, I select it, I press Ctrl+C to copy it, and then I go to R and paste it, so I just press Ctrl+V. Now you can see why: if I would save my history, or go to R and copy-paste this into my answers, then this won't work. If I run this as a command in R, it says error, because it doesn't expect this greater-than symbol to be there. So it is very important that you write code which you can just copy-paste in, because the code itself should run from beginning to end without any issues. If you write it down slightly differently, or just save the R history, then that doesn't work. And one thing that I would like to stress is: never ever type into R directly.
It works for these small assignments, and for the next batch of assignments, which belongs to this lecture, it will still work. But we are here to learn how to do data analysis. That means loading in data, doing for loops, doing while loops, and making stuff out of your data. So very quickly code will become large, and pieces of code will depend on each other. We will start using code blocks, and then you cannot type that into R directly anymore. You have to first write the code and then run the code in R. So I would like to stress: always separate running the code in the R GUI from writing the code in an editor. One of the things that RStudio does is give you this clear separation: you have a text editor on the top and the R window on the bottom. But try to avoid typing things into R directly; always use the Notepad++ window. It's just going to help you in the long run.

All right, question number two. Question number two is a little bit of practicing with vectors. Let me actually pull up the questions, because I opened so many windows that I don't know where the questions went. Oh, here we have it. All right, question number two: vectors. "For these exercises, store the result each time in a variable (vector2a, vector2b, ...) unless specified otherwise." So the first question was: use the c operator to create a vector from 1 to 10. You can do this in two different ways. My way of doing it was, well, not stupid, but kind of explicit about it. So I say: combine these numbers here into a vector, so just 1, 2, 3, 4, 5 and so on, and type them all in. Of course, I could have saved myself a lot of typing, because I could have done the same thing, kind of cheating, by just using the double point operator, right?
And then in theory the c is not necessary, because 1:10 will automatically create a vector. But since the assignment was to use the c function, wrapping 1:10 in c() also works. So this also answers the question. It saves you a little bit of typing, but it's not exactly what I wanted. My preferred answer for question 2a would be to make vector2a by just combining all the numbers manually and typing them in.

All right, question: "Can you explain 1g again? I tried x is minus 8 and then square..." Yeah. So if we go to R, and I'm going to do something which I just told you not to do, which is to type it in: if I just say sqrt(-8), then it will say NaN, not a number. This is true in normal mathematics: we cannot take the square root of a negative number. It is an undefined quantity, because squaring means multiplying a number by itself, and the square root is the inverse of that, so it means separating out these two numbers again. And no normal number multiplied by itself gives a negative result.

So the plus 0i stands for zero imaginary units. Mathematics used to be based on normal numbers, but at a certain point in time someone came up with the idea that it would be really nice to be able to take the square root of a negative number, because it's useful in mathematics: you can just continue calculating instead of getting stuck. After that, mathematics kind of branched out into two different fields: you have normal mathematics, which uses normal (real) numbers, and complex mathematics, which uses complex numbers. And you have to tell R that you are using a complex number. So sqrt(-8+0i) now gives you an answer. The square root of minus eight plus zero i is actually zero real units. So zero.
So it's zero plus, and then here you have the imaginary part. This is the part that people just came up with to make it work. Because if you do the opposite and take this number to the power of two, then it will say that it is minus eight, or very close to minus eight: it can be a little bit off because the calculation is not exact. But if you take minus four, then you would see that there are two imaginary units. So it's just a different way of dealing with numbers, which is very important when you do springs and slings and planets orbiting in the solar system. All of these things generally involve mathematics where at a certain point you need to take the square root of a negative number just to continue with the formula. And then in the end you generally only report the real part of the number; you don't report the imaginary part. It's just to make square-rooting negative numbers work. As a confession: I have not really used this at all. I think I used it like two times in the 14 years that I've been programming, and that was for a very specific use case where we did something with orbital momentum of planets and stuff.

All right, back to question two. So you can just say: make a vector, put these numbers in, and use the c function to combine them together. You have to use the round brackets to say this is where the function parameters begin and this is where they end. That's how it works. Then we do the same thing again, but now using the double point operator. The double point operator is like the seq function with a step of one. So this just means 11 to 20, so again 10 numbers, and we store this in vector2b. For vector2c the question was: use the seq function to go from 1 to 100 stepping by 5. So that's what we're doing here.
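The three vectors from questions 2a to 2c can be sketched as follows. The variable names follow the naming scheme read out from the assignment; the exact spelling in the lecturer's file is an assumption.

```r
# Question 2a: combine the numbers explicitly with c()
vector2a <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

# Question 2b: the colon (double point) operator, step size 1
vector2b <- 11:20

# Question 2c: seq() from 1 to 100 in steps of 5
vector2c <- seq(1, 100, by = 5)

# seq() never goes past the upper limit, so the last value is 96:
# the next step, 101, would exceed 100
vector2c
```

Typing the variable name on its own line, as in the last line, prints the vector when the script is pasted into R.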
So I'm using the seq function. I say: go from 1 to 100, and every time step by 5 units. The next one is a little bit tricky, and I think this is the part of the assignment where most people get stuck. That is when we are talking about the even letters, because it is an uncommon term. It uses something from mathematics: even numbers. Two is even because when you divide it by two there is no remainder. That's the definition of an even number. You also have odd numbers: 1, 3, 5, those are odd numbers; 2, 4, 6, those are the even numbers. But I told you that R has a lot of built-in constants. letters is a built-in vector which contains all of the letters of the Roman alphabet. You can then say letters and use the square brackets to do a selection from this vector. And which letters do we want to select? Well, we want to select the second letter, the fourth letter, the sixth letter, the eighth letter and so on, all the way to 26, because there are 26 letters in the alphabet. So we can use the seq function to generate a vector which contains the numbers 2, 4, 6, all the way up to 26, by doing seq(2, 26, by = 2). And then we just select this directly from the letters vector.

So let me run this code for you guys in R. Let me copy it, then I go to R, and I switch you guys to R as well. We just paste it like this, and now I can check. vector2a is 1 to 10, which is not surprising. This is 11 to 20.
That's perfectly fine. vector2c is from 1 to 100 stepping by 5, which means that 100 is not in the vector, because of course after 96 adding 5 leads to 101, and we said that we only wanted to go to 100. So here are all of the numbers from 1 to 100 when stepping by 5. And then we have vector2d, which is the even letters. The even letters are b, d, f, h, j, l, n, p, r, t, v, x and z. If we want to have the uneven letters, we can do the same thing: letters, but now we start from one. So these are the uneven letters: a is uneven, b is even, c is uneven, d is even again. That's how this works. I hope everyone was able to figure this one out. This is the trickiest one, because you have to combine two things. The first is that you have to realize that letters is just a vector, exactly like the vectors that we made above. The second is that you can use the seq function to get all of the even numbers from 2 to 26, or the uneven numbers from 1 to 26. And today we will actually learn a way to make this even shorter, because selecting even and uneven numbers can be done even more easily than this.

All right, so let's go back to Notepad++. The next question was to ask for the class, just so that you guys can practice a little bit with the type system again. The class of vector2a you get of course by using the class function and throwing in vector2a. And then we can ask questions about this, which will be very useful in the rest of the lecture when we start writing if statements, so doing something based on the class of an object. So we can say is.numeric, is.character and is.factor, right?
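The even-letter selection and the class checks can be tried out with a short sketch. vector2a is redefined here so the block runs on its own; odd_letters is a hypothetical name for the "uneven letters" variant shown live.

```r
# vector2a as typed in with c(), repeated so this block is self-contained
vector2a <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

# Question 2d: letters is a built-in character vector (a to z);
# index it with the even positions 2, 4, ..., 26
vector2d <- letters[seq(2, 26, by = 2)]

# The uneven letters: same idea, but start at position 1
odd_letters <- letters[seq(1, 26, by = 2)]

# Asking for the class and testing for specific types
class(vector2a)          # "numeric" for a vector typed in with c()
is.numeric(vector2a)     # TRUE
is.character(vector2a)   # FALSE
is.factor(vector2a)      # FALSE
```

The is.* functions return a single TRUE or FALSE, which is what makes them convenient inside the if statements mentioned above.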
So in this case the first one should be TRUE and the other two should be FALSE, because vector2a is a numeric vector. Then I wanted to show you guys that when you take vector2a, which is a numeric vector, and vector2d, which is a character vector, and you combine the numeric vector with the character vector, R will upgrade the result, because a vector can only have one type. I'm now supplying the c function with two different types, and then it will select the highest type. A logical value can be stored in a numeric value, and a numeric value can be stored in a character value, but a character value cannot be stored in a numeric value, because a character value can be anything.

All right, then the next one is: apply the square root function to all of the numbers from 1 to 10. The square root function works on vectors, and this is one of these things that is unique to R, or almost unique to R. There are a couple of other programming languages, like MATLAB, that can also do this, but a lot of programming languages do not allow you to take the square root of a vector.

"I did class(vector2a) and it said integer. What does that mean?" So R makes a distinction between whole numbers and fractional numbers, so numbers that have a fraction. An integer is a whole number, and this holds for the vector, because all of the numbers in vector2a are whole numbers, so it will tell you that. It's kind of a subclass again. I told you that a logical can be stored in a numeric value and a numeric value can be stored in a character value. But internally R treats whole numbers slightly differently from floating point numbers, because a whole number requires less storage space.
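The type upgrade from question 2f and the vectorized square root from 2g can be sketched like this. The names vector2f and vector2g are assumed from the assignment's naming scheme, and the logical example is an extra illustration that was not part of the assignment.

```r
vector2a <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)   # numeric
vector2d <- letters[seq(2, 26, by = 2)]        # character

# Question 2f: combining two types upgrades everything to the highest type
vector2f <- c(vector2a, vector2d)
class(vector2f)      # "character": the numbers are now stored as text

# Illustration (not from the assignment): a logical is upgraded to numeric
mixed <- c(TRUE, 2)  # TRUE becomes 1

# Question 2g: sqrt() is vectorized, so it works on the whole vector at once
vector2g <- sqrt(vector2a)
```

Printing vector2f shows the numbers in quotes, which is the visible sign that they were converted to character values.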
So everything in R is there to minimize the storage space. Numeric data types are actually split into two different data types: the integer data types, which are the whole numbers, and the floating point data types, which are the broken numbers like 2.5 or 3.6. So if we run this code, let me go to R and let you guys go to R as well. When we just copy it in, then in my case it says that the class of vector2a is numeric. I don't know exactly why it sometimes decides to make it numeric or integer, but I think there is an is.integer function as well: is.integer(vector2a). So for some reason R doesn't believe that my numbers are whole numbers; internally it has stored them as floating point numbers. But it can store them in both ways. In your case it decided that it didn't want to use as much memory, so it stored them as integer numbers, so numbers without anything behind the decimal point. If we ask the question "is this an integer?", then it should say yes, because it actually is. But it doesn't, so for some reason my R just upgrades everything to numeric. Under the hood, numerics in R can come in two different forms: the integer form, which means that they have nothing behind the decimal point, and floating point, which means that they have a part behind the decimal point. But generally in R we completely ignore this. We completely ignore the fact that we have integers and floating point numbers, because in R the whole group of numbers is just called numerics. I think that's actually an interesting question, so in theory we could look into why in your case it told you it's an integer, because it should actually tell you that it's a numeric. It might be that you're using a slightly different version of R, or that you installed the 32-bit version of R, because the 32-bit version of R is much more aggressive in optimizing the memory used.
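For readers who want to reproduce the integer versus numeric difference, here is a small sketch. As far as standard R behaves, the difference comes from how the vector is created (typed in with c() versus generated with the colon), not from the R version or from 32-bit versus 64-bit; the variable names are hypothetical.

```r
# Numbers typed in as 1, 2, 3 are stored as floating point (double)
from_c <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

# The colon operator produces integers when the start is a whole number
from_colon <- 1:10

class(from_c)             # "numeric"
class(from_colon)         # "integer"
is.integer(from_c)        # FALSE
is.integer(from_colon)    # TRUE

# seq() with an explicit numeric step gives floating point numbers again
class(seq(1, 10, by = 1)) # "numeric"

# An L suffix asks for an integer explicitly
class(c(1L, 2L, 3L))      # "integer"

# Both vectors hold the same values and both count as numeric
all(from_c == from_colon)
is.numeric(from_colon)
```

For the assignments this distinction rarely matters, since is.numeric() is TRUE for both storage types.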
So it is more likely to force stuff to the lowest type possible. "Maybe I used 1:10." That might be the case. Let me see, because then it should say so for the class of vector2b. Yeah, so that's the reason why it did that: it's because of the double point operator. So then it should do the same thing for the seq function, right? But if we do class(seq(1, 10, by = 1)), then it's a numeric again, which is a little bit strange. But there is a difference between how R treats whole numbers and floating point numbers, so numbers with a decimal point.

Good, so let's go back to Notepad++. The idea in 2f is that you learn that if you have two different types, a numeric type and a character type, then R will upgrade to the highest type. Since a character can store a numeric value but a numeric cannot store a character, the class of combining vector2a, a numeric vector, with vector2d, a character vector, should be character. So let's make sure that R really does that. And indeed it does: in this case it did upgrade vector2a. If you would just type it in, you would also see it, because it would start quoting the numbers. So that makes sense to me.

All right, and the square root. Like I told you guys, this is one of these unique features of R: almost all of the operators in R are vectorized. That means that they work on vectors automatically. This is not the case in almost all other programming languages. In most other programming languages, if you want to take the square root of like a hundred different numbers, you have to write a for loop. So you have to say: for i in 1 to the number of numbers that I have, take the square root of number i, so of number one, then of number two, then of number three. So you have to do it manually, so you have to write the
loop. But in R this is fortunately not the case; in R everything is vectorized. So we can just ask for the square root of vector 2a and this should more or less work. When we go to R it gives us all of the square roots: 1, 1.4, 1.7 and then 2. Good, so that's using R as a calculator and a little bit of vector stuff. Matrix 3a is the next question, because I want you guys to learn how to use matrices. Of course you can just use the read.table function to read a matrix from disk into R, but first I want you to be able to construct matrices yourself. So you can use the matrix function. It takes the numbers that you want to put into the matrix, and then you specify the number of rows and the number of columns. By default R fills it column-wise. So if I go to R, paste it in, and type matrix 3a, so show me the matrix, then you see that it puts in the numbers in a column-wise manner. If I don't want that and I want R to do this row-wise, so first fill row one, then row two, which is kind of the more logical thing to do, I can say byrow equals TRUE, and this was question 3b. I think everything here is small. So now if we look at matrix 3b, we see that indeed it filled the first row, then the second row, and then the third row. There's another way of achieving the same goal, and that is to use the transpose function. But I'm skipping one, sorry. If I want to select the fifth column from matrix 3a, I can say square bracket, comma, five: give me all of the rows from the fifth column. Here I'm asking the opposite.
So I'm saying: matrix 3b, give me the fifth row and then all of the columns. By not specifying anything I get everything, which is kind of a nice thing in R. Of course this leads to the same answers, because selecting the fifth column from matrix 3a is the same as selecting the fifth row from matrix 3b. If I show you R again, it looks like this, so it's the same. And I could say select the fourth row, or select rows two to four; all of these things are possible. But you have to realize that if I select a single row from a matrix I get a vector back, while if I select multiple rows I get a matrix back, because it automatically collapses to the lowest possible type. So this is not a matrix, this is a vector, because one row or one column from a matrix leads to a vector, but selecting two rows or two columns gives you back a matrix. So the R type system continuously tries to minimize the space used in the memory of your computer, to use the minimum amount of space and store the maximum amount of data. That's something that will come back in R a lot, and it is one of the built-in limitations of R, because everything in R needs to be in random access memory; otherwise R can't access it or use it. All right, so like I told you guys: imagine that I've made a matrix which I filled by column, and I think: oh, that's a mistake.
Then you can use the transpose function. We mentioned the transpose function in the first lecture; it is just a t, like the combine function is just a c. By saying t of matrix 3a (matrix 3a was the matrix filled in a column-wise manner) it takes column one and makes it row one. So it goes through the columns of the matrix and every column becomes a row in the new matrix. I call this matrix 3a into 3b, and then I check that every item in this new matrix is equal to matrix 3b, because that was my goal: to take matrix 3a, filled by column, and make it the same as matrix 3b, which is filled by row. So that's 3d. Let's copy paste it in. If we go to R it looks like this, and it says TRUE, TRUE, TRUE, TRUE, because all of the values are the same. And if I show you matrix 3a underscore into 3b, you see that it indeed did what I expected: it took the first column and made it the first row. All right, then the last question is to add column names and row names. The question was: how do I use letters, which is a vector, to make column names for matrix 3a? What I'm doing is saying that the column names of matrix 3a should be the first 10 letters of the alphabet. So I take the letters vector, select the first 10 elements from it, so square brackets 1 to 10, and assign this as the column names of matrix 3a. And then the last question (let me move that up a little bit, because it's kind of cut off on the screen): I want to do the row names. The question was how to set the first row name to measurement 1, the second row name to measurement 2, and the third row name to measurement 3. Well, that's how we do it here. So we use the paste function.
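The matrix steps walked through so far can be sketched as follows. The values 1 to 100, the 10 by 10 shape and the exact variable names are assumptions made for illustration; the original answer file is not reproduced here.

```r
# Assumed data: the numbers 1 to 100 in a 10 x 10 matrix.
matrix3a <- matrix(1:100, nrow = 10, ncol = 10)                 # filled column by column (default)
matrix3b <- matrix(1:100, nrow = 10, ncol = 10, byrow = TRUE)   # filled row by row

# Leaving an index empty selects everything along that dimension.
matrix3a[, 5]   # all rows of the fifth column: 41 42 ... 50
matrix3b[5, ]   # all columns of the fifth row: the same numbers

# One row or column comes back as a vector, several come back as a matrix.
is.matrix(matrix3a[4, ])     # FALSE
is.matrix(matrix3a[2:4, ])   # TRUE

# Transposing turns every column into a row.
matrix3a_into3b <- t(matrix3a)
all(matrix3a_into3b == matrix3b)   # TRUE

# Column names taken from the built-in letters vector.
colnames(matrix3a) <- letters[1:10]
```

The comparison `matrix3a_into3b == matrix3b` on its own prints one TRUE or FALSE per element; wrapping it in all() reduces that to a single answer.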
So the paste function takes a character and a vector, or it takes two vectors, and pastes them together using a separator. The standard separator of the paste function is the space, which is good enough for our purpose. For example purposes, let me just throw it into R for you. What we see here is that first we set the column names of matrix 3a to be letters 1 to 10, and then we print the matrix, so we see that indeed it now has column names. Then we set the row names of matrix 3a using the paste function, where we take the word measurements and paste it together with 1 to 10. When we look at it, we now indeed have a matrix with row names for rows 1 to 10 and with column names. So now we can use the row names to select from this matrix, which is just nicer to do. If we now say: matrix 3a, give me measurement 2 and then column C... why is that measurements? Sorry, I made a typo. Right, now it tells me that for measurement 2 in column C the value is 22. Of course, this makes more sense if you have a matrix with measurements on animals or plants or fish, where you can say: from these fish, give me the date at which they were caught. That's much nicer than saying: from this matrix select the 7th to the 9th column and then give me row number 5. If you can use names or descriptive headers, you should, because that just makes life easier for yourself. I always try to get good row names and good column names for my matrix, so that it's easier to select, and when people read the code it's much easier to understand what's going on. If I take out an animal and I do that by the name of the animal, then everyone knows what I'm doing. But if I just say from this matrix give me row number 5, then for people it's really hard to understand
what's going on there, because you're just selecting 5. All right, so these are my answers, and I will upload this file to Moodle, of course, so that you guys can compare it to your own answers. And of course if anyone has other creative solutions, like "I did it in this way, is this correct as well?": probably yes. I'm always interested in new creative solutions that you guys came up with, because sometimes the answers that students give me are really surprising, and it's fun for me to see how people think. One of the students sent in their whole assignment to me, saying "I know we are not supposed to send in assignments, but could you look at it?", and that was actually quite interesting for me, because then I also see where people go wrong. So if you have any questions and want feedback, like "I did it like this, is this okay?", just send it to me or ask it in chat and we can discuss. There are a lot of different ways of achieving the same goal, and that's true in many programming languages. There's generally no one right way of doing things; there are like 10 or 12 ways of achieving the exact same goal. That's not bad. It allows you to be more creative and to develop your own programming style more easily, which I think is good, because a programming language should not limit you; it should empower you to do the things that you want to do. All right, so if there are no questions about the assignments, then what are we going to talk about today?
For today, I have the following topics that I want to talk to you about, and they are also in the description below, so it should be clear for everyone what we're going to talk about. We are going to talk a little bit more about variables in R, and I wanted to make the whole analogy which I normally use, because I'm a very visual person. I like to see things in front of me, and that also holds for code. If I read other people's code, in my mind I start to build little factories with conveyor belts and little trucks driving between the factories, taking boxes from one factory to the other. In my mind that works really well to figure out what's going on in the code: to think of functions as factories and of control statements as conveyor belts, which guide things from A to B. So I want to run through this analogy with you guys. With control structures, of course, I want to talk about the difference between statements and expressions, because I think that's important and there is an exam question about it, so you guys should know about it. Then we will talk about functions: how to bundle up code that you have created into a function so that you can reuse it. This was kind of the big invention of the early modern-day machines like the ENIAC: that it supports subroutines, and that's another name for a function. It's just a routine which you execute, and after it's executed it returns the answer and deletes everything it used internally. So it's a really nice way of making your code accessible to other people. Then I wanted to have a couple of slides about brackets, because there are many different types of brackets that we use in R, like round brackets and square brackets, and when we talk about control structures and functions we also start introducing curly brackets.
So I wanted to give you one slide that explains when to use which bracket, so that it's not as confusing. Then I want to talk to you guys about escaping the inevitable, because when you use double quotes in programming they have a special meaning, but sometimes you want to write text which has double quotes in it. That's kind of a catch-22: how do you do that? There is an escape for that, so you can escape this inevitable pitfall. Then I wanted to talk about randomness and distributions: how do I create reproducible randomness? We are doing science, and science is a field where we want reproducible research. Even if we are using random numbers, we still want to be able to repeat what we did and not fall into the trap that when we generate random numbers we get random outcomes, because we want our analysis to be repeatable; that's rule number one in science. And then again, like last week, I wanted to say a couple of things about clean and reusable code, because I think it really helps beginning programmers to be kind of forced into writing clean code: code which looks good, and where you can see from its structure what it's doing. And I want to talk a little bit about reusable code, which ties in with functions: code which you can use over and over again. That means you should not take the fifth column, but take something by name, because if you get another data set where all of a sudden the fifth column is not the fifth column anymore, then your code stops working. But if you take the column called date, then it doesn't matter which column it is. It can be the first one, it can be the 11th one, right?
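A small sketch of why selecting by name is more robust than selecting by position. The two data sets, their column names and their values are made up for illustration.

```r
# Two hypothetical data sets with a date column in different positions.
fish1 <- data.frame(weight = c(1.2, 0.8),
                    date = c("2019-05-01", "2019-05-02"),
                    stringsAsFactors = FALSE)
fish2 <- data.frame(date = c("2019-06-01", "2019-06-02"),
                    length = c(30, 25),
                    weight = c(1.1, 0.9),
                    stringsAsFactors = FALSE)

# Selecting by position depends on the column order.
fish1[, 2]   # the dates
fish2[, 2]   # not the dates anymore, but the lengths

# Selecting by name works wherever the column sits.
fish1[, "date"]
fish2[, "date"]
```

Code written against `"date"` keeps working when a new data set arrives with the columns in a different order.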
So that's why it's useful to select things by name and not just say take the fifth column from this matrix and subtract five, because in the next data set that you get the fifth column might have shifted to the seventh column. Good, so it's 10 minutes to three. We'll do a couple of slides, I think the variable part, and then we will have a little break where you guys can watch some animated GIFs and listen to some music. But I want to get a little bit into the lecture, so let's start. I told you guys variables are boxes. You can put things in, and you can use this box without knowing what is in the box. This is just a very generic image of a box, but it's a black box, so you have no idea what's in it. We do a lot of things with these variables. Variables are more or less the workhorse of programming, because they store stuff. They are like the messenger RNA of programming: everything in programming happens via variables and assigning things to variables, like having a variable called result which is updated continuously by adding things from a matrix or something like that. So I just wanted to repeat for you guys: if you have a variable you can ask for the length. The length of an object gives you the number of elements in this variable, right?
Because we might not know how many elements we read in from a file, and we might not know how many elements the user gave to us. So it is very important to know the length of an object, because often when we have a vector we go through it from one to the end of the vector, and the end of the vector is given by its length. Matrices of course have a number of rows and a number of columns. I also told you guys about the str function. If you have a very complex object, like a list with a matrix in there, and then on the second element of the list there's a vector, and on the third element there's another list, then str gives you a textual overview of how the object that you are currently looking at is structured. The class tells you which class or type an object is, and it is a very useful function when R gives you a warning or an error message saying that this is not a numeric type. Class can help you avoid these things by checking: if the class of this variable is numeric, then I can do a computation, but if the class of the object is character, then of course I cannot apply the square root function, because I cannot take the square root of the letter c; that makes no sense. And then of course we can go from one type to another by using as dot type, so as.logical will transform something to a logical. We can also use is dot: is something a logical, or is something a vector? So, control structures are conveyor belts. We have boxes, so we now have the ability to write code which puts things in boxes, but we want to do something with these boxes. We want to write algorithms. We want to say: if I get a box and this box is a... "Is the PDF file of this presentation already uploaded?"
No, no, I was still drawing little drawings for you guys until like 15 minutes before we started, so unfortunately I haven't converted and uploaded it yet. Do you really want me to? Because we have a break coming up, I can save it for you guys in the break and upload it on Moodle, so then you can write your notes on the side. Let me do that for you; I will upload it during the break. So, control structures are conveyor belts. You can put stuff in boxes, but then these boxes need to go somewhere, because otherwise you don't have a functioning program, and this is based on a fixed algorithm. Control structures guide the boxes to the correct destination, and you write the algorithm for where the boxes go. There are only two different types of control structures. That's not entirely true, but for the sake of this lecture we will only discuss the two major control structures, and those are branching and looping. Branching is a conveyor belt which splits into multiple branches. Looping is a conveyor belt which takes the box and drops it back on the conveyor belt at an earlier point. You can think of a conveyor belt which goes in a circle, where the box drops down below and goes through the circle again. So that's looping: doing something repeatedly. While the box is not full, bring the box around and put more stuff in it. That's looping. All right, so let's start with branching. Imagine that I have a box and there is something unknown in the box. So not me but someone else did a statement saying take my box and put the value TRUE in there, or put the value 12 in there, or put the letter c in there. So imagine that I have no idea what is in the box. Then I can still figure out what's in the box by using branching, right?
I can do an if statement, so I can do a test. I can say: if the class of the box is character, call a function called right on the box, so move the box to the right track of my code. If the box is not a character, then I want to move it to the left side. The structure here is that you write the keyword if, and then between round brackets you write your statement; the statement is something that evaluates to TRUE or FALSE. Here it is class of box, and then ==, the equality comparison operator, which compares whether two things are equal. This is not an assignment. There's a big difference between using = and using ==: a double == is a question, "is equal to", while a single = assigns something to something else. So as a basic example: if the class of the box is character then call the function right, else call a function called left and give it the box. So take the box and move it to the left or move it to the right. An if statement is relatively easy to read, but sometimes you want to have multiple branches: a conveyor belt which does not split into two, but into four or into six different routes for the rest of the code to take. Then you can use a switch statement. A switch statement is very similar to an if statement. The only thing is... "Can you give an example in R please?" Yeah, sure, I can give an example in R. I go to Notepad++, because I never write if statements or anything like that directly in R, since it's a multi-line statement. So for example, I can say runif; runif draws a random number. So give me one random number. And I have to save this, otherwise I don't have code highlighting. So example.R; no, call it example2.R. All right, so I have v: I make a variable and I put a random number in there.
So when I run this code in R, like this, I have no idea what's in v. When I check v it's 0.6, but if I run the statement again I draw a new random number, so now it can be anything: 0.57. So imagine that I now want to check whether the number in v is smaller than 0.5. I can say: if v is smaller than 0.5, I want to do something. For example I want to say cat, or print; let's do cat: "v is smaller than 0.5". And else, so if v is not smaller than 0.5, I want to do something else, and I want to say for example "I am a robot". I don't know, you can print anything to the screen that you want. So what I'm doing now is drawing a number, and I have no idea what the number is, and I use an if statement to check. If it's smaller than 0.5 I print this, and otherwise I say I am a robot. So now when I copy the code, go to R and paste it in, the first time it says I am a robot. So now I know that v is going to be larger than 0.5, which it is: it's 0.7. I can run the code again and it says again I am a robot, so v again is larger than 0.5. And I can run this code over and over, and every time it will tell me whether v is larger or smaller than 0.5. Is that a good example? Although it says I am a robot, but that's just what I came up with. Normally you would do a computation with it, for example the square root. You can't take the square root of negative numbers, so you would use an if statement to protect against something like that, right?
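The live example above could look roughly like this. It is a sketch reconstructed from the spoken description, so the exact text of the file typed on stream may differ; the variables `number` and `result` in the second part are hypothetical.

```r
# Draw one random number between 0 and 1; the value differs on every run.
v <- runif(1)

if (v < 0.5) {
  cat("v is smaller than 0.5\n")
} else {
  cat("I am a robot\n")
}

# Guarding a computation: only take the square root of non-negative numbers.
number <- -4   # hypothetical input value
if (number >= 0) {
  result <- sqrt(number)
} else {
  warning("cannot take the square root of a negative number")
  result <- NA
}
```

Without the guard, `sqrt(-4)` would return NaN together with a less specific warning.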
So you would say: if the number is larger than 0, take the square root, else give me a warning. That's a more logical, more reasonable example, but this is kind of how it works. An if statement is really good when you want to branch into two. If you want to branch into more, we have to use the switch statement or an else if, which we will explain on the next slide. Good. With this little example I think we are at three o'clock. For you guys I prepared a really nice set of animated GIFs, and the first one, because I did this this morning, is going to be elephants, I think. Yeah, I need coffee, and a little bit of sugar as well. So you guys get some coffee or take a toilet break or something like that, and enjoy the elephant GIFs. GIFs of boxes on conveyor belts: I wish there were nice GIFs like that, but unfortunately there are not. I actually thought about drawing a conveyor belt, but I gave up this morning when people started bothering me for other analyses. It doesn't matter. You guys get to watch elephants for seven to eight minutes, and then I will be back and we will continue with control structures: first with branching, and then we will go into looping. And there's going to be music. I hope that I don't get a copyright thing again like last week, which gave me a little bit of stress. We got the copyright thing removed, because the equipment that I have (well, I didn't buy it, I got it as a present) protects me against copyright strikes: it comes with a library of music that I'm freely able to use on YouTube. But last week it went a little bit horribly wrong. Anyway, that's not your concern. So you guys enjoy the short break, 10 minutes. I will run down, get a cigarette.
I will try and upload the slides so that Blubbery blub can actually make live notes, which is a perfectly reasonable request. So I will put them on Moodle. See you guys in around 10 minutes, and enjoy the coffee break and the elephants. I did get the lecture up on Moodle, so if you refresh Moodle there should now be "lecture two, more introduction": a PDF file that you can use to put your notes on. I hope that worked. So welcome back. Thank you guys for still being here after the nice elephants and all the other stuff. Let me do that. Good. In the first part we talked about the if, so let's make the if statement a little bit more complex. We can do: if something, do this, else do something else. If we have more than one thing that we want to do, we can use the switch statement, but we can also use else if, so we can also make multiple routes. The switch statement that I just showed you guys, which is really short and nice and clean, you can also write like this. You can say: if the class of the box == character, go to the right; else if, and then again a new check, the class of the box == logical, go to the middle; and all the other boxes I send to the left by calling the function left. This is the way that you can add more than one clause to an if statement, so you can use multiple routes or multiple clauses.
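The else if chain described above can be sketched like this. The functions `right`, `middle` and `left` are named on the slide but their bodies are not shown in this transcript, so the placeholders below, and the wrapper functions `route` and `route_switch`, are assumptions for illustration.

```r
# Placeholder functions standing in for the three routes (hypothetical bodies).
right  <- function(box) "right"
middle <- function(box) "middle"
left   <- function(box) "left"

route <- function(box) {
  # class(box) has length one for the simple values used here.
  if (class(box) == "character") {
    right(box)
  } else if (class(box) == "logical") {
    middle(box)
  } else {
    left(box)
  }
}

route("c")    # "right"
route(TRUE)   # "middle"
route(12)     # "left"

# The same routing written with switch; the unnamed last argument is the default.
route_switch <- function(box) {
  switch(class(box),
         character = right(box),
         logical   = middle(box),
         left(box))
}
```

Both versions send characters to the right, logicals to the middle and everything else to the left; the switch form stays compact as the number of routes grows.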
So you can say: I want to treat characters differently from numerics, and numerics differently from logicals. I hope this is clear. The else if is something that you just have to practice; you get good at if statements by writing a couple of them. During the assignments there will be a couple of if statements that I want you guys to write, and a couple of if statements or switch statements where you can use the else if as well. It's something that you just have to learn by doing. I can show you 10 different if else statements, but in the end you have your own problem that you want to solve. For data analysis, the if and else statements define your algorithm, and each algorithm is suited for a specific type or structure of data. So what if we want to do more or different comparisons? We can say for example if x is smaller than five. That's what we just used in the little example, only there we used 0.5. Of course, the thing that we are testing in the if statement has to evaluate to a single logical value, so it can only be TRUE or FALSE. So we can't easily use an if statement when we are dealing with vectors. If we have 10 numbers, then saying if 1 to 10 is smaller than five doesn't work, because then we have 10 times a TRUE or FALSE value. So you have to be aware that the test in an if statement has to evaluate to a single TRUE or a single FALSE. We can also do the same thing when we have two variables, right?
So we can say: if variable x is smaller than variable y, then print something to the screen. We can of course also use vectors, but then we have to decide whether what we want is any or all. We can say: if all of the numbers in x are smaller than five, then we want to print this. If we are okay with not all of them being smaller than five, but we require that at least one of them is, then we can use the any function: if any number in x is smaller than five. In this case we read from the inside out, which is very common in mathematics but also in programming. First we check whether the numbers in x are smaller than five, and then we check whether any of these are TRUE. Let me give you a quick example to show you guys how that works. If we have the numbers 1 to 10, then we can say 1 to 10 smaller than five, and it will say TRUE, TRUE, TRUE, TRUE, and then FALSE for the rest, because only the first ones are. Now if I ask all, it will say FALSE, because not all of the numbers are smaller than five. But I can ask the question any number smaller than five, and this is of course TRUE, because there are four numbers which are smaller than five. Good. Be aware that this will become very complex at a certain point. If we have our own data, then we might want to say: take all of the data from my matrix where the measurement date was in 2019, and the measurement was higher than five, and some other parameter. So when you're writing your own algorithms, you're building very large and complex if statements or selection statements to select from the matrix of data that you have, so that you get the exact subset that you want. Because generally we don't compare across all of the years.
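A short sketch of the any and all checks and of a combined selection, as described above. The `year` and `value` vectors are made-up data for illustration.

```r
x <- 1:10

x < 5        # TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
all(x < 5)   # FALSE: not every number is smaller than five
any(x < 5)   # TRUE: at least one number is smaller than five

# any() and all() reduce the vector to the single TRUE or FALSE that if needs.
if (any(x < 5)) {
  cat("at least one number in x is smaller than 5\n")
}

# Combining conditions with & to select a subset (hypothetical measurements).
year  <- c(2018, 2019, 2019, 2020)
value <- c(7, 3, 9, 8)
value[year == 2019 & value > 5]   # 9
```

The single `&` compares element by element, which is what a selection needs; inside an if test the result still has to be reduced to one value.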
We just say: do year 2009 and do the computation, then do the computation for 2010 as well, and so on, and repeat. So, looping: if we want to do an operation multiple times, we can do this using a for loop or a while loop. For example, imagine that we have a variable called box and we put a thousand into the box. Then we can say: I want to take the value one out of the box, then two, then three, then four. Every time I open up the box, take something out and close it again, and in the end I want to know how much is left. And imagine that I want to do this not one time but 10 times: first I take the value one out of the box, then two, then three, then four, then five, and so on, and in the end I want to see what's left. I can write this down as an algorithm by saying for x, where x is a new variable that I define and that I only use within the for loop. The for loop itself, this is the statement, and here is the beginning: the curly brackets denote the block of code that belongs to it, just like in the if statement. In the if statement we also have this curly bracket, and the if statement ends here. So within this if statement I know that x is smaller than five. After the statement I don't know that anymore, because that is not inside the curly brackets. So: for x in 1 to 10, right?
So do this 10 times. The first time that I go through the loop, so the first time that I execute the thing between the curly brackets, x has the value one. The second time x has the value two. What I'm going to do is take the box, remove x items from it, and store the result in the box again, so I just overwrite the value that I had. The first time this happens it will say a thousand minus one is 999; the next time it will say 999 minus two is 997, and so on until x reaches 10. Then the loop is done and we continue with the code below. You can use a for statement when you know how many times you want to do something. Generally, if you have a data matrix with your own data, you might want to do something for each of the rows or for each of the columns in the matrix. Then you would say for x in 1 to the number of columns of my matrix, and take out the first column, then the second column, then the third column, because it will loop through every column of your matrix. You can do the same thing with a while loop. So why are there two ways of doing this?
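The for loop on the box can be sketched as below. The second loop over the columns of a small matrix `m` uses made-up data to illustrate the "for each column" pattern.

```r
box <- 1000
for (x in 1:10) {
  # Take x items out and store the remainder back in the box.
  box <- box - x
}
box   # 945, since 1 + 2 + ... + 10 = 55

# Looping over the columns of a matrix (hypothetical data).
m <- matrix(1:6, nrow = 2)
for (x in 1:ncol(m)) {
  cat("column", x, "sums to", sum(m[, x]), "\n")
}
```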
Well, when you know how often you want to do something you can use a for loop. But if you don't know how often you are going to do something, then you need to use the while, because the while loop just checks whether something is true. It continues looping until the statement is not true anymore: if the statement is TRUE it continues, and if it is FALSE it stops. If I want to write the same algorithm, which is relatively short here using a for loop, using a while, then I need two variables. I need my box variable, which I put a thousand in, and I need another variable for the number of elements that I'm going to take out; that's why I call it take out. The first time that I take something out of the box I take out one, which is why I'm assigning one to take out here. And now I write my statement. I say: while the variable take out is smaller than or equal to 10, execute these two lines of code. Within this block there are two expressions, and each expression will be executed, and every time this is executed I know that the take out value is smaller than or equal to 10. So again I do the same thing as before: I take my box, subtract the number of elements that I want to take out (the first time take out is one), and store this back into the box. And then in the while loop I have to remember to manually update my take out variable, because the second time I need to take out two elements. So I have to say: the new value of take out will be the old value of take out plus one, and then store it back into take out, right?
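The same box algorithm written with a while loop, as a sketch. The variable name `takeout` is an assumed spelling of the spoken name.

```r
box <- 1000
takeout <- 1
while (takeout <= 10) {
  box <- box - takeout
  takeout <- takeout + 1   # update the counter by hand, otherwise the loop never ends
}
box       # 945, the same result as the for loop version
takeout   # 11, the first value for which the condition is FALSE
```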
So this means that every time I go through this loop takeout is increased by one, and we start at one.
Leonardo Price asks a question: is it also possible to type takeout += 1? No, that is not a valid operator in R; += does not exist in R. You have to write takeout = takeout + 1. You have to be very explicit in R.
"Example for while, please, if possible." Sure, what do you want to do? Let me come up with a nice little example for a while loop. We can use random numbers again. So I say v = runif(1): give me one random number and store it in v. And now I add a count, which is initially zero. What I'm trying to write is: keep drawing random numbers until I hit one that is higher than a certain value. So I say while v is smaller than, for example, 0.9, I want to do something. What do I want to do? Well, I want to update my count, so count = count + 1, and then I want to draw a new random number, so v = runif(1) again. This will loop until I draw a random number which is higher than 0.9, and the number of times I went through the loop is stored in the count variable. To make it explicit I'm going to use cat and print "Count is", count, ", value is", v, and then a new line. So every time I go through the loop I update my count by one, I print the count and the value of v, so the current random number, and then I draw a new random number. This just loops and loops until, by chance, I get a random number higher than 0.9.
So if I run this in R, it should spam some output. And now I see that it actually did three iterations.
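The while loop dictated above can be sketched roughly as follows. The exact text passed to cat is reconstructed from the spoken description, and the set.seed call is an assumption added only so that the sketch gives the same draws every time it is run:

```r
set.seed(1)                # assumption: not in the lecture, only for repeatability
v <- runif(1)              # one random number between 0 and 1
count <- 0

while (v < 0.9) {
  count = count + 1
  cat("Count is", count, ", value is", v, "\n")   # print before drawing again
  v = runif(1)             # draw a new number; the while statement checks this one
}

# after the loop, v is the first draw that was not smaller than 0.9
cat("Stopped after", count, "iterations with v =", v, "\n")
```

Because the value is printed before the new draw, the number that ends the loop never shows up inside the loop output, which is why the final cat is added here.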
In the first iteration the random number that it drew was 0.08, so that's not higher. In the second iteration the random number was 0.74, and in the third iteration it was 0.72. So why did the loop stop? Because I print the value first and then draw a new number, and that new number is what the while statement evaluates. So I know for sure that the current value of v after the loop is 0.9 or higher. What was the number that was drawn? 0.936 or something. If we run this again, we are very unlucky: it took 35 draws to get a random number higher than 0.9, and the value we ended up with is 0.9053. So every time I run this code it randomly draws numbers, and as soon as the number it drew is higher than 0.9 it stops. So afterwards I know that v is 0.9 or higher.
Yeah, no worries, no worries, that's what we're here for. So ask for examples if you want to learn more.
Good. So here I'm doing the box. This is the for loop, because I know I want to do this 10 times, and this is the while loop; with the while loop I can continue doing something until a certain condition is met.
A little example for the for loop. Imagine that I take all of the even numbers from 2 to 100, and my question is: what is the total sum of all of the even numbers from 2 to 100? So I generate all of the even numbers and I define a new variable called total. Initially I haven't added up any numbers, so initially the total is zero. Then I say for number in even. Because even is a vector, I'm allowed to do this: I don't have to say 1 to the length of even or something like that, I can loop through a vector directly. So for number in even, what do I want to do?
Well, I want to take the number that I'm currently looking at, add it to the total, and store this back into the total. So what is happening? The first time I go through the loop my number is two, so the total is zero plus two is two. The next time the number is four; my total from the previous iteration was two, plus four, so now I store six into total. The next number is six, so I have my total from the previous time, which was six, plus six, the new number, and now I have 12. And this continues. So if I want to add up all of the even numbers from two to a hundred I can use this for loop, and at the end I can just type total and it shows me the sum after going through the loop.
I told you guys that there's a difference between statements and expressions. For the if, anything between the round brackets is called the statement, and when we evaluate it the statement needs to collapse to a true or a false value, because I need to decide whether to execute the expression or the multiple expressions. This is called a block: you have a block open, which is a curly bracket, and then a block close, and you can have as many expressions inside the if as you want. The same holds for the while: a while also has a statement, and everything within the curly brackets is the expressions that belong to the while, just like these are the expressions that belong to the if. And of course, I already told you guys that statements can and will become very complex, right?
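The even-number sum walked through above can be sketched like this. How the slide generates the even numbers is not in the transcript, so the seq call is an assumption; the variable names even, number and total follow the spoken description:

```r
even <- seq(2, 100, by = 2)    # assumption: one way to generate 2, 4, ..., 100
total <- 0                     # nothing has been added up yet

for (number in even) {         # loop directly over the vector
  total = total + number       # add the current number, store it back in total
}

total                          # 2550
```

The statement between the round brackets decides what is looped over, and the block between the curly brackets holds the single expression that is executed on every pass.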
So you can, for example, negate your value by saying not. Box == 1 tests if the value in the box is one, and then I put a not in front of it, so this is equivalent to saying: if the box is not one, do this expression. You can test multiple things at the same time: if the box is larger than one and at the same time the value in the box is smaller than 100, I want to do something. You can also use an or statement: if the box is smaller than or equal to zero, or the box is larger than 100, do something. So there are many different ways that you can ask questions.
So what can we use? Because we have to do comparisons, we need comparison operators, and R supports these: equal to, which is ==; not equal to, which is !=; smaller than (<); larger than (>); smaller than or equal to (<=); and larger than or equal to (>=). And then of course we also have the negate operator, the not, which is the exclamation mark. I can take any true value and by putting an exclamation mark in front of it it becomes false, and I can take a false statement, put an exclamation mark in front, and it becomes true. It just inverts the answer.
So when I showed you this example, here we have the &&, right?
Which is the and operator. And here we have the or operator, ||, which is two times the vertical stripe thingy. There is a difference in R between a single & and a double &&. This is complex and it will bite you in the ass a couple of times, and you will learn by doing when to use which one. I can tell you a little bit about when you want to use each of them, but I know from experience that it's, uh...
"For total, can you also write the arrow?" Yes, yes. The arrow (<-) and the = symbol are equivalent to each other here. I can tell you why I use an = here. I always try to use the arrow operator when I'm making a new variable, and I use = when I'm updating a variable. That is just something that I do myself. Some people always use the arrow, some people always use =; for assigning like this they are equivalent. But when I look at my code I like to see the difference between creating a new variable and updating an old one. I'm not 100% consistent in that; you can find code online that I wrote where this is not 100% true. But I try to be diligent: every time I define a new one I use the arrow, and when I update something that is already defined I use the =. But yeah, you can write the arrow, that's totally legit and will totally work.
All right, so & and &&. The single & is vectorized. This means that you can use it on logical vectors: it takes two logical vectors and applies the and operator element by element. That lets you build statements that select from vectors. The double && is not vectorized. It only looks at the first element of a vector, so you have to make sure yourself that you only use it on single values.
Someone taught me: always be consistent.
I try to be, but of course there's no such thing as 100% consistency.
So, coming back to the ands and the ors: you have two different types of and and two different types of or in R. One of them is vectorized, meaning that you can use it with vectors, and the other one is not. My rule of thumb is: if you are writing an if statement, you use the double ampersand; if you're writing a statement where you are selecting from a vector, you use the single ampersand. This is something you just have to learn by practicing. I can explain it and give you a couple of examples, but in the end you have to run into it a couple of times: you make the mistake, use the wrong and symbol, and then R will more or less tell you.
The idea is that if I have v1, and v1 is a vector, then I can ask two questions at the same time. I can say v1 smaller than four & v1 larger than two. This only holds for the number three, so the answer here is FALSE FALSE TRUE FALSE FALSE, right?
Because the only number in v1 which satisfies this statement is the number three.
The double one, &&, is not vectorized. If I have TRUE && FALSE, this evaluates to FALSE. If I have a vector and I use the double ampersand, so a vector which is TRUE in the first position and FALSE in the second, && TRUE, it will only look at the first element, and it will also issue a warning. An if statement needs a single logical value. So when you are writing an if statement, always use the double ampersand when you want to ask: is something and something.
The output from a single ampersand or a single or cannot be used directly in an if. The if statement will just take the first comparison, but it will issue a warning message. So when you write an if statement and use the wrong one, say the single vertical line for or instead of the double vertical line of the non-vectorized or, you get this warning message saying that the condition has length larger than one and only the first element will be used. If you get this warning, take it very seriously. It is not critical, but it generally means that something is wrong in your code, and in general it means that you're using the vectorized version of the ampersand or the vectorized version of the or.
Good, I hope that's clear. You guys will encounter this during the assignments; there is a specific assignment that's there to trick you into using the wrong one. If you use the right one you don't get the warning message. But be very, very diligent when R issues a warning message. Really check if this is what you want to do, because in 95 percent of the cases it is not, and R issues warnings for a reason.
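A small sketch of the difference between the vectorized and the non-vectorized operators, under the assumption that v1 holds the numbers 1 to 5 (the slide is not in the transcript) and with a hypothetical box value of 50. Note that the warning described above is what older R versions do; from R 4.2 onwards an if with a condition longer than one element stops with an error instead, so the sketch deliberately avoids triggering it:

```r
v1 <- 1:5                          # assumption: the vector on the slide

# single &: element-by-element, gives one answer per element
both <- v1 < 4 & v1 > 2            # FALSE FALSE TRUE FALSE FALSE

# double &&: for single values only, which is what an if needs
single <- TRUE && FALSE            # FALSE

box <- 50                          # hypothetical value for illustration
in_range <- FALSE
if (box > 1 && box < 100) {        # two questions, one TRUE/FALSE answer
  in_range = TRUE
}

outside <- FALSE
if (box <= 0 || box > 100) {       # the double || is the matching or
  outside = TRUE
}

not_one <- !(box == 1)             # the exclamation mark inverts the answer
```

The rule of thumb from the lecture maps directly onto this: && and || inside if, & and | when the result is used to select from a vector.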
So take warnings very seriously.
So why do we have the single ampersand? It is for comparisons within vectors, so to select the things that I want from a vector. A logical vector, a true/false vector, can be used as an index into a normal vector. Here, for example, I'm creating a variable called 10 to 1 which contains the numbers 10 to 1. You can use the colon operator in the other direction as well: it's not just 1:10, 10:1 also works. Now I can ask the question 10 to 1 smaller than 5, so which values stored in this variable are smaller than 5? It will say FALSE six times and then TRUE four times, because only the last four are smaller than 5.
I can directly use this logical vector as an index into 10 to 1, because often I'm not interested in which ones are smaller than 5; I just want to take out all of the values smaller than 5, to make a subset. So how do I do this? This is a very common idiom in R: you see a variable with a question, and the answer is used directly as the index into the same vector on which the question was asked. So 10 to 1 smaller than 5 gives six times FALSE and four times TRUE, and you use this to select the correct values from the vector.
Let me show you guys how that works in R. Let me first go to Notepad, just as a little example. In x I have the numbers 1 to 100, and now I say x[x < 30]. If I do this, I expect that what is printed is the values 1 to 29. If I copy this and go to R, it looks like this, and it indeed selects only the values from 1 to 29. And I can make it more complex, because I can now use the vectorized operation.
So I can say & x > 10, and now it only gives me the values for which this holds. I can make it even more complex and say & x needs to be even, so x %% 2 == 0, and now I only have the even ones. I can just keep on asking questions, and in the end I have a single line of code which takes 100 numbers and keeps all of the numbers that are smaller than 30, larger than 10, and even. Because when you divide a number by 2 and look at the remainder, which is what %% gives you, and there is no remainder, the number is even. If I want the odd numbers instead, I can write == 1, because 3 %% 2 leaves a remainder of 1 and 5 %% 2 leaves a remainder of 1. So these are the odd numbers from a vector with 100 numbers, smaller than 30 and larger than 10. You can build up statements like this; that's what I'm trying to show you guys with this slide.
You can even use a little trick. This logical vector here has a length of 10, so when I use it, it selects the positions which are TRUE. But I can actually use a shorter logical vector: I can say I have two values, one which is TRUE and one which is FALSE. If they do not have the same length, then what R does is this: from the vector 10 to 1 it selects the first element because that's TRUE; the second is FALSE, so that one is not selected; and for the third element it goes back to the start of this little vector and says the third element is TRUE, the fourth is FALSE, the fifth is TRUE, the sixth is FALSE, right?
So it loops around. This is also a way of selecting the even numbers, just by indexing 10 to 1 with c(TRUE, FALSE). It's just the way it is, but it's tricky, because here it will not even issue a warning, while I think it should: you're selecting from a vector using a vector which is shorter, and R starts recycling the short vector, just copying it until it has the length of the long one. I'm not too happy that this does not give you a warning, but it is a relatively well-used trick in R, where you make a subset selection from a vector using a short true/false pattern, saying I want only the even ones or only the odd ones.
So in R we can write a single one-liner. Instead of doing it like this, I could say: give me the numbers 1 to 100 and then give me all of the values which are even. In this case I say c(FALSE, TRUE): one is FALSE, two is TRUE, three is FALSE, four is TRUE, and it gives me all of the even numbers from 1 to 100. The whole letters assignment that you guys had can also be written down like this, where you take the even letters. If you want the odd letters, you just turn them around, so the first element is TRUE and the second is FALSE. This is a little bit tricky and a little bit mathematical; I just wanted you to be aware that it is possible.
Selecting from a vector is something that you will do a lot, and selecting from a matrix is the same thing. Generally when you're dealing with data you need to make certain subsets: take all of the animals that are below 20 weeks of age, which have a body weight larger than 30 grams, which have wings, and which are not born on a certain day, right?
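The selection idiom and the recycling trick described above can be sketched as follows. The variable name ten_to_one stands in for the spoken "10 to 1" (an R name cannot start with a digit), and the other result names are hypothetical:

```r
x <- 1:100
below30  <- x[x < 30]                          # the values 1 to 29
even_sel <- x[x < 30 & x > 10 & x %% 2 == 0]   # 12, 14, ..., 28
odd_sel  <- x[x < 30 & x > 10 & x %% 2 == 1]   # 11, 13, ..., 29

ten_to_one <- 10:1
small <- ten_to_one[ten_to_one < 5]            # 4 3 2 1

# a logical index shorter than the vector is recycled, without any warning
every_other <- ten_to_one[c(TRUE, FALSE)]      # 10 8 6 4 2
evens <- (1:100)[c(FALSE, TRUE)]               # 2, 4, ..., 100
odds  <- (1:100)[c(TRUE, FALSE)]               # 1, 3, ..., 99
```

Every line follows the same pattern: ask a question of a vector, then use the TRUE/FALSE answer as the index into that same vector.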
So these statements can and will become very, very complex. I already showed you guys that you can combine these vector statements with the pairwise and or the pairwise or. Here we do the same thing as before: we take 10 to 1 larger than 3 & 10 to 1 smaller than 7, and we can even assign this to a new variable and then use this variable to select from 10 to 1. You don't have to make this new variable, subset; you can put the comparison in directly. But generally it's cleaner to have a variable which you use as the selection criterion in your code. Vectorized statements, you're going to use them a lot, and it will take you some time practicing, because it's just complex and you have to do it a couple of times. At a certain point it clicks and you're like, oh yeah, this is how I select from a vector, or from a matrix for that matter.
So this works for matrices as well. Just a small example: if I have matrix one, which is a matrix of the numbers one to nine with three rows and three columns, I can say: from matrix one take the first column, and which numbers in the first column are smaller than three? I'm taking the first column, so that's one, two, three, and I ask which of these are smaller than three, so that's one and two. Then I can use this to directly subset my matrix, and now I have matrix one, square bracket, matrix one column one lower than three, comma, right?
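The two selections described here, first through a stored selection variable and then on a matrix, might be written as in this sketch. The names ten_to_one, matrix1, picked, keep and smaller are assumptions standing in for what is on the slide; subset is the variable name mentioned in the lecture:

```r
ten_to_one <- 10:1
subset <- ten_to_one > 3 & ten_to_one < 7   # store the TRUE/FALSE answer first
picked <- ten_to_one[subset]                # 6 5 4

matrix1 <- matrix(1:9, nrow = 3, ncol = 3)  # filled column by column
keep <- matrix1[, 1] < 3                    # first column is 1 2 3: TRUE TRUE FALSE
smaller <- matrix1[keep, ]                  # the rows where keep is TRUE, all columns
```

The comma inside the square brackets is what makes this a row selection: the logical vector goes before the comma, and leaving the part after the comma empty means all columns.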
So take all of the rows which are TRUE and drop the rows which are FALSE, and I get a subset of my matrix. Of course I can do the same thing and ask: from matrix one take column one, which of these values is equal to three, and select this from matrix one as well. In this case we get a vector, because now only one of the rows is selected. When you select two or more rows this returns a matrix; when you select one row this returns, by default, a vector.
Good. Normally this is where the break is, but we're running a little bit behind schedule because I didn't plan on doing the assignments. That's okay, we can just take a little bit more time.
I told you that there are only two types of control structures, and I already told you at that point that I was lying a little bit, because there are more types of control structures in R. There are some special control structures, like the warning. You can have, for example, an if statement checking if x is smaller than or equal to zero, and then issue a warning. If I'm writing code and someone else is using it, they should be informed that by providing a value of x which is smaller than or equal to zero, stuff might go horribly wrong. I might have tested my code and know that it works perfectly for numbers larger than zero, no bug whatsoever, but I never tested it for numbers below zero. So then I issue a warning saying: well, you're on your own now, right?
So be aware that I didn't test this and it might go horribly wrong. Of course you might want to give a more descriptive warning message saying why it might go wrong, but this is for when you haven't tested it. If you know that stuff will go horribly wrong, don't use a warning, use an error.
An error is kind of a hard bailout. If you call the stop function, R stops the computation: it quits, it breaks off everything that was running, prints the error, and you are dumped back at the prompt for the next command. So an error is also a control structure, because it allows you to control the flow of the program. The stop function is more or less the big red button in a factory: if you push it, the whole factory shuts down and everyone walks outside. With the warning, you press the warning button and a siren goes off, but everyone keeps working.
To deal with warnings and errors you also have the tryCatch statement, and this is very, very advanced. It is something that people generally don't discuss in lecture number two, but since we're talking about control structures I want you guys to be aware that there is a tryCatch. If you have an expression which might cause an error, you can actually ignore the error. So you say tryCatch and then, for example, take the square root of a negative number. (Strictly speaking, R reports that one as a warning and gives back NaN rather than an error, but the way you write the handler is the same.) Then you say error = and then a new function that you are creating, and in this case it prints "trying to recover". So it will kind of eat up the error, right?
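A rough sketch of the three pieces just mentioned: warning, stop and tryCatch. The messages and the value of x are made up for illustration, and stop is called explicitly here so that there really is an error to catch (the square root of a negative number on its own only produces a warning):

```r
x <- -1                              # hypothetical input

# warning: the siren goes off, but the code keeps running
if (x <= 0) {
  warning("x is smaller than or equal to zero, this was never tested")
}

# stop inside tryCatch: the big red button is pushed, but the handler takes over
result <- tryCatch({
  if (x <= 0) {
    stop("x must be larger than zero")
  }
  sqrt(x)
}, error = function(e) {             # e holds the error that was raised
  print("trying to recover")
  NA                                 # value handed back instead of a result
})

result                               # NA, and the script continues after this line
```

Without the tryCatch around it, the stop call would end the computation and nothing after it would run.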
So it's like someone pushes the error button and everyone starts walking outside, but there's a guy who says: no, there's nothing wrong, just continue. The tryCatch statement is very, very complex and I just wanted to show you guys how it is written down in R in case you need it in the future. We're probably only going to write it once during the assignments; it is very advanced, and only in lecture number seven or eight does it really start coming up, when we start talking about statistics.
Sometimes you want to do a t-test for every row in a matrix. But a t-test on a row with too few values will give an error, because the test needs enough observations in each group. If you're going through a matrix with a million entries, you don't want everything to stop with an error at entry 23. Just because it cannot do the test at number 23 doesn't mean that you don't want to do it for all of the other ones. So in this case you can wrap your t-test in a tryCatch statement: just try to do the t-test, and if you get an error, continue with the next one. Otherwise you would have to first check if there are enough numbers and all of the other requirements. But you can just say: no, just be dumb, for every row in my matrix do the t-test, and if there is an error just continue with the next row. Good.
So, a short overview. Variables are boxes; they store things. Control structures manage program flow. You have branching: going to the left or going to the right, testing if something is true, and based on this you do either a, or b, or c. And then we have looping, and looping is a control structure which folds back on itself: for, or while something is true.
So for every column in the matrix, do something; or while the user is giving me numbers, do something. The user can then give an endless number of numbers, and at a certain point we want to stop, when the user gives a certain number. The while loop is used when you don't know how often you should do things; the for loop is used when you do know how often you want to do something. The for loop is the most commonly used one, because often you are dealing with data that comes from a matrix or another source, and then you just want to say for each row in the matrix or for each column in the matrix.
And then we have warnings and errors. Warnings and errors are control structures because they do change the flow of the algorithm, but they are more severe. It's not an if statement, it's not a branch, it's not a loop. It's saying: something might go wrong, or something will go wrong, so please be careful from this point on.
Good. So let's have a little vote in the chat to see how many people are still awake. Do we want to do the break now, or do we want to continue for 10 more minutes and then do the break in the middle of the next section? The next section is functions, relatively complex as well, but something that is fun. So throw it into the chat, also if you're on the Zoom thing: say that you're awake and say break or continue. Then I get a bit of a count of how many people are still here, and we can decide if we want to break now or in 10 minutes. Let me be the first one so that people are not shy: I am going to say break. Uh, I don't know, 10, what's 10, Davido? Is 10 break or continue? Continue, probably, right? Continue for another 10 minutes. Oh guys, it's such beautiful weather. Break outside, break outside. Yeah, break, break, break. No comment. Don't mind.
I don't know what the weather is at your place, Misha, but here it's super sunny, so I really just want to run outside for 10 minutes and enjoy the sun a little bit. So Davido, you say continue. You're the only one. No, Helen Neama is also saying in 10 minutes. All right, 50-50. No, it's not really 50-50, because I voted for break as well. Frogs, frogs are real. Okay, so yeah, this is kind of 50-50. All right, I'm going to make an executive decision and say we are going to take a break. Oh, the weather is 50-50? Yeah, I'm very sorry for you. The total Zoom vote came up with break, so everyone on the Zoom voted break. So just so that you guys on YouTube know, you were overruled by the Zoom vote, which is nice because no one can see it, because no one is logged into the Zoom. Or almost no one.
Good. So we will have a short break, again seven to 10 minutes, depending on which song I start in. Let me see if I remember: the second break is going to be ducks. So duck and cover while the ducks entertain you guys for 10 minutes. We'll be back in seven to 10 minutes. Enjoy the ducks.
All right, I had to duck a lot. I was chased by this really annoying wasp outside; it chased me around half of the campus before I got rid of it. Anyway, let's continue. Third part of the lecture, and we're going to ramp up the difficulty a little bit, because we now know about variables and we know about control structures, and I wanted to talk to you guys about functions. First, some advanced looping, and this is going to come back in another lecture.
So there are more ways of looping in R. Like I told you guys, R provides a lot of different ways of doing the same thing. We can use the lapply function: we apply to X, and X here is a vector or a list, a certain function, and the three dots are additional parameters that we can pass on to that function.
"I think I encountered the same wasp this morning." Yeah, I think it's been haunting campus for the last couple of days. I've encountered it a couple of times and it's a really big one, so you don't want to get stung by one of those.
We can also apply to a matrix. If we want to go through a matrix by rows, or by columns, we can use this system as well. Instead of writing a for loop or a while loop, we say apply to X, where X should be a matrix, and then we have to specify the margin. If you fill in a one for the margin you say do it for each row; if you fill in a two you say do it for each column. And FUN is the function that you want to apply.
I'm going to give you guys a very quick example of this, because it will come back, but I think it's good to show it. I can say x is 1 to 10, so this is a vector. Then I can lapply to my vector x a certain function, and the function could be the divide function, and I want to divide by eight. What this does is go through all of the numbers in the vector x and divide each of them by eight. If I run this in R, it outputs all of the numbers divided by eight: one divided by eight, two divided by eight, all the way to 10 divided by eight.
We can do the same thing for a matrix. For example, we say matrix, give me 100 numbers, so 1 to 100, with 20 rows and five columns, and I call this M, right?
And then I can say apply to M, with margin one, so over the rows, the function mean: calculate for each row the mean of the row. If I go to R, it shows something like this; for each row it now calculates the mean. If I want the mean for each column, I just replace the one by a two, so now I go through the columns, and this calculates the mean of every column of the matrix. So the first column has a mean of 10.5, and the first row has a mean of 41. Just to show you guys the matrix, it looks like this; it is filled column by column.
Good. I actually had an example. lapply also works for lists, so say I have a list where the first element is the vector 1 to 5 and the second element is the vector 1, 2, 3 and then a missing value. Then I can, for example, calculate the mean for each element in my list: lapply, my list, mean. The first mean will be three. The second mean will be NA, because you can't calculate a mean when one of the values is missing; then the mean is also missing. You can fix this by using na.rm, which is remove NAs: it says ignore NAs when you calculate the mean. So now when we do this command, we say lapply to my list the mean function, and every time you encounter an NA, remove it. Now it says that the mean of the first vector is three and the mean of the second vector is two. And because of the double brackets you can see that the return type of lapply is a list in this case: double brackets one, double brackets two. These are the first and second element of my list, shown by position because it doesn't have names. If I named the elements, it would show the names instead.
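The lapply and apply calls from this part of the lecture can be sketched as below. The contents of the list are an assumption reconstructed from the stated means (1 to 5, and 1, 2, 3 plus a missing value), and the result names are hypothetical:

```r
x <- 1:10
eighths <- lapply(x, "/", 8)            # 1/8, 2/8, ..., 10/8, returned as a list

M <- matrix(1:100, nrow = 20, ncol = 5) # filled column by column
row_means <- apply(M, 1, mean)          # margin 1: one mean per row
col_means <- apply(M, 2, mean)          # margin 2: one mean per column

mylist <- list(1:5, c(1, 2, 3, NA))     # assumption: contents inferred from the means
with_na    <- lapply(mylist, mean)               # second mean is NA
without_na <- lapply(mylist, mean, na.rm = TRUE) # extra argument is passed on to mean
```

Here row_means[1] is 41 and col_means[1] is 10.5, matching the numbers read out above, and both lapply results are lists with one element per element of mylist.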
A little example of apply, very similar to what we just did. So we make a matrix, fill it with 1 to 50, 10 rows, five columns, and call this my matrix. Then we can apply to my matrix, over the rows, the mean, and we can apply to my matrix, over the columns, the mean. And then there is this c(1, 2) margin, with which you go through the rows and the columns at the same time, but I've never used it at all, so just forget about the fact that it exists. If there's a question on the exam asking what it means when the margin is set to two, then you say: because we set the margin to two, we execute the function for each column. And if it's set to one, we execute the function for each row. So lapply and apply are usually at least as efficient as a for loop that you write yourself. With a for loop, R can't really know what you're going to do. But when you use lapply, it knows that you're going to do something for each element in the vector, and with apply it knows that you're going to do something for each row or column of the matrix. So it can allocate beforehand the number of elements that you are going to return, which makes it a little bit more efficient. And the nice thing is that there are also parallel versions of lapply and apply, meaning that you can use multi-core CPUs very easily. And like I showed you guys, if you want to execute an arithmetic function, like the plus function, then you should quote it. Otherwise R doesn't understand what you want to do. So if I say lapply to the vector 1 to 2, which is just a vector of two numbers, the function "+", and then five, it will do one plus five and it will do two plus five. So it will return six and seven. These functions will come back, they are very, very useful for a lot of things, and they are very, very quick.
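A small sketch of the quoted operator example, assuming the name my_matrix for the matrix from the slide. It also uses unlist, a base R function that flattens the list returned by lapply into a plain vector:

```r
# 10 x 5 matrix filled with 1 to 50.
my_matrix <- matrix(1:50, nrow = 10, ncol = 5)
row_means <- apply(my_matrix, 1, mean)   # 10 values, one per row
col_means <- apply(my_matrix, 2, mean)   # 5 values, one per column

# Arithmetic operators have to be quoted when passed as a function.
plus_five <- lapply(1:2, "+", 5)         # list containing 6 and 7
as_vector <- unlist(plus_five)           # plain numeric vector: 6 7

print(plus_five)
print(as_vector)
```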
Sometimes things take, well, hours when you just write a for loop, and they can finish in under 10 minutes when you use the lapply or the apply function, just because they are more efficient computationally: R knows beforehand how many results there are going to be. So lapply will always give back a list, while apply usually gives back a vector or a matrix. And generally you don't want to deal with lists, because lists are annoying: they can contain different things at each of the elements. So for example, if we do this lapply to the vector 1 to 2 with plus five, then we can unlist the result to get a vector back, right? Here we see that we get a list back: the first element contains a vector of length one with the value six, and the second element contains a vector of length one with the value seven. But we can use unlist to make this into a vector again. And unlist is a very powerful function, because it can do more than just turning a simple list into a vector, so together with lapply it is a very powerful tool for data manipulation. Good. So to complete the analogy, right? We now know that variables are boxes and control structures are conveyor belts. And in my mind, functions are factories. A factory can contain many boxes, but these boxes are not available or visible outside of the function. This means that you can reuse variable names. Generally you have to assign values to variables to work with them afterwards, and the problem, of course, is that there are only so many unique words.
And in the end, if you had one big script and just wrote it from beginning to end, then at a certain point you would run into issues because you've already used x, or you've already used i, or you've already used your favorite variable name. To prevent this, functions encapsulate pieces of code. Everything inside of the function is not visible from the outside. So you can have a variable inside of a function with the same name as a variable outside of the function, and changing the variable inside of the function will not change the variable outside of the function. This is called the scope of a variable, because a variable lives in a certain scope. If I just define it in R, it exists throughout the session. But if I define a variable inside of a function, then this variable is not visible from the outside. So we can build factories, they can contain many different boxes, and these boxes are not visible from the outside, so inside of a function we can use variable names which we already used elsewhere. And this is called scope. So how does this now fit together? Factories contain boxes and conveyor belts, because a factory also contains conveyor belts. And there are some limitations on functions in R. We can have multiple variables going into a function, so multiple boxes can go in, and these are called function parameters. We can have all kinds of boxes for intermediate computation within our function, and these are called local variables, because they are local to the function that we are defining. And in R there is a limitation, because each factory can only give you back one box. So only one variable or one thing can come out of a function, and this is called the return value. The return value can be very complex: you can return a list with hundreds of items in there.
But you can only return one list. You can't return two lists, unless these two lists are put inside a single list. So we'll get to that. It's relatively abstract, right, but we will make it more concrete by giving you a couple of examples of how this works. We define a function in R by using the function keyword, and we return a box using the special control statement return. So return is another control statement: we have branching, we have looping, we have warnings, we have errors, and we have return values. So let me give you a small example of a function. This is about the most basic function that you can come up with. Well, you can come up with much easier functions, but just as an example: here I'm creating a function which is called box factory. So this is a variable, right, and to this variable I assign a function. This function takes as input three different things, three different boxes. The function starts here, at the opening curly bracket, and ends all the way here, at the closing one. So I'm defining a scope: all of the new things that I define inside of this function are not visible to the outside. Here I'm doing some computation: I'm taking box one, subtracting box two, and then multiplying by box three, and I'm storing this in a new variable called fbox. This fbox is a box which only lives within the factory. So after I execute the function, fbox will not be defined in R. The function does the computation, puts the result in a box, does the rest of its work, and then just throws away the box and totally forgets about it. And then I can return things. So I can say: if the first box is smaller than the second box, return box one. Else, if box number two is smaller than box number three, return box number two.
And if none of these two statements are true, so if box one is not smaller than box two and box two is not smaller than box three, then return the value of this temporary box, fbox. So this is an example of a function. This function has three function parameters, box one, box two and box three, and it has one local variable called fbox. And at each point where I return, I can only return the value of one of the variables. Those are the rules in R that you just have to play by. Good. So a little bit more theory, because in the end we have an exam, and I don't want you guys to just write code on the exam. I also want you guys to know something about programming, right? So there is computer theory here, and it is about the different ways in which we can handle these boxes. This only applies to input parameters, so box one, box two and box three. If I have these boxes, I can give a variable to the function, and what happens to this variable can be done in two ways. I can pass it by value. That means that when I give a box to the factory, the factory doesn't take the box inside. Instead it takes everything from the box and puts it in a new box, so the original box remains untouched. It takes the new box into the factory, does all of its work with that one, and in the end it throws out a new box. This is called pass by value. It means that the function parameters get copied into the function, and changing them inside of the function does not alter the state of the variable that was passed. A lot of programming languages choose this approach, because it is a very sane approach: by giving a variable to a function, nothing has changed once the function ends. Internally, the function can do all kinds of things with this box, but the value that was originally in the box will still be in the box when the function ends.
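Going back to the box factory function walked through above, a minimal sketch could look like this. The names box_factory, box1, box2 and box3 are spelled here as an assumption, and the computation is read as (box1 - box2) * box3, which is one possible reading of the spoken description:

```r
box_factory <- function(box1, box2, box3) {
  # Local variable: only exists while the function is running.
  fbox <- (box1 - box2) * box3

  if (box1 < box2) {
    return(box1)
  } else if (box2 < box3) {
    return(box2)
  }
  # Neither condition held, so hand back the temporary box.
  return(fbox)
}

first  <- box_factory(1, 2, 3)   # box1 < box2, so box1 comes back
second <- box_factory(5, 2, 3)   # box2 < box3, so box2 comes back
third  <- box_factory(5, 4, 3)   # falls through to fbox: (5 - 4) * 3

print(c(first, second, third))
print(exists("fbox"))            # fbox was never created outside the function
```

Each call hands back exactly one value, and fbox never shows up in the session afterwards.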
Other programming languages like C, D or C++ also let you take another approach, the pass by reference approach, which means that the function parameters are the real boxes that were given. They are references to the variables passed into the function, and changing anything in this box alters the state of the variable that was passed. This is called pass by reference. It is very useful but very dangerous, because here we're dealing with something which in languages like C is called pointers. And these can have very surprising effects, because it means that a function can have side effects: after the function ends, the variable that you gave to the function can have been updated. R does it slightly differently. R doesn't use plain pass by value and it doesn't use pass by reference. It uses a structure called pass by promise. That means that function parameters are references to the variables passed into the function, and only when you change them is a copy made. This way, the state of the variable that was passed is never changed by the function. So that's the way that R does it. But I want you guys to know that there's a real difference between pass by value and pass by reference. This is not directly useful for R, because R doesn't use either of these two in its pure form, but it is useful for when you start learning other programming languages. Because at a certain point there are things that R is just very bad at, things like memory management or comparing strings or doing string manipulation, and there are much better languages to do that. So if you switch to a language like Python, where for simple values like numbers a function cannot change the caller's variable either, then nothing really changes, because it functions more or less the same as R. In a strict pass by value language you would just use more memory, because the box is always copied. R only copies the box when you change it.
But if you switch to a language like C or C++, then there is the option to pass things by reference. And then it becomes really interesting, because all of a sudden you give a variable to a function, and once the function is done, your variable is completely changed. There's a different value in it all of a sudden. This can be very surprising to people that learn R first and then learn other programming languages. People that come from a C background or have C++ experience are used to pass by reference, so they are very aware of the fact that there are two different ways of dealing with function parameters. But people that generally program in R are not aware. So once they start programming in another language like C or C++ or D, they run into this big surprise, because they expect nothing to change by calling a function: just a computation takes place and then a value is returned. But in C and C++, passing pointers or references is very common, and then you can be very surprised by an input parameter suddenly changing the global state of your program. So, just some theory. Inside R, parameters get copied when they are changed inside a function. This is called pass by promise, and it can cause some serious issues, because memory usage is one of the big things that R always struggles with. So let's look at updating a variable inside of a function. Here we have an example function: we assign a function to this variable, and it takes one parameter, p1. When I multiply p1 by 8 and store the result in this internal variable p2, nothing changes: p1 is still a reference to my var, which I put into the function. But as soon as I assign something to p1, a copy is made. And this becomes interesting especially when you're dealing with very big data sets, because you don't want to have them copied.
If I have, for example, my data set loaded, and my data set is like four gigs in memory, then I don't all of a sudden want to start copying four gigs of memory just because I want to update a single value in a matrix, right? Because then all of a sudden the program that I wrote is using eight gigs of memory, and I might not have eight gigs in my computer. At that point R will just give an error saying that it cannot allocate that much memory. So there is a real reason not to do this. The easiest would be to say: forget about it, never assign to a function parameter. But that's a little bit wasteful in a way, because sometimes there's no issue. Copying a single value is no big deal. But if you have a function which works on a massive matrix, then you have to be very careful with reassigning to that matrix. So here we have, for example, my var, and my var is five. We give this variable to my example function, so p1 is actually a reference to my var. Here we say my var times eight, store it in p2. Then we do p2 plus five, and now we store it in p1. At this point R will make a copy, so instead of using like eight bytes of memory, it will start using 16 bytes of memory. And if we execute the function, the result of the function is 45, but my var has not changed: my var still contains the value of five. In C, if you wrote this function so that it takes the variable by reference, through a pointer, this would not be the case, and my var would be 45 in the end. In Python it would be five, and in R it would be five as well. Good. So just a little bit of theory, because I think it's good to know the idea behind it: the people who wrote R have certain ideas about how you need to protect against certain bugs, certain common mistakes that programmers make.
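The example function just described can be sketched like this. The names my_example_function, p1, p2 and my_var follow the spoken description, but their exact spelling is an assumption:

```r
my_example_function <- function(p1) {
  p2 <- p1 * 8   # only reads p1, so no copy is needed yet
  p1 <- p2 + 5   # assigning to p1 changes the function's own p1 only
  return(p1)
}

my_var <- 5
result <- my_example_function(my_var)

print(result)    # 5 * 8 + 5
print(my_var)    # the variable outside the function is untouched
```

Whatever R does internally with copies, the visible behaviour is the important part: the caller's my_var keeps its value.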
So programmers really like to reuse variables a lot, even in C++ or in Python. People tend to call variables x, or i, or j, and use them all over the place for iterating through things. But of course, if you expect a function to not change your variable and all of a sudden it does, then you can run into very subtle bugs where the answer in the end is completely wrong. R protects you against this, because R uses the pass by promise system. Good. So a little bit more about function parameters. In R we can set default values, so we can set reasonable defaults for some function parameters. For example, if we have an alpha level, in biology our standard alpha level is 0.05. If we have a false discovery rate, then we can say: I always want to use a false discovery rate of 10, unless the user tells me that he wants to use a different level. Or take n perm, which stands for number of permutations: if I want to randomize my data set, I can say my number of permutations is 1000 by default. So here I have my exp variable, to which I assigned the value of five. This exp lives in the parent scope of the function, so it's not inside of the function at this point. Then I define some function, and this function has an input parameter, which is the first parameter. Then we have the second input parameter, which is called exponent, and by default I assign the value of two to it. Then of course I do the computation: I take my first parameter, raise it to the power of exponent, store it in an internal box, and then I return the internal variable, right? So when I call some function with five, it will return 25: it just does five to the power of two, because exponent is two by default. I can also call some function with five comma exp. So give it the value of five.
And then of course it will do five to the power of five, which is 3125. But to exp nothing happens: exp still remains the same, because I don't touch it. It's not copied, it's just a reference, so this is very efficient memory-wise. So why do you want to use default parameters? Well, it saves other people from having to fill in all of the parameters. The plot function in R has, I think, almost 73 parameters, and of course when you make a plot you don't want to fill in all 73 of them. So 72 of the 73 parameters of the plot function have default values: it has a default point size, a default font, a default color, and all of these you can overwrite if you want to. But if you don't want to, you can call plot with 1 to 10, and it will make a plot of 1 to 10. So defaults allow you to give other people functions whose calls are as small as possible. Furthermore, we have this dot dot dot parameter to functions. It means that your function is now a variadic function, and a variadic function is a function which accepts a variable number of arguments. So if I have no idea how many numbers someone is going to give to my function, I need to use variadic arguments. We have already seen this when we looked at the sum function. No, I don't think we looked at the sum function. Anyway, we can use dot dot dot to make our own variadic functions in R, and a lot of functions in R are variadic, like sum, right? If we look at the sum function, you can see that in R it is defined as a function which takes a variable number of input parameters and has a named parameter called na.rm, which defaults to FALSE. So by default the sum function is not going to ignore NAs.
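The default parameter example from above can be sketched as follows. The outer variable is called my_exp here (an assumed name, chosen so it does not shadow R's built-in exp function), and some_function, input and intern are likewise assumed spellings:

```r
my_exp <- 5   # lives outside the function, in the parent scope

some_function <- function(input, exponent = 2) {
  intern <- input^exponent   # local variable
  return(intern)
}

default_result <- some_function(5)           # exponent falls back to 2
custom_result  <- some_function(5, my_exp)   # exponent is 5 for this call

print(default_result)
print(custom_result)
print(my_exp)                                # unchanged by the call

# sum is variadic and has the named parameter na.rm (FALSE by default).
print(sum(1, 2, NA))                 # NA, because the missing value is kept
print(sum(1, 2, NA, na.rm = TRUE))   # missing value is ignored
```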
But this now allows me to do sum of one, three, four, five, six, seven, eight, nine, ten, right? It will just sum everything up. Normally I cannot define a function without defining how many input parameters there are. But using dot dot dot, I can say: I don't know how many input parameters there are. In this case there are like eight or nine, but there could be 20, there could be 100. So this allows me to define these kinds of functions when I don't know how many elements the user would like to add up, or how many elements the user is going to provide to me. So a little example. If I implemented the sum function myself, calling it my sum, I say my sum and I assign to this a function which has a variable number of arguments. So how do I now access this variable number of arguments? Well, first off, my function starts here and ends here. Since we're going to sum things up, I initially need a variable which contains the sum so far, because I have to keep track of what the total is of the elements I already added up. So I have count, which is the total so far, and it starts at zero. Then I can say for x in list of dot dot dot, so give me a list of all of the parameters that the user put in. And what am I going to do? Well, I take my count, I add x, and I store the result in count again. So it's just summing up everything. When I write the function, I don't know how many times this will be executed, but as soon as the user calls the function I know how many parameters he gave. So if I call my sum and I don't provide any parameters, then of course the sum will be zero, because the list will be empty. But if I give it three numbers, it will add these three numbers together. And I can give it five numbers, I can give it seven numbers, or I can give it 100. So variadic functions are doable in R, you can write them yourself.
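A minimal sketch of the my sum function as described, with the name spelled my_sum as an assumption:

```r
my_sum <- function(...) {
  count <- 0                  # running total so far
  for (x in list(...)) {      # list(...) collects all arguments the caller gave
    count <- count + x
  }
  return(count)
}

nothing <- my_sum()               # empty list, so the loop never runs
three   <- my_sum(1, 2, 3)
many    <- my_sum(1, 3, 4, 5, 6, 7, 8, 9, 10)

print(c(nothing, three, many))
```

This sketch expects single numbers as arguments. Unlike the built-in sum, it would add a whole vector element by element if you passed one in.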
And it's very useful for when you want to provide a function but you have no idea how many elements your user will provide to you. So another example of variadic functions. This is kind of the most minimal variadic function that you can make. You can say variadic test is a variable which contains a function which takes an unknown number of input parameters, and I'm just going to return a list of all of the input parameters. So when I now call this function and say param one is 15, it gives me back a list, and the first element of the list has the name param one and the value 15. I can also call it with the first parameter being 15 and a second parameter called test, which is a vector of 1 to 10. Then it will return a list which contains the first parameter and test. So that's just to show you guys how the dot dot dot parameter works. We talked a little bit about scope already, but to expand on it: variables which are defined inside of functions are not visible from the outside. So if I again define a function called some function, which has a single input parameter, I take the input parameter to the power of two, store it in intern, and then I return this internal box. When I execute the function with the value five, it will do five to the power of two. But when I now type intern, R says object intern is not found, because it is not visible from the outside, right? And this is very useful, because R will clean up memory after itself, so you don't have to deal with large data sets floating around. When the function finishes, everything which was defined internally is cleaned up by the R system, so it's not visible from the outside. And this is called scope. We can, however, access variables in our parent scope.
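The two examples above, the minimal variadic function and the local variable that is invisible from outside, might look like this in code. The names variadic_test, param1, some_function and intern are assumed spellings of what was said:

```r
# Minimal variadic function: hand back all arguments as a list.
variadic_test <- function(...) {
  return(list(...))
}

res1 <- variadic_test(param1 = 15)        # list with one named element
res2 <- variadic_test(15, test = 1:10)    # unnamed first element plus 'test'
print(res1)
print(names(res2))

# Scope: 'intern' only exists while the function is running.
some_function <- function(input) {
  intern <- input^2
  return(intern)
}

squared <- some_function(5)
print(squared)
print(exists("intern"))   # FALSE, provided no 'intern' was defined elsewhere
```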
So in a function you are allowed to access variables which are not input parameters and are not internally defined, but which are already defined in, for example, the R session. This is considered very bad practice, but sometimes we really need to do it, especially when we are dealing with big data. So there are three possible reasons to break scope and encapsulation, that's what it's called when you break outside of your own scope. The first reason is that you're being lazy, and that's not a good reason. The second reason is to save RAM, so to save random access memory. When we're dealing with DNA or RNA sequencing data, which is very, very big data, or we're dealing with hundreds of microarrays, then of course we don't want to copy hundreds of microarrays of data. We have already loaded our data in R, and when we now call a function which, for example, normalizes the data, we don't want to all of a sudden duplicate the memory usage. Because if we have 16 gigs loaded, duplicating it will use 32 gigs, and then the computer will just run out of memory. And third, plot functions also use this system to read environmental settings: for example, what the current color of the background is, or what the currently selected plot symbol is. So plotting functions also have a reason to read outside of their own scope. So Misha said: I use gc after a loop with a lot of data, like sensor data. Yeah, so gc just forces a garbage collection. If you have a lot of data, you start running into the limits of R. And I know that it's a little bit far to talk about this in the second lecture that we're doing, but I do think that you have to be aware that programming is something that you do to analyze data. And most of the people that follow the course are people that are doing biology, right? I'm from a biology department, so I deal with DNA sequencing data a lot.
And if you're in biology, then dealing with DNA sequencing data is going to come up at some point in your career. And sequencing data is literally hard drives full of data. Dealing with these amounts of data requires you to think about what you are doing before you even start doing it. So before you start writing code, you have to think: what am I going to do? How much data do I have? How much memory does my computer have? If I have a hard drive with 200 terabytes of sequencing data, then I'm never ever going to be able to buy that much random access memory. So then either I can't use R, or I have to use R in a smart way, so that I don't run into these massive issues. And it's not just DNA or RNA sequencing data, right? Very small things can explode into very big things. If I'm dealing with things like correlation matrices, and I have 10,000 things and I want to calculate 10,000 versus 10,000 correlations, then of course this is already going to be big data, because it just explodes. The same thing holds for principal component analysis. Many of the analysis tools that we use on a daily basis are very limited by the amount of memory that you have in your computer. So this is an example of just being lazy, right? Here I define a variable called exponent and assign the value of five to it. Then I write a function, and this function takes a single input parameter. But inside of it we see exponent, right? So it's again just taking the input parameter, raising it to the power of exponent, storing this in intern and returning intern. And this is something that you have to avoid at all costs. All of the variable names that you use inside of your function should be either input parameters or temporary variables defined inside of the function.
Because if you do it like this, then you can sneakily change what the function does. The expectation of other people is that if you call a function with a certain value, it should always return the same thing. So if I call some function with five, then this should always return 3125. But if I now change the value of exponent, then all of a sudden calling the same function with the same parameter will give me a different value back, right? If I define exponent to be two, then calling some function with the value five will all of a sudden give me back 25. This is very surprising to other programmers or other people using your code. Generally this leads to massive bugs, which in the end can lead to hundreds of thousands of euros in damages. So never, ever do this. If you're writing a function, make sure that all the variables that you use inside of the function are defined as input parameters or as local variables, so variables internal to the function. So what should the function have looked like? Well, we should have just defined exponent as an input parameter. And here we see the usefulness of scope, because I can have a variable called exponent on the outside and a variable called exponent on the inside, and now they don't conflict. They are two different variables according to R. One of them is a global variable defined in the global scope, and the other one is a variable which is defined as an input parameter. Inside of the function, R will always take the input parameter, even if it has the same name as the global variable. So this is what the function should have looked like. You should not have been lazy, you should have typed the extra input parameter into the function. Good. So we talked a lot about functions. You guys are going to write a couple of functions, you are going to do some while loops and some for loops during the assignment. And this is just to practice.
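To make the lazy version and the proper version concrete, here is a sketch. The names lazy_function and proper_function are made up for this illustration; in the lecture both versions are simply called some function:

```r
exponent <- 5

# Lazy version: 'exponent' is neither a parameter nor a local variable,
# so R looks it up in the parent scope every time the function runs.
lazy_function <- function(input) {
  intern <- input^exponent
  return(intern)
}

lazy_a <- lazy_function(5)   # uses exponent = 5
exponent <- 2
lazy_b <- lazy_function(5)   # same call, different answer

# Proper version: everything the function needs comes in as a parameter.
proper_function <- function(input, exponent) {
  intern <- input^exponent   # the parameter wins over the global 'exponent'
  return(intern)
}

proper <- proper_function(5, 5)

print(c(lazy_a, lazy_b, proper))
```

The proper version returns the same value for the same arguments, whatever the global exponent happens to be at that moment.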
So by now we've seen a lot of different brackets in R. Round brackets are used when you call a function, right? So when we use the... "Please give another example of the exponent thing." Sure, sure. Well, what do you want to see? So let's just do some live coding. A lot of people ask: why should you write functions? So let's make a function that does at least something useful. The name of the function is going to be... all right. So it is a function, right? And this function takes as input a parameter a and a parameter b, and it's going to do something, so it's going to multiply a with b. No, let's make a good function that really does something. Generally you write functions after you've written some code. I don't want to give away one of the assignments, because I want you guys to kind of struggle with it, so let's do something like a box. Let's print out a box, and let's call it do box. So do box is a function. It takes a single parameter which is called n row. That's a bad name, so let's make it n.row. And by default the box contains five rows. So I'm going to say for x in 1 to n.row, I'm going to do something. And what am I going to do? Well, I'm going to cat, and what am I going to cat? I'm going to cat X's, just a couple, with a new line behind them. So now I can say do box, and the default parameter will make sure that I now print five lines of X's. So it looks like kind of a squarish thingy in R. If we go to R, then do box makes a little box, right? And we can now pass our own value for the parameter, and this value can be like 50. So now we get a big box, or well, a lot of rows of X's. You have to use a little bit of imagination.
So if we go back to Notepad, then we can add a second parameter, which is the length. Let's call it b.length, and by default b.length is also five. And now, instead of writing out the X's by hand, I want a set number of them, b.length. So I'm going to say rep, repeat the value X. And how often do I want to repeat X? b.length times. And this is my x.row or something like that. And now I'm going to cat x.row with a new line. So now I have two parameters: the first one defines how many rows I'm going to print, and the second one defines how many X's there are on a row. If I now call do box, it will print a box which is five by five. It adds spaces between the X's, but that's perfectly fine, that's just the cat function at work. Oh, you guys can't see. Sorry. So if I call do box, I can now define what I want. For example, I want a box which is six by nine, and then it looks like this. I can do a box which is nine by six, and then it looks like this. Now, of course, when we're dealing with these variables, I could say: I'm going to be lazy, and I'm going to define my b.length as five outside of the function first. And now when I call my function, it will take b.length from there. So I can say, for example, b.length is five, then execute the do box function with six. And then I can redefine my b.length to be 15 and say do box six again. Any normal person would expect that when I call do box six and do box six, it returns the exact same thing. But because I'm using b.length here, which is not defined as an input parameter, these two calls will do completely different things, which is of course very unexpected. Because the only way to know that do box actually depends on b.length is to read the code of the function.
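The proper two-parameter version of the box function from the live coding could be sketched like this. The function name do_box is an assumed spelling, while n.row, b.length and x.row follow the names used in the session. seq_len(n.row) is used instead of 1:n.row, a small deviation so that zero rows prints nothing:

```r
do_box <- function(n.row = 5, b.length = 5) {
  for (x in seq_len(n.row)) {
    x.row <- rep("X", b.length)   # one row: b.length times the letter X
    cat(x.row, "\n")              # cat puts a space between the X's
  }
}

do_box()        # default: 5 rows of 5 X's
do_box(6, 9)    # 6 rows of 9 X's
do_box(9, 6)    # 9 rows of 6 X's
```

Because both sizes are input parameters, calling do_box(6, 9) twice always prints the same box, no matter which variables exist in the session.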
But the idea of writing functions is to hide the implementation from the user. So by making do.box depend on a variable which is defined somewhere else, I run into issues. It becomes even worse when I clear R. So when I say Edit... wait, there's an option for it. When I just remove all of the objects in R, b.length is no longer defined; it's not found. Now when I create my function again, I have to paste it in first, and I call do.box(6), it gives me an error which no one expects. The function only has one input parameter, and I filled in that input parameter, so the function should just work. But it does not. Why? Because the function depends on this b.length variable, which should be an input parameter but isn't, so it reads it from the parent scope. There are some good reasons to do this; I've sometimes used reading and writing from the parent environment when dealing with big data. It's not very common, but it is common enough that I want to stress this: if you write a function, all of the variables used inside of the function should either be input parameters or internal temporary variables like x.row. It's fine to use x.row here, because x.row is defined within the function. This is something that goes wrong a lot of the time. That's why, when I talk about functions, I want to hammer in: don't do this. Write proper code. Don't make your code depend on some magic variable that the user needs to define, because it can be very tricky. Calling the same function with the same parameter should always give you the same answer, unless it depends on random numbers, of course. But even then we have ways of dealing with that. Good. So let's continue with the PowerPoint. So, brackets. We've seen a lot of brackets. Round brackets.
You use them when you call functions and when you write control structures. For example, the if statement: if, round bracket, and then inside of the round brackets is the condition that I want to check. So round brackets are used when you call a function, when you define a function's parameter list, or for the conditions in control structures. Square brackets are for indexing vectors, matrices or data frames; double square brackets are for indexing a list. And curly brackets define blocks of code. They surround expressions, so that we know which expressions belong to the if statement, or which expressions belong inside of the function. They signify the beginning and the end of a block of code. So at the closing bracket here, R knows: okay, now the if statement is done, or the while statement is done, or the function is done. This is just a handy overview for you guys, so that you can look up which bracket to use, because I know the current assignments will be relatively hard. And that is, I think, good, because you can only learn programming when you run into problems that you either have to Google or have to figure out yourself. If the assignment is just typing in code that some other person gives you, you never learn how to program. Learning how to program is struggling with errors, figuring out what the errors mean, and fixing the errors. That's where the whole sense of achievement in programming comes from. "For an if statement, do I need to use indents and curly brackets?" Yes. Clean code means that you want to make clear where something starts and where something ends, right?
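As a quick reference for the bracket overview, here is a small sketch that uses each bracket type once. The variable names and values are made up for the example.

```r
values <- c(10, 20, 30)               # round brackets: calling the function c()
values[2]                             # square brackets: second element of a vector

m <- matrix(1:6, nrow = 2)
m[1, 3]                               # square brackets: row 1, column 3 of a matrix

info <- list(name = "box", size = 5)
info[["size"]]                        # double square brackets: one element of a list

if (values[2] > 15) {                 # round brackets around the condition
  cat("The second value is large\n")  # curly brackets mark the block of the if
}
```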
So in this do.box function, which I'm going to fix, because it's now a bad function since it depends on something else: when I open a curly bracket, I indent with spaces. And when I then open up a new section of code, I indent again. This is the section of code that belongs to the function, and this is the section of code which is inside of the for loop. The indentation is not necessary in R, but it is there to help you figure out what's going on. I could write this whole thing without it and R would not care. I could say, well, I don't use indentation, I'm just going to write it like this, and I use a semicolon to specify where a statement ends. I don't even need the last one, because the line ends there anyway. But it's just clearer with indentation. And in theory I think I could even put the whole thing on a single line. But then no one can understand what's going on, and no one's going to zoom out this far to see your whole function definition. So work clean. Write small lines of code. For every line of code that you write, it should be visible where it starts, where it ends and what it belongs to. Okay, Leonardo Proust: "Is it better to use tab or space?" It does not matter from a programmer's perspective. But if you want to earn money, you use spaces. People who use spaces generally get paid 10 to 20% more money for the same job than people who use tabs. People who mix tabs and spaces get paid just as much as people who use tabs. So use spaces; spaces earn you more money. I don't know why. There was this whole Stack Overflow thing, a big piece of research that they did, and they just figured out that if you use spaces, you get more money. That's science. They did the research: they asked people, or they spidered a whole bunch of these open source repositories.
And then they compared people's income to which one of the two they used, or which one of the three, because you can mix them together as well, like a crazy person. In the end the conclusion was: if you use spaces, you get paid 10 to 20% more money for the exact same job. So use spaces, that's all I can conclude from that. But is it better? I don't know. There's no real better in this case. But do use indentation, that's the important part. Make sure that when you look at code, it looks nice and clean. The thing which I like about spaces, and that's just a personal preference, is that they are not as annoying in Notepad++. I actually have it set up so that every time I press tab, it puts in two spaces for me. You can see in my editor, and I probably have to zoom in, I don't know if it's visible, that there are these very faint little yellow dots here. You can make them much brighter, but trust me, they're there. If you look very closely, you can see that there are like four little dots. To actually get a tab in Notepad++ I have to open up a data file, because all my tabs automatically get converted. So tabs look like this in Notepad++. If I would write the same function and use tabs, then it would look like this, which is okay, right? The indentation is still clear: you can see that this is one level deep, this is two levels deep. But I don't know. For me, spaces are where it's at. Some people like tabs, and that's fine as long as you use them consistently. Don't be a madman, don't mix spaces and tabs together, because that just drives everyone crazy. In R there's no rule on whether you should use spaces or tabs. It's one of these things: either one is fine, don't mix them. It's like drinking booze. It's fine to drink wine, it's fine to drink beer, but don't be an asshole and start mixing them. That's just crazy. So don't be a crazy person. Good.
All right. So we're almost done, because we have functions, we have variables, and all of these things. Actually, after these two lectures everything's complete. You can write a complete operating system, not by being aware of tabs and spaces, but by being aware of the fact that you have variables, you have control structures, and you have functions. And there is nothing more. There is no more magic in programming. The whole computer does nothing else than execute if statements, do-while loops and for loops. Besides that, there is a little bit of talking to hardware, which also uses if statements and for loops. And generally, it's just executing a function. So after this lecture you are fully fledged programmers and you could program anything that you want. But of course there will be more lectures, just to help you guys along. Good. So the last, no, this is not the last topic. The second-to-last topic that we have for today is escaping the inevitable. So, about strings. They are enclosed in double quotes or single quotes, the double or single floaty things. We can combine strings using paste; we've already seen that in the first assignment, when you paste measurements with a vector. You can print them to the screen in R using the function print. As you saw in the examples, I generally use cat. For plain text cat does much the same as print, but cat is nicer because it allows you to not only write to the screen; you can also write directly to a file. And generally you want to write to files, because having the output on the screen is nice when it's five plus five, but when it's RNA-seq data you want to write it to the hard drive. Using cat gives you flexibility, because you can say: write it to the screen now.
And then once you've checked that everything is okay for the first couple of lines, you can say: okay, now I'm going to write it to the file, and I'm going to do it for all 100,000 elements that are there. So paste is there to paste two things together. Print is there to print things to the screen. And cat means print them anywhere: to a file, to the screen, to a printer, to the mouse. You can also cat to your mouse; it won't do anything, but in theory the mouse is an output device in a way. It's more of an input device, but you can use it as an output device as well. So, about strings: forgetting to close a string happens a lot in R. It is one of these things that people run into during the assignments, and you can struggle with it for a very, very long time. If you forget to close a string in R, no command after that will produce output. Let me go to R. In R I'm typing: I have a variable and I want to put a string in it, blah, blah, blah, and something more and something more. I press enter, and I forgot to close it. And I think, okay, everything's fine, I made a string. Now, I know what I did, but normally you don't know that you're missing the closing quote. And now I'm typing five plus five, and I'm like, why the hell is nothing happening? Or five plus five again. Why is nothing happening? Three plus seven. Why is nothing happening? Because I expect there to be output, but I forgot to close my string. The only way to detect this little error, which comes up a lot, is to look at the front of the line. You see here that input lines in R start with a greater-than symbol (>). When I'm still inside of something which I did not close, R uses pluses (+) instead. The same thing holds for if statements.
So if this happens, and you type stuff and nothing happens while you expect stuff to be computed, press the little stop button here at the top. Press stop, and you will see that it gives you a greater-than symbol again. The same thing happens for an if statement. If I say if five is smaller than six, do something, so I open up a block, then the same thing holds. Now I can say cat, and I can do something, and nothing will happen until I close the bracket. Then it executes the command, because five is smaller than six. But this plus symbol in front is a clear sign, especially when you are defining strings and stuff, that something went wrong. This is something that people who are starting out with programming fall into very commonly. Sometimes it can take you 10 or 15 minutes before you figure out: oh shit, I should have closed my string and I didn't, and now I've spent all of this time trying to do stuff without getting any output. When you don't get any output while you expect output, 70 to 80% of the time it is because you forgot to close a string. The other 20% of the time it is because you opened up a block of code, an if statement, a while statement or a function, and you haven't closed the final bracket yet. So if you notice these little plus symbols in R, press the stop button to force R to stop what it's doing, so stop making a string or stop making something else. This will really help you. I added this slide because the first time that I gave the assignments, people got stuck on defining a little string. And this becomes even more troublesome with the escaping the inevitable part. Helen Neama: "This explains a lot, had a full line of plus while doing the assignment." Yeah, yeah, no, I know that it happens.
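For reference, this is roughly what the situation looks like at the R console. It is an illustration of the prompts, not code to paste in; the variable name and the text are made up.

```text
> x <- "blah blah and something more
+ 5 + 5
+ 3 + 7
+
```

Every line after the unclosed string starts with + instead of >, and nothing is computed. An unclosed block behaves the same way until the closing bracket arrives:

```text
> if (5 < 6) {
+   cat("something\n")
+ }
something
>
```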
So that's why I made a slide for it. It's so common that during the assignments, when we do them in person, I can just walk over and say: oh, wait a second, there are pluses, so press the stop button and try again. That's also the reason why I tell people never to program in R itself. Program in Notepad++ and then copy-paste it into R. The same thing can still happen, but at least Notepad++ can help you a little bit with figuring out whether something is closed or not, through the highlighting. So it does happen a lot, and especially in this part, where we are going to do string escaping. Escaping the inevitable: when you paste two things together, you can print the result to the screen, and you can use cat to do the same thing. And then you can say file is out.txt, and it will write directly to this file. But the problem comes in when we want to print the quote character itself to the file. It sometimes happens that you want to print the double air-quote thing, especially when you're dealing with text. It's just a standard text character which might occur in a text. So if you want to print it, you need to escape the character. There are a few characters in R that need escaping. The ones that need escaping are quotes, so the double quote and the single quote, and you escape them by putting a backslash in front. A new line: the enter on the keyboard is also a text character, because text contains enters, so you need to be able to put enters in the text that you make. The new line is also an escaped character; it is written as \n. The tab character, so pressing tab on the keyboard, is also a normal character which occurs a lot in text files and data files. In tab-separated files there are a lot of tab characters. So to have R output a tab-separated file, you need to cat tabs to a file.
You can do that with \t. Since the backslash is the escape character, to print a backslash you have to escape it as well. So to print a backslash to a file or to the R window, you need to write \\. And then there's a special one, which is the backspace, the backspace key on your keyboard. That is also a typable character in R: you can use \b to produce a backspace character. So let me show you guys a couple of examples. Here you see that when I do cat without a new line, the next input continues on the line where I did the printing. This of course doesn't look very nice. So generally you say cat, then for example Hello World, and then \n. Now it prints the text to the screen and does an enter, and you continue on the next line instead of on the same line. If we want to print Hello World and then add this floaty air thingy, which is actually called a quote, a double quote, behind Hello World, we can do it like this, with \". So now it says Hello World, colon, space, quote. If we want to print a backslash, it's the same thing: we write \\, and now we can print a backslash character. We can print a tab character as well, with \t, and of course here we see that the words just move further apart. And we can also use the backspace character. If I say \b because I want to remove the d from World, then this also works. So now we have Hello, a tab character, then World, and the d was printed but then backspaced out of the string. This is the way that you can build up more complex strings. It is always very difficult, and especially with the backslash-backslash and the backslash-double-quote you often run into the problem of strings not being closed properly.
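The slide examples aren't visible in the transcript, so here is a sketch of the escapes just discussed, plus writing a small tab-separated file with cat. The file contents are made up, and tempfile() is used instead of a fixed name like out.txt so the example doesn't leave files lying around.

```r
cat("Hello World\n")        # \n ends the line
cat("Hello World: \"\n")    # \" prints a double quote inside a string
cat("A backslash: \\\n")    # \\ prints a single backslash
cat("Hello\tWorld\n")       # \t prints a tab
cat("Hello\tWorld\b\n")     # \b sends a backspace; how it shows depends on the console

# The same cat call can write to a file instead of the screen.
out.file <- tempfile(fileext = ".txt")           # stand-in for something like out.txt
cat("sample\tvalue\n", file = out.file)          # header line, tab-separated
cat("A\t1\n", file = out.file, append = TRUE)    # append = TRUE adds to the file
readLines(out.file)                              # read it back to check
```

Without append = TRUE the second cat would overwrite the file, so for writing row by row in a loop you would want it switched on.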
The string highlighting in Notepad++ can be relatively difficult. Although Notepad++ understands it, it sometimes goes wrong. If you do something like this in Notepad++, it cannot figure out where the string starts or where the string ends, because it doesn't know that this is an escaped character. "But you can only remove one letter?" No, you can remove as many letters as you want using \b. I could remove another one. A string is just like typing on a keyboard. So the string here says: press the H on the keyboard, press the e on the keyboard, blah, blah, blah. And then here at World it says: press the d, then press the backspace. And now we can say: press it again, so this would remove the l as well. We can remove the whole string by just adding more \b's. Just eat it up. A string is simply interpreted from left to right: R looks at the first letter, prints it to the screen, then the second letter, and so on. Good. All right. So these are the ones that need to be escaped. There are more. There's \0, which is the end of a string in C and C++. But these are the ones that you have to be aware of and that are very commonly used in normal text. "What is the point of \b?" Well, if you send it to a printer, then \b is a reasonable character to sometimes print, because you want to remove something. It's really useful when you output a string of things, and it's relatively useful if you want to do kind of fancy stuff. "I mean, you could just leave out the character you want to have deleted." Yeah, if I define a string like this, yes. But if I do it in a for loop, or in a function which I call, then I might not want to do that. And it's interesting in a way, because it also allows you to do things like progress bars, right?
I might want to have a function which shows me a progress bar, and if an error occurs, I want the progress bar to run backwards in R. You can do that using the \b character. So it's not completely useless. There are some very, very specific needs for a backspace character. You have a backspace on your keyboard as well, right? You could have just typed in the text properly, but you made a mistake, so you have to press the backspace key. There's a reason why it's on your keyboard, and if there's a reason why it's on your keyboard, there's also a reason that you might want to trigger it from a program. Is that a good answer? I have some code where I use \b just for the fun of it, but it's not something that you use daily. I use the backspace key on my keyboard a lot, though, so during programming you might want to use it as well. And you can do really nifty, funny typewriter thingies with it. Say you have a text from a very big file that you want to print with delays. Yeah, it's after five, so let's just do a funny thing then. So: display text like a five-year-old. It takes my.text as the input. The first thing is that we need strsplit, because we need to get the letters of my.text. So these are the letters. And now we can say: for l in the letters. Then we do cat(l), and then Sys.sleep(runif(1)). And then, if runif(1) is smaller than 0.02, what do we want to do? We want to cat a backspace character, \b, and sleep a little bit again, so a random amount of time, and then we cat l again. Something like this, right?
Just as a little nonsensical example. So now we would have a function which displays text like a five-year-old typing on a typewriter. Then we need a long text, so let's get one from somewhere. Let's just take the Moodle standard text. "Split is missing." Yes, it's like this. And I just say: this is the text that I want, and then display text like a five-year-old. Let's add in some random enters as well, so we can see what's happening. So we can do cat with \n, just a little bit of the time. Hmm, that's not a lot. Why does it not do this? cat l, \b. Let's do this more often, and this more often as well, so it's like 30% of the time. Sys.sleep, I think it's called that. I just don't remember the sleep thingy. It's with a capital S. So now it should be a little bit more fun. I don't know, why does it not... so it sleeps, then it does a runif, then it does another runif. Why does it not add new lines too? Oh, you can't see it running. Sorry, I'm not showing you the screen. I was just coding and not looking at the stream. But it doesn't put in the new lines, actually. I have no idea why it's not catting the new lines, and there are a lot of spaces in there as well for some weird reason. So let's do a little bit of debugging: this should print a whole bunch of new lines, but it just prints one. Ah, I'm sorry: unlist. strsplit returns a list, so you need unlist to get at the individual letters. The unlist function is very, very powerful, so you have to use it sometimes. Now it should be more or less what I wanted, and it prints a whole bunch of them. That's good, so we can just delete the hashtags.
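Since the live coding is hard to follow in a transcript, here is a reconstruction of roughly where the typewriter function ends up. This is a sketch, not the exact code from the stream: the function name, the max.delay and p.retype parameters and their default values are assumptions added so the behaviour can be tuned.

```r
# Prints a text letter by letter with random pauses; now and then it
# "makes a mistake", backspaces over the letter and types it again.
display.like.five.year.old <- function(my.text, max.delay = 0.05, p.retype = 0.05) {
  chars <- unlist(strsplit(my.text, ""))   # strsplit returns a list, unlist gives the letters
  for (l in chars) {
    cat(l)
    flush.console()                        # show the letter right away in buffered consoles
    Sys.sleep(runif(1, 0, max.delay))      # wait a random amount of time
    if (runif(1) < p.retype) {             # roughly 5% of the letters by default
      cat("\b")                            # backspace over the letter
      Sys.sleep(runif(1, 0, max.delay))
      cat(l)                               # and type it again
    }
  }
  cat("\n")
}

display.like.five.year.old("Hello World, this is a typewriter.")
```

Because the pauses and the retyping depend on random numbers, every run looks a little different.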
So now we have a little function that does a little bit more. But it still does it all on one line, so let's just remove this one. And then this one should be 2% of the time, and this one should be like 3%, no, 5% of the time. So now it looks just like someone typing on a typewriter, and sometimes they have to go back. It's not something that you use a lot, but you can do these kinds of things. It's creative, in a way. This is of course not the best way of doing it, but it does look like someone's typing on a typewriter. It's not that useful. But for a progress bar, sometimes you want the progress bar to go back, and then you use \b. I think it's probably in there because you can write to printers and other machines as well. Think about sensors or an external device which responds to the backspace. That's probably one of the reasons why it's there. All right, let me see. How many slides do we have left? We still have 10 slides left. So we do need to do another break, I think, a very quick break, and then I do want to get through the last ones. So let's have a vote in chat again. Do we just want to plow through for another 10 minutes or so? Then I'm not going to do anything like these kinds of things; I'm just going to go through the slides and say what I want to say about them. Or do we want to do a break? So throw in chat: plow through or break. And meanwhile we can watch this thing type. The nice thing is, if you display it again, because of the random numbers it will always generate a new way of typing it. "PT", plow through. Okay, okay, I'm up for that. It's pizza day. All right, Shia Valdez slides, which is also good. Sorry, I just butchered your name.
Shia Vash Felizade. Is that okay? I'm wondering. "So you create a sort of chatbot." "PT", okay, plow through, plow through. That's good. Okay, then we're going to plow through. Yeah, that's the nice thing, right? I don't know why that guy came up with something like this, but you can do things with programming. And these things don't have to be useful all of the time. They can be just for fun. And for fun you need a backspace, because backspaces are cool. Otherwise they wouldn't be on your keyboard. Good. All right, so let's plow through. We talked about string escaping. The backspace is one of the escaped characters. There are probably more. There's probably a delete key as well; I didn't put it on here. That's probably slash D or something, but I never used that one. I did use the backspace. Just so that we know. Good. Okay, so when using cat, we print verbatim, meaning we need to make sure to add \n at the end of the line. Otherwise R continues on the same line. Not only that, but when we use cat, we can use the sep parameter to specify how we want to separate elements. So we can say cat Hello, World, don't forget the new line, with the separator being a comma, and we get Hello, comma, World. We can cat Hello World with the separator being a space, and we can set the separator to a minus. So there are different separators that you can set, and cat will automatically use them. Good. The last thing that I wanted to discuss for today is random numbers and random distributions. In the end we're working in science, and we want to have reproducible research. That also means that when we work with random numbers, we want to have reproducible random numbers. This is one of these XKCD comics, where it says int getRandomNumber, return 4. And of course this is a perfectly valid random number generator, even though it always returns four, which we now know.
But if we did not know the code of the function, and we would just call getRandomNumber and it returned four, and call getRandomNumber again and it returned four again, we would have no way of proving that this function is not a random number generator, because we don't know how it generates the numbers. It's an interesting kind of thought experiment, because you can't prove that something is really random. You can only say that there seems to be a structure, with a certain likelihood. But in the end it could still be random. So, uniform distributions. I think everyone knows about the uniform distribution: every value has the same chance of being drawn. In R it's the runif function. If you generate 10,000 to 100,000 uniform numbers and make a histogram, it looks kind of like this. By default the numbers are drawn between a minimum of zero and a maximum of one (in practice runif does not return the end points themselves). If you look at the histogram, every section of the plot has the same probability of being drawn. So that's a uniform distribution. It's very much used when you have a group of students and you select one of them to answer your question, because every student has the same chance of being picked. That's how I imagine a uniform distribution. Then there's the Gaussian distribution, also called the normal distribution. Values near the mean have a higher chance of being drawn. In R you can do this using the rnorm function, which draws a given number of values from the normal distribution. The standard normal distribution has a mean of zero and a standard deviation of one. You can specify the mean and the standard deviation in the function parameters. In the runif function, so the random uniform function, you can also specify the minimum and the maximum value. So runif has an n, a min and a max, and rnorm has an n, a mean and a standard deviation (sd).
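As a small sketch of the two functions just described: the sample sizes and the mean of 100 with standard deviation 15 are arbitrary choices for the example.

```r
u <- runif(10000, min = 0, max = 1)   # 10,000 draws from the uniform distribution
z <- rnorm(10000, mean = 0, sd = 1)   # 10,000 draws from the standard normal distribution

range(u)   # all values lie between 0 and 1
mean(z)    # close to 0
sd(z)      # close to 1

# Both functions take their distribution parameters as arguments.
w <- rnorm(5, mean = 100, sd = 15)    # five draws around 100
v <- runif(5, min = 10, max = 20)     # five draws between 10 and 20

# In an interactive session, hist(u) and hist(z) show the flat and the bell shape.
```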
Then we have the Poisson distribution. Poisson distributions are special because they are made up of integers, so whole numbers, and numbers at the lower end of the distribution have a higher chance of occurring. The way that I always imagine a Poisson distribution is this: look at a bush of flowers, and the Poisson distribution is the number of bees on a single flower. A lot of flowers will have no bees on them. Some flowers will have one bee. Some flowers will have two, but that's already much, much less common. So this is just the number of bees that you observe when you look at a random flower. And Poisson distributions are always whole numbers, because you cannot observe half a bee. Well, you could, but generally we don't consider half bees to be interesting. Okay, good. So when we talk about randomness, we want to have repeatable randomness. If we have a function, we want its randomness to repeat. To do this, we can use the set.seed function. The set.seed function is like resetting your graphical calculator. When I was doing mathematics in high school and we had an exam, everyone was forced to reset their graphical calculator. The reason for this was, one, it deletes all of the cheating notes that are in your calculator. But it also allows the teacher to have you generate a random number and do something with it, and then everyone in the class gets the same result. Because after resetting your graphical calculator, the first random number that you draw will always be seven, the second one will always be nine, the next one will always be 12. It starts again from a fixed point. At least, that's how it was in my high school.
So in my high school, the first thing that it said on the exam sheet would be: take your graphical calculator, reset it, draw five random numbers and write them on the piece of paper. If you had the wrong five numbers, the whole exam was invalid because you hadn't reset your calculator, so they assumed that you were cheating and had cheat notes in your calculator. But anyone who knows a little bit about computers figured out that you didn't have to reset your graphical calculator; you could just keep your cheat notes. You just had to make sure that you set the seed of your calculator back to the factory default seed, which was known because it was written down in the manual. Once people figured that out, everyone started cheating, because everyone could use cheat notes during math. So then they quit the whole thing. Anyway, to have repeatable randomness, you have the set.seed function. When I set my seed to 1 in R, draw five random numbers between zero and two and round them, I get 1, 1, 1, 2, 0. You can check this in R. Let me show you guys in R: I do set.seed(1), and then I do round of runif with five numbers between, what was it, zero and two. So I get exactly the same. Everyone in the world, everyone who uses R: if you set your seed to 1, draw five random uniform numbers between zero and two and round them, it will always be 1, 1, 1, 2, 0. That's how it works, because otherwise it wouldn't be repeatable. If I set my seed to 2 and do the same thing, I get 0, 1, 1, 0, 2. So if I want to have repeatable randomness, the only thing that I have to remember is to set my seed back to a certain fixed number. And you can choose any number as a seed.
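Written out, the check from the stream looks like the sketch below. The Poisson part is an extra illustration that the same idea works for any distribution; the seed 42 and lambda = 1 are arbitrary choices.

```r
set.seed(1)
round(runif(5, min = 0, max = 2))   # 1 1 1 2 0, the numbers quoted in the lecture

set.seed(2)
round(runif(5, min = 0, max = 2))   # 0 1 1 0 2

# Resetting the seed replays the same draws, here for bees on ten flowers.
set.seed(42)
bees.a <- rpois(10, lambda = 1)
set.seed(42)
bees.b <- rpois(10, lambda = 1)
identical(bees.a, bees.b)           # TRUE
```

Note that round() rounds to the nearest whole number; floor() would round down and give different values.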
And from then on, all of the random numbers that you draw will always be the same, until infinity. So you can draw an infinite amount of random numbers: if you set your seed, from that point on all of the numbers will be the same. Good. So just as a tip for you guys, if you ever write algorithms, or if you ever need to do something with simulation, where you want your simulation to come up with the exact same answer as the previous simulation, you can use the set.seed function and random numbers to force that. So again, programming is like working in a lab. Make sure you work clean, right? That means use proper indentation. Don't be a madman. Don't start mixing spaces and tabs. Keep variable names logical. And make sure that you code in a single directory. I always have three folders on any project that I do here as a professional programmer. I have one folder where I put all of the input data. When I'm on Linux, I make this folder read-only, to protect the valuable input data that we spent tens of thousands of euros on, collecting samples and then sending them for sequencing. When I get that data, I put it somewhere and I make it read-only, just to make sure that I don't accidentally overwrite my input data. All of the code that I have goes into another directory, and this whole directory is put into version control. Any output data that my code generates is put into yet another directory, not into the input data folder, just to prevent overwriting input files and losing hundreds of thousands of euros of data. And that can happen in the blink of an eye with a computer, especially when you're not paying attention or when you're programming on a Sunday. So don't be that guy who deletes five or 10 years of work of a lab just by not making sure that the input data is set to read-only and that it's in its own folder, in its own directory on the hard drive, right? It happens.
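The three-folder layout described above could look roughly like this in R. This is only a sketch under assumptions: the folder names (input, code, output), the file names, and the toy data are all invented for illustration, and it protects a single file with Sys.chmod rather than making the whole folder read-only the way it is done on Linux in the lecture.

```r
# Hypothetical project layout; all names below are assumptions for this sketch.
project_dir <- file.path(tempdir(), "my_project")
input_dir   <- file.path(project_dir, "input")    # raw data, treated as read-only
code_dir    <- file.path(project_dir, "code")     # scripts, kept under version control
output_dir  <- file.path(project_dir, "output")   # everything the code generates

for (folder in c(input_dir, code_dir, output_dir)) {
  dir.create(folder, recursive = TRUE, showWarnings = FALSE)
}

# Stand-in for expensive raw data; only written if it is not there yet.
raw_file <- file.path(input_dir, "measurements.txt")
if (!file.exists(raw_file)) {
  write.table(data.frame(sample = 1:3, value = c(2.5, 3.1, 4.7)),
              file = raw_file, sep = "\t", quote = FALSE, row.names = FALSE)
  # Remove write permission so the raw data cannot be overwritten by accident.
  Sys.chmod(raw_file, mode = "0444")
}

# Read from the input folder, write results to the output folder only.
measurements <- read.table(raw_file, sep = "\t", header = TRUE)
result_file  <- file.path(output_dir, "summary.txt")
write.table(data.frame(mean_value = mean(measurements$value)),
            file = result_file, sep = "\t", quote = FALSE, row.names = FALSE)
```

Because results only ever go to the output folder, a bug in the analysis code can at worst destroy something that the code itself can regenerate.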
It happens to the best of us. NASA has lost spacecraft because of dumb programming errors. So make sure that you're not that guy who screws up the whole lab. Good. Clean code also means that you have to think of speaking names for variables and functions. Things like mySum, count, or total: those are names which you can do something with. Don't be that guy who leaves behind code with variables called aa or aba or xxa or these kinds of things. It does not save you time in the end. It just makes it harder for yourself. When you have been coding for more than a couple of years, being able to go back to your old code and read and understand it, without having to figure out what x, i, ii, j, and ji mean, is a time saver. You will thank yourself, not now, but in the future. Now it will feel like an extra chore. But in 12 years, when you look back on your code, you will be happy that you came up with good names. Use indentation to denote blocks. Use spaces, because you earn more money. Align your comments: comments should line up at the end of the line. You are not a madman. We are treating coding like being in a lab. We are professionals. So use indentation, right? When I open up a block, everything goes two spaces forward. When I open up another block, everything goes another two spaces forward. Make code readable for others. Don't be that guy who writes the next Facebook and then loses everything because no one else can understand the code and no one else can maintain it. Good. So in science, we need reproducible results. So make sure you can run a script without errors. Write your answers in Notepad++, or write them in another editor. At the end of the day, do Ctrl+A, Ctrl+C, go to R, and do Ctrl+V. It should run from beginning to end without any errors or warnings. And that's the way that I try to do my coding. So in general, and I'm now going to do something very stupid, because you should never do a live demo.
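The copy-and-paste check described above can also be approximated from inside R. The helper below, run_clean, is a hypothetical name invented for this sketch; it relies only on base R (source, options(warn = 2), tryCatch). It treats every warning as an error, so a script only passes if it runs from beginning to end without errors or warnings. It does not start a fresh R session, so it is a rough stand-in for pasting into a clean R console, not a replacement for it.

```r
# Hypothetical helper: run a script top to bottom and report whether it
# finished without errors or warnings.
run_clean <- function(script_path) {
  old_options <- options(warn = 2)        # warn = 2 turns warnings into errors
  on.exit(options(old_options))           # restore the previous setting afterwards
  tryCatch({
    source(script_path, local = new.env())  # new environment: no leftovers in the workspace
    TRUE
  }, error = function(e) {
    message("Script failed: ", conditionMessage(e))
    FALSE
  })
}

# Throwaway script with speaking names and two-space indentation.
script_path <- tempfile(fileext = ".R")
writeLines(c(
  "total <- 0",
  "for (value in 1:10) {",
  "  total <- total + value   # add each value to the running total",
  "}"
), script_path)

script_ok <- run_clean(script_path)
print(script_ok)                          # TRUE
```

A script that produces a warning, for example by converting text that is not a number with as.numeric, would make run_clean return FALSE.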
So when I go to Notepad++ and I say View, Folder as Workspace, I can go, for example, to a random project that I have on my hard drive, like data analysis for Texas. Or I can go to, for example, whatever we want. So we go to code that I wrote for an equal, which is very, very old code. Yeah, so this won't work, because it's still on my D drive, and I don't have a D drive anymore. But in theory, that should work. But let's take some more random code, right? So let's go to Heymo Lisa's expression. This is my code. And with my code, I should just be able to do this: Ctrl+A, Ctrl+C, go to R, fingers crossed that I don't mess up myself. So copy, paste, and it should run from beginning to end without any errors. Right? That's the way that you should write code. Also, use speaking names: mean 11, mean 17, data kidney, parental kidney. It makes sense; you can read it, you can understand what's happening. Right, this is biomaRt, so we're doing stuff. So I just reran part of my analysis, and I'm 100% confident that it worked, because it just ran from beginning to end. So write code like that. That is really important, because if you are working as a programmer or bioinformatician in a group, your professor will come to you in two and a half years and will ask you, could you rerun that analysis that you wrote two and a half years ago? And you look like a jackass if you say, no, I can't. Well, not just that, but you might actually lose your job, because in the end, you are there to make sure that everything works. So be a professional. Spend the time now, don't spend the time later. That's it. Any questions, any remarks, any comments, any likes, any dislikes? By the way, you guys can like the video; it really helps for discoverability on YouTube. And I want to thank you guys for being here. It's been a long one, but I think we managed to get through.
Thank you guys for the questions. I really like having questions, because it just makes it more interactive. And it allows me to write a little function that lets me output text like a five year old who does use the backspace key and doesn't type it in perfectly. And I am going to do another try. It's called si ja fash. That's okay, right? Si ja fash. Yeah, I try, I try. No worries. So thank you guys for being here. If there are no questions, I will just hang around a little bit, in case you guys come up with questions. It's a Persian name. Yeah, it's a nice name. Okay. Although I butchered it the first time that I tried to say it. I actually wonder why you call yourself CI in emails. If you have a good name, a strong name, use the whole thing. All right, David, see you around. And do the assignments, of course. The assignments are not on Moodle yet; I will upload them directly so that you guys can start. Let me actually do that live, because I did change some little things in the assignments again this year, just to make sure that no one can get away with... Well, to avoid the butchering part. That's a good reason. Let me go to our course. And then we go to assignments two. And I will then save it as a... So the assignments are relatively difficult. Don't be scared of the assignments; the ones for lecture three are, I think, easier than the ones for lecture two. It is hard, but that is just to force you guys to be professional, ask questions, and really learn what it is to be a programmer, right? Because generally, you have to Google a lot, and you have to do it all yourself. And the assignments are online now, at least on Moodle. I will upload them tonight on my homepage, because that takes a little bit more effort: I have to log in and these kinds of things. So thank you, thank you, thank you guys. Thank you for being there. Without your questions, it's not as fun, and it's just the same thing as last year. And I'm here for you guys.
I was a little bit disappointed that no one showed up this morning. I offered you guys that you could come and have me help you with the assignments. No one showed up. That's a little bit sad. But in the end, I assume that because no one was there, everyone did the assignments and everyone was able to finish them perfectly. And of course, if you want to have some feedback on your assignments, I'm also more than willing to look at your assignments and give you some feedback. It's not that bad, because as I said, you don't have to turn the assignments in. But if you have a solution and you think, is this correct, or am I doing anything wrong, then I'm more than willing to help. Question from chat: what are the COVID regulations over there? Well, at least in our building, you still have to do this the whole time. So you have to wear a mask inside of the building. When you can't keep your distance, you have to wear a mask. But in Germany itself, I think the only requirement on masks is in doctors' offices, hospitals, and public transport. These things still require FFP2s; for the rest, we gave up. So we just rolled over and said, COVID's gone, which I don't agree with, which I definitely don't agree with. So the rules are very minimal. I don't even think that you still need to wear a mask in supermarkets and stuff, which is dumb. It's a disease that spreads via the air. It's a virus. We have no idea about the long-term effects. The first time that people discovered that HIV was a thing, it was also like, yeah, well, it's just a virus. It does give you sniffles and shit, like it's not that bad. And then five to 10 years later, people started developing AIDS, and then all of a sudden it was, oh my God, this is really bad. Fortunately, it's only in the gay community and not really in the whole community. COVID says, see you next fall.
I think that, yeah, I'm not going to do any long-term predictions or stuff, but read the papers that were published in Nature at the end of last year, especially by the guys from the Wuhan Institute of Virology, and look at how China is reacting to the current COVID outbreak, and then start realizing that we might actually be wrong and the Chinese might actually be right, and then realize what it would mean if it really turns out to be HIV-like and takes over your immune system long-term. I have been wearing my mask zealously from the beginning, because my Chinese friends all tell me, wear your mask. And my Chinese friends are good friends, so they want the best for me. So if my friends tell me to wear a mask and not get the virus, then I will wear a mask and not get the virus, because I think that that's the smart thing to do. But that's just my opinion; other people can think differently about it. But I don't know. I'm undecided about it. I do go outside sometimes without my mask to smoke a cigarette, and I'm probably not as careful as I should be. We'll see. Good. So I see the viewer number dropping. So thank you guys for being there. Yeah, better safe than sorry. That's always the best approach, I think. So viewer numbers are dropping. So thank you guys for being here again. And that's the end of the lecture. Do the assignments. And if you need any help, contact me by email. Don't wait until next week in the hope that we can do it in person. Start working on it already this weekend. It will take you somewhere between half an hour and two and a half hours to do the assignments. Alright, see you guys next time. See you next week, at least here, and I hope in person. But if not, then next week here. Alright, so catch you on the flip side.