one also on YouTube. Let's just continue. Still 70 slides left. So: use a good text editor. That is one of the tips that I can give you guys, because a good text editor will save your life. On Windows, I always advise people to use Notepad++. When you are on Mac OS X, use TextWrangler, or BBEdit, as I think it's called nowadays. If you are a Linux user, use whatever you want. If you're smart enough to install Linux, you're smart enough to pick your own editor. The most important thing for an editor is that it supports code highlighting and bracket matching.
So let me show you guys Notepad++, because I'm on Windows. One of the nice things is that if I start typing R, and I say for x in 1 to 10, do something, then you can see that every time I put the cursor behind a bracket, it highlights the matching opening bracket. This is really, really useful when I want to know if a bracket is closed: if I forgot to close it, then nothing gets highlighted when I put my cursor behind it, so I know that I forgot to close something. One of the other things is that when I save this file as test.r, you see that it starts coloring in the code. The for, which is a keyword, is now highlighted in blue. In is also a keyword in R, so it's also highlighted in blue, and numbers get their own color, which is really useful. Using a text editor also means that you're writing your code in a file and not typing it directly into R, and especially when you start writing bigger analysis scripts, this is very useful. It will also show you things like comments. And here, the bracket is actually the wrong way around, so I put my cursor behind it, and now I see: oh, this one isn't closed, and this one isn't closed either. So bracket highlighting, or bracket matching, is one of these things which can help you a lot if you open a bracket and forget to close it later on.
All right, so use a good text editor: Notepad++ for Windows, and for OS X, TextWrangler or BBEdit. Those are the two that I like a lot. Of course, clean code is smart code. Here you see the same code twice, but on the left side it's nice and clean, and that is because it is structured. I can see that there's an unordered list with several list items, and that the third list item contains a div. On the other side it's the same code, but it's difficult to figure out the structure. And structure is important, because when you structure your code, you can see whether every bracket that you open is also closed. So it is important to code clean. It's like working in a lab: if you're putting your tubes everywhere, you have no idea what each tube does and where it is located. So be very diligent. This holds for every aspect of your life, but especially for programming. If you're not diligent and you create confusing messes, you can be a very, very good programmer, but in the end other people will not use your code, just because they don't understand what's going on.
All right, so the next topic: control structures. Like I told you guys, in my mind a variable is like a box: you put stuff in. And now we want to do stuff with these boxes. Control structures allow us to make decisions. Making a decision is called branching: if this, then do that, else do this. We can also have looping, which is things like: for each element, or for each row of the matrix, or for each column of the matrix, do something. So looping is going iteratively through a matrix or through a data frame. Control structures, in my mind, are conveyor belts that guide the boxes to their correct destination. This happens based on a fixed algorithm, and you are writing the algorithm, so you decide which box goes to the left and which box goes to the right.
So imagine that we have a box. This box has, for example, a logical value in there, but it could also hold a numerical value or a character value. We don't know, because we just have a box, and we want to see what the box contains. For example, if I'm going through each column of a data frame and I select column one, I don't know what will be in column one. It could be characters, it could be logicals, it could be factors. I have no idea, so I need to check. For checking stuff, I can use the if statement. So I say: if the class of the box (so I'm asking what is in the box) is equal to character, then call the function right on the box. Else, call left. For example, think about a matrix whose first column has numbers in there. If the class of the first column is numeric, then calculate the mean. But for columns which are character, you cannot calculate the mean, so you have to do something else with them. The else branch can also be empty, so you could just skip them. This is the power of if statements: an if statement allows you to check if something is true or not.
If you want to extend this a little bit, you can use else if. A data frame can contain different kinds of columns. If I have a column which is character, I want to do one thing. If it is a logical, I want to do something else. And if it's neither of these two, I want a third option. Then I can use the else if statement. So I'm just saying: if the class of the box is character, call the function right; else if the class of the box is logical, send it to the middle; else, send the box to the left.
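The if / else if / else routing described above can be sketched in R roughly as follows. This is only an illustration under assumptions: the function names right, middle, left and route, and the small data frame, are hypothetical stand-ins for the conveyor-belt picture and do not come from the slides.

```r
# Hypothetical destination functions for the conveyor-belt picture
right  <- function(box) "right"
middle <- function(box) "middle"
left   <- function(box) "left"

# Route a box based on what it contains.
# Assumes class(box) returns a single class name, which holds for the
# simple vectors used here.
route <- function(box) {
  if (class(box) == "character") {
    right(box)
  } else if (class(box) == "logical") {
    middle(box)
  } else {
    left(box)   # numeric values, factors, and everything else
  }
}

# Made-up data frame: check each column without knowing its type in advance
df <- data.frame(id = c("a", "b", "c"),
                 weight = c(10, 20, 30),
                 treated = c(TRUE, FALSE, TRUE),
                 stringsAsFactors = FALSE)

destinations <- character(ncol(df))
for (i in 1:ncol(df)) {
  destinations[i] <- route(df[, i])
}
print(destinations)
```

The else branch could also be left out entirely if the remaining boxes should simply be skipped.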
So numeric values go to the left, factors go to the left, only characters go to the right, and logical values go to the middle. Of course, you can also do comparisons. Here I'm assuming that x holds a single numerical value. Then I can say: if x is smaller than five, print something to the screen, and the thing that I want printed is "x is smaller than five". You can also compare two variables, saying: if x is smaller than y. And you can check all of the numbers in a vector. If I have a vector which has 100 numbers in there, I can just say x smaller than five, and now I can ask the question: are all of them smaller than five? I can also ask if any of them are smaller than five. So you don't have to limit yourself to a vector with a single element; you can check the whole vector in one go. We'll get back to why this is useful, but remember that in R you generally work with vectors, so checking a single x against five is fairly uncommon. Generally, you want to know if all of your values are below 1000, or if any of the values are below 1000.
All right, so that's the first introduction to if statements and if-else branching. Next is looping. Looping allows us to do something repeatedly. For example, I have a variable which is called box, and I put 1000 into the box. Now I can take stuff out of the box. So I'm going to say for x in one to 10. The loop body is the part that gets repeated, which is also why I have these two spaces in front of it. The first time that it's executed, what will happen? It will take the box, so it will say 1000 minus x, and x in the first iteration is one. Then I store 999 in the box. The second time that it goes through, x will have the value of two, and the box now has 999 in there. So 999 minus two is 997.
The third time that I go through it, x will have the value of three. So here I define a new variable called x. I can give it any name that I want. Then I just say: have x go through these numbers one by one, and every time x takes its next value, subtract it from the box.
I can do the same thing using a while loop. A for loop you use when you know how often you want to do something. For x in one to the number of columns of my matrix: the first time you do it for column one, the second time for column two. But the while is useful when you don't know when you have to stop. If you have to loop an unknown number of times, or loop until something becomes true, then you can use the while. So the while is a little bit more flexible. The for loop iterates through a known list of elements, while the while loop keeps iterating when you don't know in advance when you have to stop.
So this is the exact same code as before. You can see that the code is much longer, because I now have to take care of the stopping condition myself. In this case, I say box is 1000, and takeout is my variable, which before was called x. It's the thing that I'm going to take out of the box, and initially it is one. Now I'm saying: while takeout is smaller than or equal to 10, box is box minus takeout. That's the exact same statement as we had before. And now I have to remember to increase the number that I'm going to take out of the box, so I have a second statement which increments the takeout variable by one. If I did not do this, the loop would run forever and ever, because takeout would always be one, which means that the condition would always be true.
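The for loop and its while equivalent described above might look like this in R. It is a sketch that follows the variable names from the spoken description (box, x, takeout); the extra names box_for and box_while are only added here to keep both results for comparison.

```r
# for loop: we know in advance that we want exactly 10 iterations
box <- 1000
for (x in 1:10) {
  box <- box - x        # 1st pass: 999, 2nd pass: 997, 3rd pass: 994, ...
}
box_for <- box

# while loop: same computation, but we manage the stopping condition ourselves
box <- 1000
takeout <- 1
while (takeout <= 10) {
  box <- box - takeout
  takeout <- takeout + 1  # without this line the condition stays TRUE forever
}
box_while <- box

print(box_for)    # 945, because 1 + 2 + ... + 10 = 55
print(box_while)  # 945
```

Both versions end with the same value in the box; the while version simply needs the explicit counter and increment that the for loop handles for you.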
So it will just continue looping, again and again. So, generally, always use for loops. The while loops are relatively advanced in a way, because generally you know how many columns or how many rows your matrix has. If you're making box plots, then you can say: for every row in my matrix, make a box plot. So for row in one to the number of rows in my matrix, make a box plot.
So a little bit of an example. First, I define a variable which is called even. It has the numbers from two to 100, stepping by two. So I just create a vector containing the numbers two, four, six, eight, all the way up to 100. And I want to add all of those numbers together. So the question here is: if I add up all of the even numbers up to and including 100, what is the sum? What I'm going to say now is: for number in even. So for every number in even, what do I want to do? I want to take the total that I had, add the number to it, and then store it back into total. What happens the first time that I go through it? The number will be two, so I get total is zero plus two, meaning that two gets stored back into total. The second time, the total was two, and I add four, which is the next number in my list. So now I get six, which is stored into total. The third time that I go through it, the total that I had was already six, and now I add six to it, and I end up with 12. So if I want to use R to add up all of the even numbers from two to 100, I can just use a for loop, go through each of the numbers in even, and add them to the total. And don't forget to store the total back, because you have to overwrite the total. So the total here is the accumulator, which accumulates the sum of all of the numbers.
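As a rough sketch of the accumulator example just described: the names even, number and total follow the lecture, and starting total at zero is stated there as well.

```r
# All even numbers from 2 up to and including 100
even <- seq(2, 100, by = 2)

# The accumulator starts at zero
total <- 0
for (number in even) {
  total <- total + number   # store the result back, overwriting the old total
}

print(total)  # 2550
```

In everyday R the built-in sum(even) gives the same result in one call; the loop is written out here only to show how the accumulator is overwritten on every pass.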
All right, so if you have an if statement, or a for loop, or a while loop, then the thing which you are testing for is called a statement, and everything else is called an expression. So everything within the curly brackets is called the expression, and the statement is the thing that you check. A statement always evaluates to true or to false. This holds for a while loop, and it also holds for an if statement. And of course, statements can and will become very complex. You can have a not: if not the box equals one, then do something. You can have an and: if the box is larger than one and the value in the box is less than 100. Or you can use or: if the box is smaller than or equal to zero, or the box is larger than 100, do something. So there are many different options to specify the statements to get what you want, and they will become very, very complex. Because generally, if you want to do a box plot, you want to, for example, take all of the individuals which are below a certain number, or which were born before a certain date, and visualize only those. Or you want to take out all of the animals which have a certain treatment. And you can use if statements for that: if column five of the animal says treated, then add it to the sum, and then calculate the mean in the end.
For this, we need comparison operators. These are the comparison operators supported by R: == means equal to, != is not equal to, and then there are smaller than, larger than, smaller or equal, and larger or equal. Those are the things that you can use to test. Now a little bit about the ampersand and the double ampersand. Again, because in R everything is generally a vector, you can use the single ampersand, which is a vectorized operator.
So if I have v1, which is a vector of numbers, I can ask v1 smaller than four & v1 larger than two. What it will do is compare every element in v1 to see if it's smaller than four, and this result is then combined, element by element, with the vector created by the second comparison. Let me show you guys a quick example in R, because I think that this is generally difficult. So I just create something which has some numbers: one, four, six, seven, four, six, eight. Now if I ask v1 smaller than four, it will say true for the first number, because that one is smaller than four. The second one is not, the third one is not, and none of the other ones are either. So let me add a couple more that are smaller than four. If I now ask the same thing again, it says that the last two are also smaller than four. Now, what if I have multiple of these conditions? I could say v1 smaller than four & v1 larger than one. What it will now do is check every number in the vector to see if it's smaller than four, but it also has to be larger than one. So it works element by element. I hope that's clear-ish.
The double ampersand is not vectorized. It takes the first element from a vector, so you have to make sure that you use it only for single values. It's generally something that you use for a single true and false. If you have an if statement in which you want to check whether two things are true at the same time, then you generally use the double one. The nice thing is that R will warn you. If you do it wrong, if you use the single ampersand while you should have used the double ampersand, then R will give you this warning message: in if (v1 ...), the condition has length larger than one and only the first element will be used. This is because R generally works with vectors, and single numbers are not that common in R. But R will give you a warning when you do it wrong.
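A small sketch of the difference between & and &&, using the numbers from the demo. The two values appended at the end (2 and 3) are an assumption, since the recording does not say which small numbers were added; x is likewise just an illustrative single value. The all() and any() calls are the "are all of them" and "are any of them" questions from earlier in the lecture.

```r
# Numbers from the demo, plus two assumed small values (2 and 3) at the end
v1 <- c(1, 4, 6, 7, 4, 6, 8, 2, 3)

small <- v1 < 4              # one TRUE/FALSE per element
both  <- v1 < 4 & v1 > 1     # single &: element-by-element "and"
print(small)
print(both)

# Collapse a logical vector to a single TRUE/FALSE before using it in if()
print(all(v1 < 4))   # FALSE: not every value is below four
print(any(v1 < 4))   # TRUE: at least one value is below four

# Double &&: meant for single values, which is what an if statement needs
x <- 3
single_ok <- FALSE
if (x > 1 && x < 4) {
  single_ok <- TRUE
}
print(single_ok)

# Note: if (v1 < 4) { ... } gives the warning quoted in the lecture on older
# R versions; from R 4.2.0 it is an error, and from R 4.3.0 so is using &&
# on vectors longer than one element.
```

A common pattern is therefore & (or |) when selecting from vectors and matrices, and && (or ||) together with all() or any() inside if statements.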
We'll come up with an example later, but if you have an if statement, generally you always use the double ampersand. If you do a selection from, for example, a matrix, you generally use the single one, because you want to know if the first column contains treated and the second column is larger than five and the third column is smaller than 16. All of these things you can ask in one statement, and you can then make a subset of this matrix.
So these logical vectors that you create using the comparison operators can be used as indexes. Imagine that we have a variable called ten to one, which contains the numbers 10, nine, eight, seven, all the way down to one. Now I want to ask: ten to one smaller than five, so which numbers are smaller than five? Then it says, of course, that the first couple are not and the last four are. I can then use this to index the vector. So I can say: from ten to one, give me only the elements where ten to one is smaller than five. Instead of getting back all of the numbers from 10 to one, I now get the numbers four, three, two and one. You can also index with a logical vector directly. If it is shorter than the vector you are indexing, it will loop around (it is recycled). That's not that important, but if I want to, for example, get all of the even numbers in this vector, I can just say ten to one, indexed by c(TRUE, FALSE). For the first element it will be true, for the second element false, for the third element true, for the fourth element false, and so on. It just keeps cycling through them.
By combining logical vectors, you can do advanced selection. A & B is a pairwise and, A | B is a pairwise or. So if I again have the vector called ten to one, with the numbers 10 to one, I can say ten to one larger than three & ten to one smaller than seven. So I can do a subset.
I don't select all of the stuff from the vector. What I get back is true or false: for every element in the vector, it says if the condition applies or not. And then I can use this to make a subset. So I can say ten to one larger than three & ten to one smaller than seven, and store this vector in a new variable called subset. Then I can index ten to one with this subset. Now it makes a subset of the whole vector and only shows me the values which are larger than three and smaller than seven.
You can do the same thing for a matrix. Imagine that I have a matrix m1 with the numbers one through nine, three rows, three columns. If I want to know which elements in column one are lower than three, then I can say: from m1, take the first column, and ask the question smaller than three. This will of course apply to row number one and row number two, but not to row number three. If I now use this vector as the row selector for m1, it gives me back a smaller matrix, showing only the first two rows. This is very important when you have a big data set and you need to reduce it into subsets. If I'm using R to load in my data, and I only want to see the animals born on a certain day, then I can say: if the date in column five is equal to this date, then show me the data. That's what I use these selectors for. Generally, you're not selecting on a number, but on a date, or a treatment, or whether something is a male or a female. And this is very useful to quickly look at only the males in your matrix, or only the females. Of course, I can do the same thing with the == operator.
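The vector and matrix selections described above could be written roughly like this. The names ten_to_one and m1 follow the spoken description; the index variable is called subset_idx here (an assumption) rather than subset, only to avoid confusion with R's built-in subset function.

```r
ten_to_one <- 10:1

# A comparison gives one TRUE/FALSE per element ...
print(ten_to_one < 5)
# ... which can be used directly as an index
print(ten_to_one[ten_to_one < 5])        # 4 3 2 1

# A short logical index is recycled: TRUE, FALSE, TRUE, FALSE, ...
print(ten_to_one[c(TRUE, FALSE)])        # 10 8 6 4 2

# Combine two conditions with the vectorized & and store the index
subset_idx <- ten_to_one > 3 & ten_to_one < 7
print(ten_to_one[subset_idx])            # 6 5 4

# Matrix with the numbers 1 to 9, filled column by column (R's default)
m1 <- matrix(1:9, nrow = 3, ncol = 3)
row_selector <- m1[, 1] < 3              # TRUE TRUE FALSE
smaller_m1 <- m1[row_selector, ]         # keeps rows 1 and 2, all columns
print(smaller_m1)
```

With real data the condition would more typically test a date, a treatment, or a sex column, but the indexing pattern stays the same.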
So I compare and say: from the first column in m1, only select the elements which are equal to three. When I then do the subset, I of course get a vector back, because only one of the rows applies. If I'm selecting one row from a matrix, I'm getting back a vector, because a matrix is nothing more than a combination of vectors, like we saw using the cbind and rbind functions.
All right, some special control structures, and then we've done more or less all of the control structures. We have the if statement for when we want to check if something is true or not, and we can use a for statement to do something multiple times, so for each row in the matrix, make a box plot. When you are writing code for other people, you generally want to build in warnings. If people set parameters which you cannot handle, you might want to give a warning. So if the function or the code that I make uses user input to compute a logarithm, then I can issue a warning: if the number that you're trying to take the logarithm of is lower than zero, just give the user a warning. You can also raise an error. The function for an error is called stop. So: if x is smaller than or equal to zero, stop, this will go horribly wrong. If you don't know if it will work, give a warning; if you are sure that it will never work, use stop. There is also the tryCatch expression, for when you want to test if something might go wrong or not. I'm not going to go into that, because it's very detailed and you only use it very seldom.
All right, so a little bit of an overview. In my mind, variables are boxes. They store things, and you can use them without knowing what's in there. You can ask for certain properties, like what is the length of this variable, or how many rows does this variable have? And control structures, they manage program flow.
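The warning and stop idea from the logarithm example might be sketched as follows. The function names log_with_warning and log_or_stop and the message texts are hypothetical, and returning NA after the warning is an assumption made for this sketch. tryCatch appears only to capture the error message so the example can run to the end; it is not covered in the lecture.

```r
# Hypothetical helper: warn, but keep going (here by returning NA)
log_with_warning <- function(x) {
  if (x <= 0) {
    warning("x is not positive, returning NA")
    return(NA)
  }
  log(x)
}

# Hypothetical helper: we are sure this can never work, so stop with an error
log_or_stop <- function(x) {
  if (x <= 0) {
    stop("x must be larger than zero")
  }
  log(x)
}

print(log_or_stop(1))                                # 0

# The warning is suppressed here only to keep the demo output quiet
result <- suppressWarnings(log_with_warning(-5))
print(result)                                        # NA

# Capture the error message instead of halting the script
caught <- tryCatch(log_or_stop(-5),
                   error = function(e) conditionMessage(e))
print(caught)
```

Both helpers assume x is a single number; for a whole vector the check would need any(x <= 0) instead.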
So you can have branching. You can have an if statement saying: if something is smaller than three, make a box plot. Or: if the column that I'm looking at is a numeric column, make a histogram; if it is, for example, a character column, then make a box plot. No, not a box plot, but a bar plot, so count how often each of the elements occurs. That's what you can use the if statement for: if something is true, do this, and otherwise do something else. And you have looping, like for loops and while loops, which allow you to do something repeatedly. So for each row in the matrix, do something. And of course there are the special control structures, warnings and errors, which we just saw. If you're writing code for other people, keep in mind that other people can do silly stuff with your code, for example using negative numbers while your code can only handle numbers which are positive.
There are also some advanced looping methods in R, and these are used a lot. You have the lapply function and you have the apply function. These do the looping for you, so you don't have to write the for loop. lapply is used when you have a vector or a list, and the apply function is for when you have a matrix. I have examples, I think. So I have a list defined, which has the numbers one to five in its first element, and in its second element the numbers one to three and an NA value. If I want to calculate the mean for each element in my list, with five numbers in the first element and four values in the second, I can just say lapply, my list, mean. What it will now do is take the first element of the list and calculate the mean, and the mean of one to five is of course three. It will then take the second element of the list and calculate the mean.
Of course, you cannot calculate the mean when one of the values is missing, so it will just say NA. This is something that is important in R, because NAs propagate. If you have a missing value, then the mean of this list element will also be missing. If you have a missing value in your matrix, then you will not be able to calculate a mean for that whole column. You can add additional parameters. There is a parameter to the mean function which is called na.rm, and it is used when you do want to calculate a mean while ignoring all of the NAs. So setting na.rm to TRUE will ignore the NAs. When I now say lapply, my list, the mean function, but ignore the NAs in each of the elements, then of course the first element won't change, because the mean of one, two, three, four, five is still three. But the second one will now all of a sudden have a number, because it will say: one plus two plus three is six, divided by three numbers, is two. So you can pass more parameters through the lapply function. lapply is one of these things where you don't have to write a for loop, and it is really useful when you're dealing with lists and, for each element of the list, you want to know how many things are in there, or what the mean is, or the median, or the standard deviation.
If you want to apply a function to a matrix, for example here, I have a matrix which is filled with the numbers one through 50. It has 10 rows and five columns. If I use apply on my matrix, I have to specify whether I want to apply the function, here the mean function, to the rows or to the columns. One means apply this function to the rows; two means apply this function to the columns. So because my matrix has 10 rows and five columns, if I say apply, my matrix, to the rows, the function mean, I will of course get back 10 means.
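Going back to the list example from a moment ago, a minimal sketch of lapply with and without na.rm could look like this. The name my_list is an assumption based on how the list is referred to in the recording.

```r
# First element: 1 to 5; second element: 1 to 3 plus a missing value
my_list <- list(1:5, c(1, 2, 3, NA))

# lapply calls mean() once per list element and returns a list
means <- lapply(my_list, mean)
print(means)          # 3 for the first element, NA for the second

# Extra arguments after the function are passed on to mean()
means_no_na <- lapply(my_list, mean, na.rm = TRUE)
print(means_no_na)    # 3 and 2
```

The same pattern works for other summary functions, for example length, median or sd in place of mean.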
Every row has a mean which is computable, so applying over the rows gives one mean per row. If I say apply, my matrix, to the columns, the function mean, I will get back five means, because my matrix has five columns. You can also use both of them at the same time. I never used that in all of the years that I have been programming in R. So just remember: if I want to execute a function for each row of the matrix, use one. If I want to execute a function like mean or standard deviation or median or whatever for each of the columns, use two. The one stands for rows, the two stands for columns. The nice thing is that lapply and apply are very often more efficient. Speed-wise, this depends a little bit on your CPU, but memory-wise it saves you a lot.
You also have to remember that if you want to use arithmetic functions, they should be quoted. Here I have a vector which contains the numbers one and two. So when I say lapply, one to two, use the plus function, and then what do I want to add? I want to add five to each element. Then I have to quote the plus. I have to call it like this, because otherwise R won't understand the plus; it will just say: this is a syntax error. But you can use plus, you can use multiplication, division, Euclidean division, whatever you want. Just remember that when you use arithmetic operators as functions, you have to quote them in lapply and also in the apply function. In this case, of course, when I lapply over one and two, so I have two numbers, and add five, the first element will be six and the second element will be seven, because one plus five is six and two plus five is seven. lapply will normally give you back a list, so you can use the unlist function to get the result into a vector. Here I use lapply on this vector.
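The row and column version of apply and the quoted "+" can be sketched as below. The name my_matrix and the result names are assumptions made for illustration; the matrix itself (numbers 1 to 50, ten rows, five columns) is as described in the lecture.

```r
# Matrix with the numbers 1 to 50, ten rows and five columns
my_matrix <- matrix(1:50, nrow = 10, ncol = 5)

row_means <- apply(my_matrix, 1, mean)   # 1 = rows: ten means
col_means <- apply(my_matrix, 2, mean)   # 2 = columns: five means
print(row_means)
print(col_means)

# Arithmetic operators have to be quoted when passed as a function
added <- lapply(1:2, "+", 5)   # adds 5 to each element, returns a list
print(added)

# unlist turns the list back into a plain vector
added_vector <- unlist(added)
print(added_vector)            # 6 7
```

Note that apply with mean returns an ordinary numeric vector here, while lapply always returns a list, which is why unlist is only needed for the second case.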
And then you see that it gives me back a list, which is a little bit annoying, because generally, if you start with a vector, you want to continue using a vector. So you can use the unlist function to more or less turn the list that you have into a vector again. That's what unlist does. It is a very powerful tool, and we'll get back to it in another lecture. No, we won't. But if you want to know more, just look at the lecture on YouTube, because there we go into a lot more depth on how to use the unlist function to, for example, select stuff from lists.
All right, so we've seen a lot of brackets by now. We have the round brackets, and those are used when you call a function: when I call the c function to combine stuff together, or when I use the mean function to calculate the mean, or when I call the lapply function. They are of course also used in the statements of control structures: if, round bracket open, the thing that you want to test, round bracket closed. The square brackets in R are there to specify an index or a name in a vector, a matrix or a data frame. So the square brackets allow me to select the first row from a matrix or the 15th element from a vector. The double square brackets are specific to lists, the thing which is like a vector but can hold multiple types. There you have to use the double brackets to specify: I want to have the sixth element of a certain list. And then we have the curly brackets, and the curly brackets define blocks of code. They are used to surround the expressions in an if statement, so they tell the R interpreter which expressions should be executed when the thing in the if statement is true. Because you don't have to put everything on a single line; you can have like 10 lines which are executed when something is true. And also when you are writing functions, you surround the function block with them.
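A compact sketch of the four kinds of brackets just listed. All variable names and values here (v, m, l, message_text, add_one) are made up for illustration.

```r
# Round brackets: calling a function
v <- c(10, 20, 30)
m <- matrix(1:6, nrow = 2)
l <- list(1:3, "text", TRUE)

# Square brackets: index into a vector or matrix
second_value <- v[2]      # 20
first_row <- m[1, ]       # 1 3 5

# Double square brackets: take one element out of a list
second_item <- l[[2]]     # "text"
still_a_list <- l[2]      # single brackets on a list return a smaller list

# Curly brackets: a block of code, here for an if statement ...
message_text <- "not checked"
if (v[1] > 5) {
  message_text <- "first value is larger than five"
}

# ... and here for the body of a function
add_one <- function(x) {
  x + 1
}

print(second_value)
print(first_row)
print(second_item)
print(message_text)
print(add_one(1))
```

Round brackets show up twice in the if line: once around the condition and, inside it, square brackets for the index.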
So everything inside of the function is delimited by the opening curly bracket, up until the matching closing curly bracket.
All right, so we've been talking a lot about strings, like in the trick question that I showed you guys. A string or character element in R is enclosed in double quotes or single quotes. You can combine two strings together when you use the paste function, and you can print them to the screen when you use print. And you can print them anywhere, to the screen or a file or a printer, when you use the cat function. So print is just a fancy thing around the cat function which prints directly to the R window. So strings are enclosed by these double quotes. Of course, this causes a slight issue. What if I want to print a double quote character itself? That seems impossible, because the double quote starts a string, it ends the string, and everything in between belongs to the string. So how do I get around the fact that I might want to print a quote character? That's the next slide, so let's just continue.
So strings are enclosed by these quote characters. Forgetting to close a string happens a lot in R, and when it happens, no command will produce any output. So you have to look at the symbol in front of the cursor. Let me show you an example. When I go to my R window, and I say I have a variable called Danny, and I want to put in a string, I start typing, and then I forget to close it. So instead of having the closing quote at the end, I just press enter. And then I do five plus five. And now R doesn't tell me that five plus five is 10. So what the hell is going on? I then do 56 divided by seven: still no output. And that is because R thinks that I am still continuing the string.
As long as I'm still continuing the string, R shows me these plus symbols in front. So when you look at the R window, if you see a larger than symbol (>), you know that you can input a new command. If you see the plus, it means that you're still inside of a block or inside of a string. There are two ways to get out of this. I can close the string, if I remember that I'm in a string. But now if I look at Danny, you will see that it also contains five plus five and 56 divided by seven. And it has these backslash n characters in there, which are the new lines. Those are where I pressed the enter key. So that is one of the drawbacks in R: you have to look very closely at the input prompt. Because now when I do five plus five, it will tell me this is 10. But if I forget to close my string, then it doesn't matter what I type, it will not give me any output, and I have to close it in some way. So if this happens, and you're typing and you see this plus character occurring all of the time, then just press the little stop button. If I press the stop button, R quits everything it was doing, and I get a larger than symbol again, meaning that I can input a new command. So that's what this slide was for: a long string which I forgot to close, then you see these little pluses, and there's no output. So just press this little stop button, or close the string, if you are dealing with a string.
All right, so one of the things that we might want to do is print. We can say: print, and then paste the words hello and world together. This will make a string which says hello world. Here we are saving this string to a file. The cat function allows you to specify which file you want to write to, so I can say file is out.txt.
This will create a new file called out.txt and then put the result of paste into this file. So what if we want to print this double quote character? Because it might be that we want to have a file with a double quote in there. Then we need to escape it, right? So the characters that we need to escape are the quotes themselves: we need to write a backslash and then the double quote, or a backslash and then the single quote. There are also special characters: we have the new line character, which is backslash n, we can print a tab character, which is backslash t, and there is the backslash itself. So if I want to print a backslash into a file, I have to use backslash backslash. And there is the backspace character, which is backslash b, and this is just like pressing the backspace key on your keyboard. So these are special characters. And if you want to use these as output, because you want to write them to a file or to the screen, then you have to escape them. So let me give you a small example for this. Because when you write in English, and you write something like Danny said, then normally you would put what was said in quotes: blah, blah, blah. And if I want to print this to a file using R, I have to say cat. And then I want to make it a string, so I just enclose it in quotes. But now the problem is that R doesn't understand that this quote here is not the end of the thing that I want to print. I want to print this character. So I put a backslash in front of it. And the same thing for this one here, backslash. And then of course I close the cat function like this. So now I actually get Danny said, colon, quote, the thing that I'm saying, and then the closing quote. And you see that it actually doesn't move to the next line, right? R just continues on the same line.
And that is because I didn't put an enter there, right? Normally when you write text, at the end of a line there's an enter. So if I want to do that, I have to say, well, I want to have a backslash n, a new line. When I do it like this, then you see that the prompt continues on the next line. And this is because you might want to write a file which doesn't have an enter at the end, or one that does. And I can also use multiple enters, so I can say Danny said, backslash n, backslash n, backslash n. And then you see it puts these enters in. The same thing holds for the backspace character, because I can write Danny, backslash b, backslash b, and then said. And then it will say Dan said, right? Because the backslash b is just pressing the backspace key on the keyboard. So these are just some special things that you can do when you want to output text in R to a file or to the screen. All right, so a little bit about escaping. When using cat, we print verbatim. That means that we need to make sure that we add the end of line character ourselves. Otherwise, R will continue on the same line. We can also use the separator parameter, sep, to separate the elements. So for example, I can say cat, hello, comma, world backslash n, comma, sep is a comma. So what it will do now is take the first element, take the second element, and then separate them by a comma. I can also separate them by a space, and I can also separate them by a minus. And this is very useful when you're defining things like row names or column names, where you have a matrix which has 10 rows and you want to name the rows. So you're just going to say, well, paste a certain word with the numbers one to 10 and then use something as a separator. So separators are there so that when you combine two strings, something is put in the middle.
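As a rough sketch, the string examples from this part could be typed into R like this. The variable names, the word "row" for the row names, and the temporary file (used here instead of out.txt so nothing in your working directory gets overwritten) are just assumptions for illustration.

```r
# Combine two strings; paste() puts a space in between by default
greeting <- paste("Hello", "World")
print(greeting)

# cat() prints verbatim, so the new line has to be added by hand
cat(greeting, "\n", sep = "")

# Escaped double quotes inside a string, followed by a new line
quoted <- "Danny said: \"blah blah blah\"\n"
cat(quoted)

# A tab character and a literal backslash
cat("left\tright\n")
cat("one backslash: \\\n")

# Two backspace characters; on most terminals this shows up as "Dan said"
cat("Danny\b\b said\n")

# sep decides what is put between the elements
cat("Hello", "World\n", sep = ",")

# paste() with a separator, for example to build row names for 10 rows
row_names <- paste("row", 1:10, sep = "-")
print(row_names)

# Writing to a file instead of the screen
out_file <- tempfile(fileext = ".txt")
cat(greeting, "\n", sep = "", file = out_file)
```

Note that print shows the string with its quotes and an index in front, while cat writes only the characters themselves.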
All right, so R is a language for statistical computing, which means that it has a lot of random number generators built in. So here we have a function called get random number, and it always returns four. And of course, this is a perfectly fine random number generator, because in theory you can't prove that a random number generator which generates four every time is not random. R knows about different distributions. So it knows, for example, about the uniform distribution. The uniform distribution, by default, generates numbers from zero to one, and every number within this interval has the exact same chance of being drawn. So it's like a die. If I throw a die, then there's a chance of one in six that it will be a three. There's a chance of one in six that it will be a five. And so if I roll the die 10,000 times, then afterwards I will see that I rolled a one in about one sixth of those 10,000 throws. So the uniform distribution is more or less that. Every number within the distribution has the same chance. You can draw from the uniform distribution in R using the runif function. So I can say runif, then I tell it how many random numbers I want, and I can specify the minimum and the maximum. The minimum is by default set to zero. The maximum is by default set to one. So let me just show you guys how that works. That's the wrong button. So let's go to the R window. Right, I can say runif and then give me 10 random numbers. So now it gives me 10 random numbers between zero and one. I can also say give me 10 random numbers from a hundred to a thousand, and then it will just draw 10 random numbers. I can draw more, and if I want to make the plot that we just saw I can say, well, give me 10,000 numbers, minimum zero, maximum one. And then of course if I do this, it just keeps on scrolling.
And here you also see the indexes in front. So number 501 was this number and number 536 was that number. But what I can do now is say, well, make me a histogram of all of these numbers, and then it will show you that more or less every bin was drawn around 500 times. So uniform distribution: every number has the same chance. R also understands what a Gaussian or a normal distribution is. A Gaussian distribution is one where values near the mean have a higher chance of being drawn. Normal distributions are very common in biology and bioinformatics. If I measure for example the height of a human and I do that for 10,000 humans, then the average height would be around 1 meter 76. Some people are smaller, some people are bigger, but most people fall around the average, right? If you want to simulate drawing random numbers from a Gaussian distribution you can use the rnorm function. You again tell it how many numbers you want, so that's the first parameter, and then you say what the mean is and what the standard deviation is. You can give it any mean and any standard deviation. So if you want to simulate human height you will say, well, I want to simulate 10,000 people, the mean is 170, for 170 centimeters, and the standard deviation is for example 10 centimeters, right? So about 50% of the population is between roughly 1 meter 63 and 1 meter 77. So again let me quickly show you in R how that looks. If we want to create the same plot we can just say rnorm, and then we say, well, humans on average are 1 meter 70 with a standard deviation of 10, and now you will see that this is more or less how it looks. So most people are around 1 meter 70, some people are very small and some people are big, larger than 2 meters. But this is generally the structure that you get when you're drawing random numbers from a normal distribution, or when you're measuring humans or other things that are normally distributed.
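The runif and rnorm demos from this part could look roughly like the sketch below. The variable names are made up for illustration, and the numbers you get will differ from run to run because they are random.

```r
# 10 random numbers between 0 and 1 (the default minimum and maximum)
uniform_draws <- runif(10)
print(uniform_draws)

# 10 random numbers between 100 and 1000
range_draws <- runif(10, min = 100, max = 1000)
print(range_draws)

# 10,000 uniform numbers; the histogram should look more or less flat
many_draws <- runif(10000, min = 0, max = 1)
hist(many_draws)

# 10,000 simulated human heights in centimeters
heights <- rnorm(10000, mean = 170, sd = 10)
hist(heights)

# Fraction of simulated people within one standard deviation of the mean
fraction_near_mean <- mean(heights > 160 & heights < 180)
print(fraction_near_mean)
```

The last line is a small extra check that was not part of the demo: for a normal distribution, roughly two thirds of the values should fall within one standard deviation of the mean.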
Another distribution which comes up a lot is the Poisson distribution. The Poisson distribution is a distribution for counted data, right? If I'm looking at a flower and I want to know how many bees are on the flower, there can be zero bees, there can be one bee, there can be two bees, there can be three bees. Of course there can be a hundred, but that's very uncommon, right? So in this case most observations are at the lower end of the spectrum. When I look at a flower then there's generally one, two, three, four, perhaps five bees on there, but not a lot more. There's not going to be ten thousand bees on a flower, right? So the distribution looks like this: the numbers at the lower end of the spectrum are more common than the ones at the higher end. So the Poisson distribution in my mind is just: I look at a flower, how many bees are on that flower? Counted data where counts of zero, one and two are relatively common, but the other ones are less common. This works via the rpois function. With the rpois function you again tell it how many numbers you want, and then you specify the lambda. The lambda is the average count, and it also determines how wide the distribution is. Let me show you guys that. So if I do the same thing and I draw ten thousand Poisson numbers using a lambda of one, then what you see is a distribution which kind of looks like this, right? So many times zero bees are on the flower, sometimes there's one, sometimes there's two, sometimes there's three, but more than three is very uncommon. If I increase my lambda then what you see happening is more or less like this.
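A minimal sketch of the two rpois draws could look like this. The variable names and the larger lambda of 20 are assumptions for illustration; the lecture does not say which value was used for the second plot.

```r
# 10,000 counts with a lambda of 1: mostly zeros, ones and twos
bees_low <- rpois(10000, lambda = 1)
print(table(bees_low))
hist(bees_low)

# A larger lambda (20 is an arbitrary choice) shifts and widens the distribution
bees_high <- rpois(10000, lambda = 20)
hist(bees_high)

# Every value is still a whole number, however large the lambda is
all_whole_numbers <- all(bees_high == round(bees_high))
print(all_whole_numbers)
```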
So you see now that it starts looking like a normal distribution, but it actually isn't, because it is counted data: every number is a whole number and it's not possible to be 1.5, right? If you look at a flower there's not going to be 1.3 bees on a flower. It's always one or two. You don't have half elements. So Poisson distributions look like Gaussian distributions when you have a high lambda, but they are based on counted data. If you want to know more about Poisson distributions, there are a lot of good resources out there. All right, so those are the built-in randomness functions that I wanted to show, and R knows of a lot more. It also knows what a gamma distribution is, a beta distribution. It knows the t distribution for t statistics and all of these things. But I just wanted to show you guys three of them and some random data distributions. And the nice thing about R is, since it's a language for statistical computing, you can test whether something is normally distributed or not. Right? It understands the difference between a normal distribution, where numbers are continuous, and a Poisson distribution, where numbers are whole numbers because they are counted. If you want to have repeatable randomness, you can use the set.seed function. So for example when I say set.seed 1 and I draw five numbers between 0 and 2 and I round them down I get 1, 1, 1 and 2. If I want to have a different seed, so a different random starting point, then I can set my seed to 2. Now doing the exact same call will give me different random numbers. But when I set my seed back to 1 I get the exact same thing as before. So this is based on the fact that, at least when I was still in high school, we had these graphing calculators, and then the teacher would say please reset your graphing calculator, because everyone had these cheat notes in there. Right?
And to check if you really reset it, the first question was always: draw five random numbers and write them down. And then, because when you reset your machine it resets the seed to the factory default, everyone will get the exact same random numbers. So the teacher had a way to check if you really reset your graphing calculator. If you knew the factory seed you could actually circumvent that and just cheat all you want, but that's a different story. So if you want to have repeatable randomness, you can use the set.seed function to fix your random numbers from then on. So setting the seed to 1 and drawing five random numbers gives exactly the same result every time I set my seed to 1 and draw five random numbers, but it is different from when I set my seed to 2 and draw five random numbers. All right, like I said, I'm almost done with the second hour. Programming is like working in a lab. Make sure you work clean: put your code into a directory, put your input data into a different directory, and make sure that the output data also goes into a different directory. Right? Work clean. Make sure that when you write code you act like a professional, and not like some guy running around burning down a lab and just mixing all kinds of stuff together. Right? You can experiment much more with programming than in a lab. You can just type in whatever you want and you see the result, which in a lab you shouldn't do. You shouldn't just throw chemicals into a vat and then just see what happens. So programming gives you a lot of freedom, but in the end, if you are thinking about a career in bioinformatics, make sure that you act professionally. Right? So have one folder where all of the code is. All of the input files, so the raw input data that you get from a company, go into another directory. And you have another directory where all of the output goes, so the plots that you make and the analysis files that you produce.
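Coming back to the repeatable randomness for a moment, here is a minimal sketch of how set.seed behaves. The variable names are made up, and the sketch draws plain runif numbers without rounding, which is a simplification of the example on the slide.

```r
# Fix the starting point of the random number generator
set.seed(1)
first_draw <- runif(5, min = 0, max = 2)

# A different seed gives a different sequence
set.seed(2)
other_draw <- runif(5, min = 0, max = 2)

# Going back to seed 1 reproduces the first sequence exactly
set.seed(1)
repeat_draw <- runif(5, min = 0, max = 2)

print(identical(first_draw, repeat_draw))  # same seed, same numbers
print(identical(first_draw, other_draw))   # different seed, different numbers
```

Setting the seed once at the top of an analysis script is a common way to make sure that a simulation gives the same result every time the script is run.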
So think of descriptive names for variables and functions. Temperature is a variable name which is perfectly fine, but temperature in Celsius is much clearer, because then people know I need to convert from Fahrenheit to Celsius before I put it in. When you write blocks, so if you write a function or a for loop, make sure that you use indentation. Use spaces to align, for example, your comments at the end of each line. If you have open brackets, make sure that visually it is clear what is inside of the bracket and what is outside of the bracket, by putting things on the same level. Right? This makes code reusable and readable by others. So other people can see your code and think, that looks beautiful. And if code looks beautiful, it's generally much easier to understand than when it's just a big mess and everything is on one line and stuff is indented really weirdly. And R doesn't force this. There are some languages, like Python, where indentation is coupled to the meaning of the code. In R that's not the case. Everything between the opening and the closing bracket is executed in the for loop. Right? So every time that it goes through the loop it executes this line, no matter if the line is indented or not. In Python that's not the case. If you would put count here on the same level as the for loop, it would not be executed as part of the loop, because Python does not have these opening and closing brackets. So there, indentation is everything. Good. So that's what I wanted to talk to you guys about for the second hour. So we discussed a lot already, right? In R we discussed what variables are, random distributions, writing to a file. So this is more or less all of the basic stuff that you need to start programming, right? If you can write a for loop and you can go through each of the columns in your matrix, then you can do some amazing analysis already, right? Especially when you combine a for loop with an if statement.
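To make the points about names, indentation and combining a for loop with an if statement a bit more concrete, here is a small sketch. The temperature values, the threshold of 20 degrees and all variable names are invented for illustration.

```r
# A descriptive name tells the reader which unit is expected
temperature_celsius <- c(12.5, 25.1, 8.0, 30.2, 18.7)

warm_day_count <- 0                            # counter, filled in the loop
for (day in seq_along(temperature_celsius)) {  # go through every measurement
  if (temperature_celsius[day] > 20) {         # only act on warm days
    warm_day_count <- warm_day_count + 1
  }
}

print(warm_day_count)
```

The indentation is not required by R, since the curly brackets decide what belongs to the loop and to the if statement, but it makes it easy to see at a glance which lines are inside which block.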
For every row in the matrix, do something if the numbers are smaller than 10, or make a subset of your matrix using a selector, right? So load in some data and then only select the rows where, in a certain column, an animal is a male. Then you can analyze males and females separately. You can make different box plots, and generally in R the functions are called the way that you expect them to be, right? So if we want to make a histogram, it's called hist, for histogram. If we wanted to make a box plot out of this, we could just use boxplot, and it will make a box plot of the data. We can make heat maps and images if we want to, right? So do we still have a matrix loaded? Let me see. Yeah, we still have our matrix. No, that's not a matrix which we can use. Let's just make a small matrix, right? So matrix, put in the numbers 1 to 50, say I have 10 rows, five columns, and then I want to do a heat map. Then I can just say heatmap of this matrix and it will just give me a heat map, right? It won't look that pretty because it's just the numbers 1 to 50. If we fill it using random data, it starts looking a little bit more interesting. So runif, just draw 50 random numbers, put them in, right? Then your heat map starts looking a little bit more interesting. We can give names and make this plot look better, but in the end R provides all of these plotting functions for you. So if you're thinking I want to have a bar plot, or I want to have a heat map, or I want to have a histogram, just search the help for the name that you want, for example heatmap, and it will give you all of the functions which have heatmap in there. So good. Let me show you guys how that looks. So when I then look into Firefox, it just opens up this, right?
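As a rough sketch, the selecting and plotting steps from this demo could be typed like this. The small animals data frame and all variable names are invented for illustration, and the help search in the final comment is an assumption about how the search was started, since the transcript does not show the exact command.

```r
# Select only the rows where an animal is a male (made-up example data)
animals <- data.frame(
  sex    = c("male", "female", "male", "female"),
  weight = c(21.3, 19.8, 23.1, 20.4)
)
males <- animals[animals$sex == "male", ]
print(males)

# A matrix with the numbers 1 to 50, 10 rows and 5 columns
small_matrix <- matrix(1:50, nrow = 10, ncol = 5)
heatmap(small_matrix)

# The same shape filled with random numbers looks more interesting
random_matrix <- matrix(runif(50), nrow = 10, ncol = 5)
heatmap(random_matrix)

# Other plots are named the way you would expect
hist(random_matrix)
boxplot(random_matrix)

# Searching the help for a keyword (not run here, it opens the help viewer):
# ??heatmap
```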
So these are all of the help pages for the functions that there are which make a heat map, right? Heatmap of 2D bin counts, pretty heatmaps, draw a heat map. So it just gives you an overview, and then you can just click on one and you will get the description of the heatmap function and what you can do with it. And then when you go down, there is always an example, right? So if you want to know how this example looks, you just copy paste it in, see what kind of a heat map it makes, and then try and understand what happens. So if we just copy paste this part into R, then we see that it makes a really nice heat map using a data set of cars, with things like how many cylinders a car has, what kind of gears, and the name of the car, right? So every function in R comes with a help file, which is super, super useful. All right, that's enough for me. I will take a five to ten minute break. So people on YouTube, I will see you for the next episode.