also everyone who's watching it on Moodle. I will do the slide in English as well so that people know what it is. So, when you write functions you can set default parameters. Default parameters are very useful for values that are generally accepted in a field such as biology. For example, statistical tests in biology normally use a threshold of 5%. This is called the alpha level of a test, and since 0.05 is the usual choice, it makes sense to build it into a function as a default. Similarly, the false discovery rate is generally set to 10%, and the number of permutations is usually around a thousand. (In a permutation test you shuffle your data and run the test, then shuffle again and run the test again, and so on, to get a baseline of what is significant.) These are typical default values. Here we have a function with a default value for the exponent, which is 2 by default. So this function simply raises its input parameter to a power that you can specify. If you don't specify anything, it takes the default value of 2: if you call the function with the value 5, it computes 5 to the power of 2. But if we also pass x as the exponent, the default is overwritten by the value of x, which I set to 5, so the function now computes 5 to the power of 5. Good. So when you write your own functions, you can use default parameters to encode sane defaults that are normally accepted in your field. In physics, for example, the alpha level is much stricter, something like 1 x 10^-3 or 1 x 10^-5, but in biology we generally take 0.05. Toc Frou, "cat" is not a valid symbol for showing on the mood box.
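A minimal sketch of the default-parameter idea in R. The function names raisePower and isSignificant are hypothetical (the slide uses its own function), and the alpha example simply encodes the 0.05 convention mentioned above as a default that a caller can override.

```r
# Hypothetical name: raisePower. The exponent defaults to 2 unless overridden.
raisePower <- function(value, exponent = 2) {
  value^exponent
}

raisePower(5)        # default exponent 2: 25
x <- 5
raisePower(5, x)     # exponent overridden by x: 5^5 = 3125

# Hypothetical name: isSignificant. Encodes the usual biology alpha of 0.05.
isSignificant <- function(pValue, alpha = 0.05) {
  pValue < alpha
}

isSignificant(0.03)                # TRUE with the default alpha
isSignificant(0.03, alpha = 1e-3)  # FALSE with a stricter alpha
```

Putting the field's convention in the signature means most calls stay short, while anyone with different requirements only has to name the one argument they want to change.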
Cat is also not really a mood. Although zombie is also not really a mood, come to think of it. Anyway, just try again; there are enough options. You're free to throw in whatever you want and see what sticks, that's perfectly fine. Alright, back to functions. We already saw the three dots, the "..." construct. This is used for variadic functions. A variadic function is a function where you leave the number of parameters open: it could be 0, could be 1, could be a million parameters. So a variadic function is a function that accepts a variable number of arguments, and in R we use "..." to define one. We already saw an example of a variadic function, and that's the sum function. Let me show you in R. R has the sum function, and I can sum 1, 4 and 5 together. But I can sum as many values as I want, because sum is variadic: I can pass it 1, 2, 10 or 100 parameters. That's the whole point of variadic functions: they let the caller pass any number of parameters, because when you write the function you might not know beforehand how many elements the user would like to sum up. As an example, you can create your own sum function in R, called my sum here. My sum is a function that takes an unknown number of parameters, and these are specified here by the dots. The variable count holds my running total, just like when we counted up the even numbers. Initially it is 0, because I haven't added any numbers yet. Then I use a for loop to iterate through the parameters: for x in list(...). What am I going to do? I take the number that the user gave to my function, add it to count and store the result back into count. This is executed for each of the parameters that the user passed in.
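The my sum function described above can be sketched roughly like this. The name mySum is an assumption (the slide's exact spelling may differ), and this sketch assumes every argument is a single number; passing vectors would make count a vector rather than a total.

```r
# Hypothetical name: mySum. "..." collects any number of arguments.
mySum <- function(...) {
  count <- 0                # running total; nothing added yet
  for (x in list(...)) {    # iterate over every argument the caller passed
    count <- count + x      # add it and store the result back into count
  }
  count                     # return the total
}

mySum()          # no arguments: the loop never runs, so 0
mySum(1, 2, 3)   # 6
sum(1, 4, 5)     # the built-in variadic sum: 10
```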
So if I call my sum without any parameters it returns 0, but I can also call my sum with 1, 2, 3 and it returns 6. I can pass as many values to this function as I want. That's variadic functions. You can also use named parameters. Say I make a function called variadic test, which again takes an unknown number of arguments, and I just return the list of these arguments. If I call variadic test with a single named parameter, param1 = 15, the parameter is turned into a list: the first element of the list gets the name param1 and the value 15. If I call variadic test with two parameters, the first a single number and the second a vector, it just returns the values that were passed into the function. So this is very useful when you create a function and have no idea how many things a user wants to sum up: if you don't know how many parameters to expect, you can use variadic arguments, that is, variadic functions, in R. Now a little bit more about scope. I told you that the nice thing about functions is that their internal variables are not visible from outside. So here we again have a function, this time with a single parameter. The parameter is raised to the power of 2, the result is stored in a variable called intern, and then we return the value of intern.
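The two examples just described, the variadic test function and the function that stores its result in intern, might look like the sketch below. The names variadicTest and squareIt are assumptions standing in for whatever the slides call them; the point is only to show how list(...) keeps argument names and that intern does not exist outside the function.

```r
# Hypothetical name: variadicTest. Returns its arguments as a list.
variadicTest <- function(...) {
  list(...)
}

variadicTest(param1 = 15)     # one element, named param1, with value 15
variadicTest(3, c(1, 2, 3))   # two unnamed elements: a number and a vector

# Hypothetical name: squareIt. intern only exists while the function runs.
squareIt <- function(value) {
  intern <- value^2
  intern
}

squareIt(5)                   # 25
exists("intern")              # FALSE in a fresh session: intern never left the function
tryCatch(intern, error = function(e) conditionMessage(e))  # R reports that 'intern' is not found
```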
So now when I call my function with the value 5, it computes 5 to the power of 2 and returns that. But when I now type intern at the R prompt, R tells you that intern is not found, because it only exists inside the function and never leaves it. This is called scope, and it is really useful. Think about R and all of its packages; there are literally thousands of them. If you could not limit variables to within a function, you would run out of variable names: if I define a function with a variable x in it, it would be impossible for anyone else to also write a function that uses x. But because of this encapsulation, because intern is scoped inside the function, every programmer who writes functions in R can reuse the same variable names, since they never escape the function. One thing you can do in R is access things in the parent scope. The scope of intern is the function, but from inside a function I can also access things that are defined outside of it. This is considered very bad practice; however, sometimes you need to look outside the function to see whether something is defined. One reason to do that is being lazy and being a bad programmer, which is not a good reason. Another reason is saving random access memory, which is one of the concerns in R. And another is that plot functions generally use it to read environmental settings, for example which font is used in plotting or what size the font is. A plot function doesn't have a hundred parameters, with one parameter being the font, the size of the
font, the type of the dot, and so on. These settings are generally defined outside of the plot function itself, and the plot function reads them when it needs them. Here is an example of just being lazy, which you should never do. I define a variable called exponent and give it the value 5. Then I write a function that uses the exponent variable, but exponent is not passed into the function. When a variable is not passed in, R looks it up in the scope outside of the function, the parent scope. So it works, but it is something you should never do. What should the function have looked like? I should have defined an input parameter that takes the exponent, like we did before. So again exponent is 5, but the function now takes two parameters: the value, and the exponent we want to raise it to. When the function now uses exponent, it takes the input parameter and not the variable outside. So when you write a function, make sure that everything inside the function only refers to its input parameters; if you don't know how many input parameters there will be, use a variadic function. That's the way to encapsulate everything. Alright, that was everything I wanted to say about functions for now. Functions will come back: I think there are one or two assignments where you write your own function, and that will be difficult, because it is something new for a lot of people who haven't programmed before. Variables are very logical: you just define a name and put something in it. If statements are based on a test, whether something is smaller or larger than some number, and that is also very logical to understand. For loops and while loops are already harder, but functions are one of the hardest concepts in programming, and they always have been. So
I want to introduce them as soon as possible, because they will come back during the lectures that we're going to have. Now a little overview of brackets, because we have already seen and used a lot of different ones, and I want to give you one slide where you can look up how each kind is used. Round brackets are used when you call a function, like the combine function c: round bracket open, the things you want to put into the vector, round bracket close. They are also used in control structures, like if: round bracket open, the thing you want to test, round bracket close. Square brackets specify an index into a vector, a matrix or a data frame. Double square brackets specify an index into a list. And then we have curly brackets, which define blocks of code. In a function, for example, they mark where the function begins and where it ends; an if statement likewise has a beginning and an end, enclosing the expressions that have to be executed within the if statement. So curly brackets define which expressions belong to an if statement or inside a function; they are more or less a begin-and-end structure, and when R sees them, everything within the curly brackets is made into one block of code that belongs together. Alright, now a little bit about escaping. When you define a character or string value, you enclose the string in double quotes or single quotes. You can combine strings using the paste function, as we saw, also together with numerical values. You can print things to the screen using the print function, and you can print them anywhere, for example to the screen but also to a file, using the cat function. So cat is useful when you want to write, for example, a text file with some text in it, whereas the print function only prints to the
R console, so its output is not saved into a file on the hard drive. Now, it is easy to forget the closing double quote, and it happens a lot, especially when you are not using a text editor but typing directly into R. When that happens, nothing is produced, because R still thinks you are defining the contents of the string. Often what happens is that you define a variable, start assigning a long string to it, and then forget the closing quote. What happens in R then is that you see plus signs at the start of each line. When I type 5 + 5 I expect to see 10 when I press enter, but it doesn't show 10. If R doesn't do what you want, that is generally because you are still inside a string: you forgot to close it, and there is no output for 5 + 5 because the string from before is still open. When you notice these plus symbols, just press the stop button a couple of times. In the R window, or in RStudio, there is a stop button at the top that you can click a couple of times; it stops execution and you get back to normal input. The greater-than symbol at the start of the line means that R is ready for new input, while the plus symbol means you are still in the middle of something you typed. The same happens if you type an if statement directly into R and haven't closed it yet. A lot of this can be prevented by typing your code in a text editor and then copy-pasting it into R, instead of typing directly in R. But remember: when you type 5 + 5, press enter and get no output from R, it is generally because a string is not closed, or an if statement is still open, or something like that, and you can notice that by the
plus symbols in front; then just press the stop button to get out of the hole you got yourself into. Back to escaping. Say I want to paste the words hello and world together with a space in between. I can use print, and this prints it to the screen. I can also write hello world directly into a file called out.txt using cat. cat has a named parameter, file, which you always have to specify by name, because cat itself is a variadic function: it has a parameter called file plus any number of input values. But what if I want to write the quote character itself into a file, so a file that contains double quotes? Then I need to escape it. If I just wrote cat, a quote, and then file = out, R would still think I was defining a string, because a string starts with a double quote and ends with a double quote. So to print a single double quote to a file on the hard drive, I need to escape it, and for that we have the backslash character. To escape a character, we put a backslash in front of it: to print a quote to a file or to the screen, I use backslash double quote (\") or backslash single quote (\'). We already saw the newline character: to print an enter to the screen or into a file, I use \n. The same holds for tabs. The tab character is a special character, but it is used a lot in tab-separated files, and in R it is written as \t between quotes. Of course, since the backslash is the escape character itself, when I want to print a backslash to a file I have to escape it too, writing \\. And I can also do a backspace, because for a computer a backspace is also a
character, so to do a backspace I use \b. Just to give you a small example of this, let's go to R. I can use print to print something, and it shows the word something. But if I want to print a quote, I write, say, I am printing a quote, and escape the quote with a backslash, like this. Now you can see that when print shows it, it says I am printing a \"quote\": the backslash is still there, you can still see it. But if we use the cat function, what you see is what you get, more or less: here it says I am printing a "quote", and you see that the next input prompt continues directly on the same line. That is because I didn't put a newline there. If I add \n, it now prints a newline after it. And if I want to print a backslash as well, I have to type two backslashes, and then it prints the character we want. The cat function is really versatile: you can use it to print to the screen like I'm doing here, but you can also print directly to a file, or even to a printer or a URL if you wanted to. So cat is a very versatile function for generating output, including special characters like double quotes or the newline, the enter. You can also do a backspace, so in theory we could do backspace, backspace, backspace, backspace, and then... Actually it doesn't backspace properly. Why doesn't it? It should. Yes, that's because of the newline: it doesn't backspace across a newline. But if I say I am printing a, and then backslash b, backslash b, backslash b, it is just like pressing the backspace key on the computer. That's how this works. So there are a couple of special characters
that you just have to learn and remember, and during the exercises we will practice with them so that you get a little bit familiar with printing special characters. When using cat we print verbatim, meaning we need to make sure we add the end-of-line character ourselves; otherwise R continues on the same line, like we saw. We can also use the sep parameter to separate elements: if I say cat hello, world with sep set to a comma, it takes the first parameter and the second parameter and separates them with the comma. You can use a space, or a newline if you want, but you can set any separator; for example, if you are printing tab-separated files, your separator will be \t, because we have to escape the tab. Alright, the last part: some randomness. Getting a random number in R. This is from xkcd. Random numbers are interesting because no one can prove that a single number is random. A random number generator that always returns 4 is, in that sense, a perfectly fine random number generator: the number that comes out is not really random, but officially you cannot prove from the number alone that it isn't. We already saw the runif function. The uniform distribution is a distribution in which every value has an equal chance of being drawn. The runif function in R has three parameters: the first is how many numbers you want, the second is the minimum, min, and the third is the maximum, max. By default min is 0 and max is 1. So runif(1) draws one random number from the uniform distribution between 0 and 1, runif(5) draws five numbers from the uniform distribution, and runif(5, -1, 10) draws five random numbers between -1 and 10. So it's very similar to the seq function, which also has a
from and a to. The Gaussian distribution, also called the normal distribution, is a distribution where values near the mean have a higher chance of being drawn. In biology the Gaussian distribution is observed very often. Think about human stature, how tall people are: on average humans are about 1 meter 75 or so, some people are smaller and some are taller, but most people are close to the average height, and very small or very tall people occur less often. If you measure the length of a mouse, or the size of a plant in meters or centimeters, and you measure a hundred plants or a hundred fish or a hundred mice, then the values you measure often look approximately like a Gaussian distribution. So Gaussian distributions are almost everywhere, especially in biology. If you want to draw a number from the Gaussian distribution, you use the rnorm function. The norm stands for normal distribution; officially it's called Gaussian, but the function in R is called rnorm. So rnorm(1) draws one number from a Gaussian distribution with mean 0 and standard deviation 1, which will almost always lie somewhere between -4 and +4. But I can also set the mean and the standard deviation, and this is really useful when you want to do simulations. For example, say I want to simulate the milk production of a thousand cows. I know that the average milk production of a cow is about 8,000 liters, and the standard deviation in a normal population is around 500 liters. Then I can use rnorm to simulate the milk production of a hundred or a thousand cows very easily. Another distribution that occurs a lot in biology is the Poisson distribution. The Poisson distribution
is a little bit different from the previous two, because it is based on whole numbers: it will never give you back 1.5 or 7.3, it always gives an integer. The example I always use: imagine I'm a biologist interested in bees and flowers. If I look at a flower, there is a probability that there are no bees on it, a probability that there is one bee, a probability that there are two bees, but the probability that there are a hundred bees on a single flower is very, very low. That is what the Poisson distribution models: the number of things, here bees, on a flower. It is very common to find no bees on a flower, very common to find one or two, but at three or four bees it starts dropping off. So the Poisson distribution occurs a lot when you are counting whole items in a certain area or radius. Take the number of people per square kilometer: finding no people, one person, two or three people there is very common, but finding a million people on that square kilometer is relatively rare, although people in an area probably don't really follow a Poisson distribution. If you want to simulate Poisson distributions, you can use the rpois function. Again it takes the number of values you want, plus one additional parameter, lambda, the expected count, which sets both the center and the spread of the distribution. Poisson distributions also occur in biology quite often. If you look at a histogram of each distribution, Poisson distributions look like this, normal distributions look like this, and uniform distributions are just
uniformly distributed: every number has the same chance of being drawn. Of course R has a lot more distributions: the beta distribution, the gamma distribution, the t distribution and so on. They will come up when we talk about things like the t test, where we want to draw a number from the t distribution, or, when we have a t value, look up which p value belongs to it. We will get back to this, but remember these three distributions, because they are very, very common. Since we are doing science, we want repeatable randomness. That sounds a little strange, but if I am doing a simulation study, I might want other people to be able to redo my work and obtain the same results. So especially when I'm working with random numbers, I need repeatable randomness, and in R you get it with the set.seed function. You give set.seed a number, like one or two or three or four, or your own favorite seed number, and from then on the numbers that you draw will be the same every time. What I'm doing here is drawing five numbers from the uniform distribution between 0 and 2 and rounding them, so that there is nothing after the decimal point. When I set my seed to 1 and draw five numbers from the uniform distribution, I get 1 1 1 2 0. If I set my seed to 2 and make the same call, I get different numbers, because we are using a different seed: now I get 0 1 1 0 2. If I then set my seed back to 1 and run the same command as before, drawing five numbers from the uniform distribution and rounding them, you see that I get exactly the same numbers as before. This is for reproducible research: if you're doing
a simulation study, you generally set a seed at the beginning, draw your hypothetical cows from your distribution, and then when someone else reruns your analysis, they get exactly the same answers. This prevents different people getting different numbers because they used different seeds. Normally, for an ordinary analysis, you would not set a seed, but if you are writing a simulation study you need repeatable randomness, because science needs to be repeatable by other people. Alright, the last two slides. I already said this last week, but I'm going to tell you again: programming is like working in a lab, so make sure you work cleanly. Put your code into one directory, your input data into another directory, and your output data into yet another directory. This is so that you don't accidentally overwrite an input file. Some of the input files that I get are valuable: literally tens of thousands to hundreds of thousands of euros were spent on producing that file. If I sequenced a hundred cows, that would cost me a hundred thousand euros, so I don't want to accidentally overwrite the file I got from the sequencing company, because the sequencing company is not going to store it for me. These are massive files, probably 20 to 30 gigabytes for a single sequencing run. So when you are programming, make sure you know what you are doing and don't accidentally overwrite an input file. I put my code in one directory and my input data in another; I use setwd to set the working directory to the input folder, load my input files, and then use setwd again to move to my output folder, so that I don't accidentally overwrite a file and destroy tens of thousands of euros' worth of data. It's like clean code: making code that's understandable is very, very important, and one of the things you can do
to create clean code is to think of speaking names, that is, descriptive names, for variables and functions. If I define a variable, I want a name for it that makes sense. So don't just call your variable x or box, like I did in this presentation, but think of names like my sum, my implementation of the sum function, with a variable called count that contains the count so far, or total that contains the total so far. Think of a name that represents what you are going to put in it. If my research works on fish, fish have a length and a weight, so it's good to have variables called length and weight instead of variables called x and y. You can choose the names of your variables, so choose good names. Since in this lecture I introduced if statements, for loops and functions: use indentation to show blocks. Use spaces for this, not tabs, because tabs look different in every text editor; some editors show a tab as four spaces, some as five, some as eight. And align comments at the end of lines. The code should look like this: I have a function called my sum, and everything within the function gets two spaces in front of the line, so all the expressions inside the function are indented by at least two. Inside my function I do a for loop, and I add another two spaces to everything inside the for loop. That way you can see the structure of the code just by looking at it. And comments at the end of lines should be aligned: if I have a comment at the end of one line and another comment on the next line, use spaces to align them properly. It makes code very readable for other people, because they can see directly: oh, he's beginning a block here, and the block ends here. So reusable code and readable
code is very, very important. Why use spaces and not tabs? You earn more when you use spaces. This comes from Stack Overflow's developer survey research: they asked developers how much money they earn and whether they indent with tabs or spaces, and developers who use spaces reported salaries roughly 10 percent higher than those who use tabs. Part of this has to do with the programming language, but independent of the language, programmers who use spaces to align their code generally get paid more. There is probably some reason behind it, but it's just one of those findings from the Stack Overflow research. Alright, so what is the main reason we want to do all this? We want reproducible results. If I write a script, the script should run from beginning to end without producing any errors. Take, for example, the assignments; let's go back to the first ones we did, the answers to Introduction 1. What is good code? Good code is code where you can press Ctrl+A, Ctrl+C, go to R, in this case clear the R session by removing all the objects in R, and then just paste it in, and it runs from beginning to end without producing any errors. [Question from the chat:] I wonder how programmers who use tabs felt after reading the Stack Overflow study? Well, they probably started using spaces and then demanded a raise. That's what I would do. But no: if you go to Notepad++, you can't really see it here, but if we go to engine.js, I don't know if it's visible, you see these little yellow dots at the start of the lines. These little yellow dots mean there is a space there. You can turn that on at the top under View, then Show Symbol, where there is Show White Space and TAB, and there is also Show All Characters. Then it looks like this, so then you see the line endings as well
at the end. Perhaps I have to zoom in a little so that you can see them; there are little yellow dots here in front. Oh, don't do that. So if you show symbols... I always have the symbols turned on. Let me change something... no, here. Tabs look like little arrows. I can't show you that, because my editor automatically replaces tabs with spaces, but if you see little arrows, those are tabs, and they cause a very messy layout. In some languages the layout actually matters. In Python, for example, having two spaces or four spaces, one tab or two tabs makes a difference. Python doesn't use brackets to mark blocks, it uses indentation, so the indentation of the code determines whether something is inside a function or outside it. Fortunately R doesn't do that, but clean and reusable code is still very, very important. So make sure you can run a script without any errors. When you're done with the assignments, close R, open a fresh R session, select all of your code, copy-paste it in, and it should run from top to bottom, give you all the answers, and not produce any warnings or errors. That includes warnings: fix your warnings. Warnings are a sign that something is wrong, so make sure that no errors or warnings occur. Alright, that was it for today. Nice, we finished almost exactly on time, at 4:50, so that's really good. Are there any questions for today? I know today was a bit of a hard lecture because of the part on functions. Don't worry too much about that; functions will come back in the future and we will go into more detail. I just want to introduce them as quickly as possible, so that you know that you have variables, you have control structures, and these are put into functions to create reusable components that you can use over and over again. Alright, so for the people of
Moodle, I will stop the recording. Although, no, wait: I got an email from one of the students, and I should mention it before we go. It is about the exam and about the assignments. The assignments I am not going to check. We are going to do them during the Zoom meeting, and I'm going to show you my answers at the beginning of each lecture, but you don't have to upload the assignments and I'm not going to give you a grade for doing them. Thank you for following. Thank you, thank you. So I'm not going to check your assignments. I'm not a high school teacher; this is a master's course, and PhD students follow it as well. The assignments are for you. If you want more assignments because you want to practice more, just ask me: I have a whole PDF document with literally 50 to 100 additional assignments. The only way that you're going to learn how to program is by doing assignments and programming things yourself, which is why it's really good to have a data set in your hands. If you're already doing a master's project, or you just started your PhD and you have a little bit of data, then it's really fun to use R to work on your own data.

The second point that was asked about is the exam. Everyone who is part of Humboldt University gets credit points when they pass the exam. The exam will be at the very end of the lecture series, of course. For the exam you have to sign up via the AGNES system: you sign up, you do the exam, and then your grade automatically gets sent to the examination office (Prüfungsbüro). I will just check your exam, put your grade on a list, and send that to the examination office. For people who are not from the BQM module, the module will count towards your elective part (Wahlbereich), the free part of your master's where you can choose modules yourself. For people from external universities it's a little bit different, because
they can sign up via AGNES if they contact our examination office beforehand. You have to contact our examination office, and they will ask you what your name is, which university you are studying at, what your matriculation number is, and these kinds of things, and then they register you for the exam. However, you don't have to do that, because in theory you can just join the exam, and then you get a Leistungsschein (certificate of achievement) from me. With this Leistungsschein you can go to the examination office of your own university, and they will give you the credit points. PhD students following the course do not have to take the exam, because they only need attendance. But I like people to do the exam: PhD students, if you do the exam you get a Leistungsschein; if you don't, you can still get a letter of attendance certifying that you followed the course. So that's up to you.

"But if I close R and I'm asked whether to save the workspace, and I then press no, nothing is saved anywhere, right?" Yes, always answer no. If you want to check, you open up R, type ls() and execute it. So if you go to R (let me clear out R: remove all objects, yes) and you type ls, with the open and close brackets, it should say character(0): nothing is defined in the workspace. So yes, nothing is saved anywhere if you press no, which is how it should be, because you don't want to wake up one day, open R, and have to wait 30 minutes just because it has to reload a 5-gigabyte workspace every time you start R.

"How many ECTS is the course once we pass the exam?" Six, I would say from memory, but you can find that on AGNES. Let me make sure that I'm not telling you something wrong. You get four Semesterwochenstunden, which is six ECTS, I think. Yes, six Leistungspunkte. So it's four SWS, six ECTS. Yeah, all right, thank you.
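As a hedged sketch (not something shown in the lecture), the "clean run" check described above (empty workspace, whole script from top to bottom, no warnings tolerated) can also be done from inside R. In the console, rm(list = ls()) removes the objects in your workspace, and ls() on an empty workspace prints character(0). The helper below, run_clean, and the example script lines are hypothetical. It evaluates the code in a new, empty environment whose parent is parent.env(globalenv()), so attached packages are visible but leftover objects in your own workspace are not, and it sets options(warn = 2), which turns every warning into an error. Restarting R and answering no to saving the workspace, as in the lecture, remains the simplest way to get the same guarantee.

```r
# Hypothetical helper (not from the lecture): run script code "as if" in a
# fresh R session and stop on any error or warning.
run_clean <- function(code_lines) {
  # New, empty environment. Its parent is the first attached package on the
  # search path, so functions like sum() are still found, but objects in
  # your global workspace are not.
  env <- new.env(parent = parent.env(globalenv()))

  # An empty environment lists as character(0), just like ls() in a fresh
  # R session where no workspace was saved.
  stopifnot(identical(ls(env), character(0)))

  # warn = 2 converts every warning into an error; the old setting is
  # restored when the function exits, even if the script fails.
  old_options <- options(warn = 2)
  on.exit(options(old_options), add = TRUE)

  # Parse all lines, then evaluate the expressions one after another in env.
  eval(parse(text = code_lines), envir = env)
  invisible(env)
}

# Example usage with a small hypothetical script:
script <- c(
  "x <- c(2, 4, 6)",
  "total <- sum(x)"
)
result_env <- run_clean(script)
get("total", envir = result_env)  # 12

# A line that only warns now stops the run, because as.integer('abc')
# warns that NAs were introduced by coercion:
# run_clean("y <- as.integer('abc')")
```

Choosing a fresh environment instead of calling rm(list = ls()) keeps your current workspace intact while still catching scripts that silently depend on objects you created by hand earlier.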
Thank you guys for being here and for joining. I have already put the assignments online. I have to do some fiddling with the recording to make sure that I recover the first part that I didn't record, but for the people watching on Moodle, I will stop the recording here, and I will see you guys next