the recording. So welcome back, also to the people who are watching this on Moodle or on YouTube. I wanted to talk a little bit more about functions. We already discussed functions, though not in much depth, and I told you guys that we would come back to them and look at more of the details. As I told you, variables are boxes. Control statements move the boxes from point A to point B, and then we have functions, and functions in my mind are factories. A factory contains all kinds of control structures, like conveyor belts that bring boxes from A to B. A factory can contain many boxes, so a function can contain a bunch of variables, but the boxes inside the factory, inside the function, are not visible from the outside. This is called the scope of a variable: a variable lives in a function, and when the function ends, the variable ceases to exist. There is also the parent scope: from inside a function you can look at variables that are available outside of the function, which is generally very bad practice. And there is something called escaping your own scope, which means creating a new variable inside a function that continues to exist outside of the function. You can do that too. So I wanted to talk with you guys in a bit more detail about scope. I think I already showed you this example previously. We define a variable called someFunction, to which we assign a function with a single parameter, inParam. Inside the function we take the parameter that was passed in, raise it to the power of two, store the result in a variable called intern, and then return intern to the outside world, right?
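A minimal R sketch of this scope example; the spellings someFunction, inParam and intern are reconstructed from the recording and may differ from the slides:

```r
# intern is created inside the function, so it only exists in the function's scope.
someFunction <- function(inParam) {
  intern <- inParam^2   # local variable
  return(intern)
}

someFunction(5)   # 25

# Outside the function, intern was never defined (in a fresh session):
tryCatch(intern, error = function(e) conditionMessage(e))
# "object 'intern' not found"
```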
So when we call someFunction(5), we get back 25. If we then type intern in R, we get an error saying that the object intern is not found, because intern is not visible from the outside. It only exists while the function is executing. This is the scope of the variable: the scope of intern is only someFunction. We can access variables in our parent scope, but this is considered bad practice. Sometimes we need to, though. There are legitimate reasons to access, from within a function, things that live outside of the function and are not passed in as input parameters. One thing we can do, for example, is read from our parent scope, and there are a few possible reasons to do that. One is that we are just being lazy: we don't want to define another input parameter, so instead we read variables that we expect to be available outside. Of course, being lazy is not an excuse. However, in R we sometimes want to save random access memory. Imagine that I have a variable containing a billion measurements; I don't want to copy those billion measurements into a function. In R, a function argument is copied as soon as the function modifies it, so at that point we duplicate the memory just by calling a function. That is something we want to avoid, especially when variables contain a lot of data, because instead of using two gigabytes of memory, my algorithm suddenly starts using five gigabytes just by copying the data in. We also have plot functions, and plot functions need to read the current graphics settings. They need to know which font to use, what the current point size is, what the current line color is, and so on, since these can be set with the par function, right?
So you can say par(mfrow = c(4, 4)), and this setting is stored outside of any plot call. The plot function then needs to know that we are drawing 16 plots in a single window. Now, here is an example of just being lazy. We use the same function again, but instead of always raising to the power of two, we use a variable called exponent, and exponent is not passed into the function. We are being lazy, because we assume that the person calling our function is smart enough to first define something called exponent. Of course, they don't have to do that, and if they don't define exponent, they get an error saying that the object exponent is not found. This is just bad function design, so don't do that. However, sometimes we do want to do this. Suppose I have a big matrix variable, bigMatrix, which stores, say, a million by a million matrix; that matrix is huge. Then we can write an efficient function, for example a function called efficient that takes rows, a vector of the rows that we want to transform, and applies a certain transformation, for example the square root. It takes bigMatrix, selects the rows we want, assigns the transformed rows back into those rows, and then returns the part of bigMatrix that has been transformed. This is the efficient way of doing it, because we are assigning directly into bigMatrix, which lives outside of the function. The proper but inefficient way of doing it would be to define a function called inefficient, which takes M, the big matrix, as an input parameter. When we call that function, we have more or less the same code, right?
But now we are not assigning to bigMatrix; we are assigning to the input variable M. As soon as we assign the transformed rows of M back into the rows of M, the big matrix is copied. This is very bad, because instead of using five gigs, we suddenly use 10 gigs of RAM, and that is something we want to avoid. So there is a legitimate reason to sometimes read or write variables that are not in your own scope but in the parent scope, the scope outside of the function, and this is something I do use sometimes. We can also do the opposite: we can break out of our own scope and create or update variables that are visible outside of the function. Again, there are possible reasons to do this. One is to save memory: we don't copy, but manipulate a single object. Another is that we want to return multiple objects, because functions in R only allow you to return a single thing. If I'm computing two things, I might be lazy and say: I'll assign a new variable on the outside containing the second thing I calculated, and return the main thing. But this is not a good way; this is being lazy. If you want to return two things from a function, just return a list with those two things. You can also use this for things like setting up graphics settings for upcoming plots. What makes all of this work is the double-headed arrow, <<-, the assignment operator that assigns outside of the function scope. Note that the efficient function above needs it too: in R, a plain single-headed arrow inside a function always assigns to a local variable, so writing bigMatrix[rows, ] <- ... there would silently create a local copy of the matrix instead of updating the one outside. The double-headed arrow first looks for an existing variable with that name in the enclosing scopes and updates that one.
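The lazy exponent function and the two matrix functions can be sketched in R as follows. All names (lazyFunction, bigMatrix, efficient, inefficient) are illustrative reconstructions, a small 5-by-4 matrix stands in for the huge one, and the efficient version writes with <<- because a plain <- inside a function would only create a local copy:

```r
# Lazy design (avoid): exponent is read from the parent scope, not passed in,
# so the function fails unless the caller happened to define it.
lazyFunction <- function(inParam) {
  inParam^exponent
}
exponent <- 3
lazyFunction(2)                 # 8, only because exponent exists

# Small stand-in for a huge matrix; row 1 holds 1, 36, 121, 256.
bigMatrix <- matrix((1:20)^2, nrow = 5)

# Efficient: writes into bigMatrix in the enclosing scope via <<-,
# instead of into a copied argument.
efficient <- function(rows) {
  bigMatrix[rows, ] <<- sqrt(bigMatrix[rows, ])
  bigMatrix[rows, ]
}

# Proper but inefficient: modifying the argument M makes R copy it
# (copy-on-modify), and the caller's bigMatrix stays unchanged.
inefficient <- function(M, rows) {
  M[rows, ] <- sqrt(M[rows, ])
  M[rows, ]
}

efficient(1)                    # 1 6 11 16, and bigMatrix row 1 is changed
inefficient(bigMatrix, 2)       # 2 7 12 17, but bigMatrix row 2 is not
```

The memory saving is the lecture's argument for the efficient version; the price is that efficient silently depends on, and changes, a variable it was never given.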
And if the variable does not exist yet, the double-headed arrow defines a new one in the global environment, outside of the function. Did I have an example? Yeah. This is the one situation where I use the double-headed arrow myself: with an apply function. We also talked about this on Tuesday, when we were working on the assignment question with the apply function. We have a matrix and a count variable, and the only way to update the count variable from inside the function passed to apply is with the double-headed arrow. So here I say: apply to the rows of my matrix a function of x that calculates the mean of x. Now imagine that I could not calculate the mean, for example because there was an NA in the row; then the result is not what I expect. So I put in an extra check: if the result is not valid, I report "error detected at" followed by count. Now, when I run my apply function, I can see which row actually caused the problem. When I don't detect an error, I increase my count variable by one, and I keep looping over every row. When I do detect an error, I can tell the user where it went wrong, that is, on which row the data was different from what I expected. In the end, I just return the result, which is the mean of the row. So this is the only time I use the double-headed arrow. All right, slightly more advanced functions include things like recursion. Recursion is, in essence, a function that calls itself. For recursion to work, you need one or more simple base cases, cases where I know what the answer is going to be, and rules that reduce all other cases towards a base case.
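Returning to the apply counter for a moment, here is a minimal sketch of that pattern. The wrapper rowMeansChecked is a hypothetical name; it keeps count inside the wrapper rather than in the global environment, so <<- updates the wrapper's variable instead of creating a global one:

```r
# Row means that report which row broke the computation (hypothetical helper).
rowMeansChecked <- function(m) {
  count <- 1                          # row counter outside the anonymous function
  apply(m, 1, function(x) {
    result <- mean(x)
    if (is.na(result)) {
      stop("error detected at row ", count)
    }
    count <<- count + 1               # <<- updates count in rowMeansChecked's scope
    result
  })
}

rowMeansChecked(matrix(1:6, nrow = 2))   # 3 4
bad <- rbind(c(1, 2, 3), c(4, NA, 6))
tryCatch(rowMeansChecked(bad), error = function(e) conditionMessage(e))
# "error detected at row 2"
```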
This is the sum example from Computational Fundamentals. I define a function called sumUp, which takes a single number x as input and sums up all the numbers from x down to one. So if I say sumUp(7), it computes seven plus six plus five plus four plus three plus two plus one. I need situations where I know the answer, and there are two base cases here. The first is when x is equal to or smaller than zero, because then I cannot sum up: if I did minus seven plus minus eight plus minus nine and so on, it would go on forever, because there are infinitely many negative numbers. With positive numbers, however, this works. The second base case is x equal to one: if the input is one, then sumUp is also one. I don't have to think about it; sumUp of one is just one plus nothing, so I return one. Then we have the reduction step. For all cases which are neither invalid nor a base case, I reduce: I return x plus sumUp(x - 1). So I'm calling the same function again, and this allows you to write very elegant algorithms. What happens if I call sumUp(7)? sumUp(7) says: I am seven plus sumUp(6). sumUp(6) is six plus sumUp(5), sumUp(5) is five plus sumUp(4), and so on. It keeps making these function calls until it reaches the base case, and then the sum can be computed directly. On every step, the function is called again until we hit the base case. This is called recursion, and it is a different approach compared to iteration, right?
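A sketch of sumUp with the two base cases described above, plus an equivalent loop for comparison; the exact names and the error message are assumptions:

```r
sumUp <- function(x) {
  if (x <= 0) stop("sumUp needs a positive number")   # invalid input
  if (x == 1) return(1)                                 # base case
  x + sumUp(x - 1)                                      # reduction step
}

# Iterative version: needs an explicit result variable to hold the running sum.
sumUpLoop <- function(n) {
  result <- 0
  for (x in n:1) result <- result + x
  result
}

sumUp(7)       # 28
sumUpLoop(7)   # 28
```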
We can compute the same thing by iteration, saying for (x in 7:1), so for seven down to one, result <- result + x. When we iterate, we don't call any functions. But we can do the same with recursion, and recursion is really nice. This example doesn't really show everything that's possible, but there are some really beautiful algorithms that you can write using recursion, which become horrible when you write them with iteration, because you have to deal with all kinds of side effects and bookkeeping. Some notes for when you use recursion. You have to handle weird or unexpected input: sumUp of a negative number cannot be computed, because we always add the number and then the numbers below it until we reach one, and if we start from minus seven we go to minus eight, minus nine, minus 10, and so on forever. So make sure that you don't end up with infinite recursion, an infinite number of recursive calls: every input should reduce to a base case. The thing that decreases or increases every round is called the recursion invariant. In our case, the recursion invariant is x: I call the function with x, and the function calls itself with x minus one, so every time the function is called, x is smaller by one. This guarantees that, for any positive number I input for x, I will always hit the base case x equals one. It doesn't matter whether I start with 1,000 or 10,000: because I decrease x by one every time, I always get closer to the base case, at some point I hit x equals one, and the recursion stops. Besides that, there is also a recursion limit in R.
And you can set the limit yourself. You can say options(expressions = 500000), which allows 500,000 nested evaluations, so sumUp(500000) can work. But we cannot do sumUp(600000), because at some point we hit the recursion limit and R says it detected infinite recursion. (R also has a separate C stack limit, see Cstack_info(), which on some systems may stop a deep recursion earlier.) This is because every time we call the function, a new function call is created, and the internal variables of that call need to be stored somewhere in memory. With the 500k limit, the sumUp function uses around 320 MB of memory, and doubling the depth would roughly double that to about 640 MB. So depending on how much random access memory your computer has, there is a practical limit to how deep your recursion can go. Like I said, the recursion invariant in this case is x. If we have no check for correct input and just return x plus sumUp(x - 1), then for seven this works, but for minus seven it won't. A question from the audience: is the stack made up of the numbers that collect before the base case is hit? Yes, exactly, those are stack variables. They are pushed onto the stack; they are not allocated somewhere else. Because the function calls itself, they are remembered within each of the function calls. So this is just pushing numbers onto the stack: x is pushed onto the stack, then you call sumUp again with x minus one, which means that in the next round x minus one is pushed onto the stack, until you reach x equals one. sumUp(7) will finish, but sumUp(-7) won't finish if you don't include the error case. So always make sure that, no matter what the input is, you always end up at a base case.
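To see why the input check matters, here is a hedged sketch of the unchecked version. The name sumUpUnchecked is hypothetical, and the exact error message for the runaway call depends on whether R's nesting limit (getOption("expressions"), 5000 by default) or its C stack limit is hit first, so the sketch captures the message rather than predicting it:

```r
# sumUp without rejecting x <= 0 (hypothetical name).
sumUpUnchecked <- function(x) {
  if (x == 1) return(1)            # only reachable from positive whole numbers
  x + sumUpUnchecked(x - 1)        # the invariant x shrinks by one per call
}

sumUpUnchecked(7)                  # 28

getOption("expressions")           # current nesting limit
# options(expressions = 500000)    # raise it; 500000 is the largest allowed value

# From -7, x goes to -8, -9, ... and never reaches 1, so R aborts the recursion.
msg <- tryCatch(sumUpUnchecked(-7),
                error = function(e) conditionMessage(e))
msg
```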
We can also redefine our sumUp function so that it supports negative numbers, but then, of course, we have to increase the number every time instead. And if you then call that version with seven, it keeps adding upwards and never reaches its base case, so in the end R says: evaluation nested too deeply: infinite recursion. And that is correct. You get the same error when you hit your 500,000 recursion limit. Another question from the audience: so that's where Stack Overflow got its name from? Yes, this is a stack overflow: you push numbers onto the stack, and at some point your stack is so big that it runs out of the memory reserved for it. That's where the name comes from. All right, there are also more advanced forms, like indirect recursion. Indirect recursion is recursion using two or more functions. Imagine that we have a function called multUp and a function called sumUp. sumUp is similar to what we had before, but now sumUp calls multUp, and multUp calls sumUp. Again, the same rule applies: we need to make sure that our recursion invariant decreases every time. So what happens when we call sumUp(7)? It computes seven plus multUp(6), and multUp(6) is six times sumUp(5), and so on: the one function calls the other. This allows us to write really elegant algorithms. sumUp(7) is 157 and multUp(7) is 497, and you can just work it out on paper. So recursion doesn't have to be direct; it doesn't have to be a function that calls itself. It can also be a function that calls another function, which then calls the original function again. This is called indirect recursion.
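Indirect recursion, sketched under the assumption that both functions use x == 1 as their base case and reject non-positive input. With these definitions sumUp(7) gives the 157 quoted above and multUp(7) gives 497:

```r
# sumUp and multUp call each other; x drops by one on every call.
sumUp <- function(x) {
  if (x <= 0) stop("needs a positive number")
  if (x == 1) return(1)
  x + multUp(x - 1)
}

multUp <- function(x) {
  if (x <= 0) stop("needs a positive number")
  if (x == 1) return(1)
  x * sumUp(x - 1)
}

sumUp(7)    # 157
multUp(7)   # 497
```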
So why do we use recursion? Well, in many programming languages recursion can be very efficient, since we don't have to allocate memory ourselves: we don't have to call malloc or calloc or use some kind of memory management system. Like you said, we push the numbers onto the stack, so we don't spend time allocating a list or extending it. Recursion often gives much cleaner code, because there is less management of state, and in a way, recursion is often how people think about problems. If I have to add up the numbers one to 1,000, I can of course use a for loop together with a variable called result. But when we do it with the sumUp function, we don't have to define a variable called result to hold the intermediate results; the intermediate results are automatically remembered for us on the stack. So there is less management of state: we don't need an additional variable that keeps track of the sum so far, whereas with a for loop we would need a variable called result or something similar. Less state to manage, and often more in line with how people think about problems. For example, you can look up the greatest common divisor algorithm: it is really elegant when you program it using recursion, and noticeably clunkier when you write it using iteration. All right, a little bit more about functions: the dot-dot-dot argument, written ... in R. It allows you to pass arbitrary parameters to a function, and it provides a loose coupling between two functions.
The ... argument is what is called a variadic argument. You use it when you don't know how many parameters the user will give, like the sum function in R, where you can pass a single number, five numbers, or a hundred numbers. Using ... is the standard way to write an R function that accepts a variable number of arguments. But you can also use it to provide a loose coupling between two functions. Imagine a function a with 10 input parameters, and a function b that calls function a and passes these parameters along. If function a adds another parameter, function b does not have to know about it, as long as b uses the ... parameter: b handles the three parameters it cares about, and the other parameters are passed through .... And if the names of a's input parameters change, function b does not need to know about that either. Do I have an example of that? Yes. Imagine that I have a function complexFunction, with a parameter x which is obligatory and parameters a, b, and c which have default values. Now I write a new function called usingDots, a function of x and ..., because x is the only parameter that must be passed to complexFunction. My function is very basic: it sets x to x plus six, which is one of the computations we do, and then it calls complexFunction. In that call we say x = x, because we have to provide x (not providing x is an error), but we don't have to provide a, b, or c, because they have defaults. And because of the ..., we can now call usingDots(x, a = 5), and a = 5 is passed on to complexFunction.
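A sketch of the complexFunction / usingDots pair. The body of complexFunction and its default values are made up for illustration; only the shape (one required x, defaulted a, b and c, and a wrapper that forwards ...) follows the lecture. countArgs is an extra hypothetical helper showing ... as a plain variadic argument:

```r
# Hypothetical stand-in: x is required, a, b and c have defaults.
complexFunction <- function(x, a = 1, b = 2, c = 3) {
  x * a + b + c
}

# usingDots only knows about x; everything else is forwarded through ...
usingDots <- function(x, ...) {
  x <- x + 6
  complexFunction(x = x, ...)
}

usingDots(1)          # (1 + 6) * 1 + 2 + 3 = 12
usingDots(1, a = 5)   # (1 + 6) * 5 + 2 + 3 = 40

# ... as a variadic argument: accepts any number of values.
countArgs <- function(...) length(list(...))
countArgs(1, 2, 3)    # 3
```

If complexFunction later gained a defaulted parameter d, usingDots(1, d = 20) would forward it without any change to usingDots.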
But usingDots doesn't have to know what parameters complexFunction has besides the ones that are required. And if complexFunction adds a new parameter, say d = 15, my function still works, because I can say usingDots(x = 5, d = 20), and d is passed through ... to complexFunction automatically. This only works because a, b, and c have defaults. So usingDots can effectively accept four parameters, x, a, b, and c, where a, b, and c have default values, and its own function signature is much cleaner. Plus, if the author of complexFunction decides to add more parameters with default values, my function still works and I don't have to change anything, because the user can still provide d, e, f, or whatever parameters complexFunction adds. All right, then we have more advanced functions called higher-order functions: functions that take other functions as arguments. We have already seen this with the apply function. A simple example is called doTo. doTo takes x as a parameter and also takes a function, fun; it is a higher-order function because it takes a function as input. What does it do? Nothing more than return that function called on x. It takes the function that the user specifies and applies it to x. So you can, for example, say doTo(numbers, mean), and it will calculate the mean; doTo(numbers, median) will calculate the median, and doTo(numbers, sd) the standard deviation, right?
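A minimal sketch of doTo in R; the example vector numbers is invented for illustration:

```r
# doTo is a higher-order function: its second argument is itself a function.
doTo <- function(x, fun) {
  fun(x)
}

numbers <- c(2, 4, 4, 5, 10)
doTo(numbers, mean)     # 5
doTo(numbers, median)   # 4
doTo(numbers, sd)       # same as sd(numbers)
```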
Of course, in this case you could have used mean directly on the numbers. But sometimes you need or want to write a function that takes another function as input, and that is called a higher-order function. We can also check the type before using it. I always told you that if you look at the class of an object, the class is logical, numeric, character, and so on, but the class of a variable can also be "function". So here we have higherOrder, a function with two input parameters, x and fun. If the class of fun is not equal to "function", we generate an error with stop. Then, for example, if x is smaller than 10, we apply the function to x, store the result in x, and return x. So the class of a variable can be "function" as well, not just logical, numeric, and so on. And we have already seen a lot of higher-order functions, for example apply and lapply, the list apply function. These all take a function as one of their input parameters. All right, that's what I wanted to talk to you guys about. I will upload the assignments right away. In the assignments, you will write a higher-order function similar to the apply function in R, to show you how easy it is to write an apply function using the ... parameter together with higher-order functions. And I know this is a bit of a stretch, right?
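A sketch of the type-checking higherOrder function described above. It uses is.function(fun) instead of comparing class(fun) with "function"; both reject non-functions here, but is.function also copes with objects whose class vector has more than one entry:

```r
higherOrder <- function(x, fun) {
  if (!is.function(fun)) {       # the lecture's check: class(fun) != "function"
    stop("fun must be a function")
  }
  if (x < 10) {
    x <- fun(x)                  # only transform small values
  }
  x
}

class(mean)             # "function"
higherOrder(4, sqrt)    # 2
higherOrder(25, sqrt)   # 25, unchanged because x is not smaller than 10
```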
It's something that you probably won't use directly in the next five to 10 years, but it does come up sometimes. Recursion comes up when there is a very beautiful and elegant recursive solution to your problem, while with for loops or while loops the code would explode and look really ugly because you have to manage so much state. Recursion lets you get rid of that state by having the function call itself, and higher-order functions let you write functions that are general, that let you do more things, and that support uses like optimization. All right, that's it for today. I will stop the recording for the people on Moodle or YouTube.