 Can the guys at the back hear me? OK? OK. So my name is Hybin. And I would like to give you an introduction to functional programming in Python. Can the guys at the back see the screen? Maybe a little bit small if you want. You can just come to the front. So just a quick remark that the techniques that I'll be showing in this talk comes from a different language called F-Sharp. F-Sharp is a functional first programming language developed by Microsoft. So the language design of this F-Sharp comes from a lot of different languages, like Haskell and also from OCaml. It also gains a lot of influence from Python as well. So Python has a lot of good language design that influence F-Sharp. And I want to bring it back from F-Sharp back to Python and show it to you guys. So my name is Hybin. And I'm currently a data engineer. I have a master's degree in mathematics. I used to be a financial engineer and a business analyst somewhere else. So today, I would like to illustrate three concepts in functional programming. And I want to show you the style of writing code in a functional way, in an F-Sharp way. The three concepts are keep, change, and then. Keep, change, and then. If you use a more functional language, you may see jargon like filter, map, and pipe forward. I keep it simple. Keep, change, and then. There's also another keyword called reduce. I'll briefly mention about it later. So let's start off with the keep function. Let's say that you have a list of 1 to 10. You want to keep the even numbers. And you want to remove odd numbers. What will be your final result? Your result will be 2, 4, 6, 8, 10. So far, so good, right? Nothing fancy here. Still easy. If let's say you have 1 to 10, you want to keep the prime numbers. What will be your final result? Your result will be 2, 3, 5, 7. So given a list and a condition or criteria, you can create a shorter list such that you keep those that are true and you remove those that are false. 
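The keep idea described above can be sketched in a few lines of plain Python. This is only an illustration, not the speaker's actual code: the names `keep`, `remove`, and `is_prime` are assumptions chosen to match the vocabulary of the talk, and the implementation is an ordinary list comprehension.

```python
def keep(condition, items):
    """Return a new list holding only the items for which condition(item) is true."""
    return [item for item in items if condition(item)]


def remove(condition, items):
    """The opposite of keep: drop the items for which condition(item) is true."""
    return [item for item in items if not condition(item)]


def is_prime(n):
    """Trial division; good enough for small illustrative inputs."""
    if n < 2:
        return False
    return all(n % d != 0 for d in range(2, int(n ** 0.5) + 1))


numbers = list(range(1, 11))

evens = keep(lambda x: x % 2 == 0, numbers)    # [2, 4, 6, 8, 10]
primes = keep(is_prime, numbers)               # [2, 3, 5, 7]
odds = remove(lambda x: x % 2 == 0, numbers)   # [1, 3, 5, 7, 9]
```

Python's built-in `filter` does the same job as this `keep`, returning a lazy iterator instead of a list.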
Granted, in Python, you can already do this in pandas. You can already do this using list comprehension. So you can already do it in Python. So I just want to highlight to you the existence of this thing. And I'll combine it with the other concepts in order to solve problems later. You can also define the remove function that does the exact opposite of the keep function. So far, so good. Now, the change function. From 1 to 10, you want to change x to x times x, change x to its square. What will be the result? The result will be 1, 4, 9, 16, 25, et cetera, et cetera. Straight forward, right? If let's say you have 1 to 10 and you want to change it to x to 1 divided by x, what would be your result? Change x to 1 divided by x. You'll get 1 divided by 1, 1 divided by 2, 1 divided by 3. You get all these fractions. You can also express it in decimals if you want to. So again, given a list and a formula of how you change each individual element, we are able to create a new list that depends on the original list and the formula that you want to transform each element to. Again, you can already do this in pandas. You can already do this using list comprehension in Python. So again, I'm just highlighting these concepts to you. These concepts should be easy. But I want to highlight the next concept later, but let me do a summary. Why do we want to define these functions, the keep function and the change function? It's because it allows you to avoid using for loop, or at least you can use for loop implicitly in your code. You just tell the computer, what do you want to do with the list? Do you want to change it? Do you want to keep some element or remove some element? Usually when you have a list, you either change individual element or you keep or remove some elements. You are able to work at a much higher level, and you let the computer handle the details at the background. 
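As a rough sketch of the change idea from this passage, assuming a helper called `change` (a name borrowed from the talk's vocabulary, not a standard library function), the two examples could look like this. `Fraction` is used only so that the reciprocals stay exact fractions before being turned into decimals.

```python
from fractions import Fraction


def change(formula, items):
    """Return a new list where every item has been replaced by formula(item)."""
    return [formula(item) for item in items]


numbers = list(range(1, 11))

# Change x to x times x.
squares = change(lambda x: x * x, numbers)

# Change x to 1 divided by x, first as exact fractions, then as decimals.
reciprocals = change(lambda x: Fraction(1, x), numbers)
decimals = change(float, reciprocals)
```

The built-in `map` plays the same role, again returning a lazy iterator rather than a list.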
Just tell the computer what to do, how to do it, but you don't need to implement the for loop yourself. So these are all good. You can already do this with existing Python code. But I want to highlight this one. This is the key concept for today, which is the then. Let me do an explanation. Can you guys see the picture over here? I start off with an element x on the left-hand side, and I want to pass it through three different machines, three different functions, f, g, and h, in order to produce the final result. I have an input x. I want to go through three machines, three functions, to produce the final result. How would you write this in mathematics or in Python? You will write something like this, h(g(f(x))). Is this OK? Yeah. So this is all good for programmers and mathematics students, but it's not so natural to, let's say, a typical English speaker. This notation looks a little bit scary for non-programmers or non-mathematics students. It would be great if we could write it in such a form: you start off with x, first you do f, then you do g, then you do h. This looks a little bit easier, right? It takes up more lines, sure, but this is a much easier formula for, let's say, a non-programmer, a typical English speaker. It's like following, let's say, a cooking recipe: step one, do this, step two, do this, step three, do this. And by hacking around with some of the Python syntax, I'm able to achieve this. It will look something like this. Something like this would be valid Python code. Step one, you do f, step two, you do g, step three, you do h. The actual definition of the then function looks a little bit complicated. You don't need to remember it, but the key concept is this. I can express my calculation step by step, like step one, step two, step three, like this. Is everything okay? Now with these three concepts, keep, change, and then, I'll start attacking some problems from Project Euler.
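The talk does not show how then is defined, so the following is only one possible sketch, assuming the well-known trick of overloading the `|` operator. The class name `Infix` and the functions `f`, `g`, and `h` are made up for illustration; the speaker's own definition may differ.

```python
class Infix:
    """Hypothetical helper: lets a two-argument function be written as  a |op| b."""

    def __init__(self, function):
        self.function = function

    def __ror__(self, left):
        # Handles `left | op`: remember the left operand and wait for the right one.
        return Infix(lambda right: self.function(left, right))

    def __or__(self, right):
        # Handles `(left | op) | right`: the right operand arrives, so compute.
        return self.function(right)


# `value |then| func` simply means func(value).
then = Infix(lambda value, func: func(value))


def f(x):
    return x + 1


def g(x):
    return x * 2


def h(x):
    return x - 3


x = 5

nested = h(g(f(x)))      # inside-out: read from right to left

piped = (x
         |then| f        # step one
         |then| g        # step two
         |then| h)       # step three
```

Both forms compute the same value; the piped form just lists the steps in the order they happen. One caveat of this trick is that it relies on the left-hand value not handling `|` with an arbitrary object itself, which holds for ordinary numbers, strings, lists, and ranges.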
Have you guys heard of this website called Project Euler? Maybe some of you have. Have you guys heard of this website called Hacker Rank, or let's say some of those challenge programming websites, where you can try out some programming challenge problems? So this is Project Euler. It's one website with math and programming challenge problems. And I'll attack problems from this website using the techniques that I mentioned just now. For example, question one from that website. From 1 to 999, find the sum of all numbers that are either multiples of three or multiples of five. If let's say you are faced with this problem, how would you write a code to solve this problem? You can certainly use for loop. You can certainly use list comprehension. If you want to be fancy, you can use pandas. But let me show you my way of writing it. I try to import the techniques from the language F sharp. My code would look something like this. This is my code. I'll leave it from here for a few seconds. The key benefit of this code over here is that you can see a line by line translation with the English language. So on top is the actual Python code. Below is the English translation. I start off with this list. I keep the numbers that I want. And then I sum it up. And then I print out the results. Let me do a step-by-step illustration. This is the first line. I have the range of numbers. And then I keep the numbers that I want, which is divisible by 3 or divisible by 5. And then I sum them up. And then I add one more instruction, which is print out the result to the console, which is the Python print. Notice that I break up this question into a step-by-step instruction, like a cooking recipe. Like step one, do this. Using the result from step one, do the second step. Using the result from the second step, do the third step. Any questions? What's the speed of these methods compared to the Python? OK, I'll explain about this later. There are some pros and cons about functional programming. 
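The question-one walkthrough above can be sketched as follows. The slide itself is not part of this transcript, so this is a reconstruction under assumptions: `then` is built with a hypothetical `Infix` helper that overloads `|`, and `keep` is written here in a curried form (it takes only the condition and returns a function) so that it can sit on the right-hand side of `then`.

```python
class Infix:
    """Hypothetical helper: lets a two-argument function be written as  a |op| b."""

    def __init__(self, function):
        self.function = function

    def __ror__(self, left):
        return Infix(lambda right: self.function(left, right))

    def __or__(self, right):
        return self.function(right)


then = Infix(lambda value, func: func(value))


def keep(condition):
    """Curried keep: returns a function that filters a sequence by condition."""
    return lambda items: [item for item in items if condition(item)]


answer = (
    range(1, 1000)                                        # start with 1 to 999
    |then| keep(lambda x: x % 3 == 0 or x % 5 == 0)       # keep multiples of 3 or 5
    |then| sum                                            # and then sum them up
)

answer |then| print    # and then print the result: 233168
```

Each line of the pipeline corresponds to one English sentence in the description, which is the readability benefit the speaker is pointing at.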
I'll explain with another example later that functional programming, there are some bad parts in the sense that it may not be the most efficient method, if, let's say, compared to, let's say, using index or like going through it, like for i in this range, you do something, something. I'll show with another example later. Sometime it's not as efficient as the best method, but what you get in return is extremely clear clarity. You are able to see the step-by-step process of what's going on. I'll show you with a not so efficient example to see the bad parts about functional programming later or rather like criticism. So for example, question number two, in this list of Fibonacci numbers, in this list of numbers, what is the sum of the even numbers less than 4 million in this list of numbers? You have a list of numbers. What is the sum of the even numbers less than 4 million? My solution, again, you can do it in pandas. You can do it in list comprehension. My solution would look like this. You need to do some construction of the original list first. And then afterwards, you can just do this step-by-step process. I'll do a step-by-step demonstration. You start off with the original list. This is the original list. You keep some of those that you want. Then maybe I'm not satisfied. Maybe I want to keep even more stuff. I want to remove even more elements away and keep those that I want. I keep those that are less than 4 million. I keep those that are divisible by 2. I sum them up, and then I print. Notice that it's step 1, step 2, step 3, step 4. Any questions? Yes, I can actually run that. Let me run that after a few more examples. So for example, question number 4. A palindromic number is a number that is the same if you read it from left to right and right to left. A palindromic number is, you can sort of think of it as a mirror image of itself. You read from left to right, right to left is the same. 
The original question asks, which three-digit number, A and B, which three-digit numbers, A and B, will give you the largest product C, and C is a palindrome itself. A and B do not need to be palindrome, but C needs to be a palindrome. So again, the question asks, what is the two numbers A and B that produce the largest C such that C itself is a palindrome? If let's say you are faced with this problem, how would you solve it? How I would solve it would look something like this. I start off with each pair of numbers, and then I calculate the product for each pair of numbers. And then I keep those that are palindrome. And then I find the maximum. And then I print it out. There might be some slight modification you need to make in order to make this function work for tuples. But roughly speaking, this would be how it would look like. There are some modifications needed to make it work for Python syntax. Is it OK? So for example, question number six, they ask you to calculate this, like 1 plus or this sum squared minus that calculation. So one benefit of this notation is the following. So you have 1 to 101. You have 1 to 101. I add them up. And maybe I'm not done with my calculation. I want to take this sum and want to square this whole expression. In let's say list comprehension or let's say in pandas, once you get away from a list or once you get away from a pandas, you might not be able to use pandas, those apply or filter to continue your calculation. Over here, I have a list of numbers. After I add them up, I immediately take that sum and immediately square it. So my calculation do not have to stop. I just keep on adding one instruction, adding one instruction. And similarly, I can calculate whatever's on the right-hand side using the same notation. And then I take left-hand side minus right-hand side. That's my final result. Let me stop for a moment. And let me show you the actual code, OK? 
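The question-six calculation described above, where the sum is squared immediately instead of stopping once the list has been reduced to a number, can be sketched like this. It assumes the range is the first 100 natural numbers, as in the Project Euler problem, and reuses a hypothetical `Infix`-based `then` and a curried `change`; none of these definitions come from the speaker's slides.

```python
class Infix:
    """Hypothetical helper: lets a two-argument function be written as  a |op| b."""

    def __init__(self, function):
        self.function = function

    def __ror__(self, left):
        return Infix(lambda right: self.function(left, right))

    def __or__(self, right):
        return self.function(right)


then = Infix(lambda value, func: func(value))


def change(formula):
    """Curried change: returns a function that applies formula to every item."""
    return lambda items: [formula(item) for item in items]


numbers = range(1, 101)    # 1 to 100 inclusive

# Left-hand side: add everything up, and then square that single number.
square_of_sum = (numbers
                 |then| sum
                 |then| (lambda total: total ** 2))

# Right-hand side: square each number first, and then add them up.
sum_of_squares = (numbers
                  |then| change(lambda x: x * x)
                  |then| sum)

difference = square_of_sum - sum_of_squares
```

The point of interest is the first pipeline: after `sum` the value is no longer a list, yet one more step can still be appended.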
So apart from some helper functions, notice I tried to solve the first 10 problems from Project Euler, from this website, using only the then notation. You can notice that in my code I use then, then, then. Let me scroll down, question 1, question 2. Question 2, you see then, then, then. Question 3, then, then, then. Question 4, then, then, then. Let me run this code. And it will produce all the results for question 1, question 2, question 3, question 4. Which means that whatever my code is here, it compiles. It actually calculates the right thing. Question 10 is going to be a little bit tougher because this method may not be the most efficient method. But at least it's still doable if I force myself to use the then notation. I do sacrifice performance quite a bit in certain situations. But what I do get back is extreme clarity of what I'm doing with my code. So there are some pros and cons. If you are doing high-performance computing or maybe low-latency stuff, this may not be the right approach. But if you are doing something very low latency, most likely you'll end up in C++ anyway. But if, let's say, you want to do general programming and you want your code to look really nice and really understandable, this could be a good choice. And another benefit is the following. So take a look at this code. The difference between these four statements is that I add an additional line after it. So x1, x2, x3, x4, they differ only by one line. And notice that all of them compile and all of them do print out something. So what this means in the development process is that I don't need to set up a really big for loop, a really big pandas process in order to see the result. I mean, pandas actually does something similar, which is the step-by-step process. So you don't need to set up a really huge for loop. You can just add one more instruction and immediately see the result. It will compile.
It may not be the final answer that you want, but you can see the result first and then continue with one more instruction, add one more instruction. So at any time in the process, you can stop and immediately see what your result is. And if you are not satisfied, add one more instruction. That also helps in the development process. Question number eight, which is the example that I wanted to get to, which is the criticism. So you have a really long string of digits. And you take four digits at a time, four digits at a time, four digits at a time. You calculate the product. The four adjacent digits that produce the largest product are these, 9, 9, 8, 9. So in this long string, the four neighboring digits that produce the largest product are 9, 9, 8, 9. In the original question, they give you a 1,000-digit number. They ask you to calculate which 13 adjacent digits give you the largest product. So you can try to approach this problem using a more traditional for loop or pandas or maybe list comprehension. My solution would look like this. This would be my solution to this problem. Let me do a step-by-step analysis of this problem. I have a long string of numbers and I take out each individual character and I convert it to an integer. This is step one. And step two, I break it up into windows of 13. So over here, I have a list of 13 numbers. I have a list of 13 numbers. I have a list of 13 numbers. So I have a list of lists. I break it up into windows of 13 and for each window, I calculate the product. And after calculating the product, I find the maximum and I print it out. This would be my solution. There is one criticism of this solution, which is the following. At this step, we have 1,000 digits. The next step, I break it up into windows of 13.
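The windowing steps just described can be sketched on a small input. The 1,000-digit number from the original problem is not reproduced here; the ten-digit string below is made up for illustration, and windows of 4 are used instead of 13. The `Infix`-based `then`, the curried `change`, and the `windows` helper are all assumptions, not the speaker's code.

```python
from math import prod


class Infix:
    """Hypothetical helper: lets a two-argument function be written as  a |op| b."""

    def __init__(self, function):
        self.function = function

    def __ror__(self, left):
        return Infix(lambda right: self.function(left, right))

    def __or__(self, right):
        return self.function(right)


then = Infix(lambda value, func: func(value))


def change(formula):
    """Curried change: returns a function that applies formula to every item."""
    return lambda items: [formula(item) for item in items]


def windows(size):
    """Returns a function that builds every run of `size` neighbouring items."""
    return lambda items: [items[i:i + size] for i in range(len(items) - size + 1)]


digits = "1239989456"    # made-up sample string, not the Project Euler input

best = (digits
        |then| change(int)     # each character becomes an integer
        |then| windows(4)      # a list of lists, one per window
        |then| change(prod)    # the product of each window
        |then| max)            # the largest product

# How many numbers does the windowing step materialise?
all_windows = digits |then| change(int) |then| windows(4)
materialised = sum(len(window) for window in all_windows)
```

Here the 10 input digits turn into 7 windows holding 28 numbers in total, which is the same blow-up the speaker goes on to criticise for the 1,000-digit case.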
What this effectively creates is approximately 1,000 such windows. Each window has 13 numbers. So effectively, I take a 1,000-digit problem and I convert it into a 13,000-digit problem. Is it the most efficient? Most likely not. Because I take 1,000 digits and I create approximately 13,000 digits before I do any other computation. If you are short on resources, if you don't have enough resources, then this may not be the right approach. But if you have the resources, this gives you the benefit of extremely clear code. You are able to see each step of the process. What am I trying to produce before I hit the final result? Is this okay? So there are more math examples: find A, B, C that satisfy these conditions, like A squared plus B squared equals C squared. So my code would look like this. For each pair of A, B, I calculate the value of C and then I keep those that satisfy the conditions that I want before I print out the final result. Some modifications are needed in order to make it work for tuples, but approximately it would look like this. So it's all fun and games playing with mathematics examples. Let me try out a non-math example. So take a look at this code. It looks like I tried to squeeze a lot onto this page. This is some sample code, a non-math example. Let me try to go through this code step by step. I start off with a SQL query over here. And I format this YY MMDD over here with today's date. So this is what I have at the bottom. I format the date into that SQL query. And then I run this SQL query against a database to produce some results. Then I get a list of results. Maybe I can do some data cleaning. Maybe I change the country so that it's in upper case instead of lower case. I do some data cleaning. And then I do some data filtering. I want to keep the sales that are in Singapore. I do some data filtering. I keep the stuff that I want. And then maybe I sum up my sales.
Then I get a number. Maybe I'm not satisfied. Maybe I have a number and I want to go one more step. I want to calculate my 20% commission. So with a number, I immediately multiply it by 0.2. So this would be a valid calculation using the then notation. Everything okay? Maybe let's try another non-mathematics example. So over here, in step one, I have a local folder on my computer. I list out the content of that folder. Maybe I do some data cleaning. There are some corrupted files in that folder that I do not want. So I remove them. And for all the remaining good files, I count how many lines there are in those files and I add them up. I sum. This would be valid code using the then notation as well. So I don't have a database. So I'm just creating some dummy data that immediately runs. So just to show you that the first example, the first non-math example, actually works. So it doesn't give me a compilation error. It gives me a result, which means that it works. And also similarly, let's hope it doesn't crash. Yep, it works, yay. So I go through the list of the content of my folder and then I do some counting, do some data filtering. This also works using the then notation. So that's roughly it. Let me do a little bit extra about functional programming. There's also the optional topic of reduce. Take a look at this code over here. You start with zero, for x from one to five, you add to the starting result and you print out your result. From zero, you add one, two, three, four, five and then you print. You get the answer of 15. If you change your starting value, which is, I change it from zero to 1000, I start off with 1000, then I add one, two, three, four, five. My final result will change. That's easy, right? If I change my range of values, instead of one to five, I change it to one to 15, then my final sum will also increase as well.
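The accumulation described here maps onto `functools.reduce` from the standard library, which takes the update formula, the list, and the starting value. The variable names below are made up for illustration, and the last line swaps the formula from addition to multiplication as one more variation.

```python
from functools import reduce
from operator import add, mul

# Start with 0 and add 1, 2, 3, 4, 5.
total = reduce(add, range(1, 6), 0)

# Same list and formula, but a different starting value.
from_thousand = reduce(add, range(1, 6), 1000)

# Same starting value and formula, but a longer list (1 to 15).
longer = reduce(add, range(1, 16), 0)

# Same list, but a different update formula (multiplication, starting from 1).
product = reduce(mul, range(1, 6), 1)

print(total, from_thousand, longer, product)
```

Changing any one of the three ingredients (starting value, list, update formula) changes the accumulated result.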
If I change my formula, instead of adding numbers, I multiply the numbers, I will also get a different result. So with a starting value, a list of elements, and the formula for how you want to update your starting value, based on these three things, you can calculate the final value after you accumulate through your whole list. So this thing we usually call reduce or fold. Reduce, when combined with the keep function, combined with the change function and combined with the then, will allow you to tackle even more problems. And the finale is the infix operator. I want to highlight this very simple math expression, three plus four. What does this mean? Three plus four: I want to focus your attention on the plus symbol. The plus symbol connects the left-hand side and the right-hand side. What the symbol does is take the number on the left and the number on the right and combine them into a final result. If I change it to multiply, three times four is 12. The multiply symbol also takes the left number and the right number and combines them and gives you a final result. The key concept here is that the then notation takes the left expression and the right expression and gives you a final result, which, if you write it on multiple lines, takes the top expression and the bottom expression and gives you a final result. That's the glue that glues everything together. And so just a summary: we have gone through the keep function and the change function. You can already do these in Python using list comprehension or pandas, but what's important here is the then notation that really glues everything together and allows you to write really clean code. It may not be the most efficient, but it is certainly very clean and understandable.
And with that, that's the end of my talk. Yes? Do you think that the way you present it here, Yes? Why do you define this for a need to be taught? This course, the answer beginning before you start to optimize the code, you need to understand how you write your code. So functional thing, do you think it should be a foundation and then optimization first or different ideas? It really depends on the field, I guess. If, let's say, you want to do extreme optimization, let's say high-frequency trading, or something really fast or low latency, this is not the right approach because you'll take on a lot of overhead and also you may not be using the most efficient, optimized method. But if you are working at a much higher level, or if, let's say, you want to write code that your colleague can understand, your coworker can understand, for more practical reasons, this might be a slightly better way, because there are other aspects of functional programming like immutability and also other stuff that are also important to help you write clean and understandable code. If the computer is able to understand what I've written just now, you don't need to go to a much lower level, which is, let's say, a for loop, unless absolutely necessary, because if you are using a for loop, you need to go through the index, and you might potentially get, let's say, a list overflow: let's say a list has 100 elements and you access the 200th element of that list and then you get some error. Using for loops and other things could potentially cause errors, and my style of writing does mitigate some of those risks as well. The example you pointed out about the file directory, say for instance you have some background and you want to remove it. It's illustrated by you just now. Yes. So I quite disagree with this thing. Yes. The point is, it seems to me this is totally case insensitive.
So I don't know what the underlying code is, because literals can be changed to hexadecimal or decimal or whatever that is which represents that, and in such a case, it's actually case insensitive, especially in a Unix system, where you can use a question mark in a file name, which is unseen in Microsoft stuff. Yes. So that eliminates a lot of capabilities and this won't work anyway. For that, because I'm only doing some basic examples, if you move on, there are certainly some higher-level functional programming concepts, especially if you are using Haskell, where they enforce purity a lot. What purity means is that, for example, let's say you want to talk to a web server or you want to access a database or you want to read a file from your computer. All of those processes could potentially fail. You get something from the website, you get something from a database, you read a text file from your computer, all of those could fail. At this current level of functional programming, with only change, keep, and then, it's somewhat limited for attacking those higher-level problems, which is handling potential failures, which some other functional languages like Haskell handle really well. But yeah, I understand your consideration. More questions? This is a bit related. This is a fascinating way of writing functional programming and it looks really clean. I'm curious about error handling in general. You're doing this in Python. What has been your experience? Is it putting a try/except around it, if the except is going to be hideous? Effectively, you can certainly create, let's say, a data type. Let me write it here. Let x equal (123, True). So let's say x is a variable that contains two things, a number and a true/false value. So if, let's say, the second value is true, I can safely act on whatever's on the left-hand side. And if it is false, then I have no guarantee about what's on the left-hand side.
Maybe it's a decimal, maybe it's a string, maybe it's null. So with such a simple data type, which is a tuple, you store whatever information you want on the left and you store the true/false value on the right-hand side, and that will give you an additional guarantee. Not 100% in Python, but the guarantee is much stronger in other functional languages like F-Sharp and Haskell. Okay, sorry, I meant something a little bit different. So if you've got a list of these tuples and you have to first index one of them in a lambda function and it's not a list, that's just going to throw an IndexError or a ValueError. You have to catch that and deal with it, right? But the code looks fantastic, I guess. I would say that, because Python is dynamically typed, there are only limited capabilities in what we can guarantee. If you are using, let's say, F-Sharp or Haskell, they also have static checking, which checks whether something exists or not, and it will help you detect when something does not. Maybe if you want, we can talk about it after the talk. Okay, one more question. Thank you for sharing. I think it's really interesting and I've done some programs by myself. Yes. The code becomes so long. So anyways, my question is, how do you measure the complexity of functional programming code? Measure the complexity of functional programming code. I mean, just to do an illustration, so okay. So let's say an example like this: I start with a list and then I do some keeping or removing. I remove the stuff that I do not want and I keep the stuff that I want. Then I do another step. I keep the stuff that I want. The problem here is that at every single step, I have no guarantee that anything will be removed or anything will be kept. The worst case scenario is that at every single step, I keep everything. I do not remove anything. So I start with a list. I keep some stuff and then I keep some stuff.
There are no guarantees of removing stuff. The worst case scenario is that at each individual step, we are recreating the same list again. So when you write code like this, you have to keep this at the back of your head, I guess. If, let's say, you are doing a calculation on a list of a billion items, then it might not be the best way. But generally speaking, if your list is less than one million or maybe 10 million items, this should be okay. Less than one million, then this method usually is okay. Of course, it depends on your data type, I guess. If, let's say, you have a million items and each individual item in the list is one megabyte or something, then yeah, it could go wrong. Okay, I think those were some pretty tough questions. Thanks for your talk. Really amazing stuff. It looks amazing. We still have an hour or so, and I think there might be some slices of pizza left. You can grab some and watch.