So let's do some math. Who's ready to do some math? OK, so my name is Zheng Yang. I'm a software engineer at Facebook, Singapore. Today's topic is chaos theory in 20 minutes. Now, math can be scary for a lot of people, but trust me, today's math is very fundamental. You can understand it.

So the first question we ask: why chaos theory? Because it's very important. This kind of behavior can be found in many natural systems, stuff like butterfly effects and three-body problems. I believe you've heard of them. Anyone heard of the butterfly effect? Quite a few people, yeah. For those who don't know about it, it's basically a metaphorical example where a butterfly flapping its wings can cause a tornado somewhere else. So a single small change in the initial conditions of a system can have very different outcomes, drastic changes later on.

Besides that, it's also a very interesting topic in its own right. It's basically investigating systems that are deterministic, but at the same time unpredictable. Well, if you think about it, determinism implies certainty, and certainty implies predictability. So why is there a system that is deterministic but not predictable?

I'll start with some basic concepts, namely functions and iterated functions. And then we'll look at two interesting systems: the logistic map, and Conway's Game of Life.

So, functions. You can view a function as a machine that transforms one number into another. In a mathematical context, people would say it's an expression or an equation that takes in a number and generates another number. For example, if I put two into a function called square, the output will be four. If I put three in, I'll get nine out. Pretty simple. And as developers and engineers, we write functions every day: the input is the parameters, and the output is the return value. Alright.

Iterated functions. So we know about functions. A function is just a mapping between input and output.
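In code, the square example from the talk might look like this (a minimal Python sketch):

```python
def square(x):
    """A function: takes one number in, returns another number out."""
    return x * x

print(square(2))  # 4
print(square(3))  # 9
```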
Iterated functions just repeatedly apply the same function again and again. Take the function x squared with an initial value of two. In the first iteration, we plug in the initial value of two, and we get four out. In the second iteration, we take that four and plug it in, and we get 16, and so on and so forth. So in every iteration, you take the output of the previous iteration as the input, and you get another output, which will in turn be the input of the next iteration. Simple, right? We're just looping over the same thing.

Here's some notation. The four here in the superscript is not exponentiation; it just means the fourth iteration of that function. And similarly, we do this kind of stuff every day: we write loops every day, and we write recursion, hopefully, every day. So it can be done in a computer programming context.

Now we're equipped with functions and iterated functions, two very simple concepts. Let's look at the first of the interesting systems, called the logistic map.

The logistic map is a family of functions taking this form: r times x times (1 minus x). I think it's pretty easy to understand, where r is a particular number. You can plug in any value to replace the r, and you get a bunch of functions. They are called logistic maps.

Let's do some iterations on the logistic map. On the right-hand side, you have a table where, in the left column, we have the iteration numbers. We start with 0.1. In the first example, we take r to be 2.6. You plug 0.1 into the x, and you get 2.6 times 0.1 times (1 minus 0.1), and you get 0.234. And then again, you take 0.234 as input, and you get another number. So the left-hand side is the iteration number.
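That iteration can be reproduced with a short Python sketch — each loop pass feeds the previous output back in as the next input:

```python
def logistic(r, x):
    """One step of the logistic map: r * x * (1 - x)."""
    return r * x * (1 - x)

x = 0.1                      # initial value
for i in range(1, 6):
    x = logistic(2.6, x)     # output becomes the next input
    print(i, round(x, 4))    # first line: 1 0.234
```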
And the right-hand side is the value at that iteration. So we plot this table, with the horizontal axis being the number of iterations and the vertical axis being the value. It's pretty easy to see it's converging to a single value. It settles on a single value, which means it's predictable. If you ask me, OK, what is the value at the 1,000th iteration? Obviously I'll say it's just this value, because it never changes after that. The reason being, if you plug that number into this function, you get the same number out. So it just stays there. OK, not too interesting.

Let's increase r to 3.4. Now you have a slightly more interesting pattern. It doesn't settle on a single value; it's oscillating between two values. If you ask me whether this is predictable, I would say yes, it's predictable, because if you ask me about the 1,000th iteration, I'll say it's one of those two values.

Let's increase further, to 3.5. Now the pattern gets more interesting: it's oscillating among four different values. It settles on an M shape. Like a sine curve, it just repeats itself. Still, I would say, predictable.

OK, can anyone see a pattern here? Now we increase r to 3.8. Can anyone see a pattern here? It turns out there's no pattern. Even if you run this for a larger number of iterations, you'll never discover a pattern. So it appears to be random. But if you think about it carefully, is it really random? No, it's not, right? Because you have the initial value of 0.1; that's not random. And you have the function 3.8 times x times (1 minus x); that's also deterministic. Every step, you can just calculate it. There's no random factor inside. But if you ask me whether it's predictable, do you think it's predictable? [Audience: it's not.] You got that right. It's not predictable, actually. Why?
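Those long-run behaviors are easy to check numerically. A sketch, using the fact that the fixed point of the logistic map is 1 − 1/r (so ≈ 0.6154 for r = 2.6):

```python
def logistic(r, x):
    return r * x * (1 - x)

def run(r, x=0.1, n=1000):
    """Iterate the logistic map n times and return the final value."""
    for _ in range(n):
        x = logistic(r, x)
    return x

print(round(run(2.6), 4))   # settles on a single value, ~0.6154
x = run(3.4)
print(x, logistic(3.4, x))  # keeps alternating between two values
```

For r = 3.4, applying the function twice brings you back to where you started (a period-2 cycle), which is exactly the "oscillating between two values" pattern.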
Let's look at the first reason. This table here, what are we looking at? The table is generated by about 20 iterations, a loop or a for-loop, generating those numbers. But you see, oftentimes the numbers cannot be represented precisely in a computer. We use 32 bits, we use 64 bits, to represent floating point, but oftentimes it's just an approximation. Think about 2 over 3. Precisely, it's an infinite string of 6s. But usually we're just fine rounding it off, with a 7 as the last digit, right? So here, I would say what you see is not the real output of that function; it's an approximation. All these values are approximations. But that should be fine, right? Approximation is usually fine. We can work with approximations; they give us a good way of understanding things. But that's not the case here.

This is the signature behavior of chaotic systems: sensitivity to initial conditions. We take two very close values, like 0.1 and 0.1001. A very tiny difference. And we run the same experiment; we use the same function, by the way, and run many iterations. And what do you observe? Initially, it matches our intuition: we used a very similar number, and the approximation seems to work, so the two curves overlap with each other. But after maybe 15 iterations, they start to slightly disagree. And after the 20th iteration, you see they go completely different ways. Everything stays the same; the only difference is that 0.0001, a tiny difference.

And if you run this experiment with 32-bit floating point, you get 0.579. If you repeat the same experiment with 64-bit, you get 0.74. So this time, that tiny rounding error you thought was fine is actually not fine. It gives you different values. Which one do you think is correct? Neither of them is correct. Well, you may say, let's just compute the precise value.
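Here's a minimal sketch of that sensitivity experiment, with r = 3.8 and the two nearby starting values 0.1 and 0.1001:

```python
def logistic(r, x):
    return r * x * (1 - x)

a, b = 0.1, 0.1001   # differ by only 0.0001
for i in range(1, 31):
    a, b = logistic(3.8, a), logistic(3.8, b)
    print(i, round(a, 4), round(b, 4), round(abs(a - b), 4))
```

The two trajectories track each other closely for the first dozen or so iterations, then the gap blows up and they go completely separate ways.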
It turns out, if you compute it the old-school way, the way we do with pen and paper, multiplying digit by digit, the number of digits actually grows exponentially with the number of iterations. So it's computationally prohibitive to do that. By the time you compute the 1,000th iteration, you need more digits than there are atoms in the universe to represent that number. We know it's between 0 and 1, but we don't know what the value is.

So this is a typical example of a chaotic system. Everything is deterministic: you know the initial condition, you know the rules to derive one step from the next. But I would say it's unpredictable, because we don't know the value. It's just hidden from us. In the long term, it's opaque. We don't know the future.

OK. So if you're curious about the other r values: this diagram has all the r values on the horizontal axis, and the values the pattern settles on, on the vertical axis. In the first example, r was 2.6, and it settles on a single value, so here there's a single curve. For 3.4, remember the oscillation between two values? That's why it forks into two branches. And remember the M shape at 3.5? That's four values. And 3.8, the last example, the chaotic one — you don't see a pattern; it's seemingly randomly jumping in that whole range. This is called the bifurcation diagram.

OK, let's look at another example. Let's play a game. This is called the Game of Life. Conway's Game of Life is played on a 2D grid, where each cell can be either dead or alive. Here, blue means alive, and white means dead. It's a zero-player game. Zero-player means you just set up some initial conditions, you bring some cells alive, and then the game plays itself according to some rules. The rules are like this.
If a living cell, a blue cell, has two or three living neighbors — by the way, the neighbors are the cells surrounding it — it will stay alive. In the first diagram, the cell at the center will stay alive. Otherwise, it will die. The second one will die, because it only has one neighbor. If a dead cell, an empty white cell, has exactly three living neighbors, it will become alive. Otherwise, it will remain dead. In the last example, because it only has one neighbor, it remains dead. Pretty simple, right? That's the whole game.

Initially, it was set up like this, with a square here. And then, according to the rules, it becomes like this. Again, again, again. If you look at this, it's very similar to the previous example, right? What is the function in this case? The function is just the rules of the game. And instead of a single number representing the state, the state is the whole board, the entire 2D grid. You just iterate with the same rules, and play it again and again.

Now something interesting happens. You see it goes back to a previous state. If you take that grid and apply the rules, it goes back to the previous state, and it's stuck in a cycle. Nothing interesting afterwards; it just oscillates between the two states.

Alright. There are some interesting equilibria. Some are called still lifes: they just remain there, staying the same forever. And oscillators, similar to the one we just saw, alternating between two patterns. And spaceships, moving across the board.

Coming back to the chaotic behavior of this system. If I just bring one cell down, if I just kill one cell — so it's similar to the previous setup, I just take one cell away — it actually gives you a very different future.
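As an aside, the rules described above fit in a few lines of Python. This is my own sketch, representing the board as a set of live-cell coordinates:

```python
from itertools import product

def step(live):
    """One generation of Conway's Game of Life.
    `live` is a set of (row, col) coordinates of living cells."""
    # Count the living neighbors of every cell adjacent to a living cell.
    counts = {}
    for r, c in live:
        for dr, dc in product((-1, 0, 1), repeat=2):
            if (dr, dc) != (0, 0):
                counts[(r + dr, c + dc)] = counts.get((r + dr, c + dc), 0) + 1
    # Live cells survive with 2 or 3 neighbors; dead cells are born with exactly 3.
    return {cell for cell, n in counts.items()
            if n == 3 or (n == 2 and cell in live)}

# A "blinker" oscillator: a vertical bar becomes a horizontal bar and back.
blinker = {(-1, 0), (0, 0), (1, 0)}
print(sorted(step(blinker)))            # [(0, -1), (0, 0), (0, 1)]
print(step(step(blinker)) == blinker)   # True
```

The function playing the role of the iterated map here takes a whole board state in and returns the next board state out, exactly as in the logistic map example.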
So just by removing one cell, you get a completely different path, very different from what we saw just now. This is the butterfly effect.

Well, if you think about why we can predict weather: it's because we have sensors around the globe. We know the variation in temperature at each point, we know the variation in atmospheric pressure, we know the heating effect of the sun. And we think, with the laws of physics, we should be able to predict the next day. But the reality is, we can't measure every single point in 3D space. It's not possible. And a single mis-measurement, or a part you didn't measure, could be that one cell, and the weather you predict could be entirely different. That makes long-range weather prediction very difficult. Remember that 0.0001 example: for the first few days it's fine, but if you project much further out, the future looks very different.

The term butterfly effect was coined by a scientist called Edward Lorenz. Lorenz was a pioneer in this field. He once said: chaos is when the present determines the future, but the approximate present does not approximately determine the future. And with that, I end my talk. Thank you.