All right, that's a very good question. So we start today with a question from home: in the lecture, Yann said that in a very high-dimensional space there are fewer local minima, or local minima are less likely. Can I elaborate on the intuition behind this? That's where we start today. I will answer this question within today's session, but I need to begin with a detour, because otherwise not everyone will be able to follow, since I haven't yet explained what I showed you last time. Hold on, many questions; let me finish the first one. We finished class last time with an animation that I didn't explain, so there was no requirement for you to understand it, but I asked a question at the end: there were five arrows in a 2D space. Does anyone have a guess about what those five arrows in a two-dimensional space were? You can type in the chat, and I'll see whether there's any clue. Basis vectors of what? Where can you find those five vectors in the neural network I've been using? We'll figure this out in a moment. PCA — okay, you're getting close. I asked the network to reduce the dimensionality using a linear transformation. So what are these linear transformations? Let me share the weights of the output matrix; that is actually the solution. Bravo, that's correct. The answer to my question was: those were the weights of the last matrix, the output layer's weight matrix. But again, you were not supposed to know the answer; I was just teasing you. So let me share my screen — you should be able to see something now. Okay, this is one slide from Yann. I don't like his slides, but we're using this one anyway; it is one of the two most important slides. Yann covers everything horizontally; I pick one thing and go down vertically. All this slide says is that neural nets are a stack of blocks: this first item here, called the linear block, and this other item here, the non-linear block. Here x is, I think, a scalar, so w0 is also a scalar; usually the input is a vector, and therefore w0 is actually a capital W, a matrix. Whenever you multiply a matrix W by x, that's the linear block, and its output s1 goes through this h, which is the non-linear block. He calls them linear and non-linear; since I am weird, I call them differently. So first of all, what is a linear transformation? I don't want the technical definition, I want the intuition. Say we are in a 2D space, so x has two dimensions, and I apply a 2×2 matrix to it. What kind of operation do you perform on that point? A 2D vector can be seen as an arrow pinned at the origin, or you can also consider it a point in this 2D space.
So this can be considered a vector: two components represent this point, or this point, or another one. If I apply a matrix — it depends on the matrix, that's a correct answer. So what are the possible outcomes? Say you try different matrices; what do you expect to happen to these points? Say you have a cloud of points shaped like a house: a square with a triangle on top. If I apply a matrix, you can get a shear for sure, a stretch in different directions. You can get a rotation for sure. A rotation around what — around where, Duc? (I hope I'm pronouncing that correctly.) If this is the origin and this is my square with the triangle on top, where does this stuff rotate? With respect to what? This is the origin, and this is my item in the first quadrant: does it rotate in place, or does the whole thing rotate with respect to the origin? With respect to the origin — if it's here, it goes over there. So we have rotation, we have shear; what else? Say the matrix is diagonal: you get scaling. One more — what if the determinant is negative? Reflection, yes. If the determinant is negative, you get a reflection. And since we actually use affine transformations, what does an affine transformation add on top of a linear one? One more item: shifting, a translation. So these five things — rotation, shear, scaling, reflection, translation — are what I usually think about when I apply a matrix to a point. Why am I talking about points? Because an image, or a piece of audio, or anything, can be considered a very, very long vector with many components, and this vector is just a point in a huge, high-dimensional space. Say I have many images of cats and many images of dogs; I put them all into big vectors, then I plot them. We had images of cats here, cats, cats; then I have this image of a dog. Where should I place it — here, here, or further away? One, two, or three? Just guess, it's okay. All right: unfortunately everything is going to be stuck together, mashed into one specific region. Because what matters, eventually, is that the statistics of this type of data are very similar, and therefore the locations in this huge-dimensional space end up with everything crammed into basically the same region. That's why it was very hard to do, say, classification: everything is crammed into a very small region. So what you usually want to do is zoom into this region — but the origin is over here. How can you zoom in? If you want to zoom into this location, and this is the origin, someone finish the sentence: you have to shift it to the origin, right. So how can I shift it to the origin? What operation do I do — where is this location? I have many points; how can I compute it? I compute the mean. I figure out where the centre of mass of these points is and subtract it, and then I get zero mean. Everything sits at the origin, and now I can zoom.
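That zoom-to-the-origin step — subtract the mean, and, as comes next, divide by the spread — is plain standardization. A minimal sketch, with made-up numbers for the off-centre cloud:

```python
import torch

# A cloud sitting far from the origin, with some unknown spread (numbers are illustrative).
X = torch.randn(1000, 2) * 4.2 + torch.tensor([10.0, -3.0])

X = X - X.mean(dim=0)   # shift the centre of mass to the origin (zero mean)
X = X / X.std(dim=0)    # divide by the per-dimension spread (unit standard deviation)

print(X.mean(dim=0), X.std(dim=0))  # ~(0, 0) and ~(1, 1): now we can zoom
```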
And, at the beginning, to make things easy: we said everything was crammed into a very tiny region. So how does the network know how close things are? Say I'm using different data — it comes from a different camera, a different acquisition system — so the points are still far from the origin, but maybe more or less scattered. You don't know the size of this cloud of points. The network shouldn't have to care about the size of the cloud, so the second step is normalization, which means putting things on a common scale. What kind of scale? Divide by the standard deviation, yes. The standard deviation tells you the spread of the points; you divide by that spread so that things sit within a bubble centred at the origin. Its radius is not exactly one, even though that's the standard deviation — keep that in mind, Robert, please; I'll ask you about it in a minute. So I get this cloud here. Cool. Then all I do is apply a linear transformation, which we said does scaling, rotation, reflection, and shearing, plus translation, since we apply affine transformations. Since those are too many things to say every time — five things — I just pick rotation. Whenever I apply a linear transformation, every time I apply a matrix, from now until the end of class I will talk about rotation, which is mostly what happens in high-dimensional space. Whenever you apply a matrix to a vector, you basically rotate stuff. The other effects are less dominant, especially because you can extract the scaling factor as a scalar value, and if it's negative you can extract the reflection too — you can split the effects of the linear transformation into its components. So what a matrix does, I like to think of as rotating; maybe it doesn't do just that, but that's my view. Okay, so there is this rotation, as I call it, and then there is the other thing: after rotating (and zooming and so on) I put the result inside this non-linear function, which I call squashing. And usually, by the end of my course, people keep repeating this back to me, because I repeat it every time: a neural network does basically two things. The first one is rotating data, and the second one is squashing. Rotating and squashing, rotating and squashing — that's my perspective on how this stuff works. All right, enough about me and my ideas; let me show you these two parts — the linear and the non-linear, the rotation and the squashing — separately, and then we'll put them together and see how they interact. Oh, and someone is vacuuming; that's teaching from home for you. Does squashing mean area goes to zero? No, no, sorry. Squashing, for me, means changing things in a non-linear way: if you have a spread of points, I curve it — I apply a non-linear transformation.
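Staying on the linear side for a moment: those five effects are easy to see in code. A sketch with illustrative matrices of my own choosing (nothing from the notebook):

```python
import math
import torch

X = torch.randn(1000, 2)                         # points as rows, as in the notebook

t = math.pi / 4
R = torch.tensor([[math.cos(t), -math.sin(t)],
                  [math.sin(t),  math.cos(t)]])  # rotation by 45 degrees about the origin
H = torch.tensor([[1.0, 0.5],
                  [0.0, 1.0]])                   # shear: x picks up half of y
S = torch.diag(torch.tensor([2.0, 0.5]))         # scaling: stretch x, squeeze y
F = torch.diag(torch.tensor([-1.0, 1.0]))        # reflection: determinant is negative
b = torch.tensor([3.0, -1.0])                    # translation: the affine extra

Y = X @ (R @ H @ S @ F).T + b   # compose them and add the shift: an affine map
```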
So squashing, in my jargon, means applying a non-linear transformation — some kind of pushing, twisting. Yes, twisting would be the better word, but I've been saying squashing, so squashing it is. Can I elaborate on rotation and squashing a little? If I say rotation, you just have to think affine transformation. If I say squashing, you have to think non-linear function. That's it. Those are the words that come to my mind when I see a matrix or when I see a non-linear function; that's the only point I'm making here. So this slide, which is a bit more formal and says linear blocks and non-linear blocks, in my jargon says rotating and squashing. Okay, so the next thing is showing you the individual parts, and I'm going to do some live coding, trying not to screw up. (Can you hear noise in the background? Okay, maybe Zoom is killing it; I'll pretend I'm not hearing anything.) All right, I am in the terminal. Should you follow along? Not quite. I'd say you just follow what I say and do during these sessions; then you should spend, say, two hours trying the stuff yourself if you're a beginner. If you watch first, then you can think, and then you start trying. If you try it while I'm demonstrating, you're going to lose some of the things I say, so it's better to take notes now and then try on your own, in your own time. People have different speeds — I'm very slow, usually; I'm not a computer scientist, I'm an engineer, so beg my pardon — while one of my students, one of the TAs, is so fast I have no idea how he does it. Okay, I'm on my laptop, my Mac from four or five years ago. I go into some folders — notebooks, video lessons — hopefully the right one. We have to do `conda activate pdl`. PDL stands for PyTorch Deep Learning. If you installed the environment following the instructions on the GitHub page, you should be able to type that command. Then I type `jn`, which is my alias for `jupyter notebook` — I don't want to type the whole thing, so I just type `jn`. Things pop up; there are errors because I've been using other things that break things, but never mind, it works. You can't see much because it's blue on black; I should fix that, but at least we're not projecting on a screen, so you should be able to see. This is actually an edited version of what is on the GitHub page — it has better colours here; I still have to push that update. So what am I showing here? A few things; I'll slow down for the things that actually matter. Here I import torch. Torch is the PyTorch library; it's called torch because it used to be Torch, written in Lua, and now it's in Python, so the project is called PyTorch, but the library is still torch. The dark theme I made myself; you can find it on my GitHub. Then I import nn, the sub-library of torch that lets you use these kinds of parametrized models.
Then I import some things from a library I made, then the plotting stuff, and NumPy; this line sets up inline plotting, and then there is the device. What is this device? If this machine has a GPU — an accelerating unit — then use that; otherwise, just use the CPU. My machine is a Mac from four years ago. Actually, this machine has a GPU, but I didn't build PyTorch from source, so it's not going to be used; let it be. So here I create a cloud of points. In Python, you can put underscores in long numbers to separate the thousands, millions, billions; otherwise you have to count the zeros. Here I'm defining some stuff you don't care about, and then I just plot the thing. Let me zoom in and go full screen so you can see. There we go, now it's better. Okay, so what is this cloud of points? I sample from a Gaussian. First question: what is the radius of this cloud of points? You have partial information, I understand, so don't get annoyed if you can't answer — I'm just teasing you. Three, that's correct. Why is it three? How did you get the number three? Because of the basis vectors, yes, fantastic: you can tell that about three of these basis vectors fit along the radius here. Next, a question for anyone: why is this arrow red and this one green? Am I asking about the colour? Why did I choose these colours — does anyone know what red and green mean? RGB, yes — and X, Y, Z. If you do computer graphics, you will always have X red, Y green, and blue for the Z axis. All right. So you can tell that this cloud has a radius of three. Cool. And a cloud of points sampled from a Gaussian distribution with a radius of around three has a standard deviation of one — that's correct, yes. Usually people don't get this answer; maybe you just checked the code, maybe you knew. Anyway, where did I sample this? With the random-normal call here. So in this capital X I have the whole collection: n points, so a thousand rows, and two columns. I have all the samples as horizontal vectors in this big matrix — two columns, a thousand rows. And here I display all of them. The notebook on GitHub doesn't have all these nice colours yet — I know, there's an issue open on GitHub from two years ago; keep nagging me, please. The colours show you the quadrants: quadrant one is blue, quadrant two is green, quadrant three is yellow, quadrant four is red.
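A guess at what that setup looks like in code — the variable names are mine, and the notebook's actual cells may differ:

```python
import torch

# Use the GPU if one is available and usable, otherwise fall back to the CPU.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

n_points = 1_000   # underscore as a thousands separator, as just mentioned
X = torch.randn(n_points, 2, device=device)  # 1000 rows, 2 columns: one 2-D point per row

print(X.shape)  # torch.Size([1000, 2])
print(X.std())  # ~1.0, which shows up visually as a radius of ~3
```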
One question from the chat: three standard deviations account for about 99% of the data, so wouldn't the remaining 1% land further from the origin? Yes — if you want to be technical, maybe these are not the best claims. In my labs I try to give you the physicist's interpretation, okay? Think of me as a physicist here; I can also be an engineer, but in this class I'm a physicist giving you the intuition. If you see a cloud of points with a radius of three — this is in a 2D space, not in other spaces — then in practical settings you can expect a standard deviation of roughly one. This is just visual inspection of your data, and it's an important skill: get a sense of what you're dealing with by sight. Sight is your primary sense, so always look at your data, always check the plots and curves and values, so that you develop your own feeling for what you're dealing with. Could I explain how I got that the standard deviation was one? Because I plotted it. I created the cloud here — how do I check? I press Shift-Tab, expand with the plus, and the documentation says it samples from a distribution with variance one. So I plotted it with a standard deviation of one and got this plot, whose radius is roughly three. Roughly speaking. Okay, moving forward, otherwise we won't even reach the end. We have a cloud of points with the quadrant colours — blue, green, yellow, red — so you know which quadrant each point belongs to. And then, guess what: we apply a linear transformation, yay. In this case I use a special matrix: I take this generic matrix W and split it into three matrices — a rotation U, a diagonal matrix, and another rotation matrix. So what is this? Have you heard of the SVD? If you haven't, let me move this toolbar away for a second — this is something you want to check out. We go again to Twitter, because that's where I put all the links every year. If you search there for SVD, you'll find two posts from Gilbert Strang — yes, I really like his things. The first one talks about diagonalization of matrices, and the second one talks about the SVD. Check these out; even if you're familiar with them, they are very convenient for refreshing what those things mean. Again, this is not a math class; I'm just pointing out that the math is necessary. Someone says necessary evil; for me it's necessary — I was going to say a necessary pretty thing. Anyway, go over that material to be sure you actually understand what we're talking about.
Anyway, I use these matrices — I'm going to use this central part. I'm just running some things here: I split W, and I use everything, including the singular values S, to show you what's happening below. So this is my original cloud of points — the one with radius three, and blue, green, yellow, red — and I apply this matrix, which has these singular values, and you can see that everything got compressed. Also, we said the order was blue, green, yellow, red, and here you have blue, red, yellow, green. So what happened? This stuff got rotated, it got compressed — not squashed; squashing is the other, non-linear thing — and it also got flipped, can you tell? It got reflected, so the determinant in this case is negative. And you can also tell that the radius was three, and now the radius is two in this direction and about one in the other. How are these numbers connected to the singular values? If you don't know, check the SVD material. More examples. This one didn't move much — blue, green, yellow, red, similar — it just got slanted and compressed, resized along one axis only. How do you do that? First you resize in one direction, then you rotate. And then I can add more things. Why did I apply this transformation, though? To show you what this block here does — and "boring things" is the answer. (I can't go back — oh, where is it? There we go; I clicked in the wrong place.) So all of these are basically converting my circular cloud of points into potatoes, slanted potatoes. So — actually, I don't have the affine part here — these linear transformations are the linear blocks, and the squashing is the non-linear part. Good point; that's the one point of today's class. And here basically nothing happened — almost. What happened here? It's almost the same, maybe a little shorter, but what else happened? W here is almost diagonal, yes, and correct, there was a reflection. I don't see rotation in this case — only reflection, and a little bit of resizing in one direction: it's no longer a circle, it's a potato, a vertical potato. Let's reserve "squashing" for the non-linear one; I'm confusing myself already. Scrolling down, more examples. Oh, what happened here? This one really got pressed down onto one axis. More examples — boring, right? You saw a few; you understand how this works. Cool. Now you can do the same with PyTorch. How do I perform a linear (not affine) transformation in PyTorch? In this case I create a Sequential. A Sequential is a container; it's not necessary, but it's convenient — it just makes your coding experience easier.
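Here is a sketch of that SVD view using torch.linalg.svd — my reconstruction, not the notebook's exact cell:

```python
import torch

W = torch.randn(2, 2)           # a generic 2x2 matrix
U, S, Vh = torch.linalg.svd(W)  # W = U @ diag(S) @ Vh: rotation, scaling, rotation

X = torch.randn(1000, 2)        # the radius-3 cloud, points as rows
Y = X @ W.T                     # apply W; in the notebook Y gets plotted

print(S)             # the singular values: the radii of the resulting "potato"
print(torch.det(W))  # negative determinant => the cloud was also reflected
```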
So in this case, this Sequential has this Linear module inside, a two-by-two: I go inside, press Shift-Tab, and I can see it goes from two dimensions — the dimension of my input data — to two dimensions in the output. So it's a 2×2 matrix. Bias is usually True, but I say no for us. The output is simply X Aᵀ, because, again, the vectors are stored as rows of the big matrix — you remember, the big X was 1000 rows by two columns, so all the points are horizontal. Since we don't increase or decrease the dimensions in this linear transformation, does that mean we can't change dimensions? No: in this case I just plot in 2D because I have a 2D screen. If you have a 5D screen, you're welcome to try plotting in 5D — sarcasm, okay? I'm sarcastic. I'm not changing the size here because I don't know how to plot more than 2D; I could plot a projection, but then I'd start changing things — that's why I used PCA in the other visualization, and that's a lot to unpack there. Does this also mean we're transforming the features? No, right now I'm just rotating stuff, moving things around. I press Shift-Enter and — oh, what happened here? How is this matrix? Someone answer. Hello? Yes: basically singular. Okay, if you're familiar with this, perfect; if you're not, fine as well — just watch the video, play with this notebook, try things out, change values, get your hands dirty. You have to build an intuition, and you can only build intuition by messing around with these things. Ta-da: non-linear transformations — squashing. In this case I use a hyperbolic tangent, and for the linear part I'm just using a diagonal matrix with an s inside. Why is there an s inside my diagonal matrix — why is the value called s? Both entries are s. That's correct: it's a scaling factor. So how else can I write this matrix? sI. And another word with S? Singular values — that was also a good guess. Why is a scalar called a scalar? Do we know this? We have scalars and vectors, and... because it's a multiplier: it changes just the size. Scalars are made for scaling, changing size; if it's negative, it also flips things. So this is s for singular value, or scaling factor. Okay, so here I create some points, and here I show you the hyperbolic tangent. Of course I have to unzoom so you can see. The hyperbolic tangent, if you didn't know, basically has two kinks: one at about minus 2.5, the other at about plus 2.5. Before −2.5 it basically stays at minus one; after +2.5 it stays at plus one. The logistic sigmoid is essentially the same function, but the bottom-left part goes to zero instead of minus one, and since it's squeezed to half the height and stretched to twice the width, its kinks are twice as far out: the first kink happens at about minus five, over here.
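The pure linear block he describes might look like this minimal sketch:

```python
import torch
from torch import nn

model = nn.Sequential(
    nn.Linear(2, 2, bias=False)  # a pure linear map: y = x @ W.T, no shift
)

X = torch.randn(1000, 2)   # the cloud, one point per row
with torch.no_grad():      # no training here, just applying the map
    Y = model(X)

W = model[0].weight                # the randomly initialized 2x2 matrix
print(W, torch.linalg.det(W))      # det near zero => the cloud collapses toward a line
```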
And the other kink happens at plus five. So the sigmoid has its two kinks at minus five and plus five, while the hyperbolic tangent has its two kinks at minus 2.5 and plus 2.5 and goes from minus one to plus one. Why do I point out these numbers? Because whenever you get weird values in your training, you should be able to say: oh, what happened there? I know what happened — things got more negative than −2.5, or larger than +2.5, for this guy. Or, for the logistic sigmoid, larger than five or smaller than minus five. Or, for the ReLU, the rectified linear unit: things are positive. All these numbers are not important per se — you could have any numbers — but the fact that you can pinpoint where a value saturates is very convenient for debugging. All right, back to the demo; let me zoom back. What I'm doing here is applying this 2×2 diagonal matrix to the data, and I change the scaling factor from one to six, so I keep expanding that cloud of points, and then I apply the hyperbolic tangent — not the sigmoid, my bad. Who can summarize in one word, or a few words, what the hyperbolic tangent does? Okay, I talked too much. Squashing, that's correct — fantastic. It squashes the input between minus one and plus one. But you need to say something more; I just said so many words, be a little more specific. The output will be between minus one and plus one, but where does the turning happen? From −2.5 to +2.5. And what is the cloud size? Three. So the kinks at ±2.5 sit a little inside that radius of three: applying the hyperbolic tangent clips just a bit of the cloud. Then what do I do here? Every time you read code, you have to execute the code in your mind first. This is so important. We are not programmers — I mean, some of you are; I'm definitely not — and these are not programs: this is mathematics, done with computers. Which is kind of dangerous. Why is it dangerous? Because you'll have no idea whether things are correct or wrong. Maybe in computer science that's fine, I don't know; I'm an engineer. If I get something off — and this did happen in the past — the building falls down. Oh, I missed a decimal point; I got an extra zero; oh, I didn't notice the matrix was badly conditioned, and when I inverted it I got a wrong result, and the thing just came down. You can't make those kinds of mistakes — every time you invert a matrix, check the conditioning. Engineers can't screw up; you'd be killing people. Okay, so we have to understand the code before running it. We have this scaling factor going from one to six. So I take my cloud, and then what do I do? I scale it: the W is just a scaling factor times the identity, and I multiply the data by it.
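Those saturation numbers are easy to verify; a quick check of my own, not from the notebook:

```python
import torch

x = torch.tensor([-5.0, -2.5, 0.0, 2.5, 5.0])
print(torch.tanh(x))     # ~[-1.00, -0.99, 0.00, 0.99, 1.00]: saturated past +/-2.5
print(torch.sigmoid(x))  # ~[ 0.01,  0.08, 0.50, 0.92, 0.99]: saturated past +/-5
```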
The model is this Sequential, and I send in the data multiplied by this weight matrix, which has the scaling factor times the identity inside. Let me turn off the notifications — of course I don't know how to turn off the notifications — oh, here, Do Not Disturb, there you go. So I'm scaling things from one to six: my cloud of points, which originally had a size of three, gets scaled to six, nine, twelve — I can't count in English. But the hyperbolic tangent is still there, with its kinks at −2.5 and +2.5. So at first you're just clipping the edges: you have the ball, and the function shaves off just a little. Then I expand the cloud to six, and now you're chopping into the meat — before, you were trimming the edges of the potato; now you're cutting basically halfway through, kind of. So points get pushed to the border. Maybe you've already run the notebook. I run this one, and that's what you get. This is our cloud of points — let me unzoom a little so you can see better. There we go. So this was our radius-three thing, and then I applied my hyperbolic tangent. What happened here? You get a square, and the points look almost uniformly distributed. This square has a radius of one, meaning its side is two, and things are all nicely packed inside. That was my squashing. Why is it called squashing? Because I squashed just the edges. What we have actually done here, more or less, is take the circle and — bam, bam, bam, bam — squash the edges in. Not much changed on the inside. Why do I say that? Because the function is basically linear near zero, so nothing happened in the central part; all these things out here got squashed. So this cloud, which was more sparse, more spread out near the rim, got compressed, and you get this almost uniform-looking thing. In the second part, I kept iterating through the range, which zooms everything toward the centre. If I wanted to zoom around some other location, I'd use the bias term. And then everything starts going into the corners — that's also a common thing — and then things spread out more, and we keep going up to six. Well, here it says five, because it's Python and the range stops one short — I can't count. And at this point, points that I told you before were very close together get pulled apart and stretched, so you can actually tell them apart. And this is the non-linear part. So we're almost done, in the sense that we had linear and non-linear, and I showed you each one separately. They are just two operations. What does a neural network do? Two things — tell me; I'm reading the chat. Yes: rotate and squash, thank you. Can I explain the intuition of why −2.5 matters? The values −2.5 and +2.5 don't matter in themselves; the point is that you have to understand what's going on here. Those numbers are important for explaining why this cloud becomes this square; the numbers per se don't matter. (The scaling-and-squashing loop itself is sketched below.)
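A sketch of that experiment — scale the cloud, then squash it (variable names are mine):

```python
import torch
from torch import nn

X = torch.randn(1000, 2)          # the radius ~3 cloud
model = nn.Sequential(nn.Tanh())  # the squashing, applied on its own

for s in range(1, 6):             # the last factor shown is 5: Python's range excludes the end
    with torch.no_grad():
        Y = model(s * X)          # W = s * I, then tanh
    # at s = 1 only the rim is clipped; as s grows, points pile up along the
    # unit square, since tanh saturates past |x| ~ 2.5
    print(s, Y.abs().max().item())
```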
The thing that matters is your critical thinking — to finish my earlier sentence — to debug math. How do you debug a program? You press execute, and if the compiler complains, you go: oh, I'm dumb, I didn't manage to follow the syntax. And that's fine; you can catch those things. In math, you don't know when you got things wrong. You can make mistakes like the one someone told me about yesterday: oh, I forgot to convert something; I forgot to unnormalize things. How do you realize you forgot to normalize? You don't — nothing works. So you have to be a maniac: everything you do has to be checked a hundred times in your brain. And so, again, the numbers I show, the graphs I show, everything I do, I plot. Why do I plot? So I can see things with my eyes and catch these issues. If this square didn't look like a square — if it looked like a triangle, or something else — then something fishy happened. So I always run my code in my brain before pressing Enter, and if my expectation doesn't agree with what the computer shows me, I trust my intuition first, not my computer. If I'm lazy — I'm not saying you are — and just blindly trust my computer, I eventually end up publishing nothing, because computers are stupid machines. The point of using a computer is to carry out a lot of operations systematically, but you are the one in charge, the one who makes sure things are actually doing what they should be doing. So yes, answering Robert: you run the program in your mind first, then you execute, then you compare the mismatch between the expectation and the actual result. That is how you debug math — I don't know any other way of debugging math on a computer than plotting everything. Okay, so we saw rotation, we saw squashing. How do rotation plus squashing work together? Actually, boring. Let's do something fancier: rotation, squashing, rotation. This is a deep neural network: you have two linear layers. The first one goes from two to hidden, where hidden is five neurons. The neurons are basically the sizes of these vectors — the intermediate representations. You have the input neurons, the hidden neurons (because they're in the middle of the network), and the output neurons. So for me there are three layers. People will call this a whatever-layer neural network; for me — again, my advice — this is not a two-layer neural net just because there are two weight matrices. There are three layers: there is the input layer, which is my x; there is the output of this non-linear transformation, which is the hidden layer; and then there is the output of the last module, which is the output layer. So this is a three-layer neural net for me. Rotation, squashing, rotation. I execute this guy, and here is my cloud of points — let me remove the arrows because they're annoying — and then, ha-ha, okay. How do I see you? You have reactions on Zoom, right?
You can press — oh! I usually force students to say "oh" in class. Exactly, there you go. Just three people? Play along with me, please. Okay, thank you. All right — oh, what happened here? This is weird. What's happening here? We come back to this in a second. Oh, you're also typing in the chat; that's so sweet of you. Okay, what's going on here? Oh my God, we haven't seen this stuff yet. This is so cool, I think. What am I using? Something I didn't tell you yet. What is this function, do you know? In deep learning it's called the rectified linear unit, ReLU. In mathematics it's called the positive part. So when I write it in math — and my students have to do this too, otherwise I'm going to bite them — it gets written as the positive part, $(x)^+$, in LaTeX. No, don't write max — "Max" sounds like the name of a person, Daniel. Don't write max(0, x); that's computer-scientist notation. It's a function, and it has a name: positive part. Yes, but max is how you implement it in NumPy, fine. Okay, so what happens here? Question for you: what happens if I apply the ReLU to this cloud of points? Yes — it simply kills everything that happens in quadrants two, three, and four. All the points of quadrant four go where? Up to the x-axis, correct. All the points of quadrant three? To the origin. And what happens to quadrant number two? The y-axis, correct. So, since I'm here and I still have a few minutes, let me do a shout-out to a student of mine, a high schooler. If you search here — where is he? — there you go. Check out this video from Vivek on his YouTube channel. He's been playing along with me, and he shows you exactly what I'm showing you right now, with the ReLU squashing and so on. Shout-out. He's like 15 or 16, half my age — and now you know how old I am, maybe; maybe I cheated. Okay, so what happened here? You see five little lines. And there is some rotation — where does the rotation come from? Which one? There are two rotations and one squashing; you can't just tell me "rotation". The second one, yes, that's correct: the second rotation put things here. Where's the origin? I can't see — the origin is where I put my mouse. And what happened — are there biases? Yes, I forgot: in this case I didn't turn off the bias, so it's not just a rotation; it's actually a full affine transformation. So I have these two affine transformations, and then the non-linearity in between. So the first one did something; the non-linearity got you — oh, five lines. How do you know it's five? Because you basically see the five axes of the hidden space in this case. And then the last transformation put things here. But why is this one smooth? Here you see only one axis — this one. Why do I see only one axis here? You still went through five dimensions, right?
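Putting the two pieces together, the rotation–squashing–rotation model just dissected might be sketched like this (H = 5 hidden neurons; everything randomly initialized, nothing trained):

```python
import torch
from torch import nn

H = 5  # hidden size: the cloud takes a trip through a 5-D space

model = nn.Sequential(
    nn.Linear(2, H),   # affine "rotation" into 5-D (bias on by default)
    nn.ReLU(),         # squashing: the positive part, applied per coordinate
    nn.Linear(H, 2),   # affine map back down to 2-D so we can plot the result
)

X = torch.randn(1000, 2)
with torch.no_grad():  # randomly initialized, untrained: an arbitrary transformation
    Y = model(X)
print(Y.shape)         # torch.Size([1000, 2]); plotted, you'd see the five folded lines
```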
And it also got flipped: I see red points over blue points. Aha — check the video; Vivek shows you in his video what's happening here. What happened is that the network folded the space: you can see red on top of blue. We are running out of time; let me finish the notebook — I did cover almost everything I wanted to cover. Here I have a very deep neural network with one, two, three, four, five affine transformations. And let me show you one more thing first: here we can turn on the sigmoid instead. What's going to be the difference between the ReLU and the sigmoid? The shapes look much prettier, I think. I don't know, they're boring right now; let me rerun. Usually they're prettier. Someone tell me later what happened here. Okay, these are pretty annoying; let's go with five layers — sorry, one, two, three, four, five transformations, still five hidden values — and here you can see much more interesting shapes. Oh, you're not saying "oh" anymore. Okay, what happened here? This is important — this is one important result. You have to tell me what happened here; keep it in mind. This is your second question for next time, and you actually need to get it right. It's different from this one, and it's different from this stuff: although they look the same, they are different. This one is also somehow different from this one over here. Very interesting things. And the last one — then we're done, because I'm taking too much of your time — I use the hyperbolic tangent. Oh, it's a parabola; interesting. So, summary of today's lesson: I talked about one thing, and this was it. Neural networks are made of rotations and squashings. Rotations just rotate: if you rotate a potato, you still have a potato — maybe a longer potato, a more compressed potato, a reflected potato; the circle looks like a potato anyway. It can collapse down to a line if the matrix is singular, and if the matrix is zero you get a point, which doesn't happen often. On the other side, I showed you the squashing part, which makes things not behave the same way in every direction, and together they give these very interesting transformations. Last lesson I showed you that video — no, I'm not going to show it again; you already saw it, and it's on YouTube as well. That video showed a transformation that was no longer arbitrary. The ones I showed you right now are arbitrary transformations: no one is training anything here; these are just random, randomly initialized networks. If you want to know the initialization, check out the website. I just apply a network to a cloud of points in two dimensions; the network performs some sort of bending of the space, taking these points and moving them to different locations. So you have a way of parameterizing transformations of the 2D space, in this case by going through a higher-dimensional space. I didn't answer the question we started with.
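And a sketch of the deeper variant — five affine maps with a swappable squashing — which is my own parameterization of what was shown:

```python
import torch
from torch import nn

def deep_random_net(n_layers=5, hidden=5, squash=nn.Tanh):
    """Stack affine layers with a non-linearity in between: 2 -> 5 -> ... -> 5 -> 2."""
    sizes = [2] + [hidden] * (n_layers - 1) + [2]
    layers = []
    for i in range(n_layers):
        layers.append(nn.Linear(sizes[i], sizes[i + 1]))
        if i < n_layers - 1:        # no squashing after the last affine map
            layers.append(squash())
    return nn.Sequential(*layers)

X = torch.randn(1000, 2)
for squash in (nn.ReLU, nn.Sigmoid, nn.Tanh):  # the three variants shown in class
    with torch.no_grad():
        Y = deep_random_net(squash=squash)(X)  # untrained: an arbitrary bending of 2-D space
    print(squash.__name__, Y.shape)
```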
I'll answer that question next lesson: why are we going into a high-dimensional space? I have a video showing that too, so that's for next time — don't let me forget. So we saw that a network performs an arbitrary transformation of the points in a 2D cloud. Next time we'll learn — well, you already saw it in the slides, but okay — how we enforce a specific transformation to reach a specific objective, say classifying points, classifying images, or doing regression, anything you want. So how do we enforce a specific transformation? You already kind of got it: you define a loss, which tells you how well you're doing on the specific task — namely the cross-entropy loss, which is basically an approximation of the accuracy objective — and then you try to optimize that value: you try to minimize the number of misclassified items in order to increase the accuracy. And that process, which is called gradient descent, minimizes that error by using the gradient — basically the partial derivatives of the loss with respect to all the parameters — and you change the parameters so that the loss goes down. Doing that, after training, you end up with a set of parameters that performs the transformation you want, the one that is instrumental to the task you were seeking. So today we covered rotations and squashings, and we found that the diagrams are very pretty. Go over the notebook, try things, play, watch the videos on the SVD and the diagonalization of matrices, and check out the video on YouTube. Next time we'll see how to come up with parameters that actually do something. Today was more exploratory, giving you these mathematical-debugging intuitions; next time we'll get the network to obey our will. All right, peace — that's it, finished. Yes, this notebook, without the colours, is on GitHub. I have to push the commit with the colours — it's on my computer; I just have to commit and push. If you have any questions, ask on CampusWire: me, Yann, Vlad, or Jachen, we are here for you. If you want to complain, complain — I like complaints; I try to improve every time. We are here for you. Have a nice day, stay warm. Bye.