All right, so thank you for being here on time. We lost about 15 minutes today, so we're going to try to squeeze in as much as I can without running over too much. So there's going to be a small agreement between me and you every time we start class. The agreement is the following. I'm here to communicate with you, right? I'm here because I'd like to get a message across. Every time you have no idea what's going on — you think I'm confusing, you cannot understand my English, I have a very strong Italian accent, mamma mia — no, OK. Yeah, I make jokes. So don't take every word I say as truthful: if I'm talking about being aggressive or whatever, I'm joking, most likely. If I'm talking about deep networks and stuff, usually it's reliable. OK, so the point is going to be: every time you have no idea what's going on, just stop me. Call out: "Alf, I have no idea whatsoever what's going on here. Stop, repeat, explain it to me again." I will explain it to you as many times as required. If you do not understand something, it's almost 99.9% my fault, because I haven't taken your background into account, and I'm perhaps skipping some steps because I'm not used to this class, or your background, or whatsoever. So again, I'm here in order to educate, in order to communicate. And I can't do that if you don't help me, right? So if I ask a question and you understand what I'm saying, you should nod your head. Because if you're not reacting to my jokes or questions, then I have some issues understanding whether you are making any sense of what I'm saying. If you get distracted — usually I throw chalk. We don't have chalk here, right? I could try with markers, but maybe not. That's just for undergraduates. You're grown-ups, so you should be able to keep yourselves awake for the next 45 minutes, more or less. Do not turn off, because if you turn off, I can't come over there and shake you, right? Again, you're not undergraduates anymore; you're grown-ups. Some of you are undergraduates, I guess — OK, I'll figure out where you are, so I may come and shake you up. But the rest of you should be able to stay awake for the next 50 minutes. I know it's late at night, so try not to eat before class, so you don't get the after-dinner thingy. So yeah, don't take me too seriously. I'm Italian.

Right. So, inspirational: why are we here? To take an A? Maybe. It's a competitive class, right? At the end, we're going to have a competition. Don't get too scared: we don't only consider how well you do in the final ranking; we also consider how good your work is. Sometimes things don't quite work, but you still apply yourself, and that's actually what matters. Nevertheless, if you actually win the challenge, you may get some gifts and presents and something from us. Yep, that should be it. Also, if you get a very nice grade, you may come and join the lab, perhaps, with Yann and so on, next semester. So, just saying: get a nice grade, so you can join us later as well. OK, all right, kind of joking.

So, neural networks, right? Did we see yesterday what neural networks are? Kind of, no? I think we mentioned a few applications. One of those was classification. So let me give you a small recap of what classification is. Let's say I take a picture of something, right, with a one-megapixel camera. So how many dimensions will my image have?
So if I take an image, I can consider my image, which is going to have RGB planes, and it's going to have 1,000 pixels vertically and 1,000 pixels horizontally. So you're going to have 1 million pixels overall — one megapixel. You have RGB. So how many values will you have? 3 million, right? OK, cool. So you can think about this image — which can be thought of as a stack of three layers — as one point living in this 3-million-dimensional space. And this 3-million-dimensional space is really, really, really large, right? You can move around and nothing happens. So let's say I take a picture of a dog, and I have here a picture of a dog. Now let's say I take a picture of a cat. Where is the picture of the cat going to be? Here, here, or here? OK, hands up: if this is my dog, who thinks the cat is here? OK, maybe you were actually paying attention yesterday. Who thinks the other point is going to be here? OK, and who thinks the other point is going to be here? Fantastic, OK, so you're actually on the right track. So everything is just in the same damn little spot in the space, OK? Everything that actually makes sense is here; everything else is just trash. And so we have to take the space, go here, take this point, and move it here. How do you move points? Have you taken linear algebra? Yes? No?

OK, about linear algebra. As you may know — or if you don't know, now you do — I do have Twitter, and I'm a very annoying person on Twitter, because I talk a lot. So let's see Twitter. OK, open Twitter. All right, so here, I cannot zoom. So twitter.com, right? And then — oh, there's no internet. See? Ha. Let's connect to the Wi-Fi. Yes, yes, I'm connecting. All right, almost there. OK, so you want to do a search, and you type "linear algebra". Can't you see anything here? OK, damn. Let me zoom here. So you go to the top right, where the search is, and you type "linear algebra (from:alfcnz)", OK? This is also how you find out what the questions for the midterm are — just check on Twitter. I usually leak stuff there. All right, so here you have this very nice book, Introduction to Linear Algebra by Gilbert Strang. You may want to look that up. But the other one is this one here: you'd like to check the third tweet, where I show you Essence of Linear Algebra by Grant. Maybe you know this guy, Grant Sanderson. If you don't, well, you should know him now. Just check this link, right? And this link is going to show you how you move stuff around, OK? Anyhow, that was the first link for today.

So, how do you move things around in a space, for those who actually know how linear algebra works? Just talk, don't raise hands. OK, matrix multiplication — what does it do? Rotation — I can't hear. OK, first: matrix multiplication is what kind of operation? A linear transformation, OK? And what are the linear transformations? One, rotation. Two — you sure? Stretching, OK, fantastic; let's say scaling, stretching is number two. Then, third: if the determinant is negative, what do you have? Reflection — that's the third one. Then one more. Is it...? OK, I guess we have three for the moment; there should be one more. Hold on. So you have scaling this way, you have scaling that way, then you have rotation, then you have reflection. When do you have reflection? No, no, no —
Identity does nothing, right? So, OK — rotation means the matrix is orthonormal, right? Cool. Second one, you have scaling. When do you have scaling? OK. Shearing — fantastic, now we're sharing. So we have the four, right? We have rotation, we have zooming this way, you have shearing, and then you have reflection whenever you have a negative determinant. Cool. Still, I want to move things around. Is translation a linear operation? No? Yes? You can answer; you don't have to raise your hand. Adding scalars — is adding scalars a linear operation? Not sure? Think about zero: where is zero mapped to? Only if zero is mapped to zero do you have a linear transformation. Anyhow, you have translation, which is the fifth operation, if you use affine transformations. All those nice things, you're going to find in there. So check those out; they're very useful.

Anyhow, so finally, we figured out that if I have everything up here — cats, hippopotamuses, dogs, and whatever — I want to take them down. I want to translate, right? Yes. Talk to me. No, you don't talk yet. OK. All right, so we want to use a translation, and we translate this stuff to the zero. So you have all the things here. Then how do you separate those things apart? They're all packed together — how can I? We mentioned the four different transformations that a linear transformation does. Scaling, right? How do you scale things? With a diagonal matrix — fantastic, cool. All right, so how do we diagonalize matrices? Ha, that is the second question. You can figure it out if you check Twitter. So we open a new page, we go to Twitter again, we go to the top right — you should take a note about how to do this — and you type "SVD (from:alfcnz)". OK, cool. Now if you go here, it's like, oh, SVD. This one is also from Gilbert, right? So you'd also like to check this one. If you click here, there's more information about the midterm — last year's, oh, OK, you know too much now. Anyhow, you want to check this stuff out.

So we have figured out that by using matrices and scalars, you can move things around, and then you can zoom, right? So again: I have things all collapsed up here, I can't do anything, so I take them down with a translation, right? Yes? No? Do like that with your head. You following there? Yes? No? OK. Is it boring? No, OK. All right, so you take a region, you pull it down with a translation, and then you zoom it, right, with a matrix — perhaps a diagonal one, with some singular values. Cool, cool, cool. And then what? I have this thing zoomed up. So, classification: what is it? How do you perform classification now? You would like to assign labels, but how do you do that? Whatever you'd like to do is going to be basically moving these points into different regions, and then you slice them, OK? And this is going to be part of the next class. Right now I'm just showing you how a neural network does this.
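To make the five operations concrete, here is a minimal PyTorch sketch — the matrices and points are made-up illustrations, not the lecture's actual ones:

```python
import torch

# A few 2D points, one per row (made-up example data).
X = torch.tensor([[1., 0.], [0., 1.], [1., 1.]])

rotation   = torch.tensor([[0., -1.], [1., 0.]])   # 90° rotation: orthonormal matrix
scaling    = torch.tensor([[2., 0.], [0., .5]])    # diagonal: zoom each axis separately
shearing   = torch.tensor([[1., 1.], [0., 1.]])    # shear along x
reflection = torch.tensor([[-1., 0.], [0., 1.]])   # determinant is -1: a flip

b = torch.tensor([3., -1.])  # translation: not linear, since it moves the zero
Y = X @ rotation.T + b       # an affine transformation: linear part plus translation
```

Swap any of the other matrices in for `rotation` to see the other effects. The translation `b` is the one piece a plain matrix multiplication cannot give you, which is exactly the zero-maps-to-zero argument above.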
I hope you're going to enjoy this video — if I find my cursor. OK, so we start from here. Full screen, perhaps. Can you see anything? No? How do you turn off the lights? Can someone figure out how to... oh, maybe here. Yes? No? Good night. No, hold on. All right, good night. So, at least this one; then I'll turn it back on. Here we start with these five branches of a spiral. And you can see that each point is going to be represented as — what? How can you represent each of those points, without the color? OK, where do these points live? On a 2D plane, right? This screen here is a 2D plane. So each of those points is represented as a tuple, giving the x and the y coordinates. Then the color represents a third dimension: basically the class each of those points belongs to. And here we have five different branches of a spiral. So the input to my network is a bunch of points without colors, and then I ask my network to separate the points by color. OK, is it clear what the task is? You cannot raise your hands because I cannot see shit, right? So you have to shout back, because I don't see you. You understand what's going on here right now, right? So we start with this guy here, which is just five branches of a spiral. The network doesn't see the colors right now, and the network will try to separate the colors apart.

So this is what my network does. It takes the space — the space fabric — and it performs a stretching of the space fabric, right? How cool does that sound? And it should go like: oh! Oh, see? OK, we had to do multiple iterations here, I think. This is your first class; it's OK to be shy. All right, so this is my network, which is basically stretching the space fabric in order to get all the points that belong to the same color — to the same class — into the same subspace of this final manifold. Such that once we reach convergence — whenever we reach the end of this animation — all your points belonging to the different spirals are linearly separable. And now you can just use logistic regression, right? Or one-versus-all, or whatever it's called. Yeah, I don't know. In this case, the last matrix is represented by those five arrows there. Those five arrows represent a matrix — with how many rows, and how many columns? So what is the output dimension of this network? What are we trying to infer in this case? The classes. How many classes do we have? Five. So the number of rows of this matrix will be five, right? Because the height of the matrix matches the dimension you're shooting to, and the width of the matrix represents the space you're shooting from. So what's going to be the width of this matrix? Two, right? So this matrix is going to be 5 × 2, because we're shooting towards five dimensions — the five colors over here — and we have two columns, which are basically the two coordinates of the tips of the arrows. Those arrows are centered at zero. (There's a little sketch of this layer just below.) We're going to talk more about that intersection, I guess, in later lessons, if I actually manage to make a video about it. But can you see that kind of cute intersection at the center? Have you ever seen that kind of intersection before? Have you ever taken a bath? I mean, yeah — bubbles, right? Whenever you have multiple bubbles touching together, it looks similar to that. Maybe not, but it looks like that to me.

All right, so that was the first network, OK? And this is how these networks work: they basically take the space fabric and apply some kind of transformation, which is still parametrized by matrices. So I have many, many matrices, and then I have nonlinearities. Why do I need many matrices? And why do I need nonlinearities? Can I turn on the light, or do you still like to watch? OK, yeah — otherwise you'll start sleeping.
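As promised above, here is the shape bookkeeping for that five-arrow layer — a hedged sketch, since the actual network in the video has more layers in front of it:

```python
import torch.nn as nn

# The final layer shoots from the 2D plane to 5 class scores,
# so its weight matrix has 5 rows (the dimension you shoot to)
# and 2 columns (the space you shoot from).
final_layer = nn.Linear(2, 5)
print(final_layer.weight.shape)  # torch.Size([5, 2])
```

Each of the five rows of this weight matrix is one of the arrows drawn in the video, with its two entries being the coordinates of the arrow's tip.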
All right, so here is basically this guy: a network with two matrices. My first matrix maps my input, which lives in — which space? Two dimensions — to an intermediate layer of 100 dimensions. So my network architecture is the following. I have my two neurons here; I map them — one, two — to one, two, three, and so on, up to 100 here. Then I have some nonlinearity, which is going to be just the positive part. And then from this guy here, I do something cute, or funny, or weird — I don't know, it depends. From here I map down to... OK, what should I do next? Let's say this is my network. What's going to be the last layer of this network? Five, right? But how many dimensions does this screen have? So how can I plot there? Don't say PCA. So I add this layer here, such that I can show you on the screen the linear interpolation between this point here — which I call the embedding, in this case; that's just what I call this stuff right now — and my input. All right, that's pretty much it. So this is my neural network. It has one input layer, one hidden layer, one kind of embedding layer — which does nothing — and an output layer. So it has one hidden layer, and overall I count this as a three-layer neural network, because it's one, two — and then this last stuff is linear, right?

So why do we need nonlinearity? Without nonlinearity, it would act like a single-layer neural network — the stacked matrices collapse into one layer. And a single-layer neural network, what can it do? Scaling. Translation. Rotation. Reflection. And, I guess, shearing, right? OK, so guess what the next part of this class is: let's check how a linear network with one layer works, OK? Questions so far? Also there in the back, yeah — that layer is just for displaying stuff on the screen. More questions? Again, don't raise hands; you can just talk to me. Can you hear me over there? Is everything OK? Awesome. All right, thank you for the approval.

So these two coordinates here represent my input points: this is my x coordinate and my y coordinate of the input space. This one is my x coordinate and my y coordinate of the embedding space, which is linearly separable — given that I have obtained 100% accuracy with the training. We haven't talked about training yet, but does the answer I gave you make any sense? OK, yeah. So this is just a way to visualize five dimensions in two dimensions: instead of doing a PCA, I just use another linear layer. Hold on — he hasn't finished. Because I'd like to see what the network output looks like in two dimensions, OK? "But won't this shrink the information that goes through the last layer?" Why is that true, or why is that false? You'll see next time. It doesn't shrink, because, again, those two matrices — you can multiply them together, right? It looks like one single matrix. And actually, this works better. More about this later in the class; actually, Yann is going to be touching on this.

OK, we still have some time. All right, so how did I make these animations? First option: I'm a magician. Why are you laughing? It's actually true, but anyhow. Second: I know how to use matplotlib. Third: I've taken this class three times, and I never passed it. OK — all three are true. Well, I never actually took the class; I mean, I've been on the other side. Yeah, so this is not magic, right? This is just visualization using matplotlib — free, open-source code — and PyTorch, which is the library we're going to be using for writing these models. It's very, very convenient. It was made by a student of mine back, I guess, in 2016. So be that cool. No, I'm just kidding — you can be even more cool.
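Here is the architecture just described, as a minimal PyTorch sketch — the layer sizes (2 → 100 → 2 → 5) come from the lecture, but everything else is illustrative:

```python
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(2, 100),   # input layer: 2D points to 100 hidden neurons
    nn.ReLU(),           # nonlinearity: keep only the positive part
    nn.Linear(100, 2),   # "embedding" layer: down to 2D so it can be plotted
    nn.Linear(2, 5),     # output layer: one score per spiral class
)
```

Note there is no nonlinearity between the last two matrices, which is exactly why they can be multiplied into one single matrix, as said above — the embedding layer "does nothing" beyond making the picture drawable.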
All right, more questions? No? OK, so now we can actually go to the more concrete part, right? That was kind of abstract — I gave you some interesting, delicious appetizer, I guess. Let's get to the main course and see how you can actually get started, OK? So everything you're going to be seeing in my classes is in my GitHub repository, which you are going to be forced to star. No, I'm just kidding. But if you don't star it, I will take notice. All right: github.com/Atcold. "Atcold" stands for something weird. My Italian name is Alfredo; "freddo" can mean cold, and "al" can mean "at". So Atcold is a kind of transliteration of my name. So, Atcold — you go there. Yeah, that's me. How cute. The first repo has 1,300 stars, and I guess there are 200 of you, so it should be 1,500 by tonight. It's the class repository for this class, right? So you should have a Unix — actually, no, PyTorch works on Windows. I don't use Windows, so I have no idea. But if you have a Mac or a Linux machine, everything works just fine. If you have Windows, it should work as well; I haven't tried.

So how do you access this stuff? You go there. Here you're going to have the class website, as soon as we actually make it. We are going to be looking at the repo — sorry, the notebook number two — right now. If we have time left, we can also check a bit of a more basic introduction, the tensor tutorial. How many of you have no idea what NumPy — or num-pie, or whatever you want to call it — is? Hands up. Don't be shy. OK, let me change my question: does everyone here in class know about NumPy, and has used it before? Are you going to shake your hand? Just shake your head, right? So, who has no idea whatsoever what NumPy is and how it works, and has never used Python before? It happens — don't laugh. Because if you don't know about that, there is also the 00 notebook, where you implement things from scratch. But I guess that's going too far back. So notebook number one is perhaps not worth going over; we'll see if we have some time left. It basically gives you an introduction to basic PyTorch routines for creating tensors, which are basically what multidimensional arrays are in NumPy. So it creates tensors, it multiplies tensors, it initializes them, it creates random stuff. It's not that different from NumPy. Again, we can check these out if we have some time afterwards; otherwise, it's going to be left as an assignment to go over this. It should be a very easy, gentle introduction to how to get started with the API. Complaints about this? No? Wow. Either you're shy or you're too nice. You're not laughing. All right, so I'm joking, but you're not getting the jokes. It's fine.
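In case that notebook does get left as an assignment, here is roughly its flavor — a hedged sketch, not the notebook's actual cells:

```python
import torch

t = torch.tensor([[1., 2.], [3., 4.]])  # a 2×2 tensor: PyTorch's multidimensional array
r = torch.rand(2, 2)                    # random values, uniform on [0, 1)
print(t @ r)                            # matrix multiplication
print(t.size(), t.dim())                # shape and number of dimensions
print(t.numpy())                        # round-trips to NumPy, since it's so similar
```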
One more advertisement: Typora. This Typora thing — I love it so much — is a markdown editor where you can write down your notes in LaTeX and markdown. So you can write, for example, soft (arg)max — which is something you're going to be learning about, and which I call it this way. So you can use LaTeX in a markdown file, and it looks very good. For example, here I annotate articles I read. So this is called Typora, T-Y-P-O-R-A. That's just a promotional message; I don't get paid, and it's free, too. All right, so how do we get started?

I open the terminal, zoomed in, such that you can see something. So we can go, for example, inside — oh, this is going to be different. Let's say we go inside that repository. I go somewhere else, which is going to be my notebooks, and then my video lessons. You're going to be basically running conda activate with the course environment, and everything just works — if you've installed it following the instructions in the README. So right now, if you just type ipython, Python 3.7.6 will open. You can do import torch, for example, just to see whether everything works. And then you can type, for example, torch.rand(5), and this creates a tensor with 5 numbers randomly picked from a uniform distribution on 0 to 1. So this is just to check that everything works fine. If it does, then we can actually start to play with the notebooks. I mean, you can try to follow along in class on your computer, but I think it's better for you to actually follow what I do here. I'm a clown; I try to engage with you. If you isolate yourself with the computer, you may sometimes lose some of the nuggets I give you.

Anyhow, it looks like everything works here, so I'm just going to open the Jupyter notebook. And things are black because I like them this way. Yeah, black is good. All right, so we're going to start with this random-projections notebook. Let me go full screen. You should be complaining if you can't see things, right? I can't always tell what you can and cannot see, but if you provide feedback live — like "I can't read anything", or "can you turn off the lights" — that would be useful, OK? So do interrupt me; talk to me. Can you see? Is it OK? Yes? Thank you. All right, let me turn off that one too.

OK, so what I do here: I'm going to just import some utilities I defined for plotting stuff. Then here, I just import torch, and from torch I import this nn library — from which, simply, I'm going to be using the matrix multiplication and the addition of a vector, OK? So: matrix multiplication is the linear operation; plus the bias, the translation, we get the affine transformation — the one we were talking about, with the five different transformations. And then we're going to be importing, I guess, some nonlinear function somewhere. Moreover, I import from matplotlib those things for plotting, and I also import numpy as np. NumPy, num-pie, I don't know — is it NUM-pee, like "number", or num-PIE, like "numeric pi"? People complain about that. I don't know.

Anyhow, I use some default settings, and then — OK, this is the first PyTorch-specific line. This is going to be: device equals torch.device, 'cuda:0' if CUDA is available, otherwise 'cpu'. What is this? If I am running these examples on a machine which has a GPU, PyTorch will automatically run on the GPU memory. So: your machine has a CPU, which performs only sequential operations, one after the other. It might be very, very fast, but it's still sequential. It's like your brain — you can only do one thing at a time. If you try to drive and smoke and take a phone call, yeah, you might not be here tomorrow. So you can only do one thing at a time, and that's like the CPU of a computer. And then you have some local memory, which is the main memory: it's called RAM, random access memory. On the other side, if you would like to speed things up, you can use something that is much slower, which is called a GPU. So why would you use a GPU, which is slower than a CPU, to speed up your computations? Because although each operation is much slower, it can perform many, many, many more computations at the same time. So the CPU runs very quickly from one task to the next, beating your GPU on any single operation — but the GPU does many things at once. The GPU has its own memory, called device memory, which lives on the GPU. I think right now they are planning to give the GPU some access to the main memory of the CPU — I think they're working on that; I don't think it's out yet. Anyhow, if you work on the GPU, you have to create your tensors in the GPU memory. But you don't have to worry about that: as long as you specify that line over there at the beginning of your code, PyTorch will take care of it and put your tensors in the right location for you. If you're going to be using a TPU — a Tensor Processing Unit, from Google — similarly, PyTorch will put your tensors in the TPU memory, which is close to the actual processing unit there, which makes your operations compute faster. So this is a stupid single line, but it's important. Again, you won't be tested on the PyTorch stuff in the midterm — the midterm is going to be mostly about math. But this stuff is going to be essential for actually managing to succeed in the final project. If you're trying to beat others while using CPUs and they're using GPUs, good luck. And yeah, I'm being ironic, but people have tried, and then they complain, and I don't want complaints.
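Collecting that setup — roughly; this is a sketch, and the lecturer's own plotting utilities are omitted:

```python
import torch
import torch.nn as nn
import numpy as np
import matplotlib.pyplot as plt

# Run on the GPU if one is available, otherwise fall back to the CPU.
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

# Sanity check: five values drawn uniformly from [0, 1).
print(torch.rand(5))
```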
All right, so here we create 1,000 points. And actually, I write it like this: 1_000 — one, underscore, zero, zero, zero — which is the same as 1000, but easier to read. So here I have 1,000 points, and my capital X is going to be my design matrix: it's going to have 1,000 rows and two columns, and it's sampled from a random normal distribution. So if you go here and press Shift-Tab, you're going to have this help thing open up, and you can click here to see what this stuff is. So, randn: it returns a tensor filled with random numbers from a normal distribution with mean 0 and variance 1, also called the standard normal distribution. So — how did I open the help? Do you know this stuff? Have you ever used notebooks before? Is this the first time you've seen this help? Some of you may have seen this for the first time.
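That cell, approximately — assuming the device from the setup sketch above:

```python
import torch

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
n_points = 1_000                          # underscores make the literal easier to read
X = torch.randn(n_points, 2).to(device)   # design matrix: 1,000 rows, 2 columns,
                                          # drawn from a standard normal distribution
```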
So what are these capital X going to look like? What is your expectation for this set of points? Small values, near zero — OK. So you're going to have a cloud of points. What is going to be the radius of your cloud? Say again? You mean this aggregation? No. Since I sample from a normal distribution, I'm going to have a bunch of points which are basically somewhat circularly distributed, and I'd like to know the average radius of this blob. One half — who offers more? Zero? The average — so you take the square root of the expectation of the sum of the squares of the coordinates. Someone said one half. Who bids more? Say again — root two, or one? So, the standard deviation per coordinate is going to be one. Are you going to get points outside one standard deviation or not? You know the bell curve, right? Where is the mass within one standard deviation — close to the center, or far away? How much of the bell is within one standard deviation? 68%, roughly — OK. Try to guess; just shoot numbers. It's OK, I don't judge you — yet. No? You're too shy. Where's my student? Answer. OK, fine. Yeah — fine, not fine. OK, there we go.

So what is this stuff? It's cute, I know. Can you see anything? Shall I lower it? OK, I just lowered the light; people watching the recording are going to be complaining. Still, I don't know how to turn off the first light. There you go — dark. So, what are those red and green arrows there? You can guess, right? I don't bite. Axes — fantastic. Which axes? What is the length of those things? Come on. Unit — right. So those are the unit vectors: one is red, the other is green. OK, fantastic. Which one is x, and which one is y? X is the red one, y is the green one — because you have RGB, right? If you've taken some graphics class before this: red is always the x axis, green is always the y axis. Those are the unit vectors. And this cloud of points spans, roughly — what is the average radius here? Can you guess, more or less? Three — fantastic, that was the answer. So you have a cloud of points, roughly three units large, sort of circularly, uniformly distributed. All right, cool. You can try to fit one more arrow here, one more arrow here, one more arrow there — you get three arrows, more or less. I'm a physicist, right? Yeah, later I'll be a mathematician. OK, now I'm a mathematician.

So these are the linear transformations. Here I compute the SVD — the singular value decomposition. Again, there is the video I pointed out before. So I'm going to multiply this cloud of points by a matrix. What does a matrix do to this circular blob? Come on, you know this stuff; you've repeated it three times so far. What do you expect to see? You have a circle here — what do you expect to see afterwards? A potato. Yes, correct. So, first potato — this is a very squished one. So what happened here? What happened in the y direction? All right, so how is this matrix? Is it a cute matrix? A rotation matrix? An annoying matrix? What are the singular values? Look here: one singular value is almost close to zero, right? So this is an almost singular matrix. So this is what's happening: it's going to kill one dimension. This one didn't kill it yet, but basically the values on the diagonal in the singular value decomposition represent the amount of zooming you have in the different directions, and the first and the last matrices represent the rotations that the matrix applies. This other one has a factor of 2 in one direction and 0.6 in the other one, so you get this stuff here.

Oh, something else — hold on. So here you have blue, green, yellow, red, right? There is no order here: we have blue, green, yellow, red, and then here you have blue, red, yellow, green. So what happened here? What is the determinant like? Negative. Why is that? Because you had a reflection. That's correct. All right, keep going. You can see different things here — more potatoes. This one is actually zooming a lot; you have 1.3. This one also zooms a little. This one does nothing — OK, boring, right? Oh — what happened here? How is this matrix? It's almost singular, right? You can see one singular value is going to be 0.03; it's very tiny. These are the unit vectors — they were big arrows before, and now they're tiny, which means it has squashed everything down. The first one is 0.06, the other one is 0.03, so basically you almost killed a dimension here. And so on, OK? Not much else.
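For reference, the per-matrix analysis in those plots boils down to a few lines — a hedged sketch, not the notebook's actual code:

```python
import torch

A = torch.randn(2, 2)            # a random 2×2 matrix
U, S, Vh = torch.linalg.svd(A)   # singular value decomposition: A = U @ diag(S) @ Vh
print(S)                         # singular values: the zoom along each direction
print(torch.det(A) < 0)          # True means the matrix applies a reflection

X = torch.randn(1_000, 2)        # the circular cloud of points
Y = X @ A.T                      # after the matrix: a "potato"
```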
Anyhow, you can do the same things with PyTorch now. So this is going to be the first PyTorch instruction you're going to see. Here I make a model: it's going to be a Sequential, which is basically a container where I can put a few modules, one after the other. And my first module is going to be an nn.Linear. What does nn.Linear mean? It doesn't mean a linear transformation, because these people were programmers, I guess — it's actually an affine transformation. But here it really is a linear transformation, because I said bias=False, OK? So this is actually just a matrix multiplication: a linear transformation, and it maps 0 to 0. Then I do model.to(device): I ship the model to the memory of the GPU. And then I turn off the gradients — you're going to figure out next week what that is. Then my y is going to be the output of the model when I feed it the x; then I generate my figure and plot this stuff. So this is to show you that you can get these transformations — even a singular matrix — with PyTorch too. How cool is this? It's boring, right? I mean, you can see that you can multiply data by matrices using the PyTorch package. So we had to create a model — a Sequential container — we put the linear module inside, we removed the bias, the translation, and you get this stuff. Why did I remove the translation? Because otherwise everything gets shifted and goes outside the screen. So I keep it in the center, OK?

So let's do this one. Right now I'm going to just use the following: a matrix that is — I think — identical... no, it's the identity matrix, in which I scale both entries on the diagonal by the same value. And then I apply a nonlinear function. My nonlinear function is going to be the following: a hyperbolic tangent, which goes to −1, roughly, when you cross −2.5 down here at the bottom, and goes to +1, roughly, whenever you cross +2.5 up here. So this stuff maps the whole real line into the interval from −1 to +1: plus infinity goes to 1, minus infinity goes to −1, and in between you have the roughly linear region, from about −2.5 to +2.5. Why do I mention these numbers? Because sometimes you're going to put your nose inside a model that is not training, and you still want to figure out the orders of magnitude — the numbers associated with the kinks of these nonlinearities. Anyhow, so I just create this one, which is very, very similar to the thing I showed you before. Where is it? Here. So here I have my Sequential, which is this sequence of operations: I use a matrix first — this Linear(2, 2) — and then I apply the tanh. And then I increase the multiplier for that diagonal matrix.
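Those two cells look roughly like this — a sketch assuming the X and device from the earlier snippets, with the way the scaled identity is set being my own illustration (the multiplier 5 is made up):

```python
import torch
import torch.nn as nn

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
X = torch.randn(1_000, 2, device=device)

# A purely linear model: one matrix, no bias, so 0 maps to 0.
model = nn.Sequential(nn.Linear(2, 2, bias=False)).to(device)
with torch.no_grad():                  # no training here, we're only looking
    Y = model(X)                       # every point multiplied by a random matrix

# Scaled identity followed by tanh: boxes the plane into [-1, 1] x [-1, 1].
squash = nn.Sequential(nn.Linear(2, 2, bias=False), nn.Tanh()).to(device)
with torch.no_grad():
    squash[0].weight.copy_(5 * torch.eye(2, device=device))  # made-up multiplier
    Z = squash(X)
```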
So that's what you get. So this is my cloud. What do you expect to happen to my cloud? What is the range of this cloud? We said it's roughly from minus 3 to plus 3, right? Where does the kink happen? Everything that is outside this −2.5 to +2.5 box is going to be mapped to one, and things that are inside stay roughly the same. That's correct. So what do you expect to see after this image over here? You expect your data to get boxed in — but what is going to be the size of this box? Are you sure? The suggestion was from −2.5 to +2.5. No — the box goes from −1 to +1. So: minus one, minus one — the whole box runs from −1 to +1, and the points inside are mapped according to the mapping we mentioned before. Things from −2.5 to +2.5 get linearly squeezed, and things outside ±2.5 get squashed down to, basically, the edge of this box. Are you ready to see how to box a normal distribution? Are you excited? OK, just two people said yes. I mean — are you excited? Come on, give me some satisfaction. OK, you're not playing along with me, I see. I'm trying to be nice to you. Oh, thank you; at least that worked.

So, that was the first one. In this case here, I just used the identity matrix — I didn't zoom anything. And here, what kind of distribution does this look like? Come on. Oh — it's a kind of uniform, between −1 and +1, fantastic, in 2D. What happens now if I start zooming a little bit — if, instead of the identity matrix, I multiply by 2 times the identity, or 3, or 4, or 5? Say again? The points will go further towards the edges — so you're going to be seeing this, and this, and this, and finally this one. Cool, no? So what happened here? We mapped our cloud, which was circular, into this kind of boxy thing. How cool is that? Now those little points are very, very spread apart, so you can actually tell them apart if you are, for example, classifying stuff.

Finally, in the last minutes — then I let you go — we're going to use random matrices, but in this way. So here I have my Sequential; I put in a linear layer, which also has the bias term — so it's an affine transformation. I go from 2 to this hidden-layer dimension, which is going to be 5 — so I go from 2 to 5. I apply the ReLU, which is simply the positive-part operator. And then I apply another affine transformation that goes from 5 to 2, such that I can display the points on the screen. Is it clear what I'm doing? I create a Sequential, which is a sequence of modules. I have three modules. The first module is a matrix of height 5 and width 2: it shoots towards five dimensions from a two-dimensional input space. There is a bias, which is five-dimensional. Then you set to zero everything that is negative. And then you map this five-dimensional space back down to two dimensions, so you have a matrix with two rows and five columns, and then a bias of size 2. Did you follow? If you didn't follow, you can find it in the recording, because I'm going to press play.
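Before the animation, that three-module network as a sketch — the sizes are from the lecture; the rest is illustrative:

```python
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(2, 5),   # affine: 5×2 weight matrix plus a 5-dimensional bias
    nn.ReLU(),         # positive part: everything negative goes to zero
    nn.Linear(5, 2),   # affine: 2×5 weight matrix plus a bias of size 2,
)                      # back down to 2D so the result can be drawn on screen
```

Freshly constructed like this, its weights are random — which is exactly the "untrained network" whose drawings close the lecture.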
Are you ready? Yes? OK, some engagement. OK — it didn't work. Sorry. OK... OK, fantastic — no, it did. So this is the beginning, right? That's the starting point. And then you get — yes, you're allowed to go "oh". It's fine to go "oh", right? This one is very spiky; we're going to figure out in the next class what happened here. This is a very different thing, right? Those earlier ones were smooth transformations — somehow smooth — while this one is very, very angular. And this one — how cute is this, right?

So finally, final remarks, and then I let you go — I know it's dinner time; sorry for keeping you here two minutes late. The final remarks are the following. Then, yeah, check Piazza. So this is basically an untrained neural network: the one I just showed you — the module with the three items: affine transformation, positive part, affine transformation — is your first deep neural network. It's not that deep, but it still makes pretty damn awesome drawings, I think, in my opinion. So with a just-initialized network — an untrained network — we start with this kind of arbitrary transformation. But we have seen, from the first video I showed you tonight, that we are aiming for a transformation which is instrumental to performing a specific task — for example, the classification of those points belonging to the five different spiral branches. So in the next class, we're going to figure out how to take this initial arbitrary transformation and pull it apart and manipulate it, such that we achieve the final goal — which is, for example, making the points linearly separable in the final layer. So that's for the next class. Otherwise, stay warm, and I'll see you next time. Bye-bye.