Let's start today's lesson; I'm going to be the one teaching today. You might be wondering "why Alfredo?" — you're going to figure out very soon why Alfredo. So today's summary is going to be the following — this is the table of contents. We start with a little bit of review of matrix multiplication, just my interpretation, my point of view. Then I'm going to be telling you about input data, which in this case means input signals. We're going to extend matrix multiplication to signals, and we're going to end up with convolutions, which exploit some properties of natural signals. Finally, we're going to see an interactive Python session with PyTorch. So let's get started.

So we have here linear algebra. Wake up! All right: whenever we have a neural net, we can usually write our hidden layer — and here I'm going to use a bar underneath a letter to denote a vector — as a nonlinear function $f$ applied to my $\underline{z}$. And what is that $z$? It's a linear transformation of my input:

$$\underline{h} = f(\underline{z}), \qquad \underline{z} = A\,\underline{x}.$$

So let's say $x$ has dimension $n$ and $z$ has dimension $m$. Therefore, what is the dimension of $A$? Anyone from home — can you guess how many rows, how many columns? It's $m \times n$, because we're going to have as many rows as the dimension we are shooting towards, and as many columns as the dimension we are shooting from.

Okay, so let me just expand this the classical way, expanding those symbols:

$$
A\,\underline{x} =
\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n}\\
a_{21} & a_{22} & \cdots & a_{2n}\\
\vdots & \vdots & \ddots & \vdots\\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{pmatrix}
\begin{pmatrix}
x_1\\ x_2\\ \vdots\\ x_n
\end{pmatrix}.
$$

Whenever we multiply a matrix by a column vector, I suggest everyone think about the operation this way: you take the first row and multiply it by the column, and that's an inner product — this element times this one, plus this element times this one, and so on. So we usually have this row-times-column representation. Just writing down the obvious: the matrix can be written as a stack of rows $a^{(1)}, a^{(2)}, \dots, a^{(m)}$ — these $a$'s are rows — and multiplying by my vector $x$ gives

$$A\,\underline{x} = \begin{pmatrix} a^{(1)}\underline{x}\\ a^{(2)}\underline{x}\\ \vdots\\ a^{(m)}\underline{x} \end{pmatrix}.$$

First question: what is the size of each element $a^{(i)}\underline{x}$ here? One, right — it's a scalar. And I have one, two, down to the last one, which is $m$: so I have $m$ scalars. Sweet. And this is going to be my vector $\underline{z}$: $z_1, z_2$, down to $z_m$.
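Here is a minimal PyTorch sketch of these shapes, with ReLU standing in for the generic nonlinearity $f$:

```python
import torch

n, m = 4, 3                # n: dimension we come from, m: dimension we shoot towards
x = torch.randn(n)         # input vector, x in R^n
A = torch.randn(m, n)      # matrix with m rows and n columns

z = A @ x                  # linear transformation: z in R^m
h = torch.relu(z)          # hidden layer: h = f(z), with f = ReLU as an example
print(z.size(), h.size())  # torch.Size([3]) torch.Size([3])
```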
All right, so let's now think a little bit about what each of these elements is. Take the generic term — one row vector $a$ times my $x$ — and let's assume that $n = 2$. So what is the output of this operation; how can I compute its value? I just said it: this item times this one, plus this item times this one. So I just write it down: we're going to have $a_1 x_1 + a_2 x_2$.

Now, since we are in two dimensions, I can also draw this. I have a couple of axes; I have my $x$, and the angle it makes is going to be $\xi$, with components $x_1$ and $x_2$. Similarly, I have my $a$ at angle $\alpha$, with components $a_1$ and $a_2$. So I can write these with some trigonometry. How much is $a_1$? It's straightforward, right — can someone tell me? Cosine of alpha, yes, but you also have to multiply by the length of the vector. So $a_1 = \lVert a\rVert \cos\alpha$, and likewise $x_1 = \lVert x\rVert \cos\xi$; plus I have $a_2 = \lVert a\rVert \sin\alpha$ and $x_2 = \lVert x\rVert \sin\xi$. So I can take the common factors out:

$$a^\top x = \lVert a\rVert\,\lVert x\rVert\,(\cos\alpha\cos\xi + \sin\alpha\sin\xi).$$

How much is this expression, from high school? You were in high school just a few years ago — can you tell me? You're very quiet. ("It was more than ten years ago.") Okay. Yes, that's correct: it's $\cos(\alpha - \xi)$. So this is simply

$$a^\top x = \lVert a\rVert\,\lVert x\rVert\,\cos(\alpha - \xi) = \lVert a\rVert\,\lVert x\rVert\,\cos(\xi - \alpha),$$

which is the same thing, because the cosine is an even function.

So, first takeaway. Whenever you multiply a vector by a matrix on the left, you have this scalar product between each row and the input vector, and each item of the output represents — up to a constant expressing the lengths of the vectors — the degree of alignment between the two. If the two vectors point in the same direction, the cosine of zero is one, and you get the highest possible value. If the two vectors are at 90 degrees, the cosine of $\pi/2$ is zero, and that term vanishes. Otherwise, if one vector points one way and the other points the opposite way, you get the cosine of $\pi$, which is minus one, and the output is negative — multiplied, of course, by the two magnitudes. So whenever you think about a matrix times a vector, you can think about matching against each row, which from now on we're going to call a kernel. My $a$, the one in pink, is called a kernel, and a kernel is like a template — so this multiplication is called template matching: you simply check the scalar product, the projection of the given signal, the given vector, onto each of these kernels.
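You can check this identity numerically; here is a small sketch in two dimensions, recovering the two angles with `atan2`:

```python
import torch

a = torch.randn(2)                 # the kernel (the pink row)
x = torch.randn(2)                 # the input vector

alpha = torch.atan2(a[1], a[0])    # angle of a with the horizontal axis
xi = torch.atan2(x[1], x[0])       # angle of x with the horizontal axis

lhs = a @ x                        # the scalar product a1*x1 + a2*x2
rhs = a.norm() * x.norm() * torch.cos(alpha - xi)
print(torch.allclose(lhs, rhs))    # True: a.x = |a| |x| cos(alpha - xi)
```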
All right, one more interpretation, which I really like, is the following — and usually people don't see this one coming. When I explain it during office hours, people go "oh!", so I had to show it to you as well, otherwise you don't get your "oh". We said that to get each output item, I compute this item times this one, this times this, and this times this. So the first element of $x$ multiplies every element of the first column of $A$; the second element of $x$ multiplies the entire second column; and so on, and then we sum everything up. So you can express this matrix-vector multiplication in a different way: write $A$ as its columns $a_{(1)}, a_{(2)}, \dots, a_{(n)}$ — we have $n$ columns — and then

$$A\,\underline{x} = a_{(1)}\,x_1 + a_{(2)}\,x_2 + \cdots + a_{(n)}\,x_n,$$

a linear combination of the columns of $A$, weighted by the components of $x$.

And right now I'm going to give you a quick exercise. Suppose $x$ is a one-hot vector: $x_j = 1$ and everything else is zero. If you multiply a matrix by a one-hot vector, what do you get? All of the other terms vanish; only the $j$-th one survives. So you can immediately see that multiplying a matrix by a one-hot vector simply selects the $j$-th column of the matrix. Does it make sense? Yes? No? You already knew everything? You'd seen this before? No, okay. All right, I'm confused — I'm asking too many questions, yes, I know.

Another exercise, a different one, is going to be the following. Let's say my $x$ is now an $n$-class probability vector: $x = (p_1, p_2, \dots, p_n)$. What is the outcome of multiplying the matrix by an $x$ containing probabilities? No one answers? So: if these are probabilities — the one-norm is equal to one, and all items are non-negative — you basically get an expectation, because you sum all the columns, each multiplied by a probability. So you can automatically compute an expectation with a simple matrix-vector multiplication.
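Both exercises are one line each in PyTorch; a small sketch:

```python
import torch

m, n = 3, 4
A = torch.randn(m, n)

# One-hot x with x_j = 1: Ax selects the j-th column of A.
j = 2
x = torch.zeros(n)
x[j] = 1.0
print(torch.allclose(A @ x, A[:, j]))             # True

# Probability vector p (non-negative, sums to one): Ap is the
# expectation of the columns of A under p.
p = torch.softmax(torch.randn(n), dim=0)
print(torch.allclose(A @ p, (A * p).sum(dim=1)))  # True
```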
All right, so that was the first part. Now we're switching to the second part, where we talk about natural signals; let's see what these natural signals are. We start with the usual definition: whenever we do machine learning, we have our dataset, curly $\mathcal{X}$, which is the collection of samples $\{\boldsymbol{x}_i\}_{i=1}^{n}$. Here I don't use the bar, because I can use a bold font instead: each bold $\boldsymbol{x}_i$ is a vector, a data sample, and these are my input samples.

Now we're going to switch to a more generic and powerful representation: we talk about the input $\mathcal{X}$ as a set of functions $x_i\colon \Omega \to \mathbb{R}^c$, mapping my domain $\Omega$ to $\mathbb{R}^c$, where $c$ is the number of channels. So at a given location in $\Omega$ — a given time, a given point, whatever item of the domain — I associate a specific value of $x_i$. So let's have a look at these domains and channels: what are they?

For example, if I have a one-dimensional signal, $\Omega$ is just a sequence of natural numbers, $1, 2, 3, \dots$, up to the total duration of the signal divided by the sampling interval; this fraction is equal to $n$ if I have $n$ samples. So $\Omega$ is a subset of the natural numbers. And what is this $\mathbb{R}^c$? Well, $c = 1$ would be a monophonic signal: I'm speaking into this microphone, which is super cool, and right now I'm recording one single stream of information at different time intervals. Or I can switch the knob, and whenever I want to record music I can go stereophonic, so I have two values — no longer a scalar, but a vector at each timestamp: that's a stereo signal. Or, who can guess: what is 5 + 1, talking about audio? Yeah — surround, Dolby surround, that's correct. So $c = 6$ is my Dolby 5.1. Okay, okay, so you're actually following.

But we can definitely go to higher-dimensional signals — signals defined on a larger domain. For an image, my domain is going to be a Cartesian product of two sets: the first goes from 1 to the height, the second from 1 to the width, so $\Omega = \{1, \dots, H\} \times \{1, \dots, W\}$, a subset of $\mathbb{N}^2$, the natural numbers squared. And let's see a few examples of $c$: $c = 1$ is a grayscale image, and $c = 3$ is the classical colour image. Let me go into detail here and tell you a bit more precisely how this mapping works: my $x$ at a given location $(\omega_1, \omega_2)$ has three values, which are $r$, $g$, and $b$. Each of these is, again, a scalar field — $r$ is a scalar, $g$ is a scalar, $b$ is a scalar — but then I pack the three of them together, so I have a vector of size 3 at each location in the grid.

We can still think about an image as one data point and get back to the first, original representation; actually we usually go back and forth. But if I consider this function representation, I can exploit properties of the signal that are inherently determined by these locations, okay? And we're going to do exactly that in a second. Oh, and the last one here I didn't tell you, sorry: $c = 20$ is a very classical example of hyperspectral images, where you have 20 bands — 20 different wavelengths at which you record these images. These are not images; they are hyper-images.
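In tensor form, these examples look like this (a sketch, following the channels-first layout that PyTorch uses: channels × length for 1D signals, channels × height × width for images):

```python
import torch

T = 64                          # number of time samples
mono   = torch.randn(1, T)      # c = 1: monophonic audio
stereo = torch.randn(2, T)      # c = 2: stereophonic audio
dolby  = torch.randn(6, T)      # c = 5 + 1: Dolby 5.1 surround

H, W = 64, 128                  # image domain: height x width
gray  = torch.randn(1, H, W)    # c = 1: grayscale image
rgb   = torch.randn(3, H, W)    # c = 3: colour image
hyper = torch.randn(20, H, W)   # c = 20: hyperspectral image, 20 wavelength bands
```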
One more example, but this one is going to be a bit tricky, perhaps. My domain $\Omega$ is going to be $\mathbb{R}^4$, and the channel space $\mathbb{R}^4$ as well. Who can guess, from home, what this stuff is — what is $\mathbb{R}^4$? Do we have any physicists in the room — well, in the Zoom room? Time; quantum physics... this is space-time, okay? So $\Omega = \mathbb{R}^4$ is space-time: space plus the time dimension. And the other $\mathbb{R}^4$ is going to be the four-momentum, the derivative. And if you have space-time and four-momentum, then you can compute, for example, a scalar value characterizing your system. So these are all kinds of crazy things you can do with these kinds of signals.

All right, so a very simple example of a 1D signal is simply my $x[k]$ — the square brackets indicate a discrete sampling process — with items $x[1]$, $x[2]$, and so on.

All right, so let's extend the notion of matrix multiplication to these signals. Extension to signals — boom. So how do we do this? Let's start with an initial matrix whose rows have three items, in three columns; I'm going to call this number of columns lowercase $k$, so here $k = 3$. And then here I have my signal, which has to have three items, right? Since we have three columns — we just figured that out. But our signal will definitely extend over time, so this column vector will be going down a lot, and we have to extend the matrix to match: we have to make it larger and larger. So my question is: what do I put in all the new entries?

And here we take out the first property of natural signals: locality. Natural signals are local. What does that mean? It means that the things that happen within this small region are quite uncorrelated with the things that happen way down there. We only care about small regions. So how do I take care of the items down there? Well, I don't want to take care of them, because I know my signal will only have something meaningful within a small region — and so I put a big zero over there, okay? Locality gives me sparsity.

Cool. So let's take this first row here — let me draw it in blue — and call it $a^{(1)}$; let's just call it $a$ for the moment. We said we have zeros everywhere else, and here I have my signal, which is long.

But then we're going to use a second property, okay? Natural signals are stationary. What does stationary mean? It means that what happens here may happen over there, and may happen again further along. I am interested in checking whether this specific template appears here, appears there — all across the signal. Since these signals are stationary, I expect the same type of pattern to show up over and over again. And this leads to parameter sharing, which means I'm simply going to reuse the same $a$ — let me do it in blue — shifted one step down, and I just keep repeating it, down, down, down, until the last one.
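In code, locality plus parameter sharing is exactly a sliding dot product; here is a sketch checking it against PyTorch's `conv1d` (which, like most deep-learning "convolutions", actually computes this cross-correlation):

```python
import torch
import torch.nn.functional as F

n, k = 10, 3
x = torch.randn(n)            # the (long) signal
a = torch.randn(k)            # one local kernel, shared across positions

# Slide the same a over every length-k window of x.
y_slide = torch.stack([a @ x[i:i + k] for i in range(n - k + 1)])

# F.conv1d wants (batch, channels, length) tensors.
y_conv = F.conv1d(x.view(1, 1, -1), a.view(1, 1, -1)).view(-1)
print(torch.allclose(y_slide, y_conv))   # True
```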
So if I compute the outputs: the first element is $a \cdot x_{1:3}$, my $a$ times my $x$ restricted to the interval from 1 to 3. Then the second item is $a \cdot x_{2:4}$, and so on until the last one, which is going to be $a$ times what? I reach the end of the signal — let's say it has length $n$ — so the last window is $x_{n-2:n}$, and the last output index is $n - 2$. And what is this 2? In this case, $2 = k - 1$. So, question for you — you've been following? How many items do I have in the output? $n - 2$, exactly, because I have $1, 2, \dots$ up to the last one, $n - 2$. To be more generic, this $n - 2$ is $n - (k - 1) = n - k + 1$, okay? That's the total length of my signal after convolution with a kernel of size $k$.

So now someone should complain and say: "Oh, Alf, you forgot something." What did I forget? Well, I forgot there were other kernels — there's this one over here, and then there's this one over here. What happened to them? I only computed the outputs for the first guy. So let's finish up with the others; we simply repeat. In this matrix I had the first kernel, the second one, up to the last one — all the colours: the blue, the green, and the orange — and each of them will have a corresponding matrix. Oh, we didn't say what this matrix is called, right? This matrix here is called a Toeplitz matrix, okay? And a Toeplitz matrix is of course sparse, because it has all zeros up here and all zeros over there, and this diagonal band of shifted kernels that goes down and down like that. So if we consider the whole stack of kernels — each row here is a kernel — this leads to as many Toeplitz matrices: number one, number two, up to the last one. And if I just draw them quickly, stacked (let me cheat: copy, paste; a second one, in its colour; and the green one; and then the third one, paste, colour — see, I've become an expert at this stuff), we end up with a stack of Toeplitz matrices, and eventually not just one output: you get the first output, the second output, and then the third one.
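Here's a sketch that builds the Toeplitz matrix for one kernel explicitly, so you can see that this convolution is just a sparse, banded matrix-vector multiplication:

```python
import torch

n, k = 10, 3
a = torch.randn(k)                     # one kernel
x = torch.randn(n)                     # the signal

T = torch.zeros(n - k + 1, n)          # mostly zeros: the sparsity from locality
for i in range(n - k + 1):
    T[i, i:i + k] = a                  # same kernel on every row: parameter sharing

y = T @ x                              # Toeplitz matrix times signal
y_slide = torch.stack([a @ x[i:i + k] for i in range(n - k + 1)])
print(torch.allclose(y, y_slide))      # True: n - k + 1 outputs either way
```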
And so right now we started from a signal whose depth — how much was the depth here? The depth was one; in this case my channel count was equal to one. Let me write it down — which colour should we choose? Okay; uh-oh, too thick — here my channel count was equal to one. So we started with one channel, and we end up with three channels, one per kernel, okay? So how do we deal with this? How do we perform a convolution after this? Actually, let me reshape this: think of the three outputs as one volume — this is the front face, and the others stack behind it, going up this way. Okay, I cannot draw, sorry, I don't have space; I made a mess. So what is the difference now? How do we keep going? How do we apply a convolution to this one? Well, it's not a big deal.

Convolution works in a very similar way if we have more channels, right? In the previous case we only had one channel, but we can use a stereophonic signal, okay? For example, I have my signal over here — or maybe on the bottom side here — extending downward, and it has two channels. And so how do we compute this convolution? Well, the kernels, instead of having a thickness of just one channel like before, are now going to have more thickness: we still have the kernel stack, three items in length, but now also two in depth — the first one, then the second one, until I get to the last one — and the signal keeps going down, down, down.

So how do you compute this scalar product? Let me move a little bit over here. It works like a scalar product, just as before: we take the first kernel — three items in length, one, two, three, and two in the thickness like this — and we slide it down the signal, bam. Each multiplication, this kernel times this chunk of the signal, gives me just one value, right? That's my first output item, and as I move down I get all the items of the first output channel. Then I take the second kernel, and similarly I go down, and so on for all the other ones until the last one — boom, down it goes — and you get those outputs. And so this is how it works.

So, to recap: we started with this matrix multiplication, which we saw is both a template matching and, in the other view, a linear combination of the columns of $A$. Then we saw that we can handle data that extends downward by modifying our matrix to have a big zero out there, because we only look for local interactions — we don't care about things that are further away — and we keep sliding the same kernel over and over, because we use parameter sharing, since the signal is stationary.

And for the final part — if there are no questions; you're very, very quiet. Are you enjoying the lesson, or is everyone asleep? I hope you are enjoying it. I am, at least: this is the third time I'm teaching it, and it's getting prettier and prettier every time I go over it. Enjoying? Okay, that's great; fantastic, thank you for the feedback. And so, finally, we're going to look at how we deal with 2D kernels. First of all, let's make sure we understand these numbers, and so we're going to play a little bit with PyTorch, okay? Something I've already shown you, perhaps a few times, is that you have a repository from last semester with all the instructions for getting conda running on your machine. I usually like this setup so that I can quickly prototype things and, you know, not use the internet — when I work, I want to be quiet, with my machine and myself. So, how do I play with this? I activate the course's PyTorch deep-learning conda environment, and from there I run `ipython`. And then I do `import torch`, and `from torch import nn`.
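If you want to follow along, the setup looks roughly like this (the environment name is whatever the course repo's conda instructions created; `pDL` below is an assumption):

```python
# In a terminal:
#   conda activate pDL    # environment name assumed; use the one from the repo
#   ipython
# Then, inside IPython:
import torch
from torch import nn
```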
If I press Ctrl-L, I clean up the screen, such that I can see what I'm doing. So, let's start by defining my first signal, `x`: it's going to be a `torch.randn` of one item — a batch of one — then two channels, since I'm going to use a stereophonic signal, and then, let's say, 64 samples. So this is my first example of a signal; no big deal. Then, let's say I have my convolution, which is going to be `nn.Conv` — I press Tab, I have auto-completion — `1d`. Oh, but I don't know how to use this; I don't know the API. Someone says, "oh, check online." No, I don't like to check online: I just append a question mark and press Enter, and I see the list of all the arguments I can use: input channels, output channels, kernel size, stride, and padding. So, say I understand this. My convolution has to start from two channels, since I start from a stereophonic signal; I'd like to get 16 kernels; and let's say the size of each kernel should be 3, like we have done so far.

So, first question for my audience — and I need the answers on Zoom, because I cannot see things here. What is the size of the weights of this convolution? It's 16 × 2 × 3. We have 16 kernels, for sure; each has to have two channels in order to match the input; and the length of each kernel is going to be 3. So this gives me 16 kernels, each with two channels and a domain size of 3. Cool. How many biases do I have? Anyone? If we have 16 kernels, we're going to have 16 biases. Yeah.

Finally, what is the size of my convolution applied to my `x`? And you need to answer, because we don't go ahead otherwise. So, we remember we have 16 filters, yes, so eventually you're going to have 16 channels — that's for sure. The first dimension is going to be 1, which means I have one batch; 16 are the channels; and then what is the extent of the signal? We started with 64 samples, but then we saw that when we apply the convolution we go down to $n - k + 1$, right? Here $k$ is 3, so we lose 2: 64 − 2 gives us 62. And so, here we go: one batch of 16 channels, 62 in length. What do we have now if I instead choose a kernel size of 5? What is the only difference we get here? Anyone? 60, yeah: 64 − 5 + 1.
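Here is that 1D session as a runnable sketch:

```python
import torch
from torch import nn

x = torch.randn(1, 2, 64)     # batch of 1, stereo signal (2 channels), 64 samples
conv = nn.Conv1d(2, 16, 3)    # 2 input channels, 16 kernels, kernel size 3

print(conv.weight.size())     # torch.Size([16, 2, 3]): 16 kernels, 2 channels thick, length 3
print(conv.bias.size())       # torch.Size([16]): one bias per kernel
print(conv(x).size())         # torch.Size([1, 16, 62]): 64 - 3 + 1 = 62

conv5 = nn.Conv1d(2, 16, 5)   # same, but with kernel size 5
print(conv5(x).size())        # torch.Size([1, 16, 60]): 64 - 5 + 1 = 60
```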
All right, so let's try to see how the 2D convolution works. So, let's start with my signal, which is again a `torch.randn`: a batch of one item; I'm going to use a hyperspectral image, so 20 channels; and for the size, I'm going to use something that is 64 in height and 128 in width. So, this is my natural signal, a hyperspectral image. Then my convolution is going to be `nn.Conv2d`. Again, if I don't know how to use it, I put a question mark and press Enter, and again it says input channels, output channels, kernel size, stride, and padding. So, in this case: for the input channels we said 20, because we start with a hyperspectral image; we'd like to use our 16 kernels; and for my filter size, let's say 3 in the vertical direction and 5 in the horizontal direction.

So, first question, which is going to be quite easy — maybe someone who hasn't answered before, or literally anyone, because it's getting late. What is the size of my weights? It's very similar to what we saw before: we have 16 kernels. And what are the sizes of these kernels? "You need to match the 20." Yes — but careful: that would be the output size, and I asked for the size of the weights, not the size of the output (though it is the correct output size, which we're going to check in a second). The spatial size of the weights is going to be 3 and 5, right? So we have these kinds of patches — height 3, width 5, 20 channels in depth — and then I have 16 of these items: 16 × 20 × 3 × 5. And then, same as before, if we check the biases, we still get the same 16 biases. And finally, as Eric was pointing out, if we apply the convolution to my input `x` and check the size, we definitely get one batch, 16 in depth, and 62 by 124.

My last question would be: how do I manage to remove this change in the domain, right? We moved from a domain that is 64 by 128 to a domain that is 62 by 124. If I'd like to keep the same domain, I have to change a few parameters here in my convolution. So, first of all, I'm still going to use a stride of 1, but then I'm going to use a padding of 1 for top and bottom, and 2 for left and right. The only thing this changes is that it adds one extra zero at the top and at the bottom of my input signal, and two zeros on the left and two zeros on the right, such that whenever I compute the convolution of `x`, it still preserves my 64 and 128 dimensions. This is going to be quite important — in fact it's necessary if you want to use residual connections: you cannot add a residual connection if the domain changes, okay?

So, this one was the whole course — sorry, the whole lesson — for today. Are there questions, or was it just fine? I mean, after the third time I'm teaching this stuff today, I think it was very smooth; but I don't know, because you should provide some feedback as well. Are there questions? What did you learn, actually? "I learned that coding is easy." I hope — sorry — the first part, where I was talking about matrix multiplication as template matching and then as a linear combination of the columns: that's very, very important to keep in mind, right? You want to have that mental image where each item in the output of the multiplication is a projection, and then the other interpretation, which is again the linear combination of the columns. We are going to be using that in the lesson on graph neural networks, so if you're not familiar with that kind of representation, it becomes a little bit hard, perhaps. And then, finally, yeah: play with torch and try to get yourself familiar with it. There are no other questions? Then we are done. No questions? Bye-bye.
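For reference, here is today's 2D session condensed into a runnable sketch, with the shapes we computed above:

```python
import torch
from torch import nn

x = torch.randn(1, 20, 64, 128)    # one hyperspectral image: 20 bands, 64 x 128
conv = nn.Conv2d(20, 16, (3, 5))   # 16 kernels, each 20 channels deep, 3 high, 5 wide

print(conv.weight.size())          # torch.Size([16, 20, 3, 5])
print(conv.bias.size())            # torch.Size([16])
print(conv(x).size())              # torch.Size([1, 16, 62, 124]): 64-3+1, 128-5+1

# Padding of (1, 2) adds one zero top and bottom, two zeros left and right,
# so the 64 x 128 domain is preserved (necessary for residual connections).
conv_same = nn.Conv2d(20, 16, (3, 5), stride=1, padding=(1, 2))
print(conv_same(x).size())         # torch.Size([1, 16, 64, 128])
```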