Last time we saw that a matrix–vector product can be written in two different, equivalent ways. Let me draw the matrix here: we have several rows, and we multiply by one column vector. They have to be compatible sizes, otherwise you can't multiply them. First view: the output of this product is a sequence of scalars — the first row times the column vector, then the second row times the column vector, and so on until the last one — and that gives my final vector. Each of these entries: talk to me, what are they? They're scalar products, right. But what do they represent? What's another name for a scalar product? I showed you a demonstration last time with some trigonometry. It's a projection, if you talk geometry, or you can think of it as an unnormalized cosine value. So each entry is the projection of my input signal onto one kernel. Projections.

Then there was another interpretation, another way of seeing this: take the first column of the matrix A and multiply it by the first element of the vector x, then the second column times the second element of x, and so on until the last column times the n-th element. Suppose x is n long and the matrix is m by n: the height m is the dimension we are shooting to, and the width n is the dimension we are coming from.

The second part was the following. Since we are doing convolutions — because we'd like to exploit the sparsity, stationarity, and compositionality of the data — instead of using a full matrix we use the same kernels over and over again, the same kernel across the whole signal. In this case the width of the matrix is no longer n, as it was before; it's k, the kernel size. So here I draw a thinner matrix of width lowercase k, and the height we can still call m. Let's say I have several kernels: my cyan kernel — actually, let me change, let's put pink — then a green one, and so on.

How do we use these kernels? We use them by stacking them and shifting them a little bit each time: take the first kernel, put it in the first row, then shift it, shift it, shift it, and so on until you fill the whole matrix, putting zeros on either side. This is just recap. You build one such matrix for the blue kernel, then — magic, copy, paste, change color — you have the pink one, and then the last one, light green. Can I do the same copy? Yes, fantastic. (Sorry, you cannot do copy and paste on paper.)
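Before moving on, here is that recap in code — a minimal sketch (hypothetical values, not from the lecture) of the two equivalent views of a matrix–vector product:

```python
import torch

A = torch.tensor([[1., 2., 3.],
                  [4., 5., 6.]])   # m = 2 (dimension we go to), n = 3 (we come from)
x = torch.tensor([7., 8., 9.])

# View 1: each output entry is a scalar product -- the projection of x
# onto one row of A.
rows = torch.stack([A[i] @ x for i in range(A.shape[0])])

# View 2: the output is a linear combination of the columns of A,
# weighted by the entries of x.
cols = sum(x[j] * A[:, j] for j in range(A.shape[1]))

assert torch.allclose(rows, A @ x) and torch.allclose(cols, A @ x)
```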
All right, so we just duplicate. How many matrices do we have now — how many maps? No, don't count the letters on the screen. Is it k or m? Decide. You shouldn't be guessing, you should tell me the correct answer; think of this as a job interview, I'm training you. We have m of them — as many as the number of rows of the initial matrix over here. And what is the width of this little kernel? k, OK. What is the height of each of these stacked matrices? Are you sure? Try again — I can't hear — n minus k... plus one. That's it. And the final output: each matrix gives one vector of the same height, n − k + 1. And how thick is the final stack? m — this stack goes as thick as m.

This is where we left off last time. We have as many maps as the different colors; just to make sure we understand what's going on, you have the first one here, the second one here, and the third one here. At the end of class someone asked me: how do we do convolution when we end up in this situation over here? Because here we assumed the kernels are just, whatever, k long — say three long — one little vector each. So how do we keep going, now that the output has a thickness? Before, we started with this vector, which had just n elements. Are you following so far? I'm going faster because we've already seen these things; I'm just reviewing. Yes? OK, fantastic. Let's see how we actually keep going.

What I just showed assumes we start with that long vector — what was the height? n. It means we have something that looks like this: the width is one and the thickness is one, so we only have a monophonic signal, of height n. Now let's assume we are using a stereophonic system. What is the domain here? My x can be thought of as a function that goes from the domain to ℝ to the power of the number of channels. So what is this Ω? We saw this on the last slide of Tuesday's lesson. Ω is not the set of real numbers — someone else, try — we are using computers. It's a timeline: how many samples? Sample number one, sample number two, sample number three. So Ω is something like {0, 1, 2, …}, a subset of the natural numbers ℕ. It would be ℝ if you had a time-continuous domain. And what is the exponent c in the case I've shown so far? The number of input channels — because this is my x, this is my input.
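A sketch of this single-channel picture (hypothetical sizes): m kernels of width k slid over a signal of length n give m maps of height n − k + 1:

```python
import torch

n, k, m = 8, 3, 4                  # signal length, kernel size, number of kernels
x = torch.randn(n)                 # a monophonic signal
kernels = torch.randn(m, k)        # m little vectors, each k long

# Slide each kernel across the signal: one scalar product per position.
maps = torch.stack([
    torch.stack([w @ x[i:i + k] for i in range(n - k + 1)])
    for w in kernels
])
print(maps.shape)                  # torch.Size([4, 6]) -- m maps of height n - k + 1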
So far, in the case I showed, we were using just one channel, which means we have monophonic audio. Let's now make the assumption that c is instead two, such that we're talking about a stereophonic signal. Let's see how this changes. I'm going to just draw — complain if you don't follow. Are you following so far? I watch my tablet, I don't see you, so you should be complaining if something doesn't make sense; otherwise it gets boring if I'm waiting and watching you all the time. Yes? No? OK, I'm boring. Thank you.

So we have our signal here, and now it has some thickness — c. In the case of a stereophonic signal you just have two channels, left and right, and the signal keeps going down. Now, our kernels: if I'd like to perform a convolution over this signal — a 1D convolution, I'm not talking about 2D convolution, because we are still using a domain which is one-dimensional. This is actually important: if I ask you what type of signal this is, you have to look at the dimension of the domain. We are talking about a one-dimensional signal, one with a 1D domain. It's still a 1D signal, but in this case it has two values per point.

So what kind of kernels are we going to use? I'll just draw one: it has width k, as before, but now it also has some thickness — and that thickness has to match the thickness of the input. You may still have m kernels, but each has thickness c. Let me show you how to apply the convolution. You take one of these slices, you apply it over here at the top, and then you simply go down. Whenever you apply it, you perform an inner product, and what you get is a one-by-one — a scalar. So when I take this orange thingy, lay it on the left-hand side of the signal, and do a dot product, a scalar product, I just get a scalar. This is the convolution in 1D: it goes down this way, and only one way — that's why it's called 1D. I multiply each element of this mask by the corresponding element of the signal — first row, then second row — sum all of them, and I get my first output. Whenever I make this multiplication, I get my first output here. Then I keep sliding the kernel down and get the second output, third, fourth, and so on until I go down to the end. Then I pick a different kernel — let's say I get the second one.
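A minimal sketch of one such multichannel step (hypothetical sizes): the kernel's thickness matches the input's channels, and each position yields one scalar:

```python
import torch

c, n, k = 2, 10, 3                 # channels (stereo), samples, kernel width
x = torch.randn(c, n)              # the signal now has thickness c
w = torch.randn(c, k)              # one kernel; its thickness must match c

# One output value: elementwise product of a c-by-k patch of the signal
# with the c-by-k kernel, then a single sum -- still just a scalar product.
first = (w * x[:, :k]).sum()

# Sliding down one way only (that's the "1D") gives the whole map.
out = torch.stack([(w * x[:, i:i + k]).sum() for i in range(n - k + 1)])
assert torch.isclose(first, out[0])
print(out.shape)                   # torch.Size([8]) -- n - k + 1 scalars
```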
I get the second one and I perform the same operation, and you get a second column here — let's actually make it like a matrix, you go down — until you do the last one, the m-th kernel, which goes down the same way and gives you the last column. Yes? No? Confusing? Clear? So this was the question I got at the end of Tuesday's class. Why is it a dot product of all those values? Because you're doing the projection of this part of the signal onto this kernel: you'd like to see what the contribution is — the alignment of this part of the signal with this specific subspace. This is how convolution works when you have multiple channels; so far I had shown you only a single channel.

Oh yes, one second — the sizes. You lose one at the top and one at the bottom: you lose the first row here and the last row here, so in this case the output is going to be n minus three plus one — you lose two. If you don't center the convolution, you lose both at one end; if you center it, you usually lose one at the beginning and one at the end. Every time you perform a convolution, you lose kernel-size-minus-one samples. You can try it: put your hand like this with a kernel of three — the first position matches here, then you shift by one, and shift again. Or with a signal of five and a kernel of two: you get positions one, two, three, and four. You started with five and you end up with four, because you used a kernel size of two. With a kernel size of three you get one, two, and three — you lost two. You can always check it this way.

All right, now I'll show you the dimensions of these kernels and the outputs with PyTorch. Yes? No? Cool. OK — disaster — can you see anything? Let me zoom a little more. Right now we do `conda activate pDL` — PyTorch Deep Learning — and then run `ipython`. If I press Ctrl-L I clear the screen, and we can do `import torch`, then `from torch import nn`. Now, for example, my convolutional layer is going to be `nn.Conv1d` — I keep typing until I get this one. Let's say I have no idea how to use this function: I just put a question mark after it, press Enter, and I see the documentation. The first item is the input channels, then the output channels, then the kernel size. So, for input channels: we have a stereo signal, so we put 2. The number of kernels — we said that was m — let's say we have 16 kernels; this is the number of kernels I'm going to use. And then let's have a kernel size of 3, the same k I used here.
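The start of that IPython session, reconstructed as a plain sketch (standard PyTorch; in IPython, `nn.Conv1d?` shows the same documentation):

```python
import torch
from torch import nn

# Conv1d's first three arguments: in_channels, out_channels, kernel_size.
# Stereo input -> 2 channels; m = 16 kernels; kernel size k = 3.
conv = nn.Conv1d(2, 16, 3)
print(conv)   # Conv1d(2, 16, kernel_size=(3,), stride=(1,))
```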
OK, so here I define my first convolutional object. If I print it — `conv` — you see we have a Conv1d, a 1D convolution, which is going from 2 channels, so stereophonic, to 16 channels, meaning I use 16 kernels; the kernel size is 3, and the stride is also 1. Now let's check the convolutional weights. What is the size of the weights — how many planes do we have? 16, right, so we have 16 kernels. What is the length of each kernel? 3. And what is this 2? The channels — so I have 16 of these kernels, which have thickness 2 and length 3. Makes sense, because each of the 16 gets applied across the whole signal.

Now let's make my signal: x is going to be `torch.rand` of size — I also have to say I have a batch of size one, so I just have one signal — then 64 samples, and how many channels did we say? Two. So I have one example, which has two channels and 64 samples. This is my x. Hold on — what is the size of the convolutional bias? It's 16, because you have one bias per kernel, per feature map. So what is conv(x), the output? Hello? I still have one sample; how many channels? 16. And what's the length of the signal? 62. Fantastic. And what if I use a convolution with a kernel size of 5 — what do I get now? Shout, I can't hear you — 60. You're following, fantastic.

OK, so let's now try instead to use a hyperspectral image with a 2D convolution. My convolution in this case is going to be a Conv2d. Again, I don't know how to use it, so I put a question mark, and here I have: input channels, output channels, kernel size, stride, and padding. Input channels: it's a hyperspectral image with 20 planes, so what's the input in this case? 20, because you start from 20 spectral bands. Then the output number of channels: let's again use 16. Then the kernel size — actually, let's define my signal first: my x is going to be `torch.rand` of one sample, with 20 channels, of height, let's say, 64 — hold on — and width 128. This is my input data. So my convolution can be something like this: 20 input channels, 16 kernels, and then a kernel size of, let's use something like, 3 by 5. So what are the kernels here? Anyone? No, 20 is the channels of the input data. How many kernels? 16, there you go: you have 16 kernels, each with 20 channels, such that they can lie over the input, 3 by 5 — thin, short but wide. And what is the size of conv(x)? 1, 16, 62, 124. Now let's say I'd like to get the original dimensionality back — I can add some padding, right?
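The shape checks from this session, as one sketch (including the padding that comes up next; all standard PyTorch):

```python
import torch
from torch import nn

conv = nn.Conv1d(2, 16, 3)
print(conv.weight.size())     # torch.Size([16, 2, 3]) -- 16 kernels, thickness 2, length 3
print(conv.bias.size())       # torch.Size([16]) -- one bias per kernel

x = torch.rand(1, 2, 64)      # batch of 1, stereo, 64 samples
print(conv(x).size())         # torch.Size([1, 16, 62]) -- 64 - 3 + 1
print(nn.Conv1d(2, 16, 5)(x).size())   # torch.Size([1, 16, 60]) -- 64 - 5 + 1

# 2D: a hyperspectral image with 20 bands, 64 x 128 pixels.
x2 = torch.rand(1, 20, 64, 128)
print(nn.Conv2d(20, 16, (3, 5))(x2).size())   # torch.Size([1, 16, 62, 124])

# Padding (1, 2): one row top and bottom, two columns left and right,
# restores the original spatial size.
print(nn.Conv2d(20, 16, (3, 5), stride=1, padding=(1, 2))(x2).size())
# torch.Size([1, 16, 64, 128])
```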
So here there's also going to be the stride — I'll have a stride of one; again, if you don't remember the syntax, you can just put the question mark and figure it out. Now, how much padding should I add in the y direction? One, because it's going to be one on top and one on the bottom. And then in the x direction? Two. You're following — fantastic. And so now, if I just run this one, we get back the initial size.

OK, so now you have both 1D and 2D. The point is: what is the dimension of a convolutional kernel ensemble for a two-dimensional signal? Again, I repeat: what is the dimensionality of the collection of kernels used for two-dimensional data? Four — four is the number of dimensions required to store the collection of kernels when you perform 2D convolutions. The 1 you see printed is the stride; if you don't know how this works, you just put a question mark and it will tell you: striding by one means you move the kernel by one every time. And the first dimension of the input is the batch size. Torch expects you to always use batches — meaning: how many signals are you using? Here, just one. So that's the expectation; if you send an input tensor which has dimension 3 here, it's going to break and complain.

OK, we still have some time for the second part. All right, the second part is the following. You've been computing some derivatives for the first homework, right? So for the following homework, maybe you have to compute this one. You're supposed to be laughing — it's a joke. There you go, fantastic. This is what people actually wrote back in the '90s for the computation of the gradients of the LSTM, which will be covered, I guess, in the lesson after next. They had to do these things by hand — it's kind of crazy. Nevertheless, we can use PyTorch for automatic computation of these gradients, so let's go and check out how this automated gradient machinery works.

We're going now to notebook number three — yeah, invisible; let me see if I can highlight it; now it's even worse — number three, the autograd tutorial. Let me go full screen. In the autograd tutorial, here I just create my tensor, which has `requires_grad=True`. With this I'm asking torch: please track all the gradient computations done on this tensor, such that we can compute partial derivatives. My x is simply going to be one, two, three, four; then y is going to be x with the number two subtracted. Now we can notice there is this `grad_fn` attribute — a function here. Let's go see what this stuff is: oh, it's a SubBackward. Meaning that y has been generated by a module which performs the subtraction between x and 2 — you have x minus 2, therefore if you check who generated y, well, there's a subtraction module. So what's the grad_fn of x? And you're supposed to answer: None. They should have written "Alfredo generated that", right? OK, None is fine as well.
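That first part of the notebook, as a runnable sketch:

```python
import torch

# requires_grad=True: track every operation on this tensor, so partial
# derivatives can be computed later.
x = torch.tensor([[1., 2.], [3., 4.]], requires_grad=True)
y = x - 2

print(y.grad_fn)   # <SubBackward0 ...> -- y was generated by a subtraction
print(x.grad_fn)   # None -- x is a leaf tensor; no operation generated it
```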
Now let's actually put our nose inside. We can access the first element of the graph, and we find an accumulation node. Why "accumulation"? I don't know, I forgot — but if you go inside it, you're going to see the initial tensor we are using, the one-two-three-four. So inside this computational graph, you can also find the original tensor.

All right, so let's now get z: z is going to be my y squared times three, and then I compute my average: a is going to be the mean of z. So if I take the square of this thing, multiply by three, and take the average — this is the square part times three, and then this is the average. You can try it if you don't believe me. Let's see how this looks: I'm plotting here the whole sequence of computations. We started from a two-by-two matrix — what was this guy? Who is this? x — OK, you're following, cool. Then we subtracted two, and then we multiplied y by itself — that's why you have two arrows: the same subtraction output, x minus two, multiplied by itself. Then you have another multiplication — what is this? Multiply by three. And then you have the final MeanBackward. Why is it green? Because it's mean. No? Thank you for laughing.

OK, so I compute backprop. What does backprop do — what does this line do? I want to hear everyone, you know this already. We compute what? Gradients. Backpropagation is how you compute gradients. How do we train neural networks? With gradient descent, or whatever was said yesterday. Backpropagation, instead, is used for computing the gradients — completely different things. Please keep them separated; don't merge them. For everyone, after a bit, those two things keep colliding into one mushy thought. Don't — it's painful.

So, see here, I compute the gradients — so guess what, we are computing some gradients now. On your page: a is the average, so a = (1/4) Σᵢ zᵢ, the summation of all those zᵢ, where i goes from 1 to 4. And zᵢ = 3 yᵢ², right? No questions? And yᵢ was equal to x minus two, so zᵢ = 3 (xᵢ − 2)². So where does a belong? To ℝ — it's a scalar.

All right, so now we can compute ∂a/∂x. How much is this stuff? The 1/4 comes out from here, and then we take the derivative with respect to the i-th element xᵢ: I plug in zᵢ = 3 (xᵢ − 2)², the 3 comes out, the 2 comes down as well, and then you multiply by (xᵢ − 2). So far this should be correct: ∂a/∂xᵢ = (1/4) · 3 · 2 · (xᵢ − 2) = 1.5 (xᵢ − 2) = 1.5 xᵢ − 3. OK, mathematics. All right, so writing ∂a/∂x — I'm actually writing the transpose directly here: for the first element you have 1.5 · 1 − 3, you get −1.5; the second one is 1.5 · 2 − 3, you get 0; then 1.5 · 3 − 3 gives 1.5; and the last one is 1.5 · 4 − 3, which is 3. So the gradient is (−1.5, 0, 1.5, 3).
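The same computation as in the notebook, as a sketch — backward fills `x.grad` with exactly the hand-computed values:

```python
import torch

x = torch.tensor([[1., 2.], [3., 4.]], requires_grad=True)
y = x - 2
z = y * y * 3          # z_i = 3 * (x_i - 2)^2
a = z.mean()           # a = (1/4) * sum_i z_i

a.backward()           # backprop: compute da/dx through the recorded graph
print(x.grad)          # tensor([[-1.5000,  0.0000],
                       #         [ 1.5000,  3.0000]]) -- i.e. 1.5 * x - 3
```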
You agree? OK, let me just write this one here. Just remember: we have already run the backpropagation, so if I print the gradients, we get the same values we computed by hand — and note that I don't have to transpose anything here. Whenever you take a partial derivative in PyTorch, you get the same shape as the input: if you have a weight of whatever dimension, when you compute the partial you still have the same dimension. They don't swap, they don't turn — they use this for practicality. Strictly speaking, the correct version would be the Jacobian, which is the transpose of the gradient if it's a vector; but this is a tensor, so whatever — we just use the same shape. (So strictly this one should be flipped, I believe — maybe I'm wrong.)

Cool. This is basic PyTorch. Now you can do crazy stuff, because we like crazy, right? I mean, I do — and if you like me, you like crazy too. So here I create my vector x, which is going to be a one-dimensional tensor of three items. I multiply x by two and call the result y, I start my counter at zero, and then, while the norm of y is below a thousand, I keep doubling y. So you get a dynamic graph: the graph is conditional on the actual random initialization — which you can't even reproduce, because I didn't even use a seed, so everyone running this gets different numbers. These are the final values of y. Can you tell me how many iterations we ran? No — good, because it's random; you know it's a bad question. (About bad questions: next time I have something for you.) OK, so now I print the gradients, and I'm telling you they are 2048 — just check the central one for the moment; this is the actual gradient. So can you tell me now how many times the loop ran? Someone said 11 — how many hands up for 11? Four people. What about the others? 21? 10? OK, we actually have someone with the right solution: this loop ran 10 times. Why is that? Because you had the first multiplication by two outside, and then the loop multiplied by two over and over: the exponent is the number of iterations in the loop plus that one additional multiplication outside, and 2048 = 2¹¹. Yes? No? You're sleeping, maybe. I told you not to eat before class, otherwise you get groggy.
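A sketch of that dynamic-graph loop (no seed, as in the lecture, so each run differs; since y is a vector, backward needs a seed gradient of ones):

```python
import torch

x = torch.randn(3, requires_grad=True)
y = x * 2                       # one multiplication outside the loop
i = 0
while y.norm() < 1000:          # the graph is built on the fly, run by run
    y = y * 2
    i += 1

y.backward(torch.ones(3))       # vector output: pass a gradient of ones
print(i, x.grad)
# dy/dx = 2 ** (i + 1): i doublings inside the loop plus the one outside.
# So a gradient of 2048 = 2 ** 11 means the loop ran 10 times.
```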
OK, so — inference. This is cool. Here I just have both my x and my w, and we're going to do linear regression — or whatever; the `@` operator is just the scalar product. Both x and w have `requires_grad=True`, meaning we keep track of the gradients and the computational graph. So if I execute this one, you get the partial derivatives of the inner product: with respect to the input, you get the weights; with respect to the weights, you get the input. Here the arange is the input and the ones are the weights — so the partial with respect to the input is the ones, and the partial with respect to the weights is the arange. Yes? No? Yes, OK.

Now, usually this is the situation: I only set requires_grad on my parameters, because the gradients are what I'll use later for updating the parameters of the model — and in that case the input's gradient is None. And here's what I usually do when I do inference: I tell torch, please stop tracking any kind of operation. I say `torch.no_grad()`. Under this, regardless of whether your input or weights have requires_grad true or false or whatever, there is no computational graph taking care of anything. Therefore, if I try to run backpropagation on a tensor which was generated without a graph attached, I'm going to get an error. So if I run this one, you get an error, and you have a very angry face here, because it's an error, and it tells you: "element 0 of tensors does not require grad and does not have a grad_fn". So this z here — you couldn't run backward on z, because there is no graph attached to z. Questions? This is so powerful — you cannot do this stuff with TensorFlow. TensorFlow is, like, whatever.
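A sketch of that inference bit, assuming the input is an arange and the weights are ones (as on screen):

```python
import torch

x = torch.arange(1., 4., requires_grad=True)   # input: tensor([1., 2., 3.])
w = torch.ones(3, requires_grad=True)          # weights

z = w @ x                                      # @ is the scalar product
z.backward()
print(x.grad)   # tensor([1., 1., 1.]) -- d(w.x)/dx = w
print(w.grad)   # tensor([1., 2., 3.]) -- d(w.x)/dw = x

with torch.no_grad():                          # inference: no graph is recorded
    z = w @ x
print(z.requires_grad)                         # False
# z.backward() here raises: "element 0 of tensors does not require grad
# and does not have a grad_fn"
```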
There's actually more stuff — more stuff coming right now. We go back here: inside the extra folder you have some nice cute things. I wanted to cover both of them, but we'll just go through the second one. In that one, we generate our own specific modules. Say I'd like to define my own super-special, amazing function and use it for training nets: I need to provide the forward pass, and I also have to know the partial derivative of the output with respect to the input, such that I can use this module at any point in my code. Then, by backprop's chain rule, you just plug the thing in — Yann went over this several times: as long as you know the partial derivative of the output with respect to the input, you can plug these things anywhere in your chain of operations.

In this case, we define my addition, which performs the addition of the two inputs. But then, when you perform the backpropagation: if you have an addition of two things, what happens when you send the gradient down? It gets copied over to both sides, and the two copies are sent through one side and the other. You can execute this stuff and you'll see you get the same gradient both ways. In the other case, I have a split: I come from the same thing, then I split, and those two branches each do something else. If I go down with the gradients, what do I do? You add them — and that's why you have the add there. Execute this one and you'll see the two initial gradients here, and when you go back down, the two gradients summed together. So again: if you use the pre-made things in PyTorch, they are correct; with this one, you can mess around and put any kind of different forward function and backward function — there's a sketch of such a function right after this.

I think we are out of time. Other questions before we actually leave? No? All right, I'll see you on Monday — and stay warm. Bye-bye.
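The sketch referenced above — a custom autograd Function for addition, using the standard `torch.autograd.Function` API (the class name and test values here are illustrative, not necessarily the notebook's):

```python
import torch
from torch.autograd import Function

class MyAdd(Function):
    @staticmethod
    def forward(ctx, x1, x2):
        return x1 + x2

    @staticmethod
    def backward(ctx, grad_output):
        # An addition copies the incoming gradient to both of its inputs.
        return grad_output, grad_output

x1 = torch.randn(3, requires_grad=True)
x2 = torch.randn(3, requires_grad=True)
y = MyAdd.apply(x1, x2)
y.sum().backward()
print(x1.grad, x2.grad)   # identical -- the same gradient went both ways
```

The split case is the mirror image: when one tensor feeds two branches, autograd accumulates — adds — the two incoming gradients on the way back.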