In this video, I'm going to use Python to talk a little bit about the Jacobian, and the Jacobian is all about partial derivatives, of course. We're going to run through a couple of examples, just to show what we do with partial derivatives depending on how many variables are in our function and how many functions we have, and then, as far as data science is concerned at least, we'll look at an example of how we would use the Jacobian in a neural network.

We are here in a Colab notebook in our Google Drive, and we'll go to full-screen mode so we can see a little bit more. The package we're going to use in Python is symbolic Python, SymPy, and as always we just use the namespace abbreviation sym. I'm also going to use LaTeX printing to the screen as far as our SymPy code is concerned, so we'll use init_printing and call that function as well. Before we do all of that, of course, let's connect to a Google Colab runtime. And we are connected, so let's run import sympy as sym and also init_printing.

I always like to check which version is installed, because there are new versions all the time and you want to make sure that the functionality you use is available to you. At the time of recording of this video, we just use sym dot and then the dunder version attribute, double underscore version double underscore, and that tells us we currently have version 1.7.1.

So let's talk about partial derivatives and the Jacobian. The types of functions we can talk about you can see in the table of contents on the left-hand side. There's single variable, single function — that's not usually how we would refer to this, but I just want to make the classification very clear: a single variable and just a single function. Then we have multivariable but still a single function, and then of course real vector-valued functions, where we have multiple variables and multiple functions.

First of all, the single variable, single function case. This is a mapping f, as you can see in (1): f just maps the reals to the reals. You give me any real-valued input and the output is also going to be a real number. One of the easiest examples is the parabola y = x^2, and you can see in (2) that if we take the derivative of f with respect to x, we all know the rule: it's just 2x.

But we can use SymPy — symbolic Python — turning our Python instance into a computer algebra system. We use the symbols function to bind the mathematical symbol x to a computer variable x. Once we do that assignment, we can use x as a mathematical variable, so much so that I can create a computer variable f to hold the expression x squared. Remember, the double star is the symbol for the exponent, the power of a variable, so x**2 is x to the power 2. Then we print f to the screen, and because we called init_printing, we see x^2 printed in LaTeX there, no problem.

Now we can use the diff method on our object f: f.diff, and then we pass the variable with respect to which we want to take the derivative — so we take the derivative of f with respect to x. If we do that, very neatly, we see that the answer is just 2x, as we would expect. So it's very easy to just use the diff method on our expression.
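As a rough sketch, the steps described above look something like this in a notebook cell (this is not the video's exact code, and the version string will depend on your own installation):

```python
import sympy as sym

sym.init_printing()      # pretty LaTeX-style printing in the notebook

print(sym.__version__)   # at the time of recording this printed 1.7.1

x = sym.symbols('x')     # bind the mathematical symbol x to a Python variable
f = x**2                 # the parabola f(x) = x^2
f.diff(x)                # derivative with respect to x -> 2*x
```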
So let's go to a multivariable, single function. What we're doing here in (3), as you can see, is mapping R^2 to R: give me any two elements of R, as a Cartesian product, and it maps them to a single value. Here we have an example, also a function called f; it takes two variables, x_1 and x_2, both real numbers, and it outputs a single real number — hence the mapping you see there. Our function is 2 x_1 x_2^2 + 3 x_1 + 2 x_2.

If we want to take the derivative of that, remember we have to take partial derivatives, because we can differentiate this function with respect to either of our two variables. That's what we see in (4): the partial derivative of f with respect to x_1, then the partial derivative of f with respect to x_2, and we end up with this little row vector, inasmuch as we now have two values. Taking a partial derivative, of course, you treat all the other variables as constants. That's why, from 2 x_1 x_2^2, differentiating with respect to x_1, x_2 is treated as a constant and stays put, giving 2 x_2^2. From 3 x_1 we get the 3, and because 2 x_2 is treated as a constant and the derivative of a constant is 0, it contributes nothing. So the partial derivative with respect to x_1 is 2 x_2^2 + 3, and in the same way the partial derivative with respect to x_2 works out to 4 x_1 x_2 + 2.

Now we can check our work. First we set up the two symbolic variables x_1 and x_2, using a bit of LaTeX in the names just to indicate that they are x sub 1 and x sub 2. If we pass those as arguments to the symbols function, we can use x_1 and x_2 as variables, so much so that I've assigned the expression from (3) to a computer variable called f, and if we print f to the screen you'll see beautiful LaTeX printing of our single function with multiple variables. Now we take the two partial derivatives: f.diff with respect to x_1, and then another f.diff with respect to x_2, and that gives us our two partial derivatives — exactly the same answers as we had in (4).
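A minimal sketch of those cells (again, not reproduced verbatim from the video):

```python
import sympy as sym

# the LaTeX-style names make the symbols print as x_1 and x_2
x1, x2 = sym.symbols('x_1 x_2')

f = 2*x1*x2**2 + 3*x1 + 2*x2

f.diff(x1)   # partial derivative w.r.t. x_1 -> 2*x_2**2 + 3
f.diff(x2)   # partial derivative w.r.t. x_2 -> 4*x_1*x_2 + 2
```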
So now let's go to multivariable, multiple functions — we're talking about vector-valued functions here. We express these functions as a column vector. Now we have two functions, and that's what you can see in (5): f_1 and f_2. They both take the two variables x_1 and x_2, and you see them listed there as a column vector — two multivariable functions. If I have two functions and each has two variables, then taking partial derivatives will of course give me four expressions, and that's what we call the Jacobian. I suppose we could refer to the row vector in (4) as a Jacobian as well, but usually it's once we get to 2-by-2 and larger that we speak of the Jacobian; we won't be strict about that.

What we have on the first row of our 2-by-2 matrix in (6) is the first function, f_1, taken with respect to x_1 and x_2, and on the second row we have the second function. So just as they appear in your vector-valued function up here — function 1 on the first row, function 2 on the second row — you keep them in the same order and take all the partial derivatives. You end up with a 2-by-2 matrix, and that is the Jacobian. It is simply the matrix of all the partial derivatives, and we would write (6) as J_f(x_1, x_2); there are various notations, but that is one way to write it.

If we want to use SymPy to do this for us, I call sym.Matrix and pass my two functions as a list, separated by a comma. If we print that to the screen, we see, just as in our original problem, the two multivariable functions. Next I have to tell SymPy which variables I'll be differentiating with respect to, so I set up a computer variable called vars, again using the Matrix function and passing x_1 and x_2 as a list. Once we've run that and we have vars, we can call the jacobian method: f.jacobian, passing the vars variable that holds the matrix of x_1 and x_2, so it knows what to take the partial derivatives with respect to. And there we see our Jacobian — a 2-by-2 matrix, just as we would expect.
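A sketch of that workflow, using two stand-in component functions (the exact f_1 and f_2 from the video's cell (5) aren't visible in this transcript):

```python
import sympy as sym

x1, x2 = sym.symbols('x_1 x_2')

# two stand-in component functions f_1 and f_2 of the vector-valued function
F = sym.Matrix([2*x1*x2**2 + 3*x1 + 2*x2,
                x1**2 + sym.sin(x2)])

vars = sym.Matrix([x1, x2])   # the variables to differentiate with respect to

F.jacobian(vars)              # the 2x2 matrix of all four partial derivatives
```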
So now we're going to go on and look at how to use the Jacobian when we deal with neural networks. Here we have a simple neural network; you can see there are two deep layers. We have our input x, two hidden layers with two nodes each, and then over there our output node. If we look at the first hidden layer, we can represent it as a column vector with a_1 and a_2, which is exactly what we do in (7). So how do we calculate the values in a_1 and a_2?

We set up a nice little matrix-times-scalar-plus-vector notation. We take a matrix W — and as you can see here, we actually use its transpose, W^T — multiply it by x, which in our case, remember, is a scalar, a very simple input, and then we add an appropriately dimensioned column vector. That's what we see in (8), just to keep the dimensionality correct. W^T is 2-by-1, two rows and one column, so technically it's a column vector, but let's keep calling it a matrix. It holds the weights, the values that we want updated during backpropagation in our neural network. We multiply that by the scalar (which I'll treat as a 1-by-1 matrix), add a column vector c, also two rows and one column, and the result is a 2-by-1, a two-row, one-column vector, as we see in (7). You can spend a little time with (8) just to make sure you understand what we're doing here.

As I mentioned, in (9) we see x — you can view it as a 1-by-1 matrix, but it is just a scalar. For W, what we usually do is use a superscript inside parentheses, where the superscript denotes that this W belongs to our first hidden layer: it only pertains to multiplying x with something and adding a bias to get the values a_1 and a_2. So W^(1) has only two elements, w_1 and w_2, again with the superscript for the layer we're working with, so it has dimension 1-by-2, one row and two columns. But we take its transpose, so it becomes two rows and one column. Then c must also be a column vector, two rows and one column, and remember it comes from a bias node, so c_1 = c_2 — both entries have the same value.

This is what a looks like, in (12). Remember it's a column vector, so it holds a_1 and a_2, and both of them are functions of x. We take our weight matrix, take its transpose, multiply by x, and add our bias vector, and that gives the two-row, one-column result: w_1 x + c_1 and w_2 x + c_2. Those are the values of a_1 and a_2, row one and row two, and that is our vector a. And this is what a neural network is all about, at least a densely connected one: we need to find the best values for W and for c, the weights and the bias values.

Now, if we take this a and differentiate it with respect to x — and again we'll call the result a Jacobian, J^(1) with a superscript one — we don't even need partial derivatives here; we just take the derivative of both functions with respect to x. All we're left with is w_1 and w_2, because x appears as a plain scalar, not raised to any power, and c is just a constant. So that is our first Jacobian, the Jacobian of our first hidden layer.
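As a small sketch of that first layer in SymPy (the symbol names here are just my shorthand for the w's and c's above):

```python
import sympy as sym

x, w1, w2, c1 = sym.symbols('x w_1 w_2 c_1')

# first hidden layer: a = (W^(1))^T x + c^(1), with equal bias entries
a = sym.Matrix([w1*x + c1,
                w2*x + c1])

J1 = a.jacobian(sym.Matrix([x]))   # 2x1 column vector: Matrix([[w_1], [w_2]])
```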
If we go to the second hidden layer, b: the vector b is also a column vector with two rows, but it's a bit more complex. The values b_1 and b_2 are now functions of a_1 and a_2, which in their own right are functions of x, so it's getting more complicated. If we go back up to the picture — always worth doing with these densely connected networks — we can see four lines connecting the first and second hidden layers. So our weight matrix has to have four values, one weight for each connection: a 2-by-2 matrix, which we see in (14) as W with a superscript two, because this is the weight matrix for our second deep layer. Its entries are w_11, w_12, w_21 and w_22 — row one column one, row one column two, row two column one, row two column two — and if we take the transpose, those subscripts just swap around. c^(2), the bias vector for the second hidden layer, again has entries c_1 and c_2 with a superscript two, and remember the two values are again the same.

So our vector b contains b_1 and b_2, both functions of a_1 and a_2. Now we can get to the Jacobian we spoke about earlier, because look: b is a 2-by-1 column vector. If I take the 2-by-2 weight matrix, multiply it by my column vector of a_1 and a_2, and add the bias vector, I get b, a two-row, one-column vector. Its first entry is w_11 a_1 + w_21 a_2 + c_1 (all with superscript two). That is really just the multiplication of a 2-by-2 matrix with a column vector, and you can work it out on a piece of paper — you'll come out exactly at what we have for b.

Now we can really take partial derivatives. I have two functions — b_1 on top and b_2 at the bottom — and they both have two variables, which in this instance are a_1 and a_2. So I really have what we termed the Jacobian before, the 2-by-2: all the partial derivatives, with b_1 on the first row and b_2 on the second. If we take them, we end up right back where we started, because a_1 and a_2 are not raised to any power and c is just a constant. We end up with what we'll call J^(2) — not the original weight matrix W, but its transpose, as you can see there.
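And a corresponding sketch for the second layer (again with my own symbol names, not the video's cells):

```python
import sympy as sym

a1, a2 = sym.symbols('a_1 a_2')
w11, w12, w21, w22, c1 = sym.symbols('w_11 w_12 w_21 w_22 c_1')

# second hidden layer: b = (W^(2))^T a + c^(2), again with equal bias entries
W2 = sym.Matrix([[w11, w12],
                 [w21, w22]])
b = W2.T * sym.Matrix([a1, a2]) + sym.Matrix([c1, c1])

J2 = b.jacobian(sym.Matrix([a1, a2]))   # works out to W2.T, a 2x2 matrix
```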
Then, for our output, which we've called o: it's just a scalar in the end, so we'll call it a 1-by-1 matrix. What we need here is a weight matrix we'll call W^(3), with a superscript three, because technically that is a third layer for us, and it holds two weight values; the bias term here is just a single scalar, so we can also call it a 1-by-1 matrix. So o becomes very easy: it's W^(3) times b plus c^(3). Now, W could be all ones and c could be zero, I suppose — we could have that — but irrespective of what our weight values and bias term are, in (21), in the second line, we see what the output is.

If we take this output with respect to b — our vector b — that's just o with respect to b_1 and o with respect to b_2, those two partial derivatives, and what I'm left with is, again, the weight matrix, which is a row vector. I'll call that my third Jacobian, J^(3).

What we want in the end is to see how the output changes depending on the input, so what we're really asking for is do/dx. If you look right at the bottom, that is where we eventually end up: do/dx, how the output changes with respect to the input. All I do is multiply my three Jacobians, and because what we're asking for in the end is just a scalar, look at the dimensions: a 1-by-2 times a 2-by-2 times a 2-by-1 gives a 1-by-1. We can certainly multiply those three matrices with each other, and that's how we use the Jacobian when it comes to neural networks.
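Putting the chain together, here is a minimal sketch using the closed forms derived above (v_1 and v_2 are my own names for the third-layer weights, which the video writes as w with a superscript three):

```python
import sympy as sym

w1, w2 = sym.symbols('w_1 w_2')                           # layer-1 weights
w11, w12, w21, w22 = sym.symbols('w_11 w_12 w_21 w_22')   # layer-2 weights
v1, v2 = sym.symbols('v_1 v_2')                           # output-layer weights, W^(3)

J1 = sym.Matrix([[w1], [w2]])               # da/dx, 2x1
J2 = sym.Matrix([[w11, w21], [w12, w22]])   # db/da = (W^(2))^T, 2x2
J3 = sym.Matrix([[v1, v2]])                 # do/db = W^(3), 1x2

do_dx = J3 * J2 * J1                        # (1x2)(2x2)(2x1) -> a 1x1 matrix
```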