So the time has finally come for us to have a look at a proper neural network. It's going to be a very basic neural network, but we are finally there. I'm still going to use a document that I've uploaded to RPubs; I'll leave the link down below. You can also download the RStudio file right from GitHub. Remember, if this is the first time you've come across these videos, please start at the beginning, otherwise things are not going to make much sense. Also remember that I'm really after people who are interested in deep learning who are not necessarily computer scientists or mathematicians, but who really want to get involved with deep learning; in my specific case, that means getting healthcare professionals involved in deep learning. We won't go into the code; that will come with time. So let's just get going.

I'm going to build on everything that has gone before, and we're really going to construct this network. It's going to look very familiar if you remember what we discussed when we looked at linear regression and logistic regression. The whole idea behind a deep neural network is very loosely based on the idea of a brain cell, a neuron, and we can see one depicted here. This image comes from Wikimedia Commons, and you can just click on the link there and it'll take you to these wonderful images. The whole idea is that there are many connections. This brain cell, with its nucleus and cell body, has all these connections that bring impulses in from many other brain cells; that impulse travels all the way along the axon and gets transmitted through all these connections to other neurons in the network. Many connections to many other connections: that is what it is all about.

Here we go, and I'm just going to make the screen size a bit smaller so everything can fit on. There we go; now everything fits. You'll still see the input layer on this side, you'll still recognize this hidden layer here, and you might still notice this node here, but things look a bit different. The most noticeable difference from what has come before are these many connections. No longer is one input connected to one other node in a single layer, as we had with logistic regression. Look at this: there are three feature variables here, but there are four nodes here. That number is completely arbitrary. If you design a neural network, you decide how many nodes go here, and that number is something we refer to as a hyperparameter. There are many hyperparameters in the design of a neural network, and if you design that neural network, all those hyperparameter values are up to you. The four nodes here in the hidden layer, in other words, are a hyperparameter: that's your decision, and different values will work differently under different circumstances. But look at this first one.
It receives input from all three of the input nodes, not just a single one, and it also gets input from this node up here, which is called a bias node. So I can create a bias value, and that can also be added in to these nodes. Now, here we have three feature variables. So for the first patient, for instance, or whatever the subject might be, the first row (or any one of the rows) in your dataset has a value for the first variable, a value for the second variable, and a value for the third variable. Each of these is going to be multiplied by some weight. And remember, we're going to move away from calling those parameters beta sub 0 and beta sub 1; we call them weights now. Each line here represents a value, a weight. So this x1 value is going to be multiplied by this weight and then input to that node. The value of feature variable number two is going to be multiplied by some weight value, this line here, and passed along as an input, and the same here. And each one of them will be connected to each of these nodes. So if there are three on this side, each with four connections, that means there are already 12 values here, 12 weights, that we need to optimize. Before, when we only had the single beta sub 0 and beta sub 1 for a single variable, that was all we needed to learn. From just this tiny little section there are already 12 parameters that need to be learned, 12 weight values that we have to optimize for in our loss function and in our cost function. And then we can even add this bias node to all of these values as well.

So all these multiplications get added up, and you can also add this bias value in there. What you are going to get, after all these multiplications and additions, is this value called z1. And that's not where we stop. We're also going to apply some function to it. In logistic regression we looked at the logistic sigmoid function, but there are many other functions, and they are called activation functions. Hence this idea of a neuron: you can see all the dendrites, all the connections coming in, and all the connections going out, and there's some decision as to what flows through. Does an impulse go, or does it not go? That's our activation function.
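To make this concrete, here is a minimal sketch in R of the calculation for a single hidden node, with made-up numbers. The names x, w, and b are my own illustrative choices and don't come from the RPubs document.

    # Minimal sketch: the value z1 for a single hidden node
    x <- c(2.1, 0.4, 1.7)   # the three feature values for one subject (one row)
    w <- c(0.5, -1.2, 0.8)  # one weight per connection into this node
    b <- 0.1                # the bias value

    # Multiply each input by its weight, add them up, then add the bias
    z1 <- sum(w * x) + b
    z1

The weight and bias values here are arbitrary starting points; it's exactly these numbers that training will later optimize.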
Now let's represent things slightly more mathematically. There is a short video just on linear algebra, and remember, I also have a course of almost a hundred videos on linear algebra; I've linked to that as well. In case you've missed all of that and this doesn't make much sense to you, watch that single video. It's not that difficult, though.

If we have a look at it, there are my three input values from my three feature variables, and I'm going to call this a column vector with three rows and a single column, so it's three by one. I put this little underscore here, this line under the symbol; that represents a vector, a rank-one tensor. The solution I want is z1, z2, z3, z4 ("zee" or "zed", depending on where you live), so my output is a four-by-one column vector: one, two, three, four rows times one column. My weights are now not just beta sub 0, beta sub 1, and maybe beta sub 2; they form a whole matrix, a rank-two tensor, of values. Look at this w sub 1,1 and w sub 1,2: if we look at input node one, it has one, two, three, four values coming out of it, and there we have them. Just look at the first row: one, two, three, four values coming out of it. Now, if we do this tensor multiplication, we have a matrix on one side and a vector on the other side, so we have to actually transpose the matrix. At the moment it is a three by four, and I cannot take an inner product, a dot product, of a three-by-four matrix with a three-by-one vector. I've got to transpose it, so the rows become columns and the columns become rows; that changes it from a three by four to a four by three. Now the inner two values are both three, so I can do this inner product, and the result will be the two values on the outside: a four-by-one matrix, or column vector at least, exactly what I wanted. Four by one, four by one, there we see it. And if I wanted to, I could even add this bias node; if I add that, it also has to be a four-by-one column vector. If you look up here, of course: one, two, three, four, it's got to have these four connections, so four by one, and that leaves me with a four-by-one column vector.

Now, to each of these values z1 through z4, I've got to apply an activation function; here we call it g. There are many; you've seen the logistic sigmoid function. One of the most common activation functions, though, is the ReLU function, the rectified linear unit: you see it written as an uppercase R, then a lowercase e, then an uppercase L and U. And this is what it looks like. No matter what value I input, if that value is zero or less, it just outputs a zero. So for all these z's that I calculate through this whole equation 5: if z2 was negative one, the activation function spits out zero; if z2 was negative one million (it's never going to be that, but just imagine), it's also going to output a zero. Either way the output is zero. If the input is more than zero, the output just takes on that exact value; the line goes up at 45 degrees. So an input of 0.62 gives an output of 0.62, and an input of 1.26 gives an output of 1.26. This is called a rectified linear unit. What we do is pass each of these values z1 through z4 through this very easy function, and the output then is this. Then, right at the end, we'll combine these in some way so that there's an output, and that output can just be as-is in a regression problem, or it can go through an activation function itself if we have binary classification, which is all we can do if there's a single output node. But you'll see later that there can even be more than one node here as an output.
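Here is a rough R sketch of that matrix version of the hidden layer, with randomly chosen weights standing in for values that training would eventually find; none of these names come from the course file.

    # Sketch: the hidden layer as a tensor multiplication, z = t(W) %*% x + b
    x <- matrix(c(2.1, 0.4, 1.7), nrow = 3)  # 3 x 1 column vector of inputs

    set.seed(42)
    # 3 x 4 weight matrix: row i holds the four weights leaving input node i
    W <- matrix(runif(12, -1, 1), nrow = 3, ncol = 4)
    b <- matrix(0.1, nrow = 4, ncol = 1)     # 4 x 1 bias, one value per hidden node

    # Transpose: (4 x 3) times (3 x 1) gives 4 x 1, then add the 4 x 1 bias
    z <- t(W) %*% x + b

    # ReLU: zero for any value at or below zero, the value itself otherwise
    relu <- function(z) pmax(z, 0)
    a <- relu(z)
    a

Note that the transpose is what makes the inner dimensions match, just as described above: three by four becomes four by three, and the result is the four-by-one column vector we wanted.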
When there is more than one output node, we'll use a different kind of activation for those last values, something like a softmax function, and we'll get into that a little later. And that's it: there is a single-hidden-layer neural network, and you can see the differences between this and the logistic regression network that we built before. Many more connections.

And now you can see that there's richness built into this, because imagine I had more feature variables, more nodes, and then more of these layers. We'd get so many of these parameters. Remember, in the end we're going to get a value here, a y-hat value, which might be quite different from the actual y, the ground truth y, and we're going to sum up in some way, or average in some way, all of these errors. That gives us a big cost function, which is already a function of many weights: just here there are 12 and then 4, so already 16 connections, and there would be even more with deeper layers. You can see how many parameters there are. They all sit in this massive equation, and we are now really talking about multidimensional space. We have to use backpropagation and gradient descent, and then we'll optimize all of these values, and we'll go through again; hopefully our cost function will show that the error is now less. We do gradient descent through differentiation, partial differentiation with respect to all of these weights, however many there are. We get better values, and we go forward again, and backwards and forwards, until our error is as small as possible and these values all take on an optimum value. As I said, there's really a richness built into this: this algorithm can learn a lot more than a simple logistic regression model can. It can really try to mimic, in some simple way at least, a connection inside of your brain. I look forward to speaking to you again.
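To tie the whole video together, here is a minimal sketch in R of the complete forward pass through this single-hidden-layer network, assuming a logistic sigmoid on a single output node for binary classification. The weights are random stand-ins for the 16 values that backpropagation and gradient descent would actually learn, and all the names are my own.

    # Sketch: one full forward pass, from three inputs to a predicted probability
    relu    <- function(z) pmax(z, 0)
    sigmoid <- function(z) 1 / (1 + exp(-z))

    set.seed(1)
    W1 <- matrix(runif(12, -1, 1), nrow = 3)  # 12 weights: input to hidden layer
    b1 <- matrix(0, nrow = 4)                 # hidden-layer bias values
    w2 <- runif(4, -1, 1)                     # 4 weights: hidden layer to output (16 in total)
    b2 <- 0

    x <- matrix(c(2.1, 0.4, 1.7), nrow = 3)   # one subject's three feature values

    a1    <- relu(t(W1) %*% x + b1)           # hidden-layer activations, 4 x 1
    y_hat <- sigmoid(sum(w2 * a1) + b2)       # the y-hat value: a predicted probability
    y_hat

    # Comparing y-hat with the ground truth y across all the rows gives the cost,
    # which gradient descent drives down by adjusting all 16 weights and the biases.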