My name is Jean-Clape, and I'm really on a mission to teach deep learning with deep neural networks. I'm a surgeon involved in research using deep neural networks, because I think the time has come for doctors, nurses, healthcare professionals, everyone involved in healthcare, to start learning about deep learning. It's usually a space where we find mathematicians and computer scientists, but to solve problems we must understand that the domain knowledge lies on our side, on the medical specialist's side. We can contribute to this; we can work together with our colleagues in mathematics and computer science to improve the overall aims and what we can achieve with deep learning. So really, I think the time has come for everyone interested in healthcare to get involved in deep learning.

Now, these videos are not only for those on the healthcare professional side; anyone interested in deep learning should also find them helpful. These videos are based on documents that were created and published on RPubs, and the actual RMarkdown (Rmd) RStudio files will be available on GitHub. You can view these on RPubs, as we're going to do here, and you can download the actual files from GitHub; I'll show you the links there. There will be videos on YouTube, I'll talk about it on Twitter, and it will also be disseminated on LinkedIn. So connect with me on LinkedIn, follow me on Twitter, sign up and subscribe on YouTube, view these files on RPubs, and download them from GitHub. I'm really trying to put these everywhere, just to cast this net as far and wide as possible, because I really want to get everyone involved in deep learning. So this is the document.
I'm not going to read from the document with these RPubs files, but I'm going to use it as a baseline to discuss a few topics here on YouTube, really for if you want a bit of extra information, or a bit of explanation, instead of just reading the document.

So let's carry on. We'll move down, and what I'm going to do here is use linear regression as a first step to understanding deep learning. There are a lot of concepts we can take from simple linear regression; it'll teach us those concepts, and we're going to use them later when we develop deep neural networks.

So imagine we are in a scenario where we want to use deep learning. What would such a scenario look like? Well, one of the first ones is actually an example of something that we will call supervised learning. In supervised learning, we have a single variable which we want to predict. Now, I've created a very simple example here. You see I've created a computer variable, or an object, called sales, and I've put these 10 values into this numerical vector here using c(). So I've got these 10 values; they can represent anything. Just imagine they represent sales in units of tons: some medical company selling something that goes into the production of some medication, and they sell so many tons of each product. There were three tons sold, then four tons, then two tons; it doesn't matter what these are, but they represent a variable called sales, and these are all its values. And I might have a lot of other variables associated with each of these that would predict that the sale was going to be three tons or four tons or two tons. So that makes this variable here something we call a target variable. There are other terms for it, but I'm going to use target. So this becomes a target variable, and these values, these 10 values,
I'm going to try and predict them. Now, without looking at all the other variables that might be there, that I might use to predict these, let's just consider a very simple model to predict them, and that is the mean of these 10 values. And we know how to calculate a mean: you sum over all the values, the x sub i. There are 10, so x sub 1 will be the first one, then the second one, then the third one: x sub 1 would be the three, x sub 2 would be the four, x sub 3 would be the two, x sub 4 would be the four. I hope you didn't get those mixed up. Anyway: one, two, three, four, up to ten. That's what the summation sign means, and I'm going to go from 1 to n, and n is 10 here; the sample size is ten. Then I divide by n, how many there are, and that's just the calculation for the arithmetic mean.

So to get this mean of sales, I can just call mean() with sales as the argument, mean(sales), save that in the mean.sales computer variable, or object, and indeed the mean is 4.9. I can suggest that as a very baseline model: no matter what my input variables are, I will always predict 4.9. So I would predict that all these values are 4.9: 4.9, 4.9, 4.9, and so on. I haven't really learned much; I'm just predicting that for every output, given any set of inputs, I will always sell 4.9. Now, of course, I'm making an error here, because if I predicted a sale of 4.9 and the actual sale was three, there's a difference between 4.9 and three. And yes, I sold four, and again there's a difference between 4.9 and four. There's a difference between what my model predicts.
It will always predict 4.9. This is a very simple model, but there's an error every time, and I can check what the differences are by just subtracting this one from that one in each instance. So I can say three minus 4.9, and four minus 4.9, and two minus 4.9. That difference is the error that I'm making, and I can sum over all these errors. But just have a look at one thing: two is less than 4.9, but five and six and nine and twelve are more than 4.9. So I'm going to get some negative numbers and some positive numbers, and lo and behold (I'm just using rounding here, otherwise we're going to get something to the power of negative 15, which is basically zero), if I take all of those subtractions and add them up, I'm going to end up with zero, because that is just what the mean is doing: some values are below, some values are above, and this one sits in the middle. So if I sum up all the differences, I am going to end up with zero. But zero is definitely not the error; I'm not making an error of zero, I'm definitely making an error. So a better way to do this is what we call the sum of squared errors. I'm going to take every value again here and subtract the mean from it as we did before, but I'm going to square every value.
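As a sketch of this idea in code: the video works in R, and not all 10 sales values are read out, so the vector below is an assumption, chosen only so that its mean matches the stated 4.9. A minimal Python equivalent:

```python
# Hypothetical sales vector (assumed; the video does not list all 10 values).
# It is chosen so that the mean matches the 4.9 stated in the video.
sales = [3, 4, 2, 4, 5, 6, 9, 12, 1, 3]

# The baseline model: always predict the mean (R's mean(sales))
mean_sales = sum(sales) / len(sales)
print(mean_sales)  # 4.9

# The raw errors (value minus mean) cancel out: some are negative, some
# positive, and their sum is zero up to floating-point rounding
errors = [x - mean_sales for x in sales]
print(round(sum(errors), 10))  # 0.0
```

This is why summing the raw differences cannot measure how wrong the model is: by construction of the mean, they always cancel.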
Remember, if you square anything, the result is a positive number: minus three times minus three is positive nine. So I'm always going to end up with a positive number, and then I add all these squared values. That's the sum of squared errors. So if I do that, the sum of sales minus 4.9 squared, it's going to do a bit of broadcasting, in other words work element by element: it subtracts 4.9 from each one of these, squares that, so it has ten of those squared differences, then sums all of them, and I get 100.9. So that's closer to the fact that I am making an error here.

There's one problem, though: that error is not in the proper units. Say, for instance, those were how many tons I sold; now I have tons squared. What is a ton squared? What kind of thing is that? So that doesn't work. Problem number two: I only have ten here, but you can imagine if I added more, I'm just adding up all my errors, so the total error is going to be related to how many values I have. So I can't compare the errors from two different scenarios, because this total depends on how many values there are. I might be woefully wrong with something that has just five errors, and be very close with something that has a thousand of these values, a thousand errors; the model with a thousand might be the much more accurate one in the end, even though it has a bigger total error.
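Continuing the sketch with the same assumed sales vector (chosen to be consistent with the mean of 4.9 and the 100.9 mentioned in the video), the sum of squared errors in Python looks like this:

```python
# Hypothetical sales vector (assumed), consistent with the video's mean of 4.9
sales = [3, 4, 2, 4, 5, 6, 9, 12, 1, 3]
mean_sales = sum(sales) / len(sales)

# Sum of squared errors: square each deviation so nothing cancels,
# mirroring R's sum((sales - 4.9)^2)
sse = sum((x - mean_sales) ** 2 for x in sales)
print(round(sse, 1))  # 100.9
```

Every squared term is non-negative, so unlike the raw differences, this total can only be zero if the model is exactly right for every value.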
So we've got to do something, and we solve that problem, as we see down here, by dividing by the sample size. Now, we don't actually divide by the sample size; we divide by n minus 1, and that has to do with the concept of degrees of freedom, which I don't want to get into here. So if I take my sum of squared errors, which is still in the numerator there, and divide by how many there are, I'm stabilizing this thing. It doesn't matter how many errors I have, how many elements I have in my target vector here, my target variable; I'm dividing by how many there are, so now I can compare two different scenarios with each other. And because I had tons squared, remember, I'm just going to take the square root of that. Recall that that is the standard deviation, and of course, without taking the square root, we call it the variance.

Now, that is a different way of looking at variance and standard deviation. In statistics we used to look at them just as measures of dispersion, but we're not looking at them as measures of dispersion here. What we're looking at here is an indicator of how wrong my model is. If my model always predicts the mean, then I can use the standard deviation or the variance (the variance is this without the square root) as a measure of how wrong my model is. The variance is the error in my model, how bad my model is; that is what variance and standard deviation really are here. And if I do that here, I'm dividing the sum of squares by nine (ten minus one is nine), so I have a variance here of about eleven that describes the error in my model.

And this brings home a very important concept about the actual target. There are ten of these, so y1, y2, y3, y4 are these numbers up here.
My target: if I represent each of them as y sub i, each one is going to be the mean plus some error, e sub i; the difference between the two is that error. So I can run through all ten of those, and each will have its individual error, and if I add that error to the mean, I'm going to get the actual y. And this is a really important concept in supervised machine learning: I'm going to have some target variable, and I'm going to have some model, and if I add some error to the model's prediction, I'm going to get to the target. That's machine learning right there, in equation five here.

Now let's learn a few extra terms by improving our model. What I'm going to do is set the random seed, so that if you run this you get the same results. I'm going to create an input variable and an output variable. My input variable is going to be a hundred values with a mean of 10, a standard deviation of 2, and one decimal place, and I'm just going to add some random noise to each of these to get my output variable. It's a contrived creation of an input variable and a target variable; a target is sometimes also called an output. I'm using Plotly just to show you all these values, so I've got my input variable on the x-axis, and I'm predicting what the output variable is going to be. In the end, this output would have been my sales, my 10 sales values up here, and now we are adding an input. So for an input of seven I will have a result of 6.1; for this one, an input of 6.4 will give me an output, a target value, of 7.4. And I'm asking the question: can I use this 6.4, or that seven, or this six, or that 7.6, to
predict what the second number, the output variable, the target variable, is going to be? And you've seen this in linear regression: I can perhaps draw a straight line through here, and that straight line is going to represent my model, so that anywhere on that straight line, given whatever the value is on the x-axis, I can read from the y-axis what the predicted output is going to be. And that is going to be different from the real output, depending on where the line goes. Just imagine (I didn't draw it in here) that the line goes down here, and right about here it's predicting just over, say, 8.1 or 8.2, given a 7.5. Look, I'm on the line with a 7.5 here, so down there, maybe here, it would have been 8.2, but in reality it was 8.8. So my model would be out by 0.6: 8.2 minus 8.8, negative 0.6, would have been my error. And so if I draw a line in here, there will be an error between the line and each of these, and remember, I can square all of those; that's the sum of squared errors. And I can divide that by n minus 1, which would be 99 in this case, and get the variance to describe the error in my model.

But what I'm going to try to do here, my aim, is to predict this line. Remember from school algebra, a straight line has a slope: this will be a positive slope going up; if the line went down, it would be a negative slope. And remember, slope is rise over run: if I draw a line from here to there, it will be the difference in y divided by the difference in x, and that gives me the slope. And when x is zero here, the line is going to cross the y-axis somewhere, and that's the y-intercept. So if I have those two so-called parameters, the y-intercept and the slope, I'll be able to draw this line. Those are the two things I have to learn, and there's the "learn" in deep learning: we're trying to learn what those values
would be, so that when I draw this line here, I minimize the error: the error between what the predicted value is going to be on this line, my model, and the true value. That is the aim: to learn the two best values for that slope and that intercept, to give me the best line, to minimize my error. And again, I'll just show you another term here: it's called the deviation. That's the observed, the real value, minus what my model suggests, squared, and I sum over all of these; same story. Now, if I were to run this over these 100 sample values, as I did with the sum of squared differences, I get 681, and that really describes the error in my model. And again, just to reiterate: I used the mean of these sales here, the output variable, as my model, so I'm subtracting the mean from each one of these, squaring that, and summing that, so the sum of squared errors. I haven't yet divided it by 99, so this is not the variance; it's just 681. Let's improve on this by blindly introducing a slope and an intercept: a slope of 0.8 and an intercept of 0.1.
So I plug those into that y sub i, and we put a hat on it, because this is now the predicted one: remember, beta 0, which is the 2 here, and beta 1, which is the 0.95. (I should just change what I've written here, because I've changed these values to 0.95 and 2; I'll update the document to 0.95 and 2 when I republish it.) So there we go: 0.95 and 2. I'm plugging those in, as blind as I can, and I do the sum of squared errors and I get 383. That's an improvement on the 681, and I can actually divide those two by each other to get the R squared, which gives me an indication of the improvement in my model. We call this ratio the systematic variance relative to how much variance there was to begin with, the baseline model, the unsystematic variance; those are statistical terms we don't need to care too much about.

Now I can abuse the system a little bit by cheating and using the linear model function in R, lm(). I state that this output variable is dependent on just a single input variable (if there were more, I would just add plus the next one, plus the next one), and that gives me these two values right here, the best ones: an intercept of 1.94 and a slope here of 0.99.
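The R session's random draws can't be reproduced exactly here, so this is a hedged Python sketch of the same workflow. The seed and the assumed relationship (output = 2 + input + noise) are my own choices, made so the fit lands near the intercept of about 1.94 and slope of about 0.998 reported for lm() in the video; the guessed parameters 0.95 and 2, and the idea of comparing against the baseline, come from the discussion above:

```python
import numpy as np

# Any fixed seed for reproducibility; the video's R seed gives different draws
rng = np.random.default_rng(123)

# Contrived data as in the video: 100 inputs, mean 10, sd 2, one decimal place
x = np.round(rng.normal(10, 2, 100), 1)
# Assumed relationship (my assumption): output = 2 + input + noise
y = 2 + x + rng.normal(0, 1, 100)

# Baseline model: always predict the mean (akin to the 681 in the video)
sse_baseline = np.sum((y - y.mean()) ** 2)

# Blindly guessed parameters: intercept 2, slope 0.95 (akin to the 383)
sse_guess = np.sum((y - (2 + 0.95 * x)) ** 2)
r_squared = 1 - sse_guess / sse_baseline  # improvement over the baseline

# Least-squares fit, the equivalent of R's lm(output ~ input)
slope, intercept = np.polyfit(x, y, 1)
sse_fit = np.sum((y - (intercept + slope * x)) ** 2)

print(intercept, slope)  # lands close to the assumed true values, 2 and 1
```

By construction, the least-squares line can never have a larger sum of squared errors than any other straight line, including the flat baseline at the mean and the blindly guessed line.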
That's what these two values under "Estimate" are. So in actual fact, the prediction is going to be 0.9982 times my input variable, plus 1.942; that gives me the predicted y. Those are the best values for beta 0 and beta 1 to take, giving the smallest error in getting to the true value. You'll see there's no hat on this one; I have to add an error, the difference between each prediction and the actual value. And we can just expand this: if I had more than one input variable, also called a feature variable, it would just expand to beta 1 x 1, plus beta 2 x 2, and so on, all the way to however many input variables I had, plus some error, and that's going to give me the true y output. The concept here, really, is to learn the best parameters beta 0 and beta 1, so I can draw the best line here and make the smallest error.

Now, by the way, if you hear the banging outside: as I've mentioned a couple of times, they're building a new neuroscience center right outside my office, and it doesn't matter whether I come in early in the morning before work or leave after work, the banging continues, and it's driving me nuts. So I do these recordings out of hours, but it doesn't matter; you're still going to hear this banging as they build the new center. Please forgive that; I cannot run away from it here in my office at all.

So, a few basic concepts to take away here: there's something we can learn, and one way to learn was to create this sum of squared errors. And in some way, which is going to come in the next section as we continue this look into linear regression to teach us the fundamentals of deep learning, we're going to learn.
We're going to show you how to learn what these parameters are by minimizing something: we're going to minimize the error, and I'm going to show you the concept behind minimizing the error, so I can get the best-fitting line. And that line is just a model: it predicts this target variable, and that's what we're going to do in deep learning. We can shove in a lot of data, and we're going to predict some outcome, and we want that outcome to be as close as possible to the truth. I might give it a lot of images, some with a nodule in the lung that's cancerous and some without (imagine some CT scans), and I want the model to predict, when I give it an X-ray or a CT scan in the end: is this a tumor or is it not a tumor? And I want that to be as accurate as possible. The same concept applies: I'm going to create some error function, and I want to minimize it as this deep learning network that we're going to develop learns from the data. Every time, I'm going to give it data to learn from first, data for which the target variable is known, and it's going to learn these parameters, akin to the beta 0 and beta 1 here, to predict with the best possible accuracy. It's going to change them and change them and change them until it gets the best values of something akin to this beta 0 and beta 1, to give the best prediction of whether, for that CT scan, this really is a tumor or not, so that it makes the least amount of mistakes. That is what we are after, and you can see that these basic concepts that we're showing here in linear regression really are the basis of what we need in order to develop our understanding of deep learning. So that's it for this video tutorial.
I'm going to move on to the next one. I just want to warn you that there are going to be some additional videos on YouTube. Two of the concepts that we're going to come across are linear algebra and calculus: some derivatives, multivariable derivatives. Now, strictly speaking, you do not have to understand linear algebra or derivatives to be able to write code or to create a deep learning network, so don't run away because you don't understand linear algebra or derivatives. I'm going to make two separate extra little videos, one showing the very basic concepts of derivatives in a multivariable system, and one showing the basics of linear algebra, just to remind you of stuff that you might have seen at school, or early on in your university career, and have since forgotten, or that you've never seen before; it doesn't matter. I'm just showing the very basic concepts, to ease the understanding. But again, you don't need to know it: you're going to write a line of code, and the computer is going to do all of this for you. I do think it is nice if you have that understanding, though. If you want a deeper understanding, I've got two massive playlists on YouTube, one on multivariable calculus and one on linear algebra; I think there are over a hundred lectures in each of those two playlists that I've put out there. If you really want to understand linear algebra, if you really want to understand multivariable calculus, you can watch those. But again, I reiterate: it's not necessary. We're going to write a line of code, and the computer is going to do all of this for us. But perhaps do watch those two very basic videos.
It really is going to help you to understand how this deep learning network, this deep neural network that we are going to construct, works at its most basic, just to give you that intuitive understanding of what is happening here. Actually, without knowing it, you already have that understanding, just from this video. I look forward to speaking to you again. As I say, subscribe on YouTube, follow me on Twitter, connect with me on LinkedIn. At least the format that you see here is available on RPubs, and it's available on GitHub; I'll put all the links in the video. It'll all be written up on LinkedIn, and I'll refer to it on Twitter, et cetera. As I said, I want to spread this word as far and wide as possible. I want as many people as possible to get to grips with deep learning, and I want us to start using it in our research, to answer some fundamental questions and solve some problems in healthcare. Let's all come together and make the effort to learn about deep learning.