Hi everybody, and welcome to lesson three of Practical Deep Learning for Coders. We did a quick survey this week to see how people feel the course is tracking, and over half of you think it's about the right pace. Of the rest, some of you think it's a bit slow and some of you think it's a bit fast, so hopefully that's about the best we can do. Generally speaking, the first two lessons are a little easier in pacing for anybody who's already familiar with the basic technology pieces, and the later lessons get more into the foundations. Today we're going to be talking about things like matrix multiplications, gradients, and calculus. So for those of you who are more mathy and less computer-y, you might find this one more comfortable, and vice versa.

Remember that there is an official course updates thread where you can see all of the up-to-date info about everything you need to know, and of course the course website as well. By the time you watch the video of the lesson, it's pretty likely that if you come across a question or an issue, somebody else will have too, so definitely search the forum and check the FAQ first, and then of course feel free to ask a question yourself on the forum if you can't find your answer.

One thing I did want to point out, which you'll see in the lesson thread and the course website, is that there is also a lesson zero. Lesson zero is based heavily on Radek's book Meta Learning, which in turn is based heavily on all the things that I've said over the years about how to learn fast.ai. We try to make the course full of tidbits about the science of learning itself. It's a different course to probably any other you've taken, and I strongly recommend watching lesson zero as well. The last bit of lesson zero is about how to set up a Linux box from
scratch, which you can happily skip over unless that's of interest, but the rest of it is full of juicy information that I think you'll find useful.

So the basic idea of what to do for a fast.ai lesson is: watch the lecture. I generally recommend watching the video all the way through without stopping once, and then going back and watching it with lots of pauses, running the notebook as you go, because otherwise you're kind of running the notebook without really knowing where it's heading, if that makes sense. As for running the notebook, there are a few notebooks you could go through. Obviously there's the book: go through chapter one and chapter two of the book as notebooks, running every code cell and experimenting with inputs and outputs to try to understand what's going on. Then try to reproduce those results, and then try to repeat the whole thing with a different dataset. If you can do that last step, that's quite a stretch goal, particularly at the start of the course, because there are so many new concepts, but it really shows that you've got it sorted.

Now, for the third bit, reproducing results, I recommend using something you'll find in the fastbook repo, the repository for the book. There is a special folder called clean, and clean contains all of the same chapters of the book but with all of the text removed except the headings, and all the outputs removed. This is a great way to test your understanding of the chapter: before you run each cell, try to say to yourself, "Okay, what's this for, and what's it going to output, if anything?" Working through that slowly is a great way to study.
Any time you're not sure, you can jump back to the version of the notebook with the text to remind yourself, and then head back over to the clean version. So there's an idea for something which a lot of people find really useful for self-study.

I say self-study, but of course, as we've mentioned before, the best kind of study for most people is study done to some extent with others. The research shows that you're more likely to stick with things if you're doing them as a bit of a social activity. The forums are a great place to find and create study groups, and you'll also find on the forums a link to our Discord server, where there are some study groups as well. In-person study groups and virtual study groups are a great way to really make good progress and find other people at a similar level to you. If there's not a study group at your level, in your area, in your time zone, create one: just post something saying, "Hey, let's create a study group."

This week there's been a lot of fantastic activity. I can't show all of it, so what I did was use the summary functionality in the forums to grab all of the things with the highest votes, and I'll just quickly show a few of those. We have a Marvel detector created this week: identify your favorite Marvel character. I love this: a rock-paper-scissors game where you actually use pictures of the rock, paper, and scissors symbols, and apparently the computer always loses. That's my favorite kind of game. There is a lot of Elon around, so it's very handy to have an Elon detector to either find more of him, if that's what you need, or maybe less of him. I thought this one was very interesting. I love these kinds of really interesting ideas where it's like, "Gee, I wonder if this would work." Can you predict the average
temperature of an area based on an aerial photograph? Apparently the answer is yes, actually, you can predict it pretty well. Here in Brisbane it was predicted, I believe, to within one and a half degrees Celsius. I think this student is actually a genuine meteorologist, if I remember correctly. He built a cloud detector. Then, building on top of the "what's your favorite Marvel character" idea, there's now also an "is it a Marvel character" classifier. My daughter loves this one: what dinosaur is this? I'm not as good with dinosaurs as I should be. I feel like there are ten times more dinosaurs than there were when I was a kid, so I never know their names. This is very handy. This is cool: a choose-your-own-adventure where you choose your path using facial expressions. And I think this music genre classification is also really cool. Brian Smith created a Microsoft Power App application that actually runs on a mobile phone. That's pretty cool. I wouldn't be surprised to hear that Brian actually works at Microsoft, so it's also an opportunity to promote his own stuff there. I thought this art movement classifier was interesting in that there's a really interesting discussion on the forum about what it actually shows about similarities between different art movements. And I thought this redaction detector project was really cool as well. There's a whole tweet thread and blog post and everything about this one. A particularly great piece of work.

Okay, so I'm going to quickly show you a couple of little tips before we jump into the mechanics of what's behind a neural network. I was playing a little bit during the week with how you make your neural network more accurate, and so I created this pet detector. This pet detector is not just predicting dogs or cats, but what breed is it?
That's obviously a much more difficult exercise. Now, because I put this out on Hugging Face Spaces, you can download and look at my code: if you just click "Files and versions" on the Space, which you can find a link to on the forum and the course website, you can see all the files here, and you can download them to your own computer. So I'll show you what I've got here.

One thing I'll mention is that today I'm using a different platform. In the past I've shown you Colab and I've shown you Kaggle, and we've also looked at doing stuff on your own computer: not so much training models on your computer, but using the models you've trained to create applications. Paperspace is another website, a bit like Kaggle and Google, but in particular they have a product called Gradient Notebooks which, at least as I speak (and things change all the time, so check the course website), is in my opinion by far the best platform for running this course and for doing experimentation. I'll explain why as we go. So why haven't I been using it for the past two weeks? Because I've been waiting for them to build some stuff for us to make it particularly good, and they just finished. I've been using it all week, and it's totally amazing.

This is what it looks like. You've got a machine running in the cloud, but the thing that's very special about it is that it's a real computer you're using. It's not that kind of weird virtual version of things that Kaggle or Colab has. If you whack on this button down here, you'll get a full version of JupyterLab, or you can switch over to a full version of classic Jupyter Notebook. I'm actually going to do stuff in JupyterLab today because it's a pretty good environment for beginners who are not familiar with the terminal, which I know a lot of people in the course are. You can do really everything graphically. There's a file browser.
So here you can see I've got my pets repo. It's got a Git repository thing: you can pull and push to Git. And then you can also open up a terminal, create new notebooks, and so forth. What I tend to do with this is go into full screen, so it's kind of like its own whole IDE. You can see I've got my terminal here, and here's my notebook.

They have free GPUs, and most importantly there are two good features. One is that you can pay, I think it's eight or nine dollars a month, to get better GPUs and basically as many hours as you want. The other is that they have persistent storage. With Colab, if you've played with it, you might have noticed it's annoying: you have to muck around with saving things to Google Drive and so on. On Kaggle there isn't really a way of having a persistent environment. Whereas on Paperspace, whatever you save in your storage is going to be there the next time you come back. I'm going to be adding walkthroughs of all of this functionality, so if you're interested in really taking advantage of this, check those out.

Okay, so I think the main thing that I wanted you to take away from lesson two isn't necessarily all the details of how you use a particular platform to train models and deploy them into applications through JavaScript or online platforms. The key thing I wanted you to understand was the concept. There are really two pieces. There's the training piece, and at the end of the training piece you end up with this model pickle file, right? Once you've got that, that's now a thing where you feed it inputs and it spits out outputs based on the model that you trained. And that happens pretty fast.
You generally don't need a GPU once you've got that trained. And so then there's a separate step, which is deploying.

So I'll show you how I trained my pet classifier. You can see I've got two IPython notebooks. One is app, which is the one that's going to be doing the inference in production, and one is the one where I train the model. This first bit I'm going to skip over because you've seen it before: I create my image data loaders, check that my data looks okay with show_batch, train a ResNet-34, and I get a 7% error rate. So that's pretty good.

But check this out. There's a link here to a notebook I created (actually, most of the work was done by Ross Wightman) where we can try to improve this by finding a better architecture. There are, I think, at the moment over 500 architectures in the PyTorch Image Models library, and we'll be learning over the course what they are and how they differ. But broadly speaking, they're all mathematical functions, which are basically matrix multiplications and these nonlinearities, such as the ReLUs that we'll talk about today. Most of the time those details don't matter. What we care about is three things: how fast are they, how much memory do they use, and how accurate are they?

So what I've done here with Ross is we've grabbed all of the models from PyTorch Image Models, and you can see that it takes very little code to create this plot. Now, my screen resolution is a bit... okay, let's do that. On this plot, on the x-axis we've got seconds per sample, so how fast is it; to the left is better, it's faster. And on the y-axis is how accurate it is, in particular how accurate it was on ImageNet. So generally speaking, you want things that are up towards the top and left.

Now, we've been mainly working with ResNet, and you can see down here, here's ResNet-18. ResNet-18 is a particularly small and fast version for prototyping.
We often use ResNet-34, which is this one here. And you can see that this classic model that's very widely used actually isn't the state of the art anymore. So we can start to look at these ones up here and find some of these better models. The ones that seem to be the most accurate and fast are these LeViT models. I tried them out on my pets and found that they didn't work particularly well, so I thought, okay, let's try something else. Next up I tried these ConvNeXt models, and this one in here was particularly interesting. It's super high accuracy: if you want 0.001 seconds inference time, it's the most accurate. So I tried that.

So how do we try that? The PyTorch Image Models library is in the timm module, so at the very start I imported that. We can call list_models and pass in a glob, a match pattern, and this is going to show all the ConvNeXt models. Here I can find the ones that I just saw, and all I need to do is, when I create the vision learner, put the name of the model in as a string. You'll see that earlier, this one is not a string. That's because it's a model that the fastai library provides. fastai only provides a pretty small number, so if you install timm (you need to pip install timm or conda install timm), you'll get hundreds more, and you put the name in as a string.

If I now train that, the time for these epochs goes from 20 seconds to 27 seconds, so it is a little bit slower, but the error rate goes from 7.2 percent down to 5.5 percent. That's a pretty big relative difference: 7.2 divided by 5.5, yeah, it's about a 30 percent improvement. So that's pretty fantastic. And it's been a few years, honestly, since we've seen anything really beat ResNet that's widely available and usable on regular GPUs. So this is a big step.
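To make the size of that improvement concrete, here is the arithmetic as a small sketch. The two error rates are the ones quoted in the lecture; the variable names are made up for illustration. Note that the two usual ways of expressing the change give different figures: the old error is roughly 1.3 times the new one, while the share of errors removed is a little under a quarter.

```python
resnet_err = 7.2      # error rate (%) quoted for ResNet-34
convnext_err = 5.5    # error rate (%) quoted for the ConvNeXt model

# How many times larger the old error rate is than the new one
ratio = resnet_err / convnext_err

# Fraction of the original errors that the new model removes
relative_reduction = (resnet_err - convnext_err) / resnet_err

print(f"ratio: {ratio:.2f}")
print(f"relative reduction: {relative_reduction:.1%}")
```

Either figure supports the point being made: for a change of architecture with only a modest slowdown, that is a large gain.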
And so there are a few architectures nowadays that really are probably better choices a lot of the time. If you are not sure what to use, try these ConvNeXt architectures. You might wonder what the names are about. Obviously tiny, small, large, etc. is how big the model is, so that determines how much memory it's going to take up and how fast it is. And then these ones here that say in22ft1k have been trained on more data. There are two different ImageNet datasets: there's one that's got a thousand categories of pictures, and there's another one that's got about 22,000 categories of pictures. This is trained on the one with 22,000 categories of pictures, so these are generally going to be more accurate on standard photos of natural objects.

Okay, so from there I exported my model, and that's the end. So now I've trained my model and I'm all done. Other things you could do, obviously, are add more epochs, for example, or add image augmentation. There are various things you can do, but I found this is actually pretty hard to beat by much. If any of you find you can do better, I'd love to hear about it.

So then I turned that into an application. I just did the same thing that we saw last week, which was to load the learner. There is something I did want to show you: the learner, once we load it and call predict, spits out a list of 37 numbers. That's because there are 37 breeds of dog and cat, so these are the probabilities of each of those breeds. What order are they in?
That's an important question. The answer is that fastai always stores this information about categories (a category in this case being a dog or cat breed) in something called the vocab object, and it's inside the data loaders. So we can grab those categories, and that's just a list of strings that tells us the order. If we now zip together the categories and the probabilities, we'll get back a dictionary that tells you the probability of each category, like so. Here's that list of categories, and here's the probability of each one. This was a basset hound, and there you can see: yep, almost certainly a basset hound. From there, just like last week, we can go and create our interface and then launch it. And there we go.

Okay, so what did we just do, really? What is this magic model pickle file? We can take a look at it. It's an object called a learner, and a learner has two main things in it. The first is the list of pre-processing steps that you did to turn your images into things for the model, and that's basically this information here: your data blocks or your image data loaders or whatever. The second thing, most importantly, is the trained model. You can actually grab the trained model by just grabbing the .model attribute. I'm going to call that m, and then if I type m, I can look at the model. And here it is: lots of stuff.

So what is this stuff? Well, we'll learn about it all over time, but basically what you'll find is that it contains lots of layers, because this is a deep learning model, and you can see it's kind of like a tree. That's because lots of the layers themselves consist of layers. There's a whole layer called the TimmBody, which is most of it, and then right at the end there's a second layer called Sequential. The TimmBody contains something called model, which in turn contains something called stem and something called stages, and then stages contains zero, one, two, etc. So what is all this stuff?
Well, let's take a look at one of them. To do that, there's a really convenient method in PyTorch called get_submodule, where we can pass in a dotted string navigating through this hierarchy. So 0.model.stem.1 goes zero, model, stem, one, and this is going to return this LayerNorm2d thing. So what is this LayerNorm2d thing? Well, the key thing is that it's got some code, which is the mathematical function that we talked about, and the other thing that we learned about is that it has parameters. We can list its parameters and look at them: it's just lots and lots and lots of numbers. Let's grab another example. We could have a look at 0.model.stages.0.blocks.1.mlp.fc1 and its parameters. Another big bunch of numbers. So what's going on here? What are these numbers, where on earth did they come from, and how come these numbers can figure out whether something is a basset hound or not?

Okay, so to answer that question, we're going to have a look at a Kaggle notebook, "How does a neural network really work?" I've got a local version of it here which I'm going to take you through. The basic idea is that machine learning models are things that fit functions to data. We start out with a very, very flexible function (in fact an infinitely flexible one, as we've discussed), a neural network, and we get it to do a particular thing, which is to recognize the patterns in the data examples we give it.

So let's do a much simpler example than a neural network. Let's do a quadratic. Let's create a function f, which is 3x squared plus 2x plus 1. So it's a quadratic with coefficients 3, 2, and 1. We can plot that function f and give it a title. If you haven't seen this before, things between dollar signs are what's called LaTeX, which is basically how we can typeset mathematical equations. Okay, so let's run that. Here you can see the function, and here you can see the title.
That's the title I passed in, and here is our quadratic. Okay, so what we're going to do is imagine that we don't know that's the true mathematical function we're trying to find. It's obviously much simpler than the function that figures out whether an image is a basset hound or not, but we're just going to start super simple. So this is the real function, and we're going to try to recreate it from some data.

Now, it's going to be very helpful if we have an easier way of creating different quadratics, so I've defined a general form of a quadratic here, with coefficients a, b, and c: at some particular point x, it's going to be ax squared plus bx plus c. Let's test that. Okay, so that's for x equals 1.5: that's 3x squared plus 2x plus 1, which is the quadratic we did before.

Now, we're going to want to create lots of different quadratics to test them out and find out which one's best. This is a somewhat advanced but very, very helpful feature of Python that's worth learning if you're not familiar with it, and it's used in a lot of programming languages. It's called partial application of a function. Basically,
I want this exact function, but I want to fix the values of a, b, and c to pick a particular quadratic. The way you fix the values of a function's arguments is you call this thing in Python called partial: you pass in the function, and then you pass in the values that you want to fix. So for example, if I now say mk_quad(3, 2, 1), that's going to create a quadratic with coefficients 3, 2, and 1. That's now f, and you can see that if I pass in 1.5, I get the exact same value I did before. Okay, so we've now got the ability to create any quadratic we want by passing in its coefficients, and that gives us a function that we can then call just like any normal function. It only needs one thing now, which is a value of x, because the other three, a, b, and c, are now fixed. So if we plot that function, we'll get exactly the same shape, because it has the same coefficients.

Okay, so now I'm going to show an example of some data that matches the shape of this function. In real life, data is never going to exactly match the shape of a function; it's going to have some noise. So here are a couple of functions to add some noise. You can see I've still got the basic functional form here, but this data is a bit dotted around it. The level to which you look at how I implemented these is entirely up to you. It's not super necessary, but it's all the kind of stuff we use quite a lot. This is how to create normally distributed random numbers. This is how we set the seed so that each time I run this, I get the same random numbers. This one is actually particularly helpful: it creates a tensor, in this case a vector, that goes from negative 2 to 2 in equal steps, and there's 20 of them.
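As a rough sketch of the pieces described so far, the following uses only the Python standard library rather than the PyTorch and NumPy calls shown on screen. The names quad, mk_quad, and f are assumed to mirror the lecture's functions; linspace and add_noise are hypothetical stand-ins written for this example, and the noise scale and seed are arbitrary choices, so the data will not match the lecture's plot.

```python
import random
from functools import partial

def quad(a, b, c, x):
    """General quadratic a*x^2 + b*x + c evaluated at x."""
    return a * x**2 + b * x + c

def mk_quad(a, b, c):
    """Fix a, b, c by partial application; the result only needs x."""
    return partial(quad, a, b, c)

def linspace(start, stop, steps):
    """Evenly spaced values from start to stop inclusive (plain-Python stand-in)."""
    step = (stop - start) / (steps - 1)
    return [start + i * step for i in range(steps)]

def add_noise(values, scale, rng):
    """Add normally distributed noise to each value (simplified: additive only)."""
    return [v + rng.gauss(0.0, scale) for v in values]

f = mk_quad(3, 2, 1)
print(f(1.5))            # same as quad(3, 2, 1, 1.5)

rng = random.Random(42)  # fixed seed so every run produces the same noise
xs = linspace(-2, 2, 20)
ys = add_noise([f(x) for x in xs], scale=1.5, rng=rng)
print(len(xs), len(ys))
```

The design point is that mk_quad returns an ordinary callable, so anything that expects a function of x alone (a plotting helper, a loss calculation) can use it without knowing about a, b, and c.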
That's why there are 20 steps along here. Then my y values are just f of x with this amount of noise added. As I say, the details of that don't matter too much. The main thing to know is that we've got some random data now, and the idea is that we're going to try to reconstruct the original quadratic: find the one which matches this data.

So how would we do that? Well, what we can do is create a function called plot_quad that first of all plots our data as a scatter plot, and then plots a function, which is a quadratic we pass in. Now, there's a very helpful thing for experimenting in Jupyter notebooks, which is the @interact function. If you add it on top of a function, it gives you these nice little sliders. So here's an example of a quadratic with coefficients 1.5, 1.5, 1.5, and it doesn't fit particularly well.

How would we try to make this fit better? Well, I think what I'd do is take the first slider and try moving it to the left to see if it looks better or worse. That looks worse to me. I think it needs to be more curvy, so let's try the other way. Yeah, that doesn't look bad. Let's do the same thing for the next slider. How about this way? No, I think that's worse. Let's try the other way. Okay, final slider. Try this way. No, it's worse this way. So you can see what we can do.
We can basically pick each of the coefficients one at a time: try increasing it a little bit and see if that improves it, try decreasing it a little bit and see if that improves it, find the direction that improves it, and then slide it in that direction a little bit. Then when we're done, we can go back to the first one and see if we can make it any better. Now we've done that, and actually you can see that's not bad, because I know the answer is meant to be 3, 2, 1, so they're pretty close. And I wasn't cheating, I promise.

That's basically what we're going to do. That's basically how those parameters are created. But we obviously don't have time, because big fancy models often have hundreds of millions of parameters, and we don't have time to try a hundred million sliders. So we need something better.

Well, the first step is that we need a better idea of, when I move a slider, is it getting better or is it getting worse? If you remember back to Arthur Samuel's description of machine learning that we learned about in chapter one of the book and in lesson one, we need something we can measure, which is a number that tells us how good our model is. If we had that, then as we move these sliders we could check whether it's getting better or worse. This is called a loss function.

There are lots of different loss functions you can pick, but perhaps the most simple and common is mean squared error. It's going to get our predictions and it's got the actuals, and we're going to take predictions minus actuals, squared, and take the mean. So that's mean squared error. If I now rerun the exact same thing I had before, this time I'm going to calculate the loss, the MSE, between the values that we predict, f of x (remember, f is the quadratic we created), and the actuals, y. And this time I'm going to add a title to our plot, which is the loss. So now let's do this more rigorously. We're starting at a mean squared error of 11.46.
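Mean squared error as just described can be written in a few lines. This is a minimal plain-Python sketch; the lecture's version operates on PyTorch tensors, and the sample numbers below are made up purely to show the calculation.

```python
def mse(preds, acts):
    """Mean squared error: the average of (prediction - actual) squared."""
    return sum((p - a) ** 2 for p, a in zip(preds, acts)) / len(preds)

preds = [1.0, 2.0, 3.0]   # made-up predictions
acts = [1.0, 4.0, 0.0]    # made-up actual values

# Squared differences are 0, 4, and 9, so the mean is 13 / 3
print(mse(preds, acts))
```

A loss of zero means the predictions match the actuals exactly; squaring makes every error count as positive and penalizes large misses more than small ones.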
So let's try moving this to the left and see if it gets better. No, worse, so move it to the right. All right, somewhere around there. Okay, now let's try this one. Okay, best when I go to the right. What about c? 3.91. Getting worse. So, keep going. Sorry about that. And so now we can repeat that process, right? We've had each of a, b, and c move a little bit, so let's go back to a. Can I get any better than 3.28? Let's try moving left. Yeah, that was a bit better. And for b, let's try moving left: worse. Right was better. And finally c: move to the right. Oh, definitely better. There we go. Okay, so that's a more rigorous approach. It's still manual, but at least we don't have to rely on ourselves to recognize whether it looks better or worse.

So finally, we're going to automate this. The key thing we need to know is, for each parameter, when we move it up does the loss get better, or when we move it down does the loss get better? One approach would be to try it, right? We could manually increase the parameter a bit and see if the loss improves, and vice versa. But there's a much faster way, and the much faster way is to calculate its derivative. If you've forgotten what a derivative is, no problem. There are lots of tutorials out there; you could go to Khan Academy or something like that. But in short, the derivative is what I just said: it's a function that tells you, if you increase the input, does the output increase or decrease, and by how much? That's called the slope or the gradient.

Now, the good news is PyTorch can automatically calculate that for you. So if you went through horrifying months of learning derivative rules in year 11 and are worried you're going to have to remember them all again, don't worry, you don't. You don't have to calculate any of this yourself. It's all done for you. Watch this.

The first thing to do is that we need a function that takes the coefficients of the quadratic, a, b, and c, as inputs. I put them all in a list.
You'll see why in a moment. I'm going to call them parameters. We create a quadratic, passing in those parameters a, b, and c. This star on the front is a very, very common thing in Python. Basically, it takes these parameters and spreads them out to turn them into a, b, and c, and passes each of them to the function. So we've now got a quadratic with those coefficients, and then we return the mean squared error of our predictions against our actuals. So this is a function that's going to take the coefficients of a quadratic and return the loss.

Let's try it. Okay, so if we start with a, b, and c of 1.5, we get a mean squared error of 11.46. It looks a bit weird: it says it's a tensor. Don't worry about that too much. In short, in PyTorch everything is a tensor. A tensor just means that it doesn't just work with numbers; it also works with lists or vectors of numbers, which are called 1D tensors; rectangles of numbers, so tables of numbers, which are called 2D tensors; layers of tables of numbers, which are called 3D tensors; and so forth. In this case, this is a single number, but it's still a tensor. That means it's just wrapped up in the PyTorch machinery that allows it to do things like calculate derivatives, but it's still just the number 11.46.

All right, so what I'm going to do is create my parameters a, b, and c, and I'm going to put them all in a single 1D tensor. A 1D tensor is also known as a rank-1 tensor. So this is a rank-1 tensor, and it contains the list of numbers 1.5, 1.5, 1.5. Then I'm going to tell PyTorch that I want it to calculate the gradient for these numbers whenever we use them in a calculation, and the way we do that is we just say requires_grad_. So here is our tensor. It contains 1.5 three times, and it also tells us we've flagged it to say, please calculate gradients for this particular tensor when we use it in calculations. So let's now use it in a calculation. We're going to pass it to that quad_mse. That's the function.
We just created it: it gets the MSE, the mean squared error, for a set of coefficients. Not surprisingly, it's the same number we saw before, 11.46. Okay, not very exciting. But there is one thing that's very exciting: it has added an extra thing to the end called grad_fn, and this is the thing that tells us that, if we wanted to, PyTorch knows how to calculate the gradients for our inputs. To tell PyTorch to please go ahead and do that calculation, you call backward on the result of your loss function. Now, when I run it, it looks like nothing happens, but what does happen is that it has just added an attribute called grad, which is the gradient, to our inputs abc.

So if I run this cell, this tells me that if I increase a, the loss will go down. If I increase b, the loss will go down a bit less. And if I increase c, the loss will go down. Now, we want the loss to go down, right? So that means we should increase a, b, and c. Well, by how much? Given that this says if you increase a even a little bit, the loss improves a lot, that suggests we're a long way away from the right answer, so we should probably increase this one a lot, this one the second most, and this one the third most. So this is saying that when I increase this parameter, the loss decreases. In other words, we want to adjust our parameters a, b, and c by the negative of these: we want to increase, increase, increase.

We can do that by saying, okay, let's take our abc and say minus-equals the gradient, which means abc equals abc minus the gradient. But we're just going to shrink it a bit. We don't want to jump too far.
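One way to sanity-check what the gradient is saying, without PyTorch, is to nudge each parameter slightly and watch how the loss changes. The sketch below does that with a central difference. It is an illustration under stated assumptions, not the lecture's code: the data is generated from the true quadratic with no noise, so the loss and gradient values will differ from the numbers quoted in the lecture; quad_mse here takes a plain Python list; and numeric_grad is a hypothetical helper written for this example.

```python
def quad(a, b, c, x):
    return a * x**2 + b * x + c

def mse(preds, acts):
    return sum((p - a) ** 2 for p, a in zip(preds, acts)) / len(preds)

# 20 evenly spaced points from -2 to 2; targets come from 3x^2 + 2x + 1 (no noise)
xs = [-2 + 4 * i / 19 for i in range(20)]
ys = [quad(3, 2, 1, x) for x in xs]

def quad_mse(params):
    """Loss for the quadratic whose coefficients are params = [a, b, c]."""
    # The star spreads the list out into the a, b, c arguments
    return mse([quad(*params, x) for x in xs], ys)

def numeric_grad(loss_fn, params, eps=1e-6):
    """Central-difference estimate of d(loss)/d(param) for each parameter."""
    grads = []
    for i in range(len(params)):
        up = params[:i] + [params[i] + eps] + params[i + 1:]
        down = params[:i] + [params[i] - eps] + params[i + 1:]
        grads.append((loss_fn(up) - loss_fn(down)) / (2 * eps))
    return grads

abc = [1.5, 1.5, 1.5]
grads = numeric_grad(quad_mse, abc)
print(grads)   # every entry is negative: increasing a, b, or c lowers the loss

# Step against the gradient, scaled by a small number so we don't jump too far
step_size = 0.01
new_abc = [p - step_size * g for p, g in zip(abc, grads)]
print(quad_mse(abc), quad_mse(new_abc))
```

Nudging every parameter separately costs two loss evaluations per parameter, which is why this approach does not scale to millions of parameters and why automatic differentiation is used instead.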
Okay, so we're just going to go a small distance, so we're going to somewhat arbitrarily pick 0.01. So that is now going to create a new set of parameters, which are going to be a little bit bigger than before, because we subtracted negative numbers. And we can now calculate the loss again. So remember, before it was 11.46, so hopefully it's going to get better. Yes, it did: 10.11. There's one extra line of code which we didn't mention, which is with torch.no_grad(). Remember, earlier on we said that the parameter abc requires grad, and that means PyTorch will automatically calculate its derivative when it's used in a function. Here it's being used in a function, but we don't want the derivative of this. This is not our loss, right? This is us updating the parameters. So this is basically the standard inner part of the PyTorch loop, and every neural net, deep learning, pretty much every machine learning model, at least of this style, that you build basically looks like this. If you look deep inside the fastai source code, you'll see something that basically looks like this. So we could automate that, right? So let's just take those steps, which is, let's go back to here, we're going to calculate the mean squared error for our quadratic, call backward, and then subtract the gradient times a small number from the parameters. Let's do it five times. So far we're up to a loss of 10.1. So we're going to calculate our loss, call .backward() to calculate the gradients, and then, with no_grad, subtract the gradients times a small number and print how we're going. And there we go, the loss keeps improving. So we now have some coefficients, and there they are: 3.2, 1.9, 2.0. So they're definitely heading in the right direction. So that's basically how we do what's called optimization. Okay, so you'll hear a lot in deep learning about optimizers.
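The loop just described can be sketched roughly as follows. Again this is a sketch under assumptions rather than the lesson's notebook: the data is a made-up noiseless quadratic, and the call that resets the gradient after each step is an addition here, so that each step uses only the freshly computed gradient.

```python
import torch

# Assumed stand-in data: noiseless samples of 3x^2 + 2x + 1.
x = torch.linspace(-2, 2, 20)
y = 3 * x**2 + 2 * x + 1

def quad_mse(params):
    """Mean squared error of the quadratic with coefficients params."""
    a, b, c = params
    preds = a * x**2 + b * x + c
    return ((preds - y) ** 2).mean()

abc = torch.tensor([1.5, 1.5, 1.5], requires_grad=True)
lr = 0.01  # the "small number" from the text

losses = []
for step in range(5):
    loss = quad_mse(abc)
    loss.backward()            # compute the gradient of the loss w.r.t. abc
    with torch.no_grad():      # the update itself must not be tracked
        abc -= abc.grad * lr
    abc.grad.zero_()           # assumption: reset so gradients don't accumulate
    losses.append(loss.item())
    print(f"step={step}; loss={loss.item():.2f}")
```

Each printed loss should be lower than the one before it, which is the "loss keeps improving" behaviour described above.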
This is the most basic kind of optimizer, but they're all built on this principle. Of course, it's called gradient descent, and you can see why it's called gradient descent: we calculate the gradients and then do a descent, which is, we're trying to decrease the loss. So believe it or not, that's the entire foundation of how we create those parameters. So we need one more piece, which is: what is the mathematical function that we're finding parameters for? We can't just use quadratics, right? Because it's pretty unlikely that the relationship between parameters and whether a pixel is part of a basset hound is a quadratic. It's going to be something much more complicated. No problem. It turns out that we can create an infinitely flexible function from this one tiny thing. This is called a rectified linear unit. The first piece I'm sure you will recognize: it's a linear function. We've got an output y, our input x, and coefficients m and b. This is even simpler than our quadratic, and this is a line. And torch.clip is a function that takes that output y and, if it's less than that number, turns it into that number. So in other words, this is going to take anything that's negative and make it zero. So this function is going to do two things: calculate the output of a line, and if it is smaller than zero, make it zero. So that's rectified linear. So let's use partial to take that function and set the m and b to 1 and 1. So this is now going to be this function here, which will be y equals x plus 1 followed by this torch.clip, and here's the shape. Okay, as we'd expect, it's a line until it gets under zero, when it becomes... well, it's still a line.
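A minimal sketch of the rectified linear function as described, using functools.partial to fix m and b at 1 and 1. The function name rectified_linear, the name relu_1_1 and the sample inputs are assumptions for illustration.

```python
from functools import partial
import torch

def rectified_linear(m, b, x):
    """A line y = m*x + b, with everything below zero clipped up to zero."""
    y = m * x + b
    return torch.clip(y, 0.)

# Fix m=1 and b=1, leaving a function of x only: clip(x + 1, 0).
relu_1_1 = partial(rectified_linear, 1, 1)

xs = torch.tensor([-3., -1., 0., 2.])
print(relu_1_1(xs))  # tensor([0., 0., 1., 3.])
```

The first two inputs give a line value at or below zero, so they come out as zero; the rest follow the line x + 1.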
It becomes a horizontal line. So we can now do the same thing: we can take this plot function and make it interactive using interact, and we can see what happens when we change its two parameters, m and b. So we're now plotting the rectified linear and fixing m and b. So m is the slope, okay, and b is the intercept, or the shift up and down. Okay, so that's how those work. Now, why is this interesting? Well, it's not interesting of itself, but what we could do is take this rectified linear function and create a double ReLU, which adds two rectified linear functions together. So there's some slope m1 and b1, some second slope m2 and b2, and we calculate it at some point x. And so let's take a look at what that function looks like if we plot it. And you can see what happens is we get this downward slope, and then a hook, and then an upward slope. So if I change m1, it's going to change the slope of that first bit, and b1 is going to change its position. Okay, and I'm sure you won't be surprised to hear that m2 changes the slope of the second bit and b2 changes that location. Now, this is interesting. Why? Because we don't just have to do a double ReLU. We could add as many ReLUs together as we want, and if we add as many ReLUs together as we want, then we can have an arbitrarily squiggly function, and with enough ReLUs we can match it as close as we want. Right, so you could imagine something incredibly squiggly, like, I don't know, an audio waveform of me speaking, and if I gave you a hundred million ReLUs to add together, you could almost exactly match that. Now, we want functions that are not just ones that we've plotted in 2D; we want things that can have more than one input. But you can add these together across as many dimensions as you like, and so exactly the same thing will give you a ReLU over surfaces, or a ReLU over 3D, 4D, 5D and so forth. And it's the same idea: with this incredibly simple foundation, you can construct an arbitrarily accurate, precise model. Problem is, you need some
numbers for them. You need parameters. Oh, no problem, we know how to get parameters: we use gradient descent. So believe it or not, we have just derived deep learning. Everything from now on is tweaks to make it faster and make it need less data. You know, this is it. Now, I remember a few years ago when I said something like this in a class, somebody on the forum was like, this reminds me of that thing about how to draw an owl. Jeremy's basically saying, okay, step one, draw two circles; step two, draw the rest of the owl. The thing I find I have a lot of trouble explaining to students is that when it comes to deep learning, there's nothing between these two steps. When you have ReLUs getting added together, and gradient descent to optimize the parameters, and samples of inputs and outputs that you want, the computer draws the owl, right? That's it. So we're going to learn about all these other tweaks, and they're all very important, but when you come down to trying to understand something in deep learning, just try to keep coming back to remind yourself of what it's doing, which is using gradient descent to set some parameters to make a wiggly function, which is basically the addition of lots of rectified linear units or something very similar to that, match your data. Okay, so we've got some questions on the forum. Okay, so a question from Zakiya with six upvotes. So for those of you watching the video, what we do in the lesson is we want to make sure that the questions that you hear answered are the ones that people really care about, so we pick the ones which get the most upvotes. This question is: is there perhaps a way to try out all the different models and automatically find the best performing one?
Yes, absolutely, you can do that. So if we go back to our training script, remember there's this thing called list_models, and that's a list of strings. So you can easily add a for loop around this that basically goes, you know, for architecture in timm.list_models(), and you could do the whole lot, which would be like that, and then you could do that, and away you go. It's going to take a long time for 500 and something models. So generally speaking, I've never done anything like that myself. I would rather look at a picture like this and say, okay, where am I? The vast majority of the time, and this would be, I reckon, the number one mistake of beginners I see, is that they jump to these models from the start of a new project. At the start of a new project, I pretty much only use ResNet18, because I want to spend all of my time trying things out. I'm going to try different data augmentation, I'm going to try different ways of cleaning the data, I'm going to try different external data I can bring in, and so I want to be trying lots of things, and I want to be able to try them as fast as possible, right? So trying better architectures is the very last thing that I do. And what I do is, once I've spent all this time and I've got to the point where I've got my ResNet18, or maybe ResNet34 because it's nearly as fast, I'm like, okay, well, how accurate is it? How fast is it? Do I need it more accurate for what I'm doing? Do I need it faster for what I'm doing? Could I accept some trade-off to make it a bit slower to make it more accurate? And so then I'll have a look and I'll say, okay, well, I kind of need to be somewhere around 0.001 seconds, and so I try a few of these. So that would be how I would think about that. Okay, the next question from the forum is around: how do I know if I have enough data? What are some signs that indicate my problem needs more data? I think it's pretty similar to the architecture question.
So you've got some amount of data. Presumably you've started using all the data that you have access to, you've built your model, you've done your best. Is it good enough? Do you have the accuracy that you need for whatever it is you're doing? You can't know until you've trained the model, but as you've seen, it only takes a few minutes to train a quick model. So my very strong opinion is that the vast majority of projects I see in industry wait far too long before they train their first model. In my opinion, you want to train your first model on day one with whatever CSV files or whatever that you can hack together. And you might be surprised that none of the fancy stuff you're thinking of doing is necessary, because you already have a good enough accuracy for what you need. Or you might find quite the opposite: you might find that, oh my god, we're basically getting no accuracy at all, maybe it's impossible. These are things you want to know at the start, not at the end. We'll learn lots of techniques, both in this part of the course and in part two, about ways to really get the most out of your data. In particular, there's a reasonably recent technique called semi-supervised learning, which actually lets you get dramatically more out of your data. And we've also started talking already about data augmentation, which is a classic technique you can use. So generally speaking, it depends: how expensive is it going to be to get more data? But also, what do you mean when you say get more data? Do you mean more labeled data?
Often it's easy to get lots of inputs and hard to get lots of outputs. For example, in medical imaging, where I've spent a lot of time, it's generally super easy to jump into the radiology archive and grab more CT scans, but it might be very difficult and expensive to draw segmentation masks and pixel boundaries and so forth on them. So often you can get more, in this case, images or text or whatever, and maybe it's harder to get labels. And again, there's a lot of stuff you can do, using things like semi-supervised learning, which we'll discuss, to actually take advantage of unlabeled data as well. Okay, final question here: in the quadratic example where we calculated the initial derivatives for a, b and c, we got values of minus 10.8, minus 2.4, etc. What unit are these expressed in? Why don't we adjust our parameters by these values themselves? So I guess the question here is, why are we multiplying it by a small number, which in this case is 0.01? Okay, let's take those two parts of the question. What's the unit here? The unit is: for each increase in a of one, how much does the loss change? So in this case we have 1.5, so if we increase a from 1.5 to 2.5, what would happen to the loss? And the answer is it would go down by 10.9887. Now, that's not exactly right, because it's kind of like in an infinitely small space, right? Because actually it's going to be curved. But that's assuming it stayed at that slope.
If it did, that's what would happen. So if we increased b by one, the loss would decrease, if the slope stayed the same, by 2.122. Okay, so why would we not just change it directly by these numbers? Well, the reason is that if we have some function that we're fitting, there's some kind of interesting theory that says that once you get close enough to the optimal value, all functions look like quadratics anyway. All right, so we can kind of safely draw it in this kind of shape, because this is what they end up looking like if you get close enough. And let's say we're way out over here, okay, so we're measuring (I used my daughter's favorite pens, the sparkly ones) the slope here. There's a very steep slope. All right, so that seems to suggest we should jump a really long way. So we jump a really long way, and what happened? Well, we jumped way too far, and the reason is that the slope decreased as we moved along. And so that's generally what's going to happen, right? Particularly as you approach the optimum, generally the slope is going to decrease. So that's why we multiply the gradient by a small number. And that small number is a very, very, very important number. It has a special name: it's called the learning rate. And this is an example of a hyperparameter. It's not a parameter, it's not one of the actual coefficients of your function, but it's a parameter you use to calculate the parameters. Pretty meta, right? It's a hyperparameter, and so it's something you have to pick.
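To make the "jumped way too far" picture concrete, here is a small sketch on the simplest possible bowl-shaped function, f(x) = x squared, whose slope at x is 2x. The function, the starting point and the two learning rates are assumptions chosen for illustration; they are not from the lesson.

```python
def descend(lr, steps=5, start=4.0):
    """Gradient descent on f(x) = x**2; returns every x visited."""
    x = start
    path = [x]
    for _ in range(steps):
        grad = 2 * x          # slope of x**2 at the current point
        x = x - lr * grad     # step against the slope, scaled by lr
        path.append(x)
    return path

print(descend(lr=1.0))  # [4.0, -4.0, 4.0, -4.0, 4.0, -4.0]
print(descend(lr=0.1))  # each value is 0.8 times the previous one
```

Stepping by the full gradient lands on the far side of the bowl every time and never gets closer to the minimum at zero, while the scaled-down step moves steadily toward it.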
Now, we haven't picked any yet in any of the stuff we've done that I remember, and that's because fastai generally picks reasonable defaults for most things. But later in the course we will learn about how to try and find really good learning rates, and you will find sometimes you need to actually spend some time finding a good learning rate. You can probably understand the intuition here: if you pick a learning rate that's too big, you'll jump too far, and so you'll end up way over here, and then you will try to jump back again, and you'll jump too far the other way, and you'll actually diverge. And so if you ever see, when your model's training, that it's getting worse and worse, it probably means your learning rate's too big. What would happen, on the other hand, if you pick a learning rate that's too small? Then you're going to take tiny steps, and of course the flatter it gets, the smaller the steps are going to get, and so you're going to get very, very bored. So finding the right learning rate is a compromise between the speed at which you find the answer and the possibility that you're actually going to shoot past it and get worse and worse. Okay, so one of the bits of feedback I got quite a lot in the survey is that people want a break halfway through, which I think is a good idea. So I think now is a good time to have a break. So let's come back in 10 minutes, at 25 past 7. Okay, hope you had a good rest, had a good break, I should say. So I want to now show you a really, really important mathematical computational trick, which is, we want to do a whole bunch of ReLUs. All right, so we're going to be wanting to do a whole lot of mx plus b's, and we don't just want to do mx plus b.
We're going to want to have lots of variables. So for example, every single pixel of an image would be a separate variable. So we're going to multiply every single one of those times some coefficient, and then add them all together, and then do the crop, the ReLU, and then we're going to do it a second time with a second bunch of parameters, and then a third time, and a fourth time, and a fifth time. It's going to be pretty inconvenient to write out a hundred million ReLUs. But as it happens, there's a single mathematical operation that does all of those things for us, except for the final replace-negatives-with-zeros, and it's called matrix multiplication. I expect everybody at some point did matrix multiplication at high school. I suspect also a lot of you have forgotten how it works. When people talk about linear algebra in deep learning, they give the impression you need years of graduate school study to learn all this linear algebra. You don't. Actually, all you need almost all the time is matrix multiplication, and it couldn't be simpler. I'm going to show you a couple of different ways. The first is, there's a really cool site called matrixmultiplication.xyz. You can put in any matrix you want, so I'm going to put in this one. So this matrix is saying I've got three rows of data with three variables, so maybe they're tiny images with three pixels, and the values of the first one are 1, 2, 1, the second is 0, 1, 1, and the third is 2, 3, 1. So those are our three rows of data. These are our three sets of coefficients.
So we've got a, b and c in our data, so I guess you'd call it x1, x2 and x3, and then here's our first set of coefficients, a, b and c: 2, 6 and 1. And then our second set is 5, 7 and 8. So here's what happens when we do matrix multiplication: this second matrix here, of coefficients, gets flipped around, and we do the multiplications and additions that I mentioned, right, so multiply, add, multiply, add, multiply, add. So that's going to give you the first number, because that is the left-hand column of the second matrix times the first row, so that gives you the top-left result. So the next one's going to give us two results, right? So we've now got the right-hand one with the top row, and the left-hand one with the second row. Keep going down, going down, and that's it. That's what matrix multiplication is: it's multiplying things together and adding them up. So there'd be one more step to do to make this a layer of a neural network, which is, if this had any negatives, we replace them with zeros. That's why matrix multiplication is the critical foundational mathematical operation in basically all of deep learning. So the GPUs that we use, the thing that they are good at is this matrix multiplication. They have special cores called tensor cores, which can basically only do one thing, which is to multiply together two four-by-four matrices, and then they do that lots of times for bigger matrices. So I'm going to show you an example of this. We're actually going to build a complete machine learning model on real data in a spreadsheet. So fastai has become kind of famous for a number of things, and one of them is using spreadsheets to create deep learning models. We haven't done it for a couple of years, so I'm pretty pumped to show this to you. What I've done is I went over to Kaggle, where there's a competition I actually helped create many years ago called Titanic, and that's an ongoing competition. So 14,000 people have entered it.
Or rather, that many teams have entered it so far. It's just a competition for a bit of fun, there's no end date, and the data for it is the data about who survived and who didn't from the real Titanic disaster. And so I clicked here on the download button to grab it on my computer. That gave me a CSV, which I opened up in Excel. The first thing I did then was I just removed a few columns that clearly were not going to be important, things like the name of the passengers and the passenger ID, just to try to make it a bit simpler. And so I've ended up with, each row of this is one passenger. The first column is the dependent variable. The dependent variable is the thing we're trying to predict: did they survive? And the remaining are some information, such as what class of the boat, first, second or third class; the sex; the age; how many siblings in the family; and this one, I think, is parents or something. So you should always look for a data dictionary to find out what's what: number of parents and children, okay. What was their fare, and which of the three cities did they embark at: Cherbourg, Queenstown, Southampton. Okay, so there's that data. Now, when I first grabbed it, I noticed that there were some people with no age. Now, there's all kinds of things we could do for that, but for this purpose I just decided to remove them. And I found the same thing for embarked; I removed the blanks as well. But that left me with nearly all of the data, okay? So then I've put that over here. Here's our data with those rows removed, and, okay, so these are the columns that came directly from Kaggle. So basically what we now want to do is we want to multiply each of these by a coefficient. How do you multiply the word male by a coefficient? And how do you multiply S by a coefficient? You can't. So I converted all of these to numbers. Male and female are very easy.
I created a column called is male, and as you can see, there's just an IF statement that says if sex is male, that's one, otherwise it's zero. And we can do something very similar for embarked. We can have one column called did they embark in Southampton, same deal, and another column for did they embark in Cherbourg. And their Pclass is one, two or three, which is a number, but it's not really a continuous measurement of something. There isn't one or two or three of some thing; they're different levels. So I decided to turn those into similar things, into these binary, these are called binary categorical variables. So are they first class, and are they second class? Okay, so that's all that. The other thing that I was thinking, well, you know, I kind of tried it and checked out what happened, and what happened was, so, I created some random numbers. To create the random numbers, I just went equals RAND(), right, and I copied those to the right, and then I just went copy and paste values. So that gave me some random numbers. Before, I said, oh, a, b and c, let's just start them at 1.5, 1.5, 1.5. What we do in real life is we start our parameters at random numbers that are a bit more or a bit less than zero. So these are random numbers. Actually, sorry, I slightly lied: I didn't use RAND(), I used RAND() minus 0.5, and that way I got small numbers that were on either side of zero. So then when I took each of these and I multiplied them by our fares and ages and so forth, what happened was that these numbers here are way bigger than, you know, these numbers here, and so in the end all that mattered was what their fare was.
That's because they were just bigger than everything else. So I wanted everything to basically go from zero to one; these numbers were too big. So what I did up here is I just grabbed the maximum of this column. The maximum of all the fares is 512, and so then, actually, I'll do age first. I did the maximum of age, because it's a similar thing, right? There's 80-year-olds and there's two-year-olds. And so then over here I just did, okay, well, what's their age divided by the maximum? And so that way all of these are between zero and one, just like all of these are between zero and one. So that's how I fixed it. This is called normalizing the data. Now, we haven't done any of these things when we've done stuff with fastai. That's because fastai does all of these things for you, and we'll learn about how, right? But all these things are being done behind the scenes. For fare, I did something a bit more, which is, I noticed there's lots of very small fares and there's also a few very big fares. There's like $70, and then $7, $7. Generally speaking, when you've got a few really big numbers and lots of really small numbers, and this is really common with money, you know, because money kind of follows this relationship where a few people have lots of it and spend huge amounts of it, and most people don't have heaps, if you take the log of something that has that kind of extreme distribution, you end up with something that's much more evenly distributed. So I've added this here called log fare, as you can see, and these are all around one, which isn't bad. I could have normalized that as well, but I was too lazy, I didn't bother, because it seemed okay. So at this point you can now see that if we start from here, all of these are around the same kind of level, right?
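If you wanted to try the same column preparation outside a spreadsheet, a rough sketch in plain Python might look like this. The three passengers are made up for illustration and are not rows from the real data, the column names are assumptions, and the exact log formula is an assumption too: the lesson only says the log of the fare was taken, so base 10 with one added (to keep a zero fare valid) is just one reasonable choice.

```python
import math

# Hypothetical passengers, invented for illustration only.
passengers = [
    {"sex": "male",   "pclass": 3, "embarked": "S", "age": 22.0, "fare": 7.25},
    {"sex": "female", "pclass": 1, "embarked": "C", "age": 38.0, "fare": 71.28},
    {"sex": "female", "pclass": 2, "embarked": "Q", "age": 80.0, "fare": 13.0},
]

max_age = max(p["age"] for p in passengers)

def encode(p):
    """Turn one passenger into numbers that are all on a similar scale."""
    return {
        "is_male":  1 if p["sex"] == "male" else 0,
        "pclass_1": 1 if p["pclass"] == 1 else 0,
        "pclass_2": 1 if p["pclass"] == 2 else 0,
        "embark_s": 1 if p["embarked"] == "S" else 0,
        "embark_c": 1 if p["embarked"] == "C" else 0,
        "age_norm": p["age"] / max_age,          # now between 0 and 1
        "log_fare": math.log10(p["fare"] + 1),   # assumed formula, see above
    }

rows = [encode(p) for p in passengers]
for row in rows:
    print(row)
```

Every column then sits in roughly the zero-to-two range, so no single column dominates just because of its units.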
So none of these columns are going to saturate the others. So now I've got my coefficients, which are, as I said, just random. Okay, and so now I need to basically calculate a times x1 plus b times x2 plus c times x3, plus blah blah blah. Okay, and so to do that you can use SUMPRODUCT in Excel. I could have typed it out by hand, it would be very boring, but SUMPRODUCT is just going to multiply each of these: this one will be multiplied by, where is it, this one; this one will be multiplied by this one; and so forth, and then they all get added together. Now, one thing, if you're eagle-eyed, you might be wondering about, is that in a linear equation we have y equals mx plus b. At the end there's this constant term, and I do not have any constant term. I've got something here called const, but I don't have any plus at the end. How's that working? Well, there's a nice trick that we pretty much always use in machine learning, which is to add a column of data just containing the number one every time. If you have a column of data containing the number one every time, then that parameter becomes your constant term. So you don't have to have a special constant term, and so it makes our code a little bit simpler when you do it that way. It's just a trick, but everybody does it. Okay, so this is now the result of our linear model. So I'm not even going to do ReLU; I'm just going to do the plain regression. Now, if you've done regression before, you might have learned about it as something you can solve with various matrix things, but in fact you can solve a regression using gradient descent. So I've just gone ahead and created a loss for each row, and so the loss is going to be equal to our prediction minus whether they survived, squared. So this is going to be our squared error, and there they all are, our squared errors, and so here I've just summed them up. I could have taken the mean.
I guess that would have been a bit easier to think about, but the sum is going to give the same result. So here's our loss, and so now we need to optimize that using gradient descent. So Microsoft Excel has a gradient descent optimizer in it called Solver. So I'll click Solver, and it'll say, okay, what are you trying to optimize? It's this one here, and I'm going to do it by changing these cells here, and I'm trying to minimize it. And so we're starting at a loss of 55.78. Actually, let's change it to mean as well. Would it be MEAN or AVERAGE? Probably AVERAGE. All right, so we start at 1.03. So optimize that, and there we go. So it's gone from 1.03 to 0.1, and so we can check the predictions. So the first one got predicted exactly correctly: they didn't survive, and we predicted they wouldn't survive. Ditto for this one, it's very close. And you can start to see a few issues here, which is, sometimes it's predicting less than zero, and sometimes it's predicting more than one. Wouldn't it be cool if we had some way of constraining it to between zero and one? And that's an example of some of the things we're going to learn about that make this stuff work a little bit better, right? But you can see it's doing an okay job. So this is not deep learning, this is not a neural net yet, this is just a regression. So to make it into a neural net, we need to do it multiple times. So I'm just going to do it twice. So now, rather than one set of coefficients, I've got two sets, and again I just put in random numbers. Other than that, all the data is the same. And so now I'm going to have my SUMPRODUCT again. So the first SUMPRODUCT is with my first set of coefficients, and my second SUMPRODUCT is with my second set of coefficients. So I'm just calling them linear one and linear two. Now, there's no point adding those up together, because if you add up two linear functions together, you get another linear function.
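To spell out why adding the two linear outputs gains nothing, here is the one-variable case, using m and b for slope and intercept as earlier in the lesson. The subscripts are just labels for the two sets of coefficients.

```latex
(m_1 x + b_1) + (m_2 x + b_2) = (m_1 + m_2)\,x + (b_1 + b_2)
```

The right-hand side is again a single slope times x plus a single constant, so the sum is still just one line.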
We want to get all those wiggles, right? So that's why we have to do our ReLU. So in Microsoft Excel, ReLU looks like this: if the number is less than zero, use zero, otherwise use the number. So that's how we're going to replace the negatives with zeros. And then finally, if you remember from our spreadsheet, we have to add them together. So we add the ReLUs together, and that's going to be our prediction. And then our loss is the same as the other sheet: it's just survived minus prediction, squared. And let's change that to mean, not MEAN, AVERAGE. Okay, so let's try solving that. Optimize AX1, and this time we're changing all of those. Solve. So this is using gradient descent. Excel Solver's not the fastest thing, but it gets the job done. Okay, let's see how we did: 0.08 for our deep learning model versus 0.1 for our regression. So it's a bit better. So there you go. So we've now created our first deep learning neural network from scratch, and we did it in Microsoft Excel, everybody's favorite artificial intelligence tool. So that was a bit slow and painful. It'd be a bit faster and easier if we used matrix multiplication, so let's finally do that. So this next one is going to be exactly the same as the last one, but with matrix multiplication. So all that data looks the same. You'll notice the key difference now is our parameters have been transposed. So before, I had the parameters matching the data in terms of being in columns. For matrix multiplication, the way it works is that you have to transpose this, so the x and y are kind of the opposite way round; the rows and columns are the opposite way round. Other than that, it's the same. I just copied and pasted the random numbers, so we have exactly the same starting point. And so now this entire thing here is a single function, which is matrix multiply all of this by all of this.
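Here is a rough PyTorch sketch of the same idea: two sum-products, one per set of coefficients, give the same numbers as a single matrix multiply by the transposed parameters. All the numbers are made up for illustration (they are not the spreadsheet's values), the variable names are assumptions, and the last data column plays the role of the constant column of ones.

```python
import torch

# Hypothetical data: 3 rows, 3 variables each, last column is the constant 1.
data = torch.tensor([[1., 2., 1.],
                     [0., 1., 1.],
                     [2., 3., 1.]])
survived = torch.tensor([1., 0., 1.])   # made-up targets

# Two sets of coefficients stored as rows, lined up with the data's columns.
params = torch.tensor([[0.2, -0.6, 0.1],
                       [0.5, 0.7, -0.8]])

# Sum-product style: multiply each row by one coefficient set, then add up.
lin1 = (data * params[0]).sum(dim=1)
lin2 = (data * params[1]).sum(dim=1)

# Matrix-multiply style: transpose params so each set becomes a column.
lin = data @ params.T                    # shape (3, 2)

# Replace negatives with zeros, add the two columns, and measure the loss.
preds = torch.clip(lin, 0.).sum(dim=1)
loss = ((preds - survived) ** 2).mean()
print(lin)
print(preds, loss)
```

The first column of lin equals lin1 and the second equals lin2, which is the sense in which the matrix multiply is just the sum-products done all at once.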
And so when I run that, it fills in exactly the same numbers. Make this AVERAGE. And so now we can optimize that: make that a minimum by changing these. So it should get the same number. 0.08, wasn't it? Yep, and we do. Okay, so that's just another way of doing the same thing. So you can see that matrix multiplication, it takes a surprisingly long time, at least for me, to get an intuitive feel for matrix multiplication as a single mathematical operation. So I still find it helpful to remind myself it's just doing these sum products and additions. Okay, so that is a deep learning neural network in Microsoft Excel. And the Titanic Kaggle competition, by the way, is a pretty fun learning competition. If you haven't done much machine learning before, then it's certainly worth trying out, just to get the feel for how these all get put together. So the chapter of the book that this lesson goes with is chapter four. And chapter four of the book is the chapter where we lose the most people, because, to be honest, it's hard. But part of the reason it's hard is I couldn't put this into a book. So we're teaching it a very different way in the course to what's in the book. And you can use the two together, but if you've tried to read the book and been a bit disheartened, yeah, try following through the spreadsheet instead. Maybe, if you use Numbers or Google Sheets or something, you could try to create your own version of it on whatever spreadsheet platform you prefer. Or you could try to do it yourself from scratch in Python, you know, if you want to really test yourself. So there's some suggestions. Okay, a question from Victor Guerrero: in the Excel exercise, when Jeremy is doing some feature engineering, he comes up with two new columns, Pclass 1 and Pclass 2. That is true: Pclass 1 and Pclass 2. Why is there no Pclass 3 column?
Is it because, if P class one is zero and P class two is zero, then P class three must be one? So in a way, two columns are enough to encode the input with the original column. Yes, that's exactly the reason. So there's no need to tell the computer about things that it can kind of figure out for itself. So these are called dummy variables. So when you create dummy variables for a categorical variable with three levels, like this one, you need two dummy variables. So in general, a categorical variable with n levels needs n minus one columns. Thanks for the good question. So what we're going to be doing in our next lesson is looking at natural language processing. So so far we've looked at some computer vision and just now we've looked at some what we call tabular data, so kind of spreadsheet type data. Next up, we're going to be looking at natural language processing. So I'll give you a taste of it. So you might want to open up the Getting Started with NLP for Absolute Beginners notebook. So here's the Getting Started with NLP for Absolute Beginners notebook. I will say as a notebook author, it may sound a bit lame, but I always see when people have upvoted it, and it always makes me really happy. And it also helps other people find it. So remember to upvote these notebooks or any other notebooks you like. I also always read all the comments. So if you want to ask any questions or make any comments, I enjoy those as well. So natural language processing is about, rather than taking, for example, image data and making predictions, we take text data. That text data most of the time is in the form of prose. So plain English text. So English is the most common language used for NLP, but there's NLP models in dozens of different languages nowadays. And if you're a non-English speaker, you'll find that for many non-English languages, there are fewer resources. And there's a great opportunity to provide NLP resources in your language.
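Going back to the dummy variable answer above, here is a minimal sketch in plain Python of why n minus one columns are enough. The function names and the sample passenger classes are hypothetical, chosen just for illustration. It encodes a three-level class column into two 0/1 columns and then recovers the original value from them.

```python
def dummy_columns(values, levels):
    # Keep a 0/1 column for every level except the last one (n - 1 columns).
    kept = levels[:-1]
    return [[1 if v == level else 0 for level in kept] for v in values]

def recover(row, levels):
    # If one flag is set, that is the level; if none is, it must be the last level.
    for flag, level in zip(row, levels):
        if flag:
            return level
    return levels[-1]

levels = [1, 2, 3]
pclass = [1, 3, 2, 3]  # made-up sample of passenger classes

dummies = dummy_columns(pclass, levels)      # each row is [is class 1, is class 2]
recovered = [recover(row, levels) for row in dummies]
print(dummies, recovered)
```

A row of two zeros can only mean the third class, which is the point of the answer: the third column would add no information.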
This has actually been one of the things that the fast.ai community has been fantastic at in the global community: building NLP resources. For example, the first Farsi NLP resource was created by a student from the very first fast.ai course. For the Indic languages, some of the best resources have come out of fast.ai alumni, and so forth. So that's a particularly valuable thing you could look at. So if your language is not well represented, that's an opportunity, not a problem. So some examples of things you could use NLP for. Well, perhaps the most common and practically useful, in my opinion, is classification. Classification means you take a document. Now when I say a document, that could be one or two words. It could be a book. It could be a Wikipedia page. So it could be any length. We use the word document. It sounds like that's a specific kind of length, but it can be a very short thing or a very long thing. We take a document and we try to figure out a category for it. Now that can cover many, many different kinds of applications. So one common one that we'll look at a bit is sentiment analysis. So for example, is this movie review positive or negative? Sentiment analysis is very helpful in things like marketing and product development. You know, in big companies, there's lots and lots of information coming in about your product. It's very nice to be able to quickly sort it out and kind of track metrics from week to week. Something like figuring out what author wrote the document would be an example of a classification exercise, because you're trying to put it in a category, which in this case is which author. I think there's a lot of opportunity in legal discovery. There's already some products in this area, where in this case the category is: is this legal document in scope or out of scope in the court case? Or just organizing documents, triaging inbound emails. So like which part of the organization should it be sent to? Is it urgent or not? Stuff like that.
So these are examples of categories of classification. What you'll find when we look at classification tasks in NLP is that it's going to look very, very similar to images. But what we're going to do is we're going to use a different library. The library we're going to use is called Hugging Face Transformers rather than fast.ai. And there's two reasons for that. The main reason is because I think it's really helpful to see how things are done in more than one library. And Hugging Face Transformers, so fast.ai has a very layered architecture. So you can do things at a very high level with very little code, or you can dig deeper and deeper and deeper, getting more and more fine-grained. Hugging Face Transformers doesn't have the same high-level API at all that fast.ai has. So you have to do more stuff manually. And so at this point of the course, you know, we're going to actually intentionally use a library which is a little bit less user-friendly in order to see kind of what extra steps you have to go through to use other libraries. Having said that, the reason I picked this particular library is it is particularly good. It has really good models in it. It has a lot of really good techniques in it. Not at all surprising, because they have hired lots and lots of fast.ai alumni. So they have very high-quality people working on it. So before the next lesson, yeah, if you've got time, take a look at this notebook and take a look at the data. The data we're going to be working with is quite interesting. It's from a Kaggle competition which is trying to figure out, in patents, whether two concepts are referring to the same thing or not, where those concepts are represented as English text. And when you think about it, that is a classification task, because the document is, you know, basically text one, blah, text two, blah, and then the category is similar or not similar. And in fact, in this case, they actually have scores.
It's either going to be basically 0, 0.25, 0.5, 0.75, or 1 of how similar is it. But it's basically a classification task when you think of it that way. So, yeah, you can have a look at the data and next week we're going to go through, step by step through this notebook. And we're going to take advantage of that as an opportunity also to talk about the really important topics of validation sets and metrics, which are two of the most important topics in not just deep learning, but machine learning more generally. All right, thanks to everybody. I'll see you next week. Bye.
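To make the "text one, blah, text two, blah" framing above a little more concrete, here is a small sketch in plain Python. The phrase pairs, the `TEXT1`/`TEXT2` labels and the helper names are all made up for illustration; they are not taken from the competition data or the notebook. It turns a pair of phrases into a single document and treats the five possible scores as five categories.

```python
# The five possible similarity scores mentioned above, treated as categories.
SCORES = [0.0, 0.25, 0.5, 0.75, 1.0]

def make_document(text1, text2):
    # Combine the two phrases into one piece of text (hypothetical format).
    return f"TEXT1: {text1}; TEXT2: {text2}"

def score_to_class(score):
    # Map a score to its category index 0..4.
    return SCORES.index(score)

# Made-up example pairs, NOT rows from the real dataset.
pairs = [
    ("steel beam", "metal girder", 0.75),
    ("steel beam", "ice cream", 0.0),
]

documents = [make_document(a, b) for a, b, _ in pairs]
labels = [score_to_class(s) for _, _, s in pairs]
print(documents, labels)
```

Seen this way, each row is just a document with a category, which is why the same classification machinery applies.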