Hello, and welcome to lesson three of Practical Deep Learning for Coders. We were looking at getting our model into production last week, so we're going to finish that off today, and then we're going to start to look behind the scenes at what actually goes on when we train a neural network. We're going to look at the math of what's going on, and we're going to learn about SGD and some important stuff like that.

The order is slightly different to the book. There's a part in the book which says you can either go to chapter four or chapter three now, and then go back to the other one afterwards. So we're doing chapter four and then chapter three. You can choose whichever way you're interested in. Chapter four is the more technical chapter about the foundations of how deep learning really works, whereas chapter three is all about ethics, so with the lessons we'll do that next week.

So we're looking at the 02_production notebook, and we're going to look at the fastbook version; in fact, everything
I'm looking at today will be in the fastbook version. Remember last week we had a look at our bears, and we created this DataLoaders object by using the data block API, which I hope everybody's had a chance to experiment with this week. If you haven't, now's a good time to do it.

We skipped over one of the lines a little bit, which is this item transforms line. What this is doing here, when we said Resize: the images we downloaded from the internet are lots of different sizes and lots of different aspect ratios. Some are tall and some are wide, some are square, some are big, some are small. When you say Resize for an item transform, it means each item (an item in this case is one image) is going to be resized to 128 by 128 by squishing it or stretching it. We had a look at how you can always say show_batch to see a few examples, and this is what they looked like.

Squishing and stretching isn't the only way that we can resize. Remember, we have to make everything into a square before we get it into our model. By the time it gets to our model, everything has to be the same size in each mini-batch, and that's why we're making it a square. It's not the only way to do that, but it's the easiest way and it's by far the most common way.

So another way to do this is we can create another DataBlock object, and we can make a DataBlock object that's an identical copy of an existing DataBlock object where we can then change just some pieces. We can do that by calling the new method, which is super handy. So let's create another DataBlock object, this time with a different item transform, where we resize using the squish method.

We have a question: what are the advantages of having square images versus rectangular ones?
That's a great question. Really, it's simplicity. If you know all of your images are rectangular, of a particular aspect ratio to start with, you may as well just keep them that way. But if you've got some which are tall and some which are wide, making them all square is the easiest. Otherwise you would have to organize them such that all of the tall ones ended up in one mini-batch and all the wide ones ended up in another, and then you'd have to figure out what the best aspect ratio for each mini-batch is. We actually have some research that does that in fastai2, but it's still a bit clunky.

I should mention: okay, I just lied to you. The default is not actually to squish or stretch. I should have said, sorry, the default when we say Resize is actually just to grab the center of each image. If we want to squish or stretch, you can add the ResizeMethod.Squish argument to Resize, and you can now see that this black bear is looking much thinner, but we have got the leaves that are around on each side, for instance.

Another question: when you use the dls.new method, what can and cannot be changed? Is it just the transforms? So it's not dls.new, it's bears.new, right?
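To make the difference between the default center crop and squishing concrete, here is a minimal plain-Python sketch of the geometry only. It is not fastai's implementation; the function names and the 400 by 200 example photo are made up for illustration.

```python
def center_crop_box(width, height):
    """Return (left, top, right, bottom) of the largest centered square."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)


def squish_scale(width, height, size):
    """Return (horizontal, vertical) scale factors when the whole image
    is forced into a size x size square, ignoring its aspect ratio."""
    return (size / width, size / height)


# A hypothetical wide photo, 400 pixels wide and 200 pixels tall.
box = center_crop_box(400, 200)

# Fraction of the original pixels that survive the center crop.
kept = (box[2] - box[0]) * (box[3] - box[1]) / (400 * 200)

# Squishing keeps every pixel but scales the two axes differently,
# which is why a wide bear comes out looking thin.
sx, sy = squish_scale(400, 200, 128)
```

In this example the crop throws away the left and right quarters of the photo, while the squish keeps everything but shrinks the width twice as hard as the height.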
So we're not creating a new DataLoaders object, we're creating a new DataBlock object. I don't remember off the top of my head, so check the documentation, and I'm sure somebody can pop the answer into the forum.

So you can see when we use squish that this grizzly bear has got pretty wide and weird looking, and this black bear has got pretty weird and thin looking. It's easiest to see what's going on if we use ResizeMethod.Pad. What pad does, as you can see, is it just adds some black bars around each side. So you can see the grizzly bear was tall, so when we stretched it (squishing and stretching are opposites of each other) it ended up wide, and the black bear was originally a wide rectangle, so it ended up looking thin. You don't have to use zeros; zeros means pad it with black. You can also say reflect, and the pixels will look a bit better that way.

All of these different methods have their own problems. The pad method is the cleanest: you end up with the correct size and you end up with all of the pixels, but you also end up with wasted pixels, so you end up with wasted computation. The squish method is the most efficient, because you get all of the information and nothing's wasted, but on the downside your neural net's going to have to learn to recognize when something's been squished or stretched, and in some cases it wouldn't even know. So if there are two objects you're trying to recognize, one of which tends to be thin and one of which tends to be thick, and otherwise they're the same, they could actually be impossible to distinguish. And then the default cropping approach actually removes some information. So in this case, with this grizzly bear here, we actually lost a lot of its legs. So if figuring out what kind of bear it was required looking at its feet.
Well, we don't have its feet anymore. So they all have downsides.

So there's something else that you can do, a different approach, which is instead of saying Resize you can say RandomResizedCrop. Actually, this is the most common approach. What RandomResizedCrop does is each time it grabs a different part of the image and zooms into it. So this is all the same image, and we're just grabbing a batch of four different versions of it. You can see they're all squished in different ways and we've selected different subsets and so forth. Now this seems worse than any of the previous approaches, because I'm losing information. Like this one here, I've actually lost a whole lot of its back. But the cool thing about this is that, remember, we want to avoid overfitting, and when you see a different part of the animal each time, it's much less likely to overfit, because you're not seeing the same image on each epoch that you go around. That makes sense? So this RandomResizedCrop approach is actually super popular. A min_scale of 0.3 means we're going to pick at least 30 percent of the pixels of the original size each time, and then we zoom into that square.

So this idea of doing something so that each time the model sees the image it looks a bit different to last time is called data augmentation. This is one type of data augmentation. It's probably the most common, but there are others, and one of the best ways to do data augmentation is to use this aug_transforms function. What aug_transforms does is it returns a list of different augmentations. So there are augmentations which change contrast, which change brightness, which warp the perspective (so you can see in this one here it looks like this bit's much closer to you and this bit's much further away from you, because it's been perspective warped). It rotates them.
See, this one's actually been rotated. This one's been made really dark, right?

These are batch transforms, not item transforms. The difference is that item transforms happen one image at a time, and so the thing that resizes them all to the same size has to be an item transform. Then we pop it all into a mini-batch, put it on the GPU, and then a batch transform happens to a whole mini-batch at a time. By putting these as batch transforms, the augmentation happens super fast because it happens on the GPU. I don't know if there are any other libraries, as we speak, which allow you to write your own GPU-accelerated transformations that run in this way, so this is a super handy thing in fastai2.

So you can check out the documentation for aug_transforms, and when you do you'll find the documentation for all of the underlying transforms that it basically wraps. I don't remember if I've shown you this trick before: if you go inside the parentheses of a function and hit Shift-Tab a few times, it'll pop open a list of all of the arguments. So you can basically see: can I sometimes flip it left-right? Can I sometimes flip it up-down? What's the maximum amount I can rotate, zoom, change the lighting, warp the perspective, and so forth.

How can we add different augmentations for train and validation sets? The cool thing is that fastai will automatically avoid doing data augmentation on the validation set. So all of these aug_transforms will only be applied to the training set, with the exception of RandomResizedCrop. RandomResizedCrop has a different behavior for each. The behavior for the training set is what we just saw, which is to randomly pick a subset and zoom into it, and the behavior for the validation set is just to grab the center, the largest center square that it can.

You can write your own transformations. They're just Python.
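The two behaviors just described for random resized cropping (a random region on the training set, the largest center square on the validation set) can be sketched in plain Python. This is only an illustration of the idea under simplifying assumptions: it always picks a square region and ignores the aspect-ratio jitter a real implementation would add. The function name and parameters are hypothetical, not fastai's API.

```python
import random


def random_resized_crop_box(width, height, min_scale=0.3, training=True, rng=random):
    """Pick the region of the image that would then be resized to the
    final square. Returns (left, top, right, bottom)."""
    if not training:
        # Validation: deterministic, largest centered square.
        side = min(width, height)
        left, top = (width - side) // 2, (height - side) // 2
        return (left, top, left + side, top + side)

    # Training: choose a target area between min_scale and 100% of the image,
    # then a random position for a square of (roughly) that area.
    # Note: clamping to the shorter edge can shrink the area for very
    # elongated images; a real implementation handles this more carefully.
    area = rng.uniform(min_scale, 1.0) * width * height
    side = min(int(area ** 0.5), width, height)
    left = rng.randint(0, width - side)
    top = rng.randint(0, height - side)
    return (left, top, left + side, top + side)


rng = random.Random(0)  # seeded only so the example is repeatable
valid_box = random_resized_crop_box(400, 200, training=False)
train_box = random_resized_crop_box(300, 300, min_scale=0.3, rng=rng)
```

Calling the training branch repeatedly gives a different box each time, which is the whole point: the model never sees exactly the same pixels twice.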
They're standard PyTorch code, and by default a transformation will only be applied to the training set. If you want to do something fancy like RandomResizedCrop, where you actually have different things being applied to each, you should come back to the next course to find out how to do that, or read the documentation. It's not rocket science, but it's not something most people need to do.

Okay, so last time we did bears.new with a RandomResizedCrop, min_scale of 0.5, we added some transforms, and we went ahead and trained. Since last week I've rerun this notebook. It's on a different computer and I've got different images, so it's not all exactly the same, but I still got a good confusion matrix. So of the black bears, 37 were classified correctly, two were classified as grizzlies, and one as a teddy.

We talked about plot_top_losses, and it's interesting: you can see in this case there are some clearly odd things going on. This is not a bear at all. This looks like it's a drawing of a bear, which it's predicted as a teddy, but this thing's meant to be a drawing of a black bear. I can certainly see the confusion. You can see how some parts of it have been cut off; we'll talk about how to deal with that later.

Now, one of the interesting things is that we didn't really do much data cleaning at all before we built this model. The only data cleaning we did was just to validate that each image can be opened.
There was that verify_images call. The reason for that is it's actually much easier, normally, to clean your data after you create a model, and I'll show you how. We've got this thing called ImageClassifierCleaner, where you can pick a category and the training set or validation set, and then it will list all of the images in that set, and it will pick the ones which it is the least confident about, which are the most likely to be wrong, where the loss is the worst, to be more precise. So this is a great way to look through your data and find problems.

So in this case, the first one is not a teddy or a brown bear or a black bear. It's a puppy dog. So this is a great cleaner, because what I can do is I can now click delete here. This one here looks a bit like an Ewok rather than a teddy. I'm not sure. What do you think, Rachel, is it an Ewok? I'm going to call it an Ewok. And so you can go through. Okay, that's definitely not a teddy. So you can either say, oh, that's wrong, that's actually a grizzly bear, or it's wrong, it's a black bear, or I should delete it, or by default just keep it. And you can keep going through until you think, okay, they all seem to be fine. Maybe that one's not. Once you get to the point where they all seem to be fine, you can say, okay, probably all the rest are fine too, because they all have lower losses, so they all fit the mold of a teddy.

And so then I can run this code here, where I go through cleaner.delete(), so that's all the things which I've selected to delete, and unlink them. So unlink is just another way of saying delete a file.
That's the Python name. And then we go through all the ones that we said to change, and we can actually move them to the correct directory.

If you haven't seen this before, you might be surprised that we've created our own little GUI inside a Jupyter notebook. Yeah, you can do this, and we built this with less than a screen of code. You can check out the source code in the fastai notebooks. So this is a great time to remind you that fastai is built with notebooks. If you go to the fastai repo and clone it and then go to nbs, you'll find all of the code of fastai written as notebooks, and they've got a lot of prose and examples and tests and so forth. So the best place to learn about how this is implemented is to look at the notebooks rather than looking at the module.

By the way, sometimes you'll see weird little comments like this. These weird little comments are part of a development environment for Jupyter notebooks we use called nbdev, which we built. Sylvain and I built this thing to make it much easier for us to create books and websites and libraries in Jupyter notebooks. So this particular one here, hide, means when this is turned into a book or into documentation, don't show this cell. The reason for that is because, you can see, I've actually got it in the text, but I thought when you're actually running it, it'd be nice to have it sitting here waiting for you to run directly. So that's why it's shown in the notebook but not in the book.
It's shown differently. You'll also see these things like "s:" with a quote. In the book that would end up saying "Sylvain says" and then what he says. So there are little bits and pieces in the notebooks that just look a little bit odd, and that's because they're designed that way in order to create stuff in the book.

Right, so then last week we saw how you can export that to a pickle file that contains all the information for the model, and then on the server where you're going to actually do your inference, you can load that saved file and you'll get back a learner that you can call predict on. Perhaps the most interesting part of predict is the third thing that it returns, which is a tensor, in this case containing three numbers. There are three of them because we have three classes: teddy bear, grizzly bear, and black bear. This doesn't make any sense until you know what the order of the classes is in your DataLoaders, and you can ask the DataLoaders what the order is by asking for its vocab. A vocab in fastai is a really common concept. Basically, any time you've got a mapping from numbers to strings or discrete levels, the mapping is always stored in the vocab.

So here this shows us that the activation for black bear is about 1e-6, the activation for grizzly is 1, and the activation for teddy is about 1e-6. So it's very, very confident that this particular one was a grizzly. Not surprisingly, this was something called grizzly.jpg. So you need to know this mapping in order to display the correct thing, but of course the DataLoaders object already knows that mapping: it's the vocab, and it's stored in with the learner. So that's how it knows to say grizzly automatically.
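As a rough illustration of why the vocab matters, here is a plain-Python sketch of turning a list of per-class numbers into a label. The helper name is hypothetical and the numbers are invented to resemble the ones on screen; in practice the learner does this lookup for you.

```python
def label_prediction(probs, vocab):
    """Return (label, index, probability) for the highest-scoring class.
    probs[i] is assumed to correspond to vocab[i]."""
    idx = max(range(len(probs)), key=probs.__getitem__)
    return vocab[idx], idx, probs[idx]


# Class order as the vocab would report it (assumed here to be alphabetical).
vocab = ["black", "grizzly", "teddy"]

# Made-up numbers resembling the output for grizzly.jpg.
probs = [1e-6, 0.999998, 1e-6]

label, idx, p = label_prediction(probs, vocab)
```

Without the vocab, the index 1 on its own tells you nothing; with it, index 1 reads as grizzly.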
So the first thing it gives you is the human-readable string that you'd want to display. So this is nice, that with fastai2 you save this object which has everything you need for inference. It's got all the information about normalization, about any transformation steps, about what the vocab is, so it can display everything correctly.

Right, so now we want to deploy this as an app. If you've done some web programming before, then all you need to know is this line of code and this line of code. This is the line of code you would call once when your application starts up, and this is the line of code you would call every time you want to do an inference. There's also a batch version of it, which you can look up if you're interested; this is just one at a time. So there's nothing special if you're already a web programmer or have access to a web programmer. You just have to stick these two lines of code somewhere, and the three things you get back are the human-readable string (if you're doing categorization), the index of that, which in this case is 1 for grizzly, and the probability of each class.

One of the things we really wanted to do in this course, though, is not assume that everybody is a web developer. Most data scientists aren't, but gee, wouldn't it be great if all data scientists could at least prototype an application to show off the thing they're working on? So we've tried to curate an approach, none of which is stuff we've built.
It really is curated, and it shows how you can create a GUI and a complete application in a Jupyter notebook. The key pieces of technology we use to do this are IPython widgets, which is always called ipywidgets, and Voilà. ipywidgets, which we import by default as widgets (and that's also what they use in their own documentation), provides GUI widgets, for example a file upload button. So if I create this file upload button and then display it, I see (and we saw this in the last lesson as well, or maybe it was lesson one) an actual clickable button. So I can go ahead and click it, and it says now, okay, you've selected one thing.

So how do I use that? Well, these widgets have all kinds of methods and properties, and the upload button has a data property, which is an array containing all of the images you uploaded. So you can pass that to PILImage.create. create is the standard factory method we use in fastai to create items, and PILImage.create is smart enough to be able to create an item from all kinds of different things. One of the things it can create it from is a binary blob, which is what a file upload contains. So then we can display it, and there's our teddy. So you can see how cells of a Jupyter notebook can refer to other cells that have GUI-created data in them.

So let's hide that teddy away for a moment. The next thing to know about is that there's a kind of widget called Output. An output widget is basically something that you can fill in later. So if I delete, actually, this part here.
So I've now got an output widget. Yeah, actually, let's do it this way around. You can't see the output widget, even though I said please display it, because nothing is output. So then in the next cell I can say, with that output placeholder, display a thumbnail of the image, and you'll see that the display will not appear here. It appears back here, because that's where the placeholder was. So let's run that again to clear out that placeholder.

We can create another kind of placeholder, which is a label. A label is something where you can put text in it. You can give it a value like, I don't know, "Please choose an image". Okay, so we've now got a label containing "Please choose an image".

Now let's create another button to do a classification. This is not a file upload button, it's just a general button, so this button doesn't do anything until we attach an event handler to it. An event handler is a callback. We'll be learning all about callbacks in this course. If you've ever done any GUI programming before, or even web programming, you'll be familiar with the idea that you write a function which is the thing you want to be called when the button is clicked on, and then somehow you tell your framework that this is the on-click event. So here I go: here's my button, btn_run. I say the on-click event of btn_run is, we call this code, and this code is going to do all the stuff we just saw. It's going to create an image from the upload, clear the output, display the image, call predict, and then replace the label with a prediction.

So there it all is. Now, that hasn't done anything, but I can now go back to this classify button, which now has an event handler attached to it. So watch this: click. Boom. And look, that's been filled in, and that's been filled in. In case you missed it, let's run these again to clear everything out. Okay, everything's gone. This is "Please choose an image".
There's nothing here. I click classify. Oh! Right, so it's amazing how our notebook has suddenly turned into this interactive prototyping playground for building applications.

Once all this works, we can dump it all together, and the easiest way to dump things together is to create a VBox. A VBox is a vertical box, and it's just something that you put widgets in. In this case we're going to put the following widgets: a label that says select your bear, then an upload button, a run button, an output placeholder, and a label for predictions. But let's run these again just to clear everything out, to show we're not cheating, and let's create our VBox. As you can see, it's just got all the pieces. Oh, I accidentally ran the thing that displayed the bear. Let's get rid of that. Okay, so there it is. So now I can click upload, I can choose my bear, and then I can click classify.

Notice that these are exactly the same buttons as these buttons. They're two places where we're viewing the same button, which is a wild idea. So if I click classify, it's going to change this label and this label, because they're actually both references to the same label. Look, there we are.

Okay, so this is our app. This is actually how I built that image cleaner GUI: just using these exact things. I built that image cleaner GUI cell by cell in a notebook, just like this. So you get this interactive, experimental framework for building a GUI.
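The two ideas in this demo, attaching a callback to a button and seeing the same label update in two places, can be sketched without any GUI library at all. Real ipywidgets buttons do have an on_click method that registers a callback, but the Button and Label classes below are toy stand-ins written for this sketch, and the prediction text is a placeholder for the real call to predict.

```python
class Label:
    """Toy stand-in for a text label widget."""
    def __init__(self, value=""):
        self.value = value


class Button:
    """Toy stand-in for a button that stores click handlers."""
    def __init__(self, description):
        self.description = description
        self._handlers = []

    def on_click(self, handler):
        # Register a function to be called when the button is clicked.
        self._handlers.append(handler)

    def click(self):
        # Simulate the user clicking: call every registered handler.
        for handler in self._handlers:
            handler(self)


lbl_pred = Label("Please choose an image")
btn_run = Button("Classify")


def on_click_classify(change):
    # Placeholder for: build the image, call predict, format the result.
    lbl_pred.value = "Prediction: grizzly"


btn_run.on_click(on_click_classify)

# Two "views" that hold references to the same widget objects,
# like a notebook cell and a VBox showing the same label.
cell_view = [btn_run, lbl_pred]
vbox_view = [Label("Select your bear!"), btn_run, lbl_pred]

btn_run.click()
```

After the click, the label reads the same in both views because there is only one label object; the views just point at it.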
So if you're a data scientist who's never done GUI stuff before, this is a great time to get started, because now you can make actual programs.

Now of course, an actual program running inside a notebook is cool, but what we really want is for this program to run in a place anybody can run it. That's where Voilà comes in. Voilà needs to be installed, so you can just run these lines to install it. It's listed in the prose. What Voilà does is it takes a notebook and doesn't display anything except for the markdown, the ipywidgets, and the outputs. So all the code cells disappear, and it doesn't give the person looking at that page the ability to run their own code. They can only interact with the widgets.

So what I did was I copied and pasted that code from the notebook into a separate notebook which only has those lines of code. These are just the same lines of code that we saw before. So this is a notebook, it's just a normal notebook, and then I installed Voilà. When you do that, if you navigate to this notebook but you replace "notebooks" up here with "voila", it displays not the notebook but, as I said, just the markdown and the widgets. So here I've got my bear classifier, and I can click upload. Let's do a grizzly bear this time. This is a slightly different version; I made this so there's no classify button. I thought it would be a bit more fancy to make it so when you click upload it just runs everything. But as you can see, there it all is. It's all working. So this is the world's simplest prototype, but it's a proof of concept, right?
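The URL swap mentioned above can be written down as a one-line helper. This is a sketch under assumptions: the localhost address and notebook name are hypothetical, and the replacement path voila/render is the one given in the fastbook text, so check the Voilà documentation for your installed version.

```python
def voila_url(notebook_url):
    """Turn a Jupyter notebook URL into the matching Voilà URL by
    swapping the first '/notebooks/' path segment."""
    return notebook_url.replace("/notebooks/", "/voila/render/", 1)


# Hypothetical local Jupyter address, for illustration only.
app_url = voila_url("http://localhost:8888/notebooks/bear_classifier.ipynb")
```

Opening the resulting address in a browser is what shows the widgets-only view of the notebook.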
So you can add widgets with dropdowns and sliders and charts and everything that you can have in an Angular app or a React app or whatever. In fact, there's even stuff which lets you use, for example, the whole Vue.js framework (if you know that, it's a very popular JavaScript framework) in widgets and Voilà.

So now we want to get it so that this app can be run by someone out there in the world. The Voilà documentation shows a few ways to do that, but perhaps the easiest one is to use a system called Binder. Binder is at mybinder.org, and all you do is you paste in your GitHub repository name here. This is all in the book. So you paste in your GitHub repo name, you change where it says File to URL, and then you put in the path which we were just experimenting with. Pop that here, and then you say launch, and what that does is it gives you a URL. This URL you can pass on to people, and this is actually your interactive running application. Binder's free, so anybody can now use this to take their Voilà app and make it a publicly available web application. So try it.

As it mentions here, the first time you do this Binder takes about five minutes to build your site, because it uses something called Docker to deploy the whole fastai framework and Python and blah blah blah. But once you've done that, that virtual machine will keep running for a while, as long as people are using it, and it's reasonably fast.

So, a few things to note here. Being a free service, you won't be surprised to hear this is not using a GPU.
It's using a CPU. It might be surprising that we're deploying to something which runs on a CPU. When you think about it, though, it makes much more sense to deploy to a CPU than a GPU. The thing that's happening here is that (let's go back to my app) in my app I'm passing along a single image at a time. When I pass along that single image, I don't have a huge amount of parallel work for a GPU to do, so this is actually something that a CPU is going to be doing more efficiently. We found that for folks coming through this course, the vast majority of the time they wanted to deploy inference on a CPU, not a GPU, because they're normally just doing one item at a time.

It's way cheaper and easier to deploy to a CPU, and the reason for that is that you can just use any hosting service you like, because remember, this is just a program at this point. You can use all the usual horizontal scaling, vertical scaling. You can use Heroku.
You can use AWS. You can use inexpensive instances. Super cheap and super easy.

Having said that, there are times you might need to deploy to a GPU. For example, maybe you're processing videos, and a single video might take all day to process on a CPU. Or you might be so successful that you have a thousand requests per second, in which case you could take 128 at a time, batch them together, put the whole batch on the GPU, get the results back, and pass them back around. You've got to be careful of that, because if your requests aren't coming fast enough, your user has to wait for a whole batch of people to be ready to be processed. But conceptually, as long as your site is popular enough, that could work.

The other thing to talk about is you might want to deploy to a mobile phone. For deploying to a mobile phone, our recommendation is, wherever possible, to do that by actually deploying to a server and then having the mobile phone talk to the server over a network. If you do that, again, you can just use a normal PyTorch program on a normal server with normal network calls. It makes life super easy. When you try to run a PyTorch app on a phone, you're suddenly not in your usual environment.
You're not in an environment where PyTorch will run natively, so you'll have to convert your program into some other form. There are other forms, and the main form that you convert it to is something called ONNX, which is specifically designed for a super high-speed, high-performance approach that can run on both servers and mobile phones, and it does not require the whole Python and PyTorch runtime in place. But it's much more complex than not using it. It's harder to debug, it's harder to set up, and it's harder to maintain. So if possible, keep things simple. If you're lucky enough that you're so successful that you need to scale it up to GPUs and stuff like that, then great. Hopefully you've got the finances at that point to justify spending money on an ONNX expert or serving expert or whatever. There are various systems you can use, like ONNX Runtime and AWS SageMaker, where you can say here's my ONNX bundle and it'll serve it for you. PyTorch also has a mobile framework; same idea.

All right, so it's funny, we're talking about two different kinds of deployment here. One is deploying a hobby application that you're prototyping, showing off to your friends, or using to explain to your colleagues how something might work, a little interactive analysis. That's one thing. But maybe you're actually prototyping something that you want to turn into a real product, or an actual real part of your company's operations. When you're deploying something in real life, there are all kinds of things you've got to be careful of.

One example of something to be careful of: let's say you did exactly what we just did, which actually is your homework, to create your own application. I want you to create your own image search application.
You can use my exact set of widgets if you want to, but better still, go to the ipywidgets website and see what other widgets they have, and try to come up with something cool. Try to show off as best as you can, then show us on the forum.

Now, let's say you decided that you want to create an app that would help the users of your app decide if they have healthy skin or unhealthy skin. If you did the exact thing we just did, rather than searching for grizzly bear and teddy bear and so forth on Bing, you would search for healthy skin and unhealthy skin. So here's what happens. Remember, in our version we never actually looked at Bing, we just used the Bing image search API, but behind the scenes it's just using the website. If I type healthy skin and say search, I actually discover that the definition of healthy skin is young white women touching their face lovingly. So that's what your healthy skin classifier would learn to detect.

This is a great example from Deb Raji, and you should check out her paper Actionable Auditing for lots of cool insights about model bias. But here's a fascinating example of how, if you weren't looking at your data carefully, you end up with something that doesn't at all actually solve the problem you want to solve.

This is tricky, because if you're building a new product that didn't exist before, by definition you don't have examples of the kind of data that's going to be used in real life. So you try to find some from somewhere, and if you do that through a Google search, it's pretty likely you're not going to end up with a set of data that actually reflects the kind of mix you would see in real life.

So, the main thing
here is to say: be careful. In particular, for your test set, that final set that you check on, really try hard to gather data that reflects the real world. For the healthy skin example, you might go and actually talk to a dermatologist and try to find, say, 10 examples of healthy and unhealthy skin, and that would be your gold standard test. There are all kinds of issues you have to think about in deployment, and I can't cover all of them. I can tell you that this O'Reilly book, Building Machine Learning Powered Applications, is a great resource. One of the reasons we don't go into detail about A/B testing, when we should refresh our data, how we monitor things, and so forth is that that book's already been written, so we don't want to rewrite it. I do want to mention a particular area that I care a lot about, though. Let's take this example: you're rolling out this bear detection system, and it's going to be attached to video cameras around a campsite. It's going to warn campers of incoming bears. If we used a model that was trained with the data we just looked at, those are all very nicely taken pictures of pretty perfect bears. There's really no relationship to the kinds of pictures you're actually going to be dealing with in your campsite bear detector. It's going to have video and not images. It's going to be nighttime.
It's probably going to be low-resolution security cameras. You need to make sure that the performance of the system is fast enough to tell you about the bear before it kills you. There will be bears that are partially obscured by bushes, or in lots of shadow, or whatever, none of which are the kinds of things you would normally see in internet pictures. We call this out-of-domain data. Out-of-domain data refers to a situation where the data that you are trying to do inference on is in some way different from the kind of data that you trained with. There's no perfect way to address this, and when we look at ethics we'll talk about some really helpful ways to minimize how much it happens. For example, it turns out that having a diverse team is a great way to avoid being surprised by the kinds of data that people end up coming up with. But really, it's just something you've got to be super thoughtful about. Very similar to that is something called domain shift. Domain shift is where you start out with all of your data being in-domain, but over time the kinds of data that you're seeing change. Maybe over time raccoons start invading your campsite, and you weren't training on raccoons before; it was just a bear detector. That's called domain shift, and it's another thing you have to be very careful of. Rachel, is there a question? No, I was just going to add to that by saying that all data is biased.
So there's not a form of debiased data, or data that's perfectly representative in all cases. A lot of the proposals around addressing this have been converging on an idea that you see in papers like Timnit Gebru's Datasheets for Datasets: just writing down a lot of the details about your dataset, how it was gathered, in which situations it's appropriate to use, and how it was maintained. It's not that you've totally eliminated bias, but that you're very aware of the attributes of your dataset so that you won't be blindsided by them later. There have been several proposals in that school of thought, which I really like, around this idea of understanding how your data was gathered and what its limitations are. Thanks, Rachel. So a key problem here is that you can't know the entire behavior of your neural network. With normal programming, you typed in the if statements and the loops and whatever, so in theory you know what the hell it does, although it's still sometimes surprising. In this case, you didn't tell it anything. You just gave it examples to learn from and hoped that it learned something useful. There are hundreds of millions of parameters in a lot of these neural networks, so there's no way you can understand how they all combine with each other to create complex behavior. So there's a natural compromise here: we're trying to get sophisticated behavior, like recognizing pictures, behavior sophisticated enough that we can't describe it ourselves. The natural downside is that you can't expect the process the thing is using to be describable in a way that you can understand.
So our recommendation for dealing with these issues is a very careful deployment strategy, which I've summarized in this little chart here. The idea is, first of all, whatever it is that you're going to use the model for, start out by doing it manually. So have a park ranger watching for bears, and have the model running next to them. Each time the park ranger sees a bear, they can check the model and see whether it seems to have picked it up. The model is not doing anything; there's just a person running it and seeing whether it would have made sensible choices. Once you're confident that what it's doing seems reasonable, in as close to the real-life situation as possible, then deploy it in a time- and geography-limited way. Pick one campsite, not the entirety of California, do it for one day, and have somebody watching it super carefully. So now the basic bear detection is being done by the bear detector, but there's still somebody watching it pretty closely, and it's only happening in one campsite for one day. Then you say, okay, we haven't destroyed our company yet, so let's do two campsites for a week, and then let's do the entirety of Marin for a month, and so forth. This is actually what we did when I used to be at a company called Optimal Decisions. Optimal Decisions was a company that I founded to do insurance pricing, and if you change insurance prices by a percent or two in the wrong direction, you can basically destroy the whole company. This has happened many times. Insurers are companies that set prices.
That's basically the product that they provide. So when we deployed new prices for Optimal Decisions, we always did it by saying, okay, we're going to do it for five minutes, for everybody whose name ends with a D. We'd try to find some group which hopefully would be fairly varied, but without too many people in it, and we'd gradually scale it up. You've got to make sure that when you're doing this you have a lot of really good reporting systems in place so that you can recognize: are your customers yelling at you? Are your computers burning up? Are your costs spiraling out of control? And so forth. So it really requires great reporting systems. Does fastai have methods built in that provide for incremental learning, i.e. improving the model slowly over time with a single data point each time? Yeah, that's a great question. This is a little bit different: it's really about dealing with domain shift and similar issues by continuing to train your model as you do inference. The good news is you don't need anything special for that. It's basically just a transfer learning problem. You can do this in many different ways. Probably the easiest is just to say that each night at midnight we're going to set off a task which grabs all of the previous day's transactions as mini-batches and trains another epoch. And that actually works fine.
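The nightly retraining idea can be sketched with a toy model. This is only an illustration under stated assumptions, not a fastai recipe: the one-weight model, the `train_epoch` and `make_day` helpers, the learning rate, and the drifting slope are all hypothetical. It shows yesterday's weights being used as the starting point for one more epoch on today's data, and how that tracks a slowly shifting domain better than a model that is never updated.

```python
# Toy sketch of nightly incremental training (hypothetical names throughout).
# The "model" is a single weight w in y = w * x, trained with plain SGD.

def train_epoch(w, data, lr=0.5):
    """One pass of SGD over (x, y) pairs, starting from weight w."""
    for x, y in data:
        error = w * x - y        # prediction minus target
        w -= lr * error * x      # gradient step (constant factor folded into lr)
    return w

def make_day(slope):
    """One day's 'transactions': the true relationship is y = slope * x."""
    xs = [i / 10 for i in range(11)]
    return [(x, slope * x) for x in xs]

# Initial model, trained to convergence on day-0 data (true slope 1.0).
w_frozen = 0.0
for _ in range(50):
    w_frozen = train_epoch(w_frozen, make_day(1.0))

# Domain shift: the true slope drifts upward by 0.1 per day for ten days.
w_nightly = w_frozen
for day in range(1, 11):
    todays_data = make_day(1.0 + 0.1 * day)
    # "Midnight job": yesterday's model is the starting point, today's
    # data is the fine-tuning set, and we train one more epoch.
    w_nightly = train_epoch(w_nightly, todays_data)

final_slope = 2.0
print(f"frozen model weight:  {w_frozen:.3f}")
print(f"nightly model weight: {w_nightly:.3f}")
print(f"true slope on the last day: {final_slope}")
```

In a real system the single weight would be a full model and each day's data would arrive as mini-batches, but the loop has the same shape.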
You can basically think of this as a fine-tuning approach where your pre-trained model is yesterday's model and your fine-tuning data is today's data. As you roll out your model, one thing to think about super carefully is that it might change the behavior of the system that it's a part of. This can create something called a feedback loop, and feedback loops are one of the most challenging things for real-world model deployment, particularly of machine learning models, because they can take a very minor issue and explode it into a really big issue. For example, think about a predictive policing algorithm. It's an algorithm that was trained on data that says whereabouts arrests are being made. Having trained that algorithm based on where arrests are being made, you put in place a system that sends police officers to places the model says are likely to have crime, which in this case means places where there were arrests. So more police go to that place and find more crime, because the more police there are, the more they'll see, and they arrest more people. Then if you do this incremental learning like we were just talking about, the model is going to say, oh, there's actually even more crime here, and so tomorrow it sends even more police. In that situation, the predictive policing algorithm ends up sending all of your police to one street block, because at that point all of the arrests are happening there, because that's the only place you have policemen. I should say police officers. There's actually a paper about this issue called To Predict and Serve? In it, the authors write this really nice phrase: predictive policing is aptly named.
It is predicting policing, not predicting crime. So if the initial model was perfect, whatever the hell that even means, and somehow sent police to exactly the best places to find crime based on the probability of crimes actually taking place, I guess there's no problem. But as soon as there's any amount of bias, it's different. For example, in the US there are a lot more arrests of black people than of white people, even for crimes that black people and white people are known to commit at the same rate. So in the presence of this bias, or any kind of bias, you're setting off a domino chain of feedback loops where that bias will be exploded over time. One thing I like to think about is: what would happen if this model was just really, really good? Who would be impacted? What would this extreme result look like? How would you know what was really happening? This incredibly predictive algorithm that was changing the behavior of your police officers or whatever, what would that look like? What would actually happen? And then think about what could go wrong, and then what kind of rollout plan, what kind of monitoring systems, what kind of oversight could provide the circuit breaker. Because that's what we really need here. Nothing's going to be perfect. You can't be sure that there are no feedback loops, but what you can do is try to be sure that you see when your system is behaving in a way that's not what you want. Did you have anything to add to that, Rachel?
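The feedback loop described above can be made concrete with a toy simulation. Everything in it is an assumption made up for illustration, not something taken from the paper: two districts with identical true crime rates, a slightly biased starting allocation, and a hypothetical rule in which the retrained model concentrates officers on the apparent hot spot (patrol share grows with the square of recent arrests).

```python
# Toy feedback-loop simulation. Every number and rule here is a made-up
# assumption for illustration only.

def simulate(initial_share_a=0.55, rounds=6, officers=100, arrest_rate=0.1):
    """Two districts, A and B, with the SAME true crime rate.

    Returns the share of officers sent to district A in each round.
    """
    share_a = initial_share_a          # slightly biased starting data
    history = [share_a]
    for _ in range(rounds):
        # Arrests scale with how many officers are present, not with crime,
        # because the underlying crime rate is identical in both districts.
        arrests_a = officers * share_a * arrest_rate
        arrests_b = officers * (1 - share_a) * arrest_rate
        # "Retrain" on the new arrests. Assumption: the model favours the
        # apparent hot spot, so patrol share grows with arrests squared.
        score_a, score_b = arrests_a ** 2, arrests_b ** 2
        share_a = score_a / (score_a + score_b)
        history.append(share_a)
    return history

history = simulate()
for round_number, share in enumerate(history):
    print(f"round {round_number}: {share:.1%} of officers sent to district A")
```

With an unbiased start (`simulate(initial_share_a=0.5)`) the split stays at 50/50; with a small initial bias, the same rule sends nearly every officer to one district within a few rounds, even though nothing about the underlying crime rate differs.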
All I would add to that is that you're at risk of potentially having a feedback loop any time your model is controlling what your next round of data looks like, and I think that's true for pretty much all products. That can be a hard jump for people coming from a science background, where you may be thinking of data as something you have just observed from some sort of experiment. Whenever you're building something that interacts with the real world, you are now also controlling what your future data looks like, based on the behavior of your algorithm on the current round of data. Right. So given that you probably can't avoid feedback loops, the thing you need to really invest in is the human in the loop. A lot of people like to focus on automating things, which I find weird. If you can decrease the amount of human involvement by something like 90 percent, you've got almost all of the economic upside of automating it completely, but you still have the room to put human circuit breakers in place. You need these appeals processes. You need the monitoring. You need humans involved to go, hey, that's weird, I don't think that's what we want. Okay. Yes, Rachel? And just one more note about that.
Those humans, though, do need to be well integrated with product and engineering. One issue that comes up is that in many companies this ends up sitting underneath trust and safety, which handles a lot of the issues with how things can go wrong or how your platform can be abused. Often trust and safety is pretty siloed away from product and eng, which actually have control over the decisions that really end up influencing these outcomes. And the engineers probably consider them to be pretty annoying a lot of the time, getting in the way of them getting software out the door. Yeah, but the more integration you can have between those, I think, the more helpful it is for the people building the product to see what is going wrong and what can go wrong. If the engineers are actually on top of that, they're actually seeing these things happening, and it's not some kind of abstract problem anymore. So at this point, now that we've got to the end of chapter two, you actually know a lot more than most people about deep learning, and about some pretty important foundations of machine learning more generally and of data products more generally. So now's a great time to think about writing. Sometimes we have formatted text that doesn't quite format correctly in the Jupyter notebook, by the way; it only formats correctly in the book. That's what it means when you see this kind of preformatted text. So the idea here is to think about starting writing at this point, before you go too much further. Rachel, there's a question? Oh, okay, let's hit the question. The question is: I assume there are fastai-type ways of keeping a nightly updated transfer learning setup. Could one of the fastai version 4 notebooks have an example of the nightly transfer learning training?
Like the previous person asked, I would be interested in knowing how to do that most effectively with fastai. Sure. I guess my view is there's nothing fastai-specific about that at all, so I suggest you read Emmanuel's book, the one I showed you, to understand the ideas. If people are interested in this, I can also point you at some academic research about it. There's not as much as there should be, but there is some good work in this area. Okay, so the reason we mention writing at this point in our journey is that things are going to start to get more and more heavy, more and more complicated, and a really good way to make sure that you're on top of it is to try to write down what you've learned. Sorry, I wasn't sharing the right part of the screen before, but this is what I was describing in terms of the preformatted text which doesn't look correct. So Rachel actually has this great article that you should check out, called Why You Should Blog, and I'll say it in place of her saying it, because I have it in front of me and she doesn't, weird as that is. Rachel says that the top advice she would give her younger self is to start blogging sooner. Rachel has a math PhD, and this idea of blogging was not exactly something I think they had a lot of in the PhD program. But it's actually a really great way of finding jobs. In fact, most of my students who have got the best jobs are students who have good blog posts. The thing I really love is that it helps you learn: writing things down synthesizes your ideas. There are lots of reasons to blog. So there's actually something really cool I want to show you. Yeah, I was also just going to note that
I have a second post called Advice for Better Blog Posts that's a little bit more advanced, which I'll post a link to as well. It talks about some common pitfalls that I've seen in many blog posts, the importance of putting the time in to do it well, and some things to think about. So I'll share that post as well. Thanks, Rachel. So one reason that people sometimes don't blog is that it's kind of annoying to figure out how to, particularly because I think the thing that a lot of you will want to blog about is cool stuff that you're building in Jupyter notebooks. So we've teamed up with a guy called Hamel Husain, and with GitHub, to create this free product (as usual with fastai, no ads, no anything) called fastpages, where you can actually blog with Jupyter notebooks. You can go to fastpages and see for yourself how to do it, but the basic idea is that you literally click one button, it sets up a blog for you, and then you dump your notebooks into a folder called _notebooks and they get turned into blog posts. It's basically like magic, and Hamel's done an amazing job of this. This means that you can create blog posts with charts and tables and images, where they're all actually the output of a Jupyter notebook, along with all the Markdown-formatted text, headings and so forth, and hyperlinks and the whole thing. So this is a great way to start writing about what you're learning here. Something that Rachel and I both feel strongly about when it comes to blogging is this: don't try to think about the absolute most advanced thing you know and write a blog post that would impress Geoff Hinton. Most people are not Geoff Hinton. So (a) you probably won't do a good job, because you're trying to blog for somebody who's got more expertise than you, and (b) you've got a small audience now, right?
Actually, there are far more people who are not very familiar with deep learning than people who are. And you really understand what it was like six months ago to be you, because you were there six months ago. So try to write something that the six-months-ago version of you would have found super interesting, full of little tidbits that would have delighted that six-months-ago version of you. Okay, so once again, don't move on until you've had a go at the questionnaire, to make sure that you understand the key things we think you need to understand. And have a think about these further research questions as well, because they might help you to engage more closely with the material. So let's have a break and we'll come back in five minutes' time. So welcome back, everybody. This is an interesting moment in the course, because we're jumping from a part of the course which is very heavily about the structure of what we're trying to do with machine learning: what are the pieces, and what do we need to know to make everything work together? There was a bit of code but not masses.
There was basically no math. We wanted to put that at the start for everybody who wants an understanding of these issues without necessarily wanting to dive deep into the code and the math themselves. And now we're getting into the diving deep part. If you're not interested in diving deep yourself, you might want to skip to the next lesson, about ethics, which rounds out the slightly less technical material. What we're going to look at here is what we now think of as a toy problem, but which just a few years ago was considered a pretty challenging problem: recognizing handwritten digits. We're going to try to do it from scratch, and we're going to look at a number of different ways to do it. So we're going to have a look at a dataset called MNIST. If you've done any machine learning before, you may well have come across MNIST. It contains handwritten digits, and it was collated into a machine learning dataset by a guy called Yann LeCun and some colleagues. They used it to demonstrate one of the first, probably the first, computer systems to provide really practically useful, scalable recognition of handwritten digits: LeNet-5. That system was actually used to automatically process something like 10% of the checks in the US. One of the things that really helps, I think, when building a new model is to start with something simple and gradually scale it up.
So we've created an even simpler version of MNIST, which we call MNIST_SAMPLE, which only has threes and sevens. This is a good starting point to make sure that we can do something easy. I picked threes and sevens for MNIST_SAMPLE because they're very different, so I feel like if we can't do this, we're going to have trouble recognizing every digit. Step one is to call untar_data. untar_data is the fastai function which takes a URL, checks whether you've already downloaded it and downloads it if you haven't, checks whether you've already uncompressed it and uncompresses it if you haven't, and then finally returns the path of where that ended up. You can see here URLs.MNIST_SAMPLE (you can just hit tab to get autocomplete) is just some location; it doesn't really matter where it is. When we call that, I've already downloaded it and already uncompressed it, because I've already run this once before, so it happens straight away, and path shows me where it is. Now in this case path is dot, and the reason path is dot is that I've used this special BASE_PATH attribute of Path to tell it where my starting point is, and that's used when printing. So when I call ls here, which prints a list of files, these are all relative to where I actually untarred this to. It just makes it a lot easier not to have to see the whole set of parent path folders. So what is path? Let's see what type it is. It's a pathlib Path object. pathlib is part of the Python standard library, and it's a really, really nice library, but it doesn't actually have ls. Where there are libraries that we find super helpful but that don't have exactly the things we want, we liberally add the things we want to them.
So we add ls. If you want to find out what ls is, as we've mentioned, there are a few ways you can do it. You can pop a question mark there, and that will show you where it comes from. There's a library called fastcore, which holds a lot of the foundational stuff in fastai that is not dependent on PyTorch or pandas or any of these big heavy libraries. So this is part of fastcore, and if you want to see exactly what it does, remember you can put in a second question mark to get the source code. As you can see, there's not much source code to it. And maybe most importantly, please don't forget about doc, because that gives you this Show in docs link, which you can click on to get to the documentation to see examples, pictures if relevant, tutorials, tests, and so forth. So when I'm looking at a new dataset, I always start with just ls to see what's in it. I can see here there's a train folder and there's a valid folder. That's pretty normal. So let's look at ls on the train folder, and it's got a folder called 7 and a folder called 3. This is looking quite a lot like our bear classifier dataset, where we downloaded each set of images into a folder based on what its label was. This is doing it at another level, though: the first level of the folder hierarchy is whether it's training or valid, and the second level is, what's the label?
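The two-level folder layout described above can be explored with nothing but the standard library. The following is a minimal sketch, not fastcore's implementation: the `ls` helper is a hypothetical stand-in for the method fastai adds to `Path`, and the directory tree is built from empty placeholder files in a temporary folder so the example runs without downloading the dataset.

```python
import tempfile
from pathlib import Path

# Hypothetical stand-in for the dataset: build an empty train/valid tree
# in a temporary directory so the example runs without any download.
root = Path(tempfile.mkdtemp())
for split in ("train", "valid"):
    for label in ("3", "7"):
        folder = root / split / label
        folder.mkdir(parents=True)
        for i in range(2):
            (folder / f"{i}.png").touch()   # empty placeholder files

def ls(path):
    """Minimal stdlib take on the idea of ls: list a folder's contents.

    Sorted here only so the printed output is reproducible.
    """
    return sorted(path.iterdir())

# First level of the hierarchy: which split? Second level: which label?
splits = [p.name for p in ls(root)]
train_labels = [p.name for p in ls(root / "train")]
threes = ls(root / "train" / "3")

print(splits)
print(train_labels)
print([p.name for p in threes])
```

Because the label is encoded in the folder name, the label of any image file can be read from `file.parent.name` and its split from `file.parent.parent.name`.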
And this is the most common way for image datasets to be distributed. So let's have a look. Let's create something called threes that contains all of the contents of the 3 directory in training, and let's sort them so that this is consistent. We do the same for sevens, and let's look at the threes. You can see they're just numbered. All right, so let's grab one of those, open it, and take a look. Okay, so there's the picture of a three. So what is that really? Well, not three, im3. PIL is the Python Imaging Library. It's the most popular library by far for working with images in Python. And it's a PNG, not surprisingly. Jupyter notebook knows how to display many different types, and if you create a new type you can tell it how to display your type. PIL comes with something that will automatically display the image like so. What I want to do here, though, is look at how we're going to treat this as numbers. One easy way to treat things as numbers is to turn it into an array. array is part of NumPy, which is the most popular array programming library for Python. If we pass our PIL image object to array, it just converts the image into a bunch of numbers. And the truth is, it was a bunch of numbers the whole time. It was actually stored as a bunch of numbers on disk; it's just that there's this magic thing in Jupyter that knows how to display those numbers on the screen. When we say array, turning it into a NumPy array, we're removing this ability for Jupyter notebook to know how to display it like a picture. Once I do this, we can then index into that array and grab all the rows from 4 up to but not including 10, and all the columns from 4 up to but not including 10. And here are some numbers, and they are 8-bit unsigned integers.
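The slicing step just described can be tried on a made-up image. This is a small sketch assuming NumPy is installed; the hand-built `img` array is hypothetical and stands in for what converting a real PIL image to an array would give.

```python
import numpy as np

# Hypothetical 28x28 grayscale image; a real one would come from converting
# an opened PIL image to an array rather than being built by hand.
img = np.zeros((28, 28), dtype=np.uint8)
img[5:9, 6:9] = 255          # paint a small block of "ink"

patch = img[4:10, 4:10]      # rows 4..9 and columns 4..9 (10 is excluded)
print(patch)
print(patch.shape, patch.dtype, patch.min(), patch.max())
```

The slice is a 6 by 6 window, and its dtype shows the 8-bit unsigned integer storage.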
So they are between 0 and 255. An image, just like everything on a computer, is just a bunch of numbers, and therefore we can compute with it. We could do the same thing, but instead of saying array we could say tensor. A tensor is basically the PyTorch version of a NumPy array. You can see it's exactly the same code as above, but I've just replaced array with tensor, and the output looks almost exactly the same, except it replaces array with tensor. You'll see that a PyTorch tensor and a NumPy array behave nearly identically much, if not most, of the time. But the key thing is that a PyTorch tensor can also be computed on a GPU, not just a CPU. So in our work, in the book, in the notebooks, and in our code, we tend to use PyTorch tensors much more often than NumPy arrays, because they have nearly all the benefits of NumPy arrays plus all the benefits of GPU computation, and they've got a whole lot of extra functionality as well. A lot of people who have used Python for a long time always jump into NumPy because that's what they're used to. If that's you, you might want to start considering jumping into tensors: wherever you used to write array, start writing tensor and just see what happens, because you might be surprised at how many things you can speed up or do more easily. So let's grab that three image and turn it into a tensor, and so that's going to be the three image tensor.
That's why I've got im3_t. Let's grab a bit of it and turn it into a pandas DataFrame. The only reason I'm turning it into a pandas DataFrame is that pandas has a very convenient thing called background_gradient that turns the background into a gradient, as you can see. So here is the top bit of the three. You can see that the zeros are the whites and the numbers near 255 are the blacks, and there are some bits in the middle which are gray. So here we can see what's going on when our images, which are numbers, actually get displayed on the screen. It's just doing this. I'm just showing a subset here; the actual full image in MNIST is a 28 by 28 pixel square, so that's 784 pixels. That's super tiny. My mobile phone, I don't know how many megapixels it is, but it's millions of pixels. So it's nice to start with something simple and small. Okay, so here's our goal: create a model, and by model I mean some kind of computer program learned from data, that can recognize threes versus sevens. You could think of it as a three detector: is it a three? Because if it's not a three, it's a seven. So have a stop here, pause the video, and have a think. How would you do it? You don't need to know anything about neural networks or anything else. How might you, just with common sense, build a three detector? Okay, so I hope you grabbed a piece of paper and a pen and jotted some notes down. I'll tell you the first idea that came into my head: what if we grab every single three in the dataset and take the average of the pixels? So what's the average of
this pixel, the average of this pixel, the average of this pixel, and so on? There'll be a 28 by 28 picture which is the average of all of the threes, and that would be like the ideal three. Then we'll do the same for sevens. When we then grab something from the validation set to classify, we'll ask: is this image closer to the ideal three, the mean of the threes, or to the ideal seven? That's my idea, and I'm going to call it the pixel similarity approach. I'm describing this as a baseline. A baseline is a super simple model that should be pretty easy to program from scratch with very little magic. Maybe it's just a bunch of simple averages, simple arithmetic, which you're super confident is going to be better than a random model. One of the biggest mistakes I see, even in experienced practitioners, is that they fail to create a baseline. They build some fancy Bayesian model or some fancy neural network and they go, wow, Jeremy, look at my amazingly great model. And I'll say, how do you know it's amazingly great? And they'll say, oh, look, the accuracy is 80 percent. And then I'll say, okay, let's see what happens if we create a model where we always predict the mean. Oh look, that's 85 percent. People get pretty disheartened when they discover this. So make sure you start with a reasonable baseline and then gradually build on top of it. So we need to get the average of the pixels, and we're going to learn some nice Python programming tricks to do this. The first thing we need is a list of all of the sevens.
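The baseline check in that anecdote takes only a few lines. This sketch makes two assumptions: the labels are made up (an imbalanced set that is 85% one class), and "always predict the mean" is swapped for its classification counterpart, always predicting the most common label. The 80 percent figure just echoes the anecdote.

```python
from collections import Counter

# Hypothetical validation labels: an imbalanced set where 85% are "7".
labels = ["7"] * 85 + ["3"] * 15

# Baseline: ignore the input entirely and always predict the most common label.
most_common_label, count = Counter(labels).most_common(1)[0]
baseline_accuracy = count / len(labels)

fancy_model_accuracy = 0.80   # made-up number echoing the anecdote

print(f"always predict {most_common_label!r}: {baseline_accuracy:.0%}")
print(f"fancy model beats baseline: {fancy_model_accuracy > baseline_accuracy}")
```

An accuracy number only means something relative to a baseline like this one, which is why it is worth computing before building anything elaborate.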
So remember, we've got sevens, which is just a list of file names. For each of those file names in sevens, let's Image.open that file, just like we did before, to get a PIL object, and let's convert that into a tensor. This thing here is called a list comprehension. If you haven't seen this before, it's one of the most powerful and useful tools in Python. If you've done something with C#, it's a little bit like LINQ. It's not as powerful as LINQ, but it's a similar idea. If you've done some functional programming in JavaScript, it's a bit like some of the things you can do with that too. Basically, we're just going to go through this collection, each item will be called o, and then it will be passed to this function, which opens it up and turns it into a tensor. Then it will all be collated back into a list, and so this will be all of the sevens as tensors. Sylvain and I use list and dictionary comprehensions every day, so you should definitely spend some time checking them out if you haven't already. So now that we've got a list of all of the threes as tensors, let's grab one of them and display it. Remember, this is a tensor, not a PIL image object, so Jupyter doesn't know how to display it. We have to use a command to display it, and show_image is a fastai command that displays a tensor. And so here is our three. We need to get the average of all of those threes. To get the average, the first thing we need to do is change this so it's not a list but a tensor itself. Currently three_tensors[1] has a shape which is 28 by 28; that's rows by columns, the size of this thing. But three_tensors itself is just a list, so I can't easily do mathematical computations on it. What we could do is stack all of these 28 by 28 images on top of each other to create a kind of 3D cube of images, and that's still called a tensor. So a tensor can have as many
of these axes or dimensions as you like. And to stack them up you use, funnily enough, stack. So this is going to turn the list into a tensor, and as you can see the shape of it is now 6131 by 28 by 28. So it's kind of like a cube of height 6131 by 28 by 28. The other thing we want to do, if we're going to take the mean, is turn them into floating point values, because we don't want to have integers rounding off. The other thing to know is that it's kind of a standard in computer vision that when you're working with floats you expect them to be between zero and one. So we just divide by 255, because they were between zero and 255 before. So this is a pretty standard way to represent a bunch of images in PyTorch. These three things here are called the axes: first axis, second axis, third axis. And overall we would say that this is a rank three tensor, because it has three axes. This one here was a rank two tensor; it just has two axes. So you can get the rank of a tensor by just taking the length of its shape: one, two, three. Three, okay. I've been using the word axis; you can also use the word dimension. I think NumPy tends to call it axis, PyTorch tends to call it dimension. So the rank is also the number of dimensions, ndim. So you need to make sure that you remember this: rank is the number of axes or dimensions in a tensor, and the shape is a list containing the size of each axis in a tensor. So we can now say stacked threes dot mean. Now if we just say stacked threes dot mean, that returns a single number. That's the average pixel across that whole cube, that whole rank three tensor. But if we say mean zero, that is take the mean over this axis.
So that's the mean across the images, right? And so that's now 28 by 28 again, because we kind of reduced over this 6131 axis. We took the mean across that axis. And so we can show that image, and here is our ideal three. And here's the ideal seven, using the same approach. All right, so now let's just grab a three, just any old three, here it is. And what I'm going to do is say, well, is this three more similar to the perfect three, or is it more similar to the perfect seven? And whichever one it's more similar to, I'm going to assume that's the answer. So we can't just look at each pixel and say, what's the difference between this pixel, you know, zero zero here and zero zero here, and then zero one here and zero one here, and take the average. The reason we can't just take the average is that there are positives and negatives, and they're going to average out to nothing, right? So I actually need them all to be positive numbers. There are two ways to make them all positive. I could take the absolute value, which simply means remove the minus signs, and then take the average of those. That's called the mean absolute difference, or L1 norm. Or I could take the square of each difference, then take the mean of that, and then at the end take the square root, which kind of undoes the squaring. That's called the root mean squared error, or L2. So let's have a look. Let's take a three, subtract from it the mean of the threes, take the absolute value, and take the mean, and call that the distance, using absolute value, of the three to a three. And that there is the number 0.1, right? So this is the mean absolute difference, or L1 norm. When you see a word like L1 norm, if you haven't seen it before it may sound pretty fancy. But all these math terms that we see, you know, you can turn them into a tiny bit of code, right?
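To make the stacking, averaging and two distance measures concrete, here is a rough sketch, not the notebook's actual cells. The three tiny 2 by 2 "images" are made-up stand-ins for the 6131 real 28 by 28 digits, and the variable names are only illustrative.

```python
import torch

# Hypothetical stand-in data: three tiny 2x2 "images" instead of 6131 28x28 digits.
imgs = [torch.tensor([[0, 255], [255, 0]]),
        torch.tensor([[0, 255], [0, 0]]),
        torch.tensor([[0, 255], [255, 255]])]

stacked = torch.stack(imgs).float() / 255   # rank-3 tensor with values between 0 and 1
rank = len(stacked.shape)                   # number of axes, same as stacked.ndim
mean_img = stacked.mean(0)                  # mean over the image axis, leaving a 2x2 "ideal" image

a = stacked[1]                                    # any old image from the stack
dist_abs = (a - mean_img).abs().mean()            # mean absolute difference (L1)
dist_sqr = ((a - mean_img) ** 2).mean().sqrt()    # root mean squared error (L2)
print(stacked.shape, rank, dist_abs, dist_sqr)
```

With real data the only change would be the list of images; the stack, mean over axis zero, and the two one-line distances stay the same.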
You know, don't let the mathy bits fool you. Often in code it's just very obvious what they mean, whereas with math you just have to learn it, or learn how to Google it. So here's the same version for squaring: take the difference, square it, take the mean, and then take the square root. Okay. So then we'll do the same thing for our three, and this time we'll compare it to the mean of the sevens. All right, so the distance from a three to the mean of the threes in terms of absolute value was 0.1, and the distance from a three to the mean of the sevens was 0.15. So it's closer to the mean of the threes than it is to the mean of the sevens. So we guess, therefore, that this is a three, based on the mean absolute difference. Same thing with RMSE, root mean squared error: we compare this value with this value, and again with root mean squared error it's closer to the mean three than to the mean seven. So this is like a machine learning model, kind of. It's a data-driven model which attempts to recognize threes versus sevens. And so this is a good baseline. I mean, it's a reasonable baseline; it's going to be better than random. We don't actually have to write out minus, abs, mean. We can just use l1_loss; l1_loss does exactly that. We don't have to write minus, squared. We can just write mse_loss. That doesn't do the square root by default.
So we have to pop that in okay, and as you can see they're exactly The same numbers um It's very important before we kind of go too much further to make sure we're very comfortable Working with arrays and tensors and you know, they're they're so similar so We could start with a list of lists Right, which is kind of a matrix We can convert it into an array Or into a tensor We can display it And they look almost the same You can index into a single row You can index into a single column And so it's important to know this is very important colon means Um every row because I put it in the first spot, right? So if I put it in the second spot It would mean every column and so therefore comma colon is exactly the same as removing it So it just turns out you can always remove Colons that are at the end because they're kind of they're just implied right? You never have to and I often kind of put them in anyway Because just kind of makes it a bit more obvious how these things kind of match up or how they differ Um, you can combine them together. So give me the first row And everything from the first up to but not including the third column Right, so there's that five six Um, you can add stuff to them. You can check their type. 
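The indexing rules just described can be tried on a small example. This is a sketch with made-up numbers (chosen so the last slice gives the "five six" mentioned above), not the notebook's own cell.

```python
import torch

data = [[1, 2, 3], [4, 5, 6]]   # a list of lists, which is kind of a matrix
tns = torch.tensor(data)

row = tns[1]           # a single row: tensor([4, 5, 6])
col = tns[:, 1]        # colon in the first spot means every row: tensor([2, 5])
same_row = tns[1, :]   # a trailing colon is implied, so this equals tns[1]
part = tns[1, 1:3]     # row 1, columns 1 up to but not including 3: tensor([5, 6])
plus_one = tns + 1     # adding a number applies to every element
print(row, col, part, plus_one)
```

The same indexing expressions work on a NumPy array built from the same list of lists.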
Notice that this is different to the Python type. So type is a function; it just tells you it's a tensor. If you want to know what kind of tensor, you have to use type as a method. So it's a long tensor. You can multiply them by a float, which turns it into a float. So have a fiddle around. If you haven't done much stuff with NumPy or PyTorch before, this is a good opportunity to just go crazy: try things out, try things that you think might not work and see if you actually get an error message, you know. So we now want to find out how good our model is, our model that involves just comparing something to the mean. So we should not check how good our model is on the training set, as we've discussed. We should check it on a validation set, and we already have a validation set: it's everything inside the valid directory. So let's go ahead and combine all those steps from before. Let's go through everything in the validation set's 3 folder, ls, open them, turn them into tensors, stack them all up, turn them into floats, divide by 255. Okay. Let's do the same for sevens. So we're just putting all the steps we did before into a couple of lines. I always try to print out shapes, like all the time, because if a shape is not what you expected then you can get weird things going on. So the idea is we want some function is_3 that will return true if we think something is a three. To do that we have to decide whether the digit that we're testing is closer to the ideal three or the ideal seven. So let's create a little function that takes the difference between two things, takes the absolute value, and then takes the mean. So we're going to create this function mnist_distance that takes the difference between two tensors, takes their absolute value, and then takes the mean. And it takes the mean, and look at this.
We've got minus this time. It takes the mean over the last, sorry, the last and second last dimensions. So this is going to take the mean across the kind of x and y axes. And so here you can see it's returning a single number, which is the distance of a three from the mean three. So that's the same as the value that we got earlier, 0.1114. So we need to do this for every image in the validation set, because we're trying to find the overall metric. Remember, the metric is the thing we look at to say how good our model is. So here's something crazy. We can call mnist_distance not just on our three, but on the entire validation set, against the mean three. That's wild. There's no normal programming that we would do where we could somehow pass in either a matrix or a rank three tensor and somehow it works both times. And what actually happened here is that instead of returning a single number, it returned 1,010 numbers. And it did this because it used something called broadcasting. Broadcasting is like the super special magic trick that lets you make Python into a very, very high performance language. And in fact, if you do this broadcasting on GPU tensors in PyTorch, it actually does this operation on the GPU, even though you wrote it in Python. Here's what happens. Look here, this a minus b. So we're doing a minus b on two things.
We've got, first of all, valid_3_tens, the valid three tensor, which is a thousand or so images, right? And remember that mean3 is just our single ideal three. So what is something of this shape minus something of this shape? Well, broadcasting means that if this shape doesn't match this shape... like, if they did match, it would just subtract every corresponding item. But because they don't match, it actually acts as if there's a thousand and ten versions of this. So it's actually going to subtract this from every single one of these, right? So broadcasting, let's look at some examples. Broadcasting requires us to first of all understand the idea of elementwise operations. This is an elementwise operation. Here is a rank one tensor of size three, and another rank one tensor of size three. So we would say these sizes match; they're the same. And so when I add one two three to one one one, I get back two three four. It just takes the corresponding items and adds them together. So that's called an elementwise operation. So when I have different shapes, as we described before, what it ends up doing is it basically copies this a thousand and ten times, and it acts as if we had said valid_3_tens minus a thousand and ten copies of mean3. As it says here, it doesn't actually copy mean3 a thousand and ten times. It just pretends that it did, right? It just acts as if it did. So basically it kind of loops back around to the start again and again, and it does the whole thing in C, or in CUDA on the GPU. So then we see absolute value. All right, so let's go back up here. After we do the minus, we go absolute value. So what happens when we call absolute value on something of size 1,010 by 28 by 28? It just calls absolute value on each underlying thing, right? And then finally we call mean. Minus one is always the last element in Python; minus two is the second last. So this is taking the mean over the last two axes, and so then it's going to return just the first axis.
So we're going to end up with a thousand and ten means, a thousand and ten distances, which is exactly what we want. We want to know how far away each of our validation items is from the ideal three. So then we can create our is_3 function, which is: hey, is the distance between the number in question and the perfect three less than the distance between the number in question and the perfect seven? If it is, it's a three, right? So our three, that actual three we had, is it a three? Yes. Okay, and then we can turn that into a float, and yes becomes one. Thanks to broadcasting, we can do it for that entire set, right? So this is so cool. We basically get rid of loops. In this kind of programming you should have very, very few loops. Loops make things much harder to read, and hundreds of thousands of times slower on the GPU, potentially tens of millions of times slower. So we can just say is_3 on our whole valid_3_tens, and then turn that into float, and then take the mean. So that's going to be the accuracy of the threes on average. And here's the accuracy of the sevens; it's just one minus that. So the accuracy across threes is about 91 and a bit percent, the accuracy on sevens is about 98 percent, and the average of those two is about 95 percent. So here we have a model that's 95 percent accurate at recognizing threes from sevens. It might surprise you that we can do that using nothing but arithmetic, right? So that's what I mean by getting a good baseline. Now the thing is, it's not obvious how we improve this, right? I mean, the thing is, it doesn't match Arthur Samuel's description of machine learning, right?
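Here is a small sketch of the broadcasting baseline described above. The function names mnist_distance and is_3 follow the lecture, but passing the two ideal images in as arguments, and the tiny 2 by 2 made-up data, are assumptions made so the example runs on its own.

```python
import torch

def mnist_distance(a, b):
    # mean absolute difference over the last two axes (rows and columns)
    return (a - b).abs().mean((-1, -2))

def is_3(x, mean3, mean7):
    # True wherever x is closer to the ideal three than to the ideal seven
    return mnist_distance(x, mean3) < mnist_distance(x, mean7)

# Hypothetical stand-ins: 2x2 "ideal" digits and a batch of three validation images.
mean3 = torch.tensor([[1.0, 0.0], [1.0, 0.0]])
mean7 = torch.tensor([[0.0, 1.0], [0.0, 1.0]])
valid_3_tens = torch.tensor([[[1.0, 0.0], [1.0, 0.0]],
                             [[0.9, 0.1], [0.8, 0.0]],
                             [[0.1, 0.9], [0.2, 1.0]]])   # the last one looks more like mean7

dists = mnist_distance(valid_3_tens, mean3)   # mean3 is broadcast across the batch: one distance per image
preds = is_3(valid_3_tens, mean3, mean7)      # one boolean per image, no loop written
accuracy = preds.float().mean()               # fraction of the batch classified as a three
print(dists, preds, accuracy)
```

The same two functions accept a single image or a whole stack, which is the point being made about broadcasting.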
This is not something where there's a function which has some parameters, which we're testing against some kind of measure of fitness, and then using that to improve the parameters iteratively. We just did one step, and that's that, right? So we want to try and do it in this way, where we arrange for some automatic means of testing the effectiveness of, he called it a weight assignment, we'd call it a parameter assignment, in terms of performance, and a mechanism for altering the weight assignment to maximize the performance. We want to do it that way, right? Because we know from chapter one, from lesson one, that if we do it that way we have this, like, magic box called machine learning that, particularly combined with neural nets, should be able to solve any problem in theory, if you can at least find the right set of weights. So we need something that we can get better and better at, to learn. So let's think about a function which has parameters. So instead of finding an ideal image and seeing how far away something is from the ideal image, what we could instead do is come up with a set of weights for each pixel. So we're trying to find out if something is the number three. And so we know that in the places where you would expect to find the pixels of a three, you could give those high weights. So you can say, hey, if there's a dot in those places, we give it a high score, and if there are dots in other places, we'll give it a low score. So we can actually come up with a function where the probability of something being, well, in this case let's say an eight, is equal to the pixels in the image multiplied by some set of weights, and then we sum them up, right? So then anywhere where the image we're looking at has pixels where there are high weights, it's going to end up with a high probability.
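As a toy illustration of "pixels multiplied by weights, then summed", here is a sketch in plain Python. The four-pixel images and the hand-picked weights are invented for the example, and the result is a raw score rather than a calibrated probability.

```python
def score(pixels, weights):
    # multiply each pixel by its weight, then sum everything up
    return sum(p * w for p, w in zip(pixels, weights))

weights = [0.9, -0.5, 0.8, -0.4]           # hypothetical: high where ink is expected, low elsewhere
ink_where_expected = [1.0, 0.0, 1.0, 0.0]  # dots in the high-weight places
ink_elsewhere = [0.0, 1.0, 0.0, 1.0]       # dots in the low-weight places

high = score(ink_where_expected, weights)
low = score(ink_elsewhere, weights)
print(high, low)
```

An image with ink where the weights are high gets the larger score, which is all the function is meant to capture at this stage.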
So here x is the image that we're interested in, and we're just going to represent it as a vector. So let's just have all the rows stacked up end to end into a single long line. So we're going to use an approach where we're going to start with a vector w. A vector is a rank one tensor, okay? We're going to start with a vector w that's going to contain random weights, or random parameters, depending on whether you use the Arthur Samuel version of the terminology or not. And so we'll then predict whether a number appears to be a three or a seven by using this tiny little function. And then we will figure out how good the model is, so we will calculate how accurate it is or something like that. This is the loss. And then the key step is we're then going to calculate the gradient. Now the gradient is something that measures, for each weight, if I made it a little bit bigger, would the loss get better or worse? If I made it a little bit smaller, would the loss get better or worse? And so if we do that for every weight, we can decide for every weight whether we should make that weight a bit bigger or a bit smaller. So that's called the gradient, right? So once we have the gradient we then step, is the word. We change all the weights up a little bit for the ones where the gradient said we should make them a bit higher, and down a little bit for all the ones where the gradient said they should be a bit lower. So now it should be a tiny bit better. And then we go back to step two and calculate a new set of predictions using this formula, calculate the gradient again, step the weights, and keep doing that. So this is basically the flow chart. And then at some point, when we're sick of waiting or when the loss gets good enough, we'll stop. So these seven steps, one two three four five six seven, these seven steps are the key to training all deep learning models. This technique is called stochastic gradient descent. Well, it's called gradient descent.
We'll see the stochastic bit very soon. And for each of these seven steps there are lots of choices around exactly how to do it, right? We've just kind of hand-waved a lot: what kind of random initialization, and how do you calculate the gradient, and exactly what step do you take based on the gradient, and how do you decide when to stop, blah blah blah, right? So in this course we're going to be learning about these steps. That's kind of part one. The other big part is, well, what's the actual function, the neural network? So how do we train the thing, and what is the thing that we train? So we initialize parameters with random values. We need some function that's going to be the loss function, that will return a number that's small if the performance of the model is good. We need some way to figure out whether the weights should be increased a bit or decreased a bit. And then we need to decide when to stop, for which we'll just say let's do a certain number of epochs. So let's go even simpler, right? We're not even going to do MNIST. We're going to start with this function, x squared. Okay, and in fastai.
Yeah, we've created a tiny little thing called plot function that plots the function Um All right, so there's our function f And what we're going to do is we're going to try to find this is our loss function So we're going to try and find The bottom point right so we're going to try and figure out what is the x value Which is at the bottom so our seven step procedure requires us to start out by initializing So we need to pick Some value right so the value we pick with this say oh, let's just randomly pick minus one and a half Great So now we need to know If I increase x a bit does my remember this is my loss does my loss get a bit better I remember better is smaller or a bit worse So we can do that easily enough We can just try a slightly higher x and a slightly lower x and see what happens Right and you can see it's just the slope Right the slope at this point Tells you that if I increase x by a bit Then my loss will decrease because that is the slope at this point So if we change Our our weight our parameter Just a little bit in the direction of the slope Right, so here is the direction of the slope and so here's the new value at that point Right and then do it again And then do it again Eventually we'll get to the bottom of this curve Right So this idea goes all the way back to Isaac Newton at the very least and this basic idea is called Newton's method So a key thing we need to be able to do is to calculate this slope and The bad news is Do that we need calculus At least that's bad news for me because I've never been a fan of calculus. 
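The "try a slightly higher x and a slightly lower x" idea can be written out directly. This is a sketch in plain Python; the starting value of minus one and a half comes from the lecture, while the step size of 0.1 and the 50 repetitions are arbitrary choices for illustration.

```python
def f(x):
    return x ** 2   # the loss function from the lecture

def slope(f, x, eps=1e-6):
    # rise over run, measured with a slightly higher and a slightly lower x
    return (f(x + eps) - f(x - eps)) / (2 * eps)

x = -1.5    # the starting guess used in the lecture
lr = 0.1    # hypothetical step size
for _ in range(50):
    x -= lr * slope(f, x)   # step against the slope so the loss gets smaller

print(x, f(x))   # x ends up very close to 0, the bottom of the curve
```

At x = -1.5 the slope is negative, so subtracting it moves x to the right, toward the bottom; each repetition shrinks the distance to zero by the same factor.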
We have to calculate the derivative. Here's the good news though. Maybe you spent ages in school learning how to calculate derivatives. You don't have to anymore; the computer does it for you, and the computer does it fast. It uses all of those methods that you learned at school, and it has a whole lot more clever tricks for speeding them up, and it just does it all automatically. So for example, it knows, I don't know if you remember this from high school, that the derivative of x squared is 2x. It's just something it knows; it's part of its bag of tricks, right? So PyTorch knows that. PyTorch has an engine built in that can take derivatives and find the gradient of functions. So to do that we start with a tensor, let's say, and in this case we're going to modify this tensor with this special method called requires_grad_. What this does is tell PyTorch that any time I do a calculation with this xt, it should remember what calculation it does, so that I can take the derivative later. Do you see the underscore at the end? An underscore at the end of a method in PyTorch means that this is what's called an in-place operation. It actually modifies this. So requires_grad_ modifies this tensor to tell PyTorch that we want to be calculating gradients on it. So that means it's just going to have to keep track of all of the computations we do, so that it can calculate the derivative later. Okay, so we've got the number three, and let's say we then call f on it. Remember, f is just squaring it, so three squared is nine. But the value is not just nine.
It's nine accompanied by a grad function, which is that it knows that a power operation has been taken. So we can now call a special method, backward. Backward refers to back propagation, which we'll learn about, and which basically means take the derivative. And so once it does that, we can now look inside xt, because we said requires_grad_, and find out its gradient. And remember, the derivative of x squared is 2x. In this case x was three, and two times three is six, right? So we didn't have to figure out the derivative. We just call backward, and then get the grad attribute to get the derivative. So that's how easy it is to do calculus in PyTorch. So what you need to know about calculus is not how to take a derivative, but what it means. And what it means is it's a slope at some point. Now here's something interesting. Let's not just take three, but let's take a rank one tensor, also known as a vector: 3, 4, 10. And let's add sum to our f function, so it's going to go x squared dot sum. So now we can take f of this vector and get back 125, and then we can say backward, and grad, and look: 2x, 2x, 2x, right? So we can calculate... this is vector calculus, right? We're getting the gradient for every element of a vector with the same two lines of code. So that's kind of all you need to know about calculus, right? And if this idea that a derivative or gradient is a slope is unfamiliar, check out Khan Academy. They have some great introductory calculus. And don't forget, you can skip all the bits where they teach you how to calculate the gradients yourself. So now that we know how to calculate the gradient, that is the slope of the function, that tells us if we change our input a little bit, how will our output change correspondingly? That's what a slope is, right? And so that tells us, for every one of our parameters, if we know their gradients, then we know if we change that parameter up a bit or down a bit, how will it change our loss?
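The vector example just described looks roughly like this in PyTorch. The names xt and f follow the lecture; yt is just a label chosen here for the result.

```python
import torch

def f(x):
    return (x ** 2).sum()   # x squared, summed so the result is a single number

xt = torch.tensor([3.0, 4.0, 10.0]).requires_grad_()   # ask PyTorch to track operations on xt
yt = f(xt)        # 9 + 16 + 100 = 125, carrying a grad function along with it
yt.backward()     # back propagation: compute the derivative of yt with respect to xt
print(yt, xt.grad)   # xt.grad holds 2 * x for each element: 6, 8, 20
```

Nothing in the code says that the derivative of x squared is 2x; the gradient comes from PyTorch having recorded the operations.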
So therefore we then know how to change our parameters. So what we do is, let's say all of our weights are called w. We just subtract off them the gradients multiplied by some small number. And that small number is often a number between about 0.001 and 0.1. It's called the learning rate, right? And this here is the essence of gradient descent. So if you pick a learning rate that's very small, then you take the slope and you take a really small step in that direction, and another small step, another small step, another small step, and it's going to take forever to get to the end. If you pick a learning rate that's too big, you jump way too far each time, and again it's going to take forever. And in fact in this case, we're assuming we're starting here, and it's actually so big that it got worse and worse. Or here's one where we start here and it's not so big that it gets worse and worse, but it just takes a long time to bounce in and out, right? So picking a good learning rate is really important, both to making sure that it's even possible to solve the problem and that it's possible to solve it in a reasonable amount of time. So we'll be learning about how to pick learning rates in this course. So let's try this. Let's try using gradient descent, I said SGD, that's not quite accurate, it's just going to be gradient descent, to solve an actual problem. So the problem we're going to solve is, let's imagine you were watching a roller coaster go over the top of a hump, right? So as it comes out of the previous hill it's going super fast, and it's going up the hill, and it's going slower and slower and slower until it gets to the top of the hump. And then it goes down the other side.
It goes faster and faster and faster. So if you had a stopwatch or whatever, or some kind of speedometer, and you were measuring it just by hand at kind of equal time points, you might end up with something that looks a bit like this, right? And so the way I did this was I just grabbed a range, which just grabs the numbers from naught up to but not including 20, right? So these are the time periods at which I'm taking my speed measurement. And then I've just got some quadratic function here, multiply it by three and then square it and then add one, whatever, right? And then, actually, sorry, I take my time minus 9.5, square it, times 0.75, add one, and then I add a random number to every observation. So I end up with a quadratic function which is a bit bumpy. So this is kind of like what it might look like in real life, because my speedometer testing is not perfect. All right, so we want to create a function that estimates, at any time, what is the speed of the roller coaster. So we start by guessing what function it might be. So we guess that it's a function a times time squared plus b times time plus c, which you might remember from school is called a quadratic. So let's create a function, right? And let's create it using kind of the Arthur Samuel technique, the machine learning technique. This function is going to take two things: it's going to take an input, which in this case is a time, and it's going to take some parameters. And the parameters are a, b and c. In Python you can split out a list or a collection into its components like so, and then here's that function here. So we're not just trying to find any function in the world.
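Here is a sketch of the noisy roller coaster data and the quadratic with parameters a, b and c. The 9.5, 0.75 and plus one come from the description above; the noise scale of 3 and the fixed random seed are assumptions made here so the example is repeatable.

```python
import torch

torch.manual_seed(0)                    # assumption: seeded only so reruns give the same noise
time = torch.arange(0, 20).float()      # the numbers 0 up to but not including 20
# a bumpy quadratic: noise plus 0.75 * (time - 9.5) squared plus one (noise scale 3 is assumed)
speed = torch.randn(20) * 3 + 0.75 * (time - 9.5) ** 2 + 1

def f(t, params):
    a, b, c = params                    # split the collection into its components
    return a * (t ** 2) + (b * t) + c   # the quadratic a*t^2 + b*t + c

print(f(torch.tensor(2.0), torch.tensor([1.0, 2.0, 3.0])))   # 1*4 + 2*2 + 3
```

The function takes the input and the parameters as separate arguments, so finding a good quadratic becomes a matter of finding good values for params.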
We're just trying to find some function which is a quadratic, by finding an a and a b and a c. So the Arthur Samuel technique for doing this is to next come up with a loss function, come up with a measurement of how good we are. So if we've got some predictions that come out of our function, and the targets, which are these actual values, then we could just do the mean squared error. Okay, so here's that mean squared error we saw before: the difference squared, and then take the mean. So now we need to go through our seven step process. We want to come up with a set of three parameters a, b and c which are as good as possible. So step one is to initialize a, b and c to random values. So this is how you get random values, three of them, in PyTorch. And remember, we're going to be adjusting them, so we have to tell PyTorch that we want the gradients. I'm just going to save those away so I can check them later. And then I calculate the predictions using that function f, which was this. And then let's create a little function which just plots how good our predictions are at this point. So here is a function that plots our predictions in red and our targets in blue. So that looks pretty terrible. So let's calculate the loss using that mse function we wrote. Okay, so now we want to improve this. So calculate the gradients using the two steps we saw: call backward and then get grad. And this says that each of our parameters has a gradient that's negative. Let's pick a learning rate of 10 to the minus five, so we multiply that by 10 to the minus five and step the weights. And remember, step the weights means minus equals learning rate times the gradient. There's one little trick here, which is I've called dot data. The reason I've called dot data is that dot data is a special attribute in PyTorch which, if you use it, means the gradient is not calculated. And we certainly wouldn't want the gradient to be calculated of the actual step we're doing.
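Putting the pieces together, one way the process could look is sketched below: initialize, predict, compute the loss, call backward, step using dot data, reset the gradient, and repeat. It is a reconstruction under assumptions, not the notebook's exact cells: the data is regenerated with an assumed noise scale and seed, and the number of repetitions is arbitrary.

```python
import torch

torch.manual_seed(0)                                    # assumption: seeded for repeatability
time = torch.arange(0, 20).float()
speed = torch.randn(20) * 3 + 0.75 * (time - 9.5) ** 2 + 1   # noise scale 3 is assumed

def f(t, params):
    a, b, c = params
    return a * (t ** 2) + (b * t) + c

def mse(preds, targets):
    return ((preds - targets) ** 2).mean()   # difference squared, then the mean

params = torch.randn(3).requires_grad_()     # step 1: random a, b, c with gradient tracking
lr = 1e-5                                    # the learning rate picked in the lecture

losses = []
for _ in range(10):
    preds = f(time, params)                  # predict
    loss = mse(preds, speed)                 # measure how good the predictions are
    loss.backward()                          # calculate the gradients
    params.data -= lr * params.grad.data     # step; .data keeps the update itself out of the gradient
    params.grad = None                       # clear the gradients before the next pass
    losses.append(loss.item())

print(losses[0], losses[-1])                 # the loss should shrink over the repetitions
```

Newer PyTorch code often wraps the update in torch.no_grad() instead of going through dot data; the effect for this purpose is the same.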
We only want the gradient to be calculated of our function f All right, so when we step the weights we have to use this special Dot data attribute After we do that Delete the gradients that we already had And let's see if loss improved. So the loss before was 25,800 Now it's 5400 And the plot has gone from something that goes down to minus 300 Oh to something that looks much better So let's do that a few times So I've just grabbed those previous lines of code and pasted them all into a single cell Okay, so preds loss backward data grad is none And then from time to time print the loss out And repeat that 10 times And look getting better and better And so we can actually look at it Getting better and better So this is pretty cool, right? We have a technique. This is the Arthur Samuel technique For Finding a set of parameters that continuously improves by getting feedback From the result of measuring some loss function So that was kind of the key step, right? This this is the gradient descent method So you should make sure that you kind of go back And feel super comfortable With what's happened and you know if you're not feeling comfortable that that's fine, right? If it's been a while or if you've never done this kind of gradient descent before Um This might feel super unfamiliar So kind of try to find the first cell in this notebook Where you don't fully understand what it's doing And then like stop and figure it out Like look at everything that's going on do some experiments do some reading Um until you understand That cell where you were stuck before you move forwards So let's now apply this to MNIST Um, so for MNIST We want to use this exact technique and there's basically nothing extra we have to do Except one thing we need a loss function And The metric that we've been using is the error rate or the accuracy It's like how often are we correct, right? 
And and that's the thing that we're actually trying to make Good our metric but we've got a very serious problem Which is remember we need to calculate the gradient To figure out how we should change our parameters And the gradient is the slope or the steepness Which you might remember from school is defined as rise over run It's a y new minus y old divided by x new minus x old so The gradient's actually defined when x new is is very very close to x old Meaning their difference is very small But think about it Accuracy if I change a parameter by a tiny tiny tiny amount The accuracy might not change at all Because there might not be any three That we now predict as a seven or any seven that we now predict as a three Because we change the parameter by such a small amount So it's it's it's possible in fact, it's certain that the gradient is zero at many places And that means that our parameters aren't going to change at all Because learning rate times gradient is still zero when the gradient zero for any learning rate So this is why The loss function and the metric Are not always the same thing We can't use a metric As our loss if that metric has a gradient of zero So we need something different So we want to find something that kind of Is pretty similar to the accuracy in that like as the accuracy gets better This ideal function we want gets better as well But it should not have a gradient of zero Uh, so let's think about that function um Suppose we had Three images Um, actually, you know what? This is actually probably a good time To stop because actually, you know, we've we've kind of we've got to the point here where We understand gradient descent um We kind of know how to do it with a simple loss function And I actually think before we start looking at the MNIST loss function We shouldn't move on um, because we've got so much So much assignments to do for this week already. 
So we've got build your web application, and we've got step through this notebook to make sure you fully understand it. So I actually think we should probably stop right here, before we make things too crazy. So before I do, Rachel, are there any questions? Okay, great. All right. Well, thanks everybody. I'm sorry for that last minute change of tack there, but I think this is going to make sense. So I hope you have a lot of fun with your web applications. Try and think of something that's really fun, really interesting. It doesn't have to be important. It could just be some, you know, cute thing. We've had students before, a student that, I think he said he had 16 different cousins, and he created something that would classify a photo based on which of his cousins it was, for, like, his fiancée meeting his family. You know, you can come up with anything you like. But, you know, show off your application, and maybe have a look around at what ipywidgets can do, and try and come up with something that you think's pretty cool. All right. Thanks everybody. I will see you next week.