Okay, so welcome. Practical Deep Learning for Coders, lesson one. It's kind of lesson two, because there's a lesson zero, and lesson zero is: why do you need a GPU, and how do you get it set up? So if you haven't got a GPU running yet, go back and do that. Make sure that you can access a Jupyter notebook, and then you're ready to start the real lesson one.

If you're ready, you will be able to see something like this. In particular, hopefully you have gone to the notebook tutorial. It's at the top; it's the one starting with 00. As this grows you'll see more and more files, but we'll keep the notebook tutorial at the top, and you will have used your Jupyter notebook to add one and one together and get the expected result. Let's make that a bit bigger. And hopefully you've learned these four keyboard shortcuts.

So the basic idea is that your Jupyter notebook has prose in it; it can have pictures in it; it can have charts in it; and most importantly, it can have code in it. The code is in Python. How many people have used Python before? So nearly all of you; that's great. If you haven't used Python, that's totally okay. It's a pretty easy language to pick up, but if you haven't used it, this will feel a little bit more intimidating, because the code that you're seeing will be unfamiliar to you.

For those of you in the room, or in fast.ai Live, you can go back after this and make sure that you can get this running using the information at course-v3.fast.ai.

Okay, so a Jupyter notebook is a really interesting device for a data scientist, because it lets you run interactive experiments; it lets us give you not just a static piece of information, but something that you can actually interactively experiment with. So let me explain how we think it works well to use these notebooks and this material, based on the last three years of experience we've had with the students who have gone through this course.

First of all, it works pretty well just to watch a lesson end to end. Don't try to follow along, because it's not really designed to go at a speed where you can follow along. It's designed to be something where you just take in the information, get a general sense of all of the pieces and how it all fits together, and then go back and go through it more slowly, pausing the video and trying things out: making sure that you can do the things that I'm doing, and that you can try to extend them to do things in your own way. So don't worry if things are zipping along faster than you can do them; that's normal.

Also, don't try to stop and understand everything the first time. If you do understand everything the first time, good for you. But most people don't, particularly as the lessons go on; they get faster and they get more difficult.
So at this point we've got our notebooks going, and we're ready to start doing deep learning. The main thing that hopefully you're going to agree with by the end is that you can do deep learning regardless of who you are, and by that I mean world-class, practitioner-level deep learning. Your main place to be looking for things is course-v3.fast.ai, where you can find out how to get a GPU and other information, and you can also access our forums. On our forums you'll find things like how to build a deep learning box yourself, which is something you can do later on, once you've got going.

Who am I? Why should you listen to me? Well, maybe you shouldn't, but I'll try to justify why you should. I've been doing stuff with machine learning for over 25 years. I started out in management consulting, where I think I was actually McKinsey & Company's first analytical specialist, went into general consulting, ran a number of startups for a long time, and eventually became the president of Kaggle. But actually the thing I'm probably most proud of in my life is that I got to be the number one ranked contestant in Kaggle competitions globally. So I think that's a good practical test of whether you can actually train a predictive model that predicts things, which is a pretty important aspect of data science. I then founded a company called Enlitic, which was the first medical deep learning company. Nowadays I'm on the faculty at the University of San Francisco, and also co-founder, with Rachel, of fast.ai.

So I've used machine learning throughout that time, and although I am at a university, I'm not really an academic type; I'm much more interested in using this tool to do useful things. Specifically, through fast.ai we are trying to help people use deep learning to do useful things: through creating software to make deep learning easier to use at a very high level; through education, such as the thing you're watching now; through research, which is where we spend a very large amount of our time, figuring out how you can make deep learning easier to use at a very high level, which ends up, as you'll see, in the software and the education; and by helping to build a community, mainly through the forums, so that practitioners can find each other and work together.

So this course, Practical Deep Learning for Coders, is the starting point in this journey. It contains seven lessons. We're expecting you to do about 8 to 10 hours of homework each week, so it'll end up being something around 70 or 80 hours of work. I will say it varies a lot in how much people put into this. I know a lot of people who work full time on fast.ai; some folks who do the two parts spend a whole year doing it really intensively. I know some folks who watch the videos on double speed and never do any homework, and still come out the end with a general sense of what's going on. So there are lots of different ways you can do this. But if you follow along with this roughly ten-hours-a-week approach for the seven weeks, by the end you will be able to build an image classification model, on pictures that you choose, that will work at a world-class level. You'll be able to classify text, again using whatever datasets you're interested in. You'll be able to make predictions for commercial applications like sales.
You'll be able to build recommendation systems such as the one used by Netflix. And these are not toy examples of any of these; these are things that can come top ten in Kaggle competitions, that can beat everything that's in the academic community. Very, very high-level versions of these things.

So it might surprise you that the prerequisite here is literally one year of coding and high school math. But we have thousands of students now who have done this and shown it to be true. You will probably hear a lot of naysayers, less now than a couple of years ago when we started, but still a lot of naysayers, telling you that you can't do it, or that you shouldn't be doing it, or that deep learning has got all these problems. It's not perfect, but the things people claim about deep learning are either pointless or untrue. It's not a black box; as you'll see, it's really great for interpreting what's going on. It does not need much data for most practical applications. You certainly don't need a PhD; Rachel has one, so having one doesn't actually stop you from doing deep learning, but I certainly don't: I have a philosophy degree and nothing else. It can be used very widely, for lots of different applications, not just for vision. You don't need lots of hardware; that 36-cents-an-hour server is more than enough to get world-class results for most problems. It's true that maybe this is not going to help you build a sentient brain, but that's not our focus. For all the people who say deep learning is not interesting because it's not really AI: that's not really a conversation I'm interested in. We're focused on solving interesting real-world problems.

What are you going to be able to do by the end of lesson one? This is an example from Nikhil, who's actually in the audience now because he was in last year's course as well. He downloaded 30 images of people playing cricket and people playing baseball, ran the code you'll see today, and built a nearly perfect classifier of which is which. It's the kind of thing you can build as a fun hobby example, or, as we'll see, you can try stuff in the workplace that could be of direct commercial value. This is the idea we're going to get to by the end of lesson one.

We're going to start by looking at code, which is very different to many academic courses. For those of you who have an engineering or math or computer science background, this is very different to the approach where you start with theory and eventually, somewhere around a postgraduate degree, you finally get to the point where you can build something useful. We're going to learn to build the useful thing today. That means that at the end of today, you won't know all the theory. There will be lots of aspects of what we do where you don't know why or how it works. That's okay. You will learn why and how it works over the next seven weeks. But for now, we've found that what works really well is to actually get your hands dirty coding, not focusing on theory, because there's still a lot of artisanship in deep learning, unfortunately. It's still a situation where people who are good practitioners have a really good feel for how to work with the code and how to work with the data, and you can only get that through experience. And the best way to get that feel for how to build good models is to create lots of models, through lots of coding, and study them carefully. And a Jupyter notebook provides a really great way to study them.
So let's try that. Let's try getting started. To get started, you will open your Jupyter notebook and you'll click on lesson one: lesson1-pets. It will pop open looking something like this.

You can run a cell in a Jupyter notebook by clicking on it and pressing Run. But if you do so, everybody will know that you're not a real deep learning practitioner, because real deep learning practitioners know the keyboard shortcuts, and the keyboard shortcut is Shift+Enter. Given how often you have to run a cell, don't be going all the way up to the menu, finding the button, and clicking it; just Shift+Enter. So: type, type, type, Shift+Enter; move around with the arrow keys to pick something to run, Shift+Enter to run it.

We're going to go through this quickly, and then later on we're going to go back over it more carefully. So here's the quick version, to get a sense of what's going on. Here we are in lesson one, and these first three lines are what we start every notebook with. These things starting with a percent sign are special directives to Jupyter notebook itself; they're not Python code. They're called magics, which is kind of a cool name. The details aren't very important, but basically these three directives say: hey, if somebody changes the underlying library code while I'm running this, please reload it automatically; and if somebody asks to plot something, please plot it here in this Jupyter notebook. So just put those three lines at the top of everything.

The next lines load up the fastai library. What is the fastai library? This is a little bit confusing: fastai, with no dot, is the name of our software, and fast.ai, with the dot, is the name of our organization. If you go to docs.fast.ai, that's the fastai library. We'll learn more about it in a moment, but for now just realize that everything we're going to do is going to use basically either fastai, or the thing that fastai sits on top of, which is PyTorch. PyTorch is one of the most popular libraries for deep learning in the world. It's a bit newer than TensorFlow, so in a lot of ways it's more modern than TensorFlow. It's extremely fast growing, extremely popular, and we use it because we used to use TensorFlow a couple of years ago, and we found we can just do a lot more, a lot more quickly, with PyTorch. Then we have this software that sits on top of PyTorch and lets you do far, far more things, far, far more easily than you can with PyTorch alone. So it's a good combination; we'll be talking a lot about it.

For now, just know that you can use fastai by doing two things: importing * from fastai, and then importing * from fastai-dot-something, where "something" is the application you want. Currently fastai supports four applications: computer vision, natural language text, tabular data, and collaborative filtering, and we're going to see lots of examples of all of those during the seven weeks. So today we're going to be doing some computer vision.
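For reference, here is roughly what those opening cells of the lesson 1 notebook look like (the exact import lines may vary slightly between fastai v1 releases):

```python
# Jupyter "magics": reload changed library code automatically,
# and render plots inline in the notebook
%reload_ext autoreload
%autoreload 2
%matplotlib inline

# Load the fastai library, plus the application we need today:
# computer vision
from fastai import *
from fastai.vision import *
```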
At this point, if you are a Python software engineer, you are probably feeling sick, because you've seen me go import *, which is something you've all been told to never, ever do. And there are very good reasons not to use import * in standard production code with most libraries. But for those of you that have used something like MATLAB, it's kind of the opposite: everything's there for you all the time; you don't even have to import things a lot of the time.

It's kind of funny: we've got these two extremes of how to code. You've got the scientific programming community that does it one way, and the software engineering community that does it the other. Both have really good reasons for doing things their way, and with the fastai library we actually support both approaches. In a Jupyter notebook, where you want to be able to quickly, interactively try stuff out, you don't want to be constantly going back up to the top, importing more stuff, and trying to figure out where things are; you want to be able to use lots of tab completion and be very experimental, so import * is great. Then, when you're building stuff in production, you can follow the normal PEP 8 style, proper software engineering practices. So don't worry when you see me doing stuff which at your workplace would be frowned upon. This is a different style of coding. It's not that there are no rules in data science programming; the rules are different. When you're training models, the most important thing is to be able to interactively experiment quickly. You'll see we use a lot of processes and styles that are very different to what you're used to. They're there for a reason, and you'll learn about them over time. You can choose to use a similar approach or not; it's entirely up to you. The other thing to mention is that the fastai library is designed in a very interesting modular way, and you'll find over time that when you do use import *, there's far less clobbering of things than you might expect. It's all explicitly designed to let you understand things and use them quickly without having problems.

Okay. We're going to look at some data, and there are two main places that we'll tend to get data from for the course. One is academic datasets. Academic datasets are really important, and really interesting: they're things where academics spend a lot of time curating and gathering data so that they can show how well different kinds of approaches work on it. The idea is that they try to design datasets that are challenging in some way and require some kind of breakthrough to do well on. We're going to be starting with an academic dataset called the Pet dataset. The other kind of dataset we'll be using during the course is datasets from the Kaggle competition platform.

Both academic datasets and Kaggle datasets are interesting for us particularly because they provide strong baselines. That is to say, you want to know if you're doing a good job. With Kaggle datasets that come from a competition, you can actually submit your results to Kaggle and see how well you would have gone in that competition; if you can get in about the top 10%, then I'd say you're doing pretty well. For academic datasets, academics write down in papers what the state of the art is: how well they went using models on that dataset. So this is what we're going to do: we're going to create models that get right up towards the top of Kaggle competitions, preferably in the top 10%, or that meet or exceed academic published state-of-the-art results.

When you use an academic dataset, it's important to cite it, so you'll see here there's a link to the paper that it's from. You definitely don't need to read that paper right now, but if you're interested in why it was created and how it was created, all the details are there. In this case, this is a pretty difficult challenge: the Pet dataset is going to ask us to distinguish between 37 different categories of dog breed and cat breed.
That's really hard. In fact, every course until this one, we've used a different dataset: one where you just have to decide, is something a dog, or is it a cat? So you've got a 50-50 chance right away, and dogs and cats look really different, whereas lots of dog breeds and cat breeds look pretty much the same. So why have we changed the dataset? We've got to the point now where deep learning is so fast and so easy that the dogs-versus-cats problem, which a few years ago was considered extremely difficult (80% accuracy was the state of the art), is now too easy. Our models were basically getting everything right all the time, without any tuning, so there weren't really a lot of opportunities for me to show you how to do more sophisticated stuff. We've picked a harder problem this year.

This kind of thing, where you have to distinguish between similar categories, is called, in the academic context, fine-grained classification. We're going to do the fine-grained classification task of figuring out which particular kind of pet is in each picture. The first thing we have to do is download and extract the data that we want. We're going to be using this function called untar_data, which will download it automatically and untar it automatically. AWS has been kind enough to give us lots of space and bandwidth for these datasets, so they'll download super quickly for you.

So the first question, then, would be: how do I know what untar_data does? You can just type help, and you will find out what module it came from (because since we did import *, we don't necessarily know that), what it does, and, something you might not have seen before even if you're an experienced programmer, exactly what you pass to it. You're probably used to seeing the names (url, fname, dest), but you might not be used to seeing these other bits. These bits are types, and if you've used a typed programming language you'll be used to seeing them, but Python programmers are less used to them. But if you think about it, you don't actually know how to use a function unless you know what type each thing is that you're providing it, so we make sure we give you that type information directly here in the help. In this case, the url is a string, and the fname is ("Union" means "either") either a path or a string, and it defaults to nothing; and the destination is either a path or a string, and it defaults to nothing. We'll learn more shortly about how to get more documentation about the details, but for now we can see we don't have to pass in a file name or a destination; it'll figure them out for us from the URL. And for all the datasets we'll be using in the course, we already have constants defined for all of them.
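Here's what that looks like in practice; the exact docstring text may differ a little between fastai v1 versions, but the shape of the output is like this:

```python
help(untar_data)
# Help on function untar_data in module fastai.datasets:
#
# untar_data(url:str, fname:Union[pathlib.Path, str]=None,
#            dest:Union[pathlib.Path, str]=None)
#     Download `url` if it doesn't exist to `fname`,
#     and un-tgz it to folder `dest`
```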
So in this URLs class, you can see PETS; that's where it's going to grab it from. It's going to download that to some convenient path, untar it for us, and return the value of that path. In a Jupyter notebook, it's kind of handy: you can just write a variable on its own (a semicolon is just a statement-end marker in Python, so this is the same as writing it on its own line), and it prints it. You can also say print, but again, we're trying to do everything fast and interactively, so just write it, and here is the path where it's put our data.

The next time you run this, since you've already downloaded it, it won't download it again; since you've already untarred it, it won't untar it again. So everything is designed to be pretty automatic, pretty easy.

There are some things in Python that are less convenient for interactive use than they should be. For example, when you do have a path object, seeing what's in it actually takes a lot more typing than I would like. So sometimes we add functionality into existing Python stuff; one of the things we do is add an ls method to paths. So if you go path.ls, here is what's inside this path; that's what we just downloaded. When you try this yourself, you wait a couple of minutes for it to download and unzip, and then you can see what's in there.

If you're an experienced Python programmer, you may not be familiar with this approach of using a slash like this. This is really convenient functionality that's part of Python 3, from something called pathlib. These are path objects, and path objects are much better to use than strings: they let you basically create sub-paths like this, and it doesn't matter if you're on Windows, Linux, or Mac; it's always going to work exactly the same way. So here's a path to the images in that dataset.
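Concretely, those cells look something like this (the printed path below is just an example of where fastai caches data by default):

```python
path = untar_data(URLs.PETS); path   # downloads + untars on the first run only
# PosixPath('/home/ubuntu/.fastai/data/oxford-iiit-pet')

path.ls()                            # fastai adds an ls method to Path objects
# [PosixPath('.../annotations'), PosixPath('.../images')]

path_anno = path/'annotations'       # pathlib's / operator builds sub-paths
path_img  = path/'images'            # works the same on Windows, Linux, Mac
```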
All right, so if you're starting with a brand new dataset and trying to do some deep learning on it, what do you do? Well, the first thing you would want to do is probably see what's in there. We found that these are the directories that are in there, so what's in images? There are a lot of functions in fastai for you; there's one called get_image_files that grabs an array of all of the image files, based on extension, in a path. And here you can see we've got lots of different files. This is a pretty common way for computer vision datasets to get passed around: just one folder with a whole bunch of files in it.

The interesting bit, then, is: how do we get the labels? In machine learning, labels refer to the thing we're trying to predict, and if we just eyeball this, we can immediately see that the labels are actually part of the file name. It's like path, slash, label, underscore, number, extension. So we need to somehow get a list of those label bits of each file name, and that will give us our labels. Because that's all you need to build a deep learning model: you need some pictures, so files containing the images, and you need some labels.

In fastai, this is made really easy. There's an object called ImageDataBunch, and an ImageDataBunch represents all of the data you need to build a model. There are basically some factory methods which try to make it really easy for you to create that data bunch (we'll talk more about this shortly), with a training set and a validation set, with images and labels, for you. In this case, we need to extract the labels from the names, so we're going to use from_name_re. For those of you that use Python: re is the module in Python that does regular expressions, which are really useful for extracting text. I just went ahead and created the regular expression that will extract the label from this text. For those of you not familiar with regular expressions: they're a super useful tool, and it would be very useful to spend some time figuring out how and why that particular regular expression is going to extract the label from this text.

So with this factory method we can basically say: okay, I've got this path containing images; this is the list of file names (remember, I got them back here); this is the regular expression pattern that's going to be used to extract the label from the file name; we'll talk about transforms later; and then you also need to say what size images you want to work with. That might seem weird: why do I need to say what size images I want to work with? The images have a size; we can see what size the images are. I guess, honestly, this is a shortcoming of current deep learning technology, which is that a GPU has to apply the exact same instruction to a whole bunch of things at the same time in order to be fast, and if the images are different shapes and sizes, it can't do that. So we actually have to make all of the images the same shape and size. In part one of the course, we're always going to be making images square shapes; in part two, we'll learn how to use rectangles as well, which turns out to be surprisingly nuanced. But pretty much everybody, in pretty much all computer vision modeling, uses this approach of squares, and 224 by 224, for reasons we'll learn about, is an extremely common size that most models tend to use. So if you just use size=224, you're probably going to get pretty good results most of the time. This is one of the little bits of artisanship that I want to teach you folks, which is what generally just works: if you just use size=224, that'll generally just work for most things, most of the time.
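Put together, that cell looks roughly like this (this follows the lesson 1 notebook; the regex pulls, for example, 'american_bulldog' out of a name like 'american_bulldog_146.jpg'):

```python
fnames = get_image_files(path_img)   # all image files in the folder
fnames[:5]                           # peek at a few of them

# Capture everything between the last '/' and the trailing '_<number>.jpg'
pat = r'/([^/]+)_\d+.jpg$'

data = ImageDataBunch.from_name_re(
    path_img, fnames, pat,
    ds_tfms=get_transforms(),        # the transforms we'll discuss shortly
    size=224)                        # make every image 224x224
```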
So this is going to return a DataBunch object, and in fastai, everything you model with is going to be a DataBunch object. We're going to learn all about them: what's in them, how to look at them, and so forth. But basically, a DataBunch object contains two or three datasets. It contains your training data; it'll contain your validation data (we'll learn about this shortly); and optionally it contains your test data. And for each of those, it contains your images and your labels, or your texts and your labels, or your tabular data and your labels, and so forth, and that all sits there in this one place.

Something we'll learn more about in a little bit is normalization. Generally, in nearly all machine learning tasks, you have to make all of your data about the same "size": specifically, about the same mean and about the same standard deviation. So there's a normalize function that we can use to normalize our data bunch in that way.

Rachel, come and ask the question. "What does the function do if the image size is not 224?" Great, so this is what we're going to learn about shortly. Basically, this thing called transforms is used to do a number of things, and one of the things it does is to make something size 224.

Let's take a look at a few pictures. Here are a few pictures from my data bunch; data.show_batch can be used to show me some of the contents of my data bunch. So this is going to be 3 by 3, and you can see roughly what's happened: they all seem to have been zoomed and cropped in a reasonably nice way. Basically, what it does by default is called center cropping, which means it grabs the middle bit, and it also resizes it. We'll talk more about the detail of this, because it turns out to actually be quite important, but basically a combination of cropping and resizing is used. Something else we'll learn about is that we also use this to do what's called data augmentation, so there's actually some randomization in how much and where it crops, and stuff like that. But that's the basic idea: some cropping and some resizing; often we also do some padding. So there are all kinds of different approaches, and it depends on the data augmentation, which we're going to learn about shortly.

"And what does it mean to normalize the images?" Normalizing the images is something we're going to be learning more about later in the course, but in short, it means this: the pixel values (and we're going to be learning more about pixel values) start out from 0 to 255, and some channels (because there's red, green, and blue) might tend to be really bright, and some might tend to be not bright at all, and some might vary a lot, and some might not vary much at all. It really helps train a deep learning model if each one of those red, green, and blue channels has a mean of zero and a standard deviation of one. If you haven't studied or don't remember means and standard deviations, we'll get back to some of that later, but that's the basic idea; that's what normalization does. If your data is not normalized, it can be quite difficult for your model to train well, so if you do have trouble training a model, one thing to check is that you've normalized it.
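In code, those two steps look like this (imagenet_stats holds the per-channel means and standard deviations of ImageNet, which match the pretrained model we'll use shortly):

```python
data = data.normalize(imagenet_stats)    # per-channel mean 0, std 1
data.show_batch(rows=3, figsize=(7, 6))  # always look at your data!
```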
"Since GPUs work with powers of two, does a size of 256 sound more practical, considering GPU utilization?" We're going to be getting into that shortly, but the brief answer is that the models are designed so that the final layer is of size 7 by 7, so you actually want something where, if you multiply 7 by 2 a bunch of times, you end up with a good size. All of these details we are going to get to; the key thing is that I wanted to get you training a model as quickly as possible.

But one of the most important things about being a really good practitioner is to be able to look at your data. So it's really important to remember to go data.show_batch and take a look. It's surprising how often, when you actually look at the dataset you've been given, you realize it's got weird black borders on it, or some of the images have text covering up part of them, or some of them are rotated in odd ways. So make sure you take a look.

And then the other thing we're going to do is not just look at the pictures, but also look at the labels. All of the possible label names are called your classes. With the DataBunch, you can print out your data.classes, and here they are: the possible labels that we found by using that regular expression on the file names. We learned earlier, in that prose I wrote at the top, that there are 37 possible categories, and checking len(data.classes), it is indeed 37. A DataBunch will always have a property called c. We'll get to the technical details later, but for now you can think of it as being the number of classes. For things like regression problems and multi-label classification, that's not exactly accurate, but it will do for now. It's important to know that data.c is a really important piece of information: for classification problems, at least, it is the number of classes.
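Those checks look like this in the notebook (the class list below is abbreviated):

```python
print(data.classes)
# ['Abyssinian', 'Bengal', 'Birman', ..., 'yorkshire_terrier']

len(data.classes), data.c
# (37, 37)   # data.c: the number of classes, for a classification problem
```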
All right. Believe it or not, we're now ready to train a model. A model is trained in fastai using something called a Learner. Just like a DataBunch is a general fastai concept for your data, with subclasses for particular applications like ImageDataBunch, a Learner is a general concept for things that can learn to fit a model, and from that there are various subclasses to make things easier; in particular, there's one that will create a convolutional neural network for you. We'll be learning a lot about that over the next few lessons, but for now, just know that to create a learner for a convolutional neural network, you just have to tell it two things. The first is: what's your data? Not surprisingly, it takes a data bunch. And the second thing you need to tell it is: what's your model, or what's your architecture? As we'll learn, there are lots of different ways of constructing a convolutional neural network, but for now, the most important thing for you to know is that there's a particular kind of model called a ResNet, which works extremely well nearly all the time. So, for a while at least, you really only need to be choosing between two things, which is: what size ResNet do you want? That's basically how big it is. We'll learn all about the details of what that means, but there's one called ResNet34 and one called ResNet50, and when we're getting started with something, I'll pick the smaller one, because it will train faster. So that's kind of it; that's as much as you need to know, to be a pretty good practitioner, about architectures for now: there are two variants of one architecture that work pretty well, ResNet34 and ResNet50; start with the smaller one and see if it's good enough.

So that is all the information we need to create a convolutional neural network learner. There's one other thing I'm going to give it, though, which is a list of metrics. Metrics are literally just things that get printed out as it's training. So here I'm saying: I would like you to print out the error rate, please.
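The cell itself looks like this; note that the name of this factory changed across fastai v1 releases (ConvLearner, then create_cnn, then cnn_learner), so match the spelling to your installed version:

```python
# data: our ImageDataBunch; models.resnet34: the architecture;
# error_rate: a metric, printed on the validation set as it trains
learn = create_cnn(data, models.resnet34, metrics=error_rate)
```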
Now, you can see that the first time I ran this on a newly installed box, it downloaded something. What's it downloading? It's downloading the ResNet34 pre-trained weights. What this means is that this particular model has actually already been trained for a particular task, and that task was looking at about one and a half million pictures of all kinds of different things, in a thousand different categories, in a dataset called ImageNet. So we can download those pre-trained weights, so that we don't start with a model that knows nothing about anything, but with a model that already knows how to recognize the thousand categories of things in ImageNet. Now, I'm not sure, but I don't think all of these 37 categories of pet are in ImageNet; but there were certainly some kinds of dog, and certainly some kinds of cat, so this pre-trained model already knows quite a bit about what pets look like, and what photos look like. So the idea is that we don't start with a model that knows nothing at all; we start by downloading a model that knows something about recognizing images already. It downloads the pre-trained model for us automatically the first time we use it, and from then on it won't need to download it again; it will just use the one we've got.

This is really important, and we're going to learn a lot about it; it's kind of the focus of the whole course. This is called transfer learning: how to take a model that already knows how to do something pretty well, and make it so that it can do your thing really well. We take a pre-trained model, and then we fit it so that instead of predicting the thousand categories of ImageNet with the ImageNet data, it predicts the 37 categories of pets using your pet data. And it turns out that by doing this, you can train models in one hundredth or less of the time of regular model training, with one hundredth or less of the data of regular model training; in fact, potentially many thousands of times less. Remember that slide of Nikhil's lesson one project from last year? He used 30 images, and there aren't cricket and baseball images in ImageNet, but it just turns out that ImageNet models are already so good at recognizing things in the world that 30 examples of people playing baseball and cricket were enough to build a nearly perfect classifier.

Now, you would naturally be saying: well, wait a minute, how do you know it can actually recognize pictures of people playing cricket versus baseball in general? Maybe it just learned to recognize those particular 30. Maybe it's just cheating. That's called overfitting, and we'll be talking a lot about it during this course. Overfitting is where you don't learn to recognize pictures of, say, cricket versus baseball, but just these particular cricketers in these particular photos, and these particular baseball players in these particular photos. We have to make sure that we don't overfit, and the way we do that is by using something called a validation set. A validation set is a set of images that your model never gets to see during training, and these metrics, in this case the error rate, get printed out automatically using that validation set. When we created our DataBunch, it automatically created a validation set for us. We'll learn lots of ways of creating and using validation sets, but because we try to bake in all of the best practices, we actually make it nearly impossible for you not to use one, because if you're not using a validation set, you don't know if you're overfitting. So we always print out the metrics on a validation set; we always hold it out; we always make sure that the model doesn't touch it. That's all done for you, and it's all built into this DataBunch object.

So now that we have a learner, we can fit it. You can just use a method called fit, but in practice you should nearly always use a method called fit_one_cycle. We'll learn more about this during the course, but in short, one-cycle learning comes from a paper that was released, I'm trying to think, a few months ago, less than a year ago, and it turned out to be dramatically better, both more accurate and faster, than any previous approach. Again, I don't want to teach you how to do 2017 deep learning in 2018; right now, the best way to fit models is to use one cycle. We'll learn all about it, but for now, just know you should probably type learn.fit_one_cycle. If you forget how to type it, you can start typing a few letters and hit Tab, and you'll get a list of potential options. And then, if you forget what to pass it, press Shift+Tab, and it'll show you exactly what to pass, so you don't actually have to type help. Again, it's kind of nice that we have all the types here, because we can see that cycle length (we'll learn more about what that is shortly) is an integer; the max learning rate could be either a float or a collection; and so forth; and you can see that momentum defaults to this pair of values.

So for now, just know that this number, 4, basically decides how many times we go through the entire dataset: how many times we show the dataset to the model so that it can learn from it. Each time it sees a picture, it's going to get a little bit better, but it's going to take time, and it means it could overfit: if it sees the same picture too many times, it'll just learn to recognize that picture, not pets in general. We'll learn all about how to choose this number during the next couple of lessons, but starting out with 4 is a pretty good start, just to see how it goes.
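Here's the training call; the error rate quoted below is what came out in the lesson, so your exact numbers will differ a bit from run to run:

```python
learn.fit_one_cycle(4)   # 4 epochs: four passes through the whole dataset
# epoch  train_loss  valid_loss  error_rate
# ...    ...         ...         ~0.06      (about 6% error, in about 2 minutes)
```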
And you can actually see that after four epochs, or four cycles, we've got an error rate of 6%. So a natural question is: how long did that take? That took a minute and 56 seconds. We're paying, you know, 60 cents an hour, so we just paid for two minutes of compute time (I mean, we actually paid for the whole time the machine was on and running, but we used two minutes of compute time), and we got an error rate of 6%. So 94% of the time, we correctly picked the exact right one of those 37 dog and cat breeds, which feels pretty good to me.

But to get a sense of how good that is, maybe we should go back and look at the paper. Remember, I said the nice thing about using academic papers or Kaggle datasets is that we can compare our solution to whatever the best people on Kaggle did, or whatever the academics did. This particular dataset of pet breeds is from 2012, and if I scroll through the paper, you'll generally find in any academic paper that there's a section called Experiments, about two thirds of the way through, and if you find the section on experiments, you can find the section on accuracy. They've got lots of different models, and their models, as you'll read in the paper, are extremely pet-specific: they learn something about how pet heads look, and how pet bodies look, and how pet images in general look, and they combine them all together; and using all of this complex code and math, they got an accuracy of 59%. So in 2012, this highly pet-specific analysis got an accuracy of 59%, and these were top researchers from Oxford University. Today, in 2018, with (if you go back and look at how much code we just wrote) about three lines of code, the other stuff being just printing things out to see what we're doing, we got 94%: 6% error. So that gives you a sense of how far we've come with deep learning, and particularly with PyTorch and fastai, how easy things are.

So before we take a break, I just want to check to see if we've got any questions. And just remember: if you're in the audience and you see a question that you want asked, please click the love heart next to it so that Rachel knows that you want to hear about it. Also, if there is something with six likes and Rachel didn't notice it, which is quite possible, just quote it in a reply and say: hey, @Rachel, this one's got six likes. Okay, so what we're going to do is take an eight-minute break, and we'll come back at five past eight.

So, where we got to was: we just trained a model. We don't exactly know what that involved or how it happened, but we do know that, with three or four lines of code, we built something which smashed the accuracy of the state of the art of 2012. 6% error certainly sounds pretty impressive for something that can recognize different dog breeds and cat breeds, but we don't really know why it works. But we will; that's okay.

In terms of getting the most out of this course: we very, very regularly hear, after the course is finished, the same basic feedback. This is literally copied and pasted from the forum: "I fell into the habit of watching the lectures too much and googling too much about concepts, without running the code. At first I thought I should just read it and then research the theory." And we keep hearing people saying: my number one regret is that I just spent 70 hours doing that, and at the very end I started running the code, and it turned out I learned a lot more. So please, run the code. Really run the code. "I should have spent the majority of my time on the actual code in the notebooks: seeing what goes in, and seeing what comes out." So your most important skills to practice are (and we're going to show you how to do this in a lot more detail) understanding what goes in, and what comes out. We've already seen an example of looking at what goes in, which is data.show_batch, which shows you examples of labels and images; next, we're going to be seeing how to look at what came out. That's the most important thing to study.

As I said, the reason we've been able to do this so quickly is heavily because of the fastai library. The fastai library is pretty new, but it's already getting an extraordinary amount of traction. As you've seen, all of the major cloud providers either support it or are about to support it; a lot of research is starting to use it; it's making a lot of things a lot easier, but it's also making new things possible. So really understanding the fastai software is something which is going to take you a long way, and the best way to really understand the fastai software well is by using the fastai documentation, which we'll be learning more about shortly.

So how does it compare?
There's really only one major other piece of software like fastai, that is, something that tries to make deep learning easy to use, and that's Keras. Keras is a really terrific piece of software; we actually used it for the previous courses, until we switched to fastai. It runs on top of TensorFlow. It was kind of the gold standard for making deep learning easy to use before, but life is much easier with fastai. If you look, for example, at last year's course exercise, which was dogs versus cats: fastai lets you get much more accurate (less than half the error) on a validation set; training time is less than half; and the lines of code are about a sixth of the lines of code. And the lines of code are more important than you might realize, because those 31 lines of Keras code involve you making a lot of decisions, setting lots of parameters, and doing lots of configuration. That's all stuff where you have to know how to set those things to get best-practice results, whereas with these five lines of code, any time we know what to do for you, we do it for you; any time we can pick a good default, we pick it for you. So hopefully you'll find this a really useful library, not just for learning deep learning, but for taking it a very long way.

How far can you take it? Well, as you'll see, all of the research that we do at fast.ai uses the library, and an example of the research we did, which was recently featured in Wired, describes a breakthrough in natural language processing, which people are calling the ImageNet moment of NLP: we set a new state-of-the-art result in text classification, which OpenAI then built on top of, with more compute and more data and some different tasks, to take it even further. This is an example of something that we've done in the last six months, in conjunction with my colleague Sebastian Ruder; an example of something that's been built in the fastai library. And you're going to learn how to use this brand new model in three lessons' time, and you're actually going to get this exact result from this exact paper yourself.

Another example: one of our alumni, Hamel Husain, who you'll come across on the forum plenty, because he's a great guy and very active, built a new system for natural language semantic code search. You can find it on GitHub, where you can actually type in English sentences and find snippets of code that do the thing you asked for. And again, that's been built with the fastai library, using the techniques you'll be learning in the next seven weeks. Is it in production? Well, I think at this stage it's part of their experiments platform, so it's kind of pre-production, I guess.

The best place to learn about these things is on the forums, where, as well as categories for each part of the course, there's also a general category for deep learning, where people talk about research papers, applications, and so on and so forth. So even though today we're going to focus on a small number of lines of code that do a particular thing, which is image classification, and we're not learning much math or theory or whatever, over these seven weeks, and then part two's further seven weeks, we're going to go deeper and deeper.

So where can that take you? I want to give you some examples. There is Sara Hooker. She did our first course a couple of years ago. Her background was economics; she didn't have a background in coding, math, or computer science; I think she started learning to code two years before she took our course. As part of a nonprofit she started, called
Delta Analytics, she helped build this amazing system where they attached old mobile phones to trees in the Kenyan rainforest and used them to listen for chainsaw noises; then they used deep learning to figure out when a chainsaw was being used, and had a system set up to alert rangers to go out and stop illegal deforestation in the rainforest. That was something she was doing while she was in the course, as part of her class projects. What's she doing now? She is now a Google Brain researcher, which I guess is one of the top places, if not the top place, to do deep learning research. She's been publishing some papers, and now she's going to Africa to set up Google Brain's first deep learning AI research center in Africa. Now, I will say, she worked her ass off: she really, really invested in this course, not just doing all of the assignments but also going out and reading Ian Goodfellow's book and doing lots of other things. But it really shows how somebody who has no computer science or math background at all can now be one of the world's top deep learning researchers, doing very valuable work.

Another example, from our most recent course: Christine Payne. She is now at OpenAI, and you can find her posts and actually listen to her music samples: she built something to automatically create chamber music compositions, which you can play and listen to online. Her background was math and computer science; actually, that's her there: she's a classical pianist. Now, I will say she's not your average classical pianist: she also has a master's in medical research from Stanford, studied neuroscience, was a high-performance computing expert at D.E. Shaw, and was valedictorian at Princeton. Anyway, you know, a very annoying person, good at everything she does. But I think it's really cool to see how a domain expert, in this case in piano, can go through the fast.ai course and come out the other end at, I guess, one of the three top research institutes: Google Brain and OpenAI would be two of them, probably along with DeepMind.

Interestingly, one of our other students, or I should say alumni, of the course recently interviewed her for a blog post series he's doing on top AI researchers, and she said one of the most important pieces of advice she got was from me, and the advice was: pick one project, do it really well, make it fantastic. That was the piece of advice she found the most useful, and we're going to be talking a lot about you doing projects and making them fantastic during this course.

Having said that, I don't really want you to go to OpenAI or Google Brain. What I really want you to do is go back to your workplace or your passion project and apply these skills there. Let me give you an example. MIT released a deep learning course, and they highlighted in their announcement for this deep learning course this medical imaging example. And one of our students, Alex, who is a radiologist, said: you guys just showed a model overfitting. I can tell, because I'm a radiologist, and this is not what this would look like on a chest film; this is what it should look like; and as a deep learning practitioner, this is how I know this is what happened in your model. So Alex is combining his knowledge of radiology and his knowledge of deep learning to assess MIT's model, from just two images, very accurately. And this is actually what I want most of you to be doing: take your domain expertise and combine it with the deep learning practical aspects that you'll learn in this course, and bring them together like Alex is doing here.
A lot of radiologists have actually gone through this course now, and have built journal clubs and American College of Radiology practice groups; there's a data science institute at the ACR now, and so forth, and Alex is one of the people providing a lot of leadership in this area. I would love you to do the same kind of thing that Alex is doing, which is to really bring deep learning leadership into your industry, or into your social impact project, whatever it is that you're trying to do.

Another great example: this is Melissa Fabros, who was an English literature PhD; she studied things like gendered language in English literature. Actually, Rachel's first job, I think, was teaching her to code. Then she came into the fast.ai course, and she helped Kiva, a micro-lending social impact organization, to build a system that can recognize faces. Why is that necessary? Well, we're going to be talking a lot about this, but because most AI researchers are white men, most computer vision software can only recognize white male faces effectively. In fact, I think one IBM system was something like 99.8% accurate on white male faces, versus 60% or 65% accurate on dark-skinned women. So it's something like 30 or 40 times worse for black women than for white men, and this is really important, because for Kiva, black women are perhaps the most common user base for their micro-lending platform. So Melissa, after taking our course, and again working her ass off and being super intense in her study and her work, won this $1 million AI challenge for her work for Kiva.

Karthik did our course and realized that the thing he wanted to do wasn't at his company; it was something else: he wanted to help blind people to understand the world around them. So he started a new startup; you can find it now, it's called Envision. You can download the app and point your phone at things, and it will tell you what it sees. I actually talked to a blind lady about these kinds of apps the other day, and she confirmed to me that this is a super useful thing for visually impaired users.

The point is that the content that you're going to get over these seven weeks, and the software, can get you right to the cutting edge, in areas you might find surprising. For example, I helped a team of some of our students and some collaborators in actually breaking the world record for training ImageNet. Remember, I mentioned the ImageNet dataset: lots of people want to train on it, and we smashed the world record for how quickly you can train it. We used standard AWS cloud infrastructure, at a cost of $40 of compute, to train this model, using, again, the fastai library and the techniques that we learn in this course. So it can really take you a long way. So don't be put off by what might seem pretty simple at first; we're going to get deeper and deeper.

You can also use it for other kinds of passion projects. Helena Sarin: you should actually check out her Twitter account, @glagolista. This art is basically a new style of art that she's developed, which combines her painting and drawing with generative adversarial models to create these extraordinary results. I think this is super cool. She's not a professional artist; she is a professional software developer, but she just keeps on producing these beautiful results. When she started, her art had not really been shown anywhere or discussed anywhere; now there have recently been some quite high-profile articles describing how she is creating a new form of art.
Again, this has come out of the fast.ai course; that's where she developed these skills. Equally important: Brad Kenstler, who figured out how to make a picture of Kanye out of pictures of Patrick Stewart's head. Also something you will learn to do, if you wish to. This particular type of what's called style transfer was a really interesting tweak that allowed him to do some things that hadn't quite been done before, and this particular picture helped him get a job as a deep learning specialist at AWS. So there you go.

Another interesting example: another alumnus actually worked at Splunk as a software engineer, and he designed an algorithm after, like, lesson three, which turned out at Splunk to be fantastically good at identifying fraud; we'll talk more about that shortly. And if you've seen Silicon Valley, the HBO series: the "hot dog, not hot dog" app is actually a real app you can download, and it was actually built by Tim Anglade as a fast.ai student project. So there's a lot of cool stuff that you can do. Yes, it was Emmy nominated; I think we only have one Emmy-nominated fast.ai alumnus at this stage, so please help change that.

The other thing is that the forum threads can kind of turn into these really cool things. So Francisco, who is actually here in the audience: he's a really boring McKinsey consultant, like me; Francisco and I both have this shameful past, but we left and we're okay now. He started this thread saying: this stuff we've just been learning about building NLP in different languages, let's try to do lots of different languages. And he started this thing called the language model zoo, and out of that, an academic competition was won in Polish, which led to an academic paper; there's a Thai state of the art, a German state of the art; basically, students have been coming out with new state-of-the-art results across lots of different languages, and this is all being done entirely by students working together through the forum.

So please get on the forum, but don't be intimidated. Remember, the vast majority of the people you see posting on the forum post all the damn time; they've been doing this a lot. So at first it can feel intimidating, because it can feel like you're the only new person there. But you're not: all of you people in the audience, everybody who's watching, everybody who's listening, you're all new people. So just get out there and say: okay, all you people getting new state-of-the-art results in German language modeling, I can't start my server; I tried to click the notebook and I got an error; what do I do? People will help you. Just make sure you provide all the information: I'm using Paperspace, this was the particular instance I tried to use, here's a screenshot of my error. People will help you. Or, if you've got something to add: if people are talking about crop yield analysis and you're a farmer and you think, oh, I've got something to add, please mention it, even if you're not sure it's exactly relevant. It's fine; just get involved. Because remember, everybody else on the forum started out also intimidated. We all start out not knowing things. So just get out there and try it.

So let's get back and do some more coding. Yes, Rachel, we have some questions? "There's just a question from earlier about why you're using ResNet, as opposed to Inception." So the question is about this architecture. There are lots of architectures to choose from, and it would be fair to say there isn't one best one.
one best one. But if you look at things like Stanford's DAWNBench benchmark for ImageNet classification, you'll see that first place, second place, third place and fourth place are fast.ai and Jeremy Howard entries (plus one from the Department of Defense innovation team), and they're ResNet, ResNet, ResNet, ResNet. ResNet is good enough, okay? So it's fine. There are other architectures; the main reason you might want a different one is if you want to do edge computing — if you want to create a model that's going to sit on somebody's mobile phone. Having said that, even there, most of the time I reckon the best way to get a model onto somebody's mobile phone is to run it on your server and then have the mobile phone app talk to it. That really makes life a lot easier, and you get a lot more flexibility. But if you really do need to run something on a low-powered device, there are some special architectures for that. The particular question was about Inception: that's another architecture which tends to be pretty memory intensive, and it's not terribly resilient. One of the things we try to show you is stuff which just tends to always work, even if you don't quite tune everything perfectly, and ResNet tends to work pretty well across a wide range of the different choices you might make. So I think it's pretty good. So, we've got this trained model, and what's actually happened, as we'll learn, is that it's basically created a set of weights. If you've ever done anything like linear regression or logistic regression, you'll be familiar with coefficients — we basically found some coefficients and parameters that work pretty well, and it took us a minute and 56 seconds. So if we want to do some more playing around and come back later, we should probably save those weights, so we can save that minute and 56 seconds. You can just go learn.save and give it a name. It's going to put the weights in a models subdirectory in the same place the data came from, so if you save different models, or data bunches from different data sets, they'll all be kept separate — don't worry about it. All right. We talked about how the most important things are knowing what goes into your model and what comes out; we've seen one way of looking at what goes in, so now let's see what comes out. This is the other thing you need to get really good at. To see what comes out, we can use a class called ClassificationInterpretation, and we're going to use the factory method from_learner: we pass in a learn object. Remember, a learn object knows two things: what's your data, and what's your model — and by now it's not just an architecture, it's actually a trained model — and that's all the information we need to interpret the model. So we just pass in the learner, and we now have a ClassificationInterpretation object.
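In code, those two steps look like this — a minimal sketch assuming the `learn` object trained above and the fastai v1 API used in this course (i.e. after `from fastai.vision import *`):

```python
# Save the trained weights into a 'models' subdirectory next to the data,
# so the ~2 minutes of training don't have to be repeated later.
learn.save('stage-1')

# The learner knows both the data and the (now trained) model, which is
# everything needed to build an interpretation object.
interp = ClassificationInterpretation.from_learner(learn)
```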
And one of the most useful things we can do with it is plot_top_losses. We're going to be learning a lot about this idea of loss functions shortly, but in short, a loss function is something that tells you how good your prediction was. Specifically, if you predicted one class of cat with great confidence — you said, "I am very, very sure that this is a Birman" — but actually you were wrong, then that's going to have a high loss, because you were very confident about the wrong answer. That's basically what it means to have a high loss. So by plotting the top losses, we're going to find out the things we were most wrong about, or most confident about and got wrong. You can see here it prints out four things — for example, German Shorthaired Pointer, Beagle, 7.04, 0.92 — well, what do they mean? Perhaps we should look at the documentation. We've already seen help, and help just prints out a quick little summary; but if you want to really see how to do something, use doc. doc tells you the same information as help, but it has this very important thing: "Show in docs". When you click on "Show in docs", it pops up the documentation for that method or class or function or whatever. It starts out by showing the same information about what parameters it takes, along with the docstring, but then it tells you more. In this case, it tells me that the title of each image shows the prediction, the actual, the loss, and the probability that was predicted. And you can see there's actually some code you can run — the documentation always has working code. In this case it was trying things with handwritten digits, and the first one was predicted to be a 7; it was actually a 3; the loss is 5.44; and the probability of the actual class was 0.07. So we did not have a high probability of the actual class. I can see why it thought this was a 7, but nonetheless it was wrong. So this is the documentation, and it's your friend when you're trying to figure out how to use these things. The other thing I'll mention: if you're a somewhat experienced Python programmer, you'll find the source code of fastai really easy to read. We try to write everything in a small number of lines — much less than half a screen of code, generally four or five lines — so if you click "source", you can jump straight to the source code. Here is plot_top_losses, and this is also a great way to find out how to use the fastai library, because nearly every line of code here is calling stuff from the fastai library. So don't be afraid to look at the source code. (I've got another really cool trick about the documentation that you'll see a little later.) So that's how we can look at the top losses, and this is perhaps the most important image classification interpretation tool we have, because it lets us see what we're getting wrong. Quite often, as in this case, if you're a dog and cat expert, you'll realize that the things it's getting wrong are breeds that are genuinely very difficult to tell apart, and you'd be able to look at these and say, "oh, I can see why it got this one wrong." So this is a really useful tool. Another useful tool is the confusion matrix, which shows you, for every actual type of dog or cat, how many times it was predicted to be each type of dog or cat. Unfortunately, in this case, because the model is so accurate, the diagonal basically says it's pretty much right all the time; you can see there are some slightly darker cells — like a 5 here — but it's really hard to read exactly which combination that is. So if you've got lots of classes, I suggest you don't use a confusion matrix; instead, use my favorite named function in fastai — I'm very proud of this — most_confused. most_confused will simply grab, out of the confusion matrix, the particular combinations of predicted and actual that it got wrong most often.
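As a sketch of those interpretation tools (fastai v1, using the `interp` object from above; figure sizes are just illustrative):

```python
# The images we were most confidently wrong about; each title shows
# prediction / actual / loss / probability of the actual class.
interp.plot_top_losses(9, figsize=(15, 11))

# Pop up the full documentation, with runnable examples, for this method.
doc(interp.plot_top_losses)

# Actual classes vs. predicted classes -- hard to read with many classes.
interp.plot_confusion_matrix(figsize=(12, 12), dpi=60)

# Easier with many classes: the (actual, predicted, count) combinations
# that were mixed up most often.
interp.most_confused(min_val=2)
```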
So in this case, Staffordshire Bull Terrier was what it should have predicted, and instead it predicted American Pit Bull Terrier, and so forth; it should have predicted Siamese and actually predicted Birman, and that happened four times; this particular combination happened six times. This is again a very useful thing, because you can look at it and ask: with my domain expertise, does it make sense that the model would be confused about that? So these are some of the kinds of tools you can use to look at the output. Now let's make our model better. How do we make it better? We can make it better using fine-tuning. So far we've fitted four epochs, and it ran pretty quickly, and the reason it ran pretty quickly is that there was a little trick we used. These deep learning models — these convolutional networks — have many layers; we'll learn a lot about exactly what layers are, but for now, just know that it goes through computation after computation after computation. What we did is we added a few extra layers to the end, and we only trained those — we basically left most of the model exactly as it was. That's really fast, and if we're trying to build a model of something similar to what the original pre-trained model saw — in this case, similar to the ImageNet data — that works pretty well. But what we really want to do is go back and train the whole model. That's why we pretty much always use this two-stage process: by default, when we call fit or fit_one_cycle on a CNN learner, it'll just fine-tune those few extra layers added to the end, it'll run very fast, and it'll basically never overfit. But to really get the model good, you have to call unfreeze — unfreeze is the thing that says "please train the whole model" — and then I can call fit_one_cycle again. And... uh oh, the error got much worse. Okay, why? In order to understand why, we're actually going to have to learn more about exactly what's going on behind the scenes.
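Recapping that two-stage process as code — a sketch assuming fastai v1, where the learner factory is `cnn_learner` (older notebooks call it `create_cnn`), and the `data` object built earlier in the lesson:

```python
# Stage 1: the pretrained ResNet body stays frozen; only the newly added
# head layers are trained. Fast, and it basically never overfits.
learn = cnn_learner(data, models.resnet34, metrics=error_rate)
learn.fit_one_cycle(4)
learn.save('stage-1')

# Stage 2: unfreeze and train every layer. Done naively like this, every
# layer is updated at the same speed -- which is why the error got worse.
learn.unfreeze()
learn.fit_one_cycle(1)
```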
So let's start out by trying to get an intuitive understanding of what's going on behind the scenes, and again, we're going to do it by looking at pictures. We're going to start with this picture. These pictures come from a fantastic paper by Matt Zeiler — who is the CEO of Clarifai, a very successful computer vision startup — and his PhD supervisor, Rob Fergus. They created a paper showing how you can visualize the layers of a convolutional neural network. We'll learn mathematically what the layers are, but the basic idea is that your red, green and blue pixel values — numbers from 0 to 255 — go into a simple computation, the first layer, and something comes out of that; the result goes into a second layer, which goes into a third layer, and so forth, and there can be up to a thousand layers in a neural network. ResNet-34 has 34 layers; ResNet-50 has 50 layers. But let's look at layer 1. There's this very simple computation — it's a convolution, if you know what they are. What comes out of this first layer? Well, we can actually visualize the specific coefficients, the specific parameters, by drawing them as a picture. There are actually a few dozen of them in the first layer, so we won't draw all of them; let's just look at nine at random. So here are nine examples of the actual coefficients from the first layer. These operate on groups of pixels that are next to each other, and this first one basically finds groups of pixels with a little diagonal line in this direction; this one finds diagonal lines in the other direction; this one finds gradients that go from yellow to blue in this direction; this one finds gradients that go from pink to green in this direction; and so forth. They are very, very simple little filters. That's layer 1 of an ImageNet pre-trained convolutional neural net. Layer 2 takes the results of those filters and does a second layer of computation, which lets it create more complex detectors — here are nine examples of a way of visualizing the second layer's features. You can see it's basically learned to create something that looks for top-left corners; this one has learned to find right-hand curves; and this one has learned to find little circles. So where layer 1 had things that could find just one line, in layer 2 we can find things that have two lines joined up, or one line repeated. If you then look over here, these nine show you nine examples of actual bits of actual photos that activated this filter a lot — in other words, this little math function here was good at finding window corners and stuff like that, and this little circle detector found photos that have circles in them. So this is the kind of thing you've got to get a really good intuitive understanding of: the start of my neural net finds very simple gradients and lines; the second layer can find very simple shapes; the third layer can find combinations of those. So now we can find repeating patterns of two-dimensional objects, or things that join together, or — well, what is this one? Let's go and have a look at some bits of pictures that activated it highly. Oh — mainly they're bits of text, although sometimes windows; so it seems to be able to find repeated horizontal patterns. This one here seems to find the edges of fluffy or flowery things, and this one is finding geometric patterns. So layer 3 was able to take all the stuff from layer 2 and combine it together; layer 4 can take all the stuff from layer 3 and combine it together. By layer 4, we've got something that can find dog faces — and let's see what else we've got here... oh, here we have bird legs. So you get the idea: by layer 5, we've got something that can find the eyeballs of birds and lizards, or the faces of particular breeds of dogs, and so forth. And you can see how, by the time you get to layer 34, you can find specific dog breeds and cat breeds. This is kind of how it works. So when we first fine-tuned that pre-trained model, we kept all of these layers, and we just trained a few more layers on top of all of those sophisticated features that are already being created. And now, as we fine-tune, we're going back and saying, "let's change all of these" — we'll start with them where they are, but let's see if we can make them better. Now, it seems very unlikely that we can make the layer 1 features better: it's very unlikely that the definition of a diagonal line is going to be different when we look at dog and cat breeds versus the ImageNet data this was originally trained on. So we don't really want to change layer 1 very much, if at all. Whereas the last layers — that thing about types of dog face — it seems very likely that we do want to change that. So you want this intuition, this understanding, that the different layers of a neural network represent different levels of semantic complexity. And this is why our attempt to fine-tune the model didn't work: by default, it trains all the layers at the same speed — which is to say, it will update the things representing diagonal lines and gradients just as much as it updates the things representing the exact specifics of what an eyeball looks like. So we have to change that.
And to change it, we first need to go back to where we were before — we've just made this model much worse than it started out. If we just go learn.load, that brings back the model that we saved earlier; remember, we saved it as 'stage-1'. So let's go ahead and load that back up — now our model is back to where it was before we broke it. Now let's run the learning rate finder. We'll learn about what that is next week, but for now, it's the thing that figures out what the fastest rate is at which I can train this neural network without having it zip off the rails and get blown apart. So we can call learn.lr_find, and then we can go learn.recorder.plot, and that will plot the result of our LR finder. What this basically shows you is this key parameter we're going to learn all about, called the learning rate, which says how quickly I'm updating the parameters in my model. The bottom axis here shows what happens as I increase the learning rate, and this axis shows the loss. You can see that once the learning rate gets past about 10^-4, my loss gets worse. And it so happens — in fact, I can check this if I press Shift-Tab here — that my learning rate defaults to 0.003, so my default learning rate is about here. You can see why our loss got worse: now that we're fine-tuning, we can't use such a high learning rate. So, based on the learning rate finder, I tried to pick something well before it started getting worse, and I decided to pick 1e-6. But there's no point training all the layers at that rate, because we know the later layers worked just fine before, when we were training much more quickly — at the default, which, as a reminder, was 0.003. So what we can actually do is pass a range of learning rates to learn.fit. We do it like this, using a Python keyword you may have come across before called slice. It can take a start value and a stop value, and basically what this says is: train the very first layers at a learning rate of 1e-6, train the very last layers at a rate of 1e-4, and distribute all the other layers equally across the range between those two values. We're going to see this in a lot more detail, but for now, a good rule of thumb is: after you unfreeze (the thing that trains the whole model), pass a max_lr parameter, pass it a slice, and make the second part of that slice about ten times smaller than your first stage's learning rate — our first stage defaulted to about 1e-3, so let's use about 1e-4 — and make the first part a value from your learning rate finder that's well before things started getting worse. You can see things start to get worse maybe about here, so I picked something at least ten times smaller than that. If I do that, I get 0.05788 — I don't quite remember what we got before, but yes, a bit better. We've gone down from a 6.1% error to a 5.7% error, so that's about a 10% relative improvement.
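Here are those steps as code — a fastai v1 sketch; the exact learning-rate bounds should come from reading your own LR finder plot:

```python
learn.load('stage-1')   # back to the weights saved before we broke the model
learn.lr_find()         # try ever-higher learning rates until the loss blows up
learn.recorder.plot()   # plot loss vs. learning rate to pick a safe value

learn.unfreeze()
# slice(1e-6, 1e-4): first layers train at 1e-6, last layers at 1e-4,
# with the layers in between spread evenly across that range.
learn.fit_one_cycle(2, max_lr=slice(1e-6, 1e-4))
```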
So I would say that for most people, most of the time, these two stages are enough to get a pretty much world-class model. You won't win a Kaggle competition — particularly because a lot of fast.ai alumni are now competing on Kaggle, and this is the first thing they do — but in practice, you'll get something that's about as good as what the vast majority of practitioners can build. We can improve it by using more layers, and we'll do this next week — basically by using a ResNet-50 instead of a ResNet-34. You can try running this during the week if you want to; you'll see it's exactly the same as before, just with ResNet-50 instead of ResNet-34. What you'll find is that it's very likely, if you try this, that you'll get an error, and the error will be that you've run out of memory. The reason is that ResNet-50 is bigger than ResNet-34, so it has more parameters, so it uses more of your graphics card's memory — which is totally separate from your normal computer RAM; this is GPU RAM. If you're using the default Salamander or AWS suggestion, you'll have 16GB of GPU memory; the card I use most of the time has 11GB; the cheaper ones have 8GB — that's the main range you tend to see. (If yours has less than 8GB of GPU memory, it's going to be frustrating for you.) So it's very likely that when you try to run this, you'll get an out-of-memory error, because it's trying to do too many parameter updates at once for the amount of RAM you have. That's easily fixed: the ImageDataBunch constructor has a parameter at the end, the batch size (bs), which says how many images you train at a time. If you run out of memory, just make it smaller. The value here worked for me on an 11GB card; it probably won't work for you if you've got an 8GB card — if so, just make it 32. It's fine to use a smaller batch size; it might just take a little longer. If you've got 16GB, you might be able to get away with 64. So that's just one number you'll need to experiment with during the week.
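A sketch of the bigger model, assuming the `path_img`, `fnames` and `pat` variables from the lesson's regex labeling step; `bs` is the knob to turn down if you hit an out-of-memory error, and the epoch count is illustrative:

```python
# ResNet-50 has more parameters, so it needs more GPU RAM: bs=32 fits on an
# 11GB card; try bs=64 on 16GB, or smaller still if you run out of memory.
data = ImageDataBunch.from_name_re(path_img, fnames, pat, ds_tfms=get_transforms(),
                                   size=299, bs=32).normalize(imagenet_stats)
learn = cnn_learner(data, models.resnet50, metrics=error_rate)
learn.fit_one_cycle(5)
```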
And again, we fit it for a while, and we get down to a 4.4% error rate, which is pretty extraordinary. I was pretty surprised, because when we did just cats versus dogs in the first version of the course, we were getting somewhere around a 3% error — for a problem where you've got a 50% chance of being right, and the two things look totally different. So the fact that we can get a 4.4% error on such a fine-grained problem is quite extraordinary. In this case, I unfroze it and fitted a little bit more, and went from 4.4% to 4.35% — a tiny improvement; basically, ResNet-50 is already a pretty good model. It's interesting, because you can call most_confused here too, and see the kinds of things it's getting wrong. Depending on when you run it, you're going to get slightly different numbers, but you'll get roughly the same kinds of things. Quite often I find that Ragdoll and Birman are things it gets confused about. I had actually never heard of either of those breeds, so I looked them up on the internet, and I found a page on a cat site called "Is this a Birman or a Ragdoll?" — with a long thread of cat experts arguing intensely about which it is. So I feel fine that my computer had problems. I found something similar — I think it was the pit bull versus the Staffordshire Bull Terrier — where apparently the main difference is the particular kennel club guidelines as to how they're assessed, though some people think one of them might have a slightly redder nose. So this is the kind of thing where, even if you're not a domain expert, it helps you become one: I now know more about which kinds of pet breeds are hard to identify than I used to. So model interpretation works both ways. So what I want you to do this week is run this notebook and make sure you can get through it. But then what I really want you to do is get your own image data set. Francisco — who I mentioned earlier; he started the Language Model Zoo thread, and he's now helping to TA the course — is actually putting together a guide that will show you how to download data from Google Images, so you can create your own data set to play with. But before that, I want to show you how to create labels in lots of different ways, because your data set, wherever you get it from, won't necessarily use that regex-based approach — it could come in lots of different formats. To show you, I'm going to use the MNIST sample (MNIST is pictures of hand-drawn digits), just because I want to demonstrate different ways of creating data sets. The MNIST sample basically looks like this: if I go path.ls(), you can see it's got a training set and a validation set already, so the people who put this data set together have already decided what they want you to use as the validation set. If you go (path/'train').ls(), you'll see there's a folder called 3 and a folder called 7. This is a really common way to provide labels: everything that's a 3 is in a folder called 3, and everything that's a 7 is in a folder called 7. It's often called an ImageNet-style data set, because this is how ImageNet is distributed. So if you have something in this format, where the label is whatever the folder is called, you can say from_folder, and that will create an ImageDataBunch for you — and as you can see, it's created the labels 3 and 7 just from the folder names; we can train on that and get 99.55% accuracy, blah blah blah. Another possibility — and for this MNIST sample I've got both — is that it might come with a CSV file that looks something like this: a label for each file name. In this case the labels aren't 3 or 7; they're 0 or 1 — basically, "is it a 7 or not". If that's how your labels come, you can use from_csv, and if the file is called labels.csv, you don't even have to pass in a file name; if it's called anything else, you can pass in the CSV labels file name. So that's how you can use a CSV — and again, there it is; you can then call data.classes to see what it found. Another possibility, as we've seen, is that you've got paths that look like this — in this case, the label is part of the file path — and I can grab the label using a regular expression; here's the regular expression, and again you can see data.classes shows what it's found. What if it's something in the file name or path that's not amenable to a regular expression — something more complex? Then you can create an arbitrary function that extracts the label from the file name or path, and in that case you would say from_name_func. And another possibility is that you need something even more flexible than that, and you're going to write some code to create an array of labels; in that case you can just use from_lists — here I've created an array of labels, here's from_lists, and I just pass in that array. So you can see there are lots of different ways of creating labels — try them out during the week (see the sketch below).
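Hedged sketches of those labeling options, using fastai v1's ImageDataBunch factory methods; the `tfms`/`size` values and the `fnames`, `pat` and `labels` variables are illustrative placeholders:

```python
tfms = get_transforms(do_flip=False)   # digits shouldn't be flipped

# 1. Labels from folder names ('train/3', 'train/7') -- ImageNet-style layout.
data = ImageDataBunch.from_folder(path, ds_tfms=tfms, size=24)

# 2. Labels from a CSV of (filename, label) rows; labels.csv is the default name.
data = ImageDataBunch.from_csv(path, ds_tfms=tfms, size=24)

# 3. Labels extracted from each file path with a regular expression.
data = ImageDataBunch.from_name_re(path, fnames, pat, ds_tfms=tfms, size=24)

# 4. Labels computed by an arbitrary function of the file path.
data = ImageDataBunch.from_name_func(path, fnames, ds_tfms=tfms, size=24,
                                     label_func=lambda x: '3' if '/3/' in str(x) else '7')

# 5. Labels supplied directly as a list parallel to the list of file names.
data = ImageDataBunch.from_lists(path, fnames, labels=labels, ds_tfms=tfms, size=24)

data.classes   # check which labels were found
```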
Now, you might be wondering how you would know to do all these things — where are you going to find this kind of information? How could you possibly know to do all this stuff? So let me show you something incredibly cool. Let's grab this function — and remember, to get documentation we type doc, and here's the documentation for the function — and I can click "Show in docs", and it pops up the documentation. And here's the thing: every single line of code I just showed you, I copied and pasted this morning from the documentation. You can see here the exact code that I just used. So the fastai documentation doesn't just tell you what to do, but step by step how to do it. And here is perhaps the coolest bit: if you go to fastai/fastai_docs on GitHub and click on docs_src, it turns out that all of our documentation is actually just Jupyter notebooks. In this case I was looking at vision.data, and here is the vision.data notebook. You can download this repo, you can git clone it, and if you run it, you can actually run every single line of the documentation yourself. So all of our docs are also code, and to me this is the ultimate example of being able to experiment. You'll see that on GitHub it doesn't quite render properly, because GitHub doesn't quite know how to render notebooks, but if you git clone it and open it up in Jupyter, you can see it. And so nearly everything you read about in the documentation has actual working examples, with actual data sets already sitting there in the repo for you, so you can try every single function in your browser — try seeing what goes in, and try seeing what comes out. There's a question: will the library use multiple GPUs in parallel by default? The library will use multiple CPUs by default, but just one GPU by default. We probably won't be looking at multi-GPU until part two; there's stuff about it on the forum already, but most people won't need it for now. And the second question is whether the library can use 3D data such as MRI. Yes, it can, and there is actually a forum thread about that already — although that's not as developed as 2D yet; maybe by the time the MOOC is out it will be. So before I wrap up, I'll give you an example of the kind of interesting stuff you can do with this sort of exercise. Remember, earlier I mentioned that one of our alumni, who works at Splunk — the NASDAQ-listed, big, successful company — created this new anti-fraud software. This is actually how he created it, as part of a fast.ai part one class project: he took the telemetry of users who had Splunk analytics installed and watched their mouse movements, and he created pictures of those mouse movements — he converted speed into colour, and right and left clicks into splodges. He then took the exact code that we've just seen (with an earlier version of the software), trained a CNN in exactly the way we saw, and used that to build his fraud model. So he basically took something that wasn't a picture and turned it into a picture, and got these fantastically good results for a piece of fraud analysis software. It pays to think creatively. If you want to study sounds, for example, a lot of people who study sounds do it by creating a spectrogram image and then sticking that into a convnet. So there's a lot of cool stuff you can do with this.
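That "turn it into a picture" trick isn't in this lesson's notebook, but as a hedged illustration of the idea — assuming the librosa audio library and a placeholder file name, neither of which comes from the course — one common recipe looks like this:

```python
import librosa
import librosa.display
import matplotlib.pyplot as plt

# Hypothetical example: render an audio clip as a mel spectrogram image,
# which an ordinary image classifier (like the ones above) can then train on.
y, sr = librosa.load('clip.wav')                 # 'clip.wav' is a placeholder path
S = librosa.feature.melspectrogram(y=y, sr=sr)   # mel-scaled power spectrogram
S_db = librosa.power_to_db(S, ref=S.max())       # convert power to decibels
librosa.display.specshow(S_db, sr=sr)
plt.axis('off')
plt.savefig('clip.png', bbox_inches='tight', pad_inches=0)
```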
So, during the week: get your GPU going, try to use your first notebook — work through lesson one — and then see if you can repeat the process on your own data set. Get on the forum and tell us about any little success you had ("I spent three days trying to get my GPU running, and I finally did!") and any constraints you hit. Try things for an hour or two, but if you get stuck, please ask. And see if you're able to successfully build a model with a new data set. We'll see you next week.