Okay, well, welcome to part two of Deep Learning for Coders. Part one was Practical Deep Learning for Coders. Part two is not impractical deep learning for coders, but it is a little different, as we'll discuss. This is probably a really dumb idea, but last year I started numbering part two not from lesson one but from lesson eight, because it's a continuation of the same sequence. So I've done that again, but sometimes I'll probably forget and call things lesson one, so part two lesson one and part two lesson eight are the same thing if I ever make that mistake. We're going to be talking about object detection today, which refers to not just finding out what a picture is a picture of, but also whereabouts the thing is. But in general, the idea of each lesson in this part is not so much that I particularly want you to care about, say, object detection, but rather that I'm trying to pick topics which allow me to teach you some foundational skills that you haven't got yet. So for example, object detection is going to be all about creating much richer convolutional network structures, which have a lot more interesting stuff going on, and a lot more stuff going on in the fastai library that we have to customize to get there. At the end of these seven weeks, I can't possibly have covered the hundreds of interesting things that people are doing with deep learning right now. But the good news is that all of those hundreds of things, you'll see once you learn to read the papers, are minor tweaks on a reasonably small number of concepts. We covered a bunch of those concepts in part one. We're going to go a lot deeper into those concepts and build on them to get to some deeper concepts in part two. So in terms of what we covered in part one, there are a few key takeaways. We'll go through each of these takeaways in turn.
One is the idea, which you might have seen recently with Yann LeCun promoting it, that we shouldn't call this deep learning, but differentiable programming. You'll have noticed that all the stuff we did in part one was really about setting up a differentiable function and a loss function that describes how good the parameters are, and then pressing go and it kind of makes it work. I think it's quite a good way of thinking about it, differentiable programming: the idea that if you can configure a loss function that scores how good something is at doing your task, and you have a reasonably flexible neural network architecture, you're kind of done. So that's one key way of thinking about this. This example here comes from playground.tensorflow.org, which is a cool website where you can play interactively with creating your own little differentiable functions manually. The second thing we learned about is transfer learning, and it's basically that transfer learning is the most important single thing to be able to do to use deep learning effectively. Nearly all courses, nearly all papers, nearly everything in deep learning education and research focuses on starting with random weights, which is ridiculous, because you almost never would want to or need to do that. You would only want to or need to do that if nobody had ever trained a model on a vaguely similar set of data with an even remotely connected kind of problem to the one you're solving now, which almost never happens. So this is where the fastai library and the stuff we talk about in this class is vastly different to any other library or course: it's all focused on transfer learning, and it turns out that you do a lot of things quite differently.
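[Editor's note: to make the "differentiable programming" framing concrete, here is a minimal sketch in plain PyTorch: set up a parameterized differentiable function plus a loss that scores the parameters, then let gradient descent do the rest. The toy data, function, and learning rate are all made up for illustration; this is not from the course notebooks.]

```python
import torch

# Toy data: y = 3x + 2 plus a little noise
torch.manual_seed(0)
x = torch.linspace(-1, 1, 100).unsqueeze(1)
y = 3 * x + 2 + 0.05 * torch.randn_like(x)

# A "program" with two learnable parameters
a = torch.randn(1, requires_grad=True)
b = torch.randn(1, requires_grad=True)

for _ in range(500):
    pred = a * x + b                  # the differentiable function
    loss = ((pred - y) ** 2).mean()   # scores how good the parameters are
    loss.backward()                   # autograd fills in a.grad and b.grad
    with torch.no_grad():             # plain gradient descent step
        a -= 0.1 * a.grad
        b -= 0.1 * b.grad
        a.grad.zero_()
        b.grad.zero_()

print(a.item(), b.item())  # should end up near 3 and 2
```

The point is that nothing here is specific to neural networks: any differentiable function plus a loss can be optimized the same way.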
So the basic idea of transfer learning is: here's a network that does thing A; remove the last layer or so, replace it with a few random layers at the end, fine-tune those layers to do thing B, taking advantage of the features that the original network learnt, and then optionally fine-tune the whole thing end-to-end. You've now got something which probably uses orders of magnitude less data than if you'd started with random weights, is probably a lot more accurate, and probably trained a lot faster. We didn't talk a hell of a lot about architecture design in part one, and that's because architecture design is getting less and less interesting. There's a pretty small range of architectures that generally work pretty well quite a lot of the time. We've been focusing on using CNNs for generally fixed-size, somehow ordered data, RNNs for sequences that have some kind of state, and fiddling around a tiny bit with activation functions, like softmax if you've got a single categorical outcome, or sigmoid if you've got multiple outcomes, and so forth. Some of the architecture design we'll be doing in this part gets more interesting, particularly this first session about object detection. But on the whole, I think we probably spend less time talking about architecture design than most courses or papers, because it's generally not the hard bit, in my opinion. Okay, so the third thing we looked at was how to avoid overfitting. The general idea I tried to explain, or at least the way I like to build a model, is to first of all create something that's definitely terribly over-parameterized and will massively overfit for sure, train it, and make sure it does overfit, because at that point you know, okay, I've got a model that is capable of reflecting the training set, and then it's as simple as doing these things to reduce that overfitting. If you don't start with something that's overfitting, then you're kind of lost.
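[Editor's note: a common, concrete way to apply the "make sure it can overfit first" advice is to train on a tiny handful of examples and check that the training loss collapses toward zero; if the model can't even memorize eight examples, the model, loss, or training loop has a bug. A minimal sketch, with a toy model and random data that are purely illustrative:]

```python
import torch
from torch import nn

torch.manual_seed(0)
# A tiny "dataset" of 8 random examples with random binary labels
x = torch.randn(8, 10)
y = torch.randint(0, 2, (8,))

# Deliberately over-parameterized for 8 examples
model = nn.Sequential(nn.Linear(10, 100), nn.ReLU(), nn.Linear(100, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

for _ in range(300):
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()

# If this isn't near zero, something upstream is broken
print(loss.item())
```

Only once this sanity check passes does it make sense to start adding data, augmentation, and regularization to bring the overfitting back down.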
So you start with something that's overfitting, and then to make it overfit less you can add more data, you can add more data augmentation, you can do things like use more batch norm layers, or DenseNets, or various things that can handle basically less data. You can add regularization like weight decay and dropout. And then finally, this is often the thing people do first, but it should be the thing you do last: reduce the complexity of your architecture, with fewer layers or fewer activations. We talked quite a bit about embeddings, both for NLP and for the general idea that any kind of categorical data is something you can now model with neural nets. It's been interesting to see how, since part one came out, at which point there were almost no examples of papers or blogs or anything about using tabular or categorical data in deep learning, it's suddenly taken off and it's kind of everywhere, so this is becoming a more and more popular approach. It's still little enough known that when I say to people, oh, you know, we use neural nets for time series and tabular data analysis, there's often a kind of, wait, really? But it's definitely not such a far-out idea, and there are more and more resources available, including recent Kaggle competition winning approaches using this technique. Okay, so part one, which particularly carried those five messages, really was all about introducing you to best practices in deep learning. It tried to show you techniques which were mature enough that they definitely work reasonably reliably for practical real-world problems, and that I had researched and tuned enough over quite a long period of time that I could say, okay, here's a sequence of steps and architectures and whatever, and if you use this you'll almost certainly get pretty good results; and then I had put that into the fastai library in a way that lets you do it pretty quickly and easily.
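[Editor's note: the "embeddings for categorical data" idea mentioned above boils down to replacing one-hot codes with learned dense vectors via `nn.Embedding`, one table per categorical column, concatenated with any continuous columns. A minimal sketch; the column choices, sizes, and class name are made up for illustration and are not the fastai API:]

```python
import torch
from torch import nn

class TabularNet(nn.Module):
    """Toy tabular model: embed each categorical column, concat with continuous ones."""
    def __init__(self, cardinalities, n_cont, emb_dim=4):
        super().__init__()
        # One embedding table per categorical column
        self.embs = nn.ModuleList(nn.Embedding(c, emb_dim) for c in cardinalities)
        self.head = nn.Linear(emb_dim * len(cardinalities) + n_cont, 1)

    def forward(self, x_cat, x_cont):
        embedded = [emb(x_cat[:, i]) for i, emb in enumerate(self.embs)]
        return self.head(torch.cat(embedded + [x_cont], dim=1))

# e.g. two categorical columns (7 days of week, 12 months) plus 3 continuous columns
model = TabularNet(cardinalities=[7, 12], n_cont=3)
x_cat = torch.tensor([[0, 5], [6, 11]])   # a batch of 2 rows of category indices
x_cont = torch.randn(2, 3)
print(model(x_cat, x_cont).shape)
```

The embedding vectors are ordinary parameters, so training the model also learns a useful representation of each category.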
So that's what Practical Deep Learning for Coders was designed to do. This part two is Cutting Edge Deep Learning for Coders, and what that means is that I often don't know the exact best parameters, architecture details, and so forth to solve a particular problem. We don't necessarily know if it's going to solve a problem well enough to be practically useful. It almost certainly won't be integrated well enough into fastai, or indeed any library, that you can just press a few buttons and it'll start working. That said, I'm not going to teach anything unless I'm very confident that it either is now, or will soon be, a very practically useful technique. So I don't take stuff which just appeared and which I don't know enough about to judge what its trajectory is going to be. If I'm teaching it in this course, I'm saying it either works well in the research literature now and is going to be well worth learning about, or we're pretty close to being there, but it's going to take a lot of tweaking and experimenting to get it to work on your particular problem, because we don't know the details well enough to know how to make it work on every dataset or every example. So it's exciting to be working at this point. It means that rather than fastai and PyTorch being obscure black boxes which you just know recipes for, you're going to learn their details well enough that you can customize them exactly the way you want, that you can debug them, that you can read their source code to see what's happening, and so forth. So if you're not pretty confident with object-oriented Python and stuff like that, then that's something you're going to want to focus on studying during this course, because we assume it and I'm not going to be spending time on it. But I will be trying to introduce you to some tools that I think are particularly helpful, like the Python debugger,
like how to use your editor to jump through the code, stuff like that. And in fact, in general there'll be a lot more detailed, specific code walkthroughs and coding technique discussions, as well as more detailed walkthroughs of papers and the math and so on. So any time we cover one of these things, if you notice something where you're like, you know, this is assuming some knowledge that I don't have, that's fine. It just means that's something you can ask about on the forum: hey, Jeremy was talking about, whatever, static methods in Python; I don't really know what a static method is or why he was using it here; can somebody give me some resources? These things are not rocket science. Just because you don't happen to have come across something yet doesn't mean it's hard; it's just something you can learn. I will mention that as I cover these research-level topics and develop these courses, I often refer to code that academics have put up to go along with their papers, or example code that somebody else has written on GitHub, and I nearly always find there's some massive critical flaw in it. So be careful of taking code from online resources and just assuming that if it doesn't work for you, you've made a mistake. This kind of research-level code is just good enough that they were able to run their particular experiments, you know, every second Tuesday or something, so you should be ready to do some debugging and so forth. On that note, I just wanted to remind you about something from our old course wiki that we sometimes talk about. People often ask: what should I do after the lesson? How do I know if I've got it right? We basically have this thing called "how to use the provided notebooks", and the idea is this: don't open up the notebook, and, I don't know if I said this in
part one as well, but I'll say it again: don't just go shift-enter, shift-enter, shift-enter until a bug appears and then go to the forums and say "the notebook's broken". The idea of the notebook is to be a little crutch to help you get through each step. The idea is that you start with an empty notebook and think, okay, I now want to complete this process. Initially that might require you alt-tabbing, or command-tabbing, to the provided notebook and reading it to figure out what it says, but whatever you do, don't copy and paste it into your notebook; type it out yourself. Try to make sure you can repeat the process, and as you're typing it out, keep thinking: what am I typing, and why am I typing it? So if you can get to the point where you can solve an object detection problem yourself in a new, empty notebook, even if it's using the exact same data set we used in the course, that's a great sign that you're getting it. That'll take a while, but the idea is that by practicing, the second time you try to do it, the third time you try to do it, you'll check the lesson notebook less. And if there's anything in the notebook where you think, I don't know what it's doing, I hope to teach you enough techniques in this class that you'll know how to experiment to find out what it's doing, so you shouldn't have to ask that. But you may well want to ask why it's doing that; that's the conceptual bit, and that's something where you may need to go to the forums and say, before this step Jeremy had done this, after this step Jeremy had done that, and there's this bit in the middle where he does this other thing, and I don't quite know why. Then you can try to say, here are my hypotheses as to why; try to work through it as much as possible, and that way you'll both be helping yourself, and other people will help you fill in the gaps.
If you wish, and you have the financial resources, now is a good time to build a deep learning box for yourself. When I say a good time, I don't mean a good time in the history of GPU pricing; GPUs are currently by far the most expensive they've ever been as I say this, because of the cryptocurrency mining boom. I mean it's a good time in your study cycle. The fact is, if you're paying somewhere between 60 and 90 cents an hour to do your deep learning on a cloud provider, particularly if you're still on a K80, like an Amazon P2 (or Google Colab, which, if you haven't come across it, now lets you train on a K80 for free), those are very slow GPUs. You can buy one that's going to be three times faster for maybe 600 or 700 dollars; you need a box to put it in, of course. The example in the bottom right here, from the forum, was something somebody put together in last year's course, so a year ago they were able to put together a pretty decent box for a bit over 500 dollars. Generally speaking, you're probably looking at more like a thousand to fifteen hundred dollars. I created a new forum thread where you can talk about options and parts and ask questions and so forth. If you can afford it right now, the GTX 1080 Ti is almost certainly what you want in terms of the best price/performance mix. If you can't afford it, a 1070 is fine. If you can't afford that, you should probably be looking for a second-hand 980 or a second-hand 970, something like that. If you can afford to spend more money, it's worth getting a second GPU, so you can do what I do, which is to have one GPU training and another GPU running an interactive Jupyter notebook session. RAM is very useful; try to get 32 gig if you can, and RAM is not terribly expensive. A lot of people will find their vendor tries to persuade them to buy one of these business-class Xeon CPUs; that's a total waste of money. You can get one of the Intel i5 or i7 consumer CPUs far,
far cheaper, and actually a lot of them are faster. Often you'll hear that CPU speed doesn't matter; if you're doing computer vision, that's definitely not true. It's very common now, with these 1080 Tis and so forth, to find that the speed of the data augmentation, which happens on the CPU, is actually the slow bit, so it's worth getting a decent CPU. Again, if your GPU is running quickly but the hard drive's not fast enough to feed it data, then that's a waste as well, so if you can afford an NVMe drive, they're super, super fast. You don't have to get a big one; you can just get a little one that you copy your current set of data onto, and have some big RAID array that sits there holding the rest of your data when you're not using it. There's a slightly arcane thing about PCIe lanes, which are basically the size of the highway that connects your GPU to your computer, and a lot of people claim that you need 16 lanes to feed your GPU. It actually turns out, based on some analysis I've seen recently, that that's not true; you need eight lanes per GPU. So hopefully that helps you save some money on your motherboard. If you've never heard of PCIe lanes before, trust me, by the end of putting together this box you'll be sick of hearing about them. You can buy all the parts and put it together yourself; it's not that hard, and it can be a useful learning experience. It can also be kind of frustrating and annoying, so you can always go to somewhere like Central Computers and they'll put it together for you, and there are lots of online vendors that will do the same thing. They'll generally make sure it turns on and runs properly, and generally there's not much of a markup, so it's not a bad idea. We're going to be doing a lot of reading of papers; basically each week we'll be implementing a paper, or a few papers, and if you haven't looked at papers before, they look something like the thing on the left. That thing on the left is an extract from the
paper that introduced Adam. You may also have seen Adam as a single Excel formula in the spreadsheet that I created; they're the same thing. The difference is that in academic papers, people love to use Greek letters. They also hate to refactor, so you'll often see a page-long formula where, when you actually look at it carefully, you'll realize the same sub-equation appears eight times. They didn't think to say above it, "let t equal this sub-equation", and then reuse it; I don't know why this is a thing. I guess all this is to say that once you've read and understood a paper, you go back to it, you look at it, and you're just like, wow, how did they make such a simple thing so complicated? Adam is momentum on the gradient and momentum on the square of the gradient; that's it, and yet it's this big long thing. The other reason it's a big long thing is that they have things like theorems and corollaries, where they're saying, here's all our theoretical reasoning behind why this ought to work, or whatever. And for whatever reason, a lot of conferences and journals don't like to accept papers that don't have a lot of this theoretical justification. Geoffrey Hinton has talked about this a bit: particularly a decade or two ago, when no conferences would really accept any neural network papers, there was this one abstract theoretical result that came out where suddenly they could show this practically unimportant but theoretically interesting thing, and then they could start submitting things to journals because they had this theoretical justification. So academic papers are a bit weird, but in the end it's the way the research community communicates its findings, and so we need to learn to read them. But something that can be a
great thing to do is to take a paper, put in the effort to understand it, and then write a blog where you explain it in code and normal English. Lots of people who do that end up getting quite a following, and end up getting some pretty great job offers and so forth, because it's just such a useful skill to be able to show: I can understand these papers, I can implement them in code, I can explain them in English. One thing I will mention is that it's very hard to read or understand something which you can't vocalize, which means that if you don't know the names of the Greek letters, it sounds weird, but it's actually very difficult to understand, remember, or take in a formula that appears again and again. If a formula has some squiggle in it, you need to know that that squiggle is called delta, or sigma, or whatever. So spending some time learning the names of the Greek letters sounds like a strange thing to do, but suddenly you don't look at these things anymore and go, squiggle-a over squiggle-b plus other weird squiggle that looks like a y; they've all got names. Okay, so now that we're at the cutting-edge stage, a lot of the stuff we'll be learning in this class is stuff that almost nobody else knows about. That's a great opportunity for you to be the first person to create an understandable and generalizable code library that implements it, or the first person to write a blog post that explains it in clear English, or the first person to try applying it to a slightly different area where it was obviously going to work just as well, or whatever. So when we say cutting-edge research, that doesn't mean you have to come up with the next batch norm, or the next Adam, or the next dilated convolution. It can mean: okay, take this thing that was used for translation and apply it to this very similar, parallel NLP task; or take this thing that was
tested on skin lesions and try it on this data set of some other kind of lesions, or whatever. That kind of stuff is a super great learning experience, and incredibly useful, because to the vast majority of the world, which knows nothing about this whole field, it just looks like magic: hey, I've for the first time shown greater than 90% accuracy at finding this kind of lesion in this kind of medical data, or whatever. So when I say here, experiment in your area of expertise: one of the things we particularly look for in this class is to bring in people who are pretty good at something else, pretty good at meteorology, or pretty good at de novo drug design, or pretty good at goat dairy farming, or whatever; these are all examples of people we've had in the class. So probably the thing you can do best is to take that thing you're already pretty good at and add on these new skills, because otherwise, if you try to go into some different domain, you're going to have to figure out: how do I get data for that domain, how do I know what the interesting problems to solve in that domain are, and so forth. Whereas it'll often seem pretty trivial to you to take this technique and apply it to this data set that you've already got sitting on your hard drive, but that's often going to be the super interesting thing for the rest of the world to see: oh, that's interesting, when you apply it to meteorology data and you use this, I don't know, RNN or whatever, suddenly it allows you to forecast over larger areas or longer time periods. So communicating what you're doing is super helpful. We've talked about that before, but something a lot of people on the forums ask those who have already written a blog is: how did you get up the guts to do that, or what point had you got to before you decided to
start publishing something, or whatever. And the answer is always the same: I was sure I wasn't good enough to do it, I felt terrified and intimidated by doing it, but I wrote it and posted it anyway. I don't think there's ever a time any of us actually feel like we're not total frauds and imposters, but we know more about what we're doing than the us of six months ago, and there's somebody else in the world who knows as much as you did six months ago. So if you write something now that would have helped you six months ago, you're helping some people. And honestly, if you wait another six months, then the you of 12 months ago probably won't even understand it anymore, because it's too advanced. So it's great to communicate wherever you're up to, in a way that you think would be helpful to the person you were before you knew that thing. And of course, something the forums have been useful for is getting feedback on drafts: if you post a draft of something that you're thinking of releasing, then other folks here can point out things they find unclear, or that they think need some corrections, or whatever. The overarching theme of part two I've described as generative models, but unfortunately Rachel asked me this afternoon exactly what I meant by generative models, and I realized I don't really know. What I really mean is that in part one, the output of our neural networks was generally a number, or a category, whereas the outputs of a lot of the stuff in part two are going to be a whole lot of things: the top-left and bottom-right location of every object in an image, along with what each object is; or a complete picture, with the class of every single pixel in that picture; or an enhanced, super-resolution version of the input image; or the entire original input paragraph translated into French. It's kind of like, often it
just requires some different ways of thinking about things, and some different architectures, and so forth. So that's, I guess, the main theme of the techniques we'll be looking at. The vast majority, possibly all, of the data we'll be looking at will be either text or image data. It would be fairly trivial to do most of these things with audio as well; it's just not something I've spent much time on myself yet. Somebody asked on the forum, can we do more stuff with time series and tabular data, and my answer was: I've already taught you everything I know about that, and I'm not sure there's much else to say, particularly if you check out the machine learning course, which goes into a lot of that in a lot more detail. So I don't feel like there's more stuff to tell you. I think that's a super important area, but I think we're done with that. We'll be looking at some larger data sets, both in terms of the number of objects in the data set and the size of each of those objects. For those of you working with limited computational resources, please don't let that put you off; feel free to replace a data set with something smaller and simpler. In fact, when I was designing this course, I did quite a lot of it in Australia when I went to visit my mum, and my mum decided to book a nice holiday house for us with fast Wi-Fi. We turned up to the holiday house with fast Wi-Fi, and indeed it did have Wi-Fi that was fast, but the Wi-Fi was not connected to the internet. So I called up the agent and said, I found the ADSL router, and it's got an ADSL cable plugged in, and I followed the cable down, and the other end of the cable has nothing to plug into. So she called the people renting out the house, the owners, and called me back the next day, and she said: actually, the town, which is called Point Leo, actually Point Leo has no internet. And that's like,
wait, what? And so the good old Australian government had decided to replace ADSL in Point Leo with the new National Broadband Network, and therefore they had disconnected the ADSL but had not yet connected the National Broadband Network. So we had fast Wi-Fi, which we could use to Skype chat from one side of the house to the other, but I had no internet. Luckily, I did have a new 15-inch Surface Book, which has a GTX 1070 in it, and so I wrote a large amount of this course entirely on my laptop, which means I had to practice with relatively small resources; not tiny, but 16 gig of RAM and a 6 gig GPU. And it was all on Windows, by the way, so I can tell you that pretty much all of this course works well on Windows on a laptop, for what it's worth. So you can always use smaller batch sizes, or a cut-down version of the data set, whatever; but if you have the resources, you'll get better results using the bigger data sets when they're available. Okay, now is a good time, I think, to take a somewhat early break so we can fix the forum, which is still down at the moment; I'll upgrade it. So, what's the time, 7:20? Should we do five minutes? Yeah, okay, let's come back at 7:25. So let's start talking about object detection. Here is an example of object detection, and hopefully you'll see two main differences from what we're used to with classification. The first is that we have multiple things that we're classifying, which is not unheard of; we did that in the Planet satellite data, for example. What is kind of unheard of is that as well as saying what we see, we've also got what are called bounding boxes around what we see. A bounding box has a very specific definition: it's a box, a rectangle, and the rectangle has the object entirely fitting within it, but it's no bigger than it has to be. You'll see this bounding box is
perhaps, for the horse at least, slightly imperfect, in that it looks like there's a bit of tail here, so it probably should be a bit wider, and maybe it's leaving out a little bit of hoof here, so maybe it should be a bit longer. So the bounding boxes won't be perfect, but they're generally pretty good in most data sets you can find. Our job will be to take data that has been labeled in this way and, on data that is unlabeled, to generate the classes of the objects and, for each one of those, their bounding boxes. One thing I'll note to start with is that labeling this kind of data is generally more expensive. It's generally quicker to say "horse, person, person, horse, car, dog, jumbo jet" than it is, if there's a whole horse race going on, to label the exact location of every rider and of every horse. And then, of course, it also depends on what classes you want to label: do you want to label every fence post, or whatever? So generally, just like in ImageNet, it's not "tell me any object you see in this picture"; it's "here are the thousand classes we ask you to look for, tell us which one of those thousand classes you find", and just tell me one thing. For these object detection data sets, it's: here's a list of object classes we want you to tell us about; find every single one of them, of any type, in the picture, along with where they are. So in this case, why isn't there a tree or a jump labeled? Because for this particular data set they weren't among the classes the annotators were asked to find, and therefore they're not part of this particular problem. Okay, so that's the specification of the object detection problem. So let me describe stage one, and stage one is actually going to be surprisingly straightforward. We're going to start at the top and work down. We're going to start out by classifying the largest object in each image, so we're
going to try to say "person". Actually, this one is wrong: it said dog, but the sofa is the largest object, so here's an example of a misclassified one. "Bird", correct; "person", correct. That'll be the first thing we try to do; it's not going to require anything new, so it'll just be a bit of a warm-up for us. The second thing will be to tell us the location of the largest object in each image; again, this one here is actually incorrect, it should have labeled the sofa, but you can see where it's coming from. And then finally, we will try to do both at the same time: label what it is and where it is, for the largest thing in the picture. This is going to be fairly straightforward, actually, so it'll be a good warm-up to get us going again, but I'm going to use it as an opportunity to show you some useful coding techniques, and a couple of little fastai API details, before we then get on to multi-label classification and then multiple object detection. So let's start here. The notebook we're using is the pascal notebook, and all of the notebooks for this part are in dl2. One thing you'll see in some of my notebooks is torch.cuda.set_device; you may have even seen it in the last part. Just in case you're wondering why that's there: I have four GPUs on the university server that I use, so I can put a number from nought to three in here to pick one. This is how I prefer to use multiple GPUs: rather than running one model on multiple GPUs, which doesn't always speed it up that much and is kind of awkward, I generally like to have different GPUs running different things. So in this case, I was running something in this notebook on device one, and doing something else in another notebook on device two. Obviously, if you see this left behind in a notebook, that was a mistake; if you don't have more than one GPU, you're going to get an error, so you can just change it to zero or delete that
line entirely. So, there are a number of standard object detection datasets, just as ImageNet is a standard object classification dataset, and the old classic ImageNet equivalent, if you like, is Pascal VOC (Visual Object Classes). The actual main website for it seems to be running on somebody's coffee warmer or something; it goes down all the time, every time he makes coffee, I don't know. So some folks have mirrored it, which is very kind of them, and you might find it easier to grab it from a mirror. You'll see when you download it that there's a 2007 dataset and a 2012 dataset; they were basically academic competitions in those different years, just as the ImageNet dataset we tend to use is actually the ImageNet 2012 competition dataset. We'll be using the 2007 version in this particular notebook. Feel free to use the 2012 one instead; it's a bit bigger, so you might get better results. A lot of people, in fact most people, in research papers now actually combine the two. You do have to be careful, because there's some leakage between the two validation sets, so if you do decide to do that, make sure you do some reading about the dataset so you know how to combine them correctly. The first thing you'll notice in terms of coding here is something we haven't used before, and I'm going to be using it all the time now: pathlib, which is part of the Python 3 standard library, and it's super handy. It basically gives you object-oriented access to a directory or a file. So you can see that if I go path-dot-something, there are lots of things I can do; one of them is iterdir. However, path.iterdir() returns a generator. Hopefully you've come across generators by now, because we did quite a lot of stuff that used them behind the scenes without talking about them too much, but basically a generator is something in
Python 3 that you can iterate over. So you can go "for o in that: print(o)", for instance, or of course you could do the same thing as a list comprehension, or you can just stick list() around it to turn the generator into a list. So any time you see me put list() around something, that's normally because it returned a generator; it's not particularly interesting. The reason things generally return generators is that, say, the directory had ten million items in it: you don't necessarily want a ten-million-long list. The for loop just grabs one, does the thing, throws it away, grabs the second, throws it away; it lets you do things lazily. You'll see that the things it's returning aren't actually strings; they're some kind of object. If you're using Windows it'll be a WindowsPath; on Linux it'll be a PosixPath. Most of the time you can use them as if they were strings; if you pass one to any of the os.path functions in Python, it'll just work, but with some external libraries it won't. So that's fine: say you grab one of these, o = one of these. In general, you can change data types in Python just by naming the data type that you want and treating it like a function, and that will cast it. So any time you try to use one of these pathlib objects and you pass it to something which says "I was expecting a string, this is not a string", that's how you fix it. You'll see there are quite a lot of convenient things you can do. One kind of fun thing is the slash operator: it's not "divided by". They've overridden the slash operator in Python so that you can say path-slash-whatever, and you'll see that the thing after the slash is not inside a string, so this is applying not the division operator but the overridden slash operator, which means get a
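As a quick runnable aside on the pathlib points above, here's a minimal sketch. It uses a throwaway temporary directory rather than the actual Pascal data, so the file name here is just a stand-in:

```python
import tempfile
from pathlib import Path

# Throwaway directory so the sketch runs anywhere; substitute your data path.
root = Path(tempfile.mkdtemp())
(root / 'pascal_train2007.json').write_text('{}')

# iterdir() returns a generator; wrap it in list() to materialise it.
files = list(root.iterdir())
print(type(files[0]).__name__)  # PosixPath on Linux, WindowsPath on Windows

# The overridden / operator joins path components (no string concatenation).
trn_json = root / 'pascal_train2007.json'

# Casting: name the type you want and call it like a function,
# for libraries that insist on a plain string.
print(str(trn_json).endswith('pascal_train2007.json'))  # True
```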
child thing in that path. And you'll see that if you run that, it doesn't return a string; it returns a pathlib object. One of the things a pathlib object can do is that it has an open method. It's actually pretty cool once you start getting the hang of it, and you'll also find that the open method takes all the arguments you're familiar with: you can say write, or binary, or encoding, or whatever. In this case I want to load up these JSON files, which contain not the images but the bounding boxes and the classes of the objects. In Python the easiest way to do that is with the json library (there are some faster API-compatible versions, but these files are pretty small, so you won't need them): you go json.load and pass it an open file object, and the easy way to do that, since we're using pathlib, is just to go path.open(). If you haven't used these JSON files before, JSON is JavaScript Object Notation; it's pretty much the most standard way to pass around hierarchical structured data now, and obviously not just in JavaScript. You'll see I've got some JSON files here; they actually did not come from the mirror I mentioned. The original Pascal annotations were in XML format, but cool kids can't use XML anymore, we have to use JSON, so somebody has converted them all to JSON, and you'll find the second link here has all the JSON files. If you just pop them in the same location I've put them here, everything will work for you. These annotation JSONs basically contain a dictionary (once you open up the JSON, it becomes a Python dictionary), and they've got a few different things in them. The first is images: it's got a list of all of the images, how big they are, and a unique ID for each one. One thing you'll notice here is that I've taken the word 'images' and put it inside a constant called IMAGES. That may seem kind of weird, but if you're using a notebook or
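A tiny self-contained sketch of that loading pattern follows. The dictionary contents here are made up, but the key names match the COCO-style annotation JSON being described:

```python
import json
import tempfile
from pathlib import Path

# Write a miniature stand-in for the real annotation file so this runs anywhere.
sample = {'images': [{'id': 12, 'file_name': '000012.jpg',
                      'height': 333, 'width': 500}],
          'annotations': [], 'categories': [{'id': 7, 'name': 'car'}]}
path = Path(tempfile.mkdtemp()) / 'pascal_train2007.json'
path.write_text(json.dumps(sample))

# Constants for the key strings: tab-completable and typo-proof.
IMAGES, ANNOTATIONS, CATEGORIES = 'images', 'annotations', 'categories'

# json.load wants an open file object; pathlib's open() supplies one.
trn_j = json.load(path.open())
print(trn_j[IMAGES][0]['file_name'])  # 000012.jpg
```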
any kind of IDE, this now means I can tab-complete all of my strings and I won't accidentally type one slightly wrong; that's just a handy trick. So here are the contents of the first few things in images. More interestingly, here are some of the annotations. You'll see an annotation basically contains a bounding box, and the bounding box tells you the column and row of the top left, and its height and width. Then it tells you that that particular bounding box is for this particular image, so you'd have to join that up over here to find that it's actually 000012.jpg, and that it's of category ID seven. For some of them, at least, it also has a polygon segmentation, not just a bounding box; we're not going to be using that. Some of them have an ignore flag, so we'll ignore the ignore flags, and some of them have something telling you it's a crowd of that object, not just one of them. So that's what these annotations look like. You saw there's a category ID there, so then we can look at the categories, and here are a few examples: each ID basically has a name. So what I did then was turn this categories list into a dictionary from ID to name, I created a dictionary from ID to file name for the images, and I created a list of all of the image IDs, just to make life easier. Generally, when I'm working with a new dataset, I try to make it look the way I would want it to if I had designed that dataset, so I do a quick bit of manipulation. The steps you see here, and that you'll see in each class, are basically the sequence of steps I took as I started working with this new dataset, except without the thousands of screw-ups. The one thing people most comment on when they see me working in real time, having seen my classes, is "wow, you actually don't know what you're doing", since like 99%
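Those little lookup structures might look like this, with made-up IDs standing in for the real JSON lists:

```python
# Build convenient lookups from the raw JSON lists (toy data standing in
# for the 'categories' and 'images' lists of the real annotation file).
categories = [{'id': 7, 'name': 'car'}, {'id': 13, 'name': 'horse'},
              {'id': 15, 'name': 'person'}]
images = [{'id': 12, 'file_name': '000012.jpg'},
          {'id': 17, 'file_name': '000017.jpg'}]

cats = {o['id']: o['name'] for o in categories}        # id -> class name
trn_fns = {o['id']: o['file_name'] for o in images}    # id -> file name
trn_ids = [o['id'] for o in images]                    # list of image ids

print(cats[7], trn_fns[17])  # car 000017.jpg
```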
of the things I do don't work, and then the small percentage of things that do work end up here. I mention that because machine learning, and particularly deep learning, is kind of incredibly frustrating: in theory, you just define the correct loss function and a flexible enough architecture, press train, and you're done. But if that were actually all it took, then nothing would take any time. The problem is that at all the steps along the way, until it works, it doesn't work: it goes straight to infinity, or it crashes with an incorrect tensor size, or whatever. I will endeavor to show you some debugging techniques as we go, but it's one of the hardest things to teach, because, and maybe I just haven't quite figured out how yet, the main thing it requires is tenacity. The biggest difference between the people I've worked with who are super effective and the ones who don't seem to get very far has never been about intellect; it's always been about sticking with it, basically never giving up. That's particularly important with this kind of deep learning stuff, because you don't get that continuous reward cycle. With normal programming, you've got twelve things to do until you've got your Flask endpoint stood up, and at each stage it's like, okay, we've successfully parsed the JSON, and now we've successfully got the callback from that promise, and now we've successfully created the authentication system; it's a constant sequence of stuff that works. Whereas generally, with training a model, it's a constant stream of "it doesn't work, it doesn't work", until eventually it does. So it's kind of annoying. Okay, so let's now look at the images. You'll find inside the VOCdevkit there are 2007 and 2012 directories, and in there is a whole bunch of stuff, mainly those XML files; the one we care about is the JPEG
images. So again, here you've got the use of pathlib's slash operator, and inside there are a few examples of the images. What I wanted to do was to create a dictionary where the key was the image ID and the value was a list of all of its annotations. So basically I wanted to go through each of the annotations that doesn't say to ignore it, and append its bounding box and its class to the appropriate dictionary item, where that dictionary item is a list. The annoying thing, of course, is that if that dictionary item doesn't exist yet, then there's no list to append to. One super-handy trick in Python is that there's a class called collections.defaultdict, which is just like a dictionary, except that if you try to access a key that doesn't exist, it magically makes it exist, and sets it equal to the return value of this function. Now, this could be the name of some function that you've defined, or it can be a lambda function. A lambda function simply means a function that you define in place; we'll be seeing lots of them. So here's an example of a function: all the arguments to the function are listed on the left, so there are no arguments to this function, and lambda functions are special in that you don't have to write "return"; the return is assumed. So this is a lambda function that takes no arguments and returns an empty list. In other words, every time I try to access something in the training annotations that doesn't exist, it now does exist, and it's an empty list, which means I can append to it. One comment on variable naming: when I read through these notebooks, I'll generally try to speak out the English words that the variable name is a mnemonic for. A reasonable question would be, well, why didn't I write the full name of the variable in English rather than using a short mnemonic? It's a personal preference I have, based on a number of programming communities, where the basic thesis is that the more that you
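The defaultdict pattern just described can be sketched with made-up annotation triples (the variable name trn_anno follows the notebook's mnemonic style, and the numbers here are illustrative):

```python
import collections

# A missing key triggers the zero-argument factory; lambda: [] hands back
# a fresh empty list, so we can append without checking for existence first.
trn_anno = collections.defaultdict(lambda: [])

# Hypothetical (image_id, bbox, class) triples standing in for the JSON data.
raw = [(12, (96, 155, 269, 350), 7),
       (17, (61, 184, 198, 278), 15),
       (17, (77, 89, 335, 402), 13)]

for img_id, bb, cat in raw:
    trn_anno[img_id].append((bb, cat))   # no KeyError on first access

print(len(trn_anno[17]))  # 2
```

Note the "for img_id, bb, cat in raw" line: that's the destructuring assignment discussed later in this lesson, unpacking each triple directly into three names.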
can see in a single eye-grab of the screen, the more you can understand intuitively at one go; every time your eye has to jump around, it's kind of a context change that reduces your understanding. It's a style of programming I've found super helpful, so generally speaking I particularly try to reduce the vertical height, so things don't scroll off the screen, but I also try to reduce the size of things, so that there's a mnemonic there: if you know it means training annotations, it doesn't take long for you to read the abbreviation, but you don't have to write the whole thing out. I'm not saying you have to do it this way; I'm just saying there are some very large programming communities, some of which have been around for fifty or sixty years, which have used this approach, and I find it works well. It's interesting to compare: I guess my philosophy is somewhere between math and Java. In math, everything's a single character, the same single character can be used in the same paper for five different things, and depending on whether it's in italics or boldface or capitals it's another five things; I find that less than ideal. In Java, variable names sometimes require a few pages to type out, and I find that less than ideal as well. So for me, personally, I like names which are short enough to not take too much of my perception to see at once, but long enough to be a mnemonic. However, a lot of the time a variable will be describing a mathematical object as it exists in a paper, and there isn't really an English name for it, and in those cases I'll use the same, often single-letter, name that the paper uses. So if you see something called delta or a or whatever, and it's something inside an equation from a paper, I generally try to use the same thing. By no means do you have to do the same thing. I will say, however, if you contribute to fastai, I'm not
particularly fastidious about coding style or whatever, but if you write things more the way I do than the way Java people do, I'd certainly appreciate it. Okay, so by the end of this we now have a dictionary from file names to a tuple, and here's an example of looking something up in that dictionary: we get back a bounding box and a class. You'll see that when I create the bounding box, I've done a couple of things. The first is that I've switched the x and y coordinates, and the reason for this, which I think we mentioned briefly in the last course, is that the computer vision world, when you say "my screen is 640 by 480", means width by height, whereas the math world, when you say "my array is 640 by 480", means rows by columns, i.e. height by width. So you'll see that a lot of things, like PIL, the Pillow image library, in Python tend to do things in this width-by-height, columns-by-rows way, and numpy is the opposite way around. Again, my view is: don't put up with this kind of incredibly annoying inconsistency; fix it. I've decided that for fastai the numpy/PyTorch way is the right way, so I'm always rows by columns, and you'll see here I've switched to rows by columns. I've also decided that we're going to describe bounding boxes by the top-left coordinate and the bottom-right coordinate, rather than the top-left coordinate plus the height and width, so you'll see here I'm just converting the height and width to top left and bottom right. Again, I often find that junior programmers, in particular junior data scientists, get given datasets that are in shitty formats or have crappy APIs, and they just act as if everything has to be that way. Your life will be much easier if you take a couple of moments to make things consistent and make them the way you want them to be. Okay, so earlier on I took all of our classes and created a categories list, and if we look up category number seven, which is what this is, category
number seven is car. Let's have a look at another example: image number 17 has two bounding boxes, one of type 15 and one of type 13, that is, a person and a horse. This would be much easier to understand if we could see a picture of these things, so let's create some pictures. Having just turned our height-width stuff into top-left/bottom-right stuff, we're now going to create a function to do the exact opposite, because any time I want to call some library that expects the opposite, I'm going to need to pass it the opposite. So here is something that converts a bounding box to height and width: bb_hw, bounding box to height-width. It's again reversing the order and giving us the height and width. So we can now open an image in order to display it, and where we're going to get to is showing this: that's the car we just saw, it's a car. One thing I often get asked on the forums or through GitHub is, well, how do I find out about this open_image thing? Where did it come from, what does it mean, who uses it? So I wanted to take a moment, because this is something we're going to be doing a lot, and I know a lot of you aren't professional coders; you have backgrounds in statistics or meteorology or physics or whatever (I apologize to those of you who are professional coders, you know this already). Because we're going to be doing a lot of stuff with the fastai library and other libraries, you need to be able to navigate very quickly through them. So let me give you a quick overview of how to navigate through code, and for those of you who haven't used an editor properly before, this is going to blow your minds; for those of you who have, you're going to be like, "check this out, guys". For the demo, I'm going to show you Visual Studio Code. Personally, my view is that on pretty much every platform, unless you're prepared to put in the decades of your life
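The two conversions just described are each one line; the sketch below matches the notebook's hw_bb and bb_hw functions, going between VOC's [x, y, width, height] form and numpy-style [top-row, left-col, bottom-row, right-col] corners:

```python
import numpy as np

def hw_bb(bb):
    # VOC [x, y, width, height] -> inclusive corners [top, left, bottom, right]
    return np.array([bb[1], bb[0], bb[3] + bb[1] - 1, bb[2] + bb[0] - 1])

def bb_hw(a):
    # Corners back to [x, y, width, height] (+1 because corners are inclusive)
    return np.array([a[1], a[0], a[3] - a[1] + 1, a[2] - a[0] + 1])

print(hw_bb([155, 96, 196, 174]))            # [ 96 155 269 350]
print(bb_hw(np.array([96, 155, 269, 350])))  # [155  96 196 174]
```

The two functions are exact inverses, which is easy to check by round-tripping a box through both.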
to learn Vim or Emacs, Visual Studio Code is probably the best editor out there. It's free, it's open source; there are other perfectly good ones as well. If you download a recent version of Anaconda, it will offer to install Visual Studio Code for you; it integrates with Anaconda, sets it up with your Python interpreter, and comes with the Python extensions and everything, so it's a good choice if you're not sure. If you've got some other editor you like, search for the right keywords in its help. So if I fire up Visual Studio Code, the first thing to do, of course, is a git clone of the fastai library to your laptop. You'll find in the root of the repo, as well as the environment.yml file that sets up an environment for GPU, that one of the students has been kind enough to create an environment-cpu.yml file (perhaps one of you who knows how to do this can add some notes to the wiki), and basically you can use that to create a local, CPU-only fastai installation. The reason you might want to do that is so that, as you navigate the code, you'll be able to navigate into PyTorch and see that all the stuff is there. So I open up Visual Studio Code, and it's as simple as saying Open Folder and pointing it at the fastai GitHub folder you just downloaded. The next thing you need to do is to set up Visual Studio Code to say, "I want to use the fastai conda environment, please." The way you do that is with the Select Interpreter command, and there's a really nice idea here, which is kind of the best of both worlds between a command-line interface and a GUI: this is the only command you need to know, Ctrl-Shift-P. You hit Ctrl-Shift-P and then you start typing what you want to do, and watch what happens: Ctrl-Shift-P, "I want to change my interpreter", and it appears. If you're not sure, you can try a few different things. So here we are, "Python: Select Interpreter", and you can see
that generally you can type stuff in and it'll give you a list of things. So here's a list of all of the environment interpreters I have set up, and here's my fastai environment. That's basically the only setup you have to do. The only other thing you might want to know is that there's an integrated terminal: if you hit Ctrl-backtick, it brings up the terminal. The first time you do it, it'll ask you what terminal you want: on Windows it'll be PowerShell or Command Prompt or bash; if you're on Linux and you've got multiple shells installed, it'll ask. As you can see, I've got it set up to use bash, and you'll see it automatically goes to the directory I'm in. All right, so the main thing we want to do right now is find out what open_image is, and the only thing you need to know to do that is Ctrl-T. If you hit Ctrl-T, you can now type the name of a class, a function, pretty much anything, and you can find out about it. So "open image": you can see it appears. And it's kind of cool: if something is camel case, or capitalized, or has an underscore, you can just type the first few letters of each bit of "open image", for example. I do that, and it's found the function; it's also found some other things that match, and there it is. So that's a good way to see exactly where something has come from, and to find out exactly what it is. The next thing, I guess, would be, well, what's it used for? If it's used inside fastai, you can say Find All References, which is Shift-F12: "open image", Shift-F12, and it brings up something saying it's used twice in this code base, and I can go and have a look at each of those examples. If it's used in multiple different files, it'll tell you the multiple different files it's used in. Another thing that's really handy is that, as you look at the code, you'll find that certain bits
of the code call other parts of the code. For example, if you're inside FilesDataset and you're like, "oh, this is calling something called open_image, what is that?", you can hover your pointer over it and it'll give you the docstring, or you can hit F12 and it jumps straight to its definition. Often it's easy to get a bit lost when things call things that call things, and if you have to manually go to each bit, it's infuriating, whereas this way it's always one button away: Ctrl-T to go to something whose name you specifically know, or F12 to jump to the definition of something you're clicking on. And when you're done, you probably want to go back to where you came from, so Alt-Left takes you back to where you were. Whatever you use, Vim, Emacs, Atom, whatever, they all have this functionality, as long as you have an appropriate extension installed; if you use PyCharm, you can get that for free, and it doesn't need any extensions because it's for Python. Whatever you're using, you want to know how to do this stuff. Finally, I'll mention there's a nice thing called Zen Mode, Ctrl-K Z, which basically gets rid of everything else so you can focus, but it does keep this nice little thing on the right-hand side which shows you where you are. So that's something you should practice during the week if you haven't played around with it before, because we're increasingly going to be digging deeper and deeper into the fastai and PyTorch libraries. As I say, if you're already a professional coder and know all this stuff, apologies for telling you things you already know. Okay, so, well, actually, since we did that, let's just talk about open_image. You'll see that we're using cv2; cv2 is actually the OpenCV library. You might wonder why we're using OpenCV, and I want to explain some of the innards of fastai to you, because some of them are kind of
interesting and might be helpful too. Torchvision, the standard PyTorch vision library, actually uses PyTorch tensors for all of its data augmentation and stuff like that, and a lot of people use Pillow, PIL, the standard Python imaging library. I did a lot of testing of all of these, and I found OpenCV was about five to ten times faster than torchvision. Early on, I teamed up with one of the students from an earlier class to do the Planet satellite competition on Kaggle when that was on, and we used torchvision, and because it was so slow, we could only get about 25% GPU utilization, because we were doing a lot of data augmentation. So I used the profiler to find out what was going on and realized it was all in torchvision. Pillow, or PIL, is quite a bit faster, but it's not as fast as OpenCV, and it's also not nearly as thread-safe. Python has this thing called the global interpreter lock, the GIL (I don't know if we've talked about this before), which basically means that two threads can't do Pythonic things at the same time; it makes Python a really shitty language, actually, for modern programming, but we're stuck with it. I actually spoke on Twitter to the guy who made it so that OpenCV releases the GIL. So one of the reasons the fastai library is so amazingly fast is that we don't use multiple processes, like every other library does, for our data augmentation; we actually use multiple threads, and the reason we can use multiple threads is that we use OpenCV. Unfortunately, OpenCV has a really shitty API; it's kind of inscrutable, and a lot of the stuff it does is poorly documented. When I say poorly documented: it is documented, but in really obtuse ways. So that's why I try to make it so that no one using fastai needs to know that it's using OpenCV, you know.
Like, if you want to open an image, do you really need to know that you have to pass these flags to actually make it work? Do you actually need to know that if the reading fails, it doesn't throw an exception, it just silently returns None? It's these kinds of things that we try to handle so it all works nicely. But as you start to dig into the library, you'll find yourself in these places and you'll want to know why, and I mention this in particular to say: don't start using PyTorch for your data augmentation, don't start bringing in Pillow; you'll find suddenly things slow down horribly, or the multithreading won't work anymore, or whatever. Try to stick with using OpenCV for your processing. Okay, so we've got our image; we're just going to use it to demonstrate the Pascal dataset. The next thing I wanted to show you, in terms of important coding stuff we're going to be using throughout this course, is using matplotlib a lot better. Matplotlib is so named because it was originally a clone of Matlab's plotting library. Unfortunately, Matlab's plotting library is awful, but at the time it was what everybody knew. So at some point the matplotlib folks realized (or probably always knew) that the Matlab plotting API is awful, so they added a second API to it, an object-oriented API. Unfortunately, because nobody who originally learned matplotlib learnt the OO API, they then taught the next generation of people the old Matlab-style API, and now there are basically no examples or tutorials online that I'm aware of that use the much, much better, easier-to-understand, simple OO API. So one of the things I'm going to try to show you, because plotting is so important in deep learning, is how to use this API, and I've discovered some simple little tricks. One simple little trick is that plt.subplots is a super-handy wrapper; I'm going to use it lots. What it does is return two
things: one of them, the figure, you often won't care about; the other is an axes object. And basically, anywhere you used to say plt.something, you now say ax.something, and it will do that plotting to that particular subplot. A lot of the time during this course I'll use this to plot multiple plots that we can compare next to each other, but even in this case, where I'm creating a single plot, it's just nice to only have to know one thing rather than lots of things. So regardless of whether I'm doing one plot or lots of plots, I now always start with plt.subplots. The nice thing is that this way I can pass in an axes object if I want to plot into a figure I've already created, or, if one hasn't been passed in, I can create one. This is also a nice way to make your matplotlib functions really versatile, and you'll see it used throughout this course. So now, rather than plt.imshow, it's ax.imshow, and rather than the weird stateful setting of things in the old-style API, you can now use OO methods: get the axis, which returns an object; set_visible, which sets a property; it's all pretty normal, straightforward stuff. Once you start getting the hang of a small number of these OO matplotlib things, hopefully you'll find life a lot easier. So I'm going to show you a few right now, actually. Let me show you what I think is a cool example. One thing that drives me crazy with people putting text on images, whether it be subtitles on TV or people doing stuff with computer vision, is that it's white text on a white background, or black text on a dark background, and you can't read it. A really simple thing I like to do every time I draw on an image is to make my text and boxes either white with a little black border, or vice versa. So here's a cool little thing you can do in matplotlib, which
is that you can take a matplotlib plotting object and call set_path_effects on it to add a black stroke around it, and you can see that when you draw it, it doesn't matter that here it's white on a white background, or here it's on a black background; it's equally visible. I know it's a simple little thing, but it just makes life so much better when you can actually see your bounding boxes and actually read the text. So, rather than just adding a rectangle, I get the object that it creates and then pass that object to draw_outline, and now everything I draw will get this nice path effect. You can see matplotlib is a perfectly convenient way of drawing stuff: when I want to draw a rectangle, matplotlib calls that a patch, and then you can pass in all different kinds of patches. So again, rather than having to remember all of that every time, please stick it in a function, and now you can use that function every time. You don't have to put it in a library somewhere; I always put lots of functions inside my notebook, and if I use one in, like, three notebooks, then I know it's useful enough that I'll stick it in a separate library. You can draw text as well, and notice that all of these take an axes object, so this is always going to be added to whatever thing I want to add it to: I can add text and draw an outline around it. So, having done all that, I can now take my show_image (and notice that if you didn't pass it an axis, show_image returns the axis it created; so show_image returns the axis that the image is on), I then turn my bounding box into height-width form for this particular image's bounding box, I can then draw the rectangle, and I can then draw the text in the top-left corner. Remember, the bounding box's first two coordinates are the top left, so b[:2] is the top left. And remember, the tuple contains two things,
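Putting the pieces above together, here's a minimal runnable sketch, close to the notebook's helpers (the Agg backend is used here only so it runs headless):

```python
import matplotlib
matplotlib.use('Agg')            # headless backend so this runs anywhere
import matplotlib.pyplot as plt
import matplotlib.patheffects as patheffects
from matplotlib import patches
import numpy as np

def show_image(im, figsize=None, ax=None):
    # Create an axes only if the caller didn't supply one, then plot onto it.
    if ax is None:
        fig, ax = plt.subplots(figsize=figsize)
    ax.imshow(im)
    ax.get_xaxis().set_visible(False)   # OO setters, not stateful plt calls
    ax.get_yaxis().set_visible(False)
    return ax

def draw_outline(o, lw):
    # Stroke any matplotlib artist in black so white marks stay readable.
    o.set_path_effects([patheffects.Stroke(linewidth=lw, foreground='black'),
                        patheffects.Normal()])

def draw_rect(ax, b):
    # b is [x, y, width, height]; a rectangle is a "patch" in matplotlib.
    patch = ax.add_patch(patches.Rectangle(b[:2], *b[-2:], fill=False,
                                           edgecolor='white', lw=2))
    draw_outline(patch, 4)

def draw_text(ax, xy, txt, sz=14):
    text = ax.text(*xy, txt, verticalalignment='top', color='white',
                   fontsize=sz, weight='bold')
    draw_outline(text, 1)

# Usage: a black dummy image stands in for a real photo.
ax = show_image(np.zeros((100, 100, 3)))
draw_rect(ax, [20, 20, 50, 40])
draw_text(ax, (20, 20), 'car')
```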
the bounding box and then the class, so this is the class, and to get the text of it I just pass it into my categories list, and there we go. So now that I've got all that set up, I can use it for all of my object detection stuff from here on. What I'd really want to do, though, is to package all that up, so here it is, all packaged up: here's something that draws an image with some annotations. It shows the image, goes through each annotation, turns it into height and width, draws the rectangle, and draws the text. If you haven't seen this before: each annotation, remember, contains a bounding box and a class, so rather than going "for o in ann" and then "o[0], o[1]", I can destructure it. This is a destructuring assignment: if you put "something comma something" on the left, then that's going to put the two parts of the tuple or list into those two things. Super handy. So: for the bounding box and the class in the annotations, go ahead and do all of that. And then I can say, okay, draw the image at a particular index, by grabbing the image ID, opening it up, and then calling that draw function. So I tested it out, and there it is. That may seem like quite a few steps, but to me, when you're working with a new dataset, getting to the point where you can rapidly explore it pays off. You'll see as we start building our model that we're going to keep using these functions to see how things are going. All right, so step one from our presentation is to do a classifier. And I think it's always good... for me, I didn't really have much experience in doing this kind of object detection stuff before I started preparing this course a few months ago, so I wanted to get this feeling, even though it's deep learning, of continual progress. So, what could I make work? All right, well, what if I find the biggest object in each image and
classify it? I know how to do that. This is one of the biggest problems I find, particularly with the youngest students: they figure out the whole big solution they want, which generally involves a whole lot of new speculative ideas that nobody's ever tried before, they spend six months doing it, and then the day before the presentation none of it works. I've talked before about my approach to Kaggle competitions, which is: spend half an hour a day, and at the end of that half hour, submit something, and try to make it a little bit better than yesterday's. I tried to do the same thing in preparing this lesson: at each step, try to create something that's a bit better than the last thing. So the first, easiest thing I could come up with was my largest-item classifier. The first thing I needed to do was to go through each of the bounding boxes in an image and get the largest one. Actually, I didn't write that first; I wrote this line first. Normally I like to pretend that somebody else has created the exact API I want, and then go back and write it. So I wrote this line first, and said: okay, I need something which takes all of the bounding boxes for a particular image and finds the largest. Well, that's pretty straightforward: I can just sort the bounding boxes. Here again we've got a lambda function. If you haven't used lambda functions before, this is something you should study during the week; they're used all over the place to quickly define a once-off function. In this case, Python's built-in sorted function lets you pass in a function that decides whether something comes earlier or later in the sort order. Here I took the product of the last two items of my bounding box list, i.e. the bottom-right corner, minus the first two items of my
bounding box list, i.e. the top-left corner. Bottom right minus top left gives the two side lengths, and if you take the product of those two things, you get the area of the bounding box. So that's the sort key, and we sort in descending order. Often you can take something that would be a few lines of code and turn it into one line of code. Sometimes you can take that too far, but I like to do it where I reasonably can, because rather than having to understand a whole big chain of things, my brain can look at it all at once and say: okay, there it is. I also find that over time my brain builds up a little library of idioms, so more and more I can look at a single line and know what's going on. This result is a dictionary, because this is a dictionary comprehension. A dictionary comprehension is just like a list comprehension (I'm going to use both a lot in this part of the course), except it goes inside curly brackets and takes key colon value. Here the key is the image ID and the value is the largest bounding box. Now that we've got that, we can look at an example, and here's the largest bounding box for this image. Obviously there are a lot of objects here, three bicycles and three people, but here's the largest bounding box. And I feel like this ought to go without saying, but it definitely needs to be said, because so many people don't do it: you need to look at every stage of any processing pipeline you've got. If you're as bad at coding as I am, everything you do will be wrong the first time you do it. But lots of people who are just as bad at coding write lines and lines of code assuming they're all correct, and then at the very end they've got a mistake and they don't know where it came from.
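To make that sort-and-pick step concrete, here's a rough sketch; the function name, the box layout ([top-left row, top-left col, bottom-right row, bottom-right col]), and the sample data are my own reconstruction, not necessarily the notebook's exact code.

```python
import numpy as np

# Hypothetical sketch of the "largest bounding box" step. trn_anno maps an
# image id to its list of (bbox, class) tuples.
def get_largest(annotations):
    """Return the (bbox, class) pair whose bounding box has the largest area."""
    # bottom-right minus top-left gives the two side lengths; their product
    # is the area, and we sort by it in descending order
    return sorted(annotations,
                  key=lambda x: np.prod(np.array(x[0][-2:]) - np.array(x[0][:2])),
                  reverse=True)[0]

trn_anno = {12: [([96, 155, 269, 350], 14), ([80, 1, 127, 36], 1)]}
# dictionary comprehension: image id -> its largest annotation
trn_lrg_anno = {img_id: get_largest(anns) for img_id, anns in trn_anno.items()}
```

The lambda is exactly the once-off sort key described above: one line, used once, never named.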
So, particularly when you're working with images or text, things that humans can look at and understand, keep looking at it. Here: yep, that looks like the biggest thing, and that certainly looks like a person, so let's move on. Here's another nice thing in pathlib: mkdir, a handy little method. I'm going to create a path called CSV, which is the path to my largest-objects CSV file. Why am I going to create a CSV file? Pure laziness. We have ImageClassifierData.from_csv, so I could go through a whole lot of work to create a custom dataset and so forth for this particular format I have, but why, when it's so easy to create the CSV, chuck it inside a temporary folder, and use something that already exists? This is something I've seen a lot of times on the forum: people ask how to convert some weird structure into a form fastai can accept, and normally somebody will answer, "print it to a CSV file". So that's a good simple tip, and the easiest way to create a CSV file is to create a pandas DataFrame. Here's my pandas DataFrame: I can just give it a dictionary with the name of each column and the list of things in that column. So there's the file name, and there's the category. Then you'll see this columns argument, and you might wonder: I've already named the columns in the dictionary, so why is it here? Because the order of columns matters, and a dictionary does not have a guaranteed order. This says the file name comes first and the category comes second. That's a good trick for creating your CSVs. So now it's just like dogs and cats: I have a CSV file containing a bunch of file names and, for each one, the class of that object. These are the same two lines of code you've seen a thousand times. The one thing that's different is the crop type. You might remember the
default strategy for creating a 224 by 224 image in fastai is to first resize it so the smallest side is 224, and then take a random square crop during training; during validation we take the center crop, unless we use data augmentation, in which case we do a few random crops. For bounding boxes we don't want to do that, because unlike ImageNet, where the thing we care about is pretty much in the middle and pretty big, a lot of the things in object detection are quite small and close to the edge, so we could crop them out, and that would be bad. So when you create your transforms, you can choose crop_type=CropType.NO. NO means don't crop, and therefore, to make the image square, it squishes it instead. You'll see this guy now looks a bit strangely wide, and that's because he's been squished rather than cropped. Generally speaking, a lot of computer vision models work a little bit better if you crop rather than squish, but they still work pretty well if you squish, and in this case we definitely don't want to crop, so this is perfectly fine. If you had very long or very tall images, such that a human looking at the squashed version would say it looks really weird, then that might be difficult to model; but in this case it just looks a little bit strange, and the computer won't mind. So, I'm going to quite often dig a little deeper into fastai and PyTorch, and in this case I want to look at data loaders a little more. Let's make sure this has all run. You already know that inside a model data object (and there are lots of model data subclasses, like ImageClassifierData) we have a bunch of things, including a training data loader and a training dataset.
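Since we're about to poke at data loaders, here's a minimal toy sketch using a plain PyTorch DataLoader as a stand-in for fastai's wrappers (the fake tensors and sizes here are just for illustration):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# A toy dataset: 100 fake 3-channel "images" with binary labels
xs = torch.randn(100, 3, 8, 8)
ys = torch.randint(0, 2, (100,))
dl = DataLoader(TensorDataset(xs, ys), batch_size=64)

# iter() starts an epoch; next() pulls a single mini-batch of (x, y)
x, y = next(iter(dl))
```

With 100 items and a batch size of 64, the first batch holds 64 items and the second (and last) holds the remaining 36.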
We'll talk much more about this soon, but the main thing to know about a data loader is that it's an iterator: each time you grab the next iteration from it, you get a mini-batch of whatever size you asked for, and by default the batch size is 64. In Python, the way you grab the next thing from an iterator is with next. But you can't just do that here. Why not? Because you need a way to say "start a new epoch now". In general (this isn't just PyTorch, it's any Python iterator) you need to say "start at the beginning of the sequence, please". The way you do that, and this is a general Python concept, is iter: it says "please grab an iterator out of this object". Specifically, as we'll learn later, it means the class has to define an __iter__ method, which returns some different object that in turn has a __next__ method. So if you want to grab just a single batch, this is how you do it: x, y = next(iter(dl)), where dl is your data loader. It's x, y because the datasets behind our data loaders always have an x, the independent variable, and a y, the dependent variable. So here we grab a mini-batch of x's and y's, and now we want to pass that to the show_image command we had earlier. But we can't send it straight to show_image: for one thing, it's not a numpy array, it's not on the CPU, and its shape is all wrong (it's not 224 by 224 by 3, it's 3 by 224 by 224). Furthermore, the values are not numbers between 0 and 1. Why not? Because, remember, all the standard ImageNet-pretrained models expect their input to have been normalized to zero mean and unit standard deviation. So if you look inside
actually, let's use Visual Studio Code for this, since that's what you've been using. If you look inside transforms_from_model (Ctrl-T to search for it), which in turn calls transforms (F12 to jump to it) with a set of stats, you can see normalize, and it normalizes with some set of image statistics. Those sets of image statistics are basically hard-coded: these are the ImageNet statistics, and these are the statistics used for Inception models. So there's a whole bunch of work that's been done to the input to get it ready to be passed to a pretrained model. We therefore have a function called denorm, for denormalize, and it doesn't only denormalize: it also fixes up the dimension order and all that stuff. The denormalization depends on the transform, and the dataset knows what transform was used to create it, so that's why you have to go model data dot (some dataset) dot denorm; that's a function stored for you that will undo everything. Then you can pass it the mini-batch, but you have to turn it into numpy first. So this is all the stuff you need to be able to do to grab batches and look at them, and after you've done all that, you can show the image, and there's our picture back, so that's looking good. In the end we've just got the standard four lines of code: our transforms, our model data, ConvLearner.pretrained with a ResNet-34, accuracy as a metric, an optimization function, and an lr_find. And that plot looks kind of weird, not particularly helpful; normally we would expect to see an uptick on the right. The reason we don't see it is that we intentionally remove the first few points and the last few points. The reason for that is that often the last few points shoot so high up towards infinity that you basically can't see anything, so the vast majority of the time, removing the last few
points is a good idea. However, when you've got very few mini-batches, sometimes it's not a good idea, and a lot of people ask about this on the forum. Here's how you fix it: pass the skip arguments. By default it skips 10 at the start, so let's say 5; by default it skips 5 at the end, and we'll just say 1. Now we can see the shape properly. If your dataset is really tiny, you may need to use a smaller batch size: if you only have three or four batches' worth, there's nothing to see. In this case, though, it's fine; we just have to plot a little bit more. So we pick a learning rate and fit. After one epoch, just training the last layer, it's 80 percent. Unfreeze a couple of layers, do another epoch: 82 percent. Unfreeze the whole thing: not really improving. Why are we stuck around 80 percent? It kind of makes sense. Unlike ImageNet or dogs-versus-cats, where each image has one major thing (they were picked because they have one major thing, and that one major thing is what you're asked to look for), a lot of the Pascal dataset has lots of little things, so a largest-object classifier is not necessarily going to do great. But of course, we really need to be able to see the results to judge whether they make sense, so we'll write something that displays them. In this case, after working with this for a while, I know what the 20 Pascal classes are, so I know there's a person class and a bicycle class, and I know there's a dog class and a sofa class. So I know this one is wrong; it should be sofa. That's correct. Bird: yes, yes. Chair: that's wrong, I think the table's bigger. Motorbike's correct. Because there's no cactus class, that should be a bus. Person's correct, bird's correct, cow's correct, plant's correct, car's correct. So it's looking pretty good. Now, when you see a piece of code like this, if you're not familiar with all the steps it took to get there, it can be a little overwhelming. I feel the same way when I see a few lines
of code in something I'm not very familiar with; I feel overwhelmed as well. But it turns out there are two ways to make it super simple to understand the code, or really one high-level way: run each line of code step by step, print out the inputs, and print out the outputs. Most of the time that'll be enough, and if there's a line of code where you don't understand how the outputs relate to the inputs, go and have a look at the source. So all you need to know is the two ways you can step through the lines of code one at a time. The way I use perhaps most often is to take the contents of the loop, copy it, create a cell above it, paste it, outdent it, write i = 0, put it all in separate cells, and then run each one at a time, printing out the intermediate results. I know that's obvious, but the number of times I actually see people do that when they ask me for help is basically zero, because if they'd done it, they wouldn't be asking for help. Another method that's super handy, and there are particular situations where it's super super handy, is to use the Python debugger. Who here has used a debugger before? About half to two-thirds. For the other half of you, this will be life-changing. Actually, a guy I know, who's a deep learning researcher, wrote on Twitter this morning, and his message was: how come nobody told me about the Python debugger before? My life has changed. This guy's an expert, but because nobody teaches basic software engineering skills in academic courses, nobody had thought to say to him: hey, Mark, you know what, there's something that shows you everything your code does, one step at a time. So I replied on Twitter and said: good news, Mark, not only that, but every single language in existence, on every single operating system, also has a debugger, and if you google for the language name plus "debugger", it will tell you how to use it. So there's a meta piece of
information for you. In Python, the standard debugger is called pdb, and there are two main ways to use it. The first is to go into your code, and the reason I'm mentioning this now is that during the next few weeks, if you're anything like me, 99 percent of the time you'll be in a situation where your code's not working, and very often it will have broken on the 14th mini-batch, inside the forward method of your custom module. So what do you do? You go inside your module and write pdb.set_trace(). And if you know it's only happening on the 14th iteration, you type "if i == 13: pdb.set_trace()", so you can set a conditional breakpoint. pdb is the Python debugger; fastai imports it for you, but if you get a message saying pdb isn't defined, you can just import pdb. So let's try that. You'll see it's not the most user-friendly experience; it just pops up a box. But the first cool thing to notice is: holy shit, the debugger even works in a notebook. That's pretty nifty. You can also use it in the terminal, of course. So what can you do? You can type h for help, and there are plenty of tutorials around. The main thing to know is that this is one of those situations where you definitely want to know the one-letter mnemonics: you could type "next", but you definitely want to type n; you could type "continue", but you definitely want to type c. I've listed the main ones you need. So, sitting here, the debugger shows me the line it's about to run. One thing I might want to do is print something out: I can write any Python expression, hit enter, and see its value. That's a useful thing to do. I might also want to find out more about where I am in the code more generally: not just this line, but what's before and after it, in which case I want l, for list. So you can see I'm about to run that
line, and these are the lines above it and below it. So now I might say: okay, let's run this line and see what happens. Go to the next line is n, and you can see it's now about to run the next line. One handy tip: you don't even have to type n; if you just hit enter, it repeats the last thing you did. So I should now have a thing called b. Unfortunately, single letters are often used for debugger commands, so if I just type b, it'll run the b (breakpoint) command rather than print b for me. To force it to print, you use p for print: "p b". And there's our b. All right, fine, let's do next again. At this point, if I hit n, it'll draw the text, but I don't want to just draw the text; I want to know how it's going to draw the text. So I don't want n to step over it; I want s to step into it. If I hit s, I'm now inside draw_text, and as I hit n I can step through draw_text, and so forth. Then, once I know everything I want to know about this function, I'll continue until I hit the next breakpoint: c continues until I'm back at the breakpoint again. What if I was zipping along, as happens quite often? Say I step into denorm, so here I am inside denorm. If you're debugging something in your PyTorch module and it's hit an exception, you'll often find yourself six layers deep inside PyTorch, but you actually want to look back up at what's happening where you called it from. In this case I'm inside this property, but I actually want to know what was going on up the call stack, so I just hit u. That doesn't actually run anything; it just changes the context of the debugger to show me what called this, and now I can type things to find out about that environment. And if I want to go down the stack again, it's d. So I'm not going to show you everything about the debugger, but I've just shown you all of those
commands. Yes, Aza? "Oh, something that we found helpful as we've been doing this is using from IPython.core.debugger import set_trace, and then you get it all prettily colored." You do indeed; excellent tip. Let's learn about some of our students here. Aza, I know you're doing an interesting project; can you tell us about it? "Okay, hello everyone. I'm Aza, here with my collaborator Britt, and we're using this kind of stuff to try to build a Google Translate for animal communication. That involves playing around a lot with unsupervised neural machine translation, and doing it on top of audio." Where do you get data for that problem? "That's the hard problem. We're talking to a number of researchers to try to collect and collate large datasets, but if we can't get them that way, we're thinking about building a living library of the audio of the species of Earth. That involves going out and collecting something like a hundred thousand hours of gelada monkey vocalization." That's great, thank you. Okay, so let's get rid of that set_trace. The other place the debugger comes in particularly handy, as I say, is when you've got an exception, particularly if it's deep inside PyTorch. So if I multiply i by a hundred here, obviously that's going to cause an exception. I've gotten rid of the set_trace, so if I run this now: okay, something's wrong. In this case it's easy to see what's wrong, but often it's not. So what do I do? %debug pops open the debugger at the point of the exception. So now I can check things: preds, well, len(preds) is 64; i times 100, which I have to print with p because of the single-letter commands, is 100. Oh, no wonder. And you can go down, you can go up, you can list, whatever. So I do all of my development, both of the library and of the lessons, in Jupyter Notebook. I do it all interactively, and I use %debug all
the time, along with this idea of copying stuff out of a function, putting it into separate cells, and running it step by step. There are similar things you can do elsewhere; for example, Visual Studio Code has a Jupyter extension that lets you select any line of code, say "run in Jupyter", and it'll run it in Jupyter and create a little window showing you the output. There's neat little stuff like that. Personally, I think Jupyter Notebook is better, and perhaps by the time you watch this on the video, JupyterLab will be the main thing; JupyterLab is kind of the next version of Jupyter Notebook, and they're pretty similar. Wow, I just broke it totally. Well, we know exactly how to fix it, so we'll worry about that another time; we'll debug it this evening. Okay, so for the next stage, we want to create the bounding box. Creating the bounding box around the largest object may seem like something you haven't done before, but actually it's totally something you've done before. The reason it's something you've done before is that we know we can create a regression neural net rather than a classification one. In other words, a classification neural net is just one that has a sigmoid or softmax output, and uses a cross-entropy, binary cross-entropy, or negative log likelihood loss function; that's basically what makes it a classifier. If we don't have the softmax or sigmoid at the end, and we use mean squared error as a loss function, it's now a regression model, and we can use it to predict a continuous number rather than a category. We also know that we can have multiple outputs: in the Planet competition, we did a multiple-label classification. What if we combine the two ideas and do a multiple-column regression? In this case we've got four numbers: top-left x and y, and bottom-right x and y. We could create a neural net with four activations; we could have
no softmax or sigmoid, and use a mean squared error loss function. This is where you're thinking of it as differentiable programming. It's not "how do I create a bounding box model?" It's: what do I need? I need four numbers, therefore I need a neural network with four activations. That's half of what I need to know. The other half is a loss function; in other words, what's a function that, when it is lower, means the four numbers are better? Because if I can do those two things, I'm done. Well, if the x is close to the first activation, the y is close to the second, and so forth, then I'm done. So that's it: I just need to create a model with four activations and a mean squared error loss function, and that should be it; we don't need anything new. So let's try it. Again we'll use a CSV. If you remember from part one, to do a multiple-label classification, your multiple labels have to be space-separated, and then your file name is comma-separated from them. So I'll take my largest-item dictionary and create the bounding box coordinates for each one, separated by spaces, using a list comprehension. I'll then create a DataFrame like I did before, and turn that into a CSV, so now I've got something that has the file name and the four bounding box coordinates. I then pass that to from_csv again, and I use crop_type=CropType.NO. Next week we'll look at the coordinate transform type; for now, just realize that when we're doing scaling and data augmentation, that needs to happen to the bounding boxes, not just to the images. ImageClassifierData.from_csv gets us to a situation where we can grab one mini-batch of data, denormalize it, and turn the bounding box back into height-width so we can show it, and here it is. Remember, we're not doing classification, so I don't know what kind of thing this is; it's just a thing, but there is a thing. So I now want to create a
convnet based on ResNet-34, but I don't want to add the standard set of fully connected layers that create a classifier; I want to add just a single linear layer with four outputs. fastai has this concept of a custom head: if you say your model has a custom head (the head being the thing that's added to the top of the model), then it's not going to create any of that fully connected network for you, and it's not going to add the adaptive average pooling for you; instead, it'll add whatever model you ask for. In this case, I've created a tiny model: one that flattens out the previous layer (remember, in ResNet-34 the previous layer is 7 by 7 by, I think, 512, so it flattens that out into a single vector of length 25088) and then adds a linear layer that goes from 25088 to 4. There are my four outputs. That's the simplest possible final layer you could add. I stick that on top of my pretrained ResNet-34 model, so this is exactly the same as usual, except I've got this custom head. Optimize it with Adam; and for the criterion, I'm actually not going to use MSE, I'm going to use L1 loss. I can't remember if we covered this last week (we can revise it next week if we didn't), but L1 loss means that rather than adding up the squared errors, you add up the absolute values of the errors. It's normally what you actually want: adding up the squared errors penalizes bad misses by too much, so L1 loss is generally better to work with. I'll come back to this next week, but basically you can see what we do now: we run lr_find to find our learning rate, train for a while, freeze to -2, train a bit more, freeze to -3, train a bit more, and you can see the validation loss (which, remember, is the mean of the absolute value of the pixels we're off by) getting lower and lower. And when we're done, we can print out the bounding boxes, and lo and behold, it's done a damn good job.
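As a sketch of that head and loss in plain PyTorch, assuming the 7 by 7 by 512 feature map mentioned above (this is an illustration with fake activations, not fastai's exact code):

```python
import torch
import torch.nn as nn

# ResNet-34's last conv layer gives a 7x7x512 feature map = 25088 values;
# flatten it and map straight to four continuous outputs (no softmax/sigmoid)
head_reg4 = nn.Sequential(
    nn.Flatten(),
    nn.Linear(25088, 4))

feats = torch.randn(2, 512, 7, 7)   # a fake batch of backbone activations
preds = head_reg4(feats)            # shape (2, 4): predicted box coordinates

# L1 loss: mean absolute error against the four target coordinates
target = torch.zeros(2, 4)
loss = nn.L1Loss()(preds, target)
```

In the real model, this head sits on top of the pretrained backbone in place of the usual pooling-plus-fully-connected classifier layers.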
We'll revise this a bit more next week, but you can see the idea. If I'd asked you before this class, "do you know how to create a bounding box model?", you might have said, "no, nobody's taught me that". But the question actually is: can you create a model with four continuous outputs? Yes. Can you create a loss function that is lower when those four outputs are near to four other numbers? Yes. Then you're done. Now, you'll see if I scroll a bit further down that it starts looking a bit crappy any time we've got more than one object, and that's not surprising, because how would you decide? Which bird? It just picks the middle. Which cow? Pick the middle. How much of this is actually potted plant? Pick the middle. This one it could probably improve; it's got close to the car, but it's pretty weird. Nonetheless, for the images that are reasonably clear, I would say it's done a pretty good job. All right, that's time for this week. It's been a kind of gentle introduction for the first lesson; if you're a professional coder, there's probably not heaps of new stuff here for you, and in that case I'd suggest practicing with bounding boxes and such. If you aren't so experienced with things like debuggers and the matplotlib API, there's going to be a lot for you to practice, because we're going to assume you know it well from next week. Thanks, everybody; see you next Monday.
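If you want something concrete to practice the matplotlib tricks from earlier in the lesson, here's a sketch of the outline, rectangle, and text helpers; the names and signatures are my reconstruction, not necessarily fastai's exact ones.

```python
import matplotlib.pyplot as plt
import matplotlib.patheffects as patheffects

def draw_outline(artist, lw):
    """Stroke a black border around any matplotlib artist so it stays
    visible on both light and dark backgrounds."""
    artist.set_path_effects([
        patheffects.Stroke(linewidth=lw, foreground='black'),
        patheffects.Normal()])

def draw_rect(ax, b):
    """Draw an outlined white rectangle on axis ax; b is (x, y, width, height)."""
    patch = ax.add_patch(plt.Rectangle(b[:2], *b[-2:], fill=False,
                                       edgecolor='white', lw=2))
    draw_outline(patch, 4)

def draw_text(ax, xy, txt, sz=14):
    """Draw outlined white text at position xy on axis ax."""
    text = ax.text(*xy, txt, verticalalignment='top', color='white',
                   fontsize=sz, weight='bold')
    draw_outline(text, 1)
```

Every helper takes the axis explicitly, which is the point made in the lesson: you can draw onto whichever subplot you like.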