Okay, so let's look at how we can automate a process like the one that we put together yesterday. And actually what we might do is have a look at something that Radek put on the forums, because I kind of want to have all the stuff we want to include there, and then we can automate the whole thing. Yeah, here's Radek's post. Okay, so Radek, do you want to tell us what this thing is? What's your project or goal, what is this forum post about? This is another of Radek's crazy inventions, but essentially this is how I learn, right? This is the first way of learning, at least — I think of it as the first way of learning. So what does it say? For those who haven't read it yet, can you just take us through it? Oh yeah, absolutely. So yesterday in the workshop we covered some material, and I wanted to find a fun way to practice it. So I went to Kaggle, I looked for competitions, and there was a competition where participating would let me practice what we learned in the workshop. So I really created this as a resource for others. You know, I've been creating such things for a couple of years now, and maybe it requires a little bit of understanding of Kaggle, which somebody who hasn't played around with Kaggle yet might not have. So I thought, hey, let me put this together and share it with others. So this is a getting-started post for a community competition. It's images of plants, and you're supposed to detect one of nine or ten classes of plant diseases that they can be affected with. I haven't looked too much at the data. Essentially I relied on fastai functionality: if I present the ImageDataLoaders — and then, by the same token, the learner — with data that seems to be appropriately formatted, then the learner will do the rest.
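The fastai pattern Radek is describing — point the data loaders at appropriately formatted data, then let the learner do the rest — looks roughly like this sketch. It is not runnable as written: it assumes fastai is installed, a GPU is available, and the images sit in label-named subfolders of a train_images directory; the path, seed, and sizes are all illustrative assumptions.

```
from fastai.vision.all import *

# assumed layout: train_images/<disease_label>/<image>.jpg
path = Path('paddy/train_images')
dls = ImageDataLoaders.from_folder(path, valid_pct=0.2, seed=42,
                                   item_tfms=Resize(224))
learn = vision_learner(dls, resnet34, metrics=error_rate)
learn.fine_tune(1)
```

This is the same "present appropriately formatted data, let the learner do the rest" flow that gets built up step by step later in the session.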
So I tried to do this as quickly as I could, just put it together and practice on my own while doing it, and hopefully it can be useful to others. Great. Should we try going through it together? Sounds fun. Let's try it. That's great. And I like the fact that you're looking to run it on Paperspace rather than on Kaggle; I think that's good practice. So let's do the same thing that Radek did. Yeah, so on Kaggle you can see all the competitions that are running, the active competitions. There are various different types, right? There are the normal ones that have money involved, and as well as money they also have ranking points, which gives you the opportunity to try to become a master or grandmaster or whatever by accumulating points. Then there are some which are just for kudos: no money and no ranking points. And you can use the little buttons at the top to find getting-started competitions — those are just knowledge. Some have prizes. I wonder what prizes they have. Ah — 20 extra hours of TPU time for four weeks. There you go. Kind of sounds like a drug: you use TPUs to win, and then it'd be hard to keep going without them. I love it. And then there are also playground competitions, which repeat each month. All right. So it looks like Radek picked out a kudos competition, so we're not going to get any money or ranking points; we're just doing it for the enjoyment and the learning. Now, before you can download the data, you have to click "Join Competition", and a really common mistake people make is to try to download the data without doing that — you'll get an error. So, as it says, you can download the data by running this command here. Now, that tool is not going to be installed yet, so we need to install it. We can install it with pip install --user, etc.
I feel like, given how much we're typing pip install --user whatever, it might be worth creating an alias for it. This is taking a long time. Do you like Paperspace so far, generally? Very, very much. Yeah, it's the first platform I've found where I feel like I can actually use this, you know. I'll have to give them another try. It's not opening my JupyterLab. Yeah, make sure you do all the previous walkthroughs, because they really do take you through how to take advantage of this; otherwise it's not particularly exciting. All right, I'm having some trouble with JupyterLab, so I'm just going to use their rather unappealing GUI instead. Hopefully that'll work. Even that's not working. Do they have the ability to SSH into these machines? You know, they do not. But because you've got JupyterLab installed, you get a full terminal, so it doesn't really make any difference. And you can also connect VS Code to them. Anyway, for some reason — this is the first time this has happened — it's not liking me. I can't open a Jupyter notebook either; it's timing out. I'll see if I can fire it up in VS Code then. It's unusually sluggish today. This has been my experience; I've tried it before. Maybe the curse came with me. It used to happen to me all the time, and they seemed to have really improved. I've never had much luck with the free ones, so I'll try a paid one. I have a green dot — that's promising. Wait, where did the green dot go? That was the other machine; I've got two tabs. The green dot is the free one. I can't connect through VS Code either; it's not even wanting to start. It should be a red dot. It's not working. No worries, change of plans: we'll do things locally. Oh, Paperspace — what the hell just happened? Let me switch to my other user. Let's see if Kaggle is installed. There's no kaggle. Here I am with no kaggle. I would generally start by creating a tmux session.
I would like to be able to run a few things at the same time. So I'll run pip install --user kaggle. There we go. Now, Kaggle is not just a Python library; it also has a command-line tool. Because I did --user, it installed the command-line tool into my home directory, into the .local folder. And binaries — things that you can execute — are generally put in bin. ~/.local/bin isn't in my PATH, and therefore I can't just type kaggle. As we know, to fix that on Paperspace you would modify /storage/.bash.local; here on my local machine I'll just modify my home directory's .bashrc. On Paperspace, if you modify it, it will not run before Jupyter notebook runs. Correct, which is fine — Jupyter notebook doesn't need access to this. Unless you want to do !kaggle: if you want to put !kaggle in a Jupyter cell, you would need to put it in pre-run.sh. Is that your point there? It is, yes. Cool, great. Yeah, .bash.local will only execute in a new terminal. Sorry, go on. So I suppose in some operating systems the local bin directory is on the path by default, so maybe it's just that you've never seen that — there are distributions around that do it. It could be confused with /usr/local/bin or something like that, which is on the path on Mac. Yeah, /usr/local/bin is not in your home directory, and that one is always part of your path; but this is something in your home directory. Oh yeah, okay. All right. By default, Ubuntu has a bunch of stuff in your .bashrc, by the way, so I'm just going to go to the bottom. To go to the bottom in vim it's Shift-G, and then O to open up a new line underneath this one — insert beneath. And so we will export PATH equals tilde slash .local slash bin, then a colon, and then everything that's already in your path.
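Written out, the line being added to the bottom of .bashrc is just:

```shell
# prepend pip's --user bin directory, so tools like kaggle are found first
export PATH=~/.local/bin:"$PATH"
echo "$PATH"
```

Prepending rather than appending means anything installed into ~/.local/bin wins over a system-wide copy of the same name.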
So that prepends it to our path. Now I could close and reopen my terminal, or I can re-execute the file so that any exported variables go into my shell. To execute a script's contents inside your current shell, you type source. So source ~/.bashrc is going to save me having to close and reopen my terminal — and it's right there in my history, so I can just do that. And so now we can run kaggle. Okay. The next thing we need is some way to authenticate, and Kaggle uses a file called kaggle.json to do that. So if I go to Kaggle, you can grab it — there you are — by clicking "Create API Token", and what that will do is download a file called kaggle.json to your computer. Once it's downloaded, depending on where you are: on a Mac it might be in your ~/Downloads directory, and on Windows under WSL it'll be under /mnt/c — that's your Windows C drive — in your user's Downloads directory. Now, it needs to actually be in a directory called ~/.kaggle, so I'm going to make the .kaggle directory — although I think it was probably just created for us when we tried to run kaggle. That's good. So now I can copy it. In this case I'm just going to copy it from my other account's .kaggle folder — there's my JSON — and copy it into mine. By the way, so you don't forget: ~jph00 refers to the home directory belonging to jph00, while tilde on its own means the current user's home directory. So I'm going to copy it over to .kaggle — there we go — and change its ownership so it's owned by jph00. You won't have to do this, because you'll be downloading it and copying it from your Downloads; I'm only doing it because I'm copying it from a different user. All right. So, yep, that now belongs to jph00. So now I should be able to go back to that user and type kaggle. Okay. Great.
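Put together, the credential setup looks like this — using a throwaway HOME and a fake token here so the sketch is safe to run anywhere; on a real machine you'd copy the kaggle.json you actually downloaded:

```shell
# use a scratch HOME for the demo; normally this is your real home directory
export HOME="$(mktemp -d)"

mkdir -p "$HOME/.kaggle"
# stand-in for the real kaggle.json from "Create API Token"
printf '{"username":"me","key":"secret"}\n' > "$HOME/.kaggle/kaggle.json"
chmod 600 "$HOME/.kaggle/kaggle.json"   # rw for the owner only
ls -l "$HOME/.kaggle/kaggle.json"
```

The 600 permission matters: the kaggle tool warns (or refuses) if the credentials file is readable by other users.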
So I've got kaggle installed, and we'll check from time to time whether anything has started working — not really. Okay. So the Kaggle competition said we can download it with this command, so I'll copy that, create a directory for the competition, and run the command. Nice. All right, did anybody have any questions about this while we wait for the download? So Jeremy, we could use mamba install here, since you're doing it locally, right? It's just that you were demonstrating Paperspace before. Yes — it is on conda-forge, so mamba install kaggle should be fine. Although, to be honest, for simple pure-Python stuff like this I often just use pip anyway, because pip is the main thing most tools target, so you can be sure that will be the most recent version. Unless the documentation explicitly says they provide conda packages as well, there's a good chance the conda packages will be behind. So if I were going to do a mamba install, I'd be inclined to double-check it's actually the most recent version. But, as you see, I just use pip for something like this. Didn't it used to be the case — I remember something about cookies, and there was a browser extension, and maybe you had your own tool for this, or am I just hallucinating? Did it used to be that way, from an older class? Okay. Okay, so there are these zipped files, so we can unzip them. And I hate it when it prints every single file, because that actually takes ages — so -q to unzip quietly. Right, so that's going to give us our data. I guess the one thing we're forgetting is our kaggle.json on Paperspace. The easiest way is to click the file upload button — there's a little upward-pointing arrow button. If you click that, it'll upload it.
And then, yes, copy it to ~/.kaggle, and it does have to have the correct permissions — hopefully you can recognize this: read is 4, plus write is 2, makes 6, so chmod 600 on that file gives you the correct permissions. Okay, so now the only problem is that this is my desktop, which does not have a GPU, so that was actually a stupid place to put this. I'm going to copy it to my GPU server. To copy files from one Linux or Mac machine to another, a very easy way is scp — secure copy: type the name of the file, and then where you want to send it. Except I don't have that set up here. All right, so I'll just go back to my normal user and copy the paddy disease classification zip from ~jph00 over to the other machine — okay, so you can use scp to copy a file to another machine, and off it goes. So how does it know what "local:" means? There's a very underutilized, handy file called ~/.ssh/config, where you can put things like "Host local", and when I ssh to that name it will actually connect to this hostname, use this username, and — we haven't talked about SSH forwarding, but if you know about that — it will set up SSH forwarding. So this is a little trick for people who use SSH: the ~/.ssh/config file is great, and it's not just for ssh — it's for anything that runs over SSH, including scp; scp is secure copy over SSH. All right, now that's done, I can log in to that machine, and now we're on a GPU machine. To check your GPUs you can type nvidia-smi, and this one has got 3 GPUs. And I can move the file I just copied into here. So, should we use scp or rsync?
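The ~/.ssh/config entry being described would look something like this — the hostname, address, and forwarding choice here are all made up for illustration:

```
# ~/.ssh/config
Host local
    HostName 192.168.1.10    # made-up address of the GPU box
    User jph00
    ForwardAgent yes         # one kind of SSH forwarding you might enable
```

After that, both `ssh local` and `scp somefile local:` resolve the real host and username from this file, since scp runs over SSH.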
Either is fine; I use scp just because I don't have to pass it any flags. Strictly speaking, scp is kind of considered deprecated nowadays, but it works fine. Unzip that. Cool, okay, making good progress — let's see what we've got. So there's a sample_submission.csv, a train.csv, train_images, test_images. So ls train_images — this has got like 10,000 things in it; that's going to be annoying. So pipe it to head. Remember, this vertical bar is called a pipe: it means take the output of this program and pass it in as the input to that program, and head shows you the first 10 lines of its input. Okay, so actually it turns out it's got folders for each category, so I don't really need to pipe it to head. And then we can do the same thing with one of those — bacterial leaf blight — and pipe that to head. There we go. So now we might want to know: how many of those are there? Instead of piping to head, we can pipe to word count, which is wc — but despite the name it doesn't only count words; if you pass -l it'll do a line count. So that's how many bacterial leaf blight images there are. It's really useful to play around with these things you can pipe into. So: head; wc; another useful one is tail, which gives the last 10 lines; and one we've seen before is grep — so, not particularly useful, but: show me all the ones with the number 33 in them. And you can use head and tail on files too. head is very useful for CSV files: if you're in your Jupyter notebook and it's screaming at you that it cannot parse a CSV file, you can just jump into the console — or even do it from the notebook — and head the file. Well, let's try it: I think we know that if you type cat and a filename, it will send it to standard output, which by default prints it to the screen, so we could pipe that to head. Now, real Unix gurus will say that was silly, because if you look at the man page for head, passing it a filename does the same thing. But I prefer to learn a smaller number of composable things, so piping stuff to head is not a bad idea. And another nice thing about cat is I can pipe it into grep and search — how many of these ADT45s are there? Let's grep for ADT45 and then pipe that into word count, but counting lines. So you can quickly get some information at the console, which I think can be quite useful. All right, the next thing to do, I reckon, is to fire up Jupyter. Excuse me, Jeremy — if you're interested, my Paperspace instance has started up now, so I don't know if yours might have; worth a look. Fantastic. All right, so it's probably worth just quickly going through the exact same process one more time, isn't it? So we open up the terminal: pip install --user kaggle. That's interesting — this is because I installed stuff into that conda directory the other day, and if I do which pip it's actually finding that one, and I don't want there to be a pip there, so we'll remove that conda directory in my home folder. Okay, let's try again; I'll have to reopen this terminal. which pip — there we go, now it's happy. Ctrl-R, "install", to find the last thing I typed containing "install". Okay, we've got the PATH issue again. Actually, I think I prefer Radek's approach of putting it in pre-run.sh — that way we have the ability to use this from Jupyter if we wish. So: export PATH= the local bin and then the current path. One of the confusing things I find about bash — it got me a couple of times — is that if you're doing export and a variable name, you need the equals sign straight after the variable name, with no spaces, or it won't work. It's just one of those little quirks. Yeah, bash is a very old program and it has these weird old quirks about whitespace sensitivity, so that's a really important point to mention — thank you. And I'll run it here as well rather than
restarting. And so now kaggle should exist — it does, it runs, that's good. All right, so let's copy this into my Downloads directory — or, I guess, what I could do... yeah, let's just do that: copy kaggle.json into my Windows Downloads directory under /mnt/c, and then we should be able to upload it from there. There it is. Okay, and it's created a .kaggle directory for us — wait, oh sorry, this is the wrong one; let's do that again: cd ~/.kaggle. Yeah, it's created a .kaggle directory for us, so we should be able to move the thing we just uploaded to /notebooks into here. The permissions will be wrong, so we can fix them. Okay, and let's see if it works here as well — it does. And Paperspace's network is faster than my connection in Australia, not surprisingly — although mine wasn't bad, actually. Okay, so — oh, that was a dumb place to put it; obviously I don't want to put it in the paddy disease classification folder. We're only going to use this for this notebook, I guess, so maybe move that to /notebooks, and let's create a paddy folder and unzip it. Okay, that's interesting: there's no unzip. But we know how to deal with that. Why isn't Ctrl-R working for me? Oh, because Ctrl-R does a browser refresh. That's annoying, isn't it? So how do we search our history in these terminals? Oh well, that's fine — I'll just type it in manually, and we'll figure out how to make Ctrl-R work at some other point. So: micromamba install -c conda-forge --prefix ~/conda — probably need the install subcommand first there. Yeah, a lot of the keyboard shortcuts don't work in the browser-based terminal, which is actually pretty annoying. They work a bit better on Mac than on Windows, because on Windows the Ctrl key is used both for the Linux terminal commands and for the normal browser commands, whereas on Mac the browser commands use Command, so the Ctrl key doesn't get overridden. So this would probably be a better experience on Mac than on Windows. Okay, so we're going to install unzip — and hopefully, by the time people watch this video, if it's like July or later, unzip will already be installed. Okay, let's check. We have an unzip; that's good. So that is on its way. Now, that's going to use up a gigabyte of space in my persistent storage, which you might not want to do, right? Then instead you'd unzip it into your home directory — but if you unzip it into your home directory, it won't be there after you close the machine down and reopen it. So you might want to create a little script for yourself that does the kaggle download and the unzip on your notebook, and run that each time you start it up. So, you know, these are the trade-offs. Having said that, the overage cost on Paperspace I believe is 29 cents per gigabyte per month, so the convenience of putting it in storage is probably worth 29 cents for the one month you're going to want it there — maybe that's just a better plan. I do know, though — well, maybe this is a problem, actually — that Paperspace's /notebooks and /storage are very, very, very slow. You can actually see that while we're unzipping this. So maybe this is a bad idea; maybe we shouldn't put data there, at least when there are lots of files, because this is painful. I'm going to cancel it and see how far it got: du -sh train_images — 426 — and how about test_images? Wouldn't you know it, it was nearly finished. But yeah, I think this is actually slower, so I'm going to remove it, and I have a strong feeling that if we move it back to our home directory it's going to be faster. I sure hope so. And the reason I care is not so much the unzipping speed, but when it comes to training a model we don't want it taking ages to open each of those files. Well, you see, even rm -rf takes a long time. While that's running, let's move the paddy zip file and pop it into our home directory.
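That little startup script might look something like the sketch below — the competition slug is an assumption based on the competition name, and the download itself needs the kaggle.json credentials set up earlier:

```shell
# write a get_data helper that re-fetches the data each session,
# since the home directory on Paperspace is wiped between sessions
cd "$(mktemp -d)"      # demo in a scratch directory; keep the real one in /notebooks
cat > get_data <<'EOF'
#!/usr/bin/env bash
set -e
cd ~
mkdir -p paddy
cd paddy
kaggle competitions download -c paddy-disease-classification
unzip -q paddy-disease-classification.zip
EOF
chmod u+x get_data     # add the executable permission for the owner
ls -l get_data
```

The script lives in persistent /notebooks storage while the bulky data it fetches lives in the fast, ephemeral home directory.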
There we go. Then cd to our home directory. So in terms of the steps we're going to do: first we would make a directory for it, then do the kaggle download — which we can just copy easily enough from Kaggle — and then unzip. Let's see how long it takes: the time Unix command runs whatever command you put after it and tells you how long it took. Did I not move it there? mv ../paddy — I didn't move it there, yeah. time unzip -q paddy. So, yeah, I think what I would do, now I think about it, is have a paddy directory in /notebooks; I wouldn't store anything big here, I'd just have my notebooks here, and I'd put a script here called get_data, say, and it'll just have each of the steps I need. The steps would be: cd to my home directory, make the paddy folder, cd to the paddy folder, do the — well, not wget — kaggle competitions download, I should say, and unzip paddy disease. Yeah, and I think that's it, right? So we can make that executable with chmod u+x to add the executable permission to it. And so then all I have to do is run that thing each time I start up Paperspace, and it's only going to take about eight seconds to unzip, and it took about five seconds to download, so that's actually not going to be any trouble at all, is it? Cool. And /notebooks, remember, is persistent on this machine, so that's all good. So now we can create a notebook for it. My first step is always just to import the computer vision functionality in general — which is the same thing we used yesterday, and now you know exactly what that does — and my second step is to look at the data. It's easiest to look at the data if we set a path to it. So it's going to be in our home directory, and it's going to be called — well, that's okay, it's just Path('paddy'), right? You can pass Path your home — wow, I didn't know that, that's quite neat. Yeah, it is quite neat. Okay, so path.ls tells me what's in there, and if you remember my trick from yesterday, I also like to set that as Path.BASE_PATH, just so that my ls's look a bit easier to read. There we go. So at this point we can create a DataFrame by reading the CSV at path/'train.csv'. Okay, so we've got 10,000 rows; each one is a JPEG, each one's got a label. So let's take a look at one of the images, shall we? Oh yeah — path/'train' — actually, you know, let's make life a little bit easier for ourselves by creating a train_path, because, you know, it's just so good to be lazy — slash '100330.jpg'. Oh no — because they're inside the label directory. Yes. So what we actually should have done is train_path.ls() — paddy/'train', is that not right? 'train_images' — and that's another good reason to put it in a variable: you only have to change it in one place. And so there we have that. And so let's create — I don't know, let's call it the bacterial leaf blight path — blp = train_path/'bacterial_leaf_blight'/... So now we should be able to go blp and look at that image — there we go, we have an image. Might be nice to find out a bit about it, so let's look at the size: it's a 480 by 640 image. Great. Another way we can take a look at an image, from yesterday: you can go files = get_image_files and pass in a path, and this will be recursive, so I can do this — as you can see, this has got the 10,000. Okay, and that number there matches that number there, so that's a good sign. And another way to do it would be to go im = PILImage.create(files[0]). And we could even take a look at a few: if we wanted to check that the image size is reasonably consistent, we could go [PILImage.create(o).size for o in files[:10]], for example. This is not particularly rigorous, but it looks like they're generally 480 by 640 — they're all the same size, which is handy. That's interesting. And they're probably bigger than we normally need, you know; we normally use images that are around 224 or so. Having
said that — presumably, this being a paddy disease competition (paddy is rice), we classify the images according to their disease, and I can't even tell that this one has a disease, so I don't know how big the image needs to be to see the disease. So it's possible it'll turn out that we actually need full-sized images. I would start by using smaller images and kind of see how we go; anyway, 640 by 480 is not giant, so we should be fine. Now, the CSV file has got one extra bit of information, which is the variety. Radek, did you find out what this variety thing is about from the docs? I didn't know that CSV file existed, but it's fun, because we could build a multi-input model on this data. I see — it's the type of rice, as opposed to the type of disease. Yeah — so maybe, you know, the different diseases might look different depending on what type of rice they're on. My guess is that we won't need to use that information, because given how many images there are, I would guess the model is going to do a perfectly good job of recognizing the varieties by itself without us telling it — unless there are a whole lot of different varieties, which we can check easily enough, right, by grabbing the variety column of the DataFrame and doing a .value_counts(), and we can see how many there are of each. Okay — so look, I mean, there are a couple of tiny varieties, but on the whole most of it is ADT45. Does seem like a bit of a rice session today, doesn't it — lots of rice going on. Yeah, so I think it's very unlikely that this variety field could help, because there are so many examples of the main one that it's going to be able to recognize it anyway. I mean, at some point we can try it, but I'd be making that a pretty low priority for this competition. And so, given we're doing a practice walkthrough, I'd be inclined to fire up fastbook and the intro notebook and see if we can basically do the same thing that we did last time. So I'm going to merge these back together again — we've already got those two, we've got those — well, there's not much there, is there? Oh, I'm in APL mode — I wondered why my things weren't working; don't know how that happened, I haven't used APL today. Copy, paste. Okay, so this is how we did cats: we needed a labelling function. Now, in our case the labels are very easy: each image is inside the directory which is its label, so the parent folder name is the label — and we already have a function to label from folders, so we can actually just use ImageDataLoaders.from_folder, because that's all we need. So we're still going to need the path — train and valid actually have different names here, so let's fill all of those in. So we're going to have path, train equals train underscore — what was it? images — 'train_images', yep, and 'test_images', and valid_pct — so that's fine, we'll do that the same as last time. Okay, it's expecting either train and valid subfolders or valid_pct, so hopefully that'll work. Let's try it, and we'll use the same Resize as last time. Okay — oh no, did that work? No, it didn't — because we've got — that's interesting — test_images. So my guess is it's got confused by that, yes. Okay, so possibly what we should do instead is use train_path here, with valid_pct. I wonder if that'll fix the problem — there we go, that's fixed it. Okay, great. So we should then be able to create a learner and .fine_tune — let's just do one epoch to start with. There it goes. So, it can be useful to make sure it's being reasonably productive as it's training, and we can do that with nvidia-smi. nvidia-smi --help — so much help. So let's take a look here: we've only got one GPU, so that's fine. Okay, we're not modifying anything — dmon, I think that's the one we want: nvidia-smi dmon. Okay, that's just finished — so, while it was running... this is something where people often say to use watch nvidia-smi, to have it refresh, but actually I don't think most people know that there's a
dmon subcommand. As you can see, it shows you every second how it's going, and the most important thing it's showing me is this column, SM. SM stands for streaming multiprocessor — that's kind of what they call it instead of a CPU, for their GPUs — and it's showing me that it's being used 70 to 90 percent, kind of effectively, if you like. And that's a good sign: if this was like under 50, that would be a problem, but it looks like it's using my GPU reasonably effectively, and it's got the error rate down to 13 percent, so we are successfully training a model. So that sounds good. Jeremy, just a quick question: when you say that if it's under 50 percent that can be a problem — is that because you've oversized the GPU, like when you selected it, or what would that mean? Thanks, it's a good question — just let me rename this. It would probably mean that we're not able to read and process the images fast enough. And in particular, I'd guess that if they're in /storage or /notebooks you would see the SM percentage be really low, because I think it would take a really long time to open each image, since it's coming from network storage. So generally, a low SM means that your I/O — your input/output, your reading or processing time — is too high. And the ways to fix that would be a few: one would be to move the images onto the local machine, so they're not on a network drive; another is to resize the images ahead of time to make them a more reasonable size; a third would be to decrease the amount of augmentation you're doing; and another would be to pick a different instance type with more CPUs. So those are basically the things. All right — just to add, the plain nvidia-smi command also has a lot of useful information, and stuff like that, so it's also a useful command; people should know that it exists. Yeah, there are a lot of details there — if you're looking for the index of your GPU or GPUs, for instance — and some of the names there are a little bit more descriptive, so it might be easier to get started with that command, or at least use it every now and then. And if you'd like to have one running in a loop, which is what I generally do — yeah, I agree that's useful, but in a loop I would suggest using dmon, because there are only two columns you care about, and plain nvidia-smi does not show you SM. So if you want to actually see whether it's being utilized, you need to use dmon — and you can also see the memory utilization — so just look at those two columns; the other ones you can pretty much ignore. Yeah, okay. I think that's a pretty good place to stop. I'm glad you put us onto this competition, Radek — it looks fun, and I feel like we've got a reasonable start. So yeah, maybe next time we can try doing a submission, and we could also try creating a Kaggle notebook for other people to see. How does that sound? Sounds excellent. One thing I also like about this is that we're coming up against problems as we go and jumping through those hoops — and these are the sorts of roadblocks that people will face, I guess. Exactly. And if you guys, you know, repeat these steps — or do it on another dataset or whatever — and hit some roadblocks: if you solved them, come back tomorrow and tell us what happened and how you solved them; and if you didn't, come back tomorrow and ask us to help fix them. I think they're both useful things to do. Things like Radek's example of a bash environment variable with a space next to the equals sign — that kind of stuff I forget even to mention, but it's really useful information, you know. This competition is nice because it's relatively small — about 10,000 images — and it's aligned with what we're doing in the course. But if you'd like to try something out on a competition that is not active right now, you can still do this, because Kaggle allows late submissions, and that opens up many competitions for you to play around with. The current
competitions that are — how do they call it — ranked competitions, the ones that award points and prizes, are not on images right now. So exploring something on your own, trying the methods on another image competition, might be quite useful. To find those, you'll need to scroll to the bottom and click "explore all competitions", and yeah, this will let you see closed competitions as well. And you can even — I guess, here you go — find out which one is the most popular of all time; that can be interesting. Crypto forecasting — well, of course it would be that. That's a bit sad, but there you go. And that's interesting: this patent phrase one is super popular. That's good to see — instant gratification. All right, thanks all — see you next time. Bye!