So I did try to use the slides capability in Jupyter Notebook, but I couldn't get it to look nice, so you're just going to have to read my notebook as I go along.

Basically, my talk is about installing and running Jupyter Notebook and pandas on EC2, and then at the end I'm going to talk about using Dask and Dask distributed to fire up a little cluster, with very, very little effort, to actually solve problems.

Okay. The first bit of my talk is about actually getting Jupyter Notebook set up on a new EC2 instance, and that's what I've got up here. If you notice, I'm actually hosting this on one I just made this morning, so I could go through the steps I'm going to do in this talk.

So why would you actually want to set up Jupyter Notebook on EC2 instead of on your own computer? There are a few different reasons. One, if you've got a Windows computer in a locked-down corporate environment, you can just make life easy and do everything on a Linux machine in the cloud. Two, when we start talking about bigger data, an EC2 instance is very close to S3, so it has very fast access to S3. If you want to store gigabytes of files, or intermediate files, or maybe your data somewhere, you can pull it from S3 onto your Jupyter Notebook really quickly. And three, when you want to scale up your instance and you want some more horsepower, it's really easy to do by just shutting it down.
So again, you can go to anything up to an 8xlarge. These are the cluster compute ones, and they will only cost you two dollars an hour, and that thing's got a ridiculous 60 gigabytes of memory. Though if you look down here... oh, Jesus Christ. So for $13 an hour you can get a machine with that much memory. Yeah.

But the one I'm going to be recommending is this one, which I've been using, and that costs 10 cents an hour. At 10 cents an hour I kind of just leave it up all day and turn it off when I go on holiday, especially since I've got my client's credit card.

The good thing about those, being T2 instances, is that Amazon actually scales them up and down. You see they've got "burstable" here. That means they're hosted somewhere shared, and they give you extra credits for computationally expensive tasks. So it means they're cheap, and they can do the same work in the notebook as one of those M instances (I can't remember what number they're up to at the moment), but only in bursts. And they're about 10 times... yeah. So I find them quite good for hosting something which most of the time is just sitting there waiting for you to do some work on it, but which is there when you actually want to do some work.

If anyone knows better than any of the stuff I say here, let me know. I've only been working in the EC2 ecosystem for the past three weeks. Cool.

So even though I run Linux on my work machine, I decided to run this guy up on EC2. I guess the disadvantage of that is I'm not going to be able to do any work on the plane, and I have to pay extra money, which again I'm getting my boss to pay for.

Over the course of this talk, what I'm going to do is start up an EC2 instance and SSH to it. How many people have actually used Amazon EC2 and started instances on it? Okay, cool.
So this will be new for a lot of you. Then, from scratch, I'm going to install Jupyter Notebook on it, and once I've done that I'll probably go back to using this one.

I'm going to put this on the web, and I'll then post it on Meetup with my slides. There are a few links here that I used to do it. Some of them use an Amazon AMI that's already set up with Jupyter Notebook on it. I decided not to do that, because it's easy to set up, and it's nice to see what steps you have to do if you want to set it up on your own computer.

Most of them use higher security on the Jupyter Notebook than I've got in this example. The example I've got is basically IP-based security. Given that with Jupyter Notebook it's really easy to get a shell up straight away, it's probably not a good idea to keep that kind of security up for a long time. So if you're using this for a long time, fix that; it's probably about the first thing I'm going to do tomorrow. But at the moment I've just got this set up so that only the IP address I'm using, which is probably all of AUT at the moment, can access this particular EC2 instance.

Okay, any questions, any comments? Okay, cool.

So Amazon EC2 basically is where you plug in your credit card and you get computers, and it's really nice, for instance, if you've got the kind of work that can be scaled across lots of computers. And Dask will help you do that. Say you've got a task that you think will take, you know, a week on one computer.
All you do is fire up 100 or so computers and you can get that down to a couple of hours or something, if I did the maths right. Amazon will let you fire up a hundred computers, a thousand computers, basically as many as your credit card limit allows, and that's really nice to do.

So have a look at this interface. After you've gone and got an Amazon account, given them your credit card details and signed up to EC2 (I think they will possibly send you a text message to verify you're a real person and not someone trying to scam someone's credit card), there are a couple of things to go through, but you'll have an interface here that shows you what you've got. As you can see here, this is the instance I'm actually running, just so you know that's my original one. If you look down here, we've got some additional data about that instance. All Amazon instances have a public IP, which, if you notice, is the one I'm using up there to connect to the notebook, and they've also got an internal IP, which you probably don't need to know at the moment.

Importantly, what you will also want to set up when you do this is an IAM role, because if you don't do that when you initially start your instance, you can never give it an IAM role. And you want to do that so you can get access to your own buckets on S3 very easily.

Okay, so I'm just going to hit this Launch Instance. I like Ubuntu, so I'm going to select a 64-bit Ubuntu server, mostly because I don't like saying yum install; I'm always typing apt-get install. And I like this instance type. That's, you know, probably about the size of a not very high-powered laptop.

Configure it. Don't worry about your network and subnet. I'm going to have to use these later when I start generating a cluster, but you can leave those as they are anyway.
You may want to actually have an IP address assigned to it, so you don't always have to check what the IP is if you've got it up for a long time. I'm not going to tell you about that. But importantly, you want to assign an IAM role to it, or just create a new one. It doesn't matter if it's blank to begin with, as long as you put one in there. So this one here, have a look at what it is. If I'm losing you, don't worry [unclear]. You want to give it Amazon S3 full access, so that if you start putting your data on S3, which is one of the reasons why we chose to start things on Amazon, you can access it.

All right, the rest of it, don't really worry about too much. It's going to give me an 8 gig hard drive. That's fine. What about tags? Security group: what you want to do is make a security group for it. I've got one here [unclear], but you can work that out yourself. It doesn't really matter at the beginning what security group you give it because, unlike the IAM role, you can change the security groups while the instance is running, and you can change permissions while the instance is running as well, which is really handy.

Fine, and it's telling me what's happening. I'm going to start it.

All right, the key pairs. They can get a bit confusing when you start off. These are basically so you can use SSH to get to your computer. I'm not really going to cover that; I'm going to leave you to work that one out yourselves. Suffice to say that I have a key pair already generated here. I have a file on my local computer called snprivate.pem, which I'll be using when I want to communicate with this instance, and I know how to use SSH. How many people know what SSH is?
Okay, so for those that don't, SSH is just a way of getting a command line up on a remote machine [unclear], and it's secure and it works. Amazon used to have some sort of JavaScript-based SSH client, but I think they've got rid of it. I think they just decided that wasn't secure enough.

Okay, so here's the one I just started. Is this big enough for everyone to see, or do I need to make it bigger? That's fine.

So I've got this one sitting in there now. The only really important thing you have to do is grab your public IP, and then you're going to want to use SSH. While that's starting up, I'll show you what you've got to do for the security group. There are two things you need to do. First, you need to be able to SSH into your machine. SSH with key pairs is really secure, so you're actually okay having that open from anywhere on the internet, but if you wanted to you could tie that down to whatever your IP is at the moment, and then you've got quite a nice secure instance there [unclear].

Second, because you've got Jupyter Notebook, at some point you're going to have to open HTTP on port 8888. That one you don't want to leave open to anywhere on the internet, because I turned off the passwords on it and that's going to be really insecure. I would suggest you follow the security configuration in each of those links for setting a password. Outbound, you just allow it to have everything outbound.

Yes, I've got two running instances. So now I've got an instance running on this public IP address. Right, so that's how I got my old one.

So this is just using SSH from the command line. I might want to bump that up a little bit. The only difference here is you've got this file, which contains my security key and is just sitting in my local directory, and because we're using an Ubuntu instance, we have to remember that the user name is ubuntu. This is a real hassle with SSH: every time you use a new IP address it asks whether you trust this guy [unclear]. So now I'm here, and I can start installing Jupyter Notebook on this machine.

I go back to my other one, and basically I've got these two commands to do it. I'm choosing the Anaconda installation because that really comes with everything, and it's quite nice because it installs a new Python that you don't have to sudo. That's one of the advantages of having another machine to do this kind of stuff: you don't really have to worry about loading a machine with lots of Python packages that you may need to use. It's a completely different machine, so you don't have to worry about keeping your work one clean.

If you have a look at Anaconda, it'll tell you how to install it, and it says to do this. wget is just going to download this file into my local directory. One of the great things about being on EC2 is you have awesome internet access [unclear]. Then if we have a look, we've got that file, and we have to run it with bash. It's going to tell you to look at the license agreement.

[Audience question about using Anaconda Python.] Yeah, because I've only just started using it, any bad things about using it I don't know. So that's basically my choice. The Anaconda Python distribution: you've already got Python installed on this machine by default, since it's a Linux machine, but it's just handy having a separate Python that's not the system one. Yeah, so it's got heaps of these packages up here, so it's not that small. It's really a distribution, they call it. They also have their own package tool, called conda, and a whole bunch of pre-compiled binary packages [unclear], and some of those can be really tricky to compile yourself.
[unclear] It's not the only way, and it's more up to date as well.

With normal Python, at this point I'd start pip installing Jupyter Notebook, but since I'm using Anaconda I do conda install, and I don't have to have root access. That flag is just saying "say yes to everything and do it quietly". So now I've got Jupyter installed.

There's a little bit of configuration I have to do to turn off security. Don't hate me; it's just a little bit easier to do this demonstration. We go into this file and we add the magic lines which serve Jupyter Notebook on every IP address.

At this point I'm going to introduce you to something called tmux. tmux is kind of like screen. How many people have used it? Okay. So what's basically going to happen here is, since we're running on a computer remotely, if I just run a process now and start the notebook, then as soon as I log out of the SSH session the notebook goes away. So what you want is some way of keeping a command line, and a process, still going on your EC2 computer, and then attaching to it and detaching from it as you want, so that it doesn't stop when you log out. You could, if you wanted to, just run it with nohup or something like that. I'm using tmux here. It starts me a session here [unclear], and then you start it up. Cool. So now it says it's got a server running.

And now we just press Ctrl-B and then D (I haven't got those in the notes) and I've detached, and I can just exit out of there. So this is my local computer now. When I want to get back into it, I can SSH in again and get back to that session, and it's still there for me.
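As a rough sketch of the "magic lines" step above: the Jupyter config file is itself Python, so a few lines can be appended to it from a small script. The option names used here (`NotebookApp.ip`, `NotebookApp.port`, `NotebookApp.open_browser`) belong to the classic Notebook server; the helper name `append_config` and the file location are assumptions, and the talk does not show the exact lines used. This sketch deliberately leaves the password and token settings alone rather than switching them off as in the demo.

```python
from pathlib import Path

# Lines to add to jupyter_notebook_config.py (classic Notebook option names).
CONFIG_LINES = [
    "c.NotebookApp.ip = '0.0.0.0'        # listen on every IP address",
    "c.NotebookApp.port = 8888           # the port opened in the security group",
    "c.NotebookApp.open_browser = False  # no browser on a remote server",
]


def append_config(path):
    """Append CONFIG_LINES to a Jupyter config file, creating it if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a") as fh:
        fh.write("\n".join(CONFIG_LINES) + "\n")
    return path
```

A typical call would be `append_config(Path.home() / ".jupyter" / "jupyter_notebook_config.py")`, after which the notebook server is started inside tmux as described.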
So it hasn't gone away. That's a trap for young players: when they close their SSH session, their Jupyter notebook goes away and they're like, you know... So use something like screen or tmux.

And now, just to test that everything's working fine. What you can do now, and I've been using this feature a lot, is just open a terminal from the notebook, and internally that's actually using a tmux as well. So if you want to do things there, that could work.

Of course, what's actually going to happen when you first do this is it'll probably say something like "connection refused", because we have to go here and make sure that the security group lets you connect to it via HTTP. That doesn't come by default with the instance; you have to actually go and add it. You get to know a lot about these security groups and so on, because Amazon are actually quite good at shutting stuff down.

[Audience question.] Yeah, so you can add another port there. And I think they recommend another project which is meant for more multi-user usage. I haven't really looked into that. But yes, if you want to open the notebook to someone else, what you do is add their IP address here, or do the proper security thing and just give them the password. I'd probably recommend using the password and filtering the IP addresses, to be honest, just because it's such a big security hole having someone able to directly get a root shell on your machine. I'm surprised that, you know, Russian hackers haven't taken over this machine already, to be honest.

Okay, any other questions about just installing Jupyter Notebook? I imagine if anyone's interested they can just go and have a look at the instructions.

[Audience question about starting the notebook automatically.] Yeah, well, that would be a bit of configuration you'd have to do to set up the daemon. But you're going to want to put a nohup in front of that as well, so that you can actually quit your shell. The problem with doing that is then you can't stop it explicitly. tmux is just handy to get in there. But I'd be surprised if there isn't a way to set it up so that as soon as the instance starts, the notebook starts.

When I had this installed before, it actually gave me an access token on that page, but for some reason this particular installation from Anaconda isn't set up that way, which is why I needed to actually see what the access... Oh well, we'll add that to our list.

Right. So now we're onto the actual fun bit of the talk, where I say let's play around with Jupyter Notebook and look at some data. I was told I wasn't allowed to use my data set from work, because we're not allowed to tell people that we know how to do data science-y things at work. We keep our tools to ourselves. So what I'm doing is playing around with the New Zealand electricity market data. Put your hand up if you know what the New Zealand electricity market is. Okay, cool.

Basically, when New Zealand had the electricity reforms, they put in one of the most forward-looking electricity markets in the world. Every half hour, at 200 nodes, which are locations in New Zealand where there are suppliers or demand, there is a price worked out. This is all done by solving a big network flow optimization, taking into account losses going from point to point and taking into account people's demand. For instance, wind farms always put in a bid that says they want to sell all of their electricity at zero dollars, because, you know, say they don't sell their electricity.
It just gets dumped. And take, say, Huntly. When Huntly is using coal generation, it'll put in a price for its electricity saying how much it would cost to generate a megawatt of electricity from coal. What then happens is they work out the demand at each node, how much electricity they need supplied, where the cheapest is going to come from, what the transmission losses are, blah blah blah. It all gets very complicated. But then they sell at the marginal price. That means that even though the wind farms have said they're going to supply electricity at zero dollars, if Huntly does have to come online and it says "I'll sell to you at a hundred dollars", the wind farms will get that hundred dollars too.

Okay, so that's how electricity prices are set in New Zealand, which is why we get things in the news like the time last winter, or was it two winters ago, where there was this huge spike, you know, a hundred times normal prices, because basically the hydro lakes were low and they didn't want to sell electricity, and this is really expensive. So the New Zealand electricity market is very complicated.
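The marginal-price rule described above can be sketched in a few lines. This is a deliberately simplified, single-node illustration that ignores the network, transmission losses and the real optimization; the function name `clear_market` and all the offer figures are made up for the example.

```python
def clear_market(offers, demand_mw):
    """Dispatch the cheapest offers first; return (clearing_price, dispatch).

    offers: list of (name, price_per_mwh, quantity_mw) tuples.
    """
    dispatch = {}
    remaining = demand_mw
    clearing_price = 0.0
    for name, price, quantity in sorted(offers, key=lambda offer: offer[1]):
        if remaining == 0:
            break
        taken = min(quantity, remaining)
        dispatch[name] = taken
        remaining -= taken
        clearing_price = price  # price of the last (marginal) offer used
    if remaining > 0:
        raise ValueError("not enough generation offered to meet demand")
    return clearing_price, dispatch


# Illustrative offers only: wind at $0, hydro at $40, coal at $100.
offers = [
    ("wind_farm", 0.0, 50.0),
    ("hydro", 40.0, 100.0),
    ("huntly_coal", 100.0, 200.0),
]
price, dispatch = clear_market(offers, demand_mw=180.0)

# Every dispatched generator is paid the clearing price, not its own offer.
revenue = {name: mw * price for name, mw in dispatch.items()}
```

With these numbers coal is needed to meet demand, so the wind farm that offered at zero is paid the coal price as well.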
For instance, if you look at Australia, they have five nodes covering the whole country. Which means it's quite an interesting problem to look at if you want to look at big data sets.

So if you go to the Electricity Authority, it's got a whole bunch of data sets here, going from October 1996 to November 2016. Each of these is a month of data for 200 nodes in the New Zealand electricity market, being the final prices every half hour. Okay, so medium-sized data: probably difficult to load all of that onto one machine, but you don't really need to go all out and, you know, buy a hundred machines and do it on a supercomputer.

So first I'll do an example of why putting stuff on S3 and working with pandas is really nice. pandas will simply read data from S3, and provided you have that IAM role set up that says "I want to let this computer access my S3 data", it will automatically access your private S3 data as well, under the same login. If you haven't set up that IAM role, even an empty one, and now you go back and say "I want to let it access S3", you will have trouble doing this step.

Okay, so I've read in the CSV you see here. It's got the trading date, trading period, node and price. I've asked for the top of it with head. One of the great advantages of pandas is that it has really nice representations of things, so you can really muck around with your data frame and see what's happening. So you see here what it looks like.
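A minimal sketch of the load-and-peek step. In the talk the argument to `read_csv` was a path on S3 (pandas accepts `s3://bucket/key` style paths when `s3fs` is installed); here an in-memory string stands in for the file so the example runs anywhere. The column names and the sample rows are assumptions, not the real file contents.

```python
import io

import pandas as pd

# Stand-in for one monthly final-prices file (made-up values).
SAMPLE_CSV = """Trading_date,Trading_period,Node,Price
1997-01-01,1,OTA2201,31.50
1997-01-01,2,OTA2201,29.75
1997-01-01,1,HLY2201,30.10
1997-01-01,2,HLY2201,28.90
1997-01-02,1,OTA2201,35.00
1997-01-02,1,HLY2201,33.20
"""

month = pd.read_csv(io.StringIO(SAMPLE_CSV))

top = month.head(3)    # first rows only, easier to read on a projector
dtypes = month.dtypes  # shows Trading_period being read as a plain integer
```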
It's got all your data, from January the 1st, 1997 down to the thirty-first of January. There, you can check that all that data's sitting in there. head's a lot easier when you're doing a presentation, because it doesn't take up so much space.

There's also describe, which does some nice stuff that shows you how many different prices there are, what the mean of all the prices was, et cetera. One of the things you notice here is that it's picked up trading period and is treating it as a number, whereas it actually should be a category, because there are 48 half-hour blocks during the day. There's no real reason why it should even have a mean.

[Audience question.] Oh, the first one, yeah.

Now, I'm pressing Shift-Enter on all of these cells to get them to run, which after you've been using the notebook for a day just becomes automatic, but can be quite confusing at first. Otherwise, what you can do here is press this Run button. And if something takes a really long time, maybe you've done a loop to infinity, Interrupt Kernel here is quite convenient. If you want to just put text here, you can choose a different type of cell. I'm not really going to go into a lot more Jupyter Notebook stuff. It's quite easy just to have a play around with this kind of stuff.

Okay, so we've selected a column here. The other thing you can do here is just put a dot [unclear]. They probably generated this on a Windows machine, because they've got capitals everywhere; all the file names have capitals in them. And you can just get columns out that way as well.

So for those of you wondering what's going on underneath: it basically keeps a Python kernel going underneath.
So if you declare a variable in there, which I'm going to do next, that variable will hang around in your session. I can come back to it in three days, as long as I don't restart the EC2 computer or do this Restart up here for the kernel. So that's actually one of the other reasons why it's handy to have an EC2 instance which you keep on.

So I've looked at the data and I can see a few things. One, trading date should be a date. So I've told it to parse dates on trading date. This is a really handy argument to read_csv, because it just picks up the date from quite a lot of different formats, like if you have minutes and so on as well. You can also tell it what columns it should index on, which to be honest sometimes gets to be more of a pain in the arse than it's worth.

So now when you look at the month data, it's made these things into indexes, which is great when you want to plot it. But when you then want to access it, I can't now go and get all the node data from it, because it doesn't recognize it as a column. You can still get it, but it's a pain in the arse. So when I want to see what all the node data is, I do this little thing called reset_index, which puts it back to what it was before.

Okay, so these are all the kind of things you want to do to ensure data hygiene. When you're loading in lots of data, you want to just make sure everything's right and you've got the range of dates you expect.
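The date parsing, indexing and `reset_index` steps above might look like the following sketch. `parse_dates` and `index_col` are real `read_csv` arguments; the column names and the sample rows are assumptions standing in for the real monthly file.

```python
import io

import pandas as pd

# Made-up rows standing in for one monthly file.
SAMPLE_CSV = """Trading_date,Trading_period,Node,Price
1997-01-01,1,OTA2201,31.50
1997-01-01,2,OTA2201,29.75
1997-01-01,1,HLY2201,30.10
1997-01-01,2,HLY2201,28.90
1997-01-02,1,OTA2201,35.00
1997-01-02,1,HLY2201,33.20
"""

# Parse the date column and use date, period and node as the index.
indexed = pd.read_csv(
    io.StringIO(SAMPLE_CSV),
    parse_dates=["Trading_date"],
    index_col=["Trading_date", "Trading_period", "Node"],
)

# The index levels are no longer ordinary columns; reset_index restores them.
flat = indexed.reset_index()

# Simple hygiene checks: which nodes are present, and what dates are covered.
nodes = sorted(flat["Node"].unique())
date_range = (flat["Trading_date"].min(), flat["Trading_date"].max())
```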
It's really easy to do in pandas.

Next thing here is you can do queries. OTA2201 is basically Auckland prices. So I can now say "get me the Auckland prices for January 1997", and it will go and filter and find all of those prices. Which means we can do this, boom, and you see that's a graph of all of the prices in Auckland for January 1997, by half hour. Which is a really cool thing.

The only unfortunate thing is you have to remember to put that line up the top when you want to actually see these things. I saw somewhere on Stack Overflow how to get it working automatically, but I haven't worked out how to do it. Does anyone know how to get that to run automatically? I think maybe there's a config file that you can put it in. Because that's another one of the gotchas, especially when you restart the kernel, which I won't do yet. It'll come up and it will just print this bit here, and it won't actually print this.

Now, when you have a look at what's happening behind here, what's happening is this AxesSubplot object is providing an HTML representation, I think, that gets picked up by Jupyter Notebook. It means there are quite a few things you can do if you want to customize stuff. I've seen people doing a JavaScript pivot table where you can actually live shift around the data and stuff. So there's a lot of magic going on here.
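A sketch of the filtering step, using boolean masks on made-up rows. The node code `OTA2201` and the column names are assumptions here. The plotting call is left commented out because it needs matplotlib, and in a notebook the `%matplotlib inline` magic would normally be run in an earlier cell (presumably "that line" mentioned above).

```python
import io

import pandas as pd

# Made-up rows standing in for one monthly file.
SAMPLE_CSV = """Trading_date,Trading_period,Node,Price
1997-01-01,1,OTA2201,31.50
1997-01-01,2,OTA2201,29.75
1997-01-01,1,HLY2201,30.10
1997-01-01,2,HLY2201,28.90
1997-01-02,1,OTA2201,35.00
1997-01-02,1,HLY2201,33.20
"""

month = pd.read_csv(io.StringIO(SAMPLE_CSV), parse_dates=["Trading_date"])

# Keep only the Auckland node, and only January 1997.
is_auckland = month["Node"] == "OTA2201"
in_jan_1997 = (month["Trading_date"].dt.year == 1997) & (
    month["Trading_date"].dt.month == 1
)
auckland = month[is_auckland & in_jan_1997].set_index(
    ["Trading_date", "Trading_period"]
)

# In a notebook: auckland["Price"].plot() draws price against the index.
```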
Which makes it very cool. So pandas provides a plot function here, which has a lot of the defaults that you'd expect it to have first time. The reason I made these guys indices when I loaded it, even though it was a pain in the arse immediately afterwards, is that it knows that you want to see a plot of the price against its index. If you want to, you can customize it by doing something like this. There's probably a better way of doing this, but if you just want to see trading period against price, you can do it like that. So you'd probably want to sort that [unclear]. The important thing is that the plot command actually does a lot.

[Audience question.]

All right, so what other things can you do? You can do aggregation with group by. What I want to do here is take all the data for all time and group it by the trading date. That means all of the periods are going to be collapsed. I then say I want to take the mean over all of those trading periods and see what we get. It looks weird here, because what it's actually done is take the mean of the trading period as well, basically the mean of 1 to 48, and you see there's 24.5 quite nicely. It's also giving you the mean of the actual prices.

So this kind of aggregation stuff is really powerful, and you'll want to be using quite a lot of it to do map-reduce kind of things, especially if you want to start doing it on bigger datasets, because this will actually end up happening automatically, which is a really cool thing.
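A small sketch of the group-by-date aggregation, on synthetic data with 48 trading periods per day. It also shows the oddity mentioned above: because the trading period is stored as a number, its "mean" comes out as 24.5. The prices are invented.

```python
import pandas as pd

# Three days of synthetic half-hourly prices for one node (values invented).
dates = pd.date_range("1997-01-01", periods=3, freq="D")
rows = [
    {"Trading_date": day, "Trading_period": period, "Price": 30.0 + period % 5}
    for day in dates
    for period in range(1, 49)
]
month = pd.DataFrame(rows)

# Collapse the 48 periods of each day into one row of means.
daily_mean = month.groupby("Trading_date")[["Trading_period", "Price"]].mean()
```

The `Trading_period` column of `daily_mean` is 24.5 for every day, which is a hint that the column should really be treated as a category rather than a number.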
I'm going to show you that when we get to the end.

And you can apply any function there, so you can supply your own lambda functions if you want to do something specific to that data. There's a whole bunch of things you can do, and you're basically mapping over these groups.

Of course, the first thing that people will do is this: try and look at it. One of the unfortunate things is they don't have a nice representation of these group-by objects by default. It would be kind of nice if they had some sort of table. [unclear] So basically, if you just look at price here, it'll show you the quartiles and maximum of all of those trading periods, which is quite useful. Trading period is still sitting there as a number; I should make it a category.

So there's a whole lot of things we can use to look at data. But let's have a look at some more pretty graphs.

I actually didn't need to do this, but quite often you may find that you need a package and you haven't got it installed on the machine. There's a little bit of magic that Jupyter Notebook does: by just putting an exclamation mark here, it runs that on the command line. So when you want to install a package, you don't have to go back here and run conda install. You can just go here and run conda install there. You can also use percent percent, I think, and that'll do the same thing. See, I already had that installed, but if you didn't, that would install it.
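A sketch of supplying your own function to a group-by, plus the summary table and the category conversion mentioned above. The data is synthetic, and the price-spread lambda is just an illustrative choice of custom function.

```python
import pandas as pd

# Three days of synthetic half-hourly prices (values invented).
dates = pd.date_range("1997-01-01", periods=3, freq="D")
rows = [
    {"Trading_date": day, "Trading_period": period, "Price": 30.0 + period % 5 + i}
    for i, day in enumerate(dates)
    for period in range(1, 49)
]
month = pd.DataFrame(rows)

by_period = month.groupby("Trading_period")["Price"]

# Any function can be mapped over the groups, for example a price spread.
spread = by_period.apply(lambda prices: prices.max() - prices.min())

# describe() gives count, mean, quartiles and max for every trading period.
stats = by_period.describe()

# Trading periods are labels rather than quantities, so store them as a category.
month["Trading_period"] = month["Trading_period"].astype("category")
```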
It's not there by default in Anaconda [unclear].

So here you now see a nice box plot of all of the dates in that month, so you can get an idea of the distribution of the prices. You can do other things, like the maximum instead of the mean, and you can tell it specifically what you want. So that'll tell you what the maximum and the mean prices were on each of those days.

One of the really cool things you can do is start unstacking stuff. Okay, what does unstack do? If you've looked at the data here, you see that node is actually an identifier on each row. What happens if I want to turn it into columns? I run unstack and it will expand my data, and you see you've now got price by every node. It's like a pivot table in Excel. And then when you go here (I've got rid of the legend because there are too many things to fit), you can see how price is changing during the month, by date and trading period, for each of the different nodes. So you see they're actually quite highly correlated, except for these two dates, which would probably be something you'd want to look into a little more if you were interested.

So, any questions about loading a data set and playing around with it? No?

So what happens when your data gets bigger? The interesting thing here is that one of the things people hit when they start getting bigger datasets is they start running out of memory before they run out of processing power. That's usually because they say "I just want to load all of those 240 files and see what's going to happen."
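The unstack step can be sketched like this: the node moves from being a value on each row to being a column, much like a pivot table. The rows and column names are made up for illustration.

```python
import io

import pandas as pd

# Made-up rows standing in for one monthly file.
SAMPLE_CSV = """Trading_date,Trading_period,Node,Price
1997-01-01,1,OTA2201,31.50
1997-01-01,2,OTA2201,29.75
1997-01-01,1,HLY2201,30.10
1997-01-01,2,HLY2201,28.90
1997-01-02,1,OTA2201,35.00
1997-01-02,1,HLY2201,33.20
"""

month = pd.read_csv(io.StringIO(SAMPLE_CSV), parse_dates=["Trading_date"])

# One row per (date, period), one column per node.
wide = (
    month.set_index(["Trading_date", "Trading_period", "Node"])["Price"]
    .unstack("Node")
)

# With one column per node, comparing nodes is a one-liner.
correlation = wide.corr()
```

In a notebook, `wide.plot(legend=False)` would then draw one line per node, as in the demo.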
Now, that would be fine, because maybe you actually want to see, just for Auckland, what prices have been for the last 20 years. But remember, each of those files is all of the nodes for one month. So to get the Auckland data out you would have to read all of those files and then extract the Auckland data out of them, and then you're only left with one two-hundredth of the size. So you're back to the size of, you know, one month. But you've got to go and load all those things into memory at the same time, which may be a hassle, blah blah blah, and may lock up your computer.

But, if you've been listening to this talk, you can load this thing with Dask.

So you've got this object here which pretends it's a data frame, like a normal pandas data frame, but it allows you to do cool things. Hang on, you're loading from a file, but instead of a particular file I just want to do a glob. So this here is going to load all of the 2016 data. It's going to allow you to do a lot of the things you wanted to do before. There are a few things you can't do; for instance, you can't set your index when you're doing a read_csv here. But it does allow you to do a lot of things.

Oh, that was quick. Why? Because it hasn't actually done anything, right? It's lazily doing stuff. [unclear] So now it's physically going and reading in those 12 files and outputting them here. There are a couple of things going on when it does this compute. It does it in partitions, so it will try to conserve your memory, and so you can actually do stuff on your big data and try to shrink it down into a more reasonable data set, even if it starts out big.

Okay, so what's the example here?
I just want to see, for Auckland, what the mean is over all of the trading periods. So basically, in a typical Auckland day you get your lump in the morning where everyone's making breakfast, and you get a lump at night when stuff's happening: people are making dinner, doing their washing probably. All right, so that's gone and done that for all of 2016, and it did it on this one computer. But we want to get bigger.
Okay, so this is the final thing I'm going to do here, and I'm just going to take a little bit more time on it because I think it's very cool. Distributed will allow you to run Dask data frames and so on on many computers really easily. I should actually have done this further up, by the way, but let's start with this tool called dask-ec2, which you'd have to install as well. I'm not going to go into details here, because the magic is awesome.
So basically there's this tool here which will go and fire up, by default, four computers, but I'm going to live a little and make it six, with some extra stuff here just because EC2 wants me to have a subnet. You can see I'm going to give it that role here so I can access EC2, and I'm going to make it this instance type. I wouldn't actually recommend using T2 instances for clusters, because you're probably just going to have one thing you want the cluster to do and then take it down again, so you don't need the burstable ones. But I'm going to use them here because they're cheaper. And this will go and start them all up and provision them for me. I started this now because it takes about four minutes. There you go, and you see it's starting to get all of these instances.
Is there anything special to allow your first instance to be able to create new instances?
Yeah, inside your first instance. Yeah, so that's why you need the IAM role as well.
If you have a look here, I have a Jupyter notebook IAM role. And the good thing about these is you can change them while it's running. But this one has got full access to EC2, and it's also, yeah, directly hooked up to my credit card. Um, sorry, I'm sure they're trying these hacks in the American Hex as well.
Okay, so that's got them all up and running now, and it's now going to install, basically, Anaconda, the Python distribution, on each of those workers, because you want them to do Python stuff, right? And that's what takes the most time.
In the meantime, I'm just going to tell it to use my local machine. Once I get the IP address for the distributed cluster, I'll pop it in here. And I'm going to tell it just to read in this 2016 data. Because I want to show you year by year here, I'm going to start with two years. And basically what I want to do here is take all of this data, group it by the trading period, and find the mean of it. So, exactly what we did here, but now I want to do it over two years instead of one. And it's said, okay, I haven't computed this yet.
But one of the really cute things you can do here is ask, what's going to happen when I do this? This here shows all the computations that would happen when I want that calculation. You can't really see that at all, so let's make it a bit smaller. We'll just do it for three months in this case. And you see here... two months, sorry, because we don't have December 2016. Dask is automatically generating this task network, which means it knows it can run these two things here on the left at the same time. So that's kind of what you're getting for free from Dask.
It's actually working out how things fit together, and it also has really smart algorithms to say, if you're running this on many computers, which one to use and how to keep the data local and stuff. So if it knows these two things have to come together at some point, it will try and run them on the same computer so it doesn't have to move the data around a lot. It's really quite impressive what it does. Just running this on my machine, of course, it works it out quickly because I only did that for two months of data.
Still waiting for that to install. That took a while.
So the next thing I'm going to want to do, when I do load everything... I'm going to say all of 2000 and... because when you're working on these things it's always easiest to start small, I'm just going to say two years. I then want to take these dates here and make them into an actual year column. And then what I'm going to do is take that year, unstack it and plot it. I've only spun up 10 at the moment, but I'd probably actually have to start a cluster with like 50, because you see that kind of takes a long time at the moment. Oh, yeah, cool.
What I've done here is I've extracted the year. The reason I want to do that is because I then want to unstack it and plot it, so you can see how the years are different and how demand may be changing every year.
So, given that it's probably got another two minutes to run before the cluster is up, does anyone have any questions?
Yeah, so, um, pandas doesn't need it, but Dask does, to access S3.
I think it needs that because it does this glob function. But Dask is actually really quite good, because it will actually say, you haven't got this, please install it. I mean, this visualize thing is actually a pain in the ass, because you have to install Graphviz, and Anaconda doesn't do a really good job with Graphviz, apparently, so you have to go and install that from apt. But when I tried to run this visualize with the Anaconda install, it came up and said, oh, I can't do this, it's probably because of Anaconda, here's the issue. And so the error message actually had the URL of the issue on Anaconda. So the Dask people do kind of keep this stuff up to date.
Like, one of the things in this cluster: they changed the server over, because it's got a visualization tool attached to that cluster. So they changed the library, I think it was Bokeh or something, something happened there, and then within a couple of days the Dask people had changed Dask. So they're actually on top of it, but I guess it's because they're at the top of a huge stack of software. Like, it does seem to do magic, but there are a lot of elephants on the way down, and sometimes one of them has a bit of a hiccup.
Yeah, um, I haven't used it, but coming back to that, I think Dask is kind of more of an abstraction. So it tries to take all of the work of generating this graph, and the work around it, out of your hands so you don't have to work it out yourself. I think even with PySpark you're still quite close to the job. Has anyone used PySpark? Yeah.
One of the cool things I noticed about Dask is it has a delayed object that you can start adding to your own library. If you've ever used a mocking framework, you'll get an idea of what the Dask delayed object does. Basically, you hand in the function and you just say it's delayed. And if you then pass that into further functions, instead of actually doing anything with the function, say you're summing up the results of the delayed calls, it will basically store this and say, oh, you want the sum, and just keep on remembering what you want to do to it. Then it can build up a tree like this, and when you say compute it will actually go back and do all of the stuff, trying to conserve memory and CPU as well.
Yay. Uh, so I've got that up. I now have a cluster of six amazing machines. You've got this nice little interface there to make sure that you're actually doing stuff. I then say, oh, where's the head machine and how do I talk to it? You go back up here. I love the way shortcuts still work here, so Ctrl-/ does its stuff. I say, instead of doing this locally, connect to my fabulous machine. If you have a look at it, off it goes, and it says it's got 10 cores and five machines. So unfortunately with dask-ec2 it keeps one machine just as the head. It doesn't really make sense when you've got small clusters, but, you know, if you're running a cluster of 100, you want to keep them separate.
Let's do the stuff now. Previously, I've gone through and computed it and then done these operations. Now I'm just going to do these operations. Why? Yeah, I'll go back to just doing the same things. Oh, let's see, let's put the name there. Oh, unfortunately, I'm just going to have to get you guys to believe me. So we'll load all the 2001 data from here. It's not really what I wanted to do, but it will show you it feeding all of those file reads across the cluster. So this just happens automatically, and I've just got one line of code there to tell it to use the cluster.
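Going back to the delayed object mentioned a moment ago, a minimal sketch might look like this. The `load` and `total` functions are hypothetical stand-ins, not anything from the talk's notebook.

```python
from dask import delayed


@delayed
def load(month):
    # Hypothetical stand-in for reading one month's file; returns made-up numbers.
    return [month, month * 2]


@delayed
def total(values):
    return sum(values)


# Calling delayed functions records the work instead of running it.
monthly_totals = [total(load(m)) for m in (1, 2, 3)]

# Passing delayed results into another delayed call just extends the graph.
grand_total = delayed(sum)(monthly_totals)

# Only compute() actually runs the functions.
answer = grand_total.compute()
print(answer)
```

On a cluster, creating a `dask.distributed.Client` pointed at the scheduler's address before calling `compute()` would send the same graph to the workers instead of running it locally.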
I think it's amazing. Um, but even if you don't have a cluster, using Dask, because it's... Okay, sorry, I didn't get that final thing there, but, um, I will endeavor to work out why. It reminds me of... yeah. Let's test that out. You're a fun monster. I've been trying to figure out the machines that are in here. Okay, these ones. Well, I was just wondering about that. This is all the machines' load, and they're actually really good. That's where they have a problem. Still dead, baby. I don't know if it's got it all done. I'll see you tomorrow. Yeah, I like the way that it's not too computer, it's just there. Now it works! Everyone, it works now. Look, look, gaze upon its beauty. And then what you must do