So my name is Julia. I work for a company that does data mining, and when I'm lucky I get to use pandas and Python. When I'm unlucky I get to use Java.

So pandas is a library for data analysis. It was written by a guy called Wes McKinney, originally for financial data analysis, so a lot of time series analysis. It provides you with some really good data structures and a ton of useful helper functions for data cleanup, for transforming your data, and for doing statistics on your data. And it's especially useful when you combine it with IPython Notebook, which is a web-based notebook. I made this presentation in IPython Notebook, which can make slides now, and it's pretty much amazing. If you want to know what it looks like, it looks like this.

So I'm going to walk through an example of using pandas to analyze some data about how many people are biking on the bike paths in Montreal. So right now, not too many, but in the summer there'll be more.

So the first thing to do is import the data from a CSV, and we can use this function called read_csv. I got this data from the Montreal Open Data website [unclear]. So we tell it the encoding and the separator, and we get an object called a DataFrame, which, if you know R, is like an R data frame; if you don't, you can think of it as like a database table. So there are rows, there are columns. There's one row for each day of the year and there's one column for each bike path. So there are seven bike paths. I don't really know what Maisonneuve 1 and Maisonneuve 2 mean, but they seem to be popular. And it's indexed by date, and I've told it to parse the dates from the file, and to parse them correctly by putting the day first instead of the month, because it's an American library.

Okay, so now we have this data frame, which we've called bike_data, and we want to plot it.
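
The import step described above can be sketched roughly as follows. This is a minimal illustration, not the code from the slides: the inline sample data, its values, and the helper name load_bike_data are all made up for the example, and the real file has more columns.

```python
import io

import pandas as pd

# Hypothetical stand-in for the Montreal open-data CSV: semicolon-separated,
# with day-first dates. The counts are invented for illustration.
SAMPLE_CSV = """Date;Maisonneuve 1;Maisonneuve 2
01/01/2012;38;51
02/01/2012;68;153
13/01/2012;104;248
"""


def load_bike_data(source, encoding=None):
    """Read the bike counts into a DataFrame indexed by date (hypothetical helper)."""
    return pd.read_csv(
        source,
        encoding=encoding,   # only matters when reading bytes from a real file
        sep=";",             # the separator used in the file
        index_col="Date",    # index the rows by date
        parse_dates=True,    # parse the index column as dates
        dayfirst=True,       # 02/01/2012 means 2 January, not 1 February
    )


bike_data = load_bike_data(io.StringIO(SAMPLE_CSV))
print(bike_data)
```

With a real file you would pass its path and encoding instead of the in-memory sample, and bike_data.plot() would then draw one line per bike path (that call needs matplotlib installed).
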
So we do bike_data.plot() and we get this beautiful graph, which is a bit noisy but otherwise very pretty.

So I want to know a little bit more about this data set. I can do describe(), and it tells me that there are at most 8,000 people and the fewest is 47, so there are at least 47 every day of the year, even in the middle of February. I can also take a slice of this data frame by column and look at just two columns. So we can see that these two columns are really highly correlated, right? Every time one goes up, the other one goes up. And I wanted to figure out why this is.

So I decided to look at some weather data. I wrote a little function called get_weather_data, which goes to weatheroffice.gc.ca and does a bunch of stuff. I'm not going to go into detail about what this does, but it says: read the CSV; skip the first 16 rows, where for some reason there's some metadata; this is the index column; parse my dates for me; drop the columns I don't need; get rid of some special characters; drop some more columns; and then concatenate them all together. So this was pretty easy to write, and if we look at our new data frame, we see that we have the temperature, the weather, and all kinds of fun stuff to play with. Cool.

The only problem is that our bike data was every day, and this weather data we have every hour, which is great, but it's not what we need. Because pandas was written for dealing with time series data, it's really good at this. So what we can do is call resample on the temperature column and say how='mean', so we take the average temperature every day and make this. We seem to have lost the right side of my screen, but I promise you, right here is a mean temperature column, which contains the mean temperature every day.

And if we draw the graph of this, what we get is... you remember last year in March when it got really warm? Do people remember that, when it was like 20 degrees?
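
The hourly-to-daily resampling step mentioned above might look something like this. It is only a sketch under stated assumptions: the temperatures are invented, and the column name "Temp (C)" is a guess rather than the name in the real weather data. Current pandas releases no longer accept the how= argument, so the sketch uses the equivalent .resample("D").mean() form.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly readings covering two full days.
hours = pd.Timestamp("2012-03-01") + pd.to_timedelta(np.arange(48), unit="h")
temps = np.concatenate([np.arange(24.0), np.full(24, 20.0)])

# "Temp (C)" is an assumed column name, used only for this illustration.
weather = pd.DataFrame({"Temp (C)": temps}, index=hours)

# Group the hourly rows into calendar days and average each day.
daily_mean_temp = weather["Temp (C)"].resample("D").mean()
print(daily_mean_temp)
```

Because the result is indexed by day, it lines up with a daily bike-count frame and can be assigned to it as a new column.
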
People also went biking then. And over here as well, there's this temperature spike in April, which seems to correspond to this. But then over here there's this big spike downwards, and this has nothing to do with the temperature. Maybe, but not so much.

Rain. Let's talk about rain. So I wrote this super long one-liner, which I'm going to walk through with you. I'm going to make a rain column in my bike data. You remember how in weather we had a column which was like fog, or rain, or freezing rain, or snow? So we take that, and then we look at .str for some string functions, check to see if it contains rain, and then convert that to a one or a zero, which I think technically we don't have to do, but I wanted to demonstrate the .map method, because you can put anything in it and it's really powerful. And then, again, resample that every day. So what that gives us is the fraction of the day that it was raining: 0.5 if it was raining for half the day, 1 if it was raining for the whole day.

And if we plot that, we get another nice graph, which says: hey, if we look at the spike downwards, surprise, it was raining. And then over here there are some similar things. So this still isn't perfect, but do you believe me that this is telling us something? Okay.

And that's pretty much it for me, but I hope this has convinced you that it's really easy to use. You can download this presentation at that URL. And there's a really good book by Wes McKinney, who wrote pandas, which has a ton of examples kind of like this, except there are more and they're better. There's really great documentation, and there's a mailing group where you can mail all your problems about pandas. And that's all I have to say. Thank you.
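
For reference, the rain one-liner walked through in the talk can be approximated as below. This is a sketch, not the original slide code: the weather descriptions are invented, and the column name "Weather" is an assumption. It uses the current .resample("D").mean() spelling.

```python
import pandas as pd

# Hypothetical hourly weather descriptions for two days:
# day 1 has 12 rainy hours out of 24, day 2 has none.
hours = pd.Timestamp("2012-04-01") + pd.to_timedelta(list(range(48)), unit="h")
descriptions = ["Rain"] * 6 + ["Freezing Rain"] * 6 + ["Fog"] * 12 + ["Snow"] * 24

# "Weather" is an assumed column name, used only for this illustration.
weather = pd.DataFrame({"Weather": descriptions}, index=hours)

rain_fraction = (
    weather["Weather"]
    .str.contains("Rain")                           # True for "Rain" and "Freezing Rain"
    .map(lambda raining: 1.0 if raining else 0.0)   # booleans to 1.0 / 0.0
    .resample("D")                                  # group the hours by day
    .mean()                                         # share of hours with rain
)
print(rain_fraction)
```

As the talk notes, the .map step is optional here, since the mean of a boolean column already gives the same fraction; it is kept to show that .map accepts an arbitrary function.
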