So here we are at the course page, and we're coming to pandas, continued. I'll open it in a new tab and see where we are: it's right after tidy data, in the section on working with data frames. I'll also paste this link in HackMD. I have Jupyter started here, so I'll split the screen and just continue from the bottom of the notebook.

So what are we doing now? We saw the basic idea of the data frame and why we use it: once we put our data in this format, we have a lot of standard tools we can use, and most of our other work becomes looking up the right pandas methods. So once we have a data frame, what can we do with it? This is a short, fast tour for half an hour.

Whenever I use pandas, I'm basically always going to the documentation. I have a general idea of what I need, I search for it there, and then I use it. Most of the things I'm about to show, I don't remember off the top of my head, and I don't think anyone expects you to either. So keep that in mind when deciding what to focus on here.

So here we go. Yesterday we made a data frame from some existing data, but we can also build them ourselves. First we have pandas date_range; pandas has several tools for working with dates. Is this going to work? No, because our imports are gone: I've restarted the Jupyter notebook. Okay. So here we made a date index, saved it as dates, and now we can make a data frame out of it. The point is that by passing in arrays and something to use as an index, you can start making data frames manually.

Can you clarify a little bit what that date range is? At least I missed what's going on.

It doesn't really matter for anything else, but we can look at what's in dates, and that's sufficient: we have six dates from 2013. Right, yeah. And they got put into the data frame as its index.
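A minimal sketch of the kind of cell being run here, assuming numpy and pandas are installed. The start date string, the random values and the column names A to D are assumptions for illustration, not necessarily the lesson's exact cell.

```python
import numpy as np
import pandas as pd

# Six consecutive calendar days starting 2013-01-01; the default frequency is daily
dates = pd.date_range("2013-01-01", periods=6)

# A 6x4 array of random numbers, with the dates as the row index
# (the column names A-D are an arbitrary choice for this sketch)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list("ABCD"))
print(df)
```

Because the index is a DatetimeIndex, rows can later be selected by date, which is what makes building the index from date_range useful.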
The only idea here is that if you have some data and you need to assemble a data frame yourself, you can do that. Here's another example, creating a data frame out of a dictionary: the dictionary keys are the column names and the values are the contents of each column. If we print it, we see the columns named by those keys. So it's basically the data up there. This sometimes happens when you aren't given the data in a convenient format and need to assemble it yourself.

We can also split up data frames. Let's see what this is: this slice takes the first two rows of a data frame. If we run this whole cell, we get three smaller data frames; it splits the original into three pieces, and then we can combine them together again with concat. That gave an error: concat expects a list or some other list-like thing as its first argument. There we go. So we split it apart and combined it together again.

Okay. We can also merge data frames. Here's the problem: the runners data frame was from yesterday, so I have to go rerun those cells. Here's the runners data, and I've re-executed it. You can find it if you scroll up in the docs, or maybe someone can paste it in HackMD. Now we have this extra data frame with the ages. So we have runners and we have ages. Can we combine them? Quite similarly to what you would do in SQL, you can merge them. Here, runners.merge: we have the runners data frame and the ages data frame, both contain the same runner names, and we're merging on the runner column. The runners get matched up, and when we run it, we see there's a new age column. That's another important point: once you have these different data frames with different kinds of...

One person is having problems following, and I assume others are as well, because we're skipping around a little in the notes.
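Here is a minimal sketch of both operations from this part, splitting and recombining with pd.concat and matching rows with merge. All of the data is made up: the column names, the race times and the ages are hypothetical stand-ins for the lesson's runners data, and the ages are deliberately stored in a different order to show that merge matches on the key rather than on position.

```python
import pandas as pd

# --- Splitting and recombining with pd.concat ---
# Hypothetical example frame; each dict key becomes a column name
df = pd.DataFrame({"A": [1, 2, 3, 4, 5, 6], "B": list("abcdef")})

# Split the rows into three positional slices of two rows each
pieces = [df[:2], df[2:4], df[4:]]

# pd.concat expects a list (or other iterable) of DataFrames as its first argument
recombined = pd.concat(pieces)

# --- Merging on a shared key column ---
# Hypothetical race times in seconds for three runners
runners = pd.DataFrame({
    "Runner": ["Runner 1", "Runner 2", "Runner 3"],
    "400m": [64, 68, 70],
    "800m": [140, 150, 155],
})
# Ages stored separately, in a different row order than runners
ages = pd.DataFrame({"Runner": ["Runner 3", "Runner 1", "Runner 2"], "Age": [25, 30, 22]})

# Rows are matched by the value in the "Runner" column, not by position.
# The default is an inner join, which keeps the row order of the left frame.
merged = runners.merge(ages, on="Runner")
print(merged)
```

Passing the pieces as separate arguments, for example pd.concat(df[:2], df[2:4]), is the kind of mistake that produces the list-like error mentioned above.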
So just to clarify: we have essentially created two data frames, which we can show. There's the ages data frame. The runners data frame isn't visible on screen; if you scroll up a little bit, there it is. So we have a list of three runners, and the exact content isn't that important, but there are times for each of them at different distances. Then we have the ages of those runners, and we want to add those ages to the same data frame, right?

Yeah, that's what this merge operation does. We processed data into two data frames, and now we can easily combine them using merge. Once you have your data in data frames, you can combine it, split it, and do all kinds of things like this very quickly. Here's the merged data frame: it didn't require iterating yourself, and it didn't require making dictionaries and looking things up; you just got it. Whatever your analysis is, you can merge things together to make what you need.

We went over groupby yesterday. I'm going to rerun my whole notebook to get all these cells back. Splitting this lesson across two days didn't work very well; sorry about that.

So, groupby; let's demonstrate. As we said yesterday, you often want to compare groups within a data frame. In the Titanic data frame from yesterday, we have some children, and we want to see how their survival differs. So we copy this cell. Here we're making a new column in titanic, Child: is the age less than 12? I can run the comparison alone, and we see False, False, False for most of the people. Most travelers were not children, but we will find some of them.
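A minimal sketch of how the new Child column comes about, using a tiny made-up frame in place of the real Titanic file (the column names Sex, Age and Survived are assumed to match it). Comparing a column to a number produces a boolean Series, which can be stored as a new column.

```python
import pandas as pd

# A tiny made-up stand-in for the Titanic data (the real file has many more rows and columns)
titanic = pd.DataFrame({
    "Sex": ["male", "female", "female", "male", "female", "male"],
    "Age": [22.0, 38.0, 4.0, 2.0, 35.0, None],  # None becomes NaN: age unknown
    "Survived": [0, 1, 1, 0, 1, 0],
})

# Comparing a column to a number gives one True/False per row
titanic["Child"] = titanic["Age"] < 12
print(titanic)
```

One thing worth noticing: a missing age compares as False, so passengers with unknown ages are counted as adults by this column.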
Now if we look at the Titanic data frame, we see a new column, Child, which is True for everyone under the age of 12. And now what do we do? Now we can group by it. We group by the reported sex and by whether someone was a child, take the Survived column, which is zeros and ones, and compute the means. Let's see what happens. Okay, now we have sex (male or female), child (whether they were a child or not), and the fraction who survived. So the saying "women and children first" perhaps has some evidence to support it for the Titanic: children had a higher survival rate, and women had a higher survival rate. The difference between men and women as adults is surprisingly small, but for children it's very large. I assume that's partly just the small numbers of male and female children. Sorry, I was reading that the wrong way around.

So now we have a little more exercise time. With half an hour, there's no way this whole lesson can be done, but the material is there for you; that's why it's available. We'll have some time to work on the next set of exercises. Basically, explore what we did on your own and see if you can do a little more. Let's say 10 minutes for this, and then we'll come back, talk a little, and summarize what you can find in the rest of the lesson. Okay, let's go. I'm writing this in HackMD: go down to the exercise and continue on your own. See you in a bit. Bye.

Hello. Yes, I'll share the screen. Hopefully you had enough time to do something with this exercise. There's a good point that I misinterpreted the data I showed: in fact, female children had a lower survival rate than adult women. Okay, we have just five more minutes left, since we took a bit of extra time, so I'm going to show the time series material.
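A minimal sketch of the grouping step, again on made-up rows rather than the real Titanic data, so the numbers say nothing about actual survival. Since Survived is 0 or 1, its mean within a group is the fraction who survived. Adding a count next to the mean (via agg, an addition not shown in the lesson) is one way to check the concern about some groups being small.

```python
import pandas as pd

# Made-up rows; these numbers are NOT the real Titanic results
titanic = pd.DataFrame({
    "Sex": ["male", "female", "female", "male", "female", "male", "female", "male"],
    "Age": [22.0, 38.0, 4.0, 2.0, 35.0, 40.0, 8.0, 30.0],
    "Survived": [0, 1, 1, 0, 1, 0, 0, 1],
})
titanic["Child"] = titanic["Age"] < 12

# Mean of a 0/1 column per (Sex, Child) group = fraction who survived in that group
survival_rate = titanic.groupby(["Sex", "Child"])["Survived"].mean()

# Counting rows per group shows how many people each rate is based on
summary = titanic.groupby(["Sex", "Child"])["Survived"].agg(["mean", "count"])
print(summary)
```

A rate computed from one or two people can swing wildly, so reading the count column alongside the mean makes it easier to tell a real difference from small-number noise.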
Dealing with dates and times can be one of the most annoying parts of working with data, and pandas can make that a lot easier. I propose that instead of trying to type along, you watch me, because this will go a bit quickly.

To get the CSV file: pandas read_csv with the URL doesn't work, because they've blocked it. So if I use File, Open from URL in Jupyter and paste the URL here, it opens the CSV file and also downloads it for me. Now I can read the CSV file; it's information on Nobel prizes. We see here the dates people were born. If we run info, we see that the date columns have dtype object, so they're basically just strings. That's not very convenient.

Pandas has the function to_datetime, which is sort of magic: it figures out what the format is and converts it for you. Let's run this. Here we take the born column, convert it to datetimes, and save it as born again. Now born and died are datetime64 columns, so they're actually treated as dates, and year is also a date. Let's look at the data and see what the difference is. Well, it looks the same. But what do we get now? Using the dt accessor, we can do, for example, nobel["born"].dt.day. (Well, I misspelled born.) We see the day they were born, or the month, or the year.

Using these, we can also do arithmetic. This takes the died column minus the born column, uses the dt accessor to get the number of days, and divides it by 365. Let's take a look: here we see the number of years each laureate lived, and so on. Now we make a box plot of it: grouped by field, we can compare the distribution of lifespans of the winners in each field.

And this is basically where we have to stop. There are more exercises and a bit more material below that you can do yourself, but we just don't have time; there is too much here. So what are we going to do now?
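A minimal sketch of the date handling shown here, on hand-typed, made-up rows instead of the downloaded Nobel CSV (the column names born, died and category are assumed to match it). The errors="coerce" option is a choice for this sketch, so that any unparseable date becomes NaT (missing) instead of raising an exception.

```python
import pandas as pd

# Made-up rows standing in for the Nobel laureate CSV.
# As in a freshly read CSV, the dates start out as plain strings (object dtype).
nobel = pd.DataFrame({
    "category": ["physics", "chemistry", "physics"],
    "born": ["1900-01-15", "1920-06-30", "1950-12-01"],
    "died": ["1970-01-15", "1990-06-30", None],  # None: laureate still alive
})

# Parse the strings into datetime64 values; unparseable entries become NaT
nobel["born"] = pd.to_datetime(nobel["born"], errors="coerce")
nobel["died"] = pd.to_datetime(nobel["died"], errors="coerce")

# The .dt accessor exposes components such as day, month and year
born_day = nobel["born"].dt.day
born_year = nobel["born"].dt.year

# Subtracting datetimes gives timedeltas; .dt.days turns them into whole days.
# Dividing by 365 is a rough conversion to years (it ignores leap days).
nobel["lifespan"] = (nobel["died"] - nobel["born"]).dt.days / 365

# Median lifespan per category; the missing lifespan is skipped
print(nobel.groupby("category")["lifespan"].median())
```

With matplotlib installed, nobel.boxplot(column="lifespan", by="category") should draw a per-field box plot like the one in the lesson; the living laureate's missing lifespan is simply left out.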
So now Johan is here, and we'll talk about visualization; the plot we just saw is a good transition to that. I'd really recommend reading the pandas documentation, like its getting started guides, because there's just so much there. I've had to read these guides many times myself to be able to follow them. But I hope you're inspired to see why this is a useful thing to learn about. Like NumPy, pandas can do a huge number of different things, so it's always worth checking the options in the documentation. But you also just can't know everything. The best time to check how pandas handles something is when you know what you want to do. Yeah, okay.