Good morning, evening, and everything in between. One of the things we've been doing so far is talking about doing data analysis inside of Python, and we've been looking at different libraries to do that. We've talked about NumPy, SciPy, and Matplotlib, and today we're going to do the same with pandas. Pandas is just another library out there that is used for data analysis.

Now, why are there so many libraries? Well, if you think about it, NumPy and SciPy are really just doing the mathematical calculations, and Matplotlib is just doing the plotting. The reason we also have pandas in our back pocket is that there is another programming language out there, known as R. It's not bad, it's not terrible, it's just another language, and one of the things R does really well is data analysis. Specifically, it uses an object that it refers to as a data frame. That's a pretty great object that we'll see today. So what we're effectively saying is: I would like to use a data frame, but inside of Python. And that's where pandas comes in.

Even before we get to the data frame, we want to think about our data. To start, pandas has something known as the Series data type. Let's take a look at how that would be implemented. I've already imported pandas, and I'll just go ahead and call this x, equal to pd.Series.

Now, the entire idea here is that I have some values. The way I want you to think about these values for a second is: imagine each one of them is associated with a different quality or a different feature. So let's say, for example, the first one is my age. I'm old, I'm 35. Okay, then we're going to show my weight. I don't want to deal with that, but yes, I am 200 pounds.
I don't know the kilograms for that, but I'm heavy. And then we'll do one more, my height. I'm, roughly speaking, about 60 inches, right? 60... you know, roughly speaking, 5'10". Okay, 5'10": 5 times 12 is 60, plus 10, so I'm 70 inches. Ah, that makes me feel better.

Anyways, the entire idea is that each one of these values has an associated title, and that's where we get into the index parameter. What the index parameter does is attach a label to each of our numbers. So in our case, the first one refers to my age, the second one refers to my weight, and the last one refers to my height.

Now if I come in and just do a print on x, what we see is that's exactly what's going on. The 35 is associated with the age index, the 200 is associated with the weight index, and the 70 is associated with the height index. I could go even further and look at each one of these almost as if we were treating it like a dictionary. So I could come in, put some square brackets on my x, and say height.

Okay, this is great, right? You can do some of the same things that we've done with dictionaries. But we've already done this, so where's the benefit? This is where we start to say: well, hang on a minute, I'm dealing with multiple records when I'm doing data analysis. I'm not just dealing with one person who has all these things.
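The Series described so far can be sketched roughly like this. The values 35, 200, and 70 come from the talk; the exact spelling of the labels ("age", "weight", "height") is an assumption, since only the spoken names are known.

```python
import pandas as pd

# One person's measurements; the index attaches a label to each value.
# Label spellings are assumed from the spoken names in the talk.
x = pd.Series([35, 200, 70], index=["age", "weight", "height"])

# Printing shows each value next to its label.
print(x)

# Dictionary-style lookup by label, using square brackets.
print(x["height"])
```

Without the index argument, pandas would label the values 0, 1, 2 instead, so the labels are what make the dictionary-style lookup readable.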
I might be dealing with tons of different people, and so I have tons of different records. That's where we get into the idea of the data frame. The entire idea of the data frame is exactly that: I'm going to take these series (serieses? whatever the plural of series is, a series of series) and I'm going to associate them into one giant object. Almost think of it like its own tiny spreadsheet; I'm just not going to see it visually.

Now I'm going to change this just a bit. I am going to start by creating a dictionary, mostly because this is going to be one of the easiest ways to convert it into the data frame. So let's say I have data on people. I'm going to say we have five people, and I want to record their different ages in this case. So I was 35. We'll say that I'm dealing with a 21-year-old, a 25-year-old, a 45-year-old, and, I don't know, someone fresh out of high school, an 18-year-old. Okay, fair enough.

Well, that's one entry. But we also had some other records that we were going through, and the second one was our weight, so the same kind of thing will come in. I'll go ahead and make my height one as well for the sake of simplicity. So again, I was 200. The 21-year-old is 35 pounds; I'm just kind of making numbers up, as you can imagine. The 25-year-old, we'll say, is 150. The 45-year-old, okay, I don't know, we'll say is 180. And then the 18-year-old fresh out of high school is 125. Okay, different weights, doesn't matter. I'm the fattest one.

Anyways, I will not at least be the shortest one. I will at least maintain that there are some people that are shorter than me. There's a 75 and, I don't know, 58. How's that? Okay, so I've entered a lot of data.
I've just built out a dictionary, whoop-de-doo. But because I've built a dictionary, one of the things that pandas will allow us to do is convert this into a data frame. So this is another very common variable name: df, for data frame. I'll call pd.DataFrame and, simply put, I'm just going to give it d, my dictionary. Now I can take that and print it. Not print d, print df. And that's exactly what you're seeing. I suddenly see my entry for my zeroth person, myself: 35 years of age, 200 pounds in weight, 70 inches in height. And then I've got each one of my other people as well.

Now there are some things that we can do with this. One of the things I'm going to introduce now is this idea of doing exploratory data analysis in the REPL. All of this is the REPL, the console. REPL is just the acronym for read-eval-print loop, which is how the interactive Python prompt works: it reads what you type, evaluates it, prints the result, and loops back for more. The reason we are doing that is because, since I'm in here, and you can see I've got a little blinking cursor going on here, I can type things in. Literally, if I typed df, I'm going to see the exact same result.

Okay, that's fine, but maybe I want to do some data analysis. Well, let's say, for example, df["age"]. I can extract a specific feature from each one of my people, so suddenly I have all of the ages. Well, okay, since I can do that, maybe I can do some different types of analyses, and in fact, that's what pandas will do for us. I can come in, for example, and say: let me extract those ages, and then, what is the mean of my data?
Okay, so the average age of our data set is 28.8. You can already imagine I can do the exact same thing with weight and height.

Now, one of the ones that I like to use very commonly is describe. What describe is going to do is describe your data, quite literally. If we take a look at this, it's looking at each one of the features and then describing all of the basic statistics for each one. So, for example, we have a count of five: we have five entries for each one. What were the average values of each one? What were the standard deviations? What were the mins and maxes? And then, roughly speaking, if we were to divide the data into, I guess you could call them quarters, here are the quartile cutoffs for our data.

So those are some fun ways that we can start using pandas, and we'll see in just a bit how we can go a little further.
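As a rough sketch, the dictionary-to-data-frame steps walked through above might look like the following. The ages are the ones given in the talk, and the names d and df follow the talk. The second weight and several of the heights were not stated clearly, so the values marked below are placeholders chosen only to make the example runnable, not the speaker's actual numbers.

```python
import pandas as pd

# One list per feature; position i in every list is the same person.
d = {
    "age": [35, 21, 25, 45, 18],
    # Second weight is unclear in the talk; 135 is an assumed placeholder.
    "weight": [200, 135, 150, 180, 125],
    # Only 70, 75 and 58 are mentioned; 68 and 66 and the ordering are assumed.
    "height": [70, 68, 66, 75, 58],
}

# Convert the dictionary into a data frame: keys become column names.
df = pd.DataFrame(d)
print(df)

# Pull out a single feature for every person; the result is a Series.
ages = df["age"]
print(ages)

# Mean of one column; for these ages it is 28.8, as in the talk.
print(ages.mean())

# Summary statistics per column: count, mean, std, min, quartiles, max.
print(df.describe())
```

In an interactive session, typing df or df.describe() on its own line shows the same output without the explicit print calls.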