Now that we've loaded our data into pandas, the next thing we typically want to do is some data analysis on it. One thing we have to deal with, though, is that our iris data contains a few different species, and maybe I only want to look at one of them. Say, for example, I only want the setosa species and I only want to do analysis on that. There are a few ways to do this, and I'm going to show you two of them today.
Before we get to the first one, I want to introduce a particular approach you can use inside pandas. Let's say I print iris. I've already shown that we can extract all the records from a single feature, or column, so in this case I could print out all of my sepal lengths. What I could also do is multiply them by two. What that does is take the column and, row by row, multiply each value by two. It's not changing the values in the actual data frame; that's something completely different. But we are able to do operations on a column on a row-by-row basis.
The reason this matters is that I said operations, not math: I can also do a comparison with ==. If I compare sepal length to two, I would see False for every row, because I don't believe there are any entries where the sepal length is equal to two. But we also have a species feature, so instead I can look at when the species feature is equal to setosa. Now notice what's going on here. If you've taken a look at the iris CSV file, those first few entries are all setosa, so I'm seeing that these rows are returning True.
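Here is a minimal sketch of the column operations just described. It uses a six-row stand-in for the iris data frame so it runs on its own; the column names (sepal_length and so on) and the lowercase species labels are assumptions, so match them to whatever your CSV actually uses.

```python
import pandas as pd

# Small stand-in for the iris DataFrame loaded earlier (two rows per species).
# Column names and species labels are assumptions; adjust them to your CSV.
iris = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "sepal_width": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "petal_length": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
    "petal_width": [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    "species": ["setosa", "setosa", "versicolor",
                "versicolor", "virginica", "virginica"],
})

# Arithmetic is applied row by row and returns a new Series;
# the values stored in iris are not modified.
doubled = iris["sepal_length"] * 2
print(doubled)

# A comparison works the same way and returns a Series of True/False values.
is_two = iris["sepal_length"] == 2
print(is_two)

# Comparing the species column marks which rows belong to setosa.
is_setosa = iris["species"] == "setosa"
print(is_setosa)
```

In this stand-in the setosa rows come first, so the last Series starts with True values and ends with False values.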
And toward the bottom, those rows are returning False. Now why did I present that as an idea? Well, we can use this to extract just the rows where we see True. I'm going to do it over here so you can see what I mean. Again, if I do iris species == setosa, I get this column of True and False values. What I can do with that is pass it back into iris inside the square brackets. What I'm effectively saying is: filter out all of the entries where this is False, and only give me the rows where this is a true statement. And that's exactly what I'm seeing here. These numbers on the left are the original index: this zero means you're dealing with entry zero, and here's the data for that entry, and so on down the list.
I could do the exact same thing for, say, virginica. There we are, same thing: it's telling me which entries in my data frame are virginica. They may not be in sequential order; that depends on the data set you're working with. But I can extract them.
Now I'm going to save each one of these. So setosa, versicolor (if I'm mispronouncing that, leave a comment with how you say it), and virginica. I run this the same way. Now that I have these filtered data frames, I have all the same options to work with, so I could call setosa.head(), versicolor.head(), or setosa.describe(), and I'm getting a description of only the rows for that species. So this is one approach, only one. There is another approach using something known as the groupby function inside of pandas.
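The filtering step can be sketched like this. Again, the tiny iris data frame, its column names, and the species labels are stand-in assumptions so the example runs by itself.

```python
import pandas as pd

# Small stand-in for the iris DataFrame (assumed column names and labels).
iris = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "sepal_width": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "petal_length": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
    "petal_width": [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    "species": ["setosa", "setosa", "versicolor",
                "versicolor", "virginica", "virginica"],
})

# Passing a True/False Series inside the brackets keeps only the True rows.
setosa = iris[iris["species"] == "setosa"]
versicolor = iris[iris["species"] == "versicolor"]
virginica = iris[iris["species"] == "virginica"]

# Each result is an ordinary DataFrame, so the usual methods still apply.
print(setosa.head())
print(versicolor.head())
print(setosa.describe())

# The filtered rows keep their original index labels.
print(virginica.index.tolist())
```

Each filter is written out by hand here, which is fine for three species but gets tedious as the number of groups grows.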
The entire idea is: what if I wanted to do this filter approach on every single group that happens to be in my data frame? In this case I've got my versicolors, setosas, and virginicas, and I want to filter out each one of them, but I don't want to have to do it by hand like this. One reason is: what happens if I'm dealing with 50 species? That's a lot of species, and I don't want to write a filter for each one.
What we can do instead is this. I'm going to first call this groups. The data is already loaded, and the option is to use something called groupby, grouping by species. What that's doing is: any time we see records that have the same species, put them in their own group together. I run that and get no errors, and no output either, because I didn't have a print statement or anything. But if I look at groups, I've got a nice little groupby object.
What I can do with this is start to extract those different groups. I'll do it over on the side for a second: groups.get_group with setosa, since the data was grouped by species and one of those groups was the setosa group. And that's exactly what we get. So groups.get_group with setosa, and I'll put the print statement over here this time, and it did exactly what it was meant to do.
One of the last things I'll present with this is what you can do once you've created a group. One thing I might want is to find out some data about each one of the groups. Maybe I want the average of each feature for each group. And yes, we can use .describe(), but we can also use a for loop.
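As a rough sketch of the grouping step just described, using the same stand-in data frame (assumed column names and species labels):

```python
import pandas as pd

# Small stand-in for the iris DataFrame (assumed column names and labels).
iris = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "sepal_width": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "petal_length": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
    "petal_width": [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    "species": ["setosa", "setosa", "versicolor",
                "versicolor", "virginica", "virginica"],
})

# Records that share a species value are placed in the same group.
groups = iris.groupby("species")

# Printing the result shows a groupby object, not the rows themselves.
print(groups)

# get_group pulls out one group as a regular DataFrame.
setosa = groups.get_group("setosa")
print(setosa)
```

One groupby call replaces the separate filters, and it keeps working no matter how many species the data contains.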
Instead, I'm going to write: for species name (I like that better than just species), comma, species data, in groups. The reason I wrote species name comma species data is that when we loop over the groups like this, each entry we're given is a tuple. The first thing in the tuple is, as you can guess, the group's name, which here is the species name. The second thing is the data frame itself. I like to call it data or some variant of that; species data is another name I go with.
To start, I'll just print the species name. And that's exactly what we see: setosa, versicolor, virginica. Fantastic, that's exactly what I wanted from those species names.
Now, like I said, I want to do some type of analysis. So I'm going to add some print statements: sepal length, then, copying those really fast, sepal width, petal length, and petal width. Each label is going to be followed by a comma and then a value. And this is where having the data is beautiful: I can do the exact same things we've been doing so far with the entire data frame, just on the condensed-down one. So sepal length; I thought the column name might be abbreviated to "len", so I double-checked, and it's spelled out as "length". I'm just looking at the average here, so mean. Then I take that same entry and update which feature I'm working off of: sepal width, petal length, and finally petal width.
So I take all this: I've got my data set grouped, and then for each of those smaller grouped data frames I do some quick analysis, extracting the feature I'm looking for and, in this case, showing its mean. And what do you know, that's exactly what I'm getting.
So here, for my setosa, the average sepal length is 5.0, the sepal width is about 3.4, then the petal length, and so on. These are all the same things we've seen in the past, but now, as you can see, we have a way to get them for each group through pandas.
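The loop over the groups might look something like the sketch below. The names species_name and species_data, the column names, and the stand-in rows are all assumptions for illustration.

```python
import pandas as pd

# Small stand-in for the iris DataFrame (assumed column names and labels).
iris = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "sepal_width": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "petal_length": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
    "petal_width": [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    "species": ["setosa", "setosa", "versicolor",
                "versicolor", "virginica", "virginica"],
})

groups = iris.groupby("species")

# Iterating over the groupby object yields (group name, DataFrame) tuples.
for species_name, species_data in groups:
    print("Species:", species_name)
    # Each species_data is a smaller DataFrame holding only that species.
    print("  sepal length:", species_data["sepal_length"].mean())
    print("  sepal width:", species_data["sepal_width"].mean())
    print("  petal length:", species_data["petal_length"].mean())
    print("  petal width:", species_data["petal_width"].mean())
```

Because the stand-in has only two rows per species, the means it prints will not match the figures from the full data set.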