Hello everybody, today we're going to be taking a look at the groupby function and aggregating within pandas. groupby is going to group together the rows that share a value in a column and display each group on a single row, and this allows you to perform aggregate functions on those groupings. So let's start by reading in our data and take a look.
So we're going to do import pandas as pd. And then we're going to say our data frame, df, is equal to pd.read_csv, with an open parenthesis, an r in front of the string so the path is treated as a raw string, and then our file path. We're going to be looking at the flavors CSV right here. So right here we have our flavor of ice cream, our base flavor (whether it was vanilla or chocolate), whether I liked it or not, the flavor rating, the texture rating, and its overall, or total, rating. Now, these are all my own personal scores. I've spent years researching this, so these are all very accurate, but this should be a low-stress environment to learn groupby and the aggregate functions.
So the first thing that we can do is look at our groupby. Now, you can group by flavor, but as you can see, these are all unique values, so every group would hold just one row. What we need is a column with duplicate values on different rows that can all group together. So this base flavor column is actually a perfect one to group on, and we'll do that by saying df.groupby, with an open parenthesis, and we'll just specify base flavor. I need to make sure I can spell properly. This will group those flavors together. So let's run this. And as you can see, the result is its own object: a DataFrameGroupBy object. So now that we've grouped them, let's give it a variable. We'll say group_by_frame is equal to that, copy this, and run it. And now what we need to do is run our aggregations in order to get an output. So we're going to say .mean().
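The steps so far can be sketched roughly as follows. This is only an illustration under a few assumptions: the small data frame stands in for the flavors CSV, its column names and every rating in it are made up, and numeric_only=True is added because recent pandas versions raise an error when asked to average string columns.

```python
import pandas as pd

# Hypothetical stand-in for the flavors CSV; names and ratings are invented.
df = pd.DataFrame({
    "Flavor": ["Chocolate", "Rocky Road", "Chocolate Fudge Brownie",
               "Cake Batter", "Mint Chocolate Chip", "Vanilla",
               "Cookie Dough", "Pistachio", "Neapolitan"],
    "Base Flavor": ["Chocolate"] * 3 + ["Vanilla"] * 6,
    "Liked": ["Yes", "Yes", "Yes", "Yes", "Yes", "No", "Yes", "No", "No"],
    "Flavor Rating": [9, 8, 9, 8, 10, 4, 7, 3, 2],
    "Texture Rating": [8, 9, 7, 7, 8, 5, 8, 4, 5],
    "Total Rating": [17, 17, 16, 15, 18, 9, 15, 7, 7],
})

# Grouping alone computes nothing yet; it returns a DataFrameGroupBy object.
group_by_frame = df.groupby("Base Flavor")
print(type(group_by_frame).__name__)

# An aggregation produces one row per base flavor.
# numeric_only=True restricts the average to the numeric rating columns.
means = group_by_frame.mean(numeric_only=True)
print(means)
```

With a real file, the data frame would instead come from pd.read_csv with the path to the CSV.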
And that's all we're going to put just for now, just to get an output that we can take a look at, and then we'll build from there. So let's go ahead and run this. And right here, we have our base flavor, which is now the index, chocolate or vanilla, and then it's taking the mean, or the average, of all the columns that hold numbers. Notice that it did not take the liked column and it did not take the flavor column, because those are strings and it cannot average those. We'll take a look at that later. But it took all the columns that have integers and gave us the average of those ratings.
Really quickly, I wanted to give a huge shout out to the sponsor of this entire pandas series, and that is Udemy. Udemy has some of the best courses at the best prices, and it is no exception when it comes to pandas courses. If you want to master pandas, this is the course that I would recommend. It's going to teach you just about everything you need to know about pandas. So huge shout out to Udemy for sponsoring this pandas series. And let's get back to the video.
So right off the bat, on average, the flavors with a chocolate base have a much higher rating overall than the ones with a vanilla base. Now, we can actually combine all of this together into one line. We'll say df.groupby on base flavor, and then .mean(), just like this. Before, we didn't have any aggregating function on there, so all we got back was the groupby object. But now that we've combined it all into one line, it runs and gives us the result directly.
Now, there are a lot of different aggregate functions, but I'm going to show you some of the most popular and most common ones that you will see. So let's copy this right here. We can do .count(). And when we run this, it shows us the count of the rows that were aggregated in each group. So for chocolate, we have three, and there are going to be threes all the way across. And for vanilla, we have six.
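As a rough sketch of the one-line form and of count, again using an invented stand-in for the flavors data (column names and ratings are assumptions, not the real file):

```python
import pandas as pd

# Hypothetical stand-in for the flavors CSV; names and ratings are invented.
df = pd.DataFrame({
    "Flavor": ["Chocolate", "Rocky Road", "Chocolate Fudge Brownie",
               "Cake Batter", "Mint Chocolate Chip", "Vanilla",
               "Cookie Dough", "Pistachio", "Neapolitan"],
    "Base Flavor": ["Chocolate"] * 3 + ["Vanilla"] * 6,
    "Liked": ["Yes", "Yes", "Yes", "Yes", "Yes", "No", "Yes", "No", "No"],
    "Flavor Rating": [9, 8, 9, 8, 10, 4, 7, 3, 2],
    "Texture Rating": [8, 9, 7, 7, 8, 5, 8, 4, 5],
    "Total Rating": [17, 17, 16, 15, 18, 9, 15, 7, 7],
})

# Grouping and aggregating chained into a single expression.
one_line_means = df.groupby("Base Flavor").mean(numeric_only=True)

# count() reports, per group and per column, how many non-missing values exist.
# It works on string columns too, so Flavor and Liked appear in the result.
counts = df.groupby("Base Flavor").count()
print(counts)
```

Because count skips missing values, it only equals the number of rows in a group when the column has no gaps, which is the case in this sample.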
So we're looking at a higher count for vanilla, which, if you compare it to the mean up here, could be a big skew towards chocolate. If you only have a few rows and one or two of them are good chocolates, that can really pull the numbers up. Whereas if you had two good vanillas but all the other ones are bad, that pulls the average down. So knowing the count of something is really good.
Let's take a look at the next ones, which are min and max. I'll just run these really quickly. We can do .min(). And when we run this, the first thing that you should notice is that it now has a flavor and a liked column. That's because min and max work on strings too: they compare them alphabetically, looking at the first letter, and if those match, at the next letters, and so on. So chocolate, starting with CH, is the very first, or minimum, value among the chocolate flavors, and cake batter is the minimum value among the vanilla ones. Now, with the liked column, it's interesting, because apparently I liked all the chocolate ones. I'm going to go take a look. So chocolate: liked, liked, liked. There is no "no" in the liked column for chocolate, so "yes" was the only option.
And now let's look at max. It should do the exact opposite, which is to take the highest value, even if it's a string. So for Rocky Road, the letter R comes later in the alphabet, and that's what it's looking at. The same goes for vanilla. And then we have yes as well. And of course, right here, it's taking the max value of the numbers. Before, when we were looking at min, I just focused on the string columns, but it does the exact same thing to these integer columns as well. So the max flavor rating for vanilla came from mint chocolate chip, which had a vanilla base, so I had a rating of 10 for this vanilla grouping.
And then we can also look at the sum. And there are all the sums for these.
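A small sketch of min, max, and sum on grouped data, using an invented stand-in for the flavors file (column names and ratings are assumptions). numeric_only=True is passed to sum because, depending on the pandas version, string columns may otherwise be dropped or concatenated.

```python
import pandas as pd

# Hypothetical stand-in for the flavors CSV; names and ratings are invented.
df = pd.DataFrame({
    "Flavor": ["Chocolate", "Rocky Road", "Chocolate Fudge Brownie",
               "Cake Batter", "Mint Chocolate Chip", "Vanilla",
               "Cookie Dough", "Pistachio", "Neapolitan"],
    "Base Flavor": ["Chocolate"] * 3 + ["Vanilla"] * 6,
    "Liked": ["Yes", "Yes", "Yes", "Yes", "Yes", "No", "Yes", "No", "No"],
    "Flavor Rating": [9, 8, 9, 8, 10, 4, 7, 3, 2],
    "Texture Rating": [8, 9, 7, 7, 8, 5, 8, 4, 5],
    "Total Rating": [17, 17, 16, 15, 18, 9, 15, 7, 7],
})

grouped = df.groupby("Base Flavor")

# min and max are computed per column, so strings are compared alphabetically
# and each cell may come from a different row of the group.
mins = grouped.min()
maxes = grouped.max()

# Restrict the sum to the numeric rating columns.
sums = grouped.sum(numeric_only=True)

print(mins)
print(maxes)
print(sums)
```

Note that a row of the min or max result is not a single ice cream: the smallest flavor name and the smallest rating in a group can belong to different flavors.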
And again, it only does the integer columns, because we can't add up the strings. So here are the sums, or the total values, for all of them. And for the total values, since we had six rows grouping into vanilla, we now have a much higher score for vanilla.
Now, that's a really simple way to do your aggregations, but there is actually an aggregation function. Let's take a look at this, because it's a little bit more complex, although when I write it out and show you, hopefully it makes a lot of sense. We can do .agg. So this is our aggregate function. And what we need to pass into our aggregate function is actually a dictionary. So let's do an open parenthesis and then a curly bracket. Then we need to specify what column we're going to be aggregating on. Let's do this flavor rating. Let's copy this. We'll do flavor rating, and I need to put that in as a string. Then we'll do a colon. And now we can specify what aggregate functions we want. So we've done sum, count, mean, min, and max, and we can actually put all of those in here and perform all of those aggregations on just one column. So let's make a list, and let's say mean, max, count, and, what's another one, sum. Let's do all four of those only on this flavor rating column.
And when we run this, we have our base flavor right here, chocolate and vanilla. But now we don't have multiple rating columns. We have one column, flavor rating, with multiple sub-columns underneath it, one for each aggregation. And it is possible to pass in multiple columns like that. So we'll do texture rating. We'll just come right here and do a comma, then we'll say texture rating, and then a colon. I don't know why I spelled it out when I copied it, but I did. And then we'll do the exact same aggregations. And now when we run it, we're getting the exact same sub-columns: mean, max, count, and sum for flavor rating, then mean, max, count, and sum for our texture rating.
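The dictionary passed to agg might look something like the sketch below. The data frame is an invented stand-in for the flavors file, so the column names and ratings are assumptions.

```python
import pandas as pd

# Hypothetical stand-in for the flavors CSV; names and ratings are invented.
df = pd.DataFrame({
    "Flavor": ["Chocolate", "Rocky Road", "Chocolate Fudge Brownie",
               "Cake Batter", "Mint Chocolate Chip", "Vanilla",
               "Cookie Dough", "Pistachio", "Neapolitan"],
    "Base Flavor": ["Chocolate"] * 3 + ["Vanilla"] * 6,
    "Liked": ["Yes", "Yes", "Yes", "Yes", "Yes", "No", "Yes", "No", "No"],
    "Flavor Rating": [9, 8, 9, 8, 10, 4, 7, 3, 2],
    "Texture Rating": [8, 9, 7, 7, 8, 5, 8, 4, 5],
    "Total Rating": [17, 17, 16, 15, 18, 9, 15, 7, 7],
})

# Keys are the columns to aggregate; values list the aggregations to apply.
summary = df.groupby("Base Flavor").agg({
    "Flavor Rating": ["mean", "max", "count", "sum"],
    "Texture Rating": ["mean", "max", "count", "sum"],
})
print(summary)

# The result has two-level column labels: (column name, aggregation name).
vanilla_best_flavor = summary.loc["Vanilla", ("Flavor Rating", "max")]
print(vanilla_best_flavor)
```

Columns left out of the dictionary, such as the total rating here, simply do not appear in the result, and each column can be given its own list of aggregations.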
Now, so far, we've only grouped on one column, but we can actually group on multiple columns. Let's go back up here to our data. I should have just copied this down here. Let's go back down and just look at this. So really, we only grouped on this base flavor column, but you can group by multiple columns. So let's do our base flavor, which we did already, as well as the liked column. We're going to say df.groupby, then an open parenthesis, and then instead of just passing in one string, we're going to pass a list. We'll say base flavor, oops, comma, and then liked. So now when it groups this, it should give us two levels of grouping. Let's run this and just see. Oops, I've got to add an aggregation, so let's just do .mean().
So now we have our chocolate and our vanilla. And remember, chocolate only had yes, so that's the only value it groups on. But vanilla had a no and a yes. So if we look at vanilla, we have our base flavor vanilla, and then within liked, we have a no and a yes, which shows us that within vanilla, our nos were rated really low, but our yeses were rated really high. In fact, the vanilla flavors I liked had a rating very close to the chocolate ones I liked.
And just like we did above, we can take this .agg. I'm going to copy this, and it'll perform it on each of those rows. Let me close that. And what did I do wrong? Oh, I need the curly bracket. And it'll show us each of those: the mean, max, count, and sum for chocolate and vanilla, broken down by the liked groupings of yes and no.
Now, that's everything we've looked at so far, and that's how I usually do it. But there is one shortcut function that can give you some of these things really quickly. So let's go back up here and take this. It's just called describe. If you've ever used it before, you'll know it gives you a high-level overview of several of those different aggregations.
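Grouping on two columns and the describe shortcut can be sketched as below. The data frame is again an invented stand-in for the flavors file (column names and ratings are assumptions), and numeric_only=True is added because recent pandas versions will not average string columns.

```python
import pandas as pd

# Hypothetical stand-in for the flavors CSV; names and ratings are invented.
df = pd.DataFrame({
    "Flavor": ["Chocolate", "Rocky Road", "Chocolate Fudge Brownie",
               "Cake Batter", "Mint Chocolate Chip", "Vanilla",
               "Cookie Dough", "Pistachio", "Neapolitan"],
    "Base Flavor": ["Chocolate"] * 3 + ["Vanilla"] * 6,
    "Liked": ["Yes", "Yes", "Yes", "Yes", "Yes", "No", "Yes", "No", "No"],
    "Flavor Rating": [9, 8, 9, 8, 10, 4, 7, 3, 2],
    "Texture Rating": [8, 9, 7, 7, 8, 5, 8, 4, 5],
    "Total Rating": [17, 17, 16, 15, 18, 9, 15, 7, 7],
})

# Passing a list of columns builds one group per combination that occurs.
by_base_and_liked = df.groupby(["Base Flavor", "Liked"]).mean(numeric_only=True)
print(by_base_and_liked)

# The same agg dictionary works on the two-level grouping.
detailed = df.groupby(["Base Flavor", "Liked"]).agg({
    "Flavor Rating": ["mean", "max", "count", "sum"],
})
print(detailed)

# describe() returns a fixed set of summary statistics for each numeric column.
described = df.groupby("Base Flavor").describe()
print(described["Flavor Rating"])
```

Only combinations that actually occur get a row, so in this sample there is no row for chocolate with a liked value of no.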
So let's run this. And it's going to give us our chocolate and vanilla. And within each column, it's going to give us our count, our mean, our standard deviation, I believe is what that is, our minimum, the 25%, 50%, and 75% values, and our max, and then the count, the mean, and so on again for the next column. So that's a lot of those aggregate functions. But describe is a very generalized function. We can't get as specific as we could with the previous ones that we were looking at. I just wanted to throw this out there in case it's something that you'd be interested in, because it technically is showing a lot of those aggregate functions all at one time.
So that is our groupby and aggregate functions within pandas. I hope that was helpful, and I hope that you understood everything that we were working on. If you liked this video, be sure to like and subscribe, and check out all my other videos on Python as well as pandas. I will see you in the next video.