 Hello everybody, in this lesson, we're going to be taking a look at group by and order by in MySQL. Now when you use the group by clause in MySQL, it's going to group together rows that have the same values in the specified column or columns that you're actually grouping on. Once you group those rows together, you can run something called an aggregate function on those rows. Let's see how this actually works. Let's go ahead and copy this right here, bring that down and let me go back up one. Let's go ahead and write gender right here. Now we want to group on this gender column and we're going to say group by gender. Let's go ahead and run this. We'll see what we get. And so we have male and female. Now we could get the exact same output by saying select distinct gender from this table. What is group by doing that the gender actually isn't doing? Well it's actually rolling up all of these values into these rows. So later when we run aggregate functions like average, min, max, we'll do it based off of these rows and all those rows are rolled up into these two rows and we'll see that in a little bit. Now what if I was to come up here and in this demographics we have a first underscore name. What would happen if I'm selecting the first name but I'm grouping by the gender? Let's go ahead and run this. If we come right down here, we pull this up, you can see that the select list is not in group by clause and contains non aggregated columns. What this means is that when you are selecting a column, if it's not an aggregated column like say average of something, if we're not using the aggregate functions in the select statement, it has to be in the group by these have to match. So this gender has to match this group by if we're not performing an aggregate function on it. Let's go ahead and run this. And now it works properly. Now let's go back up. Let's run this query because I want to select everything again. Well let's say we wanted to take a look at the average ages for gender. So what we're going to do is we're selecting gender. We're also grouping by gender. But what we're going to do is add a comma and we'll say the average that's AVG that stands for average. And then we're going to put in here age. So now this right here is an aggregate function. This does not need to go in the group by, we're just grouping on the gender. And then we're performing this aggregate function or kind of a calculation based off of those grouped rows for gender. So let's go ahead and run this and take a look at the output. So what this is telling me is that for the males, all of the male rows that were grouped, the average age is 41 point, let's say three. And for female, the average age is 38.5. So super quickly, you can tell that the average age of females is lower than the average age of males. Now we'll take a look at aggregate functions more in just a little bit. Let's actually go to a different table. Let's come right down here. We're going to go to the salary table and just select everything for now. Let's go ahead and run this. Now what we're going to actually be grouping on is this occupation right here. Now there's a lot of unique values. It's not as distinct as the gender, which only had two values. You'll notice we do have a few that are the same. We have ones like office manager. So when we come up here, let's say occupation. And of course we need to group by the occupation as well. Now let's run this. You'll notice that office manager only has one row. Let's say we also want to group on the salary. Let's say salary. Now we can group on multiple. So we're going to say salary like this. So we're grouping on the occupation as well as the salary. Now let's run this. You'll notice that we have two rows for office manager. Now this is because this salary and this salary for those two employees are different. We have 50,000 and 60,000 for this. I just wanted to demonstrate that if these had both been 50,000, there would only be one row office manager 50,000. But because this is a unique value different than 50,000, they have their own individual rows, which we would then perform our aggregate calculations on. Let's go and get rid of that because we will not be using that anymore. I just wanted to demonstrate it really quickly. So before we were looking at gender and average age and we're also grouping on the gender, we can perform other aggregate functions as well. Let's take a look at some of those. We could look at the max age as well. The max is going to show us the highest value within each of those groupings. So we have a male and female. The max age for those for the male is 61 and the highest age for the female is 46. We can do the exact same thing exactly. You can say men or the exact opposite thing. We can say the minimum age. So this is going to be the lowest for both the male and the female. Go ahead and run this. Now we have female and male and the minimum age is 29 and 34. And there is one last one that I want to show you, which is count. We're going to do count. Now count is going to count the actual rows within this age column. So if we run this, you'll see that we have four females for count and we have seven males. It's just telling us a count of how many values is in this column when we're actually grouping on the gender. So that's how we can use group by to actually roll up and group all of these similar values within a column or columns and perform our aggregate functions on them. Now let's come down here and what we're going to take a look at is order by. I'm going to say order by. Now let's actually pull in this demographics table right here. We're just going to say select everything and run this really quickly after we add a semicolon. So order by. Order by is going to actually sort the result set in either ascending or descending order. Let's take a look at how this works. At the very end, we could say order by and we could order by the first underscore name. So we're going to take this column, we're going to order all of our rows based off of this one column. Let's go ahead and run this. So it's going to do it based off ascending orders means smallest to largest. Now this is a text column or a character column. So we do it a to Z. So Andy and April all the way down to Tom. Now by default, this is in ASC order ascending order. And if we run this, it's going to be the exact same output. But we can change this to do it the opposite highest to lowest or Z to a by doing descending. So now if we run this, you'll see that goes Tom all the way down to Andy. Now let's take a look at ordering on something like gender and age because we can do both at the same time. So let's order by the gender first. Go ahead and run this. And you'll see that all the females are grouped together and then all the males are grouped together because that's just the order in which it is. But we can do an additional column. We can also do it based off of the age. Let's go ahead and run this. So now within the females, since that came first in our order by, we're ordering by the gender. And then we're also ordering by the age after we've ordered by the gender. So now it's 29 all the way up to 46. Then 34 for males all the way up to 61. Now we can change this just for the age. Let's say we want to do age descending. So gender will stay the same in ascending order. And now age will be in descending order. Let's go ahead and run this. Now female and male stayed the same, but now it starts at the highest down to the lowest. Now this is something that I would absolutely do in real life except sometimes you can make mistakes and sometimes you do the wrong column first. Let's do age and then we'll do gender. Now if we run this, the gender is not going to be used at all. And this is because there are no unique values that are going to be on the same row. So notice all these values are completely unique. So the gender never is actually used to order anything on because if there were things like 34, 34, 34, 34, 34, these would be ordered based off of the gender. But since there's no unique fields, this is really pretty useless. That's why the order of the order by or the columns that you place in the order by are actually quite important. Now the last thing that I want to show you and I'll just go back to gender and age is that you don't actually have to use the column names. We can use the column positions. Now I will preface this by saying I don't recommend doing this, but I sometimes do it in shorthand for just a quick query. If I know the column position and I don't want to write out the whole name. Sometimes I do it, although it's not best practice. But let's take a look at it. So gender is the one, two, three, four, fifth column. I'm going to replace this with five and age is the one, two, three, four column. So these are the positions of the fields but not the names of them. If we run it, we're going to get the exact same output because these represent these columns appropriately. But again, I just don't recommend it. It's kind of a slippery slope that I've fallen down myself many times. And when you get to more advanced SQL and you're creating things like stroke procedures and triggers and all these things, this can actually cause a lot of issues. If you were to add any columns or remove any columns, then you'd be ordering by the wrong column because let's say this last name got removed, we didn't want it for some reason, then the gender is one, two, three, four. Now we're ordering on the wrong column and that would be a big mistake. So just by best practice, it is better to do gender, age, but I just wanted to show you that in case you want to be like me and kind of go down the wrong path. So that is everything we're going to take a look at with group by and order by. In the next lesson, we're going to be taking a look at having versus where.