Lastly, let's find our total emissions for each state. What we need to do here is, within each state, sum up the emissions from all the facilities in that state. So far we've just been looking at all the facilities collectively, but let's actually tally up the quantities within each state.

In order to do that, let's create a new object called STEM, for state emissions. We'll reference our initial dataset and we're gonna use the groupby function. We're gonna say we wanna group by the variable state here. Those are gonna be our groups, and within each group we wanna perform whatever comes next. In this case, we wanna sum up the values. When we do this, we retain all of our columns, but now our total reported direct emissions are the sum within each state.

One byproduct of this is that state becomes the index. So let's reset that. Let's include this function, reset_index, and that'll just turn state back into a normal column. There we go. Now we see state is just a regular column.

Okay, so we've done that. We've summed the total reported direct emissions within each state. Now let's find what the max is. Our max emissions, which I'll call max m, will be the max of STEM total reported direct emissions. And let's see what that max value is. Some really large value. What is this? This is 372 million and some change.

Okay, but we don't know what state that belongs to. So now we need to find which row contains this max value. Let's just make a new variable, call it f, and we'll say STEM total reported direct emissions equals equals max m. Note that I'm making a logical statement here. I'm using this double equals to ask which of these values are equal to max m, and what we get then is a Boolean variable. In other words, a variable containing only true and false.
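The steps up to this point can be sketched roughly as follows. This is a minimal illustration, not the notebook from the video: the small DataFrame is made up, and the names df, st_em, max_em, and the column labels "State" and "Total reported direct emissions" are assumptions standing in for whatever the real dataset uses.

```python
import pandas as pd

# Made-up stand-in for the facility dataset (names and values are assumptions)
col = "Total reported direct emissions"
df = pd.DataFrame({
    "State": ["TX", "TX", "TX", "CA", "CA", "NY"],
    col: [200.0, 150.0, 60.0, 120.0, 80.0, 180.0],
})

# Group by state, sum within each group, then turn the
# State index back into a regular column
st_em = df.groupby("State").sum().reset_index()

# Largest state total
max_em = st_em[col].max()

# Boolean Series: True only where a state's total equals the max
f = st_em[col] == max_em

print(st_em)
print(max_em)
print(f)
```

With this toy data there are three states, so f holds three True/False values, one per row of st_em.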
So it's false everywhere it doesn't equal max m, and then we've got this one true in row 45, or index 45, here. That still doesn't tell us what state. It just tells us the index, but we're getting there. Now we just need to query that row. We say STEM of f, and we get the row that has the max total reported direct emissions, the state being Texas, and there we have our answer.

A quicker way to do this would be to say max row equals STEM.loc, that's our loc function here, of STEM total reported direct emissions, and instead of max we're gonna use idxmax. That returns the index where the max is located, not the max value itself. And I need to have square brackets here. We can then see what max row is, and it reports all the variable values for whatever the max is. We see it's Texas here, and we could put this into a nice print statement. I could say print, the state with the highest facility emissions is, and then we say max row. There we go. So we found our answer.

All right, so we've covered histograms, mean, median, and then ultimately summarizing data within groups. Note that Texas is the largest state in the contiguous United States. So you might say, well, of course it has the highest total direct emissions. It's a really, really large state, lots of people, lots of industrial activity, makes a lot of sense. If we wanted to normalize this by state size, we could look at the average across all the facilities in each state, to see if there's any state whose facilities are producing above-average emissions. In that case we replace the sum here with mean. So you can replace the sum with any other function if you wanna summarize your data differently within groups. Okay, stop there for now.
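Here is a small sketch of the row lookup, the idxmax shortcut, and the swap from sum to mean. As before, the data and the names df, st_em, max_row, st_mean, and the column labels are assumptions for illustration only, not the actual dataset or variable names from the video.

```python
import pandas as pd

# Made-up stand-in for the facility dataset (names and values are assumptions)
col = "Total reported direct emissions"
df = pd.DataFrame({
    "State": ["TX", "TX", "TX", "CA", "CA", "NY"],
    col: [200.0, 150.0, 60.0, 120.0, 80.0, 180.0],
})
st_em = df.groupby("State").sum().reset_index()

# Long way: build a Boolean mask, then use it to pull out the matching row
f = st_em[col] == st_em[col].max()
print(st_em[f])

# Shortcut: idxmax returns the index label of the max, and .loc fetches that row
max_row = st_em.loc[st_em[col].idxmax()]
print("The state with the highest facility emissions is", max_row["State"])

# Same grouping, but averaging per facility instead of summing
st_mean = df.groupby("State").mean().reset_index()
top_mean_state = st_mean.loc[st_mean[col].idxmax(), "State"]
print("Highest average emissions per facility:", top_mean_state)
```

In this toy data the state with the largest total is not the one with the largest per-facility average, which is the kind of difference the switch from sum to mean is meant to surface.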