Good morning, good evening, and everything in between. As you can tell, what we're going to be talking about is descriptive statistics. I say "in Python" because we're actually going to use three libraries when we generate our descriptive statistics: NumPy, SciPy, and matplotlib.

So what is descriptive statistics? Before you even think about the statistics part, think about it like this: we're trying to describe something. How would you describe me? What are the qualities you would use? Leave a comment with how you would describe me. Well, what are my physical qualities? I have brown hair. I'm hairy, so there's some measure of hairiness going on when you're trying to describe a person. There's also something like my energy, right? I'm using my hands a lot, and they're going in and out of frame. I've got a je ne sais quoi about me. OK, this whole idea of describing me is getting a little weird, so let's get back to the numbers.

Let's say, for example, I had 10,000 numbers. That's a lot of numbers. Maybe plotting helps a little bit, but it's not really giving me a description of what's going on with my data, and that's the entire idea of descriptive statistics: trying to describe our numbers. We were talking a second ago about how I was hairy and quite energetic. The same kind of thing is what we're looking for when we talk about our numbers. Not hairiness per se, but there are qualities of a collection of numbers that we'd like to know about: namely its location (where the data sits relative to all of the values), its spread (how far the data points differ from each other), and its symmetry. If we were to plot this, how does it look? Is it fairly symmetrical?
And so that's where we get into a lot of those super simple things that you did in elementary school. You learned how to average numbers in elementary school; we now call it the mean. That's where a lot of these statistical analyses come from. They're super basic, but they're meant to tell you something about the data.

Like I was saying, we're doing this in Python, and I've already gone ahead and written a fairly large function here. The idea is that I want to generate data relative to how far each number is from some middle point. OK, what does that mean? Let me go ahead and run this so you can see.

From this visualization, I'm generating a large number of numbers, and they're centered on some middle point, in my case 10. What that means is, if you take a look at this and count out all of those 10s, there are ten 10s, because the function adds more copies of a number the closer that number is to the middle point, in this case 10. Now, if you look, 1 is only generating one 1, and there are only five 5s. The same thing is happening on the opposite end: as we increase, there are only five 15s and only two 18s. I've also gone ahead and done a nice little visualization, a histogram, using plt.hist from matplotlib, and it's just showing me what that data looks like. Fair enough.

Now, why did I produce this? Well, now I can also do descriptive statistics to find out a little bit more about my data. You can see it's off by just a hair.
It's sitting just slightly over to the left side of the 10. So if I did something like np.mean (again, I'm using the NumPy library, I'm using its mean function, and I'm passing it an array) and I run this, what we're seeing is the average. Even though there are plenty of 9s, the 20 is sticking out pretty hard, so my average is roughly 10.1.

The same kind of thing could go on if I started to skew my data. Let's say, for example, that instead of 10 I went with 15. If I ran this, oh, this is a perfect example. Now my data is not sitting in the center; it's actually starting to move. There's a lot more data on the right side of this data set, and you can see that same thing happening in our average. It's 12.8 now, because there's a larger number of large numbers, so it's higher than 10. The same thing could happen in the other direction if, say, I did it with 5. And see, I'm using a very rudimentary approach here, so roughly speaking, once you get far enough past 5, each number just gets a count of one. It works out in the end.

Anyway, I can keep using this data. Just to get rid of those histograms for a second, I could use that data to look at the median. The median in this case is quite low, because the data is now skewed in the opposite direction. When everything was generated around the 15 middle point, it was all over there on the right; now we're seeing that my median is hitting roughly about here. So this is a way for us to start to describe our data and at least start to plot it out.
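The generator function itself is never shown in the recording, so the following is only a minimal sketch of what it might look like. The name `generate_data`, its parameters, and the counting rule (each value appears once per unit of closeness to the middle point, with a floor of one) are assumptions, chosen because they reproduce the counts and averages mentioned above. The histogram call is left out so the sketch depends only on NumPy.

```python
import numpy as np


def generate_data(middle, low=1, high=20):
    """Repeat each integer in [low, high] more often the closer it is to `middle`.

    Hypothetical reconstruction: value v appears max(1, middle - |v - middle|) times.
    """
    data = []
    for value in range(low, high + 1):
        # Closer to the middle point means more copies; never fewer than one
        count = max(1, middle - abs(value - middle))
        data.extend([value] * count)
    return np.array(data)


for middle in (10, 15, 5):
    data = generate_data(middle)
    print(f"middle={middle}: n={data.size}, "
          f"mean={np.mean(data):.2f}, median={np.median(data):.1f}")
```

Under these assumptions, a middle point of 10 gives ten 10s, five 5s, five 15s, and two 18s, plus a single stray 20 that nudges the mean slightly above 10. Passing the array to matplotlib's `plt.hist` would give the kind of histogram described in the talk.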
Now, as you can see from the context clues here, I've shown you the mean and I've shown you the median. The next one on the agenda was mode. So, quite literally, we check out the mode in NumPy and we get an error. Huh? That's because NumPy does not have a mode function, and that's actually why I'm using SciPy. SciPy is another scientific computing library. It's quite popular and it has a lot of statistical analyses in it. In our case it's very similar to how we were working with the matplotlib library: we were using matplotlib.pyplot to plot our data, and here we're using scipy.stats to get the statistical analysis. For my sake, I'm not going to shorten it to any weird abbreviation; I'm just going to call it stats.

So I can go stats.mode, and I'll start with the 10, the basic one that we were working from. I run it, and you can see that instead of giving you just a number, it's actually giving you a nice little object containing the mode, 10, and how many times it appeared, in this case the count: 10 appeared 10 times. If I increase the middle point to 15, you can already guess that I should see an array containing 15 as the most frequent number, because it appears 15 times. And going one last time, 5 appears five times, which is the most out of all the numbers. So it's a way for us to at least start to look at the location of our data.

Now, the one thing I want to jump back to: take a look at these plots. If I plot this first histogram, everything is leaning more to the left side, and the same kind of thing was happening when I worked with my 15.
That one was leaning to the right side, and when the middle point was in the center, the data was very much in the center. What we're talking about here is actually the idea of spread and symmetry. As you can see, there are a number of ways we can measure spread. Each one of those operates the same kind of way, and you can play around with those numbers as well. But when we're thinking about symmetry, we're taking our mean, our location data, and our spread data and seeing whether or not our data is symmetrical. Is it, in our case, what we would consider a normal bell curve?

As you could see from the slides, NumPy does not have anything for this, but we do have, again, the scipy.stats library. I want to do it underneath this time, so I'll go ahead and add at least a little descriptive text for it: skewness. So instead of stats.mode, it's stats.skew.

Again, this is our 10. If we run this, we're going to see that the skewness is not a perfect zero. That's because we're generating, I think, one extra 20 there, or something like that. But you can see it's getting very close to zero: 0.06.

Now let's see what happens if I use the high or low numbers, say the 15. If I run this, again, the 15 data is leaning heavier to the right side, and you can see my skewness is now negative 0.4. So it's growing on the negative side. What statistics instructors will say about a negative value is that the data is left-skewed, because it's got this long left tail. That's how you can think about it. In the same way, the data generated around 5 will be right-skewed, because it's got this long right tail. So if it's a positive number that's far from zero, it has a long right tail.
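The mode lookup and the skewness check described above can be sketched roughly as follows. `generate_data` is again a hypothetical stand-in for the unseen generator, and `mode_and_count` is a small helper added here for illustration. Only `scipy.stats.mode` and `scipy.stats.skew` are real library calls, both used with their default arguments.

```python
import numpy as np
from scipy import stats


def generate_data(middle, low=1, high=20):
    # Hypothetical generator: value v appears max(1, middle - |v - middle|) times
    values = np.arange(low, high + 1)
    counts = np.maximum(1, middle - np.abs(values - middle))
    return np.repeat(values, counts)


def mode_and_count(data):
    # stats.mode returns a result object with .mode and .count fields.
    # Older SciPy versions return length-1 arrays, newer ones scalars; handle both.
    result = stats.mode(data)
    return int(np.atleast_1d(result.mode)[0]), int(np.atleast_1d(result.count)[0])


for middle in (10, 15, 5):
    data = generate_data(middle)
    value, count = mode_and_count(data)
    print(f"middle={middle}: mode={value} (count {count}), "
          f"skew={stats.skew(data):+.2f}")
```

With this stand-in data, the skewness should come out close to zero for a middle point of 10, negative for 15 (long left tail), and positive for 5 (long right tail).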
This part out here is what we would consider the tail.

Now, the other approach, if we continue with the slides one more step, is something known as kurtosis. The way to think about it: when we were looking at skewness, with a middle point of 10 the data was very much here in the center, and as we started to shift the data it moved to either the left or the right. I want you to think of that as working along a horizontal axis. Kurtosis is now thinking about it along a vertical axis. The idea is, how tall is my data, or is it very flat, almost uniform in nature?

The same kind of thing is going on in the code: instead of looking at our skewness, we can look at our kurtosis, and, you've guessed it, it's stats.kurtosis. There we go. If I run this, OK, fair enough. Actually, let me go back to the 10; that's the one I really want to focus on. If we look at this kurtosis, all right, it is a number. It's neither high nor low. It's a number.

But what happens if we start to reshape that data? What if I wanted a large number of numbers that are very close to 10 and not a lot that are further away from 10? How am I going to do that? Quite simply, I'm just going to cheat and say that the number is how far you are from the middle, times two. I played around with it, and it kind of works. So this is a way to demonstrate those super high peaks, and guess what?
If this is our kurtosis at some sort of normal level, then what we can see is that when I have a much higher peak going on here, I'm almost reaching one. That's quite a high kurtosis, quite a high height. The same thing goes on if I drop down. In this case, again, most of the data is centered in the very center. If I used a fractional multiplier like 0.5, it's going to error, because you can't pass a float to range. So I'm just going to convert this into an integer to demonstrate. Now it's a whole number, and if I visualize this, you can see it's not 100% uniform. I could drop this down to, I don't know, a half. If I do something like that, now we're OK. You can start to see that it's super, super close to flat: every number is appearing quite a large number of times. And in fact my kurtosis for this, my height, is very low, about negative 1.1.

So again, just to think about these things: these are approaches to finding out some more information about your data, and here's how you can use SciPy, matplotlib, and NumPy to look at that data.
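The peaked-versus-flat experiment can be sketched as below, with the caveat that the recording never shows how the multiplier is wired in. The `factor` parameter and the integer truncation are assumptions modelled on the "times two" and "convert this into an integer" remarks. `scipy.stats.kurtosis` with default arguments reports excess kurtosis, so a normal distribution scores about zero. Strictly speaking, kurtosis reflects how heavy the tails are more than how tall the peak is, but tall versus flat is a workable mental picture for this data.

```python
import numpy as np
from scipy import stats


def generate_data(middle, factor=1, low=1, high=20):
    # Hypothetical generator: the count shrinks by `factor` per unit of distance.
    values = np.arange(low, high + 1)
    # Truncate to whole numbers, since a repeat count cannot be fractional
    counts = (middle - np.abs(values - middle) * factor).astype(int)
    counts = np.maximum(1, counts)  # every value appears at least once
    return np.repeat(values, counts)


for label, factor in [("peaked", 2), ("baseline", 1), ("flat", 0.5)]:
    data = generate_data(10, factor)
    print(f"{label:>8} (factor={factor}): kurtosis={stats.kurtosis(data):+.2f}")
```

The exact values depend on the assumed counting rule and may not match the ones read out in the recording, but the ordering should hold: the peaked data scores highest, the baseline sits in between, and the nearly flat data scores lowest.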