Hello everybody. Today we're going to be building visualizations in pandas. In this video we'll look at how to build line plots, scatter plots, bar charts, histograms, and more. I'll also show you some of the ways you can customize these visualizations to make them a little bit better.

With that being said, let's start by importing our libraries. We'll start with import pandas as pd, and this one is really all you need to create visualizations in pandas. We may get a little bit crazy, though, so we're going to import a few others as well: import numpy as np and import matplotlib.pyplot as plt. I may or may not use these, but when I get into visualizations I may want to change some things, so we'll at least have them here in case we need them. Let's go ahead and run this.

Now let's get the data set we're going to be using. We'll say df = pd.read_csv and put the file path in right here. We're going to be using these ice cream ratings, so let's take a look at them really quickly. These values are completely randomly generated and not real in any way, but I wanted something generic, something that wouldn't be confusing, where you can see that it's just numerical values. Let's also set the index really quickly: we'll say df.set_index, pass in the date column, and assign that back to df. Now we have the date column as our index, so we have January 1st, 2nd, 3rd, 4th, and then our ratings right here. Again, these are all just numeric values, and they make it really easy to demonstrate how you can visualize data, so that's why we're using them today.
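The loading step described above can be sketched roughly as follows. The transcript doesn't show the CSV path, so this sketch builds a small stand-in DataFrame instead of reading a file; the column names (Date, Flavor Rating, Texture Rating, Overall Rating), the dates, and the random values are assumptions for illustration, not the actual data from the video.

```python
import numpy as np
import pandas as pd

# Stand-in for the ice cream ratings CSV from the video. The real data
# would come from pd.read_csv(<path>); the column names here are assumed.
rng = np.random.default_rng(seed=0)
df = pd.DataFrame({
    "Date": pd.date_range("2022-01-01", periods=7, freq="D"),
    "Flavor Rating": rng.random(7).round(2),
    "Texture Rating": rng.random(7).round(2),
    "Overall Rating": rng.random(7).round(2),
})

# Make the date column the index so that plots use it for the x axis.
# set_index returns a new DataFrame, so the result is assigned back to df.
df = df.set_index("Date")
print(df.head())
```

Assigning the result back to df mirrors what the video does; set_index does not modify the DataFrame in place unless asked to.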
So the way that we visualize something in pandas is with something called plot. Let's take our data frame and do df.plot with our parentheses. Now let's go in here really quickly and hit Shift+Tab, and this is going to come up. This is pretty important because it tells us what we can do within this plot. Unfortunately there isn't a quick overview, we just have this docstring, but we have our parameters right here, and these are what we can pass in to customize our visualization. The data is going to be our data frame, then we have our x and y labels, and we can specify the kind. This one's important because it specifies what kind of visualization we want: a line plot, a vertical or horizontal bar plot, a histogram, a box plot, and a few others including area, pie, and density. We can also specify whether we want subplots, and a lot of the things I'm mentioning I'm going to show you how to do. You can use different indexes, and you can add titles, grids, legends, styles, all these different things. You can go through the list yourself because there are a lot of them, and you can customize all of these things. We won't be going through all of them, but I will show you the ones that I use the most and that I think are the most useful to know right away.

So let's get out of here, and we're just going to do df.plot. When we run this, we get this right here, and that was super easy: we created a line plot by doing just about nothing. By default it gives us a line plot. If we come up here and say kind='line' and run this, we get the same thing, so without us having to input anything, it gives us a line plot as the default, but we can also specify that it's a line plot. As you can see, we already have all of our data right here. We didn't have to specify anything; it
automatically took it in. It is visualizing all three of these columns, and it has this little legend right here; there is an argument that lets us specify where we want the legend to go. It also gave us these tick marks of 0.2, 0.4, 0.6, 0.8, 1.0. It read the data in and saw that it only goes from 0.0 to 1.0, which is the peak, so it automatically chose these ticks for us. Again, that's another thing you can specify: you can make it go up to 2, 5, 10, 1,000, whatever you want it to be. And the x axis is based on this date value right here.

Really quickly, I wanted to give a huge shout-out to the sponsor of this entire pandas series, and that is Udemy. Udemy has some of the best courses at the best prices, and it is no exception when it comes to pandas courses. If you want to master pandas, this is the course that I would recommend; it's going to teach you just about everything you need to know about pandas. So huge shout-out to Udemy for sponsoring this pandas series, and let's get back to the video.

If we wanted to break these out by column, we could go in here and say subplot=True. Whoops, it's actually subplots=True. Now we can run that, and we see each of those columns broken out by itself: instead of all being in one visualization, it's now three separate visualizations.

Let's get rid of subplots. I want to show you some of the arguments you can use to make this look nice, because I don't want to do this on every single visualization; I just want to show you what you can do. On this one we can add a title. Notice there's no title or anything telling us what this is, so we can say comma, title, and we'll say 'Ice Cream Ratings'. If we run this, we now have this nice title right here. We can also customize the labels for the x and y axes. It automatically took this date, which is right here; this
is our date index. It automatically took that for us, but we can customize it if we'd like to. All we have to do is add a comma and say xlabel is equal to, and since our x is this date one right here, we can say 'Daily Ratings'. Then we can do the y label: we'll say ylabel is equal to, and for this one we can say 'Scores'. I hope you cannot hear my dogs in the background, because they are being insane. Let's go ahead and run this, and now we have daily ratings on the x axis and scores on the y axis.

Now let's go down here and take a look at our next kind of visualization, which is going to be a bar plot. We'll do df.plot with kind is equal to, and for this one we're going to say 'bar'. This is what your typical bar plot looks like, and a lot of the arguments that we just used on the line plot also apply to this bar plot. Something that's unique to the bar plot is that you can also make it a stacked bar plot. All we have to do is go in here, add a comma, and say stacked=True, and now it's going to be a stacked bar chart instead of your regular bar chart. Let's go ahead and run this, and as you can see the bars are now stacked on top of one another, with each column contributing its own values.

We don't always have to plot every single column; we can also specify the column that we want. Let's take the flavor rating, for example. If I select just the Flavor Rating column, it's only going to take in that column. Notice we don't have a legend: that only appears when you have multiple columns, and here we're only looking at this one column, so all the values are right here.

This bar chart defaults to a vertical bar chart, but you can change it to a horizontal bar chart, so let's take a look at how to do that. We'll bring back all of the columns and do df.plot, dot, and then barh. And I don't know if I can keep in
that kind='bar'. Let me run this. Yeah, I need to get rid of that, because barh is its own method. Now when I run this, it should still be a stacked bar chart, except horizontal. You can see this worked properly. It's basically the exact same thing as a vertical bar chart, just horizontal, which may look better depending on whether you have values like these or something else that simply reads better horizontally.

The next one we're going to take a look at is the scatter plot. We're going to say df.plot.scatter, and if we run this we get an error. In order for the scatter plot to work, we need to specify the x and the y axis. So let's go in here, and we can take any of the columns that we have up here: we'll say x is equal to Texture Rating and y is equal to Overall Rating. Now when we run this it should work properly, so let's take a look.

If we go in here and hit Shift+Tab, we can see some other things that we can specify. We have our x and our y, which are the ones we just used. We can also pass in s, which changes the size of the dots in our scatter plot, and c, which is the color of each point. Let's start with s. Let's say s is equal to 100 and see what that looks like: the dots are much larger. Let's do 500 and see what that looks like. So we can make the points much larger, depending on what you're looking for. We can also look at the color. Let's add a comma and c; for color we can say c is equal to, and let's do 'yellow' and see if this works. Now we've changed it to yellow. That looks absolutely terrible, but it does work.

Now let's move on to the histogram. The histogram is always a good one. It's very similar to
something like a bar chart, but what's great about a histogram is that you can specify the bins. Let's go ahead and say df.plot.hist with an open parenthesis, and let's hit Shift+Tab in here to take a look at this one as well. Some of our parameters are the columns of the data frame that we want to pull in, and we can choose the bins, which have a default of 10. Let's take a look at how this works by running it as it is, so this is what the histogram looks like by default. Now let's specify our bins. It was 10 by default, so let's do 20 and see what that looks like: there are narrower bars right off the bat.

Remember, histograms are really good for showing the distribution of a variable; that's really what a histogram is for. Of course, since these are completely random numbers, this histogram isn't going to mean much, but you can at least see visually how it works. And if I didn't mention it before, which I should have, the bins are the number of intervals the values get grouped into, so that's how many bars you get across the bottom. If we just do one, there's only going to be one very large bar. We could also go down from 10 and do five, so now there are only one, two, three, four, five intervals. With fewer bins everything gets more compact, and with more bins, say a hundred, it spreads out a lot. That's what it shows: the distribution of the values across however many bins you want. The default of 10 is usually pretty good for a lot of different things.

Now let's go down here and look at the box plot, which is a pretty interesting one. Let's visualize it really quickly, and then I'll explain how it works. Let's do df.boxplot and run this. What we're looking at is a set of markers within our data. This line right here is the minimum value within that column. We also have the bottom of the
box, which is the 25th percentile of all the values within just this column. This line is the 50th percentile, the median, then the top of the box is the 75th percentile, and up here we have our maximum value. (Strictly speaking, the whiskers only reach the true minimum and maximum when there are no outliers; by default, points more than 1.5 times the interquartile range beyond the box are drawn separately.) So I can take a glance at this and see that we have a low minimum and a high maximum, and it definitely skews towards the lower range. If I look over here, we again have a low minimum and a high maximum, but you can see that the median is at 0.6 versus 0.4 over there, so this one skews a lot higher.

Now let's go down here and take a look at an area plot. We'll do df.plot.area and just run this, and this is what we get by default. Something I wanted to show you earlier and just haven't gotten around to is figure size, or figsize. This plot looks small and a little bit cramped, so let's say we want to increase its size. We'll say figsize is equal to, open parentheses, 10 comma 5. That makes it a lot larger. It's just something I wanted to throw in there. I look at area charts as pretty similar to line charts; if we compared them, they'd show much the same thing, but they look different visually. You absolutely can use these for different types of visualizations, but I don't use this one a lot, if I'm being honest, which is why it's towards the end of the video.

Let's go on to our very last one of the video, which is going to be the beautiful pie chart. Let's say df.plot.pie with an open parenthesis and run it. We get an error, and that's because we need to specify what column we're working with. Let me open this up for us. Right here we have y, which is the label of the column that we're going to plot, and that's really all we need. We can just say y is equal to Flavor Rating, and let's run this. Now we get this
visualization right here. Let's make this one a little bit bigger: figsize is equal to 10 comma 6. Now it's a little bit bigger. The legend is going to auto-populate, you can make the chart as big as you want, and obviously it's going to look better if you make it larger. The colors auto-populate as well. You can customize these colors, although I've found that when you have a lot of slices it's harder to customize them easily, but definitely look into it. Almost everything in here is something you can customize in some way, although it does get a little bit tricky, and you'll have to do some research and some googling around to figure out how to do those things.

One last thing that I wanted to show, and something I probably could have done at the beginning, is that you can change how these visuals look, and we can do that pretty easily within matplotlib. There are different styles, so let's add a new cell and say print, and then plt, which is the matplotlib import from earlier: plt.style.available. What this is going to do is show us all the different stylings you can use to change up a visualization. Once we find one that we like, we'll just do plt.style.use and specify which one we want in the parentheses. There are all these seaborn ones, and seaborn is a really great library. Let's try the seaborn deep one; I haven't tried it at all. Let's go ahead and run this, and it changes some of the colors and some of the visuals. We can try something like fivethirtyeight. That looks quite a bit different. And let's try something like classic; I don't know what this one looks like, so let's just try it. So you can try out all these different styles, find one that you think looks really nice, and you
can run with it through all your visualizations.

So this has been our video on visualizing data in pandas. I think it's a really good introduction to how you can visualize data within Python, and in future videos we'll look at matplotlib and seaborn, which are really great libraries for visualizing data that I use a lot. I hope that you enjoyed this video. If you did, be sure to check out all my other videos on Python and pandas, and I will see you in the next video.
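As a recap, the plot calls walked through in this video can be sketched roughly as below. This is a minimal sketch, not the notebook from the video: the data is a randomly generated stand-in, the column names (Flavor Rating, Texture Rating, Overall Rating) and label strings are assumptions, and the Agg backend is selected only so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Stand-in data with assumed column names; values are random in [0, 1).
rng = np.random.default_rng(seed=1)
df = pd.DataFrame(
    rng.random((10, 3)).round(2),
    columns=["Flavor Rating", "Texture Rating", "Overall Rating"],
    index=pd.date_range("2022-01-01", periods=10, freq="D"),
)
df.index.name = "Date"

# Optional: choose one of the names listed in plt.style.available.
plt.style.use("fivethirtyeight")

# Line plot (the default kind) with a title and axis labels.
ax_line = df.plot(kind="line", title="Ice Cream Ratings",
                  xlabel="Daily Ratings", ylabel="Scores")

# One Axes per column instead of a single shared plot.
sub_axes = df.plot(subplots=True)

# Vertical and horizontal stacked bar charts.
ax_bar = df.plot(kind="bar", stacked=True)
ax_barh = df.plot.barh(stacked=True)

# Scatter plots need x and y; s sets marker size, c sets marker color.
ax_scatter = df.plot.scatter(x="Texture Rating", y="Overall Rating",
                             s=100, c="yellow")

# Histogram with 20 bins instead of the default 10.
ax_hist = df.plot.hist(bins=20)

# In a script, df.boxplot() draws on the current Axes, so pass a fresh one.
fig_box, ax_box = plt.subplots()
df.boxplot(ax=ax_box)

# Area and pie charts, each with a larger figure size.
ax_area = df.plot.area(figsize=(10, 5))
ax_pie = df.plot.pie(y="Flavor Rating", figsize=(10, 6))

plt.close("all")  # release the figures once done
```

In a notebook each cell produces its own figure, which is why the video never needs to create one explicitly; the extra plt.subplots() call for the box plot only matters when everything runs in a single script.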