This is another tutorial on Plotly using Python. Here we are in a Jupyter notebook. As usual, I execute the first cell to import my cascading style sheet so that the notebook looks a bit better. Today's topic is distribution plots.

Let's start by importing the Plotly library as usual. I'm importing iplot because I want to plot right inside the notebook, and I have to initialize notebook mode, so let's do that. Then I import the high-level graph objects as go: import plotly.graph_objs as go. Something new this time: I'm also importing the figure factory as ff.

Let's create some data to work with in this notebook. I import the random module, import numpy using the standard abbreviation np, and seed the pseudo-random number generator. Then I create three computer variables: age, salary and binary_gender. For age we use a uniform distribution, np.random.uniform, with a low of 21, a high of 75 and a size of 1,000, so we get a thousand values in the domain from 21 to 75. Salary is drawn from a normal distribution, np.random.normal, which takes three arguments: loc is the mean, scale is the standard deviation and size is the number of values. So with a mean of 3,000 and a standard deviation of 1,000, I get a thousand values from this normal distribution. Lastly, from the random module we use random.choices. Purely to keep things easy, at least to explain, we stick with a binary sample space for the gender variable: only female and male, and a thousand of those please. I also import pandas because I want to create a DataFrame, and here we have the computer variable df.
I call pandas.DataFrame and build it from key-value pairs in a Python dictionary: age is the column header for the age variable we created, salary for salary, and gender for binary_gender. Then I create two sub data frames: one keeps only the rows where the gender column is female, the other only the rows where it is male. I call them female and male. Let's run that.

Let's look at the first five rows by calling the head method, female.head(). We see the age column, the gender column containing only females, and the salary column. As you can see, pandas DataFrames look like flat spreadsheet files. Let's also look at the last five rows of male with the tail method, again to make sure the gender column only contains males.

Now let's create our first bare-bones distribution plot. I create a computer variable called fig and assign it ff.create_distplot, one of the figure factory's methods. It takes a couple of arguments. The first is hist_data, the data we want in the distribution plot. A distribution plot is nothing other than a special kind of histogram, and create_distplot expects a list of lists of values, one inner list per group. So we take the whole DataFrame, go down the salary column, take its values and call tolist() on them to get a plain Python list, and you can see it sits inside square brackets. Next come the group labels. We only have one group here, and I call it Salary distribution. And, as with a histogram, we need a bin size.
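The DataFrame and the two sub data frames built in the steps above can be sketched like this. The column names Age, Salary and Gender, the labels Female and Male, and the seed are assumptions for illustration; the data generation is repeated so the cell runs on its own.

```python
import random

import numpy as np
import pandas as pd

# Assumed seed; the walkthrough seeds the generator but does not give a value.
np.random.seed(12)
random.seed(12)

age = np.random.uniform(low=21, high=75, size=1000)
salary = np.random.normal(loc=3000, scale=1000, size=1000)
binary_gender = random.choices(['Female', 'Male'], k=1000)

# Build the DataFrame from a dictionary: the keys become the column headers.
df = pd.DataFrame({'Age': age, 'Salary': salary, 'Gender': binary_gender})

# Boolean indexing keeps only the rows where the condition is True.
female = df[df['Gender'] == 'Female']
male = df[df['Gender'] == 'Male']

print(female.head())  # first five rows of the female sub data frame
print(male.tail())    # last five rows of the male sub data frame
```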
Each group gets its own bin size, so the list has one entry per group; with only one group, it just holds 200. That might not make much sense until we actually see what a distribution plot looks like, so let's run it.

There is our distribution plot. We have the histogram bars, and indeed the bin size is 200. We also see the kernel density estimate, the curve that tries to trace the shape of the distribution over the bars. There is our group label; we are only plotting one thing, the salary distribution. Underneath is the rug plot, and each of its little vertical lines is one of our salary values, so you can see how they are distributed. You can also tell that the data came from a normal distribution: the curve at least attempts the Gaussian bell shape.

Let's add a title. There are several ways to do it; one is to update the layout of the figure we created above with fig['layout'].update, passing the title as a keyword argument instead of through a dictionary as you've seen before. The title is Salary distribution. There we go, a nice title at the top.

That's not too much fun on its own, so let's use two datasets, which makes hist_data a list of lists. There are many ways of doing things in Plotly. That can be confusing at first, but it also gives you a lot of power, and you can find the way that works for you. Here I create the computer variable hist_data and pass it a list of lists. The first inner list is the salary column of the female sub data frame, with its values converted by tolist().
Then the same for the male sub data frame. My group labels are now Female salary and Male salary. Now I create fig with create_distplot, passing hist_data and the group labels. I'm not writing hist_data=hist_data, because these are the first positional arguments, so we don't have to use the keywords. My bin sizes are 200 and 200. I'm using the same size for both, but since each group gets its own entry, you can make them different. Let's plot that with iplot.

Now we see Male salary in orange and Female salary in blue, each with its own rug plot, beautifully done.

Let's change the colors. Everything is exactly the same, but I add a new argument to create_distplot: colors. Each color is an RGBA string, so I can also set the opacity, 0.8 for both. The first, rgba(20, 20, 20, 0.8), is very dark; the second is sort of a middle gray. Let's run that and have a look. There we go: a light color for male and a darker color for female, and because we set the opacity, one shines through the other.

Instead of the kernel density estimate, we can also compute the mean and standard deviation from the data and draw a normal distribution curve. Again we have hist_data and the group labels, and we call create_distplot with hist_data, the group labels and the bin sizes, but now there is a new argument, curve_type, which we set to 'normal'. And here is one more way to create or update the layout: instead of fig['layout'], with the quotation marks and the square brackets, I simply call fig.layout.update.
I pass it a dictionary of key-value pairs, with title as the key. Again, it's just another way, and that flexibility is what makes Plotly so powerful and easy to use: use whichever way suits you. Now we see the normal curves, each drawn from the mean and standard deviation of the data, for both female and male.

In case you want to omit some parts, there are three of them: the curve, the histogram and the rug plot. Let's omit a few. We set show_hist to False and show_rug to False as well. Everything else is exactly the same, except that we've added an x-axis to the layout update as a key-value pair: the key is xaxis and the value is another dictionary with two key-value pairs, title set to Salary and the axis range set from 1,000 to 5,000. And there we go: just two very nice smooth curves.

So with distribution plots you can do a great deal, and you can well imagine data that would look beautiful represented this way. I'll see you in the next tutorial.