Now this is a very exciting lecture and a very exciting notebook, because we're going to visualize data. This is one of the richest things we can do with data as human beings: we can look at data and understand so much of it, provided those plots and graphs are correctly created. Now there are many plotting packages in Python; we really have an embarrassment of riches. There's matplotlib, perhaps the biggest of them all, there's seaborn for statistical plots, and many others. The one that we're going to look at is called plotly. You can go to their website and learn more about all of the graphs that you can do with plotly, because I'm only going to show you a few. It is such a rich package, and we can create so many plots, that we're barely going to scratch the surface here; we'd be here for months if we looked at all of them. So I'm going to show you the basic ones, the most important ones, and remember, you choose the plot by the type of data that your variable holds.

This is data visualization, notebook 06, so let's look at the packages that we're going to use in this notebook. We're going to use pandas as always, so we import pandas as pd, and then we import plotly. Now plotly has many different modules inside of it, and they each approach how we generate plots in a slightly different way. The first one we import here is graph objects, so import plotly.graph_objects, and we do that with the namespace abbreviation go, for graph objects.
plotly.express is a fairly newish module that makes it very easy to plot from a dataframe object, so we import plotly.express as px. Then we import plotly.io as pio, and we use pio immediately: we take the templates.default property and set it equal to plotly_white. That's a theme, and there are quite a few themes that you can find. Let's run this so that the import happens, because I just want to show you pio.templates. We just say pio and use the templates attribute, and we see the default, which is plotly_white because I've set that as the default. You also have ggplot2, seaborn, simple_white, plotly, plotly_white, plotly_dark, presentation, xgridoff, ygridoff, gridon and none. These are the different themes that you can use. Many people like to code with a dark or black background, and you can do that right inside Colab: you can set it to a black background with white text, and if you want to do plots there, set the template to plotly_dark and your plots will have a nice dark background. But anyway, I like the white background; it's just easier for us to see.

Now, if you are doing this on a Mac (certainly when I use Colab on my Mac, which has a retina-type display), there is this magic command, %config, and I can say InlineBackend.figure_format = 'retina', with retina as a string. If I do that on a Mac, then the plots, specifically those from matplotlib, are going to be much crisper. This is a Windows machine, and I only want to record in HD quality, so I'll run that cell, but since this is not a Mac it's not really going to do anything for us. In case you are using a Mac, run that cell. And then of course there's %load_ext google.colab.data_table; remember, that's for us to have tables printed out nicely on the screen. And then, as with the two previous notebooks, our spreadsheet lives
inside of our Google Drive, so we're going to use that drive function. Let's import our data. As always, we have to mount our drive: it's going to open a separate tab, we log in to our Google Drive once again, we copy that security code and we paste it in here. Then I've got the %ls; that's Linux for, you know, just show me what's inside of this folder, a directory listing of what is inside there. So I'm going to log in and I'll see you on the other side.

So we're going to import the customers.csv spreadsheet file using read_csv, that function from pandas, and we assign that to our computer variable df. Then there's an australia_rain.csv file; I'm also going to import that, and I'm assigning it to the computer variable rain. So we have two dataframe objects here. Let's just look at the first one, df, and we see some data about customers. We see an index there, and then attrition_flag, where you see existing customer and attrited customer. Attrited customers are customers that left the financial institution, so customers that were lost. This is the kind of dataset where we might want to build an AI model, a machine learning model, to predict which customers are going to leave the banking institution, for instance. We see age, how many dependents they have, the education level, the marital status, the income category, et cetera. So quite an exciting dataset as far as customers are concerned.

First of all, some attributes. With the shape attribute, as always, we see we have data on 10127 customers and on 14 variables. Let's have a look at those variables using the columns attribute, and we see all of them listed there. And then, perhaps more importantly, we look at the dtypes: we want to know what data type pandas thinks these variables are. And remember, this is tidy data; always remember, when you have your spreadsheet, it's got to be tidy data. So I want to introduce you
to a new method called the info method. We've never seen that one, but what it's going to do is just give you a little bit more information. It's still going to give you all the column headers, so all my variables, all 14 of them, and then the non-null counts, and we see all of them are 10127. They're all there, so there's no missing data. So we get that little extra piece of information. We also get to see how much memory it takes up, 1.1 megabytes of memory in our system, and then obviously the dtype there as well.

I do remind you: go to the plotly website (you can click on the link there in the notebook) and there is a lot more. But let's just start introducing ourselves to plotly. The first type of plot we're going to do is a bar plot. Now a bar plot is a frequency plot: it counts how many times each sample space element occurs, and we use a bar plot for categorical or discrete data. If we look up here at attrition_flag, that was an object, so that's going to be categorical data. So let's look at the value_counts method. I say df.attrition_flag, so I can use the dot notation to get to that column, and it's going to express that as a pandas series, and on that I'm calling the value_counts method. Remember, I'm not putting normalize=True as an argument, so it's not going to give me the relative frequency; it's going to give me the absolute frequency. So 8500 existing customers, and we lost 1627 customers. That is ideal for creating a bar chart, because there must be one bar for each sample space element in a categorical or discrete variable; that's what we use a bar plot for.

So let's create this bar plot. Once again, as with most of these things, it looks a bit complicated when you start, and on top of that, there is more than one way to go about this. So let's do it this way. First of all, I'm going to create a computer variable; I'm going
to call it churn_fig. Remember, churning means losing customers. So churn_fig, and I'm going to assign to that an empty figure from the graph objects module in the plotly package. Remember, we used the namespace abbreviation go, so go.Figure, and that's just going to create an empty figure. Now we have to add some information to this figure, and for that we add what are called traces. A trace is the thing that goes inside this empty graph. So we say churn_fig, that's our computer variable that holds an empty plotly figure object, and I use the add_trace method on that. What trace am I going to add? I'm going to add a bar chart, so go.Bar.

Now what you've seen me do here also is use a lot of spacing, and when I create graphs specifically, I like to do that. So I'm just going to retype the code to show you, because the first time you see this it looks a bit weird, and I want to show you how it works. So I say churn_fig and I set that to go.Figure, with open and close parentheses. Okay, I've got my empty figure object there. I say churn_fig, and then I do the add trace, so add_trace, and I open my parentheses, and there's going to be an automatic closing parenthesis there. Now what I want to do here is not to start typing the arguments right away; I press enter (return), because I want to list all the elements separately. So what I want there is a bar chart, as you can already see from the cell above. A bar chart is what I want to add, and look at the tooltip that comes up, showing me all the different arguments that I can pass. It's just phenomenal, all these things. You can use all of them to change the look and feel
of your graphs, but I'm going to press enter (return) once again so that I can list all of these separately. On my x axis I want a Python list with two separate strings: the first one is going to be existing customers, and then lost customers. Excuse me for copying and pasting there; I say Control-C, go there and say Control-V. So I have these two, and then on my y axis also a Python list, and my list is going to be 8500 and, what was our other value, 1627. The reason why I like to lay things out with all this spacing, and not just start typing, is that, as you could see, there are so many arguments that you can pass that the line goes off the screen and becomes very difficult to read. So I like to use this spacing. Let's run this and see what it looks like.

Now this was done very manually: I typed in the values that I want to see on the x axis, and I typed in how high those bars were going to be. So let's delete this one, and this is the one we have now, a simple bar chart. Let's put a little code comment there for ourselves, simple bar chart. There we go, and if I could spell simple, that'd be very nice.

Now it is a bar chart, and it indicates that this is a categorical variable by one neat little visual cue, and that's the fact that there's a big space between the two bars. If you look at a chart and you see these bars with spaces between them, that must immediately denote to you (and hopefully the person who created the chart knew what they were doing) that we're dealing with a categorical variable and that it's a frequency chart. We're visually showing a frequency, or count, of the sample space elements of a discrete or categorical variable, and we know that purely because it has this little space in between.

Now look at this. Why I love plotly is because it's very interactive. When I hover over some of these
bars, immediately it comes up: existing customers, 8500, and there, lost customers, 1627. So it's very interactive, and you actually see a bunch of things here at the top that you can do. I can save this plot to my internal hard drive, or to the cloud here, as a PNG file, because I might use Microsoft Word or Google Docs to write a report, or PowerPoint, whatever, and I just want to take a static image of this and put it in my report. You can just do that. There are all sorts of other things, and when these charts become more complex I'll show you a little bit more about that.

Now let's look at education level, for instance. Let's look at the sample space elements of education level, and we see uneducated, graduate, college, unknown, high school, postgraduate, doctorate. If we use the value_counts method, we get a count of all of these, so that might be something we want to chart. But now there are too many; I don't want to type in all of those values, neither the actual sample space elements nor the frequencies. So I've got to be a bit verbose with how we go about this. As I say, this is one way, and I just want to build it up from one side, climbing this mountain.

So what I'm going to do is take df.education_level, that series, call value_counts on it, and I want to save the values. There's a .values attribute on the result of value_counts, and that's only going to be the values. Then, furthermore, I use the tolist method on that. So the values attribute and then the tolist method, and that's going to convert this for me into a Python list; that's the list we use as totals later on. Remember, when we did that plot, for x and y we passed Python lists. So now I can see the list of values there. All I want now is the actual names: graduate, high school, unknown. The way to get to those would be df.education_level, so that's the pandas series, calling the
value_counts method on that, and then instead of the .values attribute I'm using the .index attribute, and then tolist, so it converts that index into a list. I've assigned that to the levels computer variable, and now we see a list of those strings: graduate, high school, unknown, uneducated, et cetera. Now when I create my chart it's a bit easier to do, one step easier at least, in that we don't have to manually enter all of that data.

So I'm going to create another figure. I'm going to call it churn_fig again (we could call it something a bit better), and anyway it's go.Figure, an empty figure, so that we can call this add_trace method on our figure. To that we pass a bar chart, so go.Bar. On x I'm going to have this Python list which holds all my levels, and on y I'm going to have totals. And now you see something new, a marker; that's one of those plenty, plenty arguments we can use. I'm going to run this code and you'll see what it does, but see what I mean by it going off the side of the page. Anyway, what I'm going to do is pass a dictionary to this marker argument, and in it I set a color and a line. So color, colon, and then a whole list of colors, and what I'm doing is giving every bar a slightly different color. The colors are represented in this way: the first one is just going to be red, and I use the red keyword there as a string. Then I'm using rgba. RGB is red, green and blue, each of which goes from zero to 255, and the a is for transparency. So I have 50 red, 50 green and 50 blue, which gives me quite a darkish gray, and then I want it to be 50% transparent. And then the next bar, and the next bar, so all I'm doing here is building up specific colors. And for the outside line of each bar, I want the color to be black and the width to be one, also passed as key-value pairs in a dictionary. So very overwhelming, but you'll soon get the hang of it.
There I've introduced different colors to each of these. It still indicates to me that this is a categorical variable, and as a matter of fact it is, because we've got those discrete sample space elements, don't we: graduate, high school, and so on. And we can see the majority of the customers here were graduates.

So let's add another little element. This time I'm going to use this exact figure that I'm still busy with, and I'm going to use a brand new method on it. I'm still with churn_fig, and now I'm using the update_layout method. To that I pass a keyword argument called title, so I say title equals, and now I put in a nice string: number of customers in each education level. Then I'm using the .show method. So I'm still using that same figure, but this time we've added a title, and we can see the nice title above the plot. By the way, I didn't mention it before, but you can clearly see the black outline and the different levels of gray that I specified for each one of these.

So let's add a little bit more to our figure. It's still churn_fig, still the same figure, again that update_layout method, and now I have xaxis and yaxis as my two arguments. So I say xaxis equals, and instead of passing a dictionary with curly braces, remember, way back when we first looked at Python, we saw we can also use the dict function. So I use dict, and I just say title equals education level. Remember, if I had used curly braces, title would have had to be in quotation marks, followed by a colon and then education level. So you can do either of the two ways of creating a dictionary. Then we call the .show method again, and now it has also added the x and y axis titles: education level on this side and counts on that side. And again it's quite interactive: when I hover over each of these, it's going to show me
the exact value. So it's interactive, and again I've got access to all of these tools that we'll see a little bit later.

So let's do something more, just to show even more of the properties that are available. I'm going to overwrite my churn_fig; there we go, it's a go.Figure. Let's add a trace, and the trace is going to be a bar: x is going to be levels, y is going to be totals, and then I'm saying text equals levels (we're going to see what that does) and then textposition equals outside. For the marker I set the color, and as you can see I'm setting all of this as a dictionary, where the value of one of the key-value pairs is itself a dictionary. So a dictionary inside of a dictionary; you just have to keep your wits about you. So color, colon, deepskyblue, so they're all going to have the same color. The line is going to have a color of black and a width of one, and then opacity is going to be 0.9. So that's yet another way to do it.

Then I update the layout, so churn_fig.update_layout, and I add a title. I do another update_layout (I could have done all of this in one) where I have an x axis title, I have xaxis_tickangle, which I'm setting to negative 25, and on the y axis I just have the dict with a title. And then we show the plot.

So let's see all the things that happened. It's really a bit overwhelming, but I want to show you what's available. First of all, you see we have a categorical variable; we can see that because there are gaps between the bars. Now remember we said text equals levels; that just puts those labels on top of all the bars for us. And then the text position is outside, so that writes the text just outside, or on top of, each of the bars. We can see the color and the outline there as well. And then the tick angle we set to minus 25 degrees, and you can do that, you know, in case those words are too long and they
start overlapping with each other; you can add a little bit of a tick angle there. So you can see there are so many things we can start adding to make this a really compelling chart.

So let's do one more. In this instance we've got y as levels, x as totals, and the marker is there, but orientation is our new one, and we set that to h. You'll see I've swapped what's on my y and my x axis, because what happens now is that we get a horizontal bar chart, and I've changed the color to orange as well. When you have very long sample space elements, it's better to do a horizontal chart, so that you don't have to put those words at an angle, because that always looks a bit amateurish. I don't know, I don't like it anyway. So if the sample space elements in my categorical variable are too long, I like to do a horizontal plot.

Great. Let's have a look at crosstab. Remember that pandas function, pandas.crosstab: it creates a contingency table, where we set one categorical variable against another. So we do attrition flag and education level, and we assign the resulting contingency table to attrition_edu. Let's have a look at that, and there we see college, doctorate, graduate and so on, but for each of my two customer groups, the customers that we've lost and the customers that have stayed with us. So let's set levels as attrition_edu.columns.tolist(). We're using that .tolist, and what it gives us, again, is those levels, all of the ones across the top. Remember, the attrition flag is the one that goes down the left-hand side, because that's the first one we passed, and the second one that you mention goes across the top. So if I call the columns attribute on this contingency table of mine, it goes across that top, and we converted that with tolist.

Now let's have a look at this. What I'm going to extract from my contingency table now is the following: the attrited values and the
existing customer values. So from this little contingency table, which is actually just a pandas dataframe, I use integer location, so iloc. I want the first row and all the columns, and I want that as a list, so I call the tolist method there. What that gives me is all these first values: 154, 95, 487, 306 and so on. And if I do the second row, which has an index of one, with all the columns, that gives me the second bunch of values.

You can go through all of this code, but let me show you what it comes out as. If you look at this, we see that we've broken it up by lost customers and existing customers. So how did we do that? Let's go back. I've created my empty figure and I've added two traces to it; see, there's an add_trace and another add_trace, so I've done it twice. Both of them are bar charts. In the first one, x is levels, y is the attrited values, and the text is the attrited values. In the second one, y is the existing values and the text is the existing values. And with the text position set to outside, I've put the actual values on the outside, not the sample space elements. So you can see it's a very, very intuitive plot, and I get a lot of information from it. I can see that graduate was the most common education level of my lost customers, while graduate was also the highest among the existing customers.

So yes, I will totally agree with you: it's very intimidating when you start creating these plots. I'm going to show you how to do it slightly easier, but I want to throw you in the deep end here, just to show you what is possible and what control you can have over your plots. When it comes to bar plots, though, I would say you have to extract the values that go into your bar plot. With some of the other plots you don't have to do this; the values are automatically taken from the dataframe. But here, with bar charts, you really have to do this almost by hand. And I just want to
show you: we added the two traces, but why are they next to each other? That's in this update_layout, where we have barmode, and we set it to group. That groups them together, so for both sets here, lost and existing customers, we see the bars right next to each other. One other thing I want to show you, which I do love about plotly as well, is the legend on the side. It's totally interactive: if I click on lost customers, that group disappears from the plot and now I only look at the existing customers. I can bring them back, and I can take away the existing customers, and now I only see the lost customers. So a very interactive type of plotting, and that's brought to you by plotly. I've got a little advanced exercise there for you. It might take you a little while, but I've given you the whole solution there if you want to have a look at it.

Next up we're going to look at histograms. A histogram also gives us this idea of a frequency count, but we're not going to see any gaps between the bars, and that indicates to us that this is a continuous numerical variable. As I mentioned before, I'm going to show you a slightly different way, and this time we're not going to use the go graph objects; we're going to use plotly express. So look at that, there's just much, much less code for us to write. I'm going to call my little figure age_hist, a histogram of the age. I say px (remember, that's the namespace abbreviation for the express module in plotly, plotly.express) and it has a function called histogram. All I have to pass to it is the dataframe that contains my data and the variable that I'm interested in as the x, so x equals customer_age. Then I just call the .show method on this plot, and immediately, with much simpler code, we can see a distribution of the ages of all our customers. We can see it looks like what we would call a normal distribution, this bell-shaped curve. So most of
our customers were here, and we can even hover over each of these little rectangles, because they're just long, thin rectangles, and what it shows us is the count for each of those. As you can see, plotly decided by itself what the bin size was going to be; it decided what would be optimal. And what we can see is that it decided the bin size was going to be just a single age. So of customers aged 38 there were 303, of customers aged 40 there were 361, and aged 42 there were 426. So it decided the bin size should just be one, simply counting each of those values.

Now we can bring a little bit more information to that with something that I don't use too often, a stacked histogram. For a stacked histogram we use the color argument, and it has nothing to do with choosing the actual color. So let me show you. I use age_hist again, so I'm just overwriting my computer variable, and to that I'm assigning this histogram object from plotly express, so px.histogram. First of all I tell it what dataframe the data comes from. On my x axis I'm going to have the customer age, but now I'm going to group it by one of my categorical variables, and that is the attrition flag. The keyword argument for that is color, so not the actual color, but what I want to group this plot by. So I group the ages by the attrition flag. I give it a title, I give it an opacity so that it's see-through a little bit and we can see the two separate sets of values, and then I add a marginal plot. That's one of the nice things in plotly: we can add other plots in the margins of our plot, and what we're going to do is a rug plot. I'll show you what a rug plot looks like.

So let's run this cell, and there you go, we can see both of those on top of each other, the attrited customers and the existing customers. Now if you look at that, the red ones are the attrited customers; remember, there was a much, much smaller number of
those. So they're stacked on top of each other, and that's why I really don't like to stack, because now, with this see-through red, you've got to sort of work out, looking towards the left, what the difference is between the top and the bottom of this little pinkish area (it's just red at 0.7 opacity). Fortunately, with plotly I can just hover over those and see the actual values. Anyway, the rug plot: if we look at all the ages at the bottom, each one of these marks would be one of our customers, and many of them are on top of each other, so you see some are a bit broader than others. But it also gives you some indication of all the different customers. So that's one way to go about it.

Let me show you what we're going to do next, which is just to change these labels. The customer_age there was taken from the column header, and that doesn't look good at all. The way that we change that is with this labels argument. With labels we pass a dictionary of key-value pairs: customer_age, when you find that in the plot, please change it for me to customer age. That certainly helps us out, just to give us a slightly better look. So we have customer age, which looks a little bit better, and where we had attrition_flag we now have customer group, because here I said, when you find attrition_flag, please change it to customer group. So it just looks a little bit neater.

So let's have a look at this one. We're going to do sort of the same thing, but now we use a graph object instead of an express plot. I say age_hist, I say go.Figure, that's a blank figure, and I take that figure and do the add_trace on it. And what do I want to add? Well, I want to add a go.Histogram, with an uppercase H, and on the x axis I want
df.customer_age, as simple as that. So here, with the histogram, I really don't have to worry about extracting all of these values as we did with the bar chart. Really, bar charts take a lot of work. And as you can see here, we get back the exact same thing.

The thing about graph objects, though, is that I can control things in a much finer way. Each trace I can design separately. I have given each of them a name, and I've controlled the bins: I've got an xbins argument here, and I'm passing a dict to that. I want the bins to start at 25, end at 80, and I want the bin size to be 5. Then I do the same for the attrited customers. And how do I get these customer ages, by the way? I'm using conditionals, and I think you're pretty familiar by now with what these conditionals do. Then at the end I've got barmode equals overlay, and update_traces is another method there, with which I'm setting the opacity to 0.75. Let's have a look at what all of this looks like.

So there we go. Now it is no longer stacked, it's just an overlay, so both sets start at zero at the bottom. Everything else still works: I can decide I only want to see my existing customers, or only my lost customers, et cetera. And look at these bins; they're quite a bit broader now. When I hover over them, it actually shows me that this one goes from the age of 35 to 39 and there are 1124 existing customers in there, but if I hover over the lost customers, again for the ages 35 to 39, there are 177 in there. So quite a lovely thing to do, and there's a little link you can click on there if you want to read much more about histograms.

Next up we're going to look at box and whisker plots. If you page through a journal, or read a journal online, you're going to see a lot of box and whisker plots. They're really pretty neat plots. They give us a lot of information, and it's all about the distribution of a continuous numerical variable, just as a histogram is. So let's have a look at these box
plots. I'm going to create a long computer variable name there, ages_churn_box_px. So this time I'm going to use plotly express, and I'm calling the box function. The first argument is my dataframe. On my x axis I want the attrition flag, so remember, I have my existing customers and my lost customers, and on the y axis I want the ages, please. I'm giving it a title and I'm changing some of the labels: I know it's going to show customer_age, but instead of that I just want age, and I know it's going to show attrition_flag, and for that I just want customer group. Then I call the .show method, and let's have a look at a wonderful box and whisker plot.

So there we go. Because we've set x as the attrition flag (remember, that's a categorical variable with two sample space elements, existing customers and attrited customers), it draws a box for each of those. I've changed the label to customer group here, and instead of customer_age we put age there. So we see the age range, and we see this box and whisker plot.

So what is a box and whisker plot all about? Very helpfully, in plotly, if I hover over the plot it tells me what all these lines are. We start right in the middle of my box; that's the median, and the median age of the existing customers is 46. Then the bottom and the top of the box are quartile one and quartile three. You can see there that quartile one is an age of 41, so a quarter of the customers were younger than 41, and quartile three is 52, so three quarters were younger than 52. Then you see the whiskers, and the whiskers do something a little bit different, because you can see on this side I have the two little dots, and on this side we don't have dots. Those dots are statistical outliers. Now how do we decide on statistical outliers? First of all, on the minimum side there seem to be no statistical outliers, so the
bottom whisker gives me the actual youngest customer, 26. The top it calls an upper fence, and then above it the top of those was 73, so those two are suspected outliers. How we calculate these fences is we take the interquartile range, remember that's the difference between the third and the first quartile, so third quartile minus first quartile, and we multiply that by one and a half. We add that to the third quartile and we subtract that from the first quartile. If there are no values outside of those new fences then there are no outliers; if there are values outside of a fence, they'll be outliers. So once again, the interquartile range: 52 minus 41, that's 11. Multiply by one and a half, that's 16.5, add it to 52, and that gets us close to the upper fence; sometimes there's a bit of rounding. Anything beyond that is an outlier. So here with the attrited customers, there were no customer ages more than one and a half times the interquartile range above the third or below the first quartile, so the whiskers will actually be the minimum and the maximum. Okay, great stuff.

So let's go about the long way and let's use graph objects. What I'm going to do here is create two Python lists, the ages of my existing customers and the ages of my churned customers, and I'm going to use just a bit of logic there, so conditionals: df.attrition underscore flag equals equals existing customer, or equals equals attrited customer, take the customer age column and convert that to a Python list, please. So now I have these two lists, because I'm going to build a separate trace for each. So still the go.Figure, and then as always I'm going to add one trace to it and I'm going to add another trace. Both of those traces are graph object boxes, and for the y axis there I have existing age on this one and churn age on that one. The name I'm
going to give it is existing customer, and the name here I'm going to give is lost customer. The marker color I'm going to give as green, so I'm not using RGB colors, I'm just using some of the colors that have specified names. Then on this one I'm adding a box mean, this is a new one: box mean equals true and box points equals all, so we're going to see what that is all about. On this one I'm going to say box mean equals sd, so not true or false but the string sd, and box points all as well. And I'm going to do an update layout all in one this time: a title, and the x axis and the y axis I'm passing as dictionaries, because there I have title group and title count. So let's see what this looks like. What those box points do is give me all the age values, all of them, and there's a bit of jitter, left and right spacing, so all the dots are not on top of each other. You can sort of see from the distribution of these dots as well how that data was spread. On the left side we had box mean equals true, so it adds this little dotted line, and as you can see from the tooltip if I hover over that, the mean is 46.26 and the median was 46. If I say box mean equals sd, it actually gives me this diamond, and the top and bottom of the diamond show the standard deviation. So I can see a mean and standard deviation, I can see a median, and I can see all my quartiles. You can see a lot of information, and once again we can just take one of them away and zone in on the ones that we do want. So that's box and whisker plots.

Next up we're going to look at scatter plots, and that's where we compare numerical variables to each other. We no longer have a categorical variable in there, although I'll show you that you can, but it's all about numerical variables. So there we go, scatter plots: on my x axis I'm going to have a numerical variable and on my y axis I'm also
going to have a numerical variable, so those are pairs of values for each of my observations. Let's use graph objects first. I'm going to say age underscore mob, so I'm going to have go.Figure, and then to that I'm going to add a trace, and what do I add? A scatter plot. I've got to give specifics for the x and the y value of each of the markers that we're going to have, so customer age, that's a continuous numerical variable, and months on book, that's another numerical variable, so it's pairs of numerical variables. The mode I have to set for scatter plots, and I'm setting the mode to markers. You also get lines, and markers plus lines; I just want markers, just little dots. Then I'm going to update my layout, you've seen that before. So let's have a look at our first scatter plot and what that's going to do for us. Each value that I hover on is going to be the age, comma, the months on the books, as simple as that. It's a scatter plot, it's pairs of numerical variables. Remember we said mode equals markers, so each of these little marker dots is a specific observation, a specific customer. And there's a thing I want to show you: remember all of these tools at the top. I can actually click and drag and select only a certain part of it, and I can pan around with this tool and look at the different data, or I can zoom in and zoom out, and if I've lost my way I can just go back, reset the axes, and all the data is there again. You can imagine that this can become very, very useful.

So now we're going to use plotly express for another scatter plot. I'm going to call this age underscore mob underscore group underscore px, very descriptive of what my plot is going to be. It's a px.scatter plot. First I pass my dataframe. What do I want on the x axis? Customer age. What do I want on the y? Months on book. But now I'm adding a size for my markers, because I can add a third numerical
variable to a scatter plot, so I can actually have this almost three dimensional data here. I'm going to set the size of my markers to the dependent count, so how many dependents my customer has, and that's going to determine how large my dots are. I'm also going to add a categorical variable to it, and that's what I was alluding to: I'm going to separate two sets of values out by the attrition flag. Then I'm going to add some marginal plots as well, so on marginal y I'm going to have a box plot and on marginal x I'm going to have a box plot, because remember my x and y axes are both numerical variables, so I might as well look at the distribution of each numerical variable and do box plots on my two margins. I'm also going to add a trend line using ordinary least squares, and that is a linear regression model. We're going to do modeling as well, and I'm going to show you just how to do modeling, but plotly will just do it for us and add a little mathematical model as a trend line. I'm going to add a title, and with the labels I'm going to change all those values so that I don't have those ugly underscores. Let's see what this plot turns out to be. And there we go. First of all, the size of these dots gives me an indication of how many dependents they have. On this axis here I see the distribution of the months, and on this side of the age, so I can see these little marginal plots, and there I can see the two trend lines as well, so I have a little linear regression model there. Once again, I can look only at the attrited customers, so I can really zoom in on just some of them, pan around, zoom around, whatever I need to do. If I'm going to present this data to someone, I can really zoom in and get very specific about the data. So here we're just going to add histograms and rug plots, to show you something a little different. And there we go, so I've got a
histogram on this side and I've got a rug plot on that side, so really you can do a lot of this. Let me show you another way to add a dimension, and that is by this facet column, so that's a new one. The facet column is going to be attrition flag, and what that's going to do is create two separate plots for me. There's a facet row, facet underscore row, and facet underscore col, so here it's the column. Now this time what I've done with the dependent count is I've added that to color, and now it's going to do this sort of color separation. It's not ideal here for this very discrete data, so if I do have a continuous numerical variable it'll be nice to add that, and then the color is also going to give us some information. You can see my two linear regression models, these two little straight lines, and it even shows me some statistics there: it shows me the intercept, the slope and an R squared value. If you don't know what those are, don't worry, by the end of the course you'll know exactly what linear regression is all about and how to calculate these values.

The last plot that we're going to look at is some time series data. It's very nice to see how things change over time. So let's look at the time series data, so remember the Australian rainfall. Let's just call the info method on that. We see we have a lot of data there, there seems to be some missing data because not all the values are there, and we can see our dtypes there. What I'm going to do here is a line plot. I'm going to say rain, and then I'm going to say rain.location is Darwin, and then or, so that's the or symbol, remember, rain.location equals Hobart. So we're going to look at Darwin and Hobart, and on the x axis we're going to have date, on the y axis we're going to have max temp, and we're going to have color equals location, and we're going to have the title and the labels. So let's just have a quick look at what this looks like. There we see Hobart and Darwin, and we see the
maximum temperatures, and we can see the years down here at the bottom. So x was date, and if we go up to date we can see that date was an object. Remember, before we changed that to a datetime series, remember the datetime data type, and this is an object though, just a string. I just want to show you that plotly is clever enough to understand what is going on here, and it'll do this for us. Let's look at one more scatter plot. We're going to do rain location is Darwin or Hobart again, x is date, y is max temp, color by location, and then I'm setting the labels there, and I'm adding this new one, update underscore xaxes, with a range slider. So if I just take these sliders and move them in, I can really zoom in on only certain of the dates, and that's fantastic. Once again, that's why I love plotly, because these things are interactive. While I'm giving a presentation I can really interact with these plots, and that's much better, for me at least, than static plots. So let's just look at one more, let's look at the rainfall for these two, and you can see clearly the difference in the rainfall, or at least when the rainfall is, and what the difference is between the two sides of Australia there as far as the different dates are concerned.

So that's it. I'm just scratching the surface as far as what plotly can do, but what I wanted to show you here is the different plots for different types of data. For categorical variables we'll do bar charts, for continuous numerical variables we'll do box and whisker plots or histograms, and for pairs of numerical variables these scatter plots. I also wanted to show you the difference between plotly express and graph objects: graph objects is much more verbose as far as the code is concerned, but there are a lot more things that you have control over, and plotly express is perhaps a little bit easier. So go out there, read up about plotting, it's a science on its own, and how to convey data
properly. What you didn't see are pie charts. Now pie charts, unless the circumstances are extreme, are not good plots. Human beings are not good at interpreting the fraction of, you know, a cut-up pie; we are very bad at that. Especially when you start adding 3D plots and you start angling the pie chart a little bit, you can really cheat with those plots and overemphasize certain parts of the data, which really isn't good. So really, stay away from pie charts, please. I think that's like a little stab in the back of people who produce good plots when they do their presentations. Stay away from those pie plots. So go out there, try to learn more about plotly, and have a look at the other packages that are out there as far as plotting is concerned. Choose your favorites, look at matplotlib, look at seaborn, there are so many of them out there that give us wonderful plots.
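
Going back to the box and whisker fences from earlier in this lecture, here is a minimal sketch of that one and a half times interquartile range rule in plain Python. The function names tukey_fences and suspected_outliers are made up for illustration and are not part of plotly, and the short list of ages is hypothetical, not the real customer data. The quartiles 41 and 52 are the ones read off the hover tooltip for the existing customers.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return (lower_fence, upper_fence) using the k * IQR rule."""
    iqr = q3 - q1                      # interquartile range: Q3 minus Q1
    return q1 - k * iqr, q3 + k * iqr  # subtract from Q1, add to Q3


def suspected_outliers(values, q1, q3, k=1.5):
    """Return the values that fall outside the two fences."""
    lower, upper = tukey_fences(q1, q3, k)
    return [v for v in values if v < lower or v > upper]


# Quartiles from the lecture's tooltip for existing customers.
lower, upper = tukey_fences(41, 52)
print(lower, upper)  # 24.5 68.5

# Hypothetical handful of ages, for illustration only.
ages = [26, 41, 46, 52, 68, 70, 73]
print(suspected_outliers(ages, 41, 52))  # [70, 73]
```

With these numbers the youngest age, 26, sits above the lower fence of 24.5, which matches there being no dots on the bottom whisker. As far as I understand it, plotly draws each whisker at the most extreme data point that is still inside the fence, which may be why the upper fence in the tooltip is close to, but not exactly, 68.5.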