Hi everyone, welcome to the Data Analysis and Visualization with Python workshop. Firstly, let me tell you a bit about us. I'm Diamond, a student studying at Temasek Junior College under the Integrated Programme, and a very friendly person, so feel free to connect with me. Eugene, do you want to introduce yourself? I'm Eugene, I'm also a student here and a J1 this year. Okay, so here's a general overview of this workshop. First, we'll go through a basic introduction to data analytics and how it's used in the modern world, in businesses and so on. Then we'll go into various Python libraries such as NumPy and pandas; these modules are used to manipulate data and find trends or patterns that we usually can't see just by looking at the raw data set. After that we'll go into data visualization modules such as Matplotlib and Seaborn, which let you represent the data, or a trend in it, as a graph so that it's easier to view. One good thing about all these libraries, NumPy, pandas, Matplotlib and Seaborn, is that they're all open source. At the end we'll also do a bit of machine learning using scikit-learn. The main task for today is that, by the end of this workshop, you should be able to create a cheat sheet with all the functions we've taught you, from Matplotlib, Seaborn and the rest of the libraries. This cheat sheet will help you in the future: if you ever want to go back to what you learned in this workshop, you can just refer to it, apply the function, and know what it does. So here's an example of a cheat sheet; it's a Python basics cheat sheet from DataCamp.
For example, if you're a bit out of touch with Python, you can look at this cheat sheet and quickly refresh yourself on variables, how to import things, and so on. You can also see the data types, such as lists, strings and NumPy arrays, and the popular libraries used, that kind of thing. So, what is data analytics? Data analytics is basically the process of examining data sets, finding trends, and drawing conclusions and patterns from the information those data sets contain. It includes a range of tools and processes used to find such patterns and to solve real-world or existing issues using these trends and this data. Data analytics is very important to modern businesses and communities because, firstly, it improves decision-making, the kind of important decisions that can transform a business; for example, you can use the data to see how the company is doing and to find patterns, which ultimately fosters business growth. Okay, so some open-source Python libraries that we'll be covering today are, as I said before, NumPy and pandas, which are used for data manipulation, and data visualization modules such as Matplotlib and Seaborn. Notice that we separate the two, because data analytics is not only about manipulating data but also about visualizing it and presenting it in a presentable state, such that if we were to present to our clients, or anyone, it would be easy to understand. They shouldn't have to stare at a Python console; since most people aren't familiar with Python, that's hard to read, not user-friendly, and not a nice way to present. That's why we have modules such as Matplotlib and Seaborn to plot your graphs; there are many different types of graphs available. Okay, so Eugene is going to talk about NumPy.
So let's imagine a scenario where, say, you want to perform operations over a large list or range of numbers. You wouldn't want to iterate through the entire list yourself, because that's tedious and slow; I mean, NumPy is also fairly memory-intensive, but it makes this much easier to do. That's where NumPy comes in. Before we get into it, you'd normally have to install everything first, but for this workshop we won't be installing anything, because we're going to use Google Colab, which has everything installed for you. If you don't have Python installed, just install Python, along with an integrated development environment, and to install the NumPy module just run pip install numpy in the terminal. This installs the NumPy package for you. Now that you have the NumPy package downloaded, you can import it into your Python script using import numpy as np, where np is the standard naming convention for NumPy that we use as shorthand, because we don't want to type out numpy.arange (and so on) every time we use a function; it's a bit inconvenient, right? Okay, so as we said, you run import numpy as np in your Python script to use it. Now you may wonder, what is NumPy? Well, NumPy is an open-source Python library that lets you do mathematical operations over large arrays of numbers very easily, and its functions are well documented, so it's very user-friendly. Rather than iterating through arrays of numbers yourself, you can perform these mathematical operations using NumPy without having to write the entire iteration code on your own. So, the big question: how do we use NumPy? We've already imported it. The next step: say you have a list of numbers.
You can convert this list of numbers into a NumPy array using np.array, where you pass the list as a parameter. This converts the list you have into a NumPy array. So say I have a list with the numbers 1, 2, 3, 4; if I put this list into the np.array function, it gives me a NumPy array with the elements 1, 2, 3 and 4, and I can use this array for different mathematical operations later on. Other than that, there are many other ways to initialize a NumPy array. You can use np.zeros, where the main parameter is a tuple containing the rows and columns; this creates a NumPy array full of zeros with the rows and columns specified. Similarly, np.ones does the same thing but fills the NumPy array with ones instead. And np.empty creates a NumPy array with arbitrary, uninitialized values inside; how it works is similar to np.zeros in that it creates an array with the rows and columns you specify in the tuple. You can also initialize a NumPy array using another function, np.arange, where the key parameters are start, stop and step. Start is the number to start from, stop is the number to end at; but keep in mind that stop is not inclusive. So if I specify a start of 0 and a stop of 10, I actually get an array from 0 to 9, not including 10, because it's not inclusive. Step refers to the interval between numbers, so if I set the step to 2, it goes from 0 to 2, to 4, to 6, and so on. Yeah, one thing to note is that the step is optional; if you don't give the step parameter, the default step is 1, so you just get the numbers from, say, a start of 0 up to a stop of 10, like Eugene mentioned.
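Putting those initializers together, here's a minimal sketch; the variable names are my own, not from the workshop slides:

```python
import numpy as np

# Convert a plain Python list into a NumPy array
arr = np.array([1, 2, 3, 4])

# Arrays of zeros and ones; the shape is a tuple of (rows, columns)
zeros = np.zeros((2, 3))
ones = np.ones((2, 3))

# np.arange(start, stop, step): stop is NOT inclusive, step defaults to 1
evens = np.arange(0, 10, 2)  # 0, 2, 4, 6, 8
```

Note how `evens` stops at 8, not 10, because the stop value is excluded.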
So you just get the numbers from 0 all the way to 9, because the stop is not inclusive. For np.full, you have to specify a shape in the form of a tuple, as well as a fill value. What this fill value actually is, is the value used to fill the entire NumPy array. So if we call np.full with the shape (1, 5), it gives a NumPy array of shape (1, 5) filled with whatever fill value is specified. It sounds a bit unintuitive, but it becomes very natural once you see the example. np.full_like is similar to np.full in that you give a fill value, but you take the shape from a different array; so if you have a separate array, the new array copies its shape. Let's just go to the example for this one. In this case, I initialize a 5-by-5 array of zeros using np.zeros with the tuple (5, 5), where one of the 5s is the rows and the other is the columns. Similarly, with np.ones and (5, 5), I get a 5-by-5 array full of ones. np.empty is also similar: I pass the tuple (5, 5), so there are 5 rows and 5 columns, and it just gives you arbitrary values inside the NumPy array; sometimes the output looks similar to np.zeros, sometimes it doesn't, but in this case it does. Then here I initialize an array using np.arange, which as you can see has a start of 0, a stop of 10, and a step of 2, meaning it starts at 0 and stops before 10, placing a number at each interval of 2, so it goes 0, 2, 4, 6 and 8. But do keep in mind that the step is optional and defaults to 1; I included it here to show you.
np.full, I guess, now becomes a bit clearer: it takes a tuple, (5, 5) by the way, 5 rows and 5 columns, and the array is filled with the value 5, because that's what I specified here. So in the shape tuple, the first 5 is the number of rows you want in the matrix and the second 5 is the number of columns. And this is where np.full_like comes in. In this case I have an array, array1, whose shape should be (1, 5): 1 row and 5 columns, because it contains 1, 2, 3, 4 and 5. So when I call np.full_like with array1 and a fill value of 3, it copies the shape (1, 5) from array1, and I get an array of shape (1, 5), 1 row and 5 columns, filled with 3. Now that you can initialize a NumPy array, you can actually use it: you can call np.mean to get the mean of the array, np.median to get the median, np.std to get the standard deviation, and many other functions, plus, minus, divide, times, power, floor division, and trigonometric functions; it all works really well. So in this example, when I call np.mean with the array here, it gives me the mean of the array, which in this example is 5 and 2/3. And when I call np.median on it, it gives me the median, which is 5, and so on. Okay, so, concatenating NumPy arrays: it may sometimes be important to concatenate NumPy arrays, for example when they have different rows and columns. You can call np.hstack with x and y, where x and y are two different arrays, to stack those two arrays horizontally; but you need to make sure the number of rows in the two arrays is the same. Think of it as creating a rectangle out of two squares.
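As a rough sketch of np.full, np.full_like and the statistics functions; the data here is my own, not the workshop's array:

```python
import numpy as np

# A 1x5 array filled with the value 3
arr1 = np.full((1, 5), 3)

# full_like copies the shape (1, 5) from arr1, but fills with 7
arr2 = np.full_like(arr1, 7)

# Basic statistics over an array
data = np.array([2, 5, 10])
mean = np.mean(data)      # (2 + 5 + 10) / 3 = 5.666...
median = np.median(data)  # middle value = 5.0
std = np.std(data)        # population standard deviation
```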
To make sure that works, the heights of the arrays, meaning the number of rows, must be the same. Similarly, np.vstack with x and y, where x and y are two different arrays, stacks the two arrays vertically. Whereas hstack is like stacking two boxes side by side, think of vstack as stacking two boxes on top of each other; but the arrays need the same number of columns, the same width, to stack together. I think this image summarizes it quite well. In the case of np.hstack with a and c, array a has three rows and so does array c, so you can stack the two together horizontally. Similarly for np.vstack with a and b: a has four columns, b also has four columns, so you can stack them together vertically. For more statistical functions, you can call np.percentile, where the key arguments are the array itself, the array you want to compute the percentiles from, and the percentile, or list of percentiles, you want to compute. You do need to make sure the percentiles you specify are in the range 0 to 100, not fractions but numbers from 0 to 100, each signifying the percentage you want. np.quantile works in a very similar way: you provide the array along with a quantile, or a list of quantiles, to compute, and these quantiles must be between 0 and 1, like 0.5 or 0.25 for example. We can also call np.matmul, which stands for matrix multiplication, to perform matrix multiplication between two arrays, two matrices.
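Here's a small sketch of the stacking and percentile functions under the same row/column rules described above; the arrays are my own examples:

```python
import numpy as np

a = np.zeros((3, 2))
c = np.ones((3, 4))
wide = np.hstack((a, c))  # both have 3 rows -> stacked side by side: (3, 6)

b = np.ones((1, 2))
tall = np.vstack((a, b))  # both have 2 columns -> stacked on top: (4, 2)

values = np.arange(1, 11)        # 1..10
p50 = np.percentile(values, 50)  # percentiles are given in the 0..100 range
q50 = np.quantile(values, 0.5)   # quantiles are given in the 0..1 range
```

The 50th percentile and the 0.5 quantile are the same point, just expressed on different scales.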
So to call np.matmul, as shown here, we define a, which equals np.array of one array, and then b, which equals np.array of a different array, 5, 6, 7, 8, and then we call np.matmul, passing as parameters the first array and the second array we want to multiply. This results in an array that is the matrix multiplication of A times B, as you can see in this example. For statistical analysis, we can also call np.corrcoef, which finds the correlation coefficient. What the correlation coefficient measures is the strength of a linear relationship, with values ranging from -1 to 1: negative means a negative relationship, positive means a positive relationship, and the closer it is to 1, the stronger the linear relationship between the two variables. One thing to note is that both arrays, your first array and the second array you call this function on, must have the same shape for the function to work, or it will raise a shape error. We only have two hours, and we do want to leave time for some exercises if you want to do them, so we won't cover the rest of the functions here; if you're interested, you can follow this link to find out more in the documentation itself. Alright, now that we know about NumPy, how to create NumPy arrays, call functions and all that, let's go into something a bit more advanced: pandas.
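Before moving on to pandas, a quick sketch of np.matmul and np.corrcoef; these are my own toy inputs, mirroring the A, B example above:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

# Matrix multiplication A x B
product = np.matmul(a, b)  # [[19, 22], [43, 50]]

# Correlation coefficient: both inputs must have the same shape
x = np.array([1, 2, 3, 4])
y = np.array([2, 4, 6, 8])   # y is perfectly linear in x
r = np.corrcoef(x, y)[0, 1]  # so r is 1.0 (up to floating point)
```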
So let's imagine a scenario where your boss asks you to work on some sales data where the sales volume is very large, and the data set looks like this: a lot of rows, a lot of columns, oh my gosh. The thing is, usually when we're tasked with data manipulation we use tools such as Microsoft Excel, but the problem with Excel is that there are limits on the number of rows and columns a data set can load, which for Excel is roughly a million rows. It's also slow: say you want to carry out a function, let's say a multiplication, across an entire column; there will be a noticeable delay if there are a lot of rows, and the same goes for columns. So how else can we do it? One way is that you can't just tell your boss, oh my gosh, it's not possible; you might just get rejected, your boss will be angry at you, and we don't want that. You want your boss to feel good about it. That's where pandas comes in; not these pandas, yes, very cute, but not these pandas, the Python module pandas, a very powerful library indeed. So firstly, some background information: what is pandas?
Pandas is basically an open-source library for data analysis in Python. We know about one-dimensional arrays, but what if we want to deal with bigger, multi-dimensional data sets? That's where pandas comes in. Pandas provides fast, flexible and expressive data structures designed to make working with data sets very easy and quick. You can also perform series of operations, such as multiplication or finding the sum across a column, in code; basically, you manipulate your data using pandas. It's like Excel, but an open-source version in code. Let me tell you the advantages of using pandas over Excel, in case you're still not convinced why you'd want to switch. First, speed and scalability: pandas handles large data sets very efficiently compared to Excel. It also provides flexibility, because pandas provides very flexible and powerful data structures, which we'll explain shortly. You also get integration with other tools: since Python is a programming language with an enormous ecosystem of libraries, you can use pandas to manipulate the data and then feed the manipulated data into visualization modules such as Matplotlib and Seaborn, which is very convenient. If you're using Excel, you need to remember how to create a graph inside Excel: go here, click make a graph, maybe enter the data for it so it can plot the chart for you. In Python, not only can you do that, but after manipulating your data with pandas you can also customize your plots and graphs, which we'll show shortly in the data visualization modules. And one strong point: Microsoft Excel is not free, especially for businesses and enterprises; it doesn't come cheap. Pandas, however, is open source and free; anyone can just download and install pandas and use this amazing tool to manipulate their data, because it does the same things, but even better and faster. So it's worth learning pandas. But wait, hold on, before we start: we don't have pandas installed. For this workshop, for simplicity's sake, we're not going to install it in the terminal, but if you're interested in coding on your own, you can install pandas on your laptop by running pip install pandas; on Windows you can use cmd, or if you're using macOS you can go to your terminal, and the exact steps should be in the pandas documentation for ease of installation. You can also just go to the terminal in the IDE and run pip install pandas there; in cmd you may need to set pip up first before pandas installs, so yeah. After installing the pandas package, we can import it into our script by running import pandas as pd. Similarly to how we imported numpy as np, there's a standard naming convention for pandas, which is pd, used as shorthand; we don't want to type out pandas before every function we call.
Typing the full name for whatever function we're calling is very tedious; how can I put it, we ease the process by using the shorter form pd. Okay, so, all about pandas: pandas has many functions for you to use, and it's user-friendly as well. For example, we can read CSV files; if you have a data analytics job, CSV files are very common. They're files that contain your data and can be read by software such as Excel too. Pandas has a function to read CSVs, which is pd.read_csv, and inside the brackets you specify where your file is in order to access the data set. This is one method: importing a CSV file you already have. First, let's create this variable called df, for data frame: df = pd.read_csv, passing in the file path. For example, with this sample data set, after calling pd.read_csv with the path as the parameter, we just print the head of it, which, as we'll get to shortly, shows the first five rows. What you need to understand here is that you can access the file with pd.read_csv. Now say you don't have the file as a CSV but as an Excel file, because that's what many people use nowadays; there's also a function for that, pd.read_excel, so you can read not only CSV files but Excel files too. There are also many other formats you can read, such as JSON and pickle if I'm not wrong, and many others, which actually makes things a lot easier because you can do it all in one script. Okay, now that we've imported files, let's create a data frame; this is the second method of creating a pandas data frame. Apart from importing a CSV file, an Excel file or another external data set, we can also create our own data set from scratch. How do we do this? We can start off with a dictionary; I'm sure a few of you are familiar with Python dictionaries. For example, say we have a dictionary called data, with the key "Name" mapped to a list of names, Tom, Joseph, Chris and John, and another key, "Age", mapped to the respective ages of those people. We can create the data frame with df = pd.DataFrame, and since Python is case-sensitive you need to capitalize the D and the F in DataFrame, passing in data, the dictionary we specified, as the parameter. You get something like the image here: a data frame with a Name column and an Age column; every data frame has rows and columns, sort of similar to an Excel-type table, just like a spreadsheet. Now that we've created a data frame, we can also access certain columns of it by calling df with your column name in square brackets; what this does is return the rows of data under that specific column. For example, here we take the data frame's STATUS column and print the head of it. Like I mentioned before, the .head() function prints the first five rows; so if we take the df["STATUS"] column and call .head() on it, it prints the first five entries, as you can see in this example. There's also a function where you take df with your column name and call .describe(), which gives you summary information about the data in that column. For example, with the data set df from before, we call df["STATUS"].describe(); notice that the column name here, STATUS, must be given as a string.
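To make both methods concrete, here's a sketch that builds the dictionary-based frame, writes it to a CSV, and reads it back; the ages and the file name are made up for this demo:

```python
import pandas as pd

# Method 2: build a DataFrame from a dictionary
data = {
    "Name": ["Tom", "Joseph", "Chris", "John"],
    "Age": [20, 21, 19, 18],  # ages are made up for the demo
}
df = pd.DataFrame(data)
names = df["Name"]            # select a single column
print(df.head())              # first five rows (here, all four)

# Method 1: round-trip through a CSV file
df.to_csv("people_demo.csv", index=False)
df2 = pd.read_csv("people_demo.csv")
# For Excel workbooks, the analogous reader is pd.read_excel("file.xlsx")
```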
When you use the square-bracket accessor, you include the name of the column as a string, and then call .describe() on it. What this does is give you the amount of data under that column, which in this case is 2,833 entries, and how many unique values there are. So you can detect, say, that the majority of the data is "Shipped", repeated over and over, while a few entries are something else, maybe an error; each distinct value counts as a unique value, and here it reports 6 unique values. It also gives you the top value and its frequency, the number of occurrences of the most common value in that column, which in this case is 2,617. Similar to describe, we can also call df with whatever column name you want, then .value_counts(). What this does is get the different unique values and how many of each there are in that specific column. For example, df["STATUS"].value_counts() gives you the different unique values it detects along with their frequencies: "Shipped" is the majority, appearing 2,617 times in that column, then you have "Cancelled", 60 times, "Resolved", 47 times, and so on, each value with the number of times it appears in that column. There are also more functions, say for accessing a specific piece of data rather than applying functions to an entire column. Say we want to locate specific rows in the data set: we can call df.loc, which stands for locate, and pass in a label. It locates the rows of data under that label, the rows whose index matches it, and outputs all the rows which satisfy that condition; it also works with boolean arrays, which we'll show shortly. So you can locate rows of data by label, like I mentioned, and we can also locate rows by position: say I want the data in the fifth row of the data set, I can call df.iloc[4], and it outputs that row. Keep in mind that Python is a zero-based-index language, so df.iloc[4] actually locates the fifth row, and it returns the entirety of that fifth row. In this example we also call df.loc[0], and what this does is output the corresponding rows with the label 0; it returns all the rows which have the label 0, as you can see in the example on the right. Alright, there's more: in this scenario, I write df["STATUS"] == "Cancelled", which gives me a boolean array over the rows, telling me which rows satisfy this condition; say the first row satisfies it, that entry is True; the second row doesn't, so it's False; and so on and so forth. Because df.loc works with boolean arrays, if an entry is True it returns that row; so in this case, df.loc with that condition returns every row whose status is "Cancelled".
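Here's a sketch of describe, value_counts, loc and iloc on a tiny STATUS column; the data is mine, the workshop's frame is much larger:

```python
import pandas as pd

df = pd.DataFrame({"STATUS": ["Shipped", "Shipped", "Cancelled",
                              "Shipped", "Resolved"]})

summary = df["STATUS"].describe()     # count, unique, top, freq
counts = df["STATUS"].value_counts()  # frequency of each unique value

row_by_position = df.iloc[2]          # zero-based: the THIRD row
cancelled = df.loc[df["STATUS"] == "Cancelled"]  # boolean-mask filter
```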
There's also an alternative method for this, which is the df.query function; those of you doing SQL might find this familiar. df.query lets you write the condition as a string. For example, for the previous case where we found all the rows with "Cancelled" under the STATUS column, we can write df.query('STATUS == "Cancelled"'), and what that does is output all the rows which satisfy that condition, basically returning all the rows whose status is "Cancelled". Yeah. Alright, in this next case, I want to find all the rows where the price each is more than a certain value, in this case 90. So I write df.query("PRICEEACH > 90"); note that it must be capitalized, because here on the data set all the column names are capitalized, and this is case-sensitive: you must match the column name exactly, so if the whole name is in caps there, it should be in caps here too; basically, you must follow that. And with that, we get all the rows in which the price each is more than 90. Okay, now let's talk about more functions. Say you want to aggregate the data frame's values over multiple functions, or just one function. You can call df.agg, where you specify the columns you want as well as the list of functions you want to aggregate by. What this does is aggregate using the list of functions you put there, things like the count or sum we covered earlier, or the mean. So it computes the aggregated values over the data frame: the mean, the sum, et cetera, whatever you put in. In this example I just use "mean": I take the PRICEEACH and SALES columns and aggregate the function mean over those two columns, and it computes the mean for me. Okay, so what we've seen so far is creating data sets and importing CSV files which are basically cleaned for you. In reality, in most cases your data set is not clean: there are many NaN values, your data set has a lot of unclean things, and when you run your code it outputs a lot of errors because of those problems in the data set. I'll tell you more about what such errors are right now. Okay, so one example: when there's nothing in a specific cell, Excel saves it as empty, and pandas loads it as a NaN value, which will usually cause an error when you, say, call a multiplication on it; say you want to multiply all the values in this column by whatever value you want, or you want to find the sum of all the values in a specific row or column. To diagnose this, we can call df.isna().sum(). This doesn't fix the problem, but what it does is return the number of NaN values in each column. For example, with all these columns, everything is zero here because there are no NaN values in most of them, but you can see a few anomalies, such as your address line, state and territory columns, with 1,074, 1,486 and 2,521 entries that are
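To recap query, agg and the missing-value check on a toy sales frame; the column names mimic the data set above, but the numbers are my own:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "PRICEEACH": [95.5, 81.0, 100.0, 60.0],
    "SALES": [2871.0, 2766.0, np.nan, 1800.0],  # one value deliberately missing
})

expensive = df.query("PRICEEACH > 90")            # rows priced above 90
means = df[["PRICEEACH", "SALES"]].agg(["mean"])  # aggregate both columns
missing = df.isna().sum()                         # NaN count per column
```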
basically NaN values. What this means is that the cell is empty and what it contains is a NaN value, so if you carry out functions, aggregate functions, multiplication, or any operation on it, it can cause an error, because pandas doesn't know what it is; it's not an integer, it doesn't know what it is. So how can we fix this? We can call the function dropna, where the parameter subset equals the column names in which you want to eradicate all the NaN values, and inplace=True if you want to actually edit the data frame itself. To clarify, what this does is drop the rows containing those NaN values, deleting them from the data frame. Once you've manipulated your data frame, you need to save it, of course, in order to export it. So how do you save it? Say you want to save it back to an Excel file to send to your client: you can call df.to_excel and specify the path where you want the file saved; you can also specify the sheet name, and there's a missing-data representation parameter, na_rep, so that if you have empty cells or missing data in some cells of your data frame, those cells will be represented with the value you pass in. There are many more functions and capabilities than this; for time's sake we'll be moving on to the other libraries, but if you want to do exercises we have some of them, of varying difficulty, and you have the resources in the Google Colab, so you can try some of the exercises; if you don't, that's fine. So let's pause here.
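Here's a sketch of cleaning with dropna and exporting; since to_excel needs the optional openpyxl engine installed, this demo exports to CSV instead, with the equivalent to_excel call shown in a comment. The data and file names are my own:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "STATE": ["NY", None, "CA"],
    "SALES": [2871.0, 2766.0, np.nan],
})

# Drop rows whose STATE is missing; inplace=True edits df itself
df.dropna(subset=["STATE"], inplace=True)

# Export; na_rep controls how remaining missing cells are written
df.to_csv("clean_demo.csv", index=False, na_rep="NA")
# Excel equivalent (needs openpyxl installed):
# df.to_excel("clean_demo.xlsx", sheet_name="Sheet1", na_rep="NA")
```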
So we can take the sum or the mean of it, whatever we like. But what we haven't covered is how to present this data. How can we visualise it? That's where data visualisation comes in, and we're now moving on to Matplotlib. We all know that data visualisation is quite important for showing trends and meaning in your data, and with that in mind, Matplotlib is here to help you. It's very easy to use and very intuitive, and it's also quite high level, so it's really user friendly: it makes graph plotting in scripts pretty easy, almost like drawing by hand with X and Y values. So yeah, you just get this graph here; it's a very small graph, but yeah. Coming to the installation now: similar to what we covered just now for pandas and NumPy, you can just run pip install matplotlib inside the IDE. You can do it in PowerShell as well, but that will be a bit more involved. All right, you guys said we were going a bit too fast, so there will be a five-minute break where you can pace yourselves. Feel free to ask us any questions. Yeah, that'll be a five-minute break. Thank you.

[Five-minute break. Informal chatter, mostly about how pandas makes importing CSVs much faster than doing it in plain Python.]
Okay, let's say you want to create a histogram or a bar graph to show your manipulated data. How can you do this? Firstly, let me show you how to import Matplotlib: we write import matplotlib.pyplot as plt. What this does is import the matplotlib.pyplot package under the standard alias plt, so we don't need to type the entire thing out every time, especially since the name is this long. All right, so to plot a histogram, you call plt.hist and pass it a list of values along with the number of bins, which controls the amount of bars the histogram has. So let's say you have 10 bins: then you have 10 bars, and each bar represents one range of values. That's how it works. There's also this function called plt.bar, where x specifies the positions, basically the column of the data set which you want represented on the x axis, and height is the list of values for your y axis. So in this example we have plt.bar, where you put your x values as 0 and 1, and your height as this array of 10 and 4. This results in the bar graph shown here, where your first bar has height 10 and your second bar has height 4. So that's the bar graph; let's go to the histogram now.
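Here is a quick sketch of the plt.bar call from the slide; the backend line is only there so the snippet also runs on machines without a display:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; safe on headless machines
import matplotlib.pyplot as plt

# Two bars: x positions 0 and 1 with heights 10 and 4, as on the slide.
plt.bar(x=[0, 1], height=[10, 4])

ax = plt.gca()
print(len(ax.patches))              # 2 bars were drawn
print(ax.patches[0].get_height())   # 10.0
# plt.show() would display the bar graph in an interactive session.
```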
When you call plt.hist, pass it your values and specify your bins, let's say 10 bins, then you get 10 bars; you can count them here: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10. So it creates that many bars. Now let's say we want a different type of plot: a scatter plot, which is a way to show the distribution of data over a plane. How can we do this? We can call the function plt.scatter, where you specify the x values and the y values. So let's say you want to plot the distribution of specific columns of the DataFrame: say I want to see the number of sales over three days. You can pass the days column as your x axis and the number of sales column as your y axis, and this will show the distribution of your sales over those three days. Now that we've plotted the graphs, we may want to customise the grid of whatever plot we have. For this we can call plt.grid(visible=...). The visible parameter hides or shows the grid lines: if we pass in True we show the grid behind the plot, and if you pass in False you won't see them. You know how on graph paper there are those boxes behind the plot and the axes? If you find that a bit annoying, you can just pass the parameter as False. You can also use keyword arguments, which is what the **kwargs in the signature means: you can pass in arguments such as color and linestyle, and all these other parameters customise your grid. All right, so now that we've learnt how to customise plots, let's say I want this plot to start from 0 and end at 20, or I want my x axis to start from 15 and go up to 30. How can we do this? This is where the y-limit and x-limit options come in.
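Before we get to axis limits, here is a minimal sketch of the plt.hist, plt.scatter and plt.grid calls just described; the values, day and sales numbers are all invented:

```python
import matplotlib
matplotlib.use("Agg")  # headless-safe backend
import matplotlib.pyplot as plt

# Histogram: bins=10 splits the value range into 10 bars.
values = [1, 2, 2, 3, 3, 3, 4, 5, 6, 7, 8, 9]
counts, edges, _ = plt.hist(values, bins=10)
print(len(counts))   # 10 bins

plt.clf()  # start a fresh plot

# Scatter plot: days on the x axis, sales on the y axis.
days = [1, 2, 3]
sales = [120, 95, 180]
plt.scatter(days, sales)

# Show a grid behind the points; color and linestyle are the kwargs.
plt.grid(visible=True, color="grey", linestyle="--")
```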
So, plt.ylim: let's say I want to limit my y axis to a certain range of values. You call plt.ylim and pass in your lower limit and your upper limit. Let's say I want 20 to be the lower limit, so my y axis starts from 20, and 50 to be the upper limit, so my y axis ends at 50: you get a plot whose y axis runs from 20 to 50, which is what you specified. For example, if I have a line plot that looks like this and I want to focus on a specific part of the plot, zoom into that part, I can call plt.ylim with the minimum limit as 100 and the maximum limit as 250, and the resulting y axis starts from 100 and ends at 250, because those are the limits you've given. We can also, as mentioned before, use plt.xlim to limit the left and right, and the parameters are the same: so if I call plt.xlim(10, 50), the x axis will start from 10 and end at 50; the minimum is 10 and the maximum is 50. This is very useful for inspecting data such as the loss curves of your machine learning models: you can see here the training loss going down along with the validation loss (if you're not sure what loss means, don't worry, we'll be covering it later). If we want to zoom into a specific part, we can use plt.ylim to limit the range of y to that specific range. For example, let's say I want to focus on this part over here: I can limit my y axis to maybe 2 and 8, or if I want my y axis to run from 0 to 10 in this case, then it will stretch the graph so we can zoom in and see how that data is represented.
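The zooming just described, as a runnable sketch; the line itself is an arbitrary stand-in for the loss curve on the slide:

```python
import matplotlib
matplotlib.use("Agg")  # headless-safe backend
import matplotlib.pyplot as plt

# An arbitrary downward-sloping line standing in for the loss curve.
plt.plot(range(100), [300 - 2 * i for i in range(100)])

# Zoom in: y axis from 100 to 250, x axis from 10 to 50.
plt.ylim(100, 250)
plt.xlim(10, 50)

ax = plt.gca()
print(ax.get_ylim())   # (100.0, 250.0)
print(ax.get_xlim())   # (10.0, 50.0)
```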
So these are some examples of plt.ylim and plt.xlim in use. You can also customise your plot by adding labels: plt.xlabel and plt.ylabel add labels to your x and y axes respectively. So let's say we have this data set from a CSV, with month, year and passengers columns, and we want to plot it. We can call plt.plot, which here plots the month column of the CSV as your x axis and the passengers column as your y axis, and after that we can set the labels, such as plt.xlabel for the x axis and plt.ylabel as 'passengers' for the y axis, and plt.show() to show the graph, and then you get your amazing graph. Similarly, we can make a line plot where you pass the year column as the x axis and the passengers column as the y axis; of course, here you have to specify your data source so it knows which data you are referring to, and plt.show() shows the plot. Now that we've taught you how to plot the graphs, naming is important, and beyond plt.xlabel and plt.ylabel there are many more customising features you can use. One such feature is the plt.title function, where we can specify the title of the graph: let's say I want the title to be 'graph of y against x' or something like that, then we pass it in as a string and it sets the title of the graph. And let's say the plot has points in different colours and you want to display a legend: the legend shows, say, that my orange colour represents this and my blue colour represents that, so that it's easier for the person viewing it to understand. To display graphs you call plt.show(), and you can also create a figure with plt.figure, which you can look up in the documentation yourselves, because we don't have time to cover it here.
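Putting the labelling calls together; the month and passenger numbers here are invented, not the workshop's CSV:

```python
import matplotlib
matplotlib.use("Agg")  # headless-safe backend
import matplotlib.pyplot as plt

# Invented month/passenger figures standing in for the CSV columns.
months = [1, 2, 3, 4]
passengers = [112, 118, 132, 129]

plt.plot(months, passengers)
plt.xlabel("month")
plt.ylabel("passengers")
plt.title("passengers against month")

ax = plt.gca()
print(ax.get_title())   # passengers against month
# plt.show() would display the labelled graph.
```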
So we didn't cover the figure part in detail, but Matplotlib not only makes graphs, it can also create figures, which is used a lot in machine learning too. We can also use plt.clf(), which just clears the figure: let's say you want to clear the current plot, then plt.clf() will delete it and you can plot another graph. And after you make your graph, let's say you want to save it as a JPEG: you can call plt.savefig, and to that function you pass the path to which you want the plot to be saved. There are many more functions, and you can go to this link to the documentation to look at them. So now let's go to Seaborn. Seaborn is basically a more advanced version of Matplotlib: it's built on Matplotlib, offers more customisability with fewer lines of code, and it provides various statistical visualisations, such as kernel density estimation, which are very useful for exploratory data analysis. Seaborn's default visual style is also more aesthetically pleasing than Matplotlib's, which means you can create nicer plots than the Matplotlib ones with fewer lines of code, maybe even good enough to be published to clients or other companies. Okay, so to install Seaborn on your computer, you can run pip install seaborn in your cmd, and after you're done with that, to import it in your script you can write import seaborn as sns, where sns is the standard alias for Seaborn. So now let's make our first plot, and let's make it a bit more interesting: maybe linear regression, which basically predicts the trend of your data, so based on it you can see where your data is going and predict your next, future data. For this, let's use pd.read_csv to import the CSV file into a variable called sales, and call sns.regplot, the regression plot, where x is specified as the month column and y as the sales column of this data set called sales, which we imported using the pandas read_csv function. What this does is draw the regression plot with your sales on the y axis, and based on it you can see the current trend of your sales, whether they're going well or badly. So the regression plot is one of the various plots offered by Seaborn, and other such plots include boxplot, catplot, lmplot, violinplot and kdeplot, all these amazing plots. There are a lot of different types of graphs, and the important part is that you need to be able to choose the right graph to visualise a certain data set: say a distribution plot to show the spread of data over time, or a regression plot to predict the trend of the data, and a lot more. These are visual representations of the Seaborn plots: boxplot, lmplot, violinplot, catplot, and kdeplot to show the concentration of your data. As mentioned before, Seaborn also has a lot of customisability features, and one of these is setting the theme for your graph. The function for setting themes is sns.set_theme, and some of the style options include darkgrid, whitegrid, dark and white; these are really interesting themes, and you can just try them out and see which you prefer. There's also a different parameter called context, with options such as paper, notebook, talk and poster, and this changes the scaling and design of your graph. You can try them out; you might not understand them now, but once you try them you will. Okay, now let's have the practical, and put into practice what you've learned from the slides. Yeah, okay, okay, thanks.
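For the practical, the Seaborn pieces above can be sketched like this; the monthly sales numbers are made up to stand in for the sales CSV, while sns.set_theme and sns.regplot are the real Seaborn calls:

```python
import matplotlib
matplotlib.use("Agg")  # headless-safe backend
import pandas as pd
import seaborn as sns

# Pick a style and context; both are real set_theme options.
sns.set_theme(style="darkgrid", context="notebook")

# Made-up monthly sales standing in for the sales CSV on the slides.
sales = pd.DataFrame({
    "month": [1, 2, 3, 4, 5, 6],
    "sales": [100, 130, 125, 160, 170, 210],
})

# Regression plot: scatter points plus a fitted trend line.
ax = sns.regplot(x="month", y="sales", data=sales)
print(ax.get_xlabel())   # month
```

In a script with a display you would follow this with plt.show(), or save it with plt.savefig as covered earlier.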