Hello. In this lesson, we are going to talk about visualization and how we can use Python to create graphs that aid our statistical inference and data analysis. Before we get into the actual visualization, however, we're going to import and clean some of the data that we will be using later in the lesson. So with that, let's get started.

All right, so here is our code. We have three libraries that we will be using in this demonstration: the Google Colab library to import the data, pandas to read in a CSV file, and plotnine to do the actual plotting. I want to point out that we are using the asterisk (from plotnine import *) to tell Python to import all the commands from plotnine without the need for a nickname. This is one of the few libraries in the course that you don't need to give a nickname; you can simply import all the commands as is.

To import our data, I'm using the first option from lesson one, the mount Google Drive option, and I've already got it set up here. I also commented out my file path for easy use. I'll read the data in and call it survey, using pd.read_csv and giving it the file path. We looked at this file before, and we know there is no extra header, so there is no need to skip rows. To give you an idea of what it looks like, this is a survey collected from students who took this course at the university during the spring of 2023. We can see that they answered all these questions, but we can also see a number of unnecessary columns, such as metadata, that we want to remove to make the data usable later. So the first thing I'm going to do is print the names of the columns to the screen.
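A minimal sketch of the read-and-inspect step is below. The real survey lives at a Google Drive path that isn't shown, so this sketch assumes a small hypothetical CSV held in memory with io.StringIO; in Colab you would pass your file path to pd.read_csv instead. All column names here are invented for illustration. One detail worth knowing: pandas renames duplicate header names by appending .1, .2, and so on, which is one way number columns such as 1.0 and 1.0.1 can show up.

```python
import io

import pandas as pd

# Hypothetical stand-in for the survey file. The header repeats "1.0",
# so pandas will rename the second copy to "1.0.1".
csv_text = """Section,Score,1.0,1.0,Home
A,10,,,Ohio
B,9,,,Texas
"""

# In Colab this would be pd.read_csv(file_path) with your Drive path.
survey = pd.read_csv(io.StringIO(csv_text))

# Printing the column names is a quick way to spot columns to remove.
print(survey.columns.tolist())
```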
This is a good practice to get into, because it quickly shows us which columns there are and which ones may need to be removed. We can see all of the metadata about which section each student was in (this was a campus-based course survey), along with some columns named with numbers that aren't attached to any data at all. So we'll move on to step three of our data cleaning, which is removing these unnecessary columns.

First, I'm going to overwrite my existing data frame using survey.drop, which drops whatever columns we tell it to. We pass those column names in square brackets. For this command, I'll come back up to the printed list and copy the columns at the top, as well as these at the bottom. Notice that I have not included any of the number columns. There are so many of them that I'm going to remove them in a separate step that speeds up the process. I'll run this command, and we've now dropped these columns.

If I rerun that drop, though, I get an error, because I've already dropped those columns. This is something you might come across in your own homework and coding assignments: when you rerun a cell that has already done its work, Python may not find what it's looking for. Maybe you removed a row and are now trying to call it again; in this case, I removed columns, so they no longer exist and can't be dropped a second time.
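Here is a small sketch of dropping columns by name and of the rerun error described above, using a hypothetical data frame rather than the real survey. The columns= keyword tells pandas to drop columns rather than rows (a bare list passed to drop is treated as row labels by default). The errors="ignore" option is one way to make the cell safe to rerun; it is not necessarily what the lesson uses.

```python
import pandas as pd

# Hypothetical survey with two metadata columns to remove.
survey = pd.DataFrame({
    "Section": ["A", "B"],
    "Score": [10, 9],
    "1.0": [None, None],
    "Home": ["Ohio", "Texas"],
})

# Drop the metadata columns by name and overwrite the original frame.
survey = survey.drop(columns=["Section", "Score"])

# Running the same drop again raises KeyError: the columns are already gone.
try:
    survey.drop(columns=["Section", "Score"])
except KeyError as err:
    print("KeyError:", err)

# errors="ignore" skips labels that are missing, so rerunning is harmless.
survey = survey.drop(columns=["Section", "Score"], errors="ignore")
print(survey.columns.tolist())
```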
If we print survey.columns in another code block, we can see that the columns are very similar to those above, but we no longer have the section-based or score-based columns. We still need to remove the columns with number names, such as 1.0 and 1.0.1. To do that, we'll use a similar process, but we'll tell Python to remove all of the columns that start with 1 at once. We write survey.loc, put a colon to select all of the rows, and then put a tilde in front of survey.columns.str.startswith("1"). The tilde negates the condition, so this keeps every column except those whose names start with 1. If we run this and print the columns (after fixing a small typo), we can see that we now have just the questions, which is exactly what we want.

However, the column names are not quite where we would like them to be, so we need to rename them. To do that, I'll once again overwrite my original data with survey.rename, passing columns= followed by curly brackets. The basic pattern is: copy the first column name we want to rename, put it in quotes, then after the closing quote type a colon and the new name. Here, we'll rename the first one to home. Then add a comma, press Enter, and do the second column. We could retype all of these, but I'm going to copy and paste them from off screen to make it faster, so you don't have to watch me type them all. They all follow the same basic formula: original column name, colon, new column name, comma, next one. We can run that and look at survey.columns once more, and now all of these names are much easier to work with, which will help us when we're plotting in the next video.
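The sketch below combines the two remaining cleaning steps on a hypothetical frame: removing every column whose name starts with "1" using .loc with a negated str.startswith mask, then renaming the long question columns with a dictionary. Only the new name home comes from the lesson; the question texts and study_hours are invented for illustration.

```python
import pandas as pd

# Hypothetical survey after the metadata drop: number columns plus questions.
survey = pd.DataFrame({
    "1.0": [None, None],
    "1.0.1": [None, None],
    "Where do you call home?": ["Ohio", "Texas"],
    "How many hours do you study per week?": [5, 8],
})

# ":" keeps all rows; "~" inverts the boolean mask, so only columns whose
# names do NOT start with "1" are kept.
survey = survey.loc[:, ~survey.columns.str.startswith("1")]

# Each entry maps "original column name": "new column name".
survey = survey.rename(columns={
    "Where do you call home?": "home",
    "How many hours do you study per week?": "study_hours",
})
print(survey.columns.tolist())
```

Short, space-free names like these are convenient later because they can be typed directly inside plotnine aesthetics without quoting long question text.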