So let's keep going with the next section on data frames. Now that we have loaded the data, we obviously want to do things with it. The first, very basic thing we want to be able to do is to access a specific column or a specific row of the data frame. This is done with two attributes: .columns lists the columns of the data frame and, as we've already seen, .index lists its rows. So let me just reload my data frame here, and let's have a look at the values returned by these two attributes.

As I briefly mentioned before, the values returned by .index and .columns each have their own class of object. You see that for the index, it's an object called a RangeIndex, and for the columns, it's an object called an Index. But in practice, you can consider both of them simply as a sequence of values. You can iterate over them, for instance with a for loop. As I've shown before, they are also very easy to convert to a list or a tuple by calling the constructor of those classes. So for instance, if I want to get the columns as a list, I can simply call list on them. But often you don't have to do this, because you can directly use them as a sequence.

With these attributes, I can get the values of the columns and rows, so the indexes, but I can also set new values. Let's say that for some reason I want to change the names of the columns: I can do this by simply assigning a value to df.columns. Here you see that I'm uppercasing all the column names. I take the names of the columns, I loop over them, and I convert them to uppercase. And for the index, I'm simply adding the prefix passenger_ followed by the row number.
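The get-and-set behavior described above can be sketched roughly as follows. The three-row data frame is a hypothetical stand-in for the Titanic data used in the session, and numbering the row labels from zero is an assumption, since the exact expression used in the session is not shown.

```python
import pandas as pd

# Hypothetical miniature stand-in for the Titanic data used in the session.
df = pd.DataFrame({
    "Name": ["Allen", "Braund", "Cumings"],
    "Sex": ["female", "male", "female"],
    "Age": [29.0, 22.0, 38.0],
})

print(df.columns)  # an Index object holding the column names
print(df.index)    # a RangeIndex holding the default row labels

# Both behave like sequences: convert them to a list or loop over them.
column_names = list(df.columns)
for name in df.columns:
    print(name)

# Setting new values: uppercase column names and prefixed row labels.
# Assumption: rows are numbered from 0.
df.columns = [name.upper() for name in df.columns]
df.index = [f"passenger_{i}" for i in range(len(df))]
print(df)
```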
Okay, so with .columns and .index, you can both get and set the values of the column names and of the row names. All right, so here I just reset the column names and the index to what they were before. For the column names, you see that I'm setting a new value here. Later we will see how exactly this works, but basically, in pandas you can apply a function to all the values of a column, or in this case to all the column names. And if I want to reset the index, I can use the reset_index method of the data frame. So now you see my index is back to the default, which is simply the numbers from zero to the number of rows minus one.

Now, another basic attribute of a data frame is shape. With .shape, I get a tuple that contains the number of rows and the number of columns in my data frame. You see here, if I print df.shape, I get this tuple: the first value is the number of rows, and the second is the number of columns. And this object is a tuple. So if I wanted to get these values, the number of rows and the number of columns, in two different variables, I can simply use value unpacking in Python. I write row_count, col_count = df.shape. If I do this, you see that I now have the row count and the column count in two different variables.

So this is value unpacking. Is it clear to everyone how this works, or would someone like a quick explanation? If you need one, please speak up; otherwise I will skip this introductory material. But basically, value unpacking is simply assigning values to several variables on a single line. Okay, so if this is clear to everyone, perfect. All right, so now we've seen how we can access row names and column names, and how many rows and how many columns we have.
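As a rough sketch of the reset and of the unpacking of shape: the title-casing of the column names and the drop=True argument are assumptions, since the session only says that a function is applied to the names and that reset_index is called.

```python
import pandas as pd

# Hypothetical data frame in the "modified" state described above.
df = pd.DataFrame(
    {"NAME": ["Allen", "Braund", "Cumings"], "AGE": [29.0, 22.0, 38.0]},
    index=["passenger_0", "passenger_1", "passenger_2"],
)

# Apply a string method to every column name at once
# (assumption: the original names were title-cased).
df.columns = df.columns.str.title()

# Back to the default 0..n-1 index. drop=True discards the old labels
# instead of keeping them as an extra column (an assumed choice).
df = df.reset_index(drop=True)

# shape is a (rows, columns) tuple, so it can be unpacked on one line.
row_count, col_count = df.shape
print(row_count, col_count)
```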
Now let's see how I can actually access the content of a column. All right, so let me reload the data frame here. There are different ways that I can access the content of a column, but the easiest is to use the syntax with square brackets. For people who are familiar with R, this will look very familiar, because it's the same type of syntax. If I write the name of my data frame object, so df, and then square brackets, I can pass the name of a column to access this column. So if I write df and then, in square brackets, the string "Age", I get all the values from the age column.

There's another syntax that I can use to access a single column, which is to write df, a dot, and then the name of the column. So df.Age gives me exactly the same result as the square brackets with "Age". One important difference is that when I use the square brackets notation, I need to pass a string, the string that is the name of the column, whereas with the dot notation I don't pass a string: I simply type the name of the column. So if I try this, okay, it does not work.

So these are two ways that I can select a single column. Now, if I want to select more than one, then I have to use the syntax with the square brackets. The way it works is that I give a list of the columns that I want to get. Let's say I want to get the age and sex columns. Maybe I add a space here to make it clear: I use the outer square brackets to select columns, and inside I have to give a list of the columns to select. Remember that in Python a list is created with square brackets, so I open and close square brackets, and inside I list the columns I want to get. And you see that this returns a subset of my data frame with the two columns that I selected.
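The three selection forms described above can be sketched as follows, again on a small hypothetical stand-in for the Titanic data.

```python
import pandas as pd

# Hypothetical stand-in data with the column names mentioned in the session.
df = pd.DataFrame({
    "Name": ["Allen", "Braund", "Cumings"],
    "Sex": ["female", "male", "female"],
    "Age": [29.0, 22.0, 38.0],
})

# Single column, bracket notation: the column name is passed as a string.
ages = df["Age"]

# Single column, dot notation: the column name is typed directly.
same_ages = df.Age

# Several columns: the outer brackets select, the inner brackets are a list.
subset = df[ ["Age", "Sex"] ]

print(type(ages).__name__)    # a single column comes back as a Series
print(type(subset).__name__)  # several columns come back as a DataFrame
```

The dot notation only works when the column name is a valid Python identifier and does not collide with an existing attribute or method of the data frame, which is one reason the bracket notation is the more general of the two.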
Okay, just a note here to say that for this particular syntax, I really have to pass a list of values. If I try to pass another type of iterable, for instance a tuple, this will not work. So you really have to give a list here; it cannot be any other type of iterable.

All right, so now I'm able to select a single column, and I'm able to select multiple columns. With the column selection syntax, I'm also able to assign new values to that column. For instance, here I have changed the citizenship column of my data frame: in the last column here, I've set all the values to UK. So you simply access the content of a column, and then you assign it a new value. If you pass a single value, then all the rows in that column get assigned this same value, as you see here.

Sorry, I forgot to say that before running this command, you saw that I did not have any citizenship column in my data frame. So the behavior here is that if a column does not exist, it's automatically created, and if it exists, it is overwritten. Now the citizenship column has already been created, it exists here, so if I assign new values to it, the existing values are overwritten. If the column did not exist, it would first be created, as we just saw.

Just a warning here: when you create a new column, or when you modify the values of a column, either you pass a single value, in which case all the rows get the same value, or you pass a sequence of values whose length exactly matches the number of rows in the data frame. So if I try to do something like this, it fails, because, remember, my data frame has 891 rows, but here I'm passing only three values. Okay, so I get this ugly error message. If I scroll to the bottom, I see what the problem is.
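A minimal sketch of these three behaviors, on a hypothetical four-row data frame rather than the 891-row Titanic data. The citizenship values other than UK are made up for illustration.

```python
import pandas as pd

# Hypothetical four-row stand-in for the Titanic data.
df = pd.DataFrame({
    "Age": [29.0, 22.0, 38.0, 35.0],
    "Sex": ["female", "male", "female", "male"],
})

# A tuple is interpreted as one single column label, not as several columns.
tuple_failed = False
try:
    df[("Age", "Sex")]
except KeyError as err:
    tuple_failed = True
    print("KeyError:", err)

# The column does not exist yet, so it is created; the single value
# is assigned to every row.
df["Citizenship"] = "UK"

# The column now exists, so its values are overwritten. The sequence
# has exactly one value per row.
df["Citizenship"] = ["UK", "FR", "CH", "UK"]

# A sequence of the wrong length is rejected.
length_failed = False
try:
    df["Citizenship"] = ["UK", "FR", "CH"]
except ValueError as err:
    length_failed = True
    print("ValueError:", err)
```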
Unfortunately, I don't know why, but whenever there is an error in pandas, you always get these ugly error messages. Usually, though, when you scroll to the bottom, you see what the problem is.

All right, so we've seen how we can access the content of a column, and we've seen how we can create a new column or assign new values to a column. Sometimes I want to add columns, but sometimes I also want to delete them. To do this, I can use the drop method. If I call drop on my data frame, I can delete either columns or, as we will see later, rows. To delete a column, I simply pass the name of the column I want to delete. And then there's another useful argument, which says whether the modification should be made in place or not.

So what we could do first is set inplace to False and try it like that. All right, so here you see I'm dropping a column from my data frame, and I assign the output to a new data frame that I call df2. You see that this df2 data frame now has one column less: the citizenship column is no longer there, because I deleted it. But in the original data frame, I still have it. And this is the effect of the inplace argument. If it's False, then the method returns a copy of the data frame with the column dropped, but the original data frame is not modified. Conversely, if I set inplace=True, then I'm modifying the original data frame. So now df really has one column less. And if you give inplace=True, then the call to the method returns None, so it returns no data frame at all. You will see that in pandas there are many methods that take this inplace argument, and it's always the same idea.
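The effect of the inplace argument can be sketched like this. Passing the column name through the columns keyword is an assumption about how the call was written in the session; the data frame is again a hypothetical stand-in.

```python
import pandas as pd

# Hypothetical stand-in data including the column to delete.
df = pd.DataFrame({
    "Name": ["Allen", "Braund", "Cumings"],
    "Age": [29.0, 22.0, 38.0],
    "Citizenship": ["UK", "UK", "UK"],
})

# inplace=False: a modified copy is returned, df itself is left untouched.
df2 = df.drop(columns="Citizenship", inplace=False)
still_in_original = "Citizenship" in df.columns
print(list(df2.columns), still_in_original)

# inplace=True: df itself is modified and the call returns None.
result = df.drop(columns="Citizenship", inplace=True)
print(result, list(df.columns))
```

Because the in-place call returns None, writing df = df.drop(..., inplace=True) would replace the data frame with None, which is a common mistake when mixing the two styles.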
If I set inplace=True, then I'm making the modification on the actual data frame on which I'm calling the method. And if I set it to False, then I get back a modified copy of the data frame, and my original data frame is not touched. As far as I know, the default value is always False. It's a safe choice, to avoid accidentally deleting a column in your data frame. So if you want to modify your original data frame, you always have to explicitly pass inplace=True. All right, so that's how I remove a column from a data frame.

Now, another aspect that I want to talk about here is the types of data that you store in a data frame. When a data frame is created, each of the columns contains a specific type of data, and these can be the following. You can have integer values, and you can have float values, so values with a decimal part. Then we have this object type, which is a kind of catch-all category that groups everything that is basically not a number. So if you have strings, they will be classified as objects.

Here in my Titanic data set, you see that the string columns have this object type, so the name and sex columns. Then I have age, which is a float, so maybe some of the values have a decimal part. Then we have three columns that are integers: the passenger class, survived and family. Then the fare, which in this case should indeed be a float, and embarked, which again is a string, so its type is object. There are other types that are possible. If you have a column with True and False values, these would be Booleans. If you had a column with times or dates, you'd have a datetime type. And if you have a column that contains the categories, or factors, of a variable, then this would be a category type. By default, when you load a data set, pandas will try to assign these data types automatically.
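To make the type overview above concrete, here is a small sketch that inspects the dtypes of a hypothetical stand-in data frame. The column names follow the ones mentioned in the session; the values are made up.

```python
import pandas as pd
from pandas.api import types

# Hypothetical stand-in data; only the column names follow the session.
df = pd.DataFrame({
    "Name": ["Allen", "Braund", "Cumings"],
    "Sex": ["female", "male", "female"],
    "Age": [29.0, 22.0, 38.0],
    "Pclass": [1, 3, 1],
    "Survived": [1, 0, 1],
    "Family": [0, 1, 1],
    "Fare": [211.34, 7.25, 71.28],
    "Embarked": ["S", "S", "C"],
})

# One dtype per column. String columns are reported as object in the
# pandas version used in the session; newer versions may report a
# dedicated string dtype instead.
print(df.dtypes)

# Other dtypes mentioned above, built explicitly for illustration.
flags = pd.Series([True, False, True])
dates = pd.Series(pd.to_datetime(["1912-04-10", "1912-04-15"]))
classes = pd.Series(["first", "third", "first"], dtype="category")
print(flags.dtype, dates.dtype, classes.dtype)
```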
But if, for some reason, you're not happy with the assignment that was made, you can change it with the astype method. I can call this on a column to change the type of that column. For the example here, maybe let's just remind ourselves what the data looks like. All right, so here I'm creating a copy of the data frame. Remember that the fare column was loaded by default as a float, so decimal numeric values, which is appropriate in this case, I think. But if for some reason we wanted to convert it to a string, I could simply call the astype method on the column. So here I have my data frame, then I select the fare column with the dot syntax, and then I call the astype method on this column and pass it the type I want to convert to. Here I want to convert it to string. And you see that now the type of fare is object, because, remember, strings are classified under this object type in a pandas data frame.

And of course, now that I have converted my fare column to strings, I can manipulate it as strings. For instance, if I wanted to add a dollar sign in front of each of the values, I could easily do that. Different types of objects have different methods that you can apply to them, and this is why it's sometimes necessary to convert a column to a certain type. For instance, now that my fare is a string column, I can also apply all the string methods to it. If I want to count the number of times the character 5 appears in each string, I can use the count method of strings.

All right, another type that is often useful is the category. This is for a column that contains values that are factors. Typically, in the case of our data frame here, that could be the passenger class, where the values are one, two, or three.
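The string conversion described above can be sketched as follows. The fare values are made up, and the exact expressions used in the session for the dollar sign and the count are not shown, so these are one plausible way of writing them.

```python
import pandas as pd

# Hypothetical stand-in for the fare column of the Titanic data.
df = pd.DataFrame({"Fare": [7.25, 71.2833, 8.05]})

# Work on a copy so the original data frame keeps its float column.
df2 = df.copy()

# Select the column with the dot syntax and convert it to strings.
df2["Fare"] = df2.Fare.astype(str)
print(df2.dtypes)

# String operations are now available: prepend a dollar sign ...
with_dollar = "$" + df2["Fare"]
print(with_dollar)

# ... or count how often the character "5" occurs in each value,
# using the string methods exposed through .str.
fives = df2["Fare"].str.count("5")
print(fives)
```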
By default, pandas has loaded this as integers. But maybe I would actually like to convert it to categories, because the classes correspond to categories rather than to integers. So again, I select my column with the dot notation, so df.Pclass, and then I call astype, and this time I indicate that I want to make it a category. And now you see that the type of the values contained in my column is category. Now that I have changed it to a category type, I can use the .cat accessor to apply all kinds of methods that are only applicable to categories. For instance, I can now easily rename the categories to, let's say, the Roman numerals I, II, III instead of the Arabic 1, 2, 3 that we had before.

All right, I think it's 10:20. So what I suggest is that we do micro exercise three here. It's a bit longer, I think, than the other one, so let's take five to ten minutes to do it. We will quickly correct it, and then we will have a 15 minute break.

We start with a single column of a data frame, which in pandas we call a Series object. A Series object is simply a vector of values. A data frame is, if you want, a collection of Series, and each column of the data frame corresponds to a Series. So when I write pd.Series here, I simply create a new vector of values, as if I had a single column of a data frame. Here you see that I'm filling it with random numbers between zero and 100. I create them as strings, and then I add a percent sign. So we start with this column, and let's assume that we loaded it from a data frame. Now, because we would like to work with these values as numbers and not as strings, we have to convert them to integers.
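Two things from this passage can be sketched in code: the category conversion with renaming, and the starting point of the exercise. The data values, the random seed and the way the percent strings are generated are assumptions, since the exercise code itself is not part of the transcript.

```python
import random

import pandas as pd

# Hypothetical stand-in for the passenger class column.
df = pd.DataFrame({"Pclass": [3, 1, 2, 3]})

# Convert the integer column to a category type.
df["Pclass"] = df.Pclass.astype("category")
print(df["Pclass"].dtype)

# The .cat accessor exposes category-specific methods, such as renaming.
df["Pclass"] = df["Pclass"].cat.rename_categories({1: "I", 2: "II", 3: "III"})
print(df["Pclass"])

# Starting point of the exercise: random numbers between 0 and 100,
# stored as strings with a trailing percent sign (assumed construction).
random.seed(0)
percentages = pd.Series([str(random.randint(0, 100)) + "%" for _ in range(5)])
print(percentages)
```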
Well, of course, if I try to directly convert these strings, it will not work because of the percent sign here. Basically, Python will tell me that it doesn't know how to convert this to an integer. So the first thing I need to do is remove the percent sign from the strings. Because the values are strings, I have all the regular methods of a basic Python string at my disposal. One of these methods, as I indicated here in the hint, is the strip method.

So what does strip do? I will just illustrate it here with a small test. So test is just a string object here, and you see I'm printing it here in the second line. By the way, I often use these so-called f-strings, which are very handy in Python. You simply write a string with an f in front of it, and then inside, with curly braces, you can expand any variable to its value. So here I print the test variable. Now I apply the strip method to it, and you see that by default strip removes leading and trailing white space. But if I pass it a value, then I can strip anything that I want. In my case, I have a percent sign. So if I call strip and pass the percent sign as the string that I want to remove, you see that... sorry, oh yes, inside the f-string you need to use a different type of quote, because I'm already using one type for the f-string itself. All right, so you see that now I have stripped the percent character from the end of the string.

So what I can do is apply this to my entire vector here, my entire column of values, with .str.strip, and strip the percent sign. All right, so now you see that my values are still strings: you can see that the type here is still object. But at least I got rid of the trailing percent sign.
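The steps above can be sketched as follows, together with the integer conversion that the cleanup is meant to enable. The test string and the three percentage values are made up; the exercise itself uses random values.

```python
import pandas as pd

# Plain Python first: strip() on a single string.
test = "  42%  "
print(f"[{test}]")          # the f-string expands the variable in braces
print(f"[{test.strip()}]")  # by default, leading/trailing whitespace goes
# Inside a double-quoted f-string, use single quotes for nested strings.
print(f"[{'42%'.strip('%')}]")

# Hypothetical stand-in for the exercise column of percent strings.
values = pd.Series(["12%", "87%", "5%"])

# Converting directly fails because of the percent sign.
conversion_failed = False
try:
    values.astype(int)
except ValueError as err:
    conversion_failed = True
    print("ValueError:", err)

# .str applies the string method to every element of the column.
stripped = values.str.strip("%")
print(stripped)  # the values are still strings at this point

# With the percent sign gone, the conversion to integers works.
numbers = stripped.astype(int)
print(numbers)
```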
And so now it's possible to actually convert them to integers. For this, I use the astype method, and here I pass, oops, int for integer. Okay, so I still have the same values, but now you see that the data type is int64. So they have become actual numbers instead of being strings. After this, I can work with them as numbers, so I could, for instance, apply mathematical operations to them, and so on. All right, any questions? If that's not the case, then what I suggest we do is a